Task 12492371

Name	famous_wnqe_1199_200_007117869_0
Workunit	7316229
Created	16 Jan 2011, 15:29:25 UTC
Sent	18 Jan 2011, 21:23:47 UTC
Report deadline	20 Apr 2011, 4:50:58 UTC
Received	13 Jun 2011, 22:42:41 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	-226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS
Computer ID	1103724
Run time	30 days 4 hours 30 min 8 sec
CPU time	18 days 19 hours 1 min 15 sec
Validate state	Invalid
Credit	3,675.00
Device peak FLOPS	0.82 GFLOPS
Application version	UK Met Office FAMOUS v6.11 windows_intelx86
Stderr	<core_client_version>6.6.28</core_client_version> <![CDATA[ <message> too many exit(0)s </message> <stderr_txt> 09:51:40 (2168): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3728, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2512, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4484, iMonCtr=1 Model crash detected, will try to restart... 09:52:52 (6088): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 CCPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 07:43:32 (1724): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:44:23 (2704): Can't acquire lockfile (32) - waiting 35s CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4972, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 
BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5588, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2260, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3648, iMonCtr=1 Model crash detected, will try to restart... 16:13:54 (6124): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1660, iMonCtr=1 Model crash detected, will try to restart... C09:35:09 (3552): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4456, iMonCtr=1 Model crash detected, will try to restart... 
BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4708, iMonCtr=1 Model crash detected, will try to restart... 18:45:26 (2072): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5976, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3408, iMonCtr=1 Model crash detected, will try to restart... 15:35:04 (1576): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2096, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5080, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4420, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5468, iMonCtr=1 Model crash detected, will try to restart... 23:17:28 (5536): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
23:17:49 (4772): Can't acquire lockfile (32) - waiting 35s Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4664, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3484, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 09:30:11 (3324): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:20:44 (6040): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 08:54:25 (2620): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6048, iMonCtr=1 Model crash detected, will try to restart... 06:24:24 (4716): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:24:40 (4716): No heartbeat from core client for 30 sec - exiting BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 CPDN Monitor - Quit request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4540, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... C09:20:12 (5656): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:05:13 (4964): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:11:43 (3856): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:11:44 (3856): No heartbeat from core client for 30 sec - exiting 14:11:45 (3856): No heartbeat from core client for 30 sec - exiting 14:11:46 (3856): No heartbeat from core client for 30 sec - exiting 14:11:47 (3856): No heartbeat from core client for 30 sec - exiting 14:11:48 (3856): No heartbeat from core client for 30 sec - exiting 14:11:49 (3856): No heartbeat from core client for 30 sec - exiting 14:11:50 (3856): No heartbeat from core client for 30 sec - exiting 14:11:51 (3856): No heartbeat from core client for 30 sec - exiting 14:11:52 (3856): No heartbeat from core client for 30 sec - exiting 14:11:53 (3856): No heartbeat from core client for 30 sec - exiting 19:45:52 (1560): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:38:55 (4576): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:56:55 (5624): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4772, iMonCtr=1 Model crash detected, will try to restart... 
BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 </stderr_txt> ]]>
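
The Exit status field above gives the same code twice, as a signed decimal (-226) and as hexadecimal (0xFFFFFF1E). A minimal sketch of how the two forms relate, assuming the hexadecimal value is the 32-bit two's-complement encoding of the signed code; the helper names to_hex32 and from_hex32 are hypothetical and not part of BOINC:

```python
def to_hex32(code: int) -> str:
    """Format a signed exit code as its 32-bit two's-complement hex form."""
    # Masking with 0xFFFFFFFF wraps negative values into the unsigned range.
    return f"0x{code & 0xFFFFFFFF:08X}"


def from_hex32(text: str) -> int:
    """Recover the signed exit code from a 32-bit hex string."""
    value = int(text, 16)
    # If the sign bit (bit 31) is set, the value represents a negative number.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


print(to_hex32(-226))            # 0xFFFFFF1E
print(from_hex32("0xFFFFFF1E"))  # -226
```

Both directions reproduce the pair shown in the Exit status field, which the page labels ERR_TOO_MANY_EXITS.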

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
13 Jun 2011 18:55:41	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,113,866	1,617,457	1.4521
10 Jun 2011 15:27:15	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,104,506	1,593,103	1.4424
08 Jun 2011 06:17:44	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,095,146	1,577,461	1.4404
07 Jun 2011 05:02:26	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,085,786	1,563,135	1.4396
07 Jun 2011 00:28:46	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,076,426	1,549,591	1.4396
06 Jun 2011 17:09:01	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,067,066	1,533,186	1.4368
06 Jun 2011 00:56:30	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,057,706	1,518,398	1.4356
05 Jun 2011 04:09:44	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,048,346	1,498,967	1.4298
03 Jun 2011 17:56:07	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,038,986	1,478,304	1.4228
02 Jun 2011 14:31:46	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,029,626	1,461,753	1.4197
01 Jun 2011 14:21:17	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,020,266	1,441,346	1.4127
31 May 2011 17:13:04	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,010,906	1,422,543	1.4072
28 May 2011 11:25:23	1103724	12492371	famous_wnqe_1199_200_007117869_0	1,001,546	1,403,478	1.4013
26 May 2011 13:36:34	1103724	12492371	famous_wnqe_1199_200_007117869_0	992,186	1,382,632	1.3935
23 May 2011 15:41:55	1103724	12492371	famous_wnqe_1199_200_007117869_0	982,826	1,358,343	1.3821
20 May 2011 13:17:44	1103724	12492371	famous_wnqe_1199_200_007117869_0	973,466	1,335,840	1.3723
16 May 2011 14:46:16	1103724	12492371	famous_wnqe_1199_200_007117869_0	964,106	1,309,072	1.3578
11 May 2011 16:10:41	1103724	12492371	famous_wnqe_1199_200_007117869_0	954,746	1,275,726	1.3362
08 May 2011 02:30:29	1103724	12492371	famous_wnqe_1199_200_007117869_0	945,386	1,258,786	1.3315
05 May 2011 19:19:33	1103724	12492371	famous_wnqe_1199_200_007117869_0	936,026	1,237,986	1.3226
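
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep reached. That is an inference from the figures, not something the page states. A short sketch that checks it against three consecutive rows copied from the table; the function name sec_per_timestep is hypothetical:

```python
def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


# (timestep, CPU time in seconds, reported average), newest first,
# copied from the three most recent trickles in the table.
rows = [
    (1_113_866, 1_617_457, 1.4521),
    (1_104_506, 1_593_103, 1.4424),
    (1_095_146, 1_577_461, 1.4404),
]

for timestep, cpu_time, reported in rows:
    computed = sec_per_timestep(cpu_time, timestep)
    print(timestep, computed, reported)

# Spacing between consecutive trickles, in timesteps.
steps = [newer[0] - older[0] for newer, older in zip(rows, rows[1:])]
print(steps)
```

For these rows the computed averages agree with the reported column to four decimal places, and consecutive trickles are 9,360 timesteps apart.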