Task 11689521

Name	famous_v04w_1599_200_006685950_2
Workunit	6889203
Created	26 Aug 2010, 15:39:04 UTC
Sent	27 Aug 2010, 9:24:19 UTC
Report deadline	26 Nov 2010, 16:51:30 UTC
Received	20 Oct 2010, 17:01:56 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	-226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS
Computer ID	1055314
Run time	8 days 10 hours 0 min 24 sec
CPU time	6 days 0 hours 26 min 15 sec
Validate state	Invalid
Credit	555.96
Device peak FLOPS	0.67 GFLOPS
Application version	UK Met Office FAMOUS v6.11 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version> <![CDATA[ <message> too many exit(0)s </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4036, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPIController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4444, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1080, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4716, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4028, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4240, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4912, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3548, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4024, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4324, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3948, iMonCtr=1 Model crash detected, will try to restart... 19:15:18 (4044): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:14:01 (5448): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5552, iMonCtr=1 Model crash detected, will try to restart... 21:08:31 (4760): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:08:29 (3960): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:16:10 (4820): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:07:15 (2352): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2616, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 19:18:48 (4080): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:52:42 (3668): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4068, iMonCtr=1 Model crash detected, will try to restart... 11:34:29 (5012): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:46:27 (4132): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:45:38 (5124): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:44:14 (5956): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:42:59 (4200): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4156, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3936, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 14:00:34 (5112): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=556, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 10:32:03 (4428): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:08:31 (4512): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:07:25 (4036): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4836, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 19:41:05 (2248): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:40:09 (4452): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:37:30 (4740): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:36:28 (3092): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:35:37 (4284): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
20 Oct 2010 11:49:41	1055314	11689521	famous_v04w_1599_200_006685950_2	168,506	510,684	3.0307
17 Oct 2010 11:08:18	1055314	11689521	famous_v04w_1599_200_006685950_2	159,146	482,577	3.0323
16 Oct 2010 07:10:49	1055314	11689521	famous_v04w_1599_200_006685950_2	149,786	453,854	3.0300
23 Sep 2010 06:12:17	1055314	11689521	famous_v04w_1599_200_006685950_2	140,426	425,541	3.0304
21 Sep 2010 20:44:12	1055314	11689521	famous_v04w_1599_200_006685950_2	131,066	397,043	3.0293
20 Sep 2010 11:14:32	1055314	11689521	famous_v04w_1599_200_006685950_2	121,706	368,429	3.0272
15 Sep 2010 14:17:03	1055314	11689521	famous_v04w_1599_200_006685950_2	112,346	341,279	3.0377
14 Sep 2010 15:15:58	1055314	11689521	famous_v04w_1599_200_006685950_2	102,986	312,857	3.0379
13 Sep 2010 10:28:10	1055314	11689521	famous_v04w_1599_200_006685950_2	93,626	284,569	3.0394
12 Sep 2010 10:01:50	1055314	11689521	famous_v04w_1599_200_006685950_2	84,266	256,276	3.0413
11 Sep 2010 13:24:57	1055314	11689521	famous_v04w_1599_200_006685950_2	74,906	227,951	3.0432
09 Sep 2010 17:10:18	1055314	11689521	famous_v04w_1599_200_006685950_2	65,546	199,565	3.0447
08 Sep 2010 12:56:46	1055314	11689521	famous_v04w_1599_200_006685950_2	56,186	171,038	3.0441
07 Sep 2010 09:34:37	1055314	11689521	famous_v04w_1599_200_006685950_2	46,826	141,517	3.0222
04 Sep 2010 08:33:34	1055314	11689521	famous_v04w_1599_200_006685950_2	37,466	112,549	3.0040
04 Sep 2010 06:24:41	1055314	11689521	famous_v04w_1599_200_006685950_2	28,106	84,496	3.0063
30 Aug 2010 05:50:43	1055314	11689521	famous_v04w_1599_200_006685950_2	18,746	56,221	2.9991
28 Aug 2010 16:59:01	1055314	11689521	famous_v04w_1599_200_006685950_2	9,386	28,594	3.0465