Task 11488144

Name	famous_uba3_1599_200_006647462_4
Workunit	6850834
Created	10 Jun 2010, 13:08:04 UTC
Sent	11 Aug 2010, 7:24:21 UTC
Report deadline	10 Nov 2010, 14:51:32 UTC
Received	24 Oct 2010, 19:30:56 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	976532
Run time	16 days 9 hours 48 min 17 sec
CPU time	13 days 6 hours 0 min 11 sec
Validate state	Invalid
Credit	4,972.03
Device peak FLOPS	1.74 GFLOPS
Application version	UK Met Office FAMOUS v6.11 windows_intelx86
Stderr	<core_client_version>6.6.28</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
05:12:27 (5260): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:14:44 (5100): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4920, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4396, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2912, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
10:16:30 (7132): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:30:12 (2404): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4968, iMonCtr=1
Model crash detected, will try to restart...
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 60 - Return code = 16
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 61 - Return code = 16
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 60 - Return code = 16
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 61 - Return code = 16
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 60 - Return code = 16
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 61 - Return code = 16
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3584, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3584, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
08:29:23 (5260): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
16:20:26 (4368): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:25:04 (5092): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:26:43 (4712): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:52:29 (5112): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
BUFFOUT: Write Failed: Invalid argument
BUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 60 - Return code = 16
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 61 - Return code = 16
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: Read Failed: No such file or directory
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
13:00:48 (3832): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 60 - Return code = 16
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 61 - Return code = 16
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
13:02:40 (5684): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 60 - Return code = 16
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 61 - Return code = 16
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: Read Failed: Result too large
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
13:18:23 (3364): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 11 received, exiting...
13:19:02 (5288): called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4340, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
13:19:15 (492): called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4340, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
13:19:24 (236): called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4340, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
13:19:34 (3440): called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4340, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
13:19:52 (992): called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4340, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
13:20:06 (6228): called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4340, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
13:20:11 (4340): called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
24 Oct 2010 18:54:25	976532	11488144	famous_uba3_1599_200_006647462_4	1,506,986	1,143,996	0.7591
24 Oct 2010 15:40:20	976532	11488144	famous_uba3_1599_200_006647462_4	1,497,626	1,137,285	0.7594
24 Oct 2010 13:20:34	976532	11488144	famous_uba3_1599_200_006647462_4	1,488,266	1,130,507	0.7596
24 Oct 2010 10:42:36	976532	11488144	famous_uba3_1599_200_006647462_4	1,478,906	1,123,682	0.7598
24 Oct 2010 06:30:41	976532	11488144	famous_uba3_1599_200_006647462_4	1,469,546	1,117,038	0.7601
24 Oct 2010 03:45:48	976532	11488144	famous_uba3_1599_200_006647462_4	1,460,186	1,109,879	0.7601
24 Oct 2010 03:29:31	976532	11488144	famous_uba3_1599_200_006647462_4	1,450,826	1,103,075	0.7603
23 Oct 2010 23:31:45	976532	11488144	famous_uba3_1599_200_006647462_4	1,441,466	1,096,285	0.7605
23 Oct 2010 21:25:00	976532	11488144	famous_uba3_1599_200_006647462_4	1,432,106	1,089,575	0.7608
23 Oct 2010 19:24:56	976532	11488144	famous_uba3_1599_200_006647462_4	1,422,746	1,082,724	0.7610
21 Oct 2010 17:40:38	976532	11488144	famous_uba3_1599_200_006647462_4	1,413,386	1,075,535	0.7610
19 Oct 2010 22:48:46	976532	11488144	famous_uba3_1599_200_006647462_4	1,404,026	1,067,817	0.7605
19 Oct 2010 20:14:57	976532	11488144	famous_uba3_1599_200_006647462_4	1,394,666	1,060,270	0.7602
19 Oct 2010 18:03:05	976532	11488144	famous_uba3_1599_200_006647462_4	1,385,306	1,053,355	0.7604
19 Oct 2010 15:56:59	976532	11488144	famous_uba3_1599_200_006647462_4	1,375,946	1,046,422	0.7605
19 Oct 2010 04:04:22	976532	11488144	famous_uba3_1599_200_006647462_4	1,366,586	1,039,079	0.7603
19 Oct 2010 04:04:22	976532	11488144	famous_uba3_1599_200_006647462_4	1,357,226	1,031,653	0.7601
18 Oct 2010 22:34:21	976532	11488144	famous_uba3_1599_200_006647462_4	1,347,866	1,024,514	0.7601
15 Oct 2010 11:10:49	976532	11488144	famous_uba3_1599_200_006647462_4	1,338,506	1,016,971	0.7598
15 Oct 2010 08:55:36	976532	11488144	famous_uba3_1599_200_006647462_4	1,329,146	1,009,288	0.7594
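
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. This is inferred from the figures in the table, not stated on the page. A minimal sketch that checks the relationship against three rows copied from the table (the function name `average_sec_per_timestep` is hypothetical, chosen for illustration):

```python
# Rows copied from the trickle table above:
# (timestep, cumulative CPU time in seconds, reported Average in sec/TS)
ROWS = [
    (1_506_986, 1_143_996, 0.7591),
    (1_497_626, 1_137_285, 0.7594),
    (1_329_146, 1_009_288, 0.7594),
]


def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Assumed definition: cumulative CPU seconds per model timestep, to 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


def matches_reported(rows) -> bool:
    """Return True if every row's recomputed average agrees with the reported value."""
    return all(
        abs(average_sec_per_timestep(cpu, ts) - reported) < 1e-9
        for ts, cpu, reported in rows
    )


if __name__ == "__main__":
    for ts, cpu, reported in ROWS:
        print(ts, cpu, reported, average_sec_per_timestep(cpu, ts))
    print("all rows consistent:", matches_reported(ROWS))
```

If the assumption holds, the slow decline in the average from the later to the earlier rows simply reflects cumulative CPU time growing slightly faster than the timestep count over the run.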