Task 13593347

Name	hadcm3n_ydeq_1900_40_007527670_2
Workunit	7725145
Created	4 Nov 2011, 12:49:22 UTC
Sent	4 Nov 2011, 12:51:54 UTC
Report deadline	3 Feb 2012, 20:19:05 UTC
Received	18 Dec 2011, 18:24:19 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1340931
Run time	10 days 21 hours 56 min 10 sec
CPU time	9 days 14 hours 42 min 17 sec
Validate state	Invalid
Credit	6,531.84
Device peak FLOPS	2.83 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
00:55:14 (4844): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4408, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4120, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
21:38:46 (3020): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2864, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1960, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1480, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3884, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3256, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3256, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3256, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3256, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3256, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3256, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
18 Dec 2011 14:28:39	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	544,320	830,083	1.5250
17 Dec 2011 15:00:17	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	518,400	792,750	1.5292
12 Dec 2011 21:25:19	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	492,480	754,856	1.5328
06 Dec 2011 17:53:04	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	466,560	716,477	1.5357
04 Dec 2011 16:06:45	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	440,640	677,084	1.5366
03 Dec 2011 15:12:48	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	414,720	636,991	1.5360
01 Dec 2011 18:36:16	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	388,800	596,916	1.5353
23 Nov 2011 20:17:40	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	362,880	555,819	1.5317
22 Nov 2011 07:53:36	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	336,960	514,038	1.5255
21 Nov 2011 16:50:39	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	311,040	472,738	1.5199
21 Nov 2011 04:32:26	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	285,120	432,500	1.5169
16 Nov 2011 17:48:29	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	259,200	392,112	1.5128
15 Nov 2011 19:23:29	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	233,280	355,665	1.5246
15 Nov 2011 19:23:29	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	207,360	319,557	1.5411
15 Nov 2011 19:23:28	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	181,440	278,521	1.5351
15 Nov 2011 19:23:28	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	155,520	238,354	1.5326
15 Nov 2011 19:23:28	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	129,600	197,278	1.5222
09 Nov 2011 18:59:55	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	103,680	157,111	1.5153
07 Nov 2011 19:41:36	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	77,760	117,487	1.5109
06 Nov 2011 12:56:27	1097785	13593347	hadcm3n_ydeq_1900_40_007527670_2	51,840	78,233	1.5091
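
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU Time (sec) divided by the Timestep. A minimal sketch of that check follows. The formula is an assumption inferred from the figures, not something stated on the page, and the names TRICKLES, average_sec_per_timestep and matches_reported are illustrative only. Three rows are copied from the table.

```python
# Rows copied from the trickle table above:
# (timestep, cpu_time_sec, reported_average_sec_per_ts)
TRICKLES = [
    (544_320, 830_083, 1.5250),
    (518_400, 792_750, 1.5292),
    (51_840, 78_233, 1.5091),
]


def average_sec_per_timestep(cpu_time_sec, timestep):
    """Assumed definition: cumulative CPU seconds per model timestep."""
    return cpu_time_sec / timestep


def matches_reported(cpu_time_sec, timestep, reported, tol=1e-4):
    """True if the recomputed average agrees with the reported 4-decimal value."""
    return abs(average_sec_per_timestep(cpu_time_sec, timestep) - reported) < tol


if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in TRICKLES:
        computed = average_sec_per_timestep(cpu_time_sec, timestep)
        status = "ok" if matches_reported(cpu_time_sec, timestep, reported) else "MISMATCH"
        print(f"{timestep:>8} {computed:.4f} {reported:.4f} {status}")
```

Under that assumption, the three sampled rows agree with the reported averages to four decimal places.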