Task 13609471

Name	hadcm3n_yd1z_1940_40_007539590_1
Workunit	7736822
Created	6 Nov 2011, 3:01:15 UTC
Sent	9 Nov 2011, 6:34:04 UTC
Report deadline	8 Feb 2012, 14:01:15 UTC
Received	19 Dec 2011, 21:40:22 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1082927
Run time	23 days 7 hours 15 min 12 sec
CPU time	22 days 23 hours 33 min 7 sec
Validate state	Invalid
Credit	10,264.32
Device peak FLOPS	2.07 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
12:48:59 (2608): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6136, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5424, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2096, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=11:15:46 (6744): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6388, iMonCtr=1
Model crash detected, will try to restart...
19:32:02 (4624): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2616, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2616, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2616, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CCPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3304, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5924, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6628, iMonCtr=1
Model crash detected, will try to restart...
10:05:51 (5784): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
08:10:33 (3148): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
17:28:40 (1184): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7564, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3428, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3740, iMonCtr=1
Model crash detected, will try to restart...
CCPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4980, iMonCtr=1
Model crash detected, will try to restart...
11:20:24 (7488): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7504, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7504, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7504, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7504, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7476, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7476, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7476, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8096, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7648, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7648, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3724, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
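
The stderr text above repeats a small set of message types (lost heartbeats, quit and suspend requests, model crashes, signal 22 exits). A minimal sketch of tallying them, assuming the log is available as a plain string; the label names and the `tally_events` helper are hypothetical and not part of BOINC or CPDN, and only the message fragments are copied from the log:

```python
# Message fragments copied from the stderr text; the dictionary keys are
# illustrative labels chosen for this sketch.
PATTERNS = {
    "heartbeat_lost": "No heartbeat from core client",
    "quit_request": "Quit request from BOINC",
    "suspend_request": "Suspend request from BOINC",
    "model_crash": "Model crash detected",
    "signal_22": "Signal 22 received",
    "boinc_finish": "Called boinc_finish",
}


def tally_events(stderr_text: str) -> dict:
    """Count non-overlapping occurrences of each known fragment."""
    return {label: stderr_text.count(fragment)
            for label, fragment in PATTERNS.items()}


# The closing messages of the log, used here as a small sample input.
sample = (
    "Signal 22 received, exiting... Called boinc_finish "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=3724, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)

for label, count in tally_events(sample).items():
    print(f"{label}: {count}")
```

Substring counting is used instead of line splitting because some messages in this log are interleaved or carry stray characters, so line boundaries are not reliable.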

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
19 Dec 2011 04:01:37	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	855,360	1,954,649	2.2852
17 Dec 2011 19:33:10	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	829,440	1,891,943	2.2810
16 Dec 2011 07:16:10	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	803,520	1,832,791	2.2810
14 Dec 2011 18:37:43	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	777,600	1,772,136	2.2790
13 Dec 2011 17:13:44	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	751,680	1,710,201	2.2752
12 Dec 2011 17:01:11	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	725,760	1,649,691	2.2731
11 Dec 2011 17:09:03	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	699,840	1,588,581	2.2699
10 Dec 2011 04:45:14	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	673,920	1,527,669	2.2668
09 Dec 2011 05:16:47	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	648,000	1,464,908	2.2607
08 Dec 2011 03:31:32	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	622,080	1,404,152	2.2572
07 Dec 2011 10:37:18	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	596,160	1,345,223	2.2565
06 Dec 2011 17:48:01	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	570,240	1,285,808	2.2549
05 Dec 2011 03:59:49	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	544,320	1,227,151	2.2545
04 Dec 2011 08:04:46	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	518,400	1,167,394	2.2519
03 Dec 2011 14:12:32	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	492,480	1,107,384	2.2486
02 Dec 2011 19:06:20	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	466,560	1,048,718	2.2478
01 Dec 2011 06:14:53	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	440,640	985,096	2.2356
30 Nov 2011 03:34:57	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	414,720	923,673	2.2272
27 Nov 2011 11:06:19	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	388,800	863,832	2.2218
26 Nov 2011 02:02:53	1082927	13609471	hadcm3n_yd1z_1940_40_007539590_1	362,880	803,564	2.2144
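
The Average (sec/TS) column in the table above appears to be the cumulative CPU time divided by the cumulative timestep count. A minimal sketch that checks this against the three newest rows; the function names are hypothetical, and the per-interval rate is an extra illustration rather than a figure the page reports:

```python
# (timestep, cpu_seconds, reported_average) copied from the three newest trickles.
TRICKLES = [
    (855_360, 1_954_649, 2.2852),
    (829_440, 1_891_943, 2.2810),
    (803_520, 1_832_791, 2.2810),
]


def average_sec_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Cumulative average: total CPU seconds per model timestep."""
    return cpu_seconds / timestep


def interval_sec_per_timestep(newer: tuple, older: tuple) -> float:
    """CPU seconds per timestep between two consecutive trickles."""
    d_steps = newer[0] - older[0]
    d_cpu = newer[1] - older[1]
    return d_cpu / d_steps


for timestep, cpu, reported in TRICKLES:
    computed = average_sec_per_timestep(cpu, timestep)
    print(f"timestep {timestep}: computed {computed:.4f}, reported {reported:.4f}")

# Rate over the most recent interval only (newest row versus the one before it).
print(f"latest interval: {interval_sec_per_timestep(TRICKLES[0], TRICKLES[1]):.4f} sec/TS")
```

Comparing the per-interval rate with the cumulative average is one way to see whether the host slowed down or sped up between trickles.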