Task 15811543

Name	hadcm3n_o5oe_1940_40_008380448_0
Workunit	8531307
Created	31 May 2013, 21:51:09 UTC
Sent	16 Jun 2013, 19:34:27 UTC
Report deadline	16 Sep 2013, 3:01:38 UTC
Received	29 Aug 2013, 0:34:36 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1290798
Run time	27 days 20 hours 38 min 37 sec
CPU time	26 days 19 hours 29 min 30 sec
Validate state	Invalid
Credit	10,886.40
Device peak FLOPS	2.40 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> 20:57:18 (5584): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5540, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6008, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4408, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5024, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4964, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4964, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4964, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4964, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4964, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4964, iMonCtr=1 Model crash detected, will try to restart... 20:31:13 (5968): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:31:14 (5968): No heartbeat from core client for 30 sec - exiting 20:31:15 (5968): No heartbeat from core client for 30 sec - exiting 20:31:16 (5968): No heartbeat from core client for 30 sec - exiting 20:31:17 (5968): No heartbeat from core client for 30 sec - exiting 20:31:18 (5968): No heartbeat from core client for 30 sec - exiting 20:31:20 (5968): No heartbeat from core client for 30 sec - exiting 20:31:21 (5968): No heartbeat from core client for 30 sec - exiting 20:31:22 (5968): No heartbeat from core client for 30 sec - exiting 20:31:23 (5968): No heartbeat from core client for 30 sec - exiting 20:31:24 (5968): No heartbeat from core client for 30 sec - exiting 20:31:25 (5968): No heartbeat from core client for 30 sec - exiting 20:31:26 (5968): No heartbeat from core client for 30 sec - exiting 20:31:27 (5968): No heartbeat from core client for 30 sec - exiting 20:31:28 (5968): No heartbeat from core client for 30 sec - exiting 20:31:29 (5968): No heartbeat from core client for 30 sec - exiting 20:31:30 (5968): No heartbeat from core client for 30 sec - exiting 20:31:32 (5968): No heartbeat from core client for 30 sec - exiting 20:31:33 (5968): No heartbeat from core client for 30 sec - exiting 20:31:34 (5968): No heartbeat from core client for 30 sec - exiting 20:31:35 (5968): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2256, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2256, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5360, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5368, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5368, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4932, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4868, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2972, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2972, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5240, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4644, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7036, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5060, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5076, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5076, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5076, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5076, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5076, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5968, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3544, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5052, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4964, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=852, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=852, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4744, iMonCtr=1 Model crash detected, will try to restart... CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4872, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4436, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4436, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4904, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4904, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4904, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8012, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8012, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
18 Aug 2013 19:31:53	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	907,200	2,276,786	2.5097
18 Aug 2013 01:08:04	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	881,280	2,211,233	2.5091
17 Aug 2013 06:14:09	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	855,360	2,145,655	2.5085
16 Aug 2013 00:42:41	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	829,440	2,085,672	2.5146
14 Aug 2013 19:54:34	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	803,520	2,027,205	2.5229
14 Aug 2013 19:54:34	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	777,600	1,963,640	2.5253
14 Aug 2013 19:54:34	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	751,680	1,898,461	2.5256
14 Aug 2013 19:54:34	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	725,760	1,832,959	2.5256
14 Aug 2013 19:54:34	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	699,840	1,772,358	2.5325
14 Aug 2013 19:54:34	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	673,920	1,713,460	2.5425
24 Jul 2013 20:58:58	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	648,000	1,649,564	2.5456
23 Jul 2013 21:58:38	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	622,080	1,579,014	2.5383
23 Jul 2013 20:56:09	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	596,160	1,511,659	2.5357
23 Jul 2013 20:11:44	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	570,240	1,439,726	2.5248
23 Jul 2013 18:53:18	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	544,320	1,363,941	2.5058
23 Jul 2013 18:53:16	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	518,400	1,294,220	2.4966
23 Jul 2013 18:53:16	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	492,480	1,223,534	2.4844
23 Jul 2013 18:53:16	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	466,560	1,148,636	2.4619
23 Jul 2013 18:53:15	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	440,640	1,073,940	2.4372
11 Jul 2013 04:33:48	1189727	15811543	hadcm3n_o5oe_1940_40_008380448_0	414,720	1,002,855	2.4181
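
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count reported in the same trickle. That reading is inferred from the numbers in the table, not from project documentation. A minimal sketch checking it against three rows copied from the table (the function name `seconds_per_timestep` is hypothetical):

```python
# Rows copied from the trickle table above:
# (timestep, cumulative CPU time in seconds, reported Average sec/TS)
rows = [
    (907_200, 2_276_786, "2.5097"),
    (881_280, 2_211_233, "2.5091"),
    (414_720, 1_002_855, "2.4181"),
]


def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Assumed definition: cumulative CPU seconds per model timestep."""
    return cpu_time_sec / timestep


for timestep, cpu_time_sec, reported in rows:
    # Format to four decimals to match the precision shown in the table.
    computed = f"{seconds_per_timestep(cpu_time_sec, timestep):.4f}"
    print(timestep, computed, reported, computed == reported)
```

If the assumption holds, each computed value matches the reported one to four decimal places, so the column is a running average over the whole task rather than a rate for the most recent interval.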