Task 14365001

Name	hadcm3n_o1xx_2020_40_007858089_1
Workunit	8013201
Created	5 Apr 2012, 15:13:40 UTC
Sent	5 Apr 2012, 15:19:19 UTC
Report deadline	5 Jul 2012, 22:46:30 UTC
Received	22 May 2012, 17:35:06 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	928765
Run time	29 days 15 hours 17 min 1 sec
CPU time	29 days 15 hours 17 min 1 sec
Validate state	Invalid
Credit	5,598.72
Device peak FLOPS	1.82 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.2.19</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 17:01:32 (3180): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 03:01:58 (7196): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 19:36:35 (3472): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
19:36:36 (3472): No heartbeat from core client for 30 sec - exiting
19:36:37 (3472): No heartbeat from core client for 30 sec - exiting
19:36:38 (3472): No heartbeat from core client for 30 sec - exiting
19:36:39 (3472): No heartbeat from core client for 30 sec - exiting
19:36:40 (3472): No heartbeat from core client for 30 sec - exiting
19:36:41 (3472): No heartbeat from core client for 30 sec - exiting
19:36:42 (3472): No heartbeat from core client for 30 sec - exiting
19:36:43 (3472): No heartbeat from core client for 30 sec - exiting
19:36:44 (3472): No heartbeat from core client for 30 sec - exiting
19:36:45 (3472): No heartbeat from core client for 30 sec - exiting
19:39:30 (7948): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:39:31 (7948): No heartbeat from core client for 30 sec - exiting
19:39:32 (7948): No heartbeat from core client for 30 sec - exiting
19:39:33 (7948): No heartbeat from core client for 30 sec - exiting
19:39:35 (7948): No heartbeat from core client for 30 sec - exiting
19:39:36 (7948): No heartbeat from core client for 30 sec - exiting
19:39:37 (7948): No heartbeat from core client for 30 sec - exiting
19:39:38 (7948): No heartbeat from core client for 30 sec - exiting
19:39:39 (7948): No heartbeat from core client for 30 sec - exiting
19:39:40 (7948): No heartbeat from core client for 30 sec - exiting
19:39:41 (7948): No heartbeat from core client for 30 sec - exiting
19:43:15 (3540): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3036, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3036, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
16:19:53 (3036): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3920, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3920, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3920, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3920, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
18 May 2012 09:17:16	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	466,560	2,445,212	5.2409
16 May 2012 17:26:57	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	440,640	2,322,645	5.2711
12 May 2012 18:54:01	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	414,720	2,207,848	5.3237
10 May 2012 23:04:18	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	388,800	2,089,810	5.3750
06 May 2012 15:01:13	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	362,880	1,945,325	5.3608
03 May 2012 20:46:27	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	336,960	1,790,879	5.3148
01 May 2012 08:57:08	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	311,040	1,655,922	5.3238
29 Apr 2012 02:53:58	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	285,120	1,520,443	5.3326
26 Apr 2012 21:31:43	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	259,200	1,386,872	5.3506
25 Apr 2012 01:23:52	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	233,280	1,260,440	5.4031
23 Apr 2012 04:16:55	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	207,360	1,117,363	5.3885
21 Apr 2012 04:36:33	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	181,440	971,640	5.3552
18 Apr 2012 12:12:03	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	155,520	838,503	5.3916
16 Apr 2012 00:29:05	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	129,600	677,910	5.2308
14 Apr 2012 05:53:19	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	103,680	541,505	5.2228
12 Apr 2012 06:26:45	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	77,760	408,582	5.2544
09 Apr 2012 10:41:34	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	51,840	273,473	5.2753
07 Apr 2012 14:23:04	928765	14365001	hadcm3n_o1xx_2020_40_007858089_1	25,920	138,885	5.3582
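
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the cumulative timestep count. The page does not state this, so treat it as an assumption. A minimal Python sketch that checks it against three rows copied from the table (the function name sec_per_timestep is illustrative, not part of BOINC or CPDN):

```python
# (timestep, cpu_time_sec, reported_average) copied from three rows of the
# "Latest Trickles Received" table: the newest, the second newest, the oldest.
rows = [
    (466_560, 2_445_212, 5.2409),
    (440_640, 2_322_645, 5.2711),
    (25_920, 138_885, 5.3582),
]


def sec_per_timestep(cpu_time_sec, timestep):
    """Assumed definition: cumulative CPU seconds per cumulative timestep."""
    return cpu_time_sec / timestep


for timestep, cpu_time_sec, reported in rows:
    computed = sec_per_timestep(cpu_time_sec, timestep)
    # Print the computed value next to the value shown on the page.
    print(f"{timestep:>8,}  computed={computed:.4f}  reported={reported:.4f}")
```

For these three rows the computed ratio rounds to the reported value at four decimal places, which is consistent with the assumed definition.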