Task 15814149

Name	hadcm3n_o3yc_1940_40_008382604_0
Workunit	8533463
Created	1 Jun 2013, 5:27:09 UTC
Sent	2 Jun 2013, 15:31:42 UTC
Report deadline	1 Sep 2013, 22:58:53 UTC
Received	13 Jun 2013, 5:32:34 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1253770
Run time	8 days 6 hours 15 min 27 sec
CPU time	8 days 4 hours 54 min 11 sec
Validate state	Invalid
Credit	4,354.56
Device peak FLOPS	1.38 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
13:47:08 (4688): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Atmos Hold Restart file rename failed on atmos_restart.hold
13:50:20 (52): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Atmos Hold Restart file rename failed on atmos_restart.hold
03:35:17 (1704): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
03:12:30 (36): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
22:24:37 (4476): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:18:42 (7368): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:18:43 (7368): No heartbeat from core client for 30 sec - exiting
07:18:44 (7368): No heartbeat from core client for 30 sec - exiting
07:18:45 (7368): No heartbeat from core client for 30 sec - exiting
07:18:46 (7368): No heartbeat from core client for 30 sec - exiting
07:18:47 (7368): No heartbeat from core client for 30 sec - exiting
07:18:49 (7368): No heartbeat from core client for 30 sec - exiting
07:18:50 (7368): No heartbeat from core client for 30 sec - exiting
07:18:51 (7368): No heartbeat from core client for 30 sec - exiting
07:18:52 (7368): No heartbeat from core client for 30 sec - exiting
07:18:53 (7368): No heartbeat from core client for 30 sec - exiting
Atmos Hold Restart file rename failed on atmos_restart.hold
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
00:06:29 (4896): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
19:05:44 (6240): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:05:45 (6240): No heartbeat from core client for 30 sec - exiting
19:05:46 (6240): No heartbeat from core client for 30 sec - exiting
19:05:47 (6240): No heartbeat from core client for 30 sec - exiting
19:05:48 (6240): No heartbeat from core client for 30 sec - exiting
19:05:49 (6240): No heartbeat from core client for 30 sec - exiting
19:05:50 (6240): No heartbeat from core client for 30 sec - exiting
19:05:51 (6240): No heartbeat from core client for 30 sec - exiting
19:05:52 (6240): No heartbeat from core client for 30 sec - exiting
19:05:53 (6240): No heartbeat from core client for 30 sec - exiting
19:05:54 (6240): No heartbeat from core client for 30 sec - exiting
23:00:19 (4236): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:31:58 (7056): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:44:16 (5680): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
04:36:17 (4848): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Atmos Hold Restart file rename failed on atmos_restart.hold
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4640, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4640, iMonCtr=1
Model crash detected, will try to restart...
19:11:46 (4640): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:02:01 (1792): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:10:53 (4508): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4840, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4840, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4840, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4840, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
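The stderr output above repeats a small set of message types (heartbeat timeouts, restart-file rename failures, model crashes). As a rough sketch for summarizing such a log, the following counts each type with regular expressions. The function name `tally_messages`, the category labels, and the short `sample` excerpt are illustrative assumptions; only the matched phrases are taken from the log itself.

```python
import re

# Phrases copied from the stderr text; the dictionary keys are hypothetical labels.
PATTERNS = {
    "heartbeat_timeout": re.compile(r"No heartbeat from core client for 30 sec - exiting"),
    "restart_rename_failed": re.compile(r"Atmos Hold Restart file rename failed"),
    "model_crash": re.compile(r"Model crash detected, will try to restart"),
}


def tally_messages(stderr_text):
    """Return a dict mapping each label to the number of matches in the text."""
    return {name: len(pattern.findall(stderr_text)) for name, pattern in PATTERNS.items()}


# A short excerpt from the start of the log, used here only as sample input.
sample = (
    "13:47:08 (4688): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Atmos Hold Restart file rename failed on atmos_restart.hold "
    "13:50:20 (52): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Atmos Hold Restart file rename failed on atmos_restart.hold"
)

print(tally_messages(sample))
```

Because the patterns match on phrases rather than line boundaries, the same function should work whether the log is pasted as one flattened line or one message per line.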

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
11 Jun 2013 10:23:57	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	362,880	706,037	1.9456
10 Jun 2013 19:44:40	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	336,960	654,747	1.9431
10 Jun 2013 06:41:59	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	311,040	607,960	1.9546
09 Jun 2013 17:33:35	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	285,120	561,409	1.9690
09 Jun 2013 05:03:15	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	259,200	516,450	1.9925
08 Jun 2013 15:36:34	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	233,280	466,189	1.9984
07 Jun 2013 22:48:42	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	207,360	415,931	2.0058
07 Jun 2013 08:20:20	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	181,440	364,378	2.0083
06 Jun 2013 17:25:11	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	155,520	311,153	2.0007
05 Jun 2013 20:39:19	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	129,600	260,938	2.0134
05 Jun 2013 05:39:04	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	103,680	207,564	2.0020
04 Jun 2013 12:31:31	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	77,760	152,742	1.9643
03 Jun 2013 20:06:17	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	51,840	101,030	1.9489
03 Jun 2013 06:21:34	1253770	15814149	hadcm3n_o3yc_1940_40_008382604_0	25,920	51,132	1.9727
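
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The sketch below checks that reading against three rows from the table; the helper names `parse_int` and `seconds_per_timestep` are hypothetical, and the formula is inferred from the figures rather than confirmed by the project.

```python
def parse_int(text):
    """Convert a comma-grouped number such as '362,880' to an int."""
    return int(text.replace(",", ""))


def seconds_per_timestep(cpu_seconds, timesteps):
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals."""
    return round(cpu_seconds / timesteps, 4)


# (Timestep, CPU Time (sec), Average (sec/TS)) copied from the table above.
rows = [
    ("362,880", "706,037", 1.9456),
    ("336,960", "654,747", 1.9431),
    ("25,920", "51,132", 1.9727),
]

for timestep_text, cpu_text, reported in rows:
    computed = seconds_per_timestep(parse_int(cpu_text), parse_int(timestep_text))
    print(timestep_text, cpu_text, computed, "matches" if computed == reported else "differs")
```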