Task 13119553

Name	hadcm3n_yj9z_1900_40_007357841_0
Workunit	7555271
Created	6 Jul 2011, 14:55:44 UTC
Sent	8 Jul 2011, 20:44:44 UTC
Report deadline	8 Oct 2011, 4:11:55 UTC
Received	19 Jul 2011, 17:04:08 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1113142
Run time	7 days 3 hours 33 min 13 sec
CPU time	6 days 22 hours 57 min 29 sec
Validate state	Invalid
Credit	4,043.52
Device peak FLOPS	2.61 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 03:25:46 (7564): No heartbeat from core client for 30 sec - exiting 03:25:47 (7564): No heartbeat from core client for 30 sec - exiting 03:25:49 (7564): No heartbeat from core client for 30 sec - exiting 03:25:50 (7564): No heartbeat from core client for 30 sec - exiting 03:25:51 (7564): No heartbeat from core client for 30 sec - exiting 03:25:52 (7564): No heartbeat from core client for 30 sec - exiting 03:25:53 (7564): No heartbeat from core client for 30 sec - exiting 03:25:54 (7564): No heartbeat from core client for 30 sec - exiting 03:25:55 (7564): No heartbeat from core client for 30 sec - exiting 03:25:56 (7564): No heartbeat from core client for 30 sec - exiting 03:25:57 (7564): No heartbeat from core client for 30 sec - exiting 03:25:58 (7564): No heartbeat from core client for 30 sec - exiting 03:25:59 (7564): No heartbeat from core client for 30 sec - exiting 03:26:01 (7564): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6764, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6764, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6764, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6764, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6764, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6764, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
25 Jul 2011 17:55:52	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	336,960	587,137	1.7425
25 Jul 2011 17:34:39	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	311,040	541,369	1.7405
25 Jul 2011 16:48:06	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	285,120	495,818	1.7390
25 Jul 2011 16:20:35	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	259,200	450,518	1.7381
25 Jul 2011 13:39:20	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	233,280	404,488	1.7339
25 Jul 2011 13:39:20	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	207,360	358,852	1.7306
25 Jul 2011 13:39:20	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	181,440	313,313	1.7268
25 Jul 2011 13:39:20	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	155,520	268,457	1.7262
25 Jul 2011 13:39:20	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	129,600	223,173	1.7220
25 Jul 2011 13:39:20	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	103,680	179,072	1.7272
25 Jul 2011 13:39:20	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	77,760	133,435	1.7160
10 Jul 2011 13:23:40	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	51,840	88,061	1.6987
10 Jul 2011 00:03:16	1113142	13119553	hadcm3n_yj9z_1900_40_007357841_0	25,920	43,822	1.6907
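
The Average (sec/TS) column above appears to be the cumulative CPU time divided by the timestep count. The following is a minimal sketch that checks this against three rows copied from the table. The function name `sec_per_timestep` and the list `trickles` are illustrative assumptions, not part of BOINC or the CPDN software.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average).
# Only three rows are included to keep the example short.
trickles = [
    (336_960, 587_137, 1.7425),
    (181_440, 313_313, 1.7268),
    (25_920, 43_822, 1.6907),
]


def sec_per_timestep(timestep: int, cpu_time_sec: int) -> float:
    """Return cumulative CPU seconds per model timestep (assumed formula)."""
    return cpu_time_sec / timestep


for timestep, cpu_time_sec, reported in trickles:
    computed = sec_per_timestep(timestep, cpu_time_sec)
    # Compare at the table's four-decimal precision.
    matches = abs(computed - reported) < 5e-5
    print(f"{timestep:>8,}  computed={computed:.4f}  reported={reported:.4f}  match={matches}")
```

If the assumed formula holds, every row should report a match at four decimal places. The slow rise from about 1.69 to 1.74 sec/TS then reflects a gradual increase in cumulative CPU cost per timestep over the run.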