Task 16277492

Name	hadcm3n_oe4h_1900_40_008473428_3
Workunit	8624267
Created	31 Jan 2014, 15:19:46 UTC
Sent	31 Jan 2014, 15:20:01 UTC
Report deadline	2 May 2014, 22:47:12 UTC
Received	12 Feb 2014, 21:15:20 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1340833
Run time	11 days 2 hours 7 min 50 sec
CPU time	10 days 16 hours 7 min 32 sec
Validate state	Invalid
Credit	8,398.08
Device peak FLOPS	2.98 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
18:05:43 (13116): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:50:29 (5596): No heartbeat from core client for 30 sec - exiting
19:50:30 (5596): No heartbeat from core client for 30 sec - exiting
19:50:31 (5596): No heartbeat from core client for 30 sec - exiting
19:50:32 (5596): No heartbeat from core client for 30 sec - exiting
19:50:33 (5596): No heartbeat from core client for 30 sec - exiting
19:50:34 (5596): No heartbeat from core client for 30 sec - exiting
19:50:35 (5596): No heartbeat from core client for 30 sec - exiting
19:50:36 (5596): No heartbeat from core client for 30 sec - exiting
19:50:37 (5596): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2364, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2364, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2364, iMonCtr=1
Model crash detected, will try to restart...
13:04:25 (2364): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3316, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3316, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3316, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
12 Feb 2014 17:54:33	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	699,840	921,613	1.3169
11 Feb 2014 23:39:58	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	673,920	887,114	1.3163
11 Feb 2014 13:26:45	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	648,000	852,586	1.3157
11 Feb 2014 04:15:05	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	622,080	818,706	1.3161
10 Feb 2014 17:52:41	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	596,160	784,558	1.3160
10 Feb 2014 08:12:04	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	570,240	749,950	1.3151
09 Feb 2014 22:25:12	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	544,320	715,771	1.3150
09 Feb 2014 13:01:25	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	518,400	681,551	1.3147
09 Feb 2014 03:14:35	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	492,480	647,254	1.3143
08 Feb 2014 17:31:24	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	466,560	612,737	1.3133
08 Feb 2014 07:10:30	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	440,640	577,942	1.3116
07 Feb 2014 21:13:19	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	414,720	542,984	1.3093
07 Feb 2014 10:10:30	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	388,800	507,768	1.3060
07 Feb 2014 00:18:48	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	362,880	473,065	1.3036
06 Feb 2014 14:06:37	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	336,960	438,375	1.3010
06 Feb 2014 03:14:24	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	311,040	402,783	1.2950
05 Feb 2014 17:16:38	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	285,120	367,959	1.2905
05 Feb 2014 04:55:17	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	259,200	333,978	1.2885
04 Feb 2014 19:23:18	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	233,280	301,289	1.2915
04 Feb 2014 10:04:17	1285422	16277492	hadcm3n_oe4h_1900_40_008473428_3	207,360	269,330	1.2989
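
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the cumulative timestep count. The following is a minimal Python sketch of that check, assuming this relationship holds. The function names are hypothetical, and the two sample rows are copied from the first and last lines of the table.

```python
def parse_count(text):
    """Convert a comma-grouped figure such as '699,840' to an int."""
    return int(text.replace(",", ""))


def sec_per_timestep(cpu_time_sec, timestep):
    """Assumed definition of the Average column: CPU seconds per timestep."""
    return cpu_time_sec / timestep


# (timestep, CPU time in seconds, reported average) taken from the table
rows = [
    ("699,840", "921,613", 1.3169),
    ("207,360", "269,330", 1.2989),
]

for timestep, cpu_time, reported in rows:
    average = sec_per_timestep(parse_count(cpu_time), parse_count(timestep))
    print(f"computed {average:.4f}, reported {reported}")
```

If the computed and reported values agree to four decimal places for every row, the column is a running average over the whole task rather than a per-interval rate.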