Task 16390309

Name	hadcm3n_3e0p_1980_40_008395697_4
Workunit	8546556
Created	25 Mar 2014, 9:57:57 UTC
Sent	25 Mar 2014, 9:58:11 UTC
Report deadline	24 Jun 2014, 17:25:22 UTC
Received	14 May 2014, 2:07:49 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1241124
Run time	17 days 16 hours 14 min 59 sec
CPU time	16 days 20 hours 3 min 44 sec
Validate state	Invalid
Credit	4,354.56
Device peak FLOPS	1.33 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5396, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=11380, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3556, iMonCtr=1
Model crash detected, will try to restart...
19:07:18 (5880): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3144, iMonCtr=1
Model crash detected, will try to restart...
06:14:05 (5960): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6060, iMonCtr=1
Model crash detected, will try to restart...
07:32:57 (5932): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5688, iMonCtr=1
Model crash detected, will try to restart...
06:33:01 (4164): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5312, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2912, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5956, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5836, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2284, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2284, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2284, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2284, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2284, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2284, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6100, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
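The stderr output above repeats a small set of messages many times, so a tally can make the failure pattern easier to read. The following is a minimal sketch, assuming the stderr text has already been saved as a plain string; the function name `count_stderr_events`, the marker labels, and the short sample string are hypothetical and are not part of BOINC or CPDN.

```python
# Hypothetical helper: tally the recurring messages in a CPDN stderr dump.
# The marker strings are copied from the log above; the labels are our own.
MARKERS = {
    "model_crash": "Model crash detected, will try to restart...",
    "too_many_crashes": "Sorry, too many model crashes!",
    "no_heartbeat": "No heartbeat from core client",
    "quit_request": "CPDN Monitor - Quit request from BOINC...",
    "suspend_request": "CPDN Monitor - Suspend request from BOINC...",
}


def count_stderr_events(stderr_text: str) -> dict:
    """Return how often each known marker string occurs in the text."""
    return {label: stderr_text.count(marker) for label, marker in MARKERS.items()}


if __name__ == "__main__":
    # A short illustrative excerpt, not the full log.
    sample = (
        "Model crash detected, will try to restart... "
        "CPDN Monitor - Quit request from BOINC... "
        "Model crash detected, will try to restart... "
        "Sorry, too many model crashes! :-("
    )
    for label, n in count_stderr_events(sample).items():
        print(f"{label}: {n}")
```

Plain substring counting is used here because the messages are fixed strings; it does not try to pair each crash with the restart that followed it.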

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
13 May 2014 07:06:44	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	362,880	1,435,961	3.9571
11 May 2014 11:28:38	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	336,960	1,335,343	3.9629
10 May 2014 06:32:53	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	311,040	1,235,521	3.9722
06 May 2014 11:03:59	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	285,120	1,135,706	3.9833
04 May 2014 15:43:04	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	259,200	1,035,194	3.9938
03 May 2014 10:27:43	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	233,280	935,259	4.0092
27 Apr 2014 12:34:39	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	207,360	833,424	4.0192
26 Apr 2014 05:27:49	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	181,440	725,650	3.9994
20 Apr 2014 09:18:12	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	155,520	614,823	3.9533
13 Apr 2014 00:29:45	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	129,600	506,083	3.9050
06 Apr 2014 23:59:00	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	103,680	403,152	3.8884
05 Apr 2014 18:46:19	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	77,760	302,254	3.8870
30 Mar 2014 20:03:17	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	51,840	200,417	3.8661
29 Mar 2014 14:11:08	1241124	16390309	hadcm3n_3e0p_1980_40_008395697_4	25,920	98,386	3.7958
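The "Average (sec/TS)" column in the table above is consistent with the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The sketch below checks that relationship against two rows of the table; the function names are hypothetical, and the formula is inferred from the numbers shown rather than taken from the project's documentation.

```python
# Inferred relationship: average = CPU time (sec) / timestep, to 4 decimals.
# Function names are illustrative only.

def parse_int(field: str) -> int:
    """Convert a thousands-separated field such as '1,435,961' to an int."""
    return int(field.replace(",", ""))


def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Average CPU seconds spent per model timestep, rounded to 4 decimals."""
    if timestep <= 0:
        raise ValueError("timestep must be positive")
    return round(cpu_time_sec / timestep, 4)


if __name__ == "__main__":
    # (timestep, CPU time, reported average) from the latest and earliest trickles.
    rows = [
        ("362,880", "1,435,961", 3.9571),
        ("25,920", "98,386", 3.7958),
    ]
    for ts, cpu, reported in rows:
        computed = average_sec_per_timestep(parse_int(cpu), parse_int(ts))
        print(f"timestep={ts} computed={computed:.4f} reported={reported:.4f}")
```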