Task 16162620

Name	hadcm3n_of7u_1900_40_008474845_1
Workunit	8625684
Created	28 Dec 2013, 5:01:22 UTC
Sent	28 Dec 2013, 5:01:26 UTC
Report deadline	29 Mar 2014, 12:28:37 UTC
Received	24 Jan 2014, 12:29:32 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1319477
Run time	22 days 5 hours 33 min 39 sec
CPU time	19 days 9 hours 12 min 29 sec
Validate state	Invalid
Credit	11,197.44
Device peak FLOPS	2.22 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
02:31:48 (7144): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
05:46:45 (7656): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
02:30:57 (6840): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
02:33:42 (3704): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
02:54:32 (7832): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
02:32:00 (6868): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
02:31:15 (2664): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
02:54:26 (8124): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5796, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
02:16:50 (4524): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
20:22:07 (5992): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
02:05:49 (3856): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
01:51:12 (5036): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
01:58:08 (7096): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
01:51:20 (5748): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:59:54 (6184): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4004, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4004, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4004, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4004, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4004, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4004, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
24 Jan 2014 12:32:23	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	933,120	1,638,808	1.7563
23 Jan 2014 00:03:34	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	907,200	1,593,091	1.7561
22 Jan 2014 10:18:43	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	881,280	1,547,374	1.7558
21 Jan 2014 19:00:20	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	855,360	1,501,579	1.7555
21 Jan 2014 04:14:03	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	829,440	1,455,827	1.7552
20 Jan 2014 13:51:29	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	803,520	1,410,187	1.7550
19 Jan 2014 23:28:21	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	777,600	1,364,630	1.7549
19 Jan 2014 08:35:08	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	751,680	1,319,152	1.7549
18 Jan 2014 18:11:35	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	725,760	1,273,856	1.7552
18 Jan 2014 02:00:31	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	699,840	1,228,768	1.7558
17 Jan 2014 11:52:50	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	673,920	1,183,510	1.7562
16 Jan 2014 21:49:16	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	648,000	1,138,526	1.7570
16 Jan 2014 07:18:10	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	622,080	1,093,034	1.7571
15 Jan 2014 17:03:28	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	596,160	1,048,038	1.7580
15 Jan 2014 02:52:27	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	570,240	1,002,879	1.7587
14 Jan 2014 12:49:15	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	544,320	957,741	1.7595
14 Jan 2014 00:40:58	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	518,400	912,247	1.7597
13 Jan 2014 07:52:38	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	492,480	866,937	1.7603
12 Jan 2014 16:32:56	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	466,560	821,273	1.7603
12 Jan 2014 01:46:48	1306645	16162620	hadcm3n_of7u_1900_40_008474845_1	440,640	775,876	1.7608
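
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count, rounded to four decimals. The sketch below checks that reading against three rows copied from the table; the function name `seconds_per_timestep` is hypothetical and not part of BOINC or CPDN.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average).
ROWS = [
    (933_120, 1_638_808, 1.7563),
    (907_200, 1_593_091, 1.7561),
    (440_640, 775_876, 1.7608),
]


def seconds_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals.

    Assumption: this is how the Average (sec/TS) column is derived.
    """
    return round(cpu_seconds / timestep, 4)


if __name__ == "__main__":
    for timestep, cpu_seconds, reported in ROWS:
        computed = seconds_per_timestep(timestep, cpu_seconds)
        print(f"{timestep:>9,d}  computed={computed:.4f}  reported={reported:.4f}")
```

Consecutive rows in the table also differ by a constant 25,920 timesteps (for example 933,120 - 907,200), which suggests one trickle per fixed block of model time.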