Task 16272676

Name	hadcm3n_obxn_1900_40_008470590_3
Workunit	8621429
Created	16 Jan 2014, 1:57:28 UTC
Sent	16 Jan 2014, 1:57:44 UTC
Report deadline	17 Apr 2014, 9:24:55 UTC
Received	26 Jan 2014, 20:13:16 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1045847
Run time	9 days 17 hours 11 min 17 sec
CPU time	9 days 2 hours 8 min 10 sec
Validate state	Invalid
Credit	7,153.92
Device peak FLOPS	2.82 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
23:58:26 (12572): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:24:07 (10548): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:30:33 (5444): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:35:59 (5000): No heartbeat from core client for 30 sec - exiting
13:36:00 (5000): No heartbeat from core client for 30 sec - exiting
13:36:01 (5000): No heartbeat from core client for 30 sec - exiting
13:36:02 (5000): No heartbeat from core client for 30 sec - exiting
13:36:03 (5000): No heartbeat from core client for 30 sec - exiting
13:36:04 (5000): No heartbeat from core client for 30 sec - exiting
13:36:05 (5000): No heartbeat from core client for 30 sec - exiting
13:36:06 (5000): No heartbeat from core client for 30 sec - exiting
13:36:08 (5000): No heartbeat from core client for 30 sec - exiting
13:36:09 (5000): No heartbeat from core client for 30 sec - exiting
13:36:10 (5000): No heartbeat from core client for 30 sec - exiting
13:36:11 (5000): No heartbeat from core client for 30 sec - exiting
13:36:12 (5000): No heartbeat from core client for 30 sec - exiting
13:36:13 (5000): No heartbeat from core client for 30 sec - exiting
13:36:14 (5000): No heartbeat from core client for 30 sec - exiting
13:36:15 (5000): No heartbeat from core client for 30 sec - exiting
13:36:16 (5000): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:36:17 (5000): No heartbeat from core client for 30 sec - exiting
15:11:52 (2112): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:02:58 (1852): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1132, iMonCtr=1
Model crash detected, will try to restart...
16:12:24 (2060): No heartbeat from core client for 30 sec - exiting
16:12:25 (2060): No heartbeat from core client for 30 sec - exiting
16:12:26 (2060): No heartbeat from core client for 30 sec - exiting
16:12:27 (2060): No heartbeat from core client for 30 sec - exiting
16:12:28 (2060): No heartbeat from core client for 30 sec - exiting
16:12:29 (2060): No heartbeat from core client for 30 sec - exiting
16:12:30 (2060): No heartbeat from core client for 30 sec - exiting
16:12:31 (2060): No heartbeat from core client for 30 sec - exiting
16:12:32 (2060): No heartbeat from core client for 30 sec - exiting
16:12:33 (2060): No heartbeat from core client for 30 sec - exiting
16:12:35 (2060): No heartbeat from core client for 30 sec - exiting
16:12:36 (2060): No heartbeat from core client for 30 sec - exiting
16:12:37 (2060): No heartbeat from core client for 30 sec - exiting
16:12:38 (2060): No heartbeat from core client for 30 sec - exiting
16:12:39 (2060): No heartbeat from core client for 30 sec - exiting
16:12:40 (2060): No heartbeat from core client for 30 sec - exiting
16:12:41 (2060): No heartbeat from core client for 30 sec - exiting
16:12:42 (2060): No heartbeat from core client for 30 sec - exiting
16:12:43 (2060): No heartbeat from core client for 30 sec - exiting
16:12:44 (2060): No heartbeat from core client for 30 sec - exiting
16:12:45 (2060): No heartbeat from core client for 30 sec - exiting
16:12:47 (2060): No heartbeat from core client for 30 sec - exiting
16:12:48 (2060): No heartbeat from core client for 30 sec - exiting
16:12:49 (2060): No heartbeat from core client for 30 sec - exiting
16:12:50 (2060): No heartbeat from core client for 30 sec - exiting
16:12:51 (2060): No heartbeat from core client for 30 sec - exiting
16:12:52 (2060): No heartbeat from core client for 30 sec - exiting
16:12:53 (2060): No heartbeat from core client for 30 sec - exiting
16:12:54 (2060): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:12:55 (2060): No heartbeat from core client for 30 sec - exiting
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=260, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=260, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=260, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=260, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=260, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=260, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
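
The stderr output above is long and repetitive, so a small script can help tally what it contains. The following is only a sketch: the function name `summarize_stderr` and the sample string are illustrative, the sample is a shortened excerpt stitched together from messages that appear in the log, and treating the number in parentheses as a process ID is an assumption about the log format.

```python
import re
from collections import Counter

# Matches lines such as "13:35:59 (5000): No heartbeat from core client ..."
# Assumption: the number in parentheses identifies the reporting process.
HEARTBEAT = re.compile(r"(\d{2}:\d{2}:\d{2}) \((\d+)\): No heartbeat from core client")
RESTART = "Model crash detected, will try to restart..."
GAVE_UP = "Sorry, too many model crashes!"


def summarize_stderr(text):
    """Count heartbeat timeouts per process ID and model restart attempts."""
    per_pid = Counter(m.group(2) for m in HEARTBEAT.finditer(text))
    return {
        "heartbeat_by_pid": dict(per_pid),
        "restarts": text.count(RESTART),
        "gave_up": GAVE_UP in text,
    }


# Shortened excerpt for illustration only; paste the full stderr text instead.
sample = (
    "13:30:33 (5444): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "13:35:59 (5000): No heartbeat from core client for 30 sec - exiting "
    "13:36:00 (5000): No heartbeat from core client for 30 sec - exiting "
    "Model crash detected, will try to restart... "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)

print(summarize_stderr(sample))
```

Because the patterns are searched anywhere in the text, the script works whether the log is pasted as one long line or with one message per line.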

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
25 Jan 2014 16:27:34	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	596,160	771,799	1.2946
25 Jan 2014 06:21:43	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	570,240	772,972	1.3555
24 Jan 2014 20:04:33	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	544,320	737,953	1.3557
24 Jan 2014 12:27:21	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	518,400	702,834	1.3558
24 Jan 2014 12:27:21	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	492,480	667,451	1.3553
24 Jan 2014 12:27:21	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	466,560	631,927	1.3544
23 Jan 2014 03:49:20	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	440,640	596,176	1.3530
22 Jan 2014 17:51:28	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	414,720	560,736	1.3521
22 Jan 2014 07:58:16	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	388,800	525,797	1.3524
21 Jan 2014 22:01:17	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	362,880	490,652	1.3521
21 Jan 2014 11:40:30	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	336,960	455,688	1.3524
21 Jan 2014 01:48:34	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	311,040	420,848	1.3530
20 Jan 2014 16:09:33	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	285,120	385,929	1.3536
20 Jan 2014 06:09:45	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	259,200	350,891	1.3537
19 Jan 2014 20:17:32	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	233,280	315,773	1.3536
19 Jan 2014 10:30:31	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	207,360	280,984	1.3551
19 Jan 2014 00:38:25	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	181,440	246,216	1.3570
18 Jan 2014 14:45:52	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	155,520	211,148	1.3577
18 Jan 2014 04:41:09	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	129,600	176,134	1.3591
17 Jan 2014 18:48:40	1045847	16272676	hadcm3n_obxn_1900_40_008470590_3	103,680	141,145	1.3614
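
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places; the rows above are consistent with that reading. The sketch below recomputes it for a few rows copied from the table. The function names `parse_trickle_row` and `sec_per_timestep` are illustrative and not part of any BOINC or CPDN tool.

```python
def parse_trickle_row(line):
    """Split one tab-separated trickle row into (timestep, cpu_sec, reported_avg)."""
    fields = line.rstrip("\n").split("\t")
    timestep = int(fields[4].replace(",", ""))  # e.g. "596,160" -> 596160
    cpu_sec = int(fields[5].replace(",", ""))   # e.g. "771,799" -> 771799
    reported = float(fields[6])
    return timestep, cpu_sec, reported


def sec_per_timestep(cpu_sec, timestep):
    """Assumed definition of the Average column: CPU seconds per timestep."""
    return round(cpu_sec / timestep, 4)


# Three rows copied verbatim from the table above.
rows = [
    "25 Jan 2014 16:27:34\t1045847\t16272676\thadcm3n_obxn_1900_40_008470590_3\t596,160\t771,799\t1.2946",
    "25 Jan 2014 06:21:43\t1045847\t16272676\thadcm3n_obxn_1900_40_008470590_3\t570,240\t772,972\t1.3555",
    "24 Jan 2014 20:04:33\t1045847\t16272676\thadcm3n_obxn_1900_40_008470590_3\t544,320\t737,953\t1.3557",
]

for line in rows:
    timestep, cpu_sec, reported = parse_trickle_row(line)
    print(timestep, cpu_sec, reported, sec_per_timestep(cpu_sec, timestep))
```

Note that in the most recent row the cumulative CPU time (771,799 s) is lower than in the row before it (772,972 s), which is why its average drops to 1.2946.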