Task 16278194

Name	hadcm3n_7ejh_1980_40_008430128_2
Workunit	8580984
Created	3 Feb 2014, 22:35:21 UTC
Sent	3 Feb 2014, 22:35:23 UTC
Report deadline	8 Aug 2023, 3:55:23 UTC
Received	23 Feb 2014, 16:42:12 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1310485
Run time	9 days 0 hours 10 min 48 sec
CPU time	8 days 7 hours 35 min 44 sec
Validate state	Invalid
Credit	4,976.64
Device peak FLOPS	1.73 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7532, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
20:52:56 (8512): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:49:51 (4672): No heartbeat from core client for 30 sec - exiting
14:49:52 (4672): No heartbeat from core client for 30 sec - exiting
14:49:53 (4672): No heartbeat from core client for 30 sec - exiting
14:49:54 (4672): No heartbeat from core client for 30 sec - exiting
14:49:55 (4672): No heartbeat from core client for 30 sec - exiting
14:49:56 (4672): No heartbeat from core client for 30 sec - exiting
14:49:57 (4672): No heartbeat from core client for 30 sec - exiting
14:49:58 (4672): No heartbeat from core client for 30 sec - exiting
14:49:59 (4672): No heartbeat from core client for 30 sec - exiting
14:50:00 (4672): No heartbeat from core client for 30 sec - exiting
14:50:01 (4672): No heartbeat from core client for 30 sec - exiting
14:50:02 (4672): No heartbeat from core client for 30 sec - exiting
14:50:03 (4672): No heartbeat from core client for 30 sec - exiting
14:50:04 (4672): No heartbeat from core client for 30 sec - exiting
14:50:05 (4672): No heartbeat from core client for 30 sec - exiting
14:50:06 (4672): No heartbeat from core client for 30 sec - exiting
14:50:07 (4672): No heartbeat from core client for 30 sec - exiting
14:50:08 (4672): No heartbeat from core client for 30 sec - exiting
14:50:09 (4672): No heartbeat from core client for 30 sec - exiting
14:50:10 (4672): No heartbeat from core client for 30 sec - exiting
14:50:11 (4672): No heartbeat from core client for 30 sec - exiting
14:50:12 (4672): No heartbeat from core client for 30 sec - exiting
14:50:13 (4672): No heartbeat from core client for 30 sec - exiting
14:50:14 (4672): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:50:15 (4672): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
21:00:06 (9696): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
07:50:29 (9900): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
13:37:26 (17120): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
22:41:56 (14828): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:52:32 (7740): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
BUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048
Signal 11 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6020, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6020, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6020, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6020, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6020, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
23 Feb 2014 08:35:48	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	414,720	712,363	1.7177
22 Feb 2014 09:32:37	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	388,800	666,015	1.7130
21 Feb 2014 14:03:07	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	362,880	616,883	1.7000
21 Feb 2014 02:56:38	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	336,960	576,010	1.7094
21 Feb 2014 02:56:38	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	311,040	540,122	1.7365
21 Feb 2014 02:56:38	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	285,120	503,892	1.7673
21 Feb 2014 02:56:38	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	259,200	468,596	1.8079
21 Feb 2014 02:56:38	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	233,280	427,329	1.8318
18 Feb 2014 12:03:08	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	207,360	380,147	1.8333
18 Feb 2014 12:03:08	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	181,440	334,846	1.8455
16 Feb 2014 12:29:08	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	155,520	287,580	1.8492
15 Feb 2014 20:01:37	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	129,600	240,282	1.8540
13 Feb 2014 13:43:20	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	103,680	192,643	1.8581
12 Feb 2014 20:30:28	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	77,760	145,153	1.8667
10 Feb 2014 17:57:43	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	51,840	97,644	1.8836
08 Feb 2014 19:42:57	1310485	16278194	hadcm3n_7ejh_1980_40_008430128_2	25,920	52,934	2.0422
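
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep reached, rounded to four decimal places. The sketch below checks that assumption against three rows copied from the table; the names `TRICKLES` and `average_sec_per_timestep` are illustrative and not part of BOINC or CPDN.

```python
# Rows copied from the trickle table above:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS).
# Only a subset of rows is listed here.
TRICKLES = [
    (414_720, 712_363, 1.7177),
    (388_800, 666_015, 1.7130),
    (25_920, 52_934, 2.0422),
]


def average_sec_per_timestep(timestep, cpu_seconds):
    """Assumed formula: cumulative CPU seconds / timesteps completed."""
    return round(cpu_seconds / timestep, 4)


if __name__ == "__main__":
    for timestep, cpu_seconds, reported in TRICKLES:
        computed = average_sec_per_timestep(timestep, cpu_seconds)
        print(f"{timestep:>8}  computed={computed:.4f}  reported={reported:.4f}")
```

For these rows the computed values agree with the reported ones, which is consistent with the column being a running average rather than a per-trickle rate.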