Task 16611810

Name	hadcm3n_84ng_1980_40_008463984_4
Workunit	8614823
Created	6 May 2014, 9:46:21 UTC
Sent	6 May 2014, 9:47:36 UTC
Report deadline	5 Aug 2014, 17:14:47 UTC
Received	25 May 2014, 7:10:42 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1169024
Run time	9 days 17 hours 10 min 4 sec
CPU time	9 days 4 hours 58 min 59 sec
Validate state	Invalid
Credit	5,287.68
Device peak FLOPS	2.43 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2672, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2672, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2672, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4528, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4528, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1124, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1124, iMonCtr=1
Model crash detected, will try to restart...
08:15:02 (6496): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:46:34 (6684): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3840, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3840, iMonCtr=1
Model crash detected, will try to restart...
07:28:55 (5660): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3640, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3640, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
07:38:58 (5684): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5244, iMonCtr=1
Model crash detected, will try to restart...
07:39:08 (2744): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:16:52 (4000): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:54:39 (2308): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
08:04:05 (3452): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6084, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6084, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6084, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6084, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6084, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6084, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
23 May 2014 09:00:32	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	440,640	760,485	1.7259
22 May 2014 09:10:12	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	414,720	715,832	1.7261
21 May 2014 10:31:27	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	388,800	670,257	1.7239
20 May 2014 11:04:42	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	362,880	625,429	1.7235
19 May 2014 09:43:43	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	336,960	581,882	1.7269
17 May 2014 17:40:13	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	311,040	537,785	1.7290
16 May 2014 19:14:09	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	285,120	493,224	1.7299
15 May 2014 19:36:40	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	259,200	449,109	1.7327
14 May 2014 20:31:24	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	233,280	404,614	1.7345
14 May 2014 08:14:14	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	207,360	361,266	1.7422
13 May 2014 09:44:17	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	181,440	316,057	1.7419
12 May 2014 07:52:38	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	155,520	267,792	1.7219
10 May 2014 19:48:16	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	129,600	223,235	1.7225
09 May 2014 18:54:47	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	103,680	178,345	1.7201
08 May 2014 21:33:13	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	77,760	134,396	1.7283
08 May 2014 07:58:36	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	51,840	88,820	1.7133
07 May 2014 08:53:46	1169024	16611810	hadcm3n_84ng_1980_40_008463984_4	25,920	44,339	1.7106
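
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimals; that reading is an inference from the rows above, not something the page states. A minimal Python sketch under that assumption (the function name `seconds_per_timestep` is hypothetical, and only values copied from the table are used):

```python
def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals.

    Assumption: this mirrors how the "Average (sec/TS)" column is derived.
    """
    if timestep <= 0:
        raise ValueError("timestep must be positive")
    return round(cpu_time_sec / timestep, 4)


# (timestep, cpu_time_sec) pairs taken from the first and last trickle rows.
rows = [
    (440_640, 760_485),  # 23 May 2014 trickle
    (25_920, 44_339),    # 07 May 2014 trickle
]

for timestep, cpu_time_sec in rows:
    print(timestep, seconds_per_timestep(cpu_time_sec, timestep))
```

Each trickle in the table advances the timestep by 25,920, so the same function can be applied to the difference between two consecutive rows to estimate the rate over a single interval.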