Task 15864844

Name	hadcm3n_n2fc_1920_40_008396083_0
Workunit	8546942
Created	26 Jun 2013, 1:04:55 UTC
Sent	26 Jun 2013, 5:46:20 UTC
Report deadline	25 Sep 2013, 13:13:31 UTC
Received	21 Jul 2013, 23:53:42 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1179495
Run time	17 days 11 hours 26 min 46 sec
CPU time	15 days 1 hours 34 min 2 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	1.51 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3156, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3156, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3520, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3520, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3520, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3428, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3428, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3428, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3468, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3468, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4976, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3940, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3940, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3364, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3672, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3288, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3552, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4652, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3504, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3504, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3504, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3504, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
17:33:00 (4968): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3600, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3536, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
21:00:46 (7140): No heartbeat from core client for 30 sec - exiting
21:00:48 (7140): No heartbeat from core client for 30 sec - exiting
21:00:49 (7140): No heartbeat from core client for 30 sec - exiting
21:00:50 (7140): No heartbeat from core client for 30 sec - exiting
21:00:51 (7140): No heartbeat from core client for 30 sec - exiting
21:00:52 (7140): No heartbeat from core client for 30 sec - exiting
21:00:53 (7140): No heartbeat from core client for 30 sec - exiting
21:00:54 (7140): No heartbeat from core client for 30 sec - exiting
21:00:55 (7140): No heartbeat from core client for 30 sec - exiting
21:00:56 (7140): No heartbeat from core client for 30 sec - exiting
21:00:57 (7140): No heartbeat from core client for 30 sec - exiting
21:00:58 (7140): No heartbeat from core client for 30 sec - exiting
21:00:59 (7140): No heartbeat from core client for 30 sec - exiting
21:01:00 (7140): No heartbeat from core client for 30 sec - exiting
21:01:01 (7140): No heartbeat from core client for 30 sec - exiting
21:01:02 (7140): No heartbeat from core client for 30 sec - exiting
21:01:03 (7140): No heartbeat from core client for 30 sec - exiting
21:01:04 (7140): No heartbeat from core client for 30 sec - exiting
21:01:05 (7140): No heartbeat from core client for 30 sec - exiting
21:01:06 (7140): No heartbeat from core client for 30 sec - exiting
21:01:07 (7140): No heartbeat from core client for 30 sec - exiting
21:01:08 (7140): No heartbeat from core client for 30 sec - exiting
21:01:09 (7140): No heartbeat from core client for 30 sec - exiting
21:01:10 (7140): No heartbeat from core client for 30 sec - exiting
21:01:11 (7140): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3568, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3568, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3568, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3708, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3708, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
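
The stderr output above is long and repetitive, so a quick tally of message kinds can make it easier to read. The following is a minimal sketch, not part of BOINC or CPDN: the `PATTERNS` table and the `tally` function are hypothetical names introduced here, and the sample text is a short excerpt assembled from messages that appear in the log above.

```python
import re
from collections import Counter

# Hypothetical message categories, keyed on phrases seen in the stderr above.
PATTERNS = {
    "model_crash": re.compile(r"Model crash detected"),
    "quit_request": re.compile(r"Quit request from BOINC"),
    "suspend_request": re.compile(r"Suspend request from BOINC"),
    "no_heartbeat": re.compile(r"No heartbeat from core client"),
    "signal": re.compile(r"Signal \d+ received"),
}


def tally(stderr_text):
    """Count occurrences of each message category in the stderr text."""
    return Counter(
        {name: len(pattern.findall(stderr_text)) for name, pattern in PATTERNS.items()}
    )


# Short excerpt built from messages in the log above (not the full log).
excerpt = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=3708, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "CPDN Monitor - Quit request from BOINC... "
    "21:01:11 (7140): No heartbeat from core client for 30 sec - exiting "
    "Signal 11 received, exiting... Called boinc_finish"
)

counts = tally(excerpt)
for name, count in counts.items():
    print(f"{name}: {count}")
```

Because the patterns are matched with `findall` over the whole text, the tally does not depend on how the log is broken into lines.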

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
23 Jul 2013 21:26:26	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	518,400	1,301,635	2.5109
23 Jul 2013 20:12:45	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	492,480	1,221,424	2.4801
23 Jul 2013 18:52:42	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	466,560	1,142,143	2.4480
23 Jul 2013 18:52:42	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	440,640	1,080,983	2.4532
23 Jul 2013 18:52:41	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	414,720	1,018,114	2.4549
23 Jul 2013 18:52:40	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	388,800	961,134	2.4721
23 Jul 2013 18:52:39	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	362,880	905,849	2.4963
11 Jul 2013 02:48:27	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	336,960	839,217	2.4906
09 Jul 2013 16:07:31	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	311,040	778,246	2.5021
07 Jul 2013 23:20:01	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	285,120	712,808	2.5000
07 Jul 2013 07:00:42	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	259,200	656,448	2.5326
06 Jul 2013 06:30:30	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	233,280	595,416	2.5524
06 Jul 2013 04:38:32	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	207,360	533,820	2.5744
04 Jul 2013 14:22:28	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	181,440	478,893	2.6394
02 Jul 2013 23:08:11	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	155,520	424,067	2.7268
02 Jul 2013 12:03:04	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	129,600	355,624	2.7440
02 Jul 2013 11:02:18	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	103,680	285,429	2.7530
02 Jul 2013 10:16:13	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	77,760	212,968	2.7388
28 Jun 2013 15:01:08	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	51,840	142,640	2.7515
27 Jun 2013 10:49:54	1179495	15864844	hadcm3n_n2fc_1920_40_008396083_0	25,920	71,588	2.7619
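
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. The sketch below checks that reading against three rows of the table and converts the reported CPU time to seconds for comparison with the last trickle. The function names are hypothetical and the rounding rule is an assumption inferred from the figures, not taken from the CPDN server code.

```python
def parse_int(text):
    """Parse an integer written with commas as thousands separators."""
    return int(text.replace(",", ""))


def avg_sec_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds per model timestep (assumed 4-decimal rounding)."""
    return round(cpu_time_sec / timestep, 4)


def duration_to_seconds(days, hours, minutes, seconds):
    """Convert a 'D days H hours M min S sec' duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# (timestep, CPU time in seconds) copied from three rows of the trickle table.
rows = [
    ("518,400", "1,301,635"),
    ("336,960", "839,217"),
    ("25,920", "71,588"),
]

averages = [
    avg_sec_per_timestep(parse_int(cpu), parse_int(ts)) for ts, cpu in rows
]

# CPU time from the task summary: 15 days 1 hours 34 min 2 sec.
cpu_time_total = duration_to_seconds(15, 1, 34, 2)

print(averages)
print(cpu_time_total)
```

Under these assumptions the computed averages agree with the table values (2.5109, 2.4906, 2.7619), and the summary CPU time comes to 1,301,642 seconds, within a few seconds of the 1,301,635 seconds reported in the last trickle.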