Task 15407412

Name	hadcm3n_o0o2_2060_40_008241702_3
Workunit	8396826
Created	27 Oct 2012, 13:46:29 UTC
Sent	27 Oct 2012, 13:46:37 UTC
Report deadline	26 Jan 2013, 21:13:48 UTC
Received	9 Mar 2013, 18:41:04 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1156294
Run time	28 days 9 hours 37 min 10 sec
CPU time	25 days 8 hours 25 min 33 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.41 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version> <![CDATA[ <message> - exit code 193 (0xc1) </message> <stderr_txt> CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=15580, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4520, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 08:32:11 (5172): No heartbeat from core client for 30 sec - exiting 08:32:12 (5172): No heartbeat from core client for 30 sec - exiting 08:32:13 (5172): No heartbeat from core client for 30 sec - exiting 08:32:14 (5172): No heartbeat from core client for 30 sec - exiting 08:32:15 (5172): No heartbeat from core client for 30 sec - exiting 08:32:16 (5172): No heartbeat from core client for 30 sec - exiting 08:32:17 (5172): No heartbeat from core client for 30 sec - exiting 08:32:18 (5172): No heartbeat from core client for 30 sec - exiting 08:32:19 (5172): No heartbeat from core client for 30 sec - exiting 08:32:20 (5172): No heartbeat from core client for 30 sec - exiting 08:32:21 (5172): No heartbeat from core client for 30 sec - exiting 08:32:22 (5172): No heartbeat from core client for 30 sec - exiting 08:32:23 (5172): No heartbeat from core client for 30 sec - exiting 08:32:24 (5172): No heartbeat from core client for 30 sec - exiting 08:32:25 (5172): No heartbeat from core client for 30 sec - exiting 08:32:26 (5172): No heartbeat from core client for 30 sec - exiting 08:32:27 (5172): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:32:28 (5172): No heartbeat from core client for 30 sec - exiting 08:32:29 (5172): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5044, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4956, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 21:01:00 (120): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN prController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3360, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2920, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 07:59:08 (2348): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6112, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3076, iMonCtr=1 Model crash detected, will try to restart... 09:33:16 (716): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3636, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3676, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4300, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4652, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2160, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1420, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3304, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4124, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4124, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4876, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4824, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 20:59:02 (5868): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4704, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4892, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4892, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4684, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3648, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=744, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1680, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3828, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 11:01:23 (4600): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3724, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3724, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4488, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3448, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4120, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3716, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3864, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4408, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5684, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3760, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4864, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4544, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2336, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3772, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3772, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... CCPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3660, iMonCtr=1 Model crash detected, will try to restart... Ocean Restart file copy failed on o0o2ko.das08n0 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3664, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4672, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
09 Mar 2013 17:41:05	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	518,400	2,190,301	4.2251
03 Mar 2013 00:38:54	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	492,480	2,075,010	4.2134
24 Feb 2013 13:20:21	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	466,560	1,960,303	4.2016
09 Feb 2013 08:11:39	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	440,640	1,845,305	4.1878
03 Feb 2013 08:37:47	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	414,720	1,732,073	4.1765
29 Jan 2013 14:10:21	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	388,800	1,613,773	4.1507
22 Jan 2013 16:05:51	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	362,880	1,501,612	4.1380
19 Jan 2013 11:10:37	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	336,960	1,383,508	4.1059
14 Jan 2013 13:30:24	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	311,040	1,272,324	4.0905
08 Jan 2013 19:44:29	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	285,120	1,156,094	4.0548
03 Jan 2013 08:30:53	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	259,200	1,037,639	4.0032
28 Dec 2012 13:50:32	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	233,280	918,263	3.9363
20 Dec 2012 19:53:28	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	207,360	803,748	3.8761
16 Dec 2012 13:50:35	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	181,440	686,310	3.7826
05 Dec 2012 09:04:29	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	155,520	575,847	3.7027
21 Nov 2012 18:18:50	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	129,600	473,951	3.6570
17 Nov 2012 13:31:01	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	103,680	385,376	3.7170
12 Nov 2012 07:32:12	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	77,760	301,168	3.8730
06 Nov 2012 15:41:36	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	51,840	201,828	3.8933
01 Nov 2012 11:21:56	1156294	15407412	hadcm3n_o0o2_2060_40_008241702_3	25,920	99,791	3.8500