Task 13560838

Name	hadcm3n_y9hc_1900_40_007527530_1
Workunit	7725005
Created	28 Oct 2011, 13:51:21 UTC
Sent	28 Oct 2011, 20:37:43 UTC
Report deadline	28 Jan 2012, 4:04:54 UTC
Received	11 Dec 2011, 23:15:59 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1177060
Run time	9 days 22 hours 18 min 29 sec
CPU time	9 days 20 hours 2 min 29 sec
Validate state	Invalid
Credit	12,441.60
Device peak FLOPS	3.24 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3860, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4276, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4288, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4288, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3316, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3316, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3364, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3364, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3204, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3204, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3256, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3256, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3236, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3236, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3236, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
14:48:13 (3300): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3224, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3224, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
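
The stderr output above repeats a handful of message types many times, so a tally can make the sequence of suspends and model crashes easier to read. The following is a minimal sketch in Python, not part of BOINC or CPDN: the function name `tally_events` and the pattern table are hypothetical, and the patterns are simply substrings taken from the messages visible in this log.

```python
import re
from collections import Counter

# Hypothetical patterns, chosen from message texts visible in the stderr above.
PATTERNS = {
    "suspend": r"Suspend request from BOINC",
    "quit": r"Quit request from BOINC",
    "model_crash": r"Model crash detected",
    "no_heartbeat": r"No heartbeat from core client",
    "signal": r"Signal \d+ received",
}


def tally_events(stderr_text: str) -> Counter:
    """Count how often each known message type occurs in the stderr text."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(re.findall(pattern, stderr_text))
    return counts


# A short excerpt of the log, used only to illustrate the call.
sample = (
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=3860, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Signal 11 received, exiting... Called boinc_finish"
)

for event, count in tally_events(sample).items():
    print(f"{event}: {count}")
```

Because the patterns match substrings, the tally works whether the log is stored one message per line or as a single flattened string.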

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
11 Dec 2011 16:58:56	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	1,036,800	849,748	0.8196
11 Dec 2011 06:51:24	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	1,010,880	828,747	0.8198
10 Dec 2011 23:49:22	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	984,960	809,474	0.8218
10 Dec 2011 14:47:25	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	959,040	789,236	0.8229
09 Dec 2011 21:58:18	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	933,120	768,386	0.8235
09 Dec 2011 15:59:22	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	907,200	746,988	0.8234
08 Dec 2011 17:20:20	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	881,280	726,539	0.8244
06 Dec 2011 20:51:08	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	855,360	705,227	0.8245
05 Dec 2011 19:03:14	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	829,440	684,256	0.8250
04 Dec 2011 12:35:50	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	803,520	662,797	0.8249
03 Dec 2011 19:41:05	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	777,600	641,178	0.8246
03 Dec 2011 13:27:20	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	751,680	618,872	0.8233
02 Dec 2011 21:33:10	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	725,760	597,288	0.8230
02 Dec 2011 15:38:47	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	699,840	576,126	0.8232
01 Dec 2011 17:58:54	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	673,920	555,744	0.8246
30 Nov 2011 17:27:58	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	648,000	535,992	0.8271
29 Nov 2011 17:08:30	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	622,080	515,982	0.8294
28 Nov 2011 17:37:59	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	596,160	494,825	0.8300
27 Nov 2011 17:18:04	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	570,240	473,485	0.8303
27 Nov 2011 10:26:07	1177060	13560838	hadcm3n_y9hc_1900_40_007527530_1	544,320	450,253	0.8272
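
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. This is an assumption inferred from the rows above rather than a documented definition. A minimal Python sketch that checks it against three rows of the table (the function name `seconds_per_timestep` is hypothetical):

```python
def seconds_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Average CPU seconds per model timestep, rounded to 4 decimals."""
    return round(cpu_seconds / timestep, 4)


# (timestep, CPU time in seconds) taken from the first two and the last
# rows of the trickle table.
rows = [
    (1_036_800, 849_748),
    (1_010_880, 828_747),
    (544_320, 450_253),
]

for timestep, cpu_seconds in rows:
    print(timestep, seconds_per_timestep(cpu_seconds, timestep))
```

Consecutive trickles in the table are 25,920 timesteps apart, so the same division applied to the difference between two rows would give the average for that interval only.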