Task 13123553

Name	hadcm3n_yktj_1900_40_007359841_0
Workunit	7557271
Created	6 Jul 2011, 15:08:49 UTC
Sent	7 Jul 2011, 21:59:32 UTC
Report deadline	7 Oct 2011, 5:26:43 UTC
Received	21 Jan 2012, 19:50:39 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1133571
Run time	34 days 22 hours 1 min 56 sec
CPU time	18 days 16 hours 22 min 22 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	1.28 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
- exit code 193 (0xc1)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5800, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5480, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4760, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5552, iMonCtr=1
Model crash detected, will try to restart...
20:04:54 (4328): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:04:55 (4328): No heartbeat from core client for 30 sec - exiting
20:04:56 (4328): No heartbeat from core client for 30 sec - exiting
20:04:57 (4328): No heartbeat from core client for 30 sec - exiting
20:04:58 (4328): No heartbeat from core client for 30 sec - exiting
20:04:59 (4328): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1700, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4972, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5340, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4104, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5152, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3688, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4244, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4364, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
20:23:58 (4496): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
20:59:50 (4440): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:59:51 (4440): No heartbeat from core client for 30 sec - exiting
20:59:52 (4440): No heartbeat from core client for 30 sec - exiting
20:59:53 (4440): No heartbeat from core client for 30 sec - exiting
20:59:54 (4440): No heartbeat from core client for 30 sec - exiting
20:59:55 (4440): No heartbeat from core client for 30 sec - exiting
21:06:20 (668): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:06:21 (668): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
19:50:51 (5180): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:50:52 (5180): No heartbeat from core client for 30 sec - exiting
19:50:53 (5180): No heartbeat from core client for 30 sec - exiting
19:50:54 (5180): No heartbeat from core client for 30 sec - exiting
19:50:55 (5180): No heartbeat from core client for 30 sec - exiting
19:59:24 (5724): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:59:25 (5724): No heartbeat from core client for 30 sec - exiting
19:59:26 (5724): No heartbeat from core client for 30 sec - exiting
19:59:27 (5724): No heartbeat from core client for 30 sec - exiting
19:59:28 (5724): No heartbeat from core client for 30 sec - exiting
19:59:29 (5724): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4604, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4328, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4560, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
</stderr_txt>
]]>
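
The stderr text above repeats a handful of message types, so a per-type tally can make it easier to read. The following is a minimal sketch, not part of BOINC or CPDN: the category names in `PATTERNS` and the `tally` helper are hypothetical, and the substrings are copied from the log as shown. The `sample` string is only the opening of the log, not the whole thing.

```python
from collections import Counter

# Hypothetical category names mapped to substrings that appear in the stderr text.
PATTERNS = {
    "quit_request": "Quit request from BOINC",
    "suspend_request": "Suspend request from BOINC",
    "model_restart": "Model crash detected, will try to restart",
    "no_heartbeat": "No heartbeat from core client",
}


def tally(stderr_text: str) -> Counter:
    """Count non-overlapping occurrences of each known message type."""
    counts = Counter()
    for name, needle in PATTERNS.items():
        counts[name] = stderr_text.count(needle)
    return counts


# Opening of the stderr text above, copied verbatim (an excerpt only).
sample = (
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "CPDN Monitor - Quit request from BOINC... "
    "CPDN Monitor - Quit request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=5800, iMonCtr=1 "
    "Model crash detected, will try to restart..."
)

for name, count in tally(sample).items():
    print(f"{name}: {count}")
```

Because the tally uses plain substring matching, it works the same whether the log is stored as one long line or one message per line.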

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
21 Jan 2012 18:52:00	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	518,400	1,614,135	3.1137
16 Jan 2012 21:00:23	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	492,480	1,533,523	3.1139
11 Jan 2012 02:42:34	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	466,560	1,452,514	3.1132
30 Dec 2011 23:33:51	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	440,640	1,367,574	3.1036
20 Dec 2011 03:16:32	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	414,720	1,285,812	3.1004
09 Dec 2011 20:17:40	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	388,800	1,206,414	3.1029
01 Dec 2011 14:22:51	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	362,880	1,124,154	3.0979
23 Nov 2011 23:08:46	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	336,960	1,045,422	3.1025
15 Nov 2011 23:10:27	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	311,040	961,394	3.0909
31 Oct 2011 18:53:50	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	285,120	881,069	3.0902
31 Oct 2011 16:46:57	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	259,200	803,396	3.0995
09 Oct 2011 01:19:59	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	233,280	723,884	3.1031
23 Sep 2011 23:05:06	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	207,360	642,353	3.0978
15 Sep 2011 05:40:58	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	181,440	561,783	3.0962
29 Aug 2011 01:36:57	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	155,520	482,628	3.1033
20 Aug 2011 16:41:46	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	129,600	403,274	3.1117
07 Aug 2011 23:31:07	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	103,680	321,483	3.1007
25 Jul 2011 22:55:30	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	77,760	241,158	3.1013
25 Jul 2011 17:39:30	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	51,840	162,947	3.1433
25 Jul 2011 14:44:16	1133571	13123553	hadcm3n_yktj_1900_40_007359841_0	25,920	80,668	3.1122
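
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The sketch below checks that reading against three rows copied from the table above. The function name `sec_per_timestep` is an illustrative choice, and the formula is an inference from the figures, not a documented CPDN definition.

```python
# Rows copied from the trickle table: (timestep, cumulative CPU seconds, reported sec/TS).
rows = [
    (518_400, 1_614_135, 3.1137),  # 21 Jan 2012
    (259_200, 803_396, 3.0995),    # 31 Oct 2011
    (25_920, 80_668, 3.1122),      # 25 Jul 2011
]


def sec_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Cumulative CPU seconds per cumulative timestep, rounded to 4 places."""
    return round(cpu_seconds / timestep, 4)


for timestep, cpu_seconds, reported in rows:
    computed = sec_per_timestep(cpu_seconds, timestep)
    # Allow a small tolerance for floating-point representation.
    status = "matches" if abs(computed - reported) < 1e-4 else "differs"
    print(f"timestep {timestep}: computed {computed}, reported {reported} ({status})")
```

Each row in the table advances the timestep by 25,920, so the same check can be repeated for any other row by adding its values to `rows`.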