Task 13119495

Name	hadcm3n_yj96_1900_40_007357812_0
Workunit	7555242
Created	6 Jul 2011, 14:55:33 UTC
Sent	8 Jul 2011, 21:04:52 UTC
Report deadline	8 Oct 2011, 4:32:03 UTC
Received	23 Aug 2011, 8:22:10 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	-226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS
Computer ID	1157559
Run time	4 days 6 hours 44 min 48 sec
CPU time	3 days 21 hours 54 min 12 sec
Validate state	Invalid
Credit	1,866.24
Device peak FLOPS	2.81 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4792, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1080, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1080, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1080, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1080, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4472, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4472, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3408, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=972, iMonCtr=1
Model crash detected, will try to restart...
08:29:49 (224): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3880, iMonCtr=1
Model crash detected, will try to restart...
19:48:19 (5060): No heartbeat from core client for 30 sec - exiting
19:48:21 (5060): No heartbeat from core client for 30 sec - exiting
19:48:22 (5060): No heartbeat from core client for 30 sec - exiting
19:48:23 (5060): No heartbeat from core client for 30 sec - exiting
19:48:24 (5060): No heartbeat from core client for 30 sec - exiting
19:48:25 (5060): No heartbeat from core client for 30 sec - exiting
19:48:26 (5060): No heartbeat from core client for 30 sec - exiting
19:48:27 (5060): No heartbeat from core client for 30 sec - exiting
19:48:28 (5060): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4548, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4700, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4624, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4252, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4908, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3172, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5828, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CCPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4888, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4960, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
08:47:15 (3720): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4544, iMonCtr=1
Model crash detected, will try to restart...
09:12:14 (3616): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2296, iMonCtr=1
Model crash detected, will try to restart...
</stderr_txt>
]]>
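
The exit status field above shows the same value twice: -226 as a signed integer and 0xFFFFFF1E as its 32-bit two's-complement hex form. As a minimal sketch of that conversion (the function names `as_unsigned_32_hex` and `as_signed_32` are illustrative and not part of BOINC):

```python
def as_unsigned_32_hex(code: int) -> str:
    """Render a signed exit code as 32-bit two's-complement hex."""
    return f"0x{code & 0xFFFFFFFF:08X}"


def as_signed_32(value: int) -> int:
    """Interpret a 32-bit unsigned value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


if __name__ == "__main__":
    # Exit status as reported on this task page.
    print(as_unsigned_32_hex(-226))
    print(as_signed_32(0xFFFFFF1E))
```

Either form identifies the same status, which this page labels ERR_TOO_MANY_EXITS.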

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
23 Aug 2011 08:24:54	1157559	13119495	hadcm3n_yj96_1900_40_007357812_0	155,520	337,302	2.1689
17 Aug 2011 16:37:42	1157559	13119495	hadcm3n_yj96_1900_40_007357812_0	129,600	296,341	2.2866
07 Aug 2011 11:47:14	1157559	13119495	hadcm3n_yj96_1900_40_007357812_0	103,680	255,899	2.4682
01 Aug 2011 12:11:51	1157559	13119495	hadcm3n_yj96_1900_40_007357812_0	77,760	215,405	2.7701
25 Jul 2011 14:47:12	1157559	13119495	hadcm3n_yj96_1900_40_007357812_0	51,840	80,915	1.5609
10 Jul 2011 23:20:02	1157559	13119495	hadcm3n_yj96_1900_40_007357812_0	25,920	40,578	1.5655
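
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep count. The sketch below recomputes it from the rows above and also derives a per-interval cost between consecutive trickles, which the cumulative average smooths over. The function names and the per-interval figure are illustrative assumptions, not something the project page reports.

```python
# Rows copied from the trickle table, oldest first:
# (timestep, cumulative CPU time in seconds, reported average sec/TS)
TRICKLES = [
    (25_920, 40_578, 1.5655),
    (51_840, 80_915, 1.5609),
    (77_760, 215_405, 2.7701),
    (103_680, 255_899, 2.4682),
    (129_600, 296_341, 2.2866),
    (155_520, 337_302, 2.1689),
]


def cumulative_average(cpu_seconds: int, timestep: int) -> float:
    """Cumulative CPU seconds per timestep, rounded like the table."""
    return round(cpu_seconds / timestep, 4)


def interval_averages(rows):
    """CPU seconds per timestep between consecutive trickles."""
    result = []
    for (ts_prev, cpu_prev, _), (ts_next, cpu_next, _) in zip(rows, rows[1:]):
        result.append((ts_next, (cpu_next - cpu_prev) / (ts_next - ts_prev)))
    return result


if __name__ == "__main__":
    for timestep, cpu_seconds, reported in TRICKLES:
        computed = cumulative_average(cpu_seconds, timestep)
        print(f"{timestep:>7}  computed={computed:.4f}  reported={reported:.4f}")
    for timestep, sec_per_ts in interval_averages(TRICKLES):
        print(f"up to {timestep:>7}: {sec_per_ts:.4f} sec/TS in this interval")
```

On these rows the interval ending at timestep 77,760 costs far more CPU time per timestep than the others, which is why the cumulative average jumps there and then declines gradually.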