Task 13690764

Name	hadcm3n_o5bo_1980_40_007547956_3
Workunit	7745188
Created	2 Dec 2011, 16:05:38 UTC
Sent	2 Dec 2011, 16:07:00 UTC
Report deadline	2 Mar 2012, 23:34:11 UTC
Received	21 Dec 2011, 10:33:01 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1181599
Run time	16 days 0 hours 23 min 30 sec
CPU time	8 days 4 hours 8 min 37 sec
Validate state	Invalid
Credit	3,110.40
Device peak FLOPS	1.92 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3944, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3944, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3944, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3944, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3944, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3944, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4420, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4088, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4448, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1504, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5928, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4148, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5232, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9408, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4420, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4420, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5388, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9784, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1236, iMonCtr=1
Model crash detected, will try to restart...
20:05:12 (7964): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:05:13 (7964): No heartbeat from core client for 30 sec - exiting
20:05:15 (7964): No heartbeat from core client for 30 sec - exiting
20:05:16 (7964): No heartbeat from core client for 30 sec - exiting
20:05:17 (7964): No heartbeat from core client for 30 sec - exiting
20:05:18 (7964): No heartbeat from core client for 30 sec - exiting
20:05:20 (7964): No heartbeat from core client for 30 sec - exiting
20:05:21 (7964): No heartbeat from core client for 30 sec - exiting
20:05:22 (7964): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4904, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4904, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4480, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4856, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2420, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
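The stderr output above is long and repetitive, so a quick tally of its message types can make the failure pattern easier to see. The following is a minimal sketch, not a CPDN or BOINC tool: the function name `tally_events` and the category labels are hypothetical, and the search strings are simply copied from the log text above. It assumes the stderr has been saved as a plain string.

```python
import re
from collections import Counter

# Message fragments copied from the stderr text above.
# The category names on the left are hypothetical labels for this sketch.
PATTERNS = {
    "model_crash": r"Model crash detected, will try to restart",
    "quit_request": r"CPDN Monitor - Quit request from BOINC",
    "suspend_request": r"CPDN Monitor - Suspend request from BOINC",
    "no_heartbeat": r"No heartbeat from core client",
    "signal_exit": r"Signal \d+ received, exiting",
    "boinc_finish": r"Called boinc_finish",
}


def tally_events(stderr_text: str) -> Counter:
    """Count how often each known message fragment occurs in the log text."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(re.findall(pattern, stderr_text))
    return counts


# Short excerpt assembled from messages that appear in the log above.
sample = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=3944, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "CPDN Monitor - Quit request from BOINC... "
    "Signal 11 received, exiting... Called boinc_finish"
)

for name, count in tally_events(sample).items():
    print(f"{name}: {count}")
```

Because the patterns match substrings rather than whole lines, the tally works whether the log is stored with line breaks or as one flattened run of text.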

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
21 Dec 2011 09:33:07	1181599	13690764	hadcm3n_o5bo_1980_40_007547956_3	259,200	706,112	2.7242
20 Dec 2011 00:15:45	1181599	13690764	hadcm3n_o5bo_1980_40_007547956_3	233,280	636,313	2.7277
18 Dec 2011 17:39:58	1181599	13690764	hadcm3n_o5bo_1980_40_007547956_3	207,360	569,724	2.7475
16 Dec 2011 08:11:22	1181599	13690764	hadcm3n_o5bo_1980_40_007547956_3	181,440	501,771	2.7655
15 Dec 2011 03:37:58	1181599	13690764	hadcm3n_o5bo_1980_40_007547956_3	155,520	435,814	2.8023
13 Dec 2011 08:00:34	1181599	13690764	hadcm3n_o5bo_1980_40_007547956_3	129,600	366,793	2.8302
11 Dec 2011 14:13:10	1181599	13690764	hadcm3n_o5bo_1980_40_007547956_3	103,680	297,012	2.8647
09 Dec 2011 17:15:05	1181599	13690764	hadcm3n_o5bo_1980_40_007547956_3	77,760	227,826	2.9299
07 Dec 2011 04:39:33	1181599	13690764	hadcm3n_o5bo_1980_40_007547956_3	51,840	155,124	2.9924
05 Dec 2011 03:59:49	1181599	13690764	hadcm3n_o5bo_1980_40_007547956_3	25,920	85,053	3.2814
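
The Average (sec/TS) column in the table above appears to be the cumulative CPU time divided by the cumulative timestep count. The sketch below checks that reading against three rows copied from the table. The helper names `parse_int` and `average_sec_per_ts` are hypothetical, and the formula is an inference from the numbers, not something the page states.

```python
def parse_int(field: str) -> int:
    """Convert a thousands-separated field such as '259,200' to an int."""
    return int(field.replace(",", ""))


def average_sec_per_ts(cpu_time: str, timestep: str) -> float:
    """Assumed formula: cumulative CPU seconds per cumulative timestep."""
    return parse_int(cpu_time) / parse_int(timestep)


# (Timestep, CPU Time (sec), reported Average (sec/TS)) copied from the table.
rows = [
    ("259,200", "706,112", 2.7242),
    ("155,520", "435,814", 2.8023),
    ("25,920", "85,053", 3.2814),
]

for timestep, cpu_time, reported in rows:
    computed = average_sec_per_ts(cpu_time, timestep)
    print(f"{timestep} timesteps: computed {computed:.4f}, reported {reported}")
```

Under this reading, the falling average across successive trickles means the later timesteps ran faster per step than the first ones.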