Task 12743481

Name	hadcm3n_o4he_1900_40_007201141_0
Workunit	7399421
Created	28 Mar 2011, 14:10:24 UTC
Sent	30 Mar 2011, 12:53:43 UTC
Report deadline	29 Jun 2011, 20:20:54 UTC
Received	6 May 2011, 20:06:20 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1117493
Run time	12 days 1 hour 21 min 48 sec
CPU time	10 days 20 hours 25 min 31 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.29 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6020, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
10:32:51 (7980): Can't acquire lockfile (32) - waiting 35s
10:33:04 (4692): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4076, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
13:02:56 (6712): Can't acquire lockfile (32) - waiting 35s
13:03:20 (7556): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5668, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
22:01:47 (5996): Can't acquire lockfile (32) - waiting 35s
22:02:06 (6028): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4300, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7636, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2740, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
09:12:39 (1756): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:20:01 (3000): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1684, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5028, iMonCtr=1
Model crash detected, will try to restart...
22:16:10 (4996): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4036, iMonCtr=1
Model crash detected, will try to restart...
10:39:00 (4764): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:20:29 (5976): Can't acquire lockfile (32) - waiting 35s
11:20:42 (2352): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2148, iMonCtr=1
Model crash detected, will try to restart...
22:00:22 (4228): Can't acquire lockfile (32) - waiting 35s
22:00:42 (4800): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:03:34 (8148): Can't acquire lockfile (32) - waiting 35s
16:03:55 (3768): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
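The stderr log interleaves a small set of recurring messages (model crash restarts, quit and suspend requests from BOINC, lost heartbeats, lockfile waits) before the final signal 11. As a hedged sketch for summarising such a log, a few regular expressions can count each kind of event. The helper name `tally_events`, the `PATTERNS` labels and the `EXCERPT` constant are illustrative assumptions, not part of BOINC or the CPDN tooling; the excerpt is the final stretch of this task's stderr.

```python
import re
from collections import Counter

# Hypothetical labels for the recurring stderr messages seen in this task.
# They are not BOINC or CPDN identifiers, just names chosen for this sketch.
PATTERNS = {
    "model_crash_restart": re.compile(r"Model crash detected, will try to restart"),
    "quit_request": re.compile(r"CPDN Monitor - Quit request from BOINC"),
    "suspend_request": re.compile(r"CPDN Monitor - Suspend request from BOINC"),
    "no_heartbeat": re.compile(r"No heartbeat from core client for \d+ sec"),
    "lockfile_wait": re.compile(r"Can't acquire lockfile \(\d+\)"),
    "signal": re.compile(r"Signal \d+ received"),
}

# The last few messages of the stderr shown above, copied verbatim.
EXCERPT = (
    "16:03:34 (8148): Can't acquire lockfile (32) - waiting 35s "
    "16:03:55 (3768): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "CPDN Monitor - Quit request from BOINC... "
    "CPDN Monitor - Quit request from BOINC... "
    "CPDN Monitor - Quit request from BOINC... "
    "CPDN Monitor - Quit request from BOINC... "
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "Signal 11 received, exiting... Called boinc_finish"
)


def tally_events(log_text: str) -> Counter:
    """Count how many times each known message pattern occurs in a stderr dump."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        counts[label] = len(pattern.findall(log_text))
    return counts


if __name__ == "__main__":
    for label, n in sorted(tally_events(EXCERPT).items()):
        print(f"{label}: {n}")
```

Pasting the full `<stderr_txt>` contents in place of `EXCERPT` gives per-event totals for the whole run; the patterns deliberately keep "No heartbeat from core client" (the BOINC API message) separate from "CPDN Monitor - No 'heartbeat' from BOINC" (the CPDN monitor message).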

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/timestep)
06 May 2011 20:07:38	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	518,400	937,516	1.8085
06 May 2011 20:07:38	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	492,480	889,644	1.8065
04 May 2011 21:02:16	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	466,560	842,184	1.8051
03 May 2011 21:59:01	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	440,640	793,262	1.8002
28 Apr 2011 11:53:16	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	414,720	744,499	1.7952
25 Apr 2011 14:33:25	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	388,800	693,932	1.7848
21 Apr 2011 15:23:01	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	362,880	651,036	1.7941
21 Apr 2011 15:23:01	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	336,960	603,967	1.7924
21 Apr 2011 15:23:01	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	311,040	557,920	1.7937
21 Apr 2011 15:23:01	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	285,120	511,749	1.7949
21 Apr 2011 15:23:01	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	259,200	464,942	1.7938
13 Apr 2011 13:33:19	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	233,280	418,602	1.7944
11 Apr 2011 20:22:18	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	207,360	371,097	1.7896
09 Apr 2011 15:32:04	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	181,440	325,376	1.7933
07 Apr 2011 18:28:16	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	155,520	278,109	1.7883
06 Apr 2011 14:28:19	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	129,600	230,535	1.7788
05 Apr 2011 17:43:25	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	103,680	184,201	1.7766
04 Apr 2011 11:18:06	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	77,760	137,670	1.7704
02 Apr 2011 20:26:16	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	51,840	92,744	1.7890
01 Apr 2011 19:35:59	1117493	12743481	hadcm3n_o4he_1900_40_007201141_0	25,920	44,965	1.7348
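In the trickle table, the Average column appears to be the cumulative CPU time divided by the timestep reached, rounded to four decimals, and consecutive trickles are 25,920 timesteps apart. A minimal sketch that recomputes the column from the rows above and checks the spacing follows; the names `TRICKLES`, `sec_per_timestep` and `trickle_intervals` are illustrative, not part of any CPDN tool.

```python
# Rows copied from the trickle table, newest first:
# (timestep, cumulative CPU seconds, reported average sec/timestep)
TRICKLES = [
    (518_400, 937_516, 1.8085),
    (492_480, 889_644, 1.8065),
    (466_560, 842_184, 1.8051),
    (440_640, 793_262, 1.8002),
    (414_720, 744_499, 1.7952),
    (388_800, 693_932, 1.7848),
    (362_880, 651_036, 1.7941),
    (336_960, 603_967, 1.7924),
    (311_040, 557_920, 1.7937),
    (285_120, 511_749, 1.7949),
    (259_200, 464_942, 1.7938),
    (233_280, 418_602, 1.7944),
    (207_360, 371_097, 1.7896),
    (181_440, 325_376, 1.7933),
    (155_520, 278_109, 1.7883),
    (129_600, 230_535, 1.7788),
    (103_680, 184_201, 1.7766),
    (77_760, 137_670, 1.7704),
    (51_840, 92_744, 1.7890),
    (25_920, 44_965, 1.7348),
]


def sec_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Average CPU seconds per model timestep, rounded to 4 places like the table."""
    return round(cpu_seconds / timestep, 4)


def trickle_intervals(rows) -> set:
    """Distinct timestep gaps between consecutive trickles (rows are newest first)."""
    return {newer[0] - older[0] for newer, older in zip(rows, rows[1:])}


if __name__ == "__main__":
    for ts, cpu, reported in TRICKLES:
        computed = sec_per_timestep(ts, cpu)
        print(f"{ts:>7,}  {cpu:>7,}  computed={computed:.4f}  reported={reported:.4f}")
    print("trickle interval(s):", trickle_intervals(TRICKLES))
```

Every recomputed value matches the reported Average. As a cross-check, the last trickle's 937,516 s of CPU time is 15 s short of the task's reported CPU time of 10 days 20 hours 25 min 31 sec (937,531 s).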