Task 13096731

Name	hadcm3n_yah2_1900_40_007346432_1
Workunit	7543862
Created	6 Jul 2011, 13:37:43 UTC
Sent	19 Jul 2011, 18:03:44 UTC
Report deadline	19 Oct 2011, 1:30:55 UTC
Received	10 Oct 2011, 16:54:22 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1083083
Run time	13 days 10 hours 34 min 18 sec
CPU time	12 days 21 hours 38 min 52 sec
Validate state	Invalid
Credit	9,331.20
Device peak FLOPS	3.23 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.56</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4472, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4472, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3796, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3796, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3796, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5188, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5188, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4932, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4932, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6092, iMonCtr=1
Model crash detected, will try to restart...
18:10:32 (1864): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4348, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2556, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2556, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4044, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4140, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4140, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4140, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4140, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
07:08:39 (4328): No heartbeat from core client for 30 sec - exiting
07:08:40 (4328): No heartbeat from core client for 30 sec - exiting
07:08:41 (4328): No heartbeat from core client for 30 sec - exiting
07:08:42 (4328): No heartbeat from core client for 30 sec - exiting
07:08:43 (4328): No heartbeat from core client for 30 sec - exiting
07:08:44 (4328): No heartbeat from core client for 30 sec - exiting
07:08:45 (4328): No heartbeat from core client for 30 sec - exiting
07:08:46 (4328): No heartbeat from core client for 30 sec - exiting
07:08:47 (4328): No heartbeat from core client for 30 sec - exiting
07:08:48 (4328): No heartbeat from core client for 30 sec - exiting
07:08:49 (4328): No heartbeat from core client for 30 sec - exiting
07:08:50 (4328): No heartbeat from core client for 30 sec - exiting
07:08:51 (4328): No heartbeat from core client for 30 sec - exiting
07:08:52 (4328): No heartbeat from core client for 30 sec - exiting
07:08:53 (4328): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
07:56:27 (5104): No heartbeat from core client for 30 sec - exiting
07:56:29 (5104): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5676, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4872, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5772, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5772, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5772, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5772, iMonCtr=1
Model crash detected, will try to restart...
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x77A13A93 read attempt to address 0x00000000
Engaging BOINC Windows Runtime Debugger...
Cannot serialize file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yah2_1900_40_007346432/dataout/shmem_restart.day
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
10 Oct 2011 16:54:30	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	777,600	1,114,727	1.4335
09 Oct 2011 21:02:54	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	751,680	1,078,338	1.4346
08 Oct 2011 07:00:58	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	725,760	1,042,369	1.4362
24 Sep 2011 21:51:23	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	699,840	1,006,384	1.4380
24 Sep 2011 11:12:56	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	673,920	968,769	1.4375
20 Sep 2011 21:08:44	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	648,000	931,954	1.4382
18 Sep 2011 23:51:32	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	622,080	894,947	1.4386
18 Sep 2011 10:03:07	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	596,160	857,313	1.4381
17 Sep 2011 15:43:47	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	570,240	819,927	1.4379
28 Aug 2011 18:56:33	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	544,320	782,161	1.4370
22 Aug 2011 17:49:28	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	518,400	744,886	1.4369
21 Aug 2011 14:46:27	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	492,480	707,613	1.4368
20 Aug 2011 14:26:48	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	466,560	670,445	1.4370
15 Aug 2011 12:30:11	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	440,640	633,805	1.4384
14 Aug 2011 14:55:34	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	414,720	596,060	1.4373
14 Aug 2011 03:51:25	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	388,800	557,520	1.4340
13 Aug 2011 17:49:14	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	362,880	519,532	1.4317
12 Aug 2011 19:16:21	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	336,960	482,559	1.4321
10 Aug 2011 19:43:49	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	311,040	445,373	1.4319
08 Aug 2011 17:50:30	1083083	13096731	hadcm3n_yah2_1900_40_007346432_1	285,120	407,508	1.4293
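
The "Average (sec/TS)" column above appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. That relationship is inferred from the figures in the table, not stated by the project. A minimal sketch to reproduce it, where the helper name `sec_per_timestep` is hypothetical:

```python
def sec_per_timestep(cpu_time_sec: str, timestep: str) -> float:
    """Assumed formula: cumulative CPU seconds / timesteps, rounded to 4 places."""
    cpu = int(cpu_time_sec.replace(",", ""))   # strip thousands separators
    steps = int(timestep.replace(",", ""))
    return round(cpu / steps, 4)


# (Timestep, CPU Time) pairs copied from three rows of the trickle table
rows = [
    ("777,600", "1,114,727"),   # 10 Oct 2011, listed average 1.4335
    ("751,680", "1,078,338"),   # 09 Oct 2011, listed average 1.4346
    ("285,120", "407,508"),     # 08 Aug 2011, listed average 1.4293
]

for timestep, cpu_time in rows:
    print(timestep, cpu_time, sec_per_timestep(cpu_time, timestep))
```

For these three rows the computed values agree with the listed averages, which suggests the column is a running mean over the whole task, not a per-trickle rate.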