Task 14202847

Name	hadcm3n_001r_1900_40_007817210_2
Workunit	7972319
Created	28 Feb 2012, 19:35:01 UTC
Sent	28 Feb 2012, 19:35:25 UTC
Report deadline	30 May 2012, 3:02:36 UTC
Received	24 Apr 2012, 22:06:45 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	-1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
Computer ID	1190096
Run time	13 days 2 hours 5 min 15 sec
CPU time	11 days 13 hours 3 min 43 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.86 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.25</core_client_version>
<![CDATA[
<message>
 - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4656, iMonCtr=1
Model crash detected, will try to restart...
C15:05:19 (5080): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:05:20 (5080): No heartbeat from core client for 30 sec - exiting
15:05:21 (5080): No heartbeat from core client for 30 sec - exiting
15:05:22 (5080): No heartbeat from core client for 30 sec - exiting
15:05:23 (5080): No heartbeat from core client for 30 sec - exiting
15:05:24 (5080): No heartbeat from core client for 30 sec - exiting
15:05:25 (5080): No heartbeat from core client for 30 sec - exiting
15:05:26 (5080): No heartbeat from core client for 30 sec - exiting
15:05:27 (5080): No heartbeat from core client for 30 sec - exiting
15:05:28 (5080): No heartbeat from core client for 30 sec - exiting
15:05:29 (5080): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1188, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4044, iMonCtr=1
Model crash detected, will try to restart...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/001rko.pja6c10
Error converting file to netcdf: dataout/001rko.pia6c10
Error converting file to netcdf: dataout/001rko.pfa6c10
Error converting file to netcdf: dataout/001rka.pha6c10
Error converting file to netcdf: dataout/001rka.pga6c10
Error converting file to netcdf: dataout/001rka.pea6c10
Error converting file to netcdf: dataout/001rka.pda6c10
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3420, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3420, iMonCtr=1
Model crash detected, will try to restart...
CCPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4728, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4460, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4824, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
08:40:13 (5276): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x77C9FF4B write attempt to address 0x4341DAAA
Engaging BOINC Windows Runtime Debugger...
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x77C83242 read attempt to address 0x00000000
Engaging BOINC Windows Runtime Debugger...
</stderr_txt>
]]>
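
The exit status is listed twice above, once as the signed decimal -1073741819 and once as the hexadecimal 0xC0000005. The two are the same 32-bit value read as signed and unsigned. A minimal Python sketch of the conversion follows; the helper names `to_unsigned32` and `to_signed32` are illustrative and not part of BOINC or CPDN.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit exit code."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


if __name__ == "__main__":
    exit_status = -1073741819  # value from the "Exit status" field
    print(f"0x{to_unsigned32(exit_status):08X}")
    print(to_signed32(0xC0000005))
```

Masking with `0xFFFFFFFF` keeps only the low 32 bits, which is why the negative decimal form and the hex form identify the same status code.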

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
24 Apr 2012 16:18:30	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	518,400	995,256	1.9199
27 Mar 2012 13:04:32	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	492,480	947,988	1.9249
24 Mar 2012 23:34:34	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	466,560	898,010	1.9247
21 Mar 2012 10:57:20	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	440,640	848,394	1.9254
20 Mar 2012 09:19:03	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	414,720	800,015	1.9290
19 Mar 2012 09:56:20	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	388,800	750,785	1.9310
17 Mar 2012 22:40:51	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	362,880	701,736	1.9338
16 Mar 2012 20:25:09	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	336,960	652,504	1.9364
15 Mar 2012 20:07:41	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	311,040	603,337	1.9397
14 Mar 2012 18:38:43	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	285,120	555,697	1.9490
11 Mar 2012 11:43:14	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	259,200	504,241	1.9454
09 Mar 2012 19:32:42	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	233,280	456,046	1.9549
08 Mar 2012 19:34:41	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	207,360	408,161	1.9684
07 Mar 2012 19:24:27	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	181,440	359,724	1.9826
06 Mar 2012 16:37:34	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	155,520	311,059	2.0001
05 Mar 2012 14:13:46	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	129,600	261,313	2.0163
04 Mar 2012 13:23:32	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	103,680	210,683	2.0321
03 Mar 2012 11:48:01	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	77,760	158,163	2.0340
02 Mar 2012 10:39:33	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	51,840	106,512	2.0546
01 Mar 2012 10:32:56	1190096	14202847	hadcm3n_001r_1900_40_007817210_2	25,920	55,171	2.1285
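
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. The sketch below checks that assumption against three rows copied from the table above; the function name `sec_per_timestep` and the rounding tolerance are illustrative choices, not taken from the CPDN server code.

```python
# (time sent, timestep, CPU time in seconds, listed average in sec/TS)
ROWS = [
    ("24 Apr 2012 16:18:30", 518_400, 995_256, 1.9199),
    ("27 Mar 2012 13:04:32", 492_480, 947_988, 1.9249),
    ("01 Mar 2012 10:32:56", 25_920, 55_171, 2.1285),
]


def sec_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Cumulative CPU seconds spent per model timestep."""
    return cpu_seconds / timestep


def matches_listed(cpu_seconds: int, timestep: int, listed: float) -> bool:
    """True if the computed average agrees with the listed 4-decimal value."""
    # Allow half a unit in the fourth decimal place, plus float slack.
    return abs(sec_per_timestep(cpu_seconds, timestep) - listed) <= 5e-5 + 1e-9


if __name__ == "__main__":
    for sent, timestep, cpu_seconds, listed in ROWS:
        computed = sec_per_timestep(cpu_seconds, timestep)
        status = "ok" if matches_listed(cpu_seconds, timestep, listed) else "mismatch"
        print(f"{sent}  computed={computed:.4f}  listed={listed:.4f}  {status}")
```

Because both inputs are cumulative, each average covers the whole run up to that trickle rather than only the most recent interval.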