Task 13420602

Name	hadcm3n_u2ps_1980_40_007459131_2
Workunit	7656634
Created	25 Sep 2011, 12:52:25 UTC
Sent	25 Sep 2011, 13:05:41 UTC
Report deadline	25 Dec 2011, 20:32:52 UTC
Received	2 Dec 2011, 19:22:55 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1143106
Run time	13 days 10 hours 43 min 32 sec
CPU time	11 days 20 hours 32 min 53 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.41 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.6.28</core_client_version>
<![CDATA[
<message>
- exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4820, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5208, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5864, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5632, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5036, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5808, iMonCtr=1
Model crash detected, will try to restart...
15:43:35 (5792): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3680, iMonCtr=1
Model crash detected, will try to restart...
06:20:03 (5188): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3096, iMonCtr=1
Model crash detected, will try to restart...
18:08:17 (5276): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:08:18 (5276): No heartbeat from core client for 30 sec - exiting
18:08:19 (5276): No heartbeat from core client for 30 sec - exiting
18:08:20 (5276): No heartbeat from core client for 30 sec - exiting
18:08:21 (5276): No heartbeat from core client for 30 sec - exiting
18:08:22 (5276): No heartbeat from core client for 30 sec - exiting
18:08:23 (5276): No heartbeat from core client for 30 sec - exiting
18:08:24 (5276): No heartbeat from core client for 30 sec - exiting
18:08:25 (5276): No heartbeat from core client for 30 sec - exiting
18:08:26 (5276): No heartbeat from core client for 30 sec - exiting
18:08:27 (5276): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4528, iMonCtr=1
Model crash detected, will try to restart...
12:34:45 (5384): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4272, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3868, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CCPDN Monitor - Quit request from BOINC...
C16:00:41 (2420): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:00:43 (2420): No heartbeat from core client for 30 sec - exiting
16:00:44 (2420): No heartbeat from core client for 30 sec - exiting
16:00:45 (2420): No heartbeat from core client for 30 sec - exiting
16:00:46 (2420): No heartbeat from core client for 30 sec - exiting
16:00:47 (2420): No heartbeat from core client for 30 sec - exiting
16:00:48 (2420): No heartbeat from core client for 30 sec - exiting
16:00:49 (2420): No heartbeat from core client for 30 sec - exiting
16:00:50 (2420): No heartbeat from core client for 30 sec - exiting
16:00:51 (2420): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
20:36:00 (5640): No heartbeat from core client for 30 sec - exiting
20:36:01 (5640): No heartbeat from core client for 30 sec - exiting
20:36:02 (5640): No heartbeat from core client for 30 sec - exiting
20:36:03 (5640): No heartbeat from core client for 30 sec - exiting
20:36:04 (5640): No heartbeat from core client for 30 sec - exiting
20:36:05 (5640): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/u2psko.pjj7c10
Error converting file to netcdf: dataout/u2psko.pij7c10
Error converting file to netcdf: dataout/u2psko.pfj7c10
Error converting file to netcdf: dataout/u2pska.phj7c10
Error converting file to netcdf: dataout/u2pska.pgj7c10
Error converting file to netcdf: dataout/u2pska.pej7c10
Error converting file to netcdf: dataout/u2pska.pdj7c10
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1548, iMonCtr=1
Model crash detected, will try to restart...
C19:32:41 (4220): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5808, iMonCtr=1
Model crash detected, will try to restart...
CC20:27:42 (5812): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x77936E0F read attempt to address 0x40A8B938
Engaging BOINC Windows Runtime Debugger...
Cannot serialize file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_u2ps_1980_40_007459131/dataout/shmem_restart.day
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
01 Dec 2011 19:27:50	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	518,400	1,024,369	1.9760
26 Nov 2011 20:36:38	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	492,480	972,803	1.9753
22 Nov 2011 18:57:36	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	466,560	922,558	1.9774
19 Nov 2011 16:15:42	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	440,640	869,814	1.9740
15 Nov 2011 20:43:37	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	414,720	818,978	1.9748
15 Nov 2011 20:43:37	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	388,800	767,933	1.9751
09 Nov 2011 14:45:39	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	362,880	718,276	1.9794
06 Nov 2011 17:11:13	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	336,960	668,144	1.9829
05 Nov 2011 06:36:35	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	311,040	616,909	1.9834
31 Oct 2011 18:22:22	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	285,120	567,298	1.9897
31 Oct 2011 13:14:44	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	259,200	515,618	1.9893
31 Oct 2011 13:14:44	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	233,280	464,113	1.9895
16 Oct 2011 08:46:57	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	207,360	413,301	1.9932
15 Oct 2011 03:02:23	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	181,440	362,123	1.9958
14 Oct 2011 12:22:29	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	155,520	312,177	2.0073
11 Oct 2011 15:42:33	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	129,600	261,561	2.0182
09 Oct 2011 04:48:11	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	103,680	210,004	2.0255
05 Oct 2011 07:53:56	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	77,760	158,009	2.0320
02 Oct 2011 18:23:05	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	51,840	105,970	2.0442
01 Oct 2011 05:55:54	1143106	13420602	hadcm3n_u2ps_1980_40_007459131_2	25,920	52,409	2.0220
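
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. The sketch below checks that assumption against a few rows copied from the table; the function names are illustrative, not taken from the CPDN server code.

```python
# Assumption: "Average (sec/TS)" = cumulative CPU seconds / cumulative
# timesteps, rounded to four decimal places.

def parse_int(field: str) -> int:
    """Parse a thousands-separated integer such as '518,400'."""
    return int(field.replace(",", ""))

def average_sec_per_timestep(cpu_time: str, timestep: str) -> float:
    """Return CPU seconds per timestep, rounded as in the table."""
    return round(parse_int(cpu_time) / parse_int(timestep), 4)

# (timestep, CPU time in seconds, reported average) copied from the table.
rows = [
    ("518,400", "1,024,369", 1.9760),
    ("259,200", "515,618", 1.9893),
    ("25,920", "52,409", 2.0220),
]

for timestep, cpu_time, reported in rows:
    computed = average_sec_per_timestep(cpu_time, timestep)
    print(timestep, computed, reported)
```

For these three rows the computed values agree with the reported averages to four decimal places.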