Task 13110371

Name	hadcm3n_yfqh_1900_40_007353251_0
Workunit	7550681
Created	6 Jul 2011, 14:25:44 UTC
Sent	15 Jul 2011, 17:33:24 UTC
Report deadline	15 Oct 2011, 1:00:35 UTC
Received	31 Oct 2011, 21:56:16 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	25 (0x00000019) Unknown error code
Computer ID	950229
Run time	105 days 10 hours 6 min 17 sec
CPU time	94 days 4 hours 18 min 53 sec
Validate state	Invalid
Credit	11,819.52
Device peak FLOPS	1.14 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version> <![CDATA[ <message> The drive cannot locate a specific area or track on the disk. (0x19) - exit code 25 (0x19) </message> <stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=148, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1900, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3168, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4224, iMonCtr=1
Model crash detected, will try to restart...
21:15:53 (3712): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4700, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4160, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5340, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=464, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4524, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4524, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
19:00:59 (4256): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3720, iMonCtr=1
Model crash detected, will try to restart...
10:12:22 (144): Can't acquire lockfile (32) - waiting 35s
10:12:39 (2704): Can't acquire lockfile (32) - waiting 35s
10:12:48 (4316): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:13:14 (2704): Can't acquire lockfile (32) - exiting
10:13:14 (2704): Error: The process cannot access the file because it is being used by another process. (0x20)
10:13:45 (144): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4368, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
01:21:21 (1820): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=736, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
21:26:16 (4900): Can't acquire lockfile (32) - waiting 35s
21:26:38 (3624): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Called boinc_finish
</stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
31 Oct 2011 18:36:02	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	984,960	7,946,701	8.0680
31 Oct 2011 16:39:57	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	959,040	7,725,222	8.0552
31 Oct 2011 14:12:06	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	933,120	7,503,190	8.0410
31 Oct 2011 14:12:06	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	907,200	7,279,955	8.0246
18 Oct 2011 02:14:48	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	881,280	7,057,345	8.0081
15 Oct 2011 05:19:09	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	855,360	6,835,246	7.9911
12 Oct 2011 10:26:59	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	829,440	6,620,180	7.9815
09 Oct 2011 22:04:09	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	803,520	6,408,169	7.9751
07 Oct 2011 08:55:57	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	777,600	6,195,036	7.9669
04 Oct 2011 14:57:36	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	751,680	5,973,753	7.9472
29 Sep 2011 18:42:30	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	725,760	5,751,864	7.9253
26 Sep 2011 22:25:43	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	699,840	5,528,204	7.8992
24 Sep 2011 01:16:29	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	673,920	5,304,511	7.8711
20 Sep 2011 23:20:18	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	648,000	5,085,092	7.8474
18 Sep 2011 12:35:45	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	622,080	4,878,186	7.8417
15 Sep 2011 19:32:58	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	596,160	4,659,142	7.8153
11 Sep 2011 10:18:33	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	570,240	4,457,071	7.8161
08 Sep 2011 16:17:49	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	544,320	4,249,483	7.8070
06 Sep 2011 06:47:55	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	518,400	4,042,754	7.7985
03 Sep 2011 09:31:37	950229	13110371	hadcm3n_yfqh_1900_40_007353251_0	492,480	3,822,413	7.7616
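
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the cumulative timestep count, shown to four decimal places. The sketch below checks that reading against three rows copied from the table. The function name average_sec_per_timestep is hypothetical and not part of BOINC or CPDN, and the formula is inferred from the listed values rather than taken from project documentation.

```python
def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals.

    Assumption: the table's Average (sec/TS) is CPU Time / Timestep.
    """
    return round(cpu_time_sec / timestep, 4)


# (timestep, CPU time in seconds) pairs copied from the trickle table above.
rows = [
    (984_960, 7_946_701),
    (959_040, 7_725_222),
    (492_480, 3_822_413),
]

for timestep, cpu_time_sec in rows:
    avg = average_sec_per_timestep(cpu_time_sec, timestep)
    print(f"{timestep:>9,}  {cpu_time_sec:>10,}  {avg:.4f}")
```

Under that assumption, the steady rise in the average from the oldest row to the newest means each later block of timesteps cost slightly more CPU time than the earlier ones.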