Task 12558260

Name hadam3p_saf_2grc_1974_1_007151403_1
Workunit 7336183
Created 5 Feb 2011, 12:19:31 UTC
Sent 5 Feb 2011, 12:23:56 UTC
Report deadline 18 Jan 2012, 17:43:56 UTC
Received 20 Mar 2011, 12:23:03 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1131653
Run time 2 days 3 hours 41 min 19 sec
CPU time 56 min 48 sec
Validate state Invalid
Credit 749.07
Device peak FLOPS 2.28 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Southern Africa v6.08 (windows_intelx86)
Stderr
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4232, selfPID=4232, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
20:49:40 (3216): Can't acquire lockfile (32) - waiting 35s
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5408, selfPID=3216, iMonCtr=1
Model crash detected, will try to restart...
21:29:53 (3696): No heartbeat from core client for 30 sec - exiting
21:29:55 (3696): No heartbeat from core client for 30 sec - exiting
21:29:56 (3696): No heartbeat from core client for 30 sec - exiting
21:29:57 (3696): No heartbeat from core client for 30 sec - exiting
21:29:58 (3696): No heartbeat from core client for 30 sec - exiting
21:29:59 (3696): No heartbeat from core client for 30 sec - exiting
21:30:00 (3696): No heartbeat from core client for 30 sec - exiting
21:30:01 (3696): No heartbeat from core client for 30 sec - exiting
21:30:02 (3696): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1600, selfPID=1600, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
10:18:37 (4692): No heartbeat from core client for 30 sec - exiting
10:18:38 (4692): No heartbeat from core client for 30 sec - exiting
10:18:39 (4692): No heartbeat from core client for 30 sec - exiting
10:18:40 (4692): No heartbeat from core client for 30 sec - exiting
10:18:41 (4692): No heartbeat from core client for 30 sec - exiting
10:18:42 (4692): No heartbeat from core client for 30 sec - exiting
10:18:43 (4692): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=836, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2440, selfPID=3964, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2700, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4216, selfPID=4216, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
16:11:24 (1328): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:21:07 (2860): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4172, selfPID=4172, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4472, selfPID=2040, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4780, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3732, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2016, iMonCtr=2
Model crash detected, will try to restart...
18:21:07 (3568): No heartbeat from core client for 30 sec - exiting
18:21:08 (3568): No heartbeat from core client for 30 sec - exiting
18:21:09 (3568): No heartbeat from core client for 30 sec - exiting
18:21:11 (3568): No heartbeat from core client for 30 sec - exiting
18:21:12 (3568): No heartbeat from core client for 30 sec - exiting
18:21:13 (3568): No heartbeat from core client for 30 sec - exiting
18:21:14 (3568): No heartbeat from core client for 30 sec - exiting
18:21:15 (3568): No heartbeat from core client for 30 sec - exiting
18:21:16 (3568): No heartbeat from core client for 30 sec - exiting
18:21:17 (3568): No heartbeat from core client for 30 sec - exiting
18:21:18 (3568): No heartbeat from core client for 30 sec - exiting
18:21:19 (3568): No heartbeat from core client for 30 sec - exiting
18:21:20 (3568): No heartbeat from core client for 30 sec - exiting
18:21:21 (3568): No heartbeat from core client for 30 sec - exiting
18:21:23 (3568): No heartbeat from core client for 30 sec - exiting
18:21:24 (3568): No heartbeat from core client for 30 sec - exiting
18:21:25 (3568): No heartbeat from core client for 30 sec - exiting
18:21:26 (3568): No heartbeat from core client for 30 sec - exiting
18:21:27 (3568): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3616, selfPID=3672, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2328, selfPID=1748, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
18:29:44 (1748): called boinc_finish
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2968, selfPID=3208, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
cpdnmonitor: cannot open input file D:\TMP\BOINC/projects/climateprediction.net/hadam3p_saf_2grc_1974_1_007151403/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file D:\TMP\BOINC/projects/climateprediction.net/hadam3p_saf_2grc_1974_1_007151403/dataout/region_restart.day after 11 attempts

Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO                                                                                                                                                                                           tmp/xaakm.pipe_dummy                                                            2048    

Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO                                                                                                                                                                                           tmp/xaakg.pipe_dummy                                                            2048    
Leaving CPDN_Main::Monitor...
15:47:49 (4176): called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
  <file_name>hadam3p_saf_2grc_1974_1_007151403_1_5.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_2grc_1974_1_007151403_1_6.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_2grc_1974_1_007151403_1_7.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_2grc_1974_1_007151403_1_8.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_2grc_1974_1_007151403_1_9.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_2grc_1974_1_007151403_1_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_2grc_1974_1_007151403_1_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_2grc_1974_1_007151403_1_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
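
The stderr output above repeats a handful of message types many times (heartbeat loss, quit and suspend requests, model crash notices, a lockfile wait). A minimal Python sketch for tallying them is given here. The category labels and regular expressions are illustrative assumptions chosen to match the text of this log; they are not part of BOINC or the CPDN client, and the sample lines are a small subset copied from the log.

```python
import re
from collections import Counter

# Hypothetical category labels mapped to patterns taken from the log text above.
PATTERNS = {
    "no_heartbeat": re.compile(r"No heartbeat from core client"),
    "quit_request": re.compile(r"Quit request from BOINC"),
    "suspend_request": re.compile(r"Suspend request from BOINC"),
    "model_crash": re.compile(r"Model crash(ed| detected)"),
    "lockfile_wait": re.compile(r"Can't acquire lockfile"),
}


def tally(lines):
    """Count how many lines fall into each category (first match wins)."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break
    return counts


# A few lines copied verbatim from the stderr section, for illustration only.
SAMPLE = [
    "Suspended CPDN Monitor - Suspend request from BOINC...",
    "CPDN Monitor - Quit request from BOINC...",
    "20:49:40 (3216): Can't acquire lockfile (32) - waiting 35s",
    "Model crash detected, will try to restart...",
    "21:29:53 (3696): No heartbeat from core client for 30 sec - exiting",
    "21:29:55 (3696): No heartbeat from core client for 30 sec - exiting",
]

if __name__ == "__main__":
    for label, count in sorted(tally(SAMPLE).items()):
        print(f"{label}: {count}")
```

To run it over the full log, pass the lines between `<stderr_txt>` and `</stderr_txt>` to `tally` instead of `SAMPLE`.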
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                          Timestep  CPU Time (sec)  Average (sec/TS)
14 Mar 2011 08:58:48  1131653  12558260   hadam3p_saf_2grc_1974_1_007151403_1  46,176    139,997         3.0318
08 Mar 2011 18:01:06  1131653  12558260   hadam3p_saf_2grc_1974_1_007151403_1  34,656    105,353         3.0400
08 Mar 2011 17:10:47  1131653  12558260   hadam3p_saf_2grc_1974_1_007151403_1  23,136    70,575          3.0504
23 Feb 2011 20:29:47  1131653  12558260   hadam3p_saf_2grc_1974_1_007151403_1  11,616    35,564          3.0616
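
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count. A minimal Python sketch checks that reading against the four trickle rows above. The function names and the tolerance are assumptions made for illustration, not part of any CPDN tool.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average)
TRICKLES = [
    (46176, 139997, 3.0318),
    (34656, 105353, 3.0400),
    (23136, 70575, 3.0504),
    (11616, 35564, 3.0616),
]


def sec_per_timestep(timestep, cpu_time_sec):
    """Cumulative CPU seconds divided by cumulative timesteps."""
    return cpu_time_sec / timestep


def matches_reported(rows, tol=5e-5):
    """True per row when the computed ratio agrees with the reported
    average to within rounding at four decimal places."""
    return [abs(sec_per_timestep(ts, cpu) - avg) <= tol for ts, cpu, avg in rows]


if __name__ == "__main__":
    for ts, cpu, avg in TRICKLES:
        print(f"{ts:>6} {cpu:>7} {sec_per_timestep(ts, cpu):.4f} (reported {avg:.4f})")
```

Under this reading, successive trickles in the table are 11,520 timesteps apart.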

