climateprediction.net home page
Task 16757596

Name hadam3p_eu_o9ef_2013_1_008838768_0
Workunit 8984697
Created 8 Jul 2014, 14:37:35 UTC
Sent 8 Jul 2014, 14:39:36 UTC
Report deadline 20 Jun 2015, 19:59:36 UTC
Received 14 Aug 2014, 14:06:06 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1327910
Run time 2 days 13 hours 32 min 35 sec
CPU time 15 hours 44 min 23 sec
Validate state Invalid
Credit 200.76
Device peak FLOPS 1.62 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Europe v6.09 (windows_intelx86)
Stderr
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4844, iMonCtr=2
19:35:53 (836): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:35:54 (836): No heartbeat from core client for 30 sec - exiting
19:35:55 (836): No heartbeat from core client for 30 sec - exiting
19:35:56 (836): No heartbeat from core client for 30 sec - exiting
20:32:52 (3012): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:32:53 (3012): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3508, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=772, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3864, selfPID=1200, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3160, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=604, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=704, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2424, selfPID=2120, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3896, selfPID=1880, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2700, selfPID=2700, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3392, iMonCtr=2
07:25:22 (4512): No heartbeat from core client for 30 sec - exiting
07:25:23 (4512): No heartbeat from core client for 30 sec - exiting
07:25:24 (4512): No heartbeat from core client for 30 sec - exiting
07:25:25 (4512): No heartbeat from core client for 30 sec - exiting
07:25:26 (4512): No heartbeat from core client for 30 sec - exiting
07:25:27 (4512): No heartbeat from core client for 30 sec - exiting
07:25:28 (4512): No heartbeat from core client for 30 sec - exiting
07:25:29 (4512): No heartbeat from core client for 30 sec - exiting
07:25:30 (4512): No heartbeat from core client for 30 sec - exiting
07:25:31 (4512): No heartbeat from core client for 30 sec - exiting
07:25:32 (4512): No heartbeat from core client for 30 sec - exiting
07:27:06 (4512): No heartbeat from core client for 30 sec - exiting
07:27:07 (4512): No heartbeat from core client for 30 sec - exiting
07:27:08 (4512): No heartbeat from core client for 30 sec - exiting
07:27:09 (4512): No heartbeat from core client for 30 sec - exiting
07:27:10 (4512): No heartbeat from core client for 30 sec - exiting
07:27:11 (4512): No heartbeat from core client for 30 sec - exiting
07:27:12 (4512): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:30:31 (2892): No heartbeat from core client for 30 sec - exiting
07:30:34 (2892): No heartbeat from core client for 30 sec - exiting
07:30:35 (2892): No heartbeat from core client for 30 sec - exiting
07:30:36 (2892): No heartbeat from core client for 30 sec - exiting
07:30:37 (2892): No heartbeat from core client for 30 sec - exiting
07:30:38 (2892): No heartbeat from core client for 30 sec - exiting
07:30:39 (2892): No heartbeat from core client for 30 sec - exiting
07:30:40 (2892): No heartbeat from core client for 30 sec - exiting
07:30:41 (2892): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3116, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2884, selfPID=3452, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1800, selfPID=2932, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3776, selfPID=3360, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3268, selfPID=3400, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4500, selfPID=4500, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4628, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2604, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2180, selfPID=3204, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3132, selfPID=2708, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3312, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2736, selfPID=2080, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1260, selfPID=2800, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3596, selfPID=3240, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_2.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_3.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_4.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_5.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_6.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_7.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_8.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_9.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_10.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_11.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_12.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>

</message>
]]>
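The upload-failure message above lists one `<file_xfer_error>` entry per output file (`_2.zip` through `_12.zip`), each with error code -161 (not found). As a rough sketch of how such a message could be summarized, the snippet below pulls the file name and error code out of each entry with a regular expression. The function name `parse_upload_failures` and the shortened `message` sample are illustrative assumptions, not part of BOINC or the CPDN application.

```python
import re

# Shortened sample copied from the <message> block above (first two entries only).
message = """upload failure: <file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_2.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_3.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
"""

# Matches one file name followed by its numeric error code and the
# parenthesized reason text, as they appear in the message above.
ENTRY = re.compile(
    r"<file_name>(.*?)</file_name>\s*"
    r"<error_code>(-?\d+) \((.*?)\)</error_code>"
)

def parse_upload_failures(text):
    """Return a list of (file_name, error_code, reason) tuples (hypothetical helper)."""
    return [(name, int(code), reason) for name, code, reason in ENTRY.findall(text)]

failures = parse_upload_failures(message)
for name, code, reason in failures:
    print(f"{name}: {code} ({reason})")
```

A regular expression is used here rather than an XML parser because the message is a sequence of sibling elements preceded by free text, so it is not a well-formed XML document on its own.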
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                         Timestep  CPU Time (sec)  Average (sec/TS)
03 Aug 2014 18:21:51  1327910  16757596   hadam3p_eu_o9ef_2013_1_008838768_0  11,638    41,539          3.5693
01 Aug 2014 16:16:04  1327910  16757596   hadam3p_eu_o9ef_2013_1_008838768_0  11,630    40,959          3.5218
01 Aug 2014 04:46:01  1327910  16757596   hadam3p_eu_o9ef_2013_1_008838768_0  11,624    40,408          3.4763
31 Jul 2014 16:28:52  1327910  16757596   hadam3p_eu_o9ef_2013_1_008838768_0  11,616    39,881          3.4333
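
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count. This is an inference from the numbers in the table, not something the page states. A minimal check under that assumption, with the row values copied from the trickle table and a hypothetical helper name:

```python
# Rows copied from the trickle table above:
# (timestep, cpu_time_sec, reported_average_sec_per_ts)
trickles = [
    (11638, 41539, 3.5693),
    (11630, 40959, 3.5218),
    (11624, 40408, 3.4763),
    (11616, 39881, 3.4333),
]

def average_sec_per_timestep(cpu_time_sec, timestep):
    """Assumed definition: cumulative CPU seconds divided by timesteps completed."""
    return cpu_time_sec / timestep

# Compare the recomputed average with the value reported in each row.
results = []
for timestep, cpu_time_sec, reported in trickles:
    computed = average_sec_per_timestep(cpu_time_sec, timestep)
    results.append((timestep, computed, reported))
    print(f"timestep {timestep}: computed {computed:.4f}, reported {reported:.4f}")
```

For the four rows shown, the recomputed values agree with the reported column to four decimal places, which is consistent with the assumed definition.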