Task 16414853

Name hadam3p_anz_n7ph_2012_1_008597605_0
Workunit 8744117
Created 26 Mar 2014, 18:50:22 UTC
Sent 28 Mar 2014, 4:07:23 UTC
Report deadline 10 Mar 2015, 9:27:23 UTC
Received 20 Apr 2014, 23:36:21 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1067463
Run time 8 days 9 hours 4 min 10 sec
CPU time 8 days 1 hour 49 min 11 sec
Validate state Invalid
Credit 3,987.46
Device peak FLOPS 2.53 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10 windows_intelx86
Stderr
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6648, iMonCtr=2
09:21:08 (3216): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6192, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5668, selfPID=11072, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3580, selfPID=6536, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
08:30:45 (10148): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5340, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6844, selfPID=7184, iMonCtr=1
Model crash detected, will try to restart...
08:14:36 (2840): No heartbeat from core client for 30 sec - exiting
08:14:37 (2840): No heartbeat from core client for 30 sec - exiting
08:14:38 (2840): No heartbeat from core client for 30 sec - exiting
08:14:39 (2840): No heartbeat from core client for 30 sec - exiting
08:14:40 (2840): No heartbeat from core client for 30 sec - exiting
08:14:41 (2840): No heartbeat from core client for 30 sec - exiting
08:14:42 (2840): No heartbeat from core client for 30 sec - exiting
08:14:43 (2840): No heartbeat from core client for 30 sec - exiting
08:14:44 (2840): No heartbeat from core client for 30 sec - exiting
08:14:45 (2840): No heartbeat from core client for 30 sec - exiting
08:14:46 (2840): No heartbeat from core client for 30 sec - exiting
08:14:47 (2840): No heartbeat from core client for 30 sec - exiting
08:14:48 (2840): No heartbeat from core client for 30 sec - exiting
08:14:49 (2840): No heartbeat from core client for 30 sec - exiting
08:14:50 (2840): No heartbeat from core client for 30 sec - exiting
08:14:51 (2840): No heartbeat from core client for 30 sec - exiting
08:14:52 (2840): No heartbeat from core client for 30 sec - exiting
08:14:53 (2840): No heartbeat from core client for 30 sec - exiting
08:14:54 (2840): No heartbeat from core client for 30 sec - exiting
08:14:55 (2840): No heartbeat from core client for 30 sec - exiting
08:14:56 (2840): No heartbeat from core client for 30 sec - exiting
08:14:57 (2840): No heartbeat from core client for 30 sec - exiting
08:14:58 (2840): No heartbeat from core client for 30 sec - exiting
08:14:59 (2840): No heartbeat from core client for 30 sec - exiting
08:15:00 (2840): No heartbeat from core client for 30 sec - exiting
08:15:01 (2840): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6204, selfPID=7104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3036, selfPID=1624, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6544, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6944, selfPID=5732, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9256, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9980, selfPID=9368, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5500, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3888, selfPID=2800, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 0, checkPID=0, selfPID=7724, iMonCtr=1
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8008, selfPID=8008, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8008, selfPID=6360, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_anz_n7ph_2012_1_008597605_0_9.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n7ph_2012_1_008597605_0_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n7ph_2012_1_008597605_0_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n7ph_2012_1_008597605_0_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                          Timestep  CPU Time (sec)  Average (sec/TS)
19 Apr 2014 20:34:56  1067463  16414853   hadam3p_anz_n7ph_2012_1_008597605_0  92,459    674,454         7.2946
18 Apr 2014 00:40:36  1067463  16414853   hadam3p_anz_n7ph_2012_1_008597605_0  80,939    588,238         7.2677
15 Apr 2014 23:49:01  1067463  16414853   hadam3p_anz_n7ph_2012_1_008597605_0  69,419    502,037         7.2320
14 Apr 2014 12:51:37  1067463  16414853   hadam3p_anz_n7ph_2012_1_008597605_0  57,899    414,747         7.1633
12 Apr 2014 18:23:26  1067463  16414853   hadam3p_anz_n7ph_2012_1_008597605_0  46,379    327,846         7.0688
10 Apr 2014 21:40:17  1067463  16414853   hadam3p_anz_n7ph_2012_1_008597605_0  34,859    241,875         6.9387
02 Apr 2014 15:39:33  1067463  16414853   hadam3p_anz_n7ph_2012_1_008597605_0  23,339    161,821         6.9335
01 Apr 2014 17:07:38  1067463  16414853   hadam3p_anz_n7ph_2012_1_008597605_0  11,819    81,806          6.9216
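
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count. The short Python sketch below checks that reading against the trickle rows listed above, and also reports the spacing between consecutive trickles. The names TRICKLES, sec_per_timestep and timestep_deltas are illustrative only; they are not part of BOINC or the CPDN application.

```python
# Trickle rows copied from the listing above:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS).
TRICKLES = [
    (92459, 674454, 7.2946),
    (80939, 588238, 7.2677),
    (69419, 502037, 7.2320),
    (57899, 414747, 7.1633),
    (46379, 327846, 7.0688),
    (34859, 241875, 6.9387),
    (23339, 161821, 6.9335),
    (11819, 81806, 6.9216),
]


def sec_per_timestep(cpu_time_sec, timestep):
    """Assumed definition of the Average column: CPU seconds per timestep."""
    return cpu_time_sec / timestep


def timestep_deltas(rows):
    """Return the set of gaps between consecutive trickle timesteps."""
    steps = sorted(ts for ts, _cpu, _avg in rows)
    return {later - earlier for earlier, later in zip(steps, steps[1:])}


if __name__ == "__main__":
    for ts, cpu, reported in TRICKLES:
        computed = sec_per_timestep(cpu, ts)
        # The displayed CPU time looks rounded to whole seconds, so compare
        # with a small tolerance rather than expecting an exact match.
        status = "ok" if abs(computed - reported) < 1e-3 else "MISMATCH"
        print(f"timestep {ts}: computed {computed:.4f}, reported {reported:.4f} ({status})")
    print("timestep gaps between trickles:", timestep_deltas(TRICKLES))
```

Under this assumption every row agrees with the reported average to within 0.001 sec/TS, and the trickles are evenly spaced 11,520 timesteps apart. The gradual rise from about 6.92 to 7.29 sec/TS therefore reflects later timesteps costing more CPU time on this host, not a change in the reporting interval.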

