Task 15740579

Name hadam3p_pnw_qab9_2030_1_008356766_0
Workunit 8507625
Created 19 Apr 2013, 16:58:35 UTC
Sent 19 Apr 2013, 18:25:00 UTC
Report deadline 1 Apr 2014, 23:45:00 UTC
Received 31 Aug 2013, 14:09:07 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1023214
Run time 9 days 9 hours 28 min 11 sec
CPU time 6 days 14 hours 58 min 45 sec
Validate state Invalid
Credit 2,254.99
Device peak FLOPS 1.90 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Pacific North West v6.09 windows_intelx86
Stderr
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1580, selfPID=1820, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5528, selfPID=5804, iMonCtr=1
Model crash detected, will try to restart...
CGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, chController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5716, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3756, selfPID=5660, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4928, selfPID=352, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1280, iMonCtr=2
Model crash detected, will try to restart...
20:01:56 (6104): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5072, iMonCtr=2
Model crash detected, will try to restart...
20:43:20 (5688): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:44:46 (2440): No heartbeat from core client for 30 sec - exiting
20:44:52 (2440): No heartbeat from core client for 30 sec - exiting
20:44:53 (2440): No heartbeat from core client for 30 sec - exiting
20:44:54 (2440): No heartbeat from core client for 30 sec - exiting
20:44:55 (2440): No heartbeat from core client for 30 sec - exiting
20:44:56 (2440): No heartbeat from core client for 30 sec - exiting
20:44:58 (2440): No heartbeat from core client for 30 sec - exiting
20:44:59 (2440): No heartbeat from core client for 30 sec - exiting
20:45:00 (2440): No heartbeat from core client for 30 sec - exiting
20:45:01 (2440): No heartbeat from core client for 30 sec - exiting
20:45:02 (2440): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CCGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4980, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3396, selfPID=5852, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 2
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1280, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3200, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4716, selfPID=1228, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4220, selfPID=5964, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 3
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5608, selfPID=5608, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
21:00:00 (4504): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1772, iMonCtr=2
Model crash detected, will try to restart...
GSuspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4324, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5740, iMonCtr=2
Model crash detected, will try to restart...
CSuspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4976, selfPID=5992, iMonCtr=1
Model crash detected, will try to restart...
CCPDN Monitor - Quit request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5724, selfPID=5532, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=956, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5444, selfPID=5856, iMonCtr=1
Model crash detected, will try to restart...
CSuspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3620, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5752, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5968, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CSuspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6000, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5108, selfPID=5508, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5808, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3864, selfPID=5620, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5640, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4204, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5216, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5900, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CSuspended CPDN Monitor - Suspend request from BOINC...
Regional yearly means requires 12 input files got 9
Signal 11 received, exiting...
Called boinc_finish
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4852, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3220, selfPID=5676, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Regional yearly means requires 12 input files got 9
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_pnw_qab9_2030_1_008356766_0_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_qab9_2030_1_008356766_0_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_pnw_qab9_2030_1_008356766_0_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
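The stderr log above repeats the same monitor message many times, each carrying `checkPID`, `selfPID`, and `iMonCtr` values. As a minimal sketch for tallying those lines, the snippet below pulls the three fields out with a regular expression. The function names `parse_monitor_lines` and `count_zero_check_pid` are hypothetical helpers, not part of BOINC or CPDN, and the code makes no claim about what the fields mean internally. It matches on the field triple only, so interleaved lines such as the `CGlobal Worker:: ... chController::` one still yield a single record.

```python
import re

# Matches the trailing field triple printed by the monitor messages.
# Only the field layout seen in the log above is assumed here.
FIELDS_RE = re.compile(r"checkPID=(\d+), selfPID=(\d+), iMonCtr=(\d+)")


def parse_monitor_lines(stderr_text):
    """Return one dict per checkPID/selfPID/iMonCtr triple found in the text."""
    records = []
    for check_pid, self_pid, mon_ctr in FIELDS_RE.findall(stderr_text):
        records.append(
            {
                "checkPID": int(check_pid),
                "selfPID": int(self_pid),
                "iMonCtr": int(mon_ctr),
            }
        )
    return records


def count_zero_check_pid(records):
    """Count records whose checkPID was reported as 0."""
    return sum(1 for r in records if r["checkPID"] == 0)


# Three lines copied from the log above, including one interleaved line.
sample = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=1580, selfPID=1820, iMonCtr=1\n"
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=5528, selfPID=5804, iMonCtr=1\n"
    "CGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, "
    "chController:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=5716, iMonCtr=2\n"
)

records = parse_monitor_lines(sample)
print(len(records), "monitor records,", count_zero_check_pid(records), "with checkPID=0")
```

Run against the full stderr text, the same two calls give a quick count of how many restarts were logged and how many of them reported `checkPID=0`.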
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                          Timestep  CPU Time (sec)  Average (sec/TS)
28 Aug 2013 13:34:42  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  103,779   565,960         5.4535
27 Aug 2013 20:14:15  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  103,776   565,200         5.4463
14 Aug 2013 15:36:20  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  92,265    502,652         5.4479
14 Aug 2013 15:36:20  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  92,260    501,893         5.4400
14 Aug 2013 15:36:20  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  92,256    501,145         5.4321
14 Aug 2013 15:36:20  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  80,746    438,915         5.4357
14 Aug 2013 15:36:20  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  80,743    438,136         5.4263
29 Jul 2013 14:45:04  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  80,741    437,385         5.4171
29 Jul 2013 14:45:04  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  80,736    436,647         5.4083
11 Jul 2013 19:13:01  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  69,219    376,076         5.4331
10 Jul 2013 19:44:34  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  69,216    375,333         5.4226
22 Jun 2013 19:08:00  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  57,696    311,849         5.4050
09 Jun 2013 18:53:28  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  46,176    249,418         5.4015
22 May 2013 19:15:12  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  34,656    186,701         5.3873
11 May 2013 20:41:13  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  23,138    124,615         5.3857
11 May 2013 19:39:31  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  23,136    123,879         5.3544
03 May 2013 19:05:37  1023214  15740579   hadam3p_pnw_qab9_2030_1_008356766_0  11,616    62,196          5.3543
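
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. The sketch below checks that reading against three rows copied from the table. The helper name `average_sec_per_timestep` is hypothetical, and the formula is inferred from the numbers rather than taken from CPDN documentation.

```python
def average_sec_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds per model timestep, rounded as in the table."""
    return round(cpu_time_sec / timestep, 4)


# (timestep, CPU time in seconds, reported average) copied from the table.
rows = [
    (103779, 565960, 5.4535),
    (23136, 123879, 5.3544),
    (11616, 62196, 5.3543),
]

for timestep, cpu_time_sec, reported in rows:
    computed = average_sec_per_timestep(cpu_time_sec, timestep)
    print(timestep, computed, reported, computed == reported)
```

The computed values match the reported column for these rows, which is consistent with the per-timestep cost rising slowly from about 5.35 to 5.45 seconds over the run.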