Task 14346241


Name: hadam3p_saf_0uj3_1974_1_006872743_1
Workunit: 7076059
Created: 2 Apr 2012, 13:26:46 UTC
Sent: 2 Apr 2012, 13:26:52 UTC
Report deadline: 15 Mar 2013, 18:46:52 UTC
Received: 11 Apr 2012, 19:08:06 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 0 (0x00000000)
Computer ID: 1206904
Run time: 4 days 2 hours 17 min 1 sec
CPU time: 3 days 2 hours 3 min 11 sec
Validate state: Invalid
Credit: 1,122.82
Device peak FLOPS: 2.34 GFLOPS
Application version: UK Met Office HadAM3P-HadRM3P Southern Africa v6.09 (windows_intelx86)
Stderr:
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7272, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7320, selfPID=5408, iMonCtr=1
Model crash detected, will try to restart...
10:05:33 (4848): No heartbeat from core client for 30 sec - exiting
10:05:34 (4848): No heartbeat from core client for 30 sec - exiting
10:05:35 (4848): No heartbeat from core client for 30 sec - exiting
10:05:36 (4848): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=880, selfPID=5744, iMonCtr=1
Model crash detected, will try to restart...
15:27:30 (5052): No heartbeat from core client for 30 sec - exiting
15:27:31 (5052): No heartbeat from core client for 30 sec - exiting
15:27:32 (5052): No heartbeat from core client for 30 sec - exiting
15:27:33 (5052): No heartbeat from core client for 30 sec - exiting
15:27:34 (5052): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:28:20 (6708): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7084, iMonCtr=2
Model crash detected, will try to restart...
16:41:35 (4664): No heartbeat from core client for 30 sec - exiting
16:41:37 (4664): No heartbeat from core client for 30 sec - exiting
16:41:38 (4664): No heartbeat from core client for 30 sec - exiting
16:41:39 (4664): No heartbeat from core client for 30 sec - exiting
16:41:40 (4664): No heartbeat from core client for 30 sec - exiting
16:41:41 (4664): No heartbeat from core client for 30 sec - exiting
16:41:42 (4664): No heartbeat from core client for 30 sec - exiting
16:41:43 (4664): No heartbeat from core client for 30 sec - exiting
16:41:44 (4664): No heartbeat from core client for 30 sec - exiting
16:41:45 (4664): No heartbeat from core client for 30 sec - exiting
16:41:46 (4664): No heartbeat from core client for 30 sec - exiting
16:41:47 (4664): No heartbeat from core client for 30 sec - exiting
16:41:49 (4664): No heartbeat from core client for 30 sec - exiting
16:41:50 (4664): No heartbeat from core client for 30 sec - exiting
16:41:51 (4664): No heartbeat from core client for 30 sec - exiting
16:41:52 (4664): No heartbeat from core client for 30 sec - exiting
16:41:53 (4664): No heartbeat from core client for 30 sec - exiting
16:41:54 (4664): No heartbeat from core client for 30 sec - exiting
16:41:55 (4664): No heartbeat from core client for 30 sec - exiting
16:41:56 (4664): No heartbeat from core client for 30 sec - exiting
16:41:57 (4664): No heartbeat from core client for 30 sec - exiting
16:41:58 (4664): No heartbeat from core client for 30 sec - exiting
16:41:59 (4664): No heartbeat from core client for 30 sec - exiting
16:42:01 (4664): No heartbeat from core client for 30 sec - exiting
16:42:02 (4664): No heartbeat from core client for 30 sec - exiting
16:42:03 (4664): No heartbeat from core client for 30 sec - exiting
16:42:04 (4664): No heartbeat from core client for 30 sec - exiting
16:42:05 (4664): No heartbeat from core client for 30 sec - exiting
16:42:06 (4664): No heartbeat from core client for 30 sec - exiting
16:42:07 (4664): No heartbeat from core client for 30 sec - exiting
16:42:08 (4664): No heartbeat from core client for 30 sec - exiting
16:42:09 (4664): No heartbeat from core client for 30 sec - exiting
16:42:10 (4664): No heartbeat from core client for 30 sec - exiting
16:42:11 (4664): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6672, iMonCtr=2
Model crash detected, will try to restart...
15:00:48 (1432): No heartbeat from core client for 30 sec - exiting
15:00:49 (1432): No heartbeat from core client for 30 sec - exiting
15:00:51 (1432): No heartbeat from core client for 30 sec - exiting
15:00:52 (1432): No heartbeat from core client for 30 sec - exiting
15:00:53 (1432): No heartbeat from core client for 30 sec - exiting
15:00:54 (1432): No heartbeat from core client for 30 sec - exiting
15:00:55 (1432): No heartbeat from core client for 30 sec - exiting
15:00:56 (1432): No heartbeat from core client for 30 sec - exiting
15:00:57 (1432): No heartbeat from core client for 30 sec - exiting
15:00:58 (1432): No heartbeat from core client for 30 sec - exiting
15:00:59 (1432): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6808, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5856, iMonCtr=2
Model crash detected, will try to restart...
07:54:33 (4008): No heartbeat from core client for 30 sec - exiting
07:54:35 (4008): No heartbeat from core client for 30 sec - exiting
07:54:36 (4008): No heartbeat from core client for 30 sec - exiting
07:54:37 (4008): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7164, selfPID=6276, iMonCtr=1
Model crash detected, will try to restart...
11:14:12 (5240): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:14:14 (5240): No heartbeat from core client for 30 sec - exiting
11:14:15 (5240): No heartbeat from core client for 30 sec - exiting
11:14:16 (5240): No heartbeat from core client for 30 sec - exiting
11:14:17 (5240): No heartbeat from core client for 30 sec - exiting
11:14:18 (5240): No heartbeat from core client for 30 sec - exiting
11:14:19 (5240): No heartbeat from core client for 30 sec - exiting
11:14:20 (5240): No heartbeat from core client for 30 sec - exiting
11:14:21 (5240): No heartbeat from core client for 30 sec - exiting
11:14:23 (5240): No heartbeat from core client for 30 sec - exiting

RCM: BUFFIN : Read Failed: No such file or directory
RCM : BUFFIN: C I/O Error feof - Unit 62 - Return code = 16
RCM : BUFFIN: C I/O Error feof - Unit 62 - Return code = 16


GCM: BUFFIN : Read Failed: Result too large
GCM : BUFFIN: C I/O Error feof - Unit 62 - Return code = 16
GCM : BUFFIN: C I/O Error feof - Unit 62 - Return code = 16


Model crashed: STWORK  : I/O error - PP fixed length header tmp/xaakm.pipe_dummy 2048

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_saf_0uj3_1974_1_006872743_1_7.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_0uj3_1974_1_006872743_1_8.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_0uj3_1974_1_006872743_1_9.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_0uj3_1974_1_006872743_1_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_0uj3_1974_1_006872743_1_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_0uj3_1974_1_006872743_1_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
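The stderr above is easier to read once it is tallied. As a minimal sketch, assuming the line shapes seen in this particular log stay stable (nothing on this page documents them as a format), the Python below counts "Model crash detected" restart attempts, counts heartbeat-timeout lines per process ID, and extracts the file name and error code from each `<file_xfer_error>` record. The names `summarize_stderr`, `failed_uploads`, `STDERR_EXCERPT` and `MESSAGE_EXCERPT` are hypothetical, and the two excerpts are a few lines copied from this page rather than the full log. The error code is kept as a plain number; its meaning is defined by BOINC's own error-code list, which is not reproduced here.

```python
import re
from collections import Counter

# A few lines copied from the stderr above (not the full, contiguous log).
STDERR_EXCERPT = """\
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7320, selfPID=5408, iMonCtr=1
Model crash detected, will try to restart...
10:05:33 (4848): No heartbeat from core client for 30 sec - exiting
10:05:34 (4848): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=880, selfPID=5744, iMonCtr=1
Model crash detected, will try to restart...
15:27:30 (5052): No heartbeat from core client for 30 sec - exiting
"""

# Two records copied from the <message> block above.
MESSAGE_EXCERPT = """\
<file_xfer_error>
  <file_name>hadam3p_saf_0uj3_1974_1_006872743_1_7.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_saf_0uj3_1974_1_006872743_1_8.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
"""

# Assumed line shape: "HH:MM:SS (PID): No heartbeat from core client ..."
HEARTBEAT_RE = re.compile(r"^\d{2}:\d{2}:\d{2} \((\d+)\): No heartbeat from core client")
# Assumed shape of one upload-failure record: a file name followed by an error code.
XFER_RE = re.compile(r"<file_name>([^<]+)</file_name>\s*<error_code>(-?\d+)</error_code>")


def summarize_stderr(text: str) -> tuple[int, Counter]:
    """Count restart attempts and heartbeat-timeout lines per process ID."""
    restarts = 0
    heartbeat_exits: Counter = Counter()
    for line in text.splitlines():
        if line.startswith("Model crash detected"):
            restarts += 1
        match = HEARTBEAT_RE.match(line)
        if match:
            heartbeat_exits[int(match.group(1))] += 1
    return restarts, heartbeat_exits


def failed_uploads(message: str) -> list[tuple[str, int]]:
    """Return (file name, error code) pairs from an upload-failure message."""
    return [(name, int(code)) for name, code in XFER_RE.findall(message)]


if __name__ == "__main__":
    restarts, exits = summarize_stderr(STDERR_EXCERPT)
    print(restarts, dict(exits))
    print(failed_uploads(MESSAGE_EXCERPT))
```

Applied to the full stderr above, the same restart count comes to seven, and the `<message>` block yields six failed uploads (`_7.zip` through `_12.zip`), all with code -161.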
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                          Timestep  CPU Time (sec)  Average (sec/TS)
10 Apr 2012 16:02:48  1206904  14346241   hadam3p_saf_0uj3_1974_1_006872743_1  69,216    230,398         3.3287
09 Apr 2012 18:23:06  1206904  14346241   hadam3p_saf_0uj3_1974_1_006872743_1  57,696    192,515         3.3367
08 Apr 2012 16:43:38  1206904  14346241   hadam3p_saf_0uj3_1974_1_006872743_1  46,176    154,312         3.3418
05 Apr 2012 19:15:37  1206904  14346241   hadam3p_saf_0uj3_1974_1_006872743_1  34,656    115,782         3.3409
04 Apr 2012 18:15:28  1206904  14346241   hadam3p_saf_0uj3_1974_1_006872743_1  23,136    77,527          3.3509
03 Apr 2012 14:30:31  1206904  14346241   hadam3p_saf_0uj3_1974_1_006872743_1  11,616    39,009          3.3582
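The Average column appears to be the cumulative CPU time divided by the cumulative timestep, rounded to four decimal places; that reading is inferred from the six rows, not stated on the page. The Python sketch below checks the inference and also computes how many timesteps separate consecutive trickles. `TRICKLES`, `average_sec_per_timestep` and `timestep_intervals` are hypothetical names; the data are the rows of the table above.

```python
# Rows copied from the "Latest Trickles Received" table above:
# (timestep, CPU time in seconds, reported average in sec/TS)
TRICKLES = [
    (69_216, 230_398, 3.3287),
    (57_696, 192_515, 3.3367),
    (46_176, 154_312, 3.3418),
    (34_656, 115_782, 3.3409),
    (23_136, 77_527, 3.3509),
    (11_616, 39_009, 3.3582),
]


def average_sec_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Cumulative CPU seconds per cumulative timestep, rounded like the table."""
    return round(cpu_seconds / timestep, 4)


def timestep_intervals(rows: list[tuple[int, int, float]]) -> list[int]:
    """Timesteps completed between consecutive trickles, oldest first."""
    steps = sorted(ts for ts, _, _ in rows)
    return [later - earlier for earlier, later in zip(steps, steps[1:])]


if __name__ == "__main__":
    for ts, cpu, reported in TRICKLES:
        print(ts, average_sec_per_timestep(ts, cpu), reported)
    print(timestep_intervals(TRICKLES))  # [11520, 11520, 11520, 11520, 11520]
```

Every row reproduces its reported average, and consecutive trickles are 11,520 timesteps apart even though their send dates are irregular (5 Apr to 8 Apr is a three-day gap).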


©2024 climateprediction.net