climateprediction.net home page
Task 16786536

Task 16786536

Name hadam3p_eu_k54j_2013_1_008866628_0
Workunit 9012557
Created 9 Jul 2014, 14:20:43 UTC
Sent 14 Jul 2014, 22:36:30 UTC
Report deadline 27 Jun 2015, 3:56:30 UTC
Received 21 Sep 2014, 19:27:29 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1310670
Run time 5 days 0 hours 7 min 28 sec
CPU time 4 days 3 hours 23 min 36 sec
Validate state Invalid
Credit 1,790.21
Device peak FLOPS 2.16 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Europe v6.09
windows_intelx86
Stderr
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9592, selfPID=4116, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=10172, selfPID=9048, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8028, selfPID=10704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3884, selfPID=4720, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=11144, selfPID=4432, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5172, selfPID=6996, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8952, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9068, selfPID=8136, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2876, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4176, selfPID=7664, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
16:52:00 (6056): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:52:04 (6056): No heartbeat from core client for 30 sec - exiting
16:52:05 (6056): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10364, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=13308, selfPID=3884, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7880, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
23:44:53 (5456): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
17:22:38 (11360): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6132, selfPID=6132, iMonCtr=2
17:22:40 (11360): No heartbeat from core client for 30 sec - exiting
20:23:11 (9264): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5264, iMonCtr=2
Model crash detected, will try to restart...
21:17:33 (11288): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=11700, selfPID=1104, iMonCtr=1
Model crash detected, will try to restart...
19:58:46 (11936): No heartbeat from core client for 30 sec - exiting
19:58:47 (11936): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:58:48 (11936): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4704, selfPID=7968, iMonCtr=1
Model crash detected, will try to restart...
02:53:53 (1928): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=11392, iMonCtr=2
Model crash detected, will try to restart...
06:45:08 (7088): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:45:09 (7088): No heartbeat from core client for 30 sec - exiting
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=12080, selfPID=12080, iMonCtr=2
06:57:13 (1660): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:03:17 (1736): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6272, iMonCtr=2
17:01:28 (5736): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8396, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4620, iMonCtr=2
Model crash detected, will try to restart...
05:26:11 (8176): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:43:45 (5660): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8044, selfPID=5856, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7612, selfPID=7352, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6168, selfPID=5132, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9288, selfPID=7324, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7964, selfPID=2872, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5968, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7500, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4216, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7808, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4936, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3444, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6448, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1548, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_eu_k54j_2013_1_008866628_0_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_k54j_2013_1_008866628_0_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_eu_k54j_2013_1_008866628_0_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
Latest Trickles Received
Time Sent (UTC) Host ID Result ID Result Name Timestep CPU Time (sec) Average (sec/TS)
16 Sep 2014 00:31:16 1310670 16786536 hadam3p_eu_k54j_2013_1_008866628_0 103,776 329,199 3.1722
10 Sep 2014 01:02:22 1310670 16786536 hadam3p_eu_k54j_2013_1_008866628_0 92,256 292,944 3.1753
30 Aug 2014 10:37:08 1310670 16786536 hadam3p_eu_k54j_2013_1_008866628_0 80,773 257,136 3.1834
30 Aug 2014 01:34:53 1310670 16786536 hadam3p_eu_k54j_2013_1_008866628_0 80,736 256,401 3.1758
24 Aug 2014 04:10:42 1310670 16786536 hadam3p_eu_k54j_2013_1_008866628_0 69,216 220,201 3.1814
15 Aug 2014 10:47:47 1310670 16786536 hadam3p_eu_k54j_2013_1_008866628_0 57,696 182,904 3.1701
15 Aug 2014 10:47:47 1310670 16786536 hadam3p_eu_k54j_2013_1_008866628_0 46,176 145,469 3.1503
15 Aug 2014 10:47:47 1310670 16786536 hadam3p_eu_k54j_2013_1_008866628_0 34,656 108,191 3.1219
03 Aug 2014 15:39:08 1310670 16786536 hadam3p_eu_k54j_2013_1_008866628_0 23,159 73,162 3.1591
02 Aug 2014 15:23:01 1310670 16786536 hadam3p_eu_k54j_2013_1_008866628_0 23,136 72,605 3.1382
29 Jul 2014 17:36:28 1310670 16786536 hadam3p_eu_k54j_2013_1_008866628_0 11,616 36,236 3.1195


©2024 climateprediction.net