Name | hadam3p_anz_n7ph_2012_1_008597605_0 |
Workunit | 8744117 |
Created | 26 Mar 2014, 18:50:22 UTC |
Sent | 28 Mar 2014, 4:07:23 UTC |
Report deadline | 10 Mar 2015, 9:27:23 UTC |
Received | 20 Apr 2014, 23:36:21 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 1067463 |
Run time | 8 days 9 hours 4 min 10 sec |
CPU time | 8 days 1 hours 49 min 11 sec |
Validate state | Invalid |
Credit | 3,987.46 |
Device peak FLOPS | 2.53 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10 windows_intelx86 |
Stderr | <core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6648, iMonCtr=2
09:21:08 (3216): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6192, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5668, selfPID=11072, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3580, selfPID=6536, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
08:30:45 (10148): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5340, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6844, selfPID=7184, iMonCtr=1
Model crash detected, will try to restart...
08:14:36 (2840): No heartbeat from core client for 30 sec - exiting
08:14:37 (2840): No heartbeat from core client for 30 sec - exiting
08:14:38 (2840): No heartbeat from core client for 30 sec - exiting
08:14:39 (2840): No heartbeat from core client for 30 sec - exiting
08:14:40 (2840): No heartbeat from core client for 30 sec - exiting
08:14:41 (2840): No heartbeat from core client for 30 sec - exiting
08:14:42 (2840): No heartbeat from core client for 30 sec - exiting
08:14:43 (2840): No heartbeat from core client for 30 sec - exiting
08:14:44 (2840): No heartbeat from core client for 30 sec - exiting
08:14:45 (2840): No heartbeat from core client for 30 sec - exiting
08:14:46 (2840): No heartbeat from core client for 30 sec - exiting
08:14:47 (2840): No heartbeat from core client for 30 sec - exiting
08:14:48 (2840): No heartbeat from core client for 30 sec - exiting
08:14:49 (2840): No heartbeat from core client for 30 sec - exiting
08:14:50 (2840): No heartbeat from core client for 30 sec - exiting
08:14:51 (2840): No heartbeat from core client for 30 sec - exiting
08:14:52 (2840): No heartbeat from core client for 30 sec - exiting
08:14:53 (2840): No heartbeat from core client for 30 sec - exiting
08:14:54 (2840): No heartbeat from core client for 30 sec - exiting
08:14:55 (2840): No heartbeat from core client for 30 sec - exiting
08:14:56 (2840): No heartbeat from core client for 30 sec - exiting
08:14:57 (2840): No heartbeat from core client for 30 sec - exiting
08:14:58 (2840): No heartbeat from core client for 30 sec - exiting
08:14:59 (2840): No heartbeat from core client for 30 sec - exiting
08:15:00 (2840): No heartbeat from core client for 30 sec - exiting
08:15:01 (2840): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6204, selfPID=7104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3036, selfPID=1624, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6544, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6944, selfPID=5732, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9256, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9980, selfPID=9368, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5500, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3888, selfPID=2800, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 0, checkPID=0, selfPID=7724, iMonCtr=1
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8008, selfPID=8008, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8008, selfPID=6360, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_anz_n7ph_2012_1_008597605_0_9.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n7ph_2012_1_008597605_0_10.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n7ph_2012_1_008597605_0_11.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_n7ph_2012_1_008597605_0_12.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
</message>
]]> |
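The stderr text above repeats a small set of messages many times, so a tally can make the sequence of suspends, lost heartbeats, model restarts, and failed uploads easier to see. The following is a minimal sketch, assuming the log has been saved as plain text. The function name `tally_stderr`, the `PATTERNS` keys, and the `SAMPLE` excerpt are hypothetical helpers for illustration and are not part of BOINC or CPDN; the matched fragments are copied from the log itself.

```python
from collections import Counter

# Message fragments copied verbatim from the stderr text above.
# The dictionary keys are hypothetical labels chosen for this sketch.
PATTERNS = {
    "suspend_request": "Suspend request from BOINC",
    "no_heartbeat": "No heartbeat from core client",
    "model_restart": "Model crash detected, will try to restart",
    "upload_error": "<file_xfer_error>",
}


def tally_stderr(text):
    """Count non-overlapping occurrences of each known fragment in text."""
    return Counter({name: text.count(fragment) for name, fragment in PATTERNS.items()})


# A short excerpt of the log, used only to demonstrate the call.
SAMPLE = (
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "09:21:08 (3216): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=5668, selfPID=11072, iMonCtr=1 "
    "Model crash detected, will try to restart... "
)

if __name__ == "__main__":
    for name, count in tally_stderr(SAMPLE).items():
        print(f"{name}: {count}")
```

Plain substring counting is used rather than a line-based parser because it gives the same counts whether the log is stored one message per line or flattened into a single run of text.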
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
19 Apr 2014 20:34:56 | 1067463 | 16414853 | hadam3p_anz_n7ph_2012_1_008597605_0 | 92,459 | 674,454 | 7.2946 |
18 Apr 2014 00:40:36 | 1067463 | 16414853 | hadam3p_anz_n7ph_2012_1_008597605_0 | 80,939 | 588,238 | 7.2677 |
15 Apr 2014 23:49:01 | 1067463 | 16414853 | hadam3p_anz_n7ph_2012_1_008597605_0 | 69,419 | 502,037 | 7.2320 |
14 Apr 2014 12:51:37 | 1067463 | 16414853 | hadam3p_anz_n7ph_2012_1_008597605_0 | 57,899 | 414,747 | 7.1633 |
12 Apr 2014 18:23:26 | 1067463 | 16414853 | hadam3p_anz_n7ph_2012_1_008597605_0 | 46,379 | 327,846 | 7.0688 |
10 Apr 2014 21:40:17 | 1067463 | 16414853 | hadam3p_anz_n7ph_2012_1_008597605_0 | 34,859 | 241,875 | 6.9387 |
02 Apr 2014 15:39:33 | 1067463 | 16414853 | hadam3p_anz_n7ph_2012_1_008597605_0 | 23,339 | 161,821 | 6.9335 |
01 Apr 2014 17:07:38 | 1067463 | 16414853 | hadam3p_anz_n7ph_2012_1_008597605_0 | 11,819 | 81,806 | 6.9216 |
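
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep count. The sketch below checks that reading against the eight rows and also derives the per-interval rate between consecutive trickles, which the table does not show. The names `TRICKLES`, `cumulative_average`, and `interval_rates` are hypothetical; the numbers are copied from the table, listed oldest first.

```python
# (timestep, cumulative CPU seconds, reported average sec/TS), oldest first.
TRICKLES = [
    (11_819, 81_806, 6.9216),
    (23_339, 161_821, 6.9335),
    (34_859, 241_875, 6.9387),
    (46_379, 327_846, 7.0688),
    (57_899, 414_747, 7.1633),
    (69_419, 502_037, 7.2320),
    (80_939, 588_238, 7.2677),
    (92_459, 674_454, 7.2946),
]


def cumulative_average(timestep, cpu_seconds):
    """CPU seconds per timestep, averaged from the start of the run."""
    return cpu_seconds / timestep


def interval_rates(rows):
    """CPU seconds per timestep between each pair of consecutive trickles."""
    rates = []
    for (ts0, cpu0, _), (ts1, cpu1, _) in zip(rows, rows[1:]):
        rates.append((cpu1 - cpu0) / (ts1 - ts0))
    return rates


if __name__ == "__main__":
    for timestep, cpu_seconds, reported in TRICKLES:
        computed = cumulative_average(timestep, cpu_seconds)
        print(f"{timestep:>6}  reported={reported:.4f}  computed={computed:.4f}")
    for rate in interval_rates(TRICKLES):
        print(f"interval rate: {rate:.4f} sec/TS")
```

Because the reported column is cumulative, it smooths over changes in speed. The per-interval rates show more directly when the run slowed down: in this table the step from timestep 23,339 to 34,859 is where the cost per timestep rises.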
©2024 cpdn.org