Name | hadam3p_anz_n9r4_2012_1_008600256_1 |
Workunit | 8746768 |
Created | 2 Aug 2014, 11:12:05 UTC |
Sent | 2 Aug 2014, 11:17:17 UTC |
Report deadline | 15 Jul 2015, 16:37:17 UTC |
Received | 27 Sep 2014, 7:26:04 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 1335527 |
Run time | 8 days 17 hours 25 min 37 sec |
CPU time | 8 days 1 hour 15 min 39 sec |
Validate state | Invalid |
Credit | 4,484.28 |
Device peak FLOPS | 2.33 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10 windows_intelx86 |
Stderr | <core_client_version>7.2.33</core_client_version> <![CDATA[ <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5900, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3844, selfPID=6140, iMonCtr=1 Model crash detected, will try to restart... GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3780, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3064, selfPID=4760, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2548, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5280, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5508, selfPID=5636, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4940, selfPID=2696, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=704, selfPID=5668, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5840, selfPID=4000, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5140, selfPID=3216, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4784, selfPID=4136, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3892, iMonCtr=2 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3748, selfPID=4328, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 10:11:56 (3912): No heartbeat from core client for 30 sec - exiting 10:11:57 (3912): No heartbeat from core client for 30 sec - exiting 10:11:58 (3912): No heartbeat from core client for 30 sec - exiting 10:11:59 (3912): No heartbeat from core client for 30 sec - exiting 10:12:00 (3912): No heartbeat from core client for 30 sec - exiting 10:12:01 (3912): No heartbeat from core client for 30 sec - exiting 10:12:02 (3912): No heartbeat from core client for 30 sec - exiting 10:12:03 (3912): No heartbeat from core client for 30 sec - exiting 10:12:05 (3912): No heartbeat from core client for 30 sec - exiting 10:12:06 (3912): No heartbeat from core client for 30 sec - exiting 10:12:07 (3912): No heartbeat from core client for 30 sec - exiting 10:12:08 (3912): No heartbeat from core client for 30 sec - exiting 10:12:09 (3912): No heartbeat from core client for 30 sec - exiting 10:12:10 (3912): No heartbeat from core client for 30 sec - exiting 10:12:11 (3912): No heartbeat from core client for 30 sec - exiting 10:12:12 (3912): No heartbeat from core client for 30 sec - exiting 10:12:13 (3912): No heartbeat from core client for 30 sec - exiting 10:12:14 (3912): No heartbeat from core client for 30 sec - exiting 10:12:15 (3912): No heartbeat from core client for 30 sec - exiting 10:12:17 (3912): No heartbeat from core client for 30 sec - exiting 10:12:18 (3912): No heartbeat from core client for 30 sec - exiting 10:12:19 (3912): No heartbeat from core client for 30 sec - exiting 10:12:20 (3912): No heartbeat from core client for 30 sec - exiting 10:12:21 (3912): No heartbeat from core client for 30 sec 
- exiting 10:12:22 (3912): No heartbeat from core client for 30 sec - exiting 10:12:23 (3912): No heartbeat from core client for 30 sec - exiting 10:12:24 (3912): No heartbeat from core client for 30 sec - exiting 10:12:25 (3912): No heartbeat from core client for 30 sec - exiting 10:12:26 (3912): No heartbeat from core client for 30 sec - exiting 10:12:27 (3912): No heartbeat from core client for 30 sec - exiting 10:12:29 (3912): No heartbeat from core client for 30 sec - exiting 10:12:30 (3912): No heartbeat from core client for 30 sec - exiting 10:12:31 (3912): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2076, selfPID=4636, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4624, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4896, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4656, selfPID=4448, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4904, selfPID=5640, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4728, selfPID=4256, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5476, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5064, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5060, selfPID=4812, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3940, selfPID=4916, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6008, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6028, selfPID=2344, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2188, selfPID=3464, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=292, selfPID=5052, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5948, selfPID=2868, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5816, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3948, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5360, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5304, selfPID=4944, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5856, selfPID=3696, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CSuspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6084, selfPID=3672, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5708, iMonCtr=2 Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... GSuspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3880, selfPID=3948, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2480, iMonCtr=2 Model crash detected, will try to restart... CGlobal Worker:: CPDN process is not running, exiting, bRetVal = 0, checkPID=0, selfPID=4476, iMonCtr=1 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4876, selfPID=4080, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
Called boinc_finish </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hadam3p_anz_n9r4_2012_1_008600256_1_10.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_n9r4_2012_1_008600256_1_11.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_n9r4_2012_1_008600256_1_12.zip</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> |
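The stderr dump above repeats a handful of message types many times, so a tally is easier to read than the raw text. The following is a minimal sketch of one way to count them, assuming the stderr cell has been copied into a Python string. The pattern names (`model_crash`, `suspend_request`, and so on) and the `tally_stderr` helper are hypothetical labels chosen for illustration; only the message texts come from the log itself.

```python
import re
from collections import Counter

# Hypothetical labels; the matched phrases are copied from the stderr output above.
PATTERNS = {
    "model_crash": re.compile(r"Model crash detected, will try to restart"),
    "suspend_request": re.compile(r"Suspend request from BOINC"),
    "no_heartbeat": re.compile(r"No heartbeat from core client"),
    "upload_error": re.compile(r"<error_code>-?\d+</error_code>"),
}


def tally_stderr(text: str) -> Counter:
    """Count how often each known message type appears in a stderr dump."""
    return Counter({name: len(rx.findall(text)) for name, rx in PATTERNS.items()})


# A short excerpt assembled from messages in the log, used here as sample input.
sample = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=5900, iMonCtr=2 Model crash detected, will try to restart... "
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "10:11:56 (3912): No heartbeat from core client for 30 sec - exiting "
    "<error_code>-161</error_code>"
)

counts = tally_stderr(sample)
for name, n in counts.items():
    print(f"{name}: {n}")
```

Running the same function over the full stderr text would give the number of restarts, suspend requests, heartbeat timeouts, and failed uploads recorded for this task.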
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
26 Sep 2014 20:43:11 | 1335527 | 16834897 | hadam3p_anz_n9r4_2012_1_008600256_1 | 103,979 | 695,732 | 6.6911 |
19 Sep 2014 18:30:10 | 1335527 | 16834897 | hadam3p_anz_n9r4_2012_1_008600256_1 | 92,459 | 624,556 | 6.7550 |
07 Sep 2014 15:14:38 | 1335527 | 16834897 | hadam3p_anz_n9r4_2012_1_008600256_1 | 80,939 | 552,573 | 6.8270 |
31 Aug 2014 09:54:39 | 1335527 | 16834897 | hadam3p_anz_n9r4_2012_1_008600256_1 | 69,419 | 476,133 | 6.8588 |
24 Aug 2014 16:02:00 | 1335527 | 16834897 | hadam3p_anz_n9r4_2012_1_008600256_1 | 57,899 | 396,922 | 6.8554 |
21 Aug 2014 12:59:49 | 1335527 | 16834897 | hadam3p_anz_n9r4_2012_1_008600256_1 | 46,379 | 317,405 | 6.8437 |
14 Aug 2014 17:28:43 | 1335527 | 16834897 | hadam3p_anz_n9r4_2012_1_008600256_1 | 34,859 | 239,722 | 6.8769 |
14 Aug 2014 16:26:14 | 1335527 | 16834897 | hadam3p_anz_n9r4_2012_1_008600256_1 | 23,339 | 161,272 | 6.9100 |
07 Aug 2014 11:11:42 | 1335527 | 16834897 | hadam3p_anz_n9r4_2012_1_008600256_1 | 11,819 | 81,545 | 6.8995 |
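The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count. That relationship is an inference from the numbers, not something the page states. The sketch below recomputes the column from the table values and converts the task's reported CPU time (8 days 1 hour 15 min 39 sec) to seconds for comparison. The helper names `sec_per_timestep` and `duration_to_seconds` are hypothetical.

```python
# (timestep, cpu_time_sec, reported_average) copied from the trickle table above.
trickles = [
    (103_979, 695_732, 6.6911),
    (92_459, 624_556, 6.7550),
    (80_939, 552_573, 6.8270),
    (69_419, 476_133, 6.8588),
    (57_899, 396_922, 6.8554),
    (46_379, 317_405, 6.8437),
    (34_859, 239_722, 6.8769),
    (23_339, 161_272, 6.9100),
    (11_819, 81_545, 6.8995),
]


def sec_per_timestep(timestep: int, cpu_time_sec: int) -> float:
    """Assumed definition of the Average column: cumulative CPU seconds per timestep."""
    return cpu_time_sec / timestep


def duration_to_seconds(days: int, hours: int, minutes: int, seconds: int) -> int:
    """Convert a 'D days H hours M min S sec' duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Recompute the average for every trickle and compare with the reported value.
for timestep, cpu, reported in trickles:
    recomputed = sec_per_timestep(timestep, cpu)
    print(f"{timestep:>7}  reported={reported:.4f}  recomputed={recomputed:.4f}")

# CPU time from the task summary, in seconds.
total_cpu_sec = duration_to_seconds(8, 1, 15, 39)
print(total_cpu_sec)
```

Under this assumption the recomputed values agree with the reported column to four decimal places. The summary CPU time converts to 695,739 seconds, which is 7 seconds more than the CPU time in the last trickle (695,732).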
©2024 cpdn.org