Name | hadam3p_anz_n5mu_2012_1_008594918_0 |
Workunit | 8741430 |
Created | 26 Mar 2014, 18:26:34 UTC |
Sent | 29 Mar 2014, 1:37:50 UTC |
Report deadline | 11 Mar 2015, 6:57:50 UTC |
Received | 20 May 2014, 5:55:44 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 1322450 |
Run time | 4 days 16 hours 42 min 30 sec |
CPU time | 4 days 7 hours 31 min 51 sec |
Validate state | Invalid |
Credit | 1,503.36 |
Device peak FLOPS | 1.65 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10 windows_intelx86 |
Stderr | <core_client_version>7.2.42</core_client_version> <![CDATA[ <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8588, selfPID=8240, iMonCtr=1 Model crash detected, will try to restart... 19:25:09 (4128): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5348, selfPID=5488, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1084, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5420, iMonCtr=2 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5964, selfPID=4392, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6976, selfPID=6516, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3896, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=2 Leaving CPDN_Main::Monitor... CPDN Monitor - Quit request from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5676, selfPID=5676, iMonCtr=2 CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3756, selfPID=3532, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5368, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2520, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 17:19:17 (4808): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:19:19 (4808): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... 
14:36:23 (2612): No heartbeat from core client for 30 sec - exiting 14:36:24 (2612): No heartbeat from core client for 30 sec - exiting 14:36:25 (2612): No heartbeat from core client for 30 sec - exiting 14:36:26 (2612): No heartbeat from core client for 30 sec - exiting 14:36:27 (2612): No heartbeat from core client for 30 sec - exiting 14:36:28 (2612): No heartbeat from core client for 30 sec - exiting 14:36:29 (2612): No heartbeat from core client for 30 sec - exiting 14:36:30 (2612): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:39:00 (8460): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:43:46 (1032): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:44:37 (8252): No heartbeat from core client for 30 sec - exiting 14:44:38 (8252): No heartbeat from core client for 30 sec - exiting 14:44:39 (8252): No heartbeat from core client for 30 sec - exiting 14:44:41 (8252): No heartbeat from core client for 30 sec - exiting 14:44:42 (8252): No heartbeat from core client for 30 sec - exiting 14:44:43 (8252): No heartbeat from core client for 30 sec - exiting 14:44:44 (8252): No heartbeat from core client for 30 sec - exiting 14:44:45 (8252): No heartbeat from core client for 30 sec - exiting 14:44:46 (8252): No heartbeat from core client for 30 sec - exiting 14:44:47 (8252): No heartbeat from core client for 30 sec - exiting 14:44:49 (8252): No heartbeat from core client for 30 sec - exiting 14:44:50 (8252): No heartbeat from core client for 30 sec - exiting 14:44:51 (8252): No heartbeat from core client for 30 sec - exiting 14:44:52 (8252): No heartbeat from core client for 30 sec - exiting 14:44:53 (8252): No heartbeat from core client for 30 sec - exiting 14:44:54 (8252): No heartbeat from core client for 30 sec - exiting 14:44:55 (8252): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' 
from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8716, selfPID=8716, iMonCtr=2 14:47:46 (5592): No heartbeat from core client for 30 sec - exiting 14:47:47 (5592): No heartbeat from core client for 30 sec - exiting 14:47:48 (5592): No heartbeat from core client for 30 sec - exiting 14:47:49 (5592): No heartbeat from core client for 30 sec - exiting 14:47:50 (5592): No heartbeat from core client for 30 sec - exiting 14:47:51 (5592): No heartbeat from core client for 30 sec - exiting 14:47:52 (5592): No heartbeat from core client for 30 sec - exiting 14:47:53 (5592): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:47:55 (5592): No heartbeat from core client for 30 sec - exiting 14:51:34 (9120): No heartbeat from core client for 30 sec - exiting 14:51:35 (9120): No heartbeat from core client for 30 sec - exiting 14:51:36 (9120): No heartbeat from core client for 30 sec - exiting 14:51:37 (9120): No heartbeat from core client for 30 sec - exiting 14:51:38 (9120): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:53:44 (4428): No heartbeat from core client for 30 sec - exiting 14:53:45 (4428): No heartbeat from core client for 30 sec - exiting 14:53:46 (4428): No heartbeat from core client for 30 sec - exiting 14:53:47 (4428): No heartbeat from core client for 30 sec - exiting 14:53:48 (4428): No heartbeat from core client for 30 sec - exiting 14:53:49 (4428): No heartbeat from core client for 30 sec - exiting 14:53:50 (4428): No heartbeat from core client for 30 sec - exiting 14:53:51 (4428): No heartbeat from core client for 30 sec - exiting 14:53:52 (4428): No heartbeat from core client for 30 sec - exiting 14:53:53 (4428): No heartbeat from core client for 30 sec - exiting 14:53:54 (4428): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
14:59:47 (2320): No heartbeat from core client for 30 sec - exiting 14:59:48 (2320): No heartbeat from core client for 30 sec - exiting 14:59:49 (2320): No heartbeat from core client for 30 sec - exiting 14:59:50 (2320): No heartbeat from core client for 30 sec - exiting 14:59:51 (2320): No heartbeat from core client for 30 sec - exiting 14:59:52 (2320): No heartbeat from core client for 30 sec - exiting 14:59:53 (2320): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:59:54 (2320): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3580, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 16:31:33 (3700): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4424, selfPID=5700, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3592, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... CPDN Monitor - Quit request from BOINC... 14:33:35 (3936): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6876, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6276, selfPID=8792, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6084, selfPID=4792, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3268, selfPID=3268, iMonCtr=2 CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Colobatroller ::rkerD: prCPDss is not running, ening, exiting, bRetVal1, c1, checkPID=0, selfPID=383iMonCtr=2 =2 del crash detected, will try to restart... Leaving CPDN_Main::Monitor... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=14020, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=13224, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 20:58:22 (5520): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6736, selfPID=5876, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
CPDN Monitor - Quit request from BOINC... Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakm.pipe_dummy 2048 Leaving CPDN_Main::Monitor... Called boinc_finish </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hadam3p_anz_n5mu_2012_1_008594918_0_4.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_n5mu_2012_1_008594918_0_5.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_n5mu_2012_1_008594918_0_6.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_n5mu_2012_1_008594918_0_7.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_n5mu_2012_1_008594918_0_8.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_n5mu_2012_1_008594918_0_9.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_n5mu_2012_1_008594918_0_10.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_n5mu_2012_1_008594918_0_11.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_anz_n5mu_2012_1_008594918_0_12.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> </message> ]]> |
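The stderr dump above repeats a small number of message types (suspend requests, quit requests, missed heartbeats, restart attempts, and the final model crash). A minimal Python sketch for tallying them follows. The category names, the `tally` function, and the abridged `sample` string are illustrative assumptions; only the matched message fragments come from the log itself.

```python
import re
from collections import Counter

# Message fragments copied from the stderr text; the keys are hypothetical labels.
PATTERNS = {
    "suspend": re.compile(r"Suspend request from BOINC"),
    "quit": re.compile(r"Quit request from BOINC"),
    "no_heartbeat": re.compile(r"No heartbeat from core client"),
    "restart_attempt": re.compile(r"Model crash detected, will try to restart"),
    "model_crashed": re.compile(r"Model crashed:"),
}


def tally(stderr_text):
    """Count how often each known message fragment occurs in the text."""
    return Counter(
        {name: len(pattern.findall(stderr_text)) for name, pattern in PATTERNS.items()}
    )


# Abridged excerpt of the log above, used only to demonstrate the function.
sample = (
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "CPDN Monitor - Quit request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=8588, selfPID=8240, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "19:25:09 (4128): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
)

counts = tally(sample)
for name, count in counts.items():
    print(f"{name}: {count}")
```

Run against the full stderr text rather than the excerpt, the same function would give a rough picture of how often the task was interrupted before the final READHIST crash. Lines garbled by interleaved process output will not match and are simply not counted.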
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
18 May 2014 12:27:32 | 1322450 | 16412147 | hadam3p_anz_n5mu_2012_1_008594918_0 | 34,859 | 301,704 | 8.6550 |
08 Apr 2014 11:41:40 | 1322450 | 16412147 | hadam3p_anz_n5mu_2012_1_008594918_0 | 23,339 | 206,200 | 8.8350 |
07 Apr 2014 05:50:13 | 1322450 | 16412147 | hadam3p_anz_n5mu_2012_1_008594918_0 | 11,819 | 106,887 | 9.0437 |
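The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep reached. That relationship is inferred from the numbers in the table, not stated on the page. A short Python sketch that recomputes it under that assumption (the function and variable names are illustrative):

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average)
trickles = [
    (34859, 301704, 8.6550),
    (23339, 206200, 8.8350),
    (11819, 106887, 9.0437),
]


def seconds_per_timestep(timestep, cpu_seconds):
    """Assumed definition: cumulative CPU seconds per completed timestep."""
    return cpu_seconds / timestep


# Recompute each average, rounded to the four decimals shown on the page.
computed = [round(seconds_per_timestep(ts, cpu), 4) for ts, cpu, _ in trickles]

for (ts, cpu, reported), value in zip(trickles, computed):
    print(f"timestep={ts:>6}  cpu={cpu:>7} s  computed={value:.4f}  reported={reported:.4f}")
```

Under this reading, the per-timestep cost fell slightly between the first and last trickle, from about 9.04 to about 8.66 seconds.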