Task 16757596

Name	hadam3p_eu_o9ef_2013_1_008838768_0
Workunit	8984697
Created	8 Jul 2014, 14:37:35 UTC
Sent	8 Jul 2014, 14:39:36 UTC
Report deadline	20 Jun 2015, 19:59:36 UTC
Received	14 Aug 2014, 14:06:06 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	0 (0x00000000)
Computer ID	1327910
Run time	2 days 13 hours 32 min 35 sec
CPU time	15 hours 44 min 23 sec
Validate state	Invalid
Credit	200.76
Device peak FLOPS	1.62 GFLOPS
Application version	UK Met Office HadAM3P-HadRM3P Europe v6.09 windows_intelx86
Stderr	<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4844, iMonCtr=2
19:35:53 (836): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:35:54 (836): No heartbeat from core client for 30 sec - exiting
19:35:55 (836): No heartbeat from core client for 30 sec - exiting
19:35:56 (836): No heartbeat from core client for 30 sec - exiting
20:32:52 (3012): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:32:53 (3012): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3508, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=772, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3864, selfPID=1200, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3160, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=604, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=704, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2424, selfPID=2120, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3896, selfPID=1880, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2700, selfPID=2700, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3392, iMonCtr=2
07:25:22 (4512): No heartbeat from core client for 30 sec - exiting
07:25:23 (4512): No heartbeat from core client for 30 sec - exiting
07:25:24 (4512): No heartbeat from core client for 30 sec - exiting
07:25:25 (4512): No heartbeat from core client for 30 sec - exiting
07:25:26 (4512): No heartbeat from core client for 30 sec - exiting
07:25:27 (4512): No heartbeat from core client for 30 sec - exiting
07:25:28 (4512): No heartbeat from core client for 30 sec - exiting
07:25:29 (4512): No heartbeat from core client for 30 sec - exiting
07:25:30 (4512): No heartbeat from core client for 30 sec - exiting
07:25:31 (4512): No heartbeat from core client for 30 sec - exiting
07:25:32 (4512): No heartbeat from core client for 30 sec - exiting
07:27:06 (4512): No heartbeat from core client for 30 sec - exiting
07:27:07 (4512): No heartbeat from core client for 30 sec - exiting
07:27:08 (4512): No heartbeat from core client for 30 sec - exiting
07:27:09 (4512): No heartbeat from core client for 30 sec - exiting
07:27:10 (4512): No heartbeat from core client for 30 sec - exiting
07:27:11 (4512): No heartbeat from core client for 30 sec - exiting
07:27:12 (4512): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:30:31 (2892): No heartbeat from core client for 30 sec - exiting
07:30:34 (2892): No heartbeat from core client for 30 sec - exiting
07:30:35 (2892): No heartbeat from core client for 30 sec - exiting
07:30:36 (2892): No heartbeat from core client for 30 sec - exiting
07:30:37 (2892): No heartbeat from core client for 30 sec - exiting
07:30:38 (2892): No heartbeat from core client for 30 sec - exiting
07:30:39 (2892): No heartbeat from core client for 30 sec - exiting
07:30:40 (2892): No heartbeat from core client for 30 sec - exiting
07:30:41 (2892): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3116, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2884, selfPID=3452, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1800, selfPID=2932, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3776, selfPID=3360, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3268, selfPID=3400, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4500, selfPID=4500, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4628, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2604, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2180, selfPID=3204, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3132, selfPID=2708, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3312, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2736, selfPID=2080, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1260, selfPID=2800, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3596, selfPID=3240, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
upload failure:
<file_xfer_error> <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_2.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_3.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_4.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_5.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_6.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_7.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_8.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_9.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_10.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_11.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_o9ef_2013_1_008838768_0_12.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
</message>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
03 Aug 2014 18:21:51	1327910	16757596	hadam3p_eu_o9ef_2013_1_008838768_0	11,638	41,539	3.5693
01 Aug 2014 16:16:04	1327910	16757596	hadam3p_eu_o9ef_2013_1_008838768_0	11,630	40,959	3.5218
01 Aug 2014 04:46:01	1327910	16757596	hadam3p_eu_o9ef_2013_1_008838768_0	11,624	40,408	3.4763
31 Jul 2014 16:28:52	1327910	16757596	hadam3p_eu_o9ef_2013_1_008838768_0	11,616	39,881	3.4333
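
The Average (sec/TS) column above appears to be the cumulative CPU time divided by the timestep count. The page does not state this, so the sketch below treats it as an assumption and only checks it against the four rows shown. The names `trickles` and `sec_per_timestep` are illustrative and not part of any BOINC or CPDN tool.

```python
# Rows copied from the "Latest Trickles Received" table:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS).
trickles = [
    (11638, 41539, 3.5693),
    (11630, 40959, 3.5218),
    (11624, 40408, 3.4763),
    (11616, 39881, 3.4333),
]


def sec_per_timestep(timestep, cpu_seconds):
    """Assumed definition of the Average column: CPU seconds per timestep."""
    return round(cpu_seconds / timestep, 4)


# Recompute the average for every row and compare it with the reported value.
averages = [sec_per_timestep(ts, cpu) for ts, cpu, _ in trickles]
matches = [abs(avg - row[2]) < 1e-9 for avg, row in zip(averages, trickles)]

for (ts, cpu, reported), avg in zip(trickles, averages):
    print(f"timestep {ts}: computed {avg:.4f}, reported {reported:.4f}")
```

Under that assumption, all four recomputed values agree with the reported column to four decimal places.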