Task 15707821

Name	hadam3p_eu_qfvv_2001_1_008346376_0
Workunit	8497237
Created	5 Apr 2013, 14:16:51 UTC
Sent	5 Apr 2013, 19:02:31 UTC
Report deadline	19 Mar 2014, 0:22:31 UTC
Received	20 Aug 2013, 13:38:05 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	0 (0x00000000)
Computer ID	1253472
Run time	4 days 3 hours 46 min
CPU time	2 days 15 hours 20 min 21 sec
Validate state	Invalid
Credit	995.30
Device peak FLOPS	1.85 GFLOPS
Application version	UK Met Office HadAM3P-HadRM3P Europe v6.09 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version>
<![CDATA[
<stderr_txt>
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5052, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6904, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5188, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4132, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2444, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1772, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3100, selfPID=4808, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6356, selfPID=1028, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4264, selfPID=2620, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4020, selfPID=2928, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5776, selfPID=5776, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
09:03:42 (2392): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3316, selfPID=5200, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5888, selfPID=1932, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2828, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5540, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6700, selfPID=6712, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5100, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3716, selfPID=5344, iMonCtr=1
Model crash detected, will try to restart...
13:58:18 (1932): No heartbeat from core client for 30 sec - exiting
13:58:19 (1932): No heartbeat from core client for 30 sec - exiting
13:58:20 (1932): No heartbeat from core client for 30 sec - exiting
13:58:21 (1932): No heartbeat from core client for 30 sec - exiting
13:58:22 (1932): No heartbeat from core client for 30 sec - exiting
13:58:23 (1932): No heartbeat from core client for 30 sec - exiting
13:58:24 (1932): No heartbeat from core client for 30 sec - exiting
13:58:25 (1932): No heartbeat from core client for 30 sec - exiting
13:58:26 (1932): No heartbeat from core client for 30 sec - exiting
13:58:27 (1932): No heartbeat from core client for 30 sec - exiting
13:58:28 (1932): No heartbeat from core client for 30 sec - exiting
13:58:29 (1932): No heartbeat from core client for 30 sec - exiting
13:58:30 (1932): No heartbeat from core client for 30 sec - exiting
13:58:32 (1932): No heartbeat from core client for 30 sec - exiting
13:58:33 (1932): No heartbeat from core client for 30 sec - exiting
13:58:34 (1932): No heartbeat from core client for 30 sec - exiting
13:58:35 (1932): No heartbeat from core client for 30 sec - exiting
13:58:36 (1932): No heartbeat from core client for 30 sec - exiting
13:58:37 (1932): No heartbeat from core client for 30 sec - exiting
13:58:38 (1932): No heartbeat from core client for 30 sec - exiting
13:58:39 (1932): No heartbeat from core client for 30 sec - exiting
13:58:40 (1932): No heartbeat from core client for 30 sec - exiting
13:58:41 (1932): No heartbeat from core client for 30 sec - exiting
13:58:42 (1932): No heartbeat from core client for 30 sec - exiting
13:58:43 (1932): No heartbeat from core client for 30 sec - exiting
13:58:44 (1932): No heartbeat from core client for 30 sec - exiting
13:58:45 (1932): No heartbeat from core client for 30 sec - exiting
13:58:46 (1932): No heartbeat from core client for 30 sec - exiting
13:58:47 (1932): No heartbeat from core client for 30 sec - exiting
13:58:48 (1932): No heartbeat from core client for 30 sec - exiting
13:58:49 (1932): No heartbeat from core client for 30 sec - exiting
13:58:50 (1932): No heartbeat from core client for 30 sec - exiting
13:58:51 (1932): No heartbeat from core client for 30 sec - exiting
13:58:52 (1932): No heartbeat from core client for 30 sec - exiting
13:58:53 (1932): No heartbeat from core client for 30 sec - exiting
13:58:54 (1932): No heartbeat from core client for 30 sec - exiting
13:58:55 (1932): No heartbeat from core client for 30 sec - exiting
13:58:56 (1932): No heartbeat from core client for 30 sec - exiting
13:58:57 (1932): No heartbeat from core client for 30 sec - exiting
13:58:58 (1932): No heartbeat from core client for 30 sec - exiting
13:58:59 (1932): No heartbeat from core client for 30 sec - exiting
13:59:00 (1932): No heartbeat from core client for 30 sec - exiting
13:59:01 (1932): No heartbeat from core client for 30 sec - exiting
13:59:02 (1932): No heartbeat from core client for 30 sec - exiting
13:59:03 (1932): No heartbeat from core client for 30 sec - exiting
13:59:04 (1932): No heartbeat from core client for 30 sec - exiting
13:59:05 (1932): No heartbeat from core client for 30 sec - exiting
13:59:06 (1932): No heartbeat from core client for 30 sec - exiting
13:59:07 (1932): No heartbeat from core client for 30 sec - exiting
13:59:08 (1932): No heartbeat from core client for 30 sec - exiting
13:59:09 (1932): No heartbeat from core client for 30 sec - exiting
13:59:10 (1932): No heartbeat from core client for 30 sec - exiting
13:59:11 (1932): No heartbeat from core client for 30 sec - exiting
13:59:12 (1932): No heartbeat from core client for 30 sec - exiting
13:59:13 (1932): No heartbeat from core client for 30 sec - exiting
13:59:14 (1932): No heartbeat from core client for 30 sec - exiting
13:59:15 (1932): No heartbeat from core client for 30 sec - exiting
13:59:16 (1932): No heartbeat from core client for 30 sec - exiting
13:59:17 (1932): No heartbeat from core client for 30 sec - exiting
13:59:18 (1932): No heartbeat from core client for 30 sec - exiting
13:59:19 (1932): No heartbeat from core client for 30 sec - exiting
13:59:20 (1932): No heartbeat from core client for 30 sec - exiting
13:59:21 (1932): No heartbeat from core client for 30 sec - exiting
13:59:22 (1932): No heartbeat from core client for 30 sec - exiting
13:59:23 (1932): No heartbeat from core client for 30 sec - exiting
13:59:24 (1932): No heartbeat from core client for 30 sec - exiting
13:59:25 (1932): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
16:03:28 (2404): No heartbeat from core client for 30 sec - exiting
16:03:29 (2404): No heartbeat from core client for 30 sec - exiting
16:03:30 (2404): No heartbeat from core client for 30 sec - exiting
16:03:32 (2404): No heartbeat from core client for 30 sec - exiting
16:03:33 (2404): No heartbeat from core client for 30 sec - exiting
16:03:34 (2404): No heartbeat from core client for 30 sec - exiting
16:03:35 (2404): No heartbeat from core client for 30 sec - exiting
16:03:36 (2404): No heartbeat from core client for 30 sec - exiting
16:03:37 (2404): No heartbeat from core client for 30 sec - exiting
16:03:38 (2404): No heartbeat from core client for 30 sec - exiting
16:03:40 (2404): No heartbeat from core client for 30 sec - exiting
16:03:41 (2404): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6308, selfPID=728, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
10:37:54 (4280): No heartbeat from core client for 30 sec - exiting
10:37:55 (4280): No heartbeat from core client for 30 sec - exiting
10:37:57 (4280): No heartbeat from core client for 30 sec - exiting
10:37:58 (4280): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1020, iMonCtr= 2 del crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4880, selfPID=1752, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
upload failure:
<file_xfer_error> <file_name>hadam3p_eu_qfvv_2001_1_008346376_0_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_qfvv_2001_1_008346376_0_7.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_qfvv_2001_1_008346376_0_8.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_qfvv_2001_1_008346376_0_9.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_qfvv_2001_1_008346376_0_10.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_qfvv_2001_1_008346376_0_11.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
<file_xfer_error> <file_name>hadam3p_eu_qfvv_2001_1_008346376_0_12.zip</file_name> <error_code>-161</error_code> </file_xfer_error>
</message>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
17 Aug 2013 11:25:06	1253472	15707821	hadam3p_eu_qfvv_2001_1_008346376_0	57,696	207,036	3.5884
26 Jul 2013 15:56:56	1253472	15707821	hadam3p_eu_qfvv_2001_1_008346376_0	46,176	165,580	3.5858
23 Jul 2013 19:47:20	1253472	15707821	hadam3p_eu_qfvv_2001_1_008346376_0	34,656	124,292	3.5864
02 Jul 2013 11:48:52	1253472	15707821	hadam3p_eu_qfvv_2001_1_008346376_0	23,136	83,825	3.6231
10 Apr 2013 15:15:34	1253472	15707821	hadam3p_eu_qfvv_2001_1_008346376_0	11,616	42,514	3.6600
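
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. The Python sketch below recomputes it from the five rows above to check that reading. It also lists how many timesteps separate consecutive trickles. The helper names (sec_per_timestep, check_rows, timestep_increments) are hypothetical and for illustration only. They are not part of BOINC or the CPDN software.

```python
# Rows copied from the "Latest Trickles Received" table:
# (time sent UTC, timestep, CPU time in seconds, listed average in sec/TS)
TRICKLES = [
    ("17 Aug 2013 11:25:06", 57_696, 207_036, 3.5884),
    ("26 Jul 2013 15:56:56", 46_176, 165_580, 3.5858),
    ("23 Jul 2013 19:47:20", 34_656, 124_292, 3.5864),
    ("02 Jul 2013 11:48:52", 23_136, 83_825, 3.6231),
    ("10 Apr 2013 15:15:34", 11_616, 42_514, 3.6600),
]


def sec_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Hypothetical helper: cumulative CPU seconds per cumulative timestep."""
    return cpu_seconds / timestep


def check_rows(rows):
    """Return (time_sent, computed_average, listed_average) for each row.

    The computed value is rounded to 4 decimal places to match the table.
    """
    return [
        (sent, round(sec_per_timestep(cpu, ts), 4), listed)
        for sent, ts, cpu, listed in rows
    ]


def timestep_increments(rows):
    """Timesteps completed between consecutive trickles, oldest first."""
    steps = sorted(ts for _, ts, _, _ in rows)
    return [later - earlier for earlier, later in zip(steps, steps[1:])]


if __name__ == "__main__":
    for sent, computed, listed in check_rows(TRICKLES):
        print(f"{sent}: computed {computed:.4f}, listed {listed:.4f}")
    print("timesteps between trickles:", timestep_increments(TRICKLES))
```

Run as a script, the sketch prints the same value in the computed and listed columns for every row. For this table, the timestep counts advance by 11,520 between each pair of consecutive trickles.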