Task 16786536

Name	hadam3p_eu_k54j_2013_1_008866628_0
Workunit	9012557
Created	9 Jul 2014, 14:20:43 UTC
Sent	14 Jul 2014, 22:36:30 UTC
Report deadline	27 Jun 2015, 3:56:30 UTC
Received	21 Sep 2014, 19:27:29 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	0 (0x00000000)
Computer ID	1310670
Run time	5 days 0 hours 7 min 28 sec
CPU time	4 days 3 hours 23 min 36 sec
Validate state	Invalid
Credit	1,790.21
Device peak FLOPS	2.16 GFLOPS
Application version	UK Met Office HadAM3P-HadRM3P Europe v6.09 windows_intelx86
Stderr	<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9592, selfPID=4116, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=10172, selfPID=9048, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8028, selfPID=10704, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3884, selfPID=4720, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=11144, selfPID=4432, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5172, selfPID=6996, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8952, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9068, selfPID=8136, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2876, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4176, selfPID=7664, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
16:52:00 (6056): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:52:04 (6056): No heartbeat from core client for 30 sec - exiting
16:52:05 (6056): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10364, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=13308, selfPID=3884, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7880, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
23:44:53 (5456): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
17:22:38 (11360): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6132, selfPID=6132, iMonCtr=2
17:22:40 (11360): No heartbeat from core client for 30 sec - exiting
20:23:11 (9264): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5264, iMonCtr=2
Model crash detected, will try to restart...
21:17:33 (11288): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=11700, selfPID=1104, iMonCtr=1
Model crash detected, will try to restart...
19:58:46 (11936): No heartbeat from core client for 30 sec - exiting
19:58:47 (11936): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:58:48 (11936): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4704, selfPID=7968, iMonCtr=1
Model crash detected, will try to restart...
02:53:53 (1928): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=11392, iMonCtr=2
Model crash detected, will try to restart...
06:45:08 (7088): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:45:09 (7088): No heartbeat from core client for 30 sec - exiting
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=12080, selfPID=12080, iMonCtr=2
06:57:13 (1660): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:03:17 (1736): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6272, iMonCtr=2
17:01:28 (5736): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8396, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4620, iMonCtr=2
Model crash detected, will try to restart...
05:26:11 (8176): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:43:45 (5660): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8044, selfPID=5856, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7612, selfPID=7352, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6168, selfPID=5132, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9288, selfPID=7324, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7964, selfPID=2872, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5968, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7500, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4216, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7808, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4936, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3444, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6448, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1548, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam3p_eu_k54j_2013_1_008866628_0_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_k54j_2013_1_008866628_0_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_eu_k54j_2013_1_008866628_0_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]>
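The stderr log interleaves a handful of recurring messages: suspend requests from BOINC, watchdog exits reporting that the CPDN process is not running, restart attempts, and 30-second heartbeat timeouts, followed by three upload failures with error code -161. A minimal Python sketch for tallying them, assuming the message wording is exactly as it appears in this log; the function name `summarize_stderr` and the category labels are hypothetical and not part of BOINC or CPDN. The sample is a short excerpt copied from the log above.

```python
import re

# Category labels are our own; the patterns match the exact wording seen in this log.
PATTERNS = {
    "suspend": re.compile(r"Suspended CPDN Monitor - Suspend request from BOINC\.\.\."),
    "crash_restart": re.compile(r"Model crash detected, will try to restart\.\.\."),
    "heartbeat_timeout": re.compile(
        r"\d{2}:\d{2}:\d{2} \(\d+\): No heartbeat from core client for 30 sec - exiting"
    ),
    # Controller, Global Worker and Regional Worker all emit the same watchdog message.
    "process_not_running": re.compile(
        r"(?:Controller|Global Worker|Regional Worker):: CPDN process is not running"
    ),
}

# Each <file_xfer_error> block carries a file name and a numeric BOINC error code.
UPLOAD_ERROR = re.compile(
    r"<file_name>([^<]+)</file_name>\s*<error_code>(-?\d+)</error_code>"
)


def summarize_stderr(text: str) -> tuple[dict[str, int], list[tuple[str, int]]]:
    """Count each message category and list (file name, error code) upload failures."""
    counts = {name: len(p.findall(text)) for name, p in PATTERNS.items()}
    failures = [(name, int(code)) for name, code in UPLOAD_ERROR.findall(text)]
    return counts, failures


SAMPLE = """Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=9592, selfPID=4116, iMonCtr=1
Model crash detected, will try to restart...
16:52:00 (6056): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Leaving CPDN_Main::Monitor...
Called boinc_finish
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam3p_eu_k54j_2013_1_008866628_0_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>"""

if __name__ == "__main__":
    counts, failures = summarize_stderr(SAMPLE)
    print(counts)
    print(failures)
```

Pasting the full `<stderr_txt>` and `<message>` text into `summarize_stderr` gives per-category totals for the whole run; the sketch reports error codes as numbers and does not try to interpret them.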

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (CPU sec/timestep)
16 Sep 2014 00:31:16	1310670	16786536	hadam3p_eu_k54j_2013_1_008866628_0	103,776	329,199	3.1722
10 Sep 2014 01:02:22	1310670	16786536	hadam3p_eu_k54j_2013_1_008866628_0	92,256	292,944	3.1753
30 Aug 2014 10:37:08	1310670	16786536	hadam3p_eu_k54j_2013_1_008866628_0	80,773	257,136	3.1834
30 Aug 2014 01:34:53	1310670	16786536	hadam3p_eu_k54j_2013_1_008866628_0	80,736	256,401	3.1758
24 Aug 2014 04:10:42	1310670	16786536	hadam3p_eu_k54j_2013_1_008866628_0	69,216	220,201	3.1814
15 Aug 2014 10:47:47	1310670	16786536	hadam3p_eu_k54j_2013_1_008866628_0	57,696	182,904	3.1701
15 Aug 2014 10:47:47	1310670	16786536	hadam3p_eu_k54j_2013_1_008866628_0	46,176	145,469	3.1503
15 Aug 2014 10:47:47	1310670	16786536	hadam3p_eu_k54j_2013_1_008866628_0	34,656	108,191	3.1219
03 Aug 2014 15:39:08	1310670	16786536	hadam3p_eu_k54j_2013_1_008866628_0	23,159	73,162	3.1591
02 Aug 2014 15:23:01	1310670	16786536	hadam3p_eu_k54j_2013_1_008866628_0	23,136	72,605	3.1382
29 Jul 2014 17:36:28	1310670	16786536	hadam3p_eu_k54j_2013_1_008866628_0	11,616	36,236	3.1195
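The Average column is consistent with cumulative CPU time divided by the cumulative timestep count; for the latest trickle, 329,199 / 103,776 ≈ 3.1722. A small Python sketch that rechecks every row under that assumption; the helper name `sec_per_timestep` is hypothetical, and the rows are copied from the table above.

```python
# (timestep, cumulative CPU time in seconds, reported average in sec/timestep),
# copied row by row from the trickle table, newest first.
TRICKLES = [
    (103_776, 329_199, 3.1722),
    (92_256, 292_944, 3.1753),
    (80_773, 257_136, 3.1834),
    (80_736, 256_401, 3.1758),
    (69_216, 220_201, 3.1814),
    (57_696, 182_904, 3.1701),
    (46_176, 145_469, 3.1503),
    (34_656, 108_191, 3.1219),
    (23_159, 73_162, 3.1591),
    (23_136, 72_605, 3.1382),
    (11_616, 36_236, 3.1195),
]


def sec_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimal places."""
    return round(cpu_seconds / timestep, 4)


if __name__ == "__main__":
    for ts, cpu, reported in TRICKLES:
        computed = sec_per_timestep(ts, cpu)
        print(f"{ts:>7} {cpu:>7} reported={reported:.4f} computed={computed:.4f}")
```

Every computed value agrees with the reported one to four decimal places, which supports reading the column as a running average over the whole run rather than a rate for the interval since the previous trickle.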