Name | hadam3p_eu_6l7j_2005_1_007494756_0 |
Workunit | 7692231 |
Created | 14 Oct 2011, 19:14:25 UTC |
Sent | 14 Oct 2011, 19:21:54 UTC |
Report deadline | 26 Sep 2012, 0:41:54 UTC |
Received | 28 Oct 2011, 9:27:18 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 194 (0x000000C2) EXIT_ABORTED_BY_CLIENT |
Computer ID | 1041608 |
Run time | 3 days 7 hours 50 min 53 sec |
CPU time | 3 days 7 hours 50 min 53 sec |
Validate state | Invalid |
Credit | 1,194.02 |
Device peak FLOPS | 1.85 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Europe v6.09 windows_intelx86 |
Stderr | <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> Got ack for job that's till active </message> <stderr_txt>
17:41:27 (4624): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:18:57 (400): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:18:59 (5604): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:54:31 (6104): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5964, selfPID=5964, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5732, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
20:29:58 (5684): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:17:18 (5512): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:17:19 (5512): No heartbeat from core client for 30 sec - exiting
19:08:40 (5528): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
18:08:48 (5008): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:08:51 (5008): No heartbeat from core client for 30 sec - exiting
18:08:52 (5008): No heartbeat from core client for 30 sec - exiting
18:08:54 (5008): No heartbeat from core client for 30 sec - exiting
18:08:55 (5008): No heartbeat from core client for 30 sec - exiting
18:08:56 (5008): No heartbeat from core client for 30 sec - exiting
19:00:46 (7716): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:00:47 (7716): No heartbeat from core client for 30 sec - exiting
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6012, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1836, iMonCtr=2
Model crash detected, will try to restart...
20:08:37 (5712): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:08:38 (5712): No heartbeat from core client for 30 sec - exiting
20:08:39 (5712): No heartbeat from core client for 30 sec - exiting
20:15:33 (3400): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:15:35 (3400): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3596, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3812, iMonCtr=2
09:22:19 (5820): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6128, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3304, selfPID=2476, iMonCtr=1
Model crash detected, will try to restart...
09:55:47 (5564): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:55:48 (5564): No heartbeat from core client for 30 sec - exiting
09:55:49 (5564): No heartbeat from core client for 30 sec - exiting
13:51:21 (5868): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:51:22 (5868): No heartbeat from core client for 30 sec - exiting
13:51:23 (5868): No heartbeat from core client for 30 sec - exiting
13:51:24 (5868): No heartbeat from core client for 30 sec - exiting
14:06:24 (3112): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:22:05 (4092): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:22:06 (4092): No heartbeat from core client for 30 sec - exiting
14:59:36 (2276): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5888, selfPID=5888, iMonCtr=2
15:06:48 (3860): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:09:24 (4888): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3220, selfPID=3220, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5620, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4584, selfPID=3280, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4444, selfPID=5028, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
18:11:01 (3480): No heartbeat from core client for 30 sec - exiting
18:11:03 (3480): No heartbeat from core client for 30 sec - exiting
18:11:04 (3480): No heartbeat from core client for 30 sec - exiting
18:11:05 (3480): No heartbeat from core client for 30 sec - exiting
18:11:06 (3480): No heartbeat from core client for 30 sec - exiting
18:11:07 (3480): No heartbeat from core client for 30 sec - exiting
18:11:08 (3480): No heartbeat from core client for 30 sec - exiting
18:11:09 (3480): No heartbeat from core client for 30 sec - exiting
18:11:10 (3480): No heartbeat from core client for 30 sec - exiting
18:11:11 (3480): No heartbeat from core client for 30 sec - exiting
18:11:12 (3480): No heartbeat from core client for 30 sec - exiting
18:11:13 (3480): No heartbeat from core client for 30 sec - exiting
18:11:14 (3480): No heartbeat from core client for 30 sec - exiting
18:11:15 (3480): No heartbeat from core client for 30 sec - exiting
18:11:16 (3480): No heartbeat from core client for 30 sec - exiting
18:11:17 (3480): No heartbeat from core client for 30 sec - exiting
18:11:18 (3480): No heartbeat from core client for 30 sec - exiting
18:11:19 (3480): No heartbeat from core client for 30 sec - exiting
18:11:20 (3480): No heartbeat from core client for 30 sec - exiting
18:11:21 (3480): No heartbeat from core client for 30 sec - exiting
18:11:22 (3480): No heartbeat from core client for 30 sec - exiting
18:11:23 (3480): No heartbeat from core client for 30 sec - exiting
18:11:24 (3480): No heartbeat from core client for 30 sec - exiting
18:11:25 (3480): No heartbeat from core client for 30 sec - exiting
18:11:26 (3480): No heartbeat from core client for 30 sec - exiting
18:11:27 (3480): No heartbeat from core client for 30 sec - exiting
18:11:28 (3480): No heartbeat from core client for 30 sec - exiting
18:11:29 (3480): No heartbeat from core client for 30 sec - exiting
18:11:30 (3480): No heartbeat from core client for 30 sec - exiting
18:11:31 (3480): No heartbeat from core client for 30 sec - exiting
18:11:32 (3480): No heartbeat from core client for 30 sec - exiting
18:11:33 (3480): No heartbeat from core client for 30 sec - exiting
18:11:34 (3480): No heartbeat from core client for 30 sec - exiting
18:11:35 (3480): No heartbeat from core client for 30 sec - exiting
18:11:36 (3480): No heartbeat from core client for 30 sec - exiting
18:11:37 (3480): No heartbeat from core client for 30 sec - exiting
18:11:38 (3480): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5556, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3420, iMonCtr=2
Model crash detected, will try to restart...
BUFFOUT: Write Failed: BLeaving CPDN_Main::Monitor...
zip error: Output file write failure (write error on zip file)
Called boinc_finish
</stderr_txt> ]]> |
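The stderr output above is dominated by repeated "No heartbeat from core client" entries, so a quick tally can help when reading it. The following is a minimal sketch, not part of the CPDN or BOINC tooling: the function name `heartbeat_losses_by_pid` and the `SAMPLE` string are made up for illustration, and it assumes the number in parentheses after each timestamp is the ID of the process that wrote the entry.

```python
import re
from collections import Counter

# Matches entries such as:
#   18:08:48 (5008): No heartbeat from core client for 30 sec - exiting
# \s+ is used between tokens so that a wrapped line still matches.
HEARTBEAT = re.compile(
    r"(\d{2}:\d{2}:\d{2})\s+\((\d+)\):\s+No\s+heartbeat\s+from\s+core\s+client"
)

def heartbeat_losses_by_pid(stderr_text):
    """Count 'No heartbeat' entries per parenthesised ID (assumed to be a PID)."""
    return Counter(int(pid) for _time, pid in HEARTBEAT.findall(stderr_text))

# A short excerpt copied from the log above, used only as sample input.
SAMPLE = (
    "18:08:48 (5008): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "18:08:51 (5008): No heartbeat from core client for 30 sec - exiting "
    "19:00:46 (7716): No heartbeat from core client for 30 sec - exiting"
)

if __name__ == "__main__":
    for pid, count in heartbeat_losses_by_pid(SAMPLE).most_common():
        print(pid, count)
```

The "CPDN Monitor - No 'heartbeat' from BOINC..." lines are deliberately not counted, because they carry no timestamp or ID.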
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
31 Oct 2011 17:14:45 | 1041608 | 13487307 | hadam3p_eu_6l7j_2005_1_007494756_0 | 69,216 | 265,830 | 3.8406 |
31 Oct 2011 17:14:45 | 1041608 | 13487307 | hadam3p_eu_6l7j_2005_1_007494756_0 | 57,707 | 221,688 | 3.8416 |
31 Oct 2011 17:14:45 | 1041608 | 13487307 | hadam3p_eu_6l7j_2005_1_007494756_0 | 57,699 | 221,058 | 3.8312 |
31 Oct 2011 17:14:45 | 1041608 | 13487307 | hadam3p_eu_6l7j_2005_1_007494756_0 | 57,696 | 220,442 | 3.8208 |
31 Oct 2011 17:14:45 | 1041608 | 13487307 | hadam3p_eu_6l7j_2005_1_007494756_0 | 46,176 | 175,809 | 3.8074 |
31 Oct 2011 17:14:45 | 1041608 | 13487307 | hadam3p_eu_6l7j_2005_1_007494756_0 | 34,656 | 131,484 | 3.7940 |
31 Oct 2011 17:14:45 | 1041608 | 13487307 | hadam3p_eu_6l7j_2005_1_007494756_0 | 23,136 | 87,723 | 3.7916 |
18 Oct 2011 18:24:38 | 1041608 | 13487307 | hadam3p_eu_6l7j_2005_1_007494756_0 | 11,616 | 43,831 | 3.7733 |
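The Average (sec/TS) column appears to be the CPU Time (sec) value divided by the Timestep value, rounded to four decimal places. The sketch below checks that assumption against the rows of the table; the names `TRICKLES`, `sec_per_timestep` and `mismatches` are illustrative only and do not come from the CPDN site.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average)
TRICKLES = [
    (69216, 265830, 3.8406),
    (57707, 221688, 3.8416),
    (57699, 221058, 3.8312),
    (57696, 220442, 3.8208),
    (46176, 175809, 3.8074),
    (34656, 131484, 3.7940),
    (23136, 87723, 3.7916),
    (11616, 43831, 3.7733),
]

def sec_per_timestep(cpu_time_sec, timestep):
    """Average CPU seconds spent per model timestep."""
    return cpu_time_sec / timestep

def mismatches(rows, tolerance=1e-4):
    """Return rows whose reported average differs from the recomputed one.

    The tolerance allows for the four-decimal rounding used in the table.
    """
    bad = []
    for timestep, cpu_time_sec, reported in rows:
        recomputed = sec_per_timestep(cpu_time_sec, timestep)
        if abs(recomputed - reported) > tolerance:
            bad.append((timestep, cpu_time_sec, reported, recomputed))
    return bad

if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in TRICKLES:
        print(timestep, f"{sec_per_timestep(cpu_time_sec, timestep):.4f}", reported)
    print("mismatching rows:", mismatches(TRICKLES))
```

For the first row, 265,830 / 69,216 is about 3.8406, which agrees with the reported value.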
©2024 cpdn.org