Name | hadam3p_eu_d08c_2003_1_007966510_1 |
Workunit | 8121624 |
Created | 4 Jun 2012, 5:58:33 UTC |
Sent | 4 Jun 2012, 6:07:19 UTC |
Report deadline | 17 May 2013, 11:27:19 UTC |
Received | 8 Jul 2012, 11:41:43 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS |
Computer ID | 1202128 |
Run time | 5 days 1 hours 33 min 31 sec |
CPU time | 4 days 5 hours 40 min 36 sec |
Validate state | Invalid |
Credit | 1,591.48 |
Device peak FLOPS | 2.69 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Europe v6.09 windows_intelx86 |
Stderr | <core_client_version>7.0.25</core_client_version> <![CDATA[ <message> too many exit(0)s </message> <stderr_txt> 14:35:34 (1004): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:35:35 (1004): No heartbeat from core client for 30 sec - exiting Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7600, selfPID=4824, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6820, selfPID=4684, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7380, selfPID=5412, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7820, selfPID=6520, iMonCtr=1 Model crash detected, will try to restart... GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3752, selfPID=7300, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2152, selfPID=5648, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7784, selfPID=5108, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6156, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 19:09:27 (5748): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7588, selfPID=7172, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7924, selfPID=6696, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5048, selfPID=6368, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7896, selfPID=6408, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7656, selfPID=5944, iMonCtr=1 Model crash detected, will try to restart... 14:23:10 (6004): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6860, selfPID=6860, iMonCtr=2 14:23:11 (6004): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2860, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1532, selfPID=5532, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 06:01:39 (7096): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7016, selfPID=4628, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6660, selfPID=4496, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
04:53:25 (5700): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4788, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3752, selfPID=7708, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7948, selfPID=7028, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6288, selfPID=5704, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5808, selfPID=5756, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4816, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6172, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7148, selfPID=6240, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 05:55:18 (5708): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6892, iMonCtr=2 05:53:22 (6656): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7500, selfPID=5864, iMonCtr=1 Model crash detected, will try to restart... 18:08:35 (4744): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6860, iMonCtr=2 06:18:50 (5868): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... </stderr_txt> ]]> |
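
The exit status row gives the same value twice: as the signed integer -226 and as the hex word 0xFFFFFF1E. The two are related by 32-bit two's-complement encoding. The following is a minimal sketch of that conversion; the helper names `exit_status_hex` and `exit_status_signed` are hypothetical and are not part of BOINC or the CPDN application.

```python
MASK_32 = 0xFFFFFFFF  # low 32 bits


def exit_status_hex(status: int) -> str:
    """Render a signed 32-bit exit status as an unsigned hex word."""
    return f"0x{status & MASK_32:08X}"


def exit_status_signed(hex_text: str) -> int:
    """Interpret a 32-bit hex word as a signed (two's-complement) integer."""
    value = int(hex_text, 16) & MASK_32
    # Values with the top bit set represent negative numbers.
    return value - (1 << 32) if value >= (1 << 31) else value


if __name__ == "__main__":
    print(exit_status_hex(-226))             # hex form of the reported status
    print(exit_status_signed("0xFFFFFF1E"))  # signed form of the reported hex
```

Applied to the values in the table, the two helpers convert -226 to 0xFFFFFF1E and back again.
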
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
22 Jun 2012 04:20:23 | 1202128 | 14770577 | hadam3p_eu_d08c_2003_1_007966510_1 | 92,256 | 356,403 | 3.8632 |
22 Jun 2012 04:20:23 | 1202128 | 14770577 | hadam3p_eu_d08c_2003_1_007966510_1 | 80,736 | 314,054 | 3.8899 |
22 Jun 2012 04:20:23 | 1202128 | 14770577 | hadam3p_eu_d08c_2003_1_007966510_1 | 69,216 | 262,875 | 3.7979 |
22 Jun 2012 04:20:23 | 1202128 | 14770577 | hadam3p_eu_d08c_2003_1_007966510_1 | 57,696 | 217,908 | 3.7768 |
12 Jun 2012 03:05:50 | 1202128 | 14770577 | hadam3p_eu_d08c_2003_1_007966510_1 | 46,176 | 173,248 | 3.7519 |
12 Jun 2012 03:05:50 | 1202128 | 14770577 | hadam3p_eu_d08c_2003_1_007966510_1 | 34,656 | 129,570 | 3.7387 |
07 Jun 2012 04:09:09 | 1202128 | 14770577 | hadam3p_eu_d08c_2003_1_007966510_1 | 23,136 | 86,267 | 3.7287 |
07 Jun 2012 04:09:09 | 1202128 | 14770577 | hadam3p_eu_d08c_2003_1_007966510_1 | 11,616 | 45,269 | 3.8971 |
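
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the cumulative timestep count. The sketch below recomputes it from the rows above as a consistency check. It assumes that definition (the page does not state it), and the names `TRICKLES` and `average_sec_per_timestep` are hypothetical.

```python
# (timestep, cpu_time_sec, reported_average) copied from the trickle table.
TRICKLES = [
    (92_256, 356_403, 3.8632),
    (80_736, 314_054, 3.8899),
    (69_216, 262_875, 3.7979),
    (57_696, 217_908, 3.7768),
    (46_176, 173_248, 3.7519),
    (34_656, 129_570, 3.7387),
    (23_136, 86_267, 3.7287),
    (11_616, 45_269, 3.8971),
]


def average_sec_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Assumed definition: cumulative CPU seconds per cumulative timestep."""
    return cpu_seconds / timestep


if __name__ == "__main__":
    for timestep, cpu_seconds, reported in TRICKLES:
        computed = average_sec_per_timestep(timestep, cpu_seconds)
        # Reported values carry four decimals, so allow half a unit in the last place.
        agrees = abs(computed - reported) < 5e-5
        print(f"{timestep:>6} {cpu_seconds:>7} {computed:.4f} {reported:.4f} {agrees}")
```

Under that assumption, every recomputed value agrees with the reported column to four decimal places.
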
©2024 cpdn.org