climateprediction.net
Task 14169593

Name: hadam3p_eu_aix2_1986_1_007807787_0
Workunit: 7962896
Created: 21 Feb 2012, 20:06:22 UTC
Sent: 21 Feb 2012, 20:07:35 UTC
Report deadline: 3 Feb 2013, 1:27:35 UTC
Received: 18 Jun 2012, 9:04:11 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS
Computer ID: 1090072
Run time: 4 days 14 hours 14 min 50 sec
CPU time: 3 days 18 hours 47 min 12 sec
Validate state: Invalid
Credit: 1,790.38
Device peak FLOPS: 0.78 GFLOPS
Application version: UK Met Office HadAM3P-HadRM3P Europe v6.09 (windows_intelx86)
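The Exit status field gives the same code twice, once in decimal (-226) and once in hexadecimal (0xFFFFFF1E). The hex form is just -226 read as an unsigned 32-bit two's-complement integer. Below is a minimal Python sketch of the conversion in both directions. The helper names `as_hex32` and `from_hex32` are our own illustrations and are not part of BOINC.

```python
def as_hex32(code: int) -> str:
    """Render a signed exit code as its unsigned 32-bit two's-complement hex form."""
    # Masking with 0xFFFFFFFF keeps the low 32 bits, i.e. the unsigned view.
    return f"0x{code & 0xFFFFFFFF:08X}"


def from_hex32(text: str) -> int:
    """Interpret a 32-bit hex string as a signed integer (inverse of as_hex32)."""
    value = int(text, 16)  # int() accepts an optional "0x" prefix in base 16
    # If the sign bit is set, the value represents a negative number.
    return value - (1 << 32) if value & 0x80000000 else value


print(as_hex32(-226))            # 0xFFFFFF1E
print(from_hex32("0xFFFFFF1E"))  # -226
```

The mask-and-format approach is the usual way to display a negative status code the way a 32-bit platform reports it.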
Stderr
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4804, selfPID=4468, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5556, selfPID=4772, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4084, selfPID=5080, iMonCtr=1
Model crash detected, will try to restart...
20:36:33 (2940): No heartbeat from core client for 30 sec - exiting
20:36:34 (2940): No heartbeat from core client for 30 sec - exiting
20:36:35 (2940): No heartbeat from core client for 30 sec - exiting
20:36:37 (2940): No heartbeat from core client for 30 sec - exiting
20:36:38 (2940): No heartbeat from core client for 30 sec - exiting
20:36:39 (2940): No heartbeat from core client for 30 sec - exiting
20:36:40 (2940): No heartbeat from core client for 30 sec - exiting
20:36:41 (2940): No heartbeat from core client for 30 sec - exiting
20:36:42 (2940): No heartbeat from core client for 30 sec - exiting
20:36:43 (2940): No heartbeat from core client for 30 sec - exiting
20:36:44 (2940): No heartbeat from core client for 30 sec - exiting
20:36:45 (2940): No heartbeat from core client for 30 sec - exiting
20:36:46 (2940): No heartbeat from core client for 30 sec - exiting
20:36:47 (2940): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=628, selfPID=4292, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2008, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4680, selfPID=4680, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5648, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5576, selfPID=4988, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2976, selfPID=4916, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5116, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDNCPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5436, iMonCtr=2
 process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4612, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1628, selfPID=5116, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4448, selfPID=4448, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2096, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5616, selfPID=5616, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5640, selfPID=4244, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5464, selfPID=5068, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5532, selfPID=4216, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4208, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4100, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4504, selfPID=6012, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4144, selfPID=4144, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5824, selfPID=5824, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2696, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2784, selfPID=2976, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5912, selfPID=5004, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4980, selfPID=3876, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4696, selfPID=4268, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2780, selfPID=3804, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5028, selfPID=4820, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4860, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5644, selfPID=4760, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5584, selfPID=4744, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5424, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3632, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4892, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4616, selfPID=4156, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4412, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5556, selfPID=5052, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5108, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5364, selfPID=5100, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5908, selfPID=5992, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5544, selfPID=4068, iMonCtr=1
Model crash detected, will try to restart...
17:47:47 (3848): No heartbeat from core client for 30 sec - exiting
17:47:49 (3848): No heartbeat from core client for 30 sec - exiting
17:47:50 (3848): No heartbeat from core client for 30 sec - exiting
17:47:51 (3848): No heartbeat from core client for 30 sec - exiting
17:47:52 (3848): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:52:40 (3412): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3520, selfPID=3520, iMonCtr=2
17:56:09 (180): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:56:10 (180): No heartbeat from core client for 30 sec - exiting
17:56:11 (180): No heartbeat from core client for 30 sec - exiting
17:56:12 (180): No heartbeat from core client for 30 sec - exiting
17:56:13 (180): No heartbeat from core client for 30 sec - exiting
17:56:14 (180): No heartbeat from core client for 30 sec - exiting
18:08:14 (3824): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:08:15 (3824): No heartbeat from core client for 30 sec - exiting
18:08:16 (3824): No heartbeat from core client for 30 sec - exiting
18:08:17 (3824): No heartbeat from core client for 30 sec - exiting
18:08:19 (3824): No heartbeat from core client for 30 sec - exiting
18:08:20 (3824): No heartbeat from core client for 30 sec - exiting
18:08:21 (3824): No heartbeat from core client for 30 sec - exiting
18:08:22 (3824): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
18:35:27 (4040): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:35:28 (4040): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4708, selfPID=2244, iMonCtr=1
Model crash detected, will try to restart...

</stderr_txt>
]]>
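The stderr above consists mostly of a few message types that repeat many times. As an aid to reading it, here is a hypothetical Python helper (not part of BOINC or the CPDN tools) that tallies those messages in a saved copy of the `stderr_txt` text. The pattern strings are copied from the log. The category labels are our own. Because some lines are interleaved (for example "CPDNCPDN process is not running"), the counts should be read as approximate.

```python
import re
from collections import Counter

# Message patterns copied from the stderr_txt block; the labels are illustrative.
PATTERNS = {
    "crash_restart": re.compile(r"Model crash detected, will try to restart"),
    "heartbeat_timeout": re.compile(r"No heartbeat from core client for 30 sec"),
    "quit_request": re.compile(r"CPDN Monitor - Quit request from BOINC"),
    "process_not_running": re.compile(r"CPDN process is not running"),
}


def tally_stderr(text: str) -> Counter:
    """Count occurrences of each known message pattern anywhere in the text."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        # findall scans the whole text, so matches inside interleaved lines still count.
        counts[label] = len(pattern.findall(text))
    return counts


sample = """\
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4804, selfPID=4468, iMonCtr=1
Model crash detected, will try to restart...
20:36:33 (2940): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
"""
print(dict(tally_stderr(sample)))
```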
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                         Timestep  CPU Time (sec)  Average (sec/TS)
15 Jun 2012 19:51:04  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  103,786   307,925         2.9669
14 Jun 2012 21:10:26  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  103,782   307,465         2.9626
13 Jun 2012 15:42:07  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  103,776   306,502         2.9535
08 Jun 2012 08:34:44  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  92,256    271,028         2.9378
02 Jun 2012 12:43:47  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  80,736    241,646         2.9930
26 May 2012 18:30:19  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  69,216    204,314         2.9518
20 May 2012 22:55:36  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  57,701    172,937         2.9971
20 May 2012 21:54:15  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  57,696    172,569         2.9910
07 May 2012 07:56:50  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  46,176    136,513         2.9564
30 Apr 2012 09:50:36  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  34,656    96,712          2.7906
22 Apr 2012 18:29:55  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  23,136    64,645          2.7941
16 Apr 2012 18:12:17  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  11,618    32,450          2.7931
16 Apr 2012 12:48:52  1090072  14169593   hadam3p_eu_aix2_1986_1_007807787_0  11,616    32,096          2.7631
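The Average (sec/TS) column matches the cumulative CPU time divided by the timestep at each trickle. For example, 307,925 / 103,786 is about 2.9669. The Python sketch below recomputes that column from the other two and reports how many timesteps were gained between consecutive trickles. The rows are transcribed from the trickle table, oldest first. The list and function names are illustrative.

```python
# (timestep, cumulative CPU seconds) per trickle, oldest first, from the trickle table.
TRICKLES = [
    (11_616, 32_096),
    (11_618, 32_450),
    (23_136, 64_645),
    (34_656, 96_712),
    (46_176, 136_513),
    (57_696, 172_569),
    (57_701, 172_937),
    (69_216, 204_314),
    (80_736, 241_646),
    (92_256, 271_028),
    (103_776, 306_502),
    (103_782, 307_465),
    (103_786, 307_925),
]


def sec_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Average CPU seconds per model timestep, as shown in the trickle table."""
    return cpu_seconds / timestep


# Pair each trickle with its predecessor to show the per-trickle progress.
for (prev_ts, _), (ts, cpu) in zip(TRICKLES, TRICKLES[1:]):
    avg = sec_per_timestep(ts, cpu)
    print(f"timestep {ts:>7,}  avg {avg:.4f} s/TS  gained {ts - prev_ts:>6,}")
```

When run, the output shows most consecutive trickles 11,520 timesteps apart. The final three trickles (13 to 15 June) advance by only 6 and then 4 timesteps, which may reflect the repeated restarts recorded in the stderr log.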


©2024 cpdn.org