Name | hadam3p_eu_j146_2013_1_008532955_0 |
Workunit | 8680467 |
Created | 3 Mar 2014, 15:31:01 UTC |
Sent | 4 Mar 2014, 11:10:06 UTC |
Report deadline | 14 Feb 2015, 16:30:06 UTC |
Received | 13 May 2014, 11:31:54 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS |
Computer ID | 1316560 |
Run time | 8 days 0 hours 49 min 5 sec |
CPU time | 6 days 17 hours 48 min 36 sec |
Validate state | Invalid |
Credit | 2,186.01 |
Device peak FLOPS | 1.58 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Europe v6.09 windows_intelx86 |
Stderr | <core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4412, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4728, selfPID=4536, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4424, selfPID=2916, iMonCtr=1
Model crash detected, will try to restart...
14:28:19 (4200): No heartbeat from core client for 30 sec - exiting
14:28:20 (4200): No heartbeat from core client for 30 sec - exiting
14:28:21 (4200): No heartbeat from core client for 30 sec - exiting
14:28:22 (4200): No heartbeat from core client for 30 sec - exiting
14:28:23 (4200): No heartbeat from core client for 30 sec - exiting
14:28:24 (4200): No heartbeat from core client for 30 sec - exiting
14:28:26 (4200): No heartbeat from core client for 30 sec - exiting
14:28:27 (4200): No heartbeat from core client for 30 sec - exiting
14:28:28 (4200): No heartbeat from core client for 30 sec - exiting
14:28:29 (4200): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3152, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1228, selfPID=1228, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2380, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2276, selfPID=2276, iMonCtr=2
CCPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4744, selfPID=3524, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=804, iMonCtr=2
Model crash detected, will try to restart...
Atmos Restart file copy failed on atmos_restart.day
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4876, selfPID=3996, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3564, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4800, selfPID=4800, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
11:28:34 (3088): No heartbeat from core client for 30 sec - exiting
11:28:35 (3088): No heartbeat from core client for 30 sec - exiting
11:28:36 (3088): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:27:58 (4312): No heartbeat from core client for 30 sec - exiting
12:27:59 (4312): No heartbeat from core client for 30 sec - exiting
12:28:00 (4312): No heartbeat from core client for 30 sec - exiting
12:28:01 (4312): No heartbeat from core client for 30 sec - exiting
12:28:02 (4312): No heartbeat from core client for 30 sec - exiting
12:28:03 (4312): No heartbeat from core client for 30 sec - exiting
12:28:04 (4312): No heartbeat from core client for 30 sec - exiting
12:28:05 (4312): No heartbeat from core client for 30 sec - exiting
12:28:06 (4312): No heartbeat from core client for 30 sec - exiting
12:28:07 (4312): No heartbeat from core client for 30 sec - exiting
12:28:08 (4312): No heartbeat from core client for 30 sec - exiting
12:28:09 (4312): No heartbeat from core client for 30 sec - exiting
12:28:10 (4312): No heartbeat from core client for 30 sec - exiting
12:28:11 (4312): No heartbeat from core client for 30 sec - exiting
12:28:12 (4312): No heartbeat from core client for 30 sec - exiting
12:28:13 (4312): No heartbeat from core client for 30 sec - exiting
12:28:14 (4312): No heartbeat from core client for 30 sec - exiting
12:28:15 (4312): No heartbeat from core client for 30 sec - exiting
12:28:16 (4312): No heartbeat from core client for 30 sec - exiting
12:28:17 (4312): No heartbeat from core client for 30 sec - exiting
12:28:18 (4312): No heartbeat from core client for 30 sec - exiting
12:28:20 (4312): No heartbeat from core client for 30 sec - exiting
12:28:21 (4312): No heartbeat from core client for 30 sec - exiting
12:28:22 (4312): No heartbeat from core client for 30 sec - exiting
12:28:23 (4312): No heartbeat from core client for 30 sec - exiting
12:28:24 (4312): No heartbeat from core client for 30 sec - exiting
12:28:25 (4312): No heartbeat from core client for 30 sec - exiting
12:28:26 (4312): No heartbeat from core client for 30 sec - exiting
12:28:27 (4312): No heartbeat from core client for 30 sec - exiting
12:28:28 (4312): No heartbeat from core client for 30 sec - exiting
12:28:29 (4312): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7824, selfPID=7900, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5752, selfPID=964, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4412, selfPID=3660, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2120, selfPID=4176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3884, selfPID=4200, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5108, selfPID=5108, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=576, selfPID=3744, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4080, selfPID=3344, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2952, selfPID=3832, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3176, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2524, selfPID=3404, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5008, selfPID=4004, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4960, selfPID=3096, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4180, selfPID=4000, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3044, selfPID=4084, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4796, selfPID=3860, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3744, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5080, selfPID=3612, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4952, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5032, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4988, selfPID=3780, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4612, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2936, selfPID=3420, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2028, selfPID=3020, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5068, selfPID=2776, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5932, selfPID=4004, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4028, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4916, selfPID=3904, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2336, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4196, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5880, selfPID=3216, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3040, selfPID=3788, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4340, iMonCtr=2
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1744, iMonCtr=2
Model crash detected, will try to restart...
GCPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4596, iMonCtr=2
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4408, selfPID=4356, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5096, selfPID=4504, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
14:20:33 (4132): No heartbeat from core client for 30 sec - exiting
14:20:34 (4132): No heartbeat from core client for 30 sec - exiting
14:20:35 (4132): No heartbeat from core client for 30 sec - exiting
14:20:36 (4132): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
11:58:21 (3932): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=172, selfPID=560, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5556, selfPID=5556, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4900, selfPID=4176, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7412, selfPID=5144, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4996, selfPID=4076, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4716, selfPID=4180, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2096, selfPID=3452, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4932, selfPID=3756, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3984, selfPID=4120, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3324, selfPID=4424, iMonCtr=1
Model crash detected, will try to restart...
15:20:22 (4384): No heartbeat from core client for 30 sec - exiting
15:20:23 (4384): No heartbeat from core client for 30 sec - exiting
15:20:24 (4384): No heartbeat from core client for 30 sec - exiting
15:20:25 (4384): No heartbeat from core client for 30 sec - exiting
15:20:26 (4384): No heartbeat from core client for 30 sec - exiting
15:20:27 (4384): No heartbeat from core client for 30 sec - exiting
15:20:28 (4384): No heartbeat from core client for 30 sec - exiting
15:20:30 (4384): No heartbeat from core client for 30 sec - exiting
15:20:31 (4384): No heartbeat from core client for 30 sec - exiting
15:20:32 (4384): No heartbeat from core client for 30 sec - exiting
15:20:33 (4384): No heartbeat from core client for 30 sec - exiting
15:20:34 (4384): No heartbeat from core client for 30 sec - exiting
15:20:35 (4384): No heartbeat from core client for 30 sec - exiting
15:20:36 (4384): No heartbeat from core client for 30 sec - exiting
15:20:37 (4384): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
</stderr_txt>
]]> |
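The summary rows and the stderr text can be cross-checked with a little arithmetic. The Python sketch below is illustrative only, and every function name and category label in it is hypothetical rather than part of BOINC or CPDN. It renders the signed exit status as the 32-bit two's-complement hex value listed beside it, converts the Run time and CPU time strings to seconds to get the fraction of elapsed run time spent on the CPU, and tallies recurring stderr messages using patterns copied verbatim from the log. It assumes durations always use the "N days N hours N min N sec" spelling seen on this page, and it makes no claim about the threshold at which the BOINC client reports ERR_TOO_MANY_EXITS.

```python
import re
from collections import Counter

def exit_status_hex(code: int) -> str:
    """Render a signed exit status as its 32-bit two's-complement hex form."""
    return f"0x{code & 0xFFFFFFFF:08X}"

# Seconds per unit, keyed by the unit words used on this page (plural 's' optional).
UNIT_SECONDS = {"day": 86400, "hour": 3600, "min": 60, "sec": 1}

def duration_to_seconds(text: str) -> int:
    """Parse a string such as '8 days 0 hours 49 min 5 sec' into seconds."""
    return sum(
        int(amount) * UNIT_SECONDS[unit]
        for amount, unit in re.findall(r"(\d+)\s*(day|hour|min|sec)s?", text)
    )

# Hypothetical category labels mapped to message text copied from the stderr log.
PATTERNS = {
    "model_crash": r"Model crash detected, will try to restart\.\.\.",
    "quit_request": r"CPDN Monitor - Quit request from BOINC\.\.\.",
    "no_heartbeat": r"No heartbeat from core client for 30 sec - exiting",
    "suspend_request": r"CPDN Monitor - Suspend request from BOINC\.\.\.",
}

def tally_stderr(text: str) -> Counter:
    """Count how often each known message occurs (substring matches) in a stderr dump."""
    return Counter({name: len(re.findall(p, text)) for name, p in PATTERNS.items()})

run_s = duration_to_seconds("8 days 0 hours 49 min 5 sec")
cpu_s = duration_to_seconds("6 days 17 hours 48 min 36 sec")
print(exit_status_hex(-226))                 # 0xFFFFFF1E
print(run_s, cpu_s, f"{cpu_s / run_s:.1%}")  # 694145 582516 83.9%

# A short excerpt of the stderr text; in practice pass the whole <stderr_txt> body.
excerpt = (
    "CPDN Monitor - Quit request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=4424, selfPID=2916, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "14:28:19 (4200): No heartbeat from core client for 30 sec - exiting "
    "14:28:20 (4200): No heartbeat from core client for 30 sec - exiting "
)
print(tally_stderr(excerpt))
```

Because the patterns are matched as substrings, interleaved fragments in the log such as `CCPDN Monitor` and `GCPDN Monitor` are still counted as quit requests, while the `GController::` lines are not matched by any pattern here.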
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
08 May 2014 22:28:27 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 126,720 | 572,377 | 4.5169 |
04 May 2014 13:17:15 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 115,200 | 514,577 | 4.4668 |
29 Apr 2014 10:41:34 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 103,680 | 457,024 | 4.4080 |
23 Apr 2014 17:13:49 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 92,160 | 403,549 | 4.3788 |
20 Apr 2014 10:33:25 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 80,640 | 349,430 | 4.3332 |
16 Apr 2014 19:46:44 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 69,120 | 290,900 | 4.2086 |
01 Apr 2014 09:46:09 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 57,600 | 237,182 | 4.1177 |
26 Mar 2014 17:56:36 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 46,080 | 190,634 | 4.1370 |
19 Mar 2014 13:07:18 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 34,560 | 143,576 | 4.1544 |
13 Mar 2014 11:08:29 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 23,142 | 97,923 | 4.2314 |
12 Mar 2014 16:16:30 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 23,136 | 97,498 | 4.2141 |
07 Mar 2014 16:20:52 | 1316560 | 16314961 | hadam3p_eu_j146_2013_1_008532955_0 | 11,616 | 47,053 | 4.0507 |
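In every row above, the Average (sec/TS) column equals cumulative CPU time divided by timestep, rounded to four decimal places. The Python sketch below, whose helper names are hypothetical, recomputes that column from the table (re-listed oldest first) and also derives the CPU cost per timestep between consecutive trickles, a figure the page does not show.

```python
# (timestep, cumulative CPU seconds) copied from the trickle table, oldest first.
TRICKLES = [
    (11_616, 47_053),
    (23_136, 97_498),
    (23_142, 97_923),
    (34_560, 143_576),
    (46_080, 190_634),
    (57_600, 237_182),
    (69_120, 290_900),
    (80_640, 349_430),
    (92_160, 403_549),
    (103_680, 457_024),
    (115_200, 514_577),
    (126_720, 572_377),
]

def cumulative_average(timestep: int, cpu_s: int) -> float:
    """Average (sec/TS) as tabulated: cumulative CPU seconds / timesteps, 4 d.p."""
    return round(cpu_s / timestep, 4)

def interval_rates(rows):
    """CPU seconds per timestep between each pair of consecutive trickles."""
    return [
        (ts1, (cpu1 - cpu0) / (ts1 - ts0))
        for (ts0, cpu0), (ts1, cpu1) in zip(rows, rows[1:])
    ]

for ts, cpu in TRICKLES:
    print(f"{ts:>7,} {cumulative_average(ts, cpu):.4f}")
for ts, rate in interval_rates(TRICKLES):
    print(f"up to {ts:>7,}: {rate:.3f} sec/TS")
```

Short intervals deserve caution: the 6-timestep gap between the trickles at 23,136 and 23,142 works out to about 71 CPU seconds per timestep, while every other interval falls between roughly 4.0 and 5.1 sec/TS.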
©2024 cpdn.org