Name | hadam3p_eu_cjsz_2002_1_007983812_2 |
Workunit | 8138926 |
Created | 16 Jun 2012, 23:50:08 UTC |
Sent | 16 Jun 2012, 23:50:19 UTC |
Report deadline | 30 May 2013, 5:10:19 UTC |
Received | 25 Jul 2012, 4:23:44 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 193 (0x000000C1) EXIT_SIGNAL |
Computer ID | 989394 |
Run time | 7 days 12 hours 51 min 35 sec |
CPU time | 6 days 0 hours 37 min 13 sec |
Validate state | Invalid |
Credit | 1,988.94 |
Device peak FLOPS | 1.47 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Europe v6.09 windows_intelx86 |
Stderr | <core_client_version>6.6.36</core_client_version> <![CDATA[ <message> - exit code 193 (0xc1) </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=784, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 19:48:59 (2708): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3188, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3884, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4664, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2928, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 11:45:05 (4540): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
11:45:08 (4540): No heartbeat from core client for 30 sec - exiting 11:45:09 (4540): No heartbeat from core client for 30 sec - exiting 11:45:10 (4540): No heartbeat from core client for 30 sec - exiting 11:45:11 (4540): No heartbeat from core client for 30 sec - exiting 11:45:12 (4540): No heartbeat from core client for 30 sec - exiting 11:45:13 (4540): No heartbeat from core client for 30 sec - exiting 11:45:14 (4540): No heartbeat from core client for 30 sec - exiting 11:45:15 (4540): No heartbeat from core client for 30 sec - exiting 11:45:16 (4540): No heartbeat from core client for 30 sec - exiting 11:45:17 (4540): No heartbeat from core client for 30 sec - exiting 11:45:18 (4540): No heartbeat from core client for 30 sec - exiting 11:45:19 (4540): No heartbeat from core client for 30 sec - exiting 11:45:20 (4540): No heartbeat from core client for 30 sec - exiting 11:45:21 (4540): No heartbeat from core client for 30 sec - exiting 11:45:22 (4540): No heartbeat from core client for 30 sec - exiting 11:45:23 (4540): No heartbeat from core client for 30 sec - exiting 11:45:24 (4540): No heartbeat from core client for 30 sec - exiting 11:45:25 (4540): No heartbeat from core client for 30 sec - exiting 11:45:26 (4540): No heartbeat from core client for 30 sec - exiting 11:45:27 (4540): No heartbeat from core client for 30 sec - exiting 11:45:28 (4540): No heartbeat from core client for 30 sec - exiting 11:45:29 (4540): No heartbeat from core client for 30 sec - exiting 11:45:30 (4540): No heartbeat from core client for 30 sec - exiting 11:45:31 (4540): No heartbeat from core client for 30 sec - exiting 11:45:32 (4540): No heartbeat from core client for 30 sec - exiting 11:45:33 (4540): No heartbeat from core client for 30 sec - exiting 11:45:34 (4540): No heartbeat from core client for 30 sec - exiting 11:45:35 (4540): No heartbeat from core client for 30 sec - exiting 11:45:36 (4540): No heartbeat from core client for 30 sec - exiting 11:45:37 (4540): No 
heartbeat from core client for 30 sec - exiting 11:45:38 (4540): No heartbeat from core client for 30 sec - exiting 11:45:39 (4540): No heartbeat from core client for 30 sec - exiting 11:45:40 (4540): No heartbeat from core client for 30 sec - exiting 11:45:41 (4540): No heartbeat from core client for 30 sec - exiting 11:45:42 (4540): No heartbeat from core client for 30 sec - exiting 11:45:43 (4540): No heartbeat from core client for 30 sec - exiting 11:45:44 (4540): No heartbeat from core client for 30 sec - exiting 11:45:45 (4540): No heartbeat from core client for 30 sec - exiting 11:45:46 (4540): No heartbeat from core client for 30 sec - exiting 11:45:47 (4540): No heartbeat from core client for 30 sec - exiting 11:45:48 (4540): No heartbeat from core client for 30 sec - exiting 11:45:50 (4540): No heartbeat from core client for 30 sec - exiting 11:45:51 (4540): No heartbeat from core client for 30 sec - exiting 11:45:52 (4540): No heartbeat from core client for 30 sec - exiting 11:45:53 (4540): No heartbeat from core client for 30 sec - exiting 11:45:54 (4540): No heartbeat from core client for 30 sec - exiting 11:45:55 (4540): No heartbeat from core client for 30 sec - exiting 11:45:56 (4540): No heartbeat from core client for 30 sec - exiting 11:45:57 (4540): No heartbeat from core client for 30 sec - exiting 11:45:58 (4540): No heartbeat from core client for 30 sec - exiting 11:45:59 (4540): No heartbeat from core client for 30 sec - exiting 11:46:00 (4540): No heartbeat from core client for 30 sec - exiting 11:46:01 (4540): No heartbeat from core client for 30 sec - exiting 11:46:02 (4540): No heartbeat from core client for 30 sec - exiting 11:46:03 (4540): No heartbeat from core client for 30 sec - exiting 11:46:04 (4540): No heartbeat from core client for 30 sec - exiting 11:46:05 (4540): No heartbeat from core client for 30 sec - exiting 11:46:06 (4540): No heartbeat from core client for 30 sec - exiting 11:46:07 (4540): No heartbeat from core client 
for 30 sec - exiting 11:46:08 (4540): No heartbeat from core client for 30 sec - exiting 11:46:09 (4540): No heartbeat from core client for 30 sec - exiting 11:46:10 (4540): No heartbeat from core client for 30 sec - exiting 11:46:11 (4540): No heartbeat from core client for 30 sec - exiting 11:46:12 (4540): No heartbeat from core client for 30 sec - exiting 11:46:13 (4540): No heartbeat from core client for 30 sec - exiting 11:46:14 (4540): No heartbeat from core client for 30 sec - exiting 11:46:15 (4540): No heartbeat from core client for 30 sec - exiting 11:46:16 (4540): No heartbeat from core client for 30 sec - exiting 11:46:17 (4540): No heartbeat from core client for 30 sec - exiting 11:46:18 (4540): No heartbeat from core client for 30 sec - exiting 11:46:19 (4540): No heartbeat from core client for 30 sec - exiting 11:46:20 (4540): No heartbeat from core client for 30 sec - exiting 11:46:21 (4540): No heartbeat from core client for 30 sec - exiting 11:46:22 (4540): No heartbeat from core client for 30 sec - exiting 11:46:23 (4540): No heartbeat from core client for 30 sec - exiting 11:46:24 (4540): No heartbeat from core client for 30 sec - exiting 11:46:25 (4540): No heartbeat from core client for 30 sec - exiting 11:46:26 (4540): No heartbeat from core client for 30 sec - exiting 11:46:27 (4540): No heartbeat from core client for 30 sec - exiting 11:46:28 (4540): No heartbeat from core client for 30 sec - exiting 11:46:29 (4540): No heartbeat from core client for 30 sec - exiting 11:46:30 (4540): No heartbeat from core client for 30 sec - exiting 11:46:31 (4540): No heartbeat from core client for 30 sec - exiting 11:46:32 (4540): No heartbeat from core client for 30 sec - exiting 11:46:33 (4540): No heartbeat from core client for 30 sec - exiting 11:46:34 (4540): No heartbeat from core client for 30 sec - exiting 11:46:35 (4540): No heartbeat from core client for 30 sec - exiting 11:46:36 (4540): No heartbeat from core client for 30 sec - exiting 
11:46:37 (4540): No heartbeat from core client for 30 sec - exiting 11:46:38 (4540): No heartbeat from core client for 30 sec - exiting 11:46:39 (4540): No heartbeat from core client for 30 sec - exiting 11:46:40 (4540): No heartbeat from core client for 30 sec - exiting 11:46:41 (4540): No heartbeat from core client for 30 sec - exiting 11:46:42 (4540): No heartbeat from core client for 30 sec - exiting 11:46:43 (4540): No heartbeat from core client for 30 sec - exiting 11:46:44 (4540): No heartbeat from core client for 30 sec - exiting 11:46:45 (4540): No heartbeat from core client for 30 sec - exiting 11:46:46 (4540): No heartbeat from core client for 30 sec - exiting 11:46:47 (4540): No heartbeat from core client for 30 sec - exiting 11:46:48 (4540): No heartbeat from core client for 30 sec - exiting 11:46:49 (4540): No heartbeat from core client for 30 sec - exiting 11:46:50 (4540): No heartbeat from core client for 30 sec - exiting 11:46:51 (4540): No heartbeat from core client for 30 sec - exiting Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4280, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3096, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4560, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4956, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4528, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4412, iMonCtr=2 Model crash detected, will try to restart... 
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5240, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4392, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1260, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4280, selfPID=4580, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5008, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4620, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5616, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4628, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4792, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4212, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5876, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4976, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4276, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4628, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4308, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 22:20:43 (4412): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5388, selfPID=5388, iMonCtr=2 Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5420, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5792, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4756, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2628, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3128, iMonCtr=2 Model crash detected, will try to restart... 11:15:12 (4368): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:15:13 (4368): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4804, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4244, iMonCtr=2 Model crash detected, will try to restart... GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4784, iMonCtr=2 Model crash detected, will try to restart... lobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5892, iMonCtr=2 Leaving CPDN_Main::Monitor... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5820, iMonCtr=2 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4724, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5812, selfPID=4428, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5808, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4616, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4308, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1996, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5040, selfPID=2092, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6012, selfPID=4852, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5644, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4620, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4184, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4504, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4416, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4216, iMonCtr=2 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4252, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5756, selfPID=4700, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5892, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4664, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5276, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=780, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3868, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2688, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4844, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5172, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1376, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4524, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5408, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4468, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4432, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5736, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5284, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5020, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2292, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5292, selfPID=4840, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4328, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5048, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4320, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4016, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5684, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5444, iMonCtr=2 Model crash detected, will try to restart... 22:46:26 (4844): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4268, selfPID=4268, iMonCtr=2 Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5464, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5264, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3288, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3476, selfPID=4912, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4536, iMonCtr=2 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish </stderr_txt> ]]> |
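
The stderr output above is long and repetitive, so a tally of message types may be easier to read than the raw text. The following is a minimal sketch, not part of the CPDN or BOINC tooling: the `PATTERNS` names and the `summarize` helper are hypothetical, and `sample` is a short excerpt stitched together from the first and last messages of the log, not the full output.

```python
import re
from collections import Counter

# Short excerpt copied from the beginning and end of the stderr text above.
# Paste the full <stderr_txt> contents here to summarize the whole run.
sample = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=784, iMonCtr=2 "
    "Model crash detected, will try to restart... "
    "Leaving CPDN_Main::Monitor... "
    "19:48:59 (2708): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Signal 11 received, exiting... Called boinc_finish"
)

# Hypothetical category names mapped to substrings that appear in the log.
PATTERNS = {
    "model_crash": r"Model crash detected",
    "no_heartbeat": r"No heartbeat from core client",
    "signal": r"Signal \d+ received",
}

def summarize(text):
    """Count how many times each message type occurs in the log text."""
    return Counter({name: len(re.findall(p, text)) for name, p in PATTERNS.items()})

if __name__ == "__main__":
    for name, count in summarize(sample).items():
        print(f"{name}: {count}")
    # The exit status is shown in decimal and hex in the table: 193 == 0xC1.
    print(hex(193))
```

Run against the full log, the counts would show how often the model restarted and lost the heartbeat before the final `Signal 11`.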
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
18 Jul 2012 04:43:20 | 989394 | 14794488 | hadam3p_eu_cjsz_2002_1_007983812_2 | 115,296 | 485,172 | 4.2081 |
13 Jul 2012 22:12:51 | 989394 | 14794488 | hadam3p_eu_cjsz_2002_1_007983812_2 | 103,781 | 436,285 | 4.2039 |
13 Jul 2012 06:39:06 | 989394 | 14794488 | hadam3p_eu_cjsz_2002_1_007983812_2 | 103,776 | 435,617 | 4.1977 |
07 Jul 2012 18:13:39 | 989394 | 14794488 | hadam3p_eu_cjsz_2002_1_007983812_2 | 92,256 | 386,767 | 4.1923 |
03 Jul 2012 04:17:18 | 989394 | 14794488 | hadam3p_eu_cjsz_2002_1_007983812_2 | 80,736 | 338,634 | 4.1943 |
02 Jul 2012 16:49:04 | 989394 | 14794488 | hadam3p_eu_cjsz_2002_1_007983812_2 | 69,216 | 291,694 | 4.2143 |
29 Jun 2012 04:29:16 | 989394 | 14794488 | hadam3p_eu_cjsz_2002_1_007983812_2 | 57,696 | 243,588 | 4.2219 |
25 Jun 2012 03:59:00 | 989394 | 14794488 | hadam3p_eu_cjsz_2002_1_007983812_2 | 46,176 | 195,166 | 4.2266 |
23 Jun 2012 05:19:33 | 989394 | 14794488 | hadam3p_eu_cjsz_2002_1_007983812_2 | 34,656 | 146,418 | 4.2249 |
20 Jun 2012 23:11:22 | 989394 | 14794488 | hadam3p_eu_cjsz_2002_1_007983812_2 | 23,136 | 97,809 | 4.2276 |
18 Jun 2012 22:05:34 | 989394 | 14794488 | hadam3p_eu_cjsz_2002_1_007983812_2 | 11,616 | 50,115 | 4.3143 |
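
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count. The sketch below checks that reading against three rows copied from the table. The `seconds_per_timestep` helper is a hypothetical name, and rounding to four decimals is an assumption based on how the values are displayed.

```python
# (timestep, cpu_time_sec, reported_average) copied from rows of the trickle table.
trickles = [
    (115296, 485172, 4.2081),
    (57696, 243588, 4.2219),
    (11616, 50115, 4.3143),
]

def seconds_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds per model timestep, rounded as displayed."""
    return round(cpu_time_sec / timestep, 4)

if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in trickles:
        computed = seconds_per_timestep(cpu_time_sec, timestep)
        print(f"timestep={timestep} computed={computed} reported={reported}")
```

For these three rows the computed value matches the reported one to four decimal places.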
©2024 cpdn.org