Name | hadam3p_pnw_31nr_1961_1_008183730_0 |
Workunit | 8338854 |
Created | 4 Sep 2012, 17:49:48 UTC |
Sent | 4 Sep 2012, 17:55:43 UTC |
Report deadline | 17 Aug 2013, 23:15:43 UTC |
Received | 6 Oct 2012, 10:39:38 UTC |
Server state | Over |
Outcome | Success |
Client state | Done |
Exit status | 0 (0x00000000) |
Computer ID | 1170519 |
Run time | 5 days 14 hours 19 min 14 sec |
CPU time | 5 days 6 hours 35 min 11 sec |
Validate state | Workunit error - check skipped |
Credit | 3,005.88 |
Device peak FLOPS | 2.49 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Pacific North West v6.09 windows_intelx86 |
Stderr | <core_client_version>7.0.28</core_client_version> <![CDATA[ <stderr_txt> Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1984, selfPID=1872, iMonCtr=1 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2796, selfPID=2584, iMonCtr=1 Model crash detected, will try to restart... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2064, selfPID=3252, iMonCtr=1 Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1688, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4028, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=788, iMonCtr=2 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3480, selfPID=320, iMonCtr=1 CPDN Monitor - Quit request from BOINC... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=800, selfPID=692, iMonCtr=1 CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1468, selfPID=3712, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2324, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3652, selfPID=3256, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1452, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2292, selfPID=3844, iMonCtr=1 Model crash detected, will try to restart... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2920, selfPID=2452, iMonCtr=1 CPDN Monitor - Quit request from BOINC... 
No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3784, selfPID=3556, iMonCtr=1 CPDN Monitor - Quit request from BOINC... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2416, selfPID=3128, iMonCtr=1 CPDN Monitor - Quit request from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2296, selfPID=2296, iMonCtr=2 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3188, selfPID=4172, iMonCtr=1 CPDN Monitor - Quit request from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2628, selfPID=2628, iMonCtr=2 Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4868, selfPID=4868, iMonCtr=1 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4868, selfPID=2372, iMonCtr=1 CPDN Monitor - Quit request from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3944, selfPID=3944, iMonCtr=1 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3944, selfPID=3952, iMonCtr=1 CPDN Monitor - Quit request from BOINC... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2520, selfPID=2232, iMonCtr=1 GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=788, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3956, iMonCtr=2 Model crash detected, will try to restart... 
No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1072, selfPID=1692, iMonCtr=1 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3996, selfPID=1456, iMonCtr=1 Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4252, iMonCtr=2 Model crash detected, will try to restart... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2792, selfPID=3260, iMonCtr=1 CPDN Monitor - Quit request from BOINC... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4108, selfPID=700, iMonCtr=1 CPDN Monitor - Quit request from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=876, selfPID=876, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=996, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3624, iMonCtr=2 Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 3 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1732, selfPID=1144, iMonCtr=1 CPDN Monitor - Quit request from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4216, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2256, selfPID=1384, iMonCtr=1 Model crash detected, will try to restart... 
No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=708, selfPID=3824, iMonCtr=1 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2664, selfPID=304, iMonCtr=1 14:22:05 (2568): No heartbeat from core client for 30 sec - exiting 14:22:06 (2568): No heartbeat from core client for 30 sec - exiting 14:22:07 (2568): No heartbeat from core client for 30 sec - exiting 14:22:09 (2568): No heartbeat from core client for 30 sec - exiting 14:22:10 (2568): No heartbeat from core client for 30 sec - exiting 14:22:11 (2568): No heartbeat from core client for 30 sec - exiting 14:22:12 (2568): No heartbeat from core client for 30 sec - exiting 14:22:13 (2568): No heartbeat from core client for 30 sec - exiting 14:22:14 (2568): No heartbeat from core client for 30 sec - exiting 14:22:15 (2568): No heartbeat from core client for 30 sec - exiting 14:22:16 (2568): No heartbeat from core client for 30 sec - exiting 14:22:17 (2568): No heartbeat from core client for 30 sec - exiting 14:22:18 (2568): No heartbeat from core client for 30 sec - exiting 14:22:19 (2568): No heartbeat from core client for 30 sec - exiting 14:22:21 (2568): No heartbeat from core client for 30 sec - exiting 14:22:22 (2568): No heartbeat from core client for 30 sec - exiting 14:22:23 (2568): No heartbeat from core client for 30 sec - exiting 14:22:24 (2568): No heartbeat from core client for 30 sec - exiting 14:22:25 (2568): No heartbeat from core client for 30 sec - exiting 14:22:26 (2568): No heartbeat from core client for 30 sec - exiting 14:22:27 (2568): No heartbeat from core client for 30 sec - exiting 14:22:28 (2568): No heartbeat from core client for 30 sec - exiting 14:22:29 (2568): No heartbeat from core client for 30 sec - exiting 14:22:30 (2568): No heartbeat from core client for 30 sec - exiting 14:22:31 (2568): No heartbeat from core client for 30 sec - exiting 14:22:33 (2568): No heartbeat from core client 
for 30 sec - exiting 14:22:34 (2568): No heartbeat from core client for 30 sec - exiting 14:22:35 (2568): No heartbeat from core client for 30 sec - exiting 14:22:36 (2568): No heartbeat from core client for 30 sec - exiting 14:22:37 (2568): No heartbeat from core client for 30 sec - exiting 14:22:38 (2568): No heartbeat from core client for 30 sec - exiting 14:22:39 (2568): No heartbeat from core client for 30 sec - exiting 14:22:40 (2792): Can't acquire lockfile (32) - waiting 35s 14:22:40 (2568): No heartbeat from core client for 30 sec - exiting 14:22:41 (2568): No heartbeat from core client for 30 sec - exiting 14:22:42 (2568): No heartbeat from core client for 30 sec - exiting 14:22:43 (2568): No heartbeat from core client for 30 sec - exiting 14:22:45 (2568): No heartbeat from core client for 30 sec - exiting 14:22:46 (2568): No heartbeat from core client for 30 sec - exiting 14:22:47 (2568): No heartbeat from core client for 30 sec - exiting 14:22:48 (2568): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:22:49 (2568): No heartbeat from core client for 30 sec - exiting Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4888, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2792, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2480, selfPID=344, iMonCtr=1 CPDN Monitor - Quit request from BOINC... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4988, selfPID=1336, iMonCtr=1 CPDN Monitor - Quit request from BOINC... 
No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1892, selfPID=3324, iMonCtr=1 GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3376, selfPID=2504, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 4 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3772, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 4 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=200, selfPID=3608, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3564, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3504, iMonCtr=2 Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 4 Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3996, selfPID=3608, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3408, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3664, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3616, iMonCtr=2 Leaving CPDN_Main::Monitor... 
Regional yearly means requires 12 input files got 5 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3708, selfPID=3608, iMonCtr=1 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1680, selfPID=1916, iMonCtr=1 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2256, selfPID=3700, iMonCtr=1 CPDN Monitor - Quit request from BOINC... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4012, selfPID=1268, iMonCtr=1 CPDN Monitor - Quit request from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3660, selfPID=3660, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4184, selfPID=2796, iMonCtr=1 Model crash detected, will try to restart... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1464, selfPID=2724, iMonCtr=1 GNo Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2904, selfPID=1368, iMonCtr=1 CPDN Monitor - Quit request from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=940, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3852, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3180, selfPID=3300, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3892, selfPID=3776, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 6 GNo Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1804, selfPID=128, iMonCtr=1 CPDN Monitor - Quit request from BOINC... 
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4784, selfPID=4784, iMonCtr=2 CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2788, selfPID=2128, iMonCtr=1 Model crash detected, will try to restart... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2684, selfPID=720, iMonCtr=1 Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2028, selfPID=2028, iMonCtr=1 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2028, selfPID=3016, iMonCtr=1 CPDN Monitor - Quit request from BOINC... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2868, selfPID=2052, iMonCtr=1 CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4016, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4484, selfPID=3428, iMonCtr=1 Model crash detected, will try to restart... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3808, selfPID=2028, iMonCtr=1 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3948, selfPID=3892, iMonCtr=1 10:05:27 (2660): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
10:05:30 (2660): No heartbeat from core client for 30 sec - exiting 10:05:31 (2660): No heartbeat from core client for 30 sec - exiting 10:05:32 (2660): No heartbeat from core client for 30 sec - exiting 10:05:34 (2660): No heartbeat from core client for 30 sec - exiting 10:05:35 (2660): No heartbeat from core client for 30 sec - exiting 10:05:36 (2660): No heartbeat from core client for 30 sec - exiting 10:05:37 (2660): No heartbeat from core client for 30 sec - exiting 10:05:38 (2660): No heartbeat from core client for 30 sec - exiting 10:05:39 (2660): No heartbeat from core client for 30 sec - exiting 10:05:40 (2660): No heartbeat from core client for 30 sec - exiting 10:05:41 (2660): No heartbeat from core client for 30 sec - exiting 10:05:42 (2660): No heartbeat from core client for 30 sec - exiting 10:05:43 (2660): No heartbeat from core client for 30 sec - exiting 10:05:44 (2660): No heartbeat from core client for 30 sec - exiting 10:05:46 (2660): No heartbeat from core client for 30 sec - exiting 10:05:47 (2660): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4436, selfPID=2732, iMonCtr=1 CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3856, selfPID=1748, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3340, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2292, selfPID=3620, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 8 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2812, selfPID=3788, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3544, selfPID=1376, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 10 Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2288, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=848, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 10 09:13:49 (2012): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=704, selfPID=704, iMonCtr=2 Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2224, selfPID=3344, iMonCtr=1 CGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3528, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3576, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 11 Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=940, selfPID=940, iMonCtr=1 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=940, selfPID=1072, iMonCtr=1 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4468, selfPID=3000, iMonCtr=1 Model crash detected, will try to restart... No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4704, selfPID=4728, iMonCtr=1 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4748, selfPID=4548, iMonCtr=1 CPDN Monitor - Quit request from BOINC... 
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3892, selfPID=3892, iMonCtr=1 No Process Handle Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3892, selfPID=1272, iMonCtr=1 CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3372, selfPID=2080, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4700, selfPID=3020, iMonCtr=1 Model crash detected, will try to restart... 09:54:24 (2688): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Leaving CPDN_Main::Monitor... Called boinc_finish </stderr_txt> ]]> |
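The stderr text above repeats a handful of message types many times over. A minimal sketch for tallying them follows, assuming the log has been saved as plain text. The category names in `PATTERNS` and the function `tally_events` are hypothetical labels chosen for illustration, not part of BOINC or the CPDN application, and the sample string is a short excerpt assembled from messages in the log above.

```python
import re

# Short excerpt built from messages that appear in the stderr text above.
# A full log would normally be read from a file instead.
SAMPLE = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=2796, selfPID=2584, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "CPDN Monitor - Quit request from BOINC... "
    "14:22:05 (2568): No heartbeat from core client for 30 sec - exiting "
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "Model crash detected, will try to restart... "
)

# Hypothetical category names mapped to literal substrings seen in the log.
PATTERNS = {
    "model_restart": r"Model crash detected, will try to restart",
    "quit_request": r"Quit request from BOINC",
    "suspend_request": r"Suspend request from BOINC",
    "no_heartbeat": r"No heartbeat from core client",
    "process_not_running": r"CPDN process is not running",
}


def tally_events(text):
    """Count non-overlapping occurrences of each known message in the text."""
    return {name: len(re.findall(pat, text)) for name, pat in PATTERNS.items()}


if __name__ == "__main__":
    for name, count in tally_events(SAMPLE).items():
        print(f"{name:>20}: {count}")
```

Matching on substrings rather than whole lines is deliberate here: the dump has lost its original line breaks, and a few messages carry stray leading characters (for example `GController::`), so line-anchored patterns would miss them.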
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
06 Oct 2012 09:42:51 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 138,336 | 454,864 | 3.2881 |
02 Oct 2012 17:52:04 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 126,816 | 416,040 | 3.2807 |
01 Oct 2012 15:46:49 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 115,296 | 377,416 | 3.2735 |
30 Sep 2012 09:46:44 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 103,776 | 340,964 | 3.2856 |
29 Sep 2012 00:19:02 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 92,256 | 304,118 | 3.2965 |
27 Sep 2012 09:49:22 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 80,736 | 266,363 | 3.2992 |
24 Sep 2012 15:41:31 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 69,217 | 228,368 | 3.2993 |
24 Sep 2012 13:05:57 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 69,216 | 227,884 | 3.2924 |
23 Sep 2012 12:01:55 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 57,696 | 190,271 | 3.2978 |
17 Sep 2012 18:37:46 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 46,176 | 152,594 | 3.3046 |
12 Sep 2012 07:54:18 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 34,656 | 114,102 | 3.2924 |
09 Sep 2012 20:43:35 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 23,136 | 76,017 | 3.2857 |
05 Sep 2012 14:46:51 | 1170519 | 15234204 | hadam3p_pnw_31nr_1961_1_008183730_0 | 11,616 | 38,126 | 3.2822 |
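The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count. The page does not state this, so treat it as an inference; the short sketch below checks it against a few rows copied from the table. The helper name `sec_per_timestep` is hypothetical.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average)
TRICKLES = [
    (138_336, 454_864, 3.2881),
    (126_816, 416_040, 3.2807),
    (115_296, 377_416, 3.2735),
    (11_616, 38_126, 3.2822),
]


def sec_per_timestep(timestep, cpu_seconds):
    """Assumed definition of the Average column: CPU seconds per model timestep."""
    return cpu_seconds / timestep


if __name__ == "__main__":
    for timestep, cpu_seconds, reported in TRICKLES:
        computed = sec_per_timestep(timestep, cpu_seconds)
        print(f"{timestep:>8,d}  computed {computed:.4f}  reported {reported:.4f}")
```

For the rows checked, the computed ratio agrees with the reported value to four decimal places, which is consistent with the assumption but does not prove how the server derives the figure.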
©2024 cpdn.org