Name | hadcm3n_sba7_1940_40_009110606_0 |
Workunit | 9240942 |
Created | 22 Oct 2014, 14:34:37 UTC |
Sent | 25 Oct 2014, 22:22:04 UTC |
Report deadline | 25 Jan 2015, 5:49:15 UTC |
Received | 29 Nov 2014, 12:04:35 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 193 (0x000000C1) EXIT_SIGNAL |
Computer ID | 1290798 |
Run time | 21 days 21 hours 15 min 46 sec |
CPU time | 20 days 8 hours 59 min 4 sec |
Validate state | Invalid |
Credit | 9,331.20 |
Device peak FLOPS | 2.38 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 193 (0xc1)
</message>
<stderr_txt>
03:24:15 (8468): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:43:44 (9872): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:43:45 (9872): No heartbeat from core client for 30 sec - exiting
19:43:46 (9872): No heartbeat from core client for 30 sec - exiting
19:43:47 (9872): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=11428, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6076, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4716, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6964, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6028, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
00:02:41 (5676): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:36:13 (5696): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Atmos Hold Restart file rename failed on atmos_restart.hold
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9612, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4932, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4932, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4932, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5864, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5864, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4108, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4108, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
01:06:51 (4468): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
01:06:53 (4468): No heartbeat from core client for 30 sec - exiting
01:06:54 (4468): No heartbeat from core client for 30 sec - exiting
01:06:55 (4468): No heartbeat from core client for 30 sec - exiting
01:06:56 (4468): No heartbeat from core client for 30 sec - exiting
01:06:57 (4468): No heartbeat from core client for 30 sec - exiting
01:06:58 (4468): No heartbeat from core client for 30 sec - exiting
01:07:24 (12584): Can't acquire lockfile (32) - waiting 35s
01:21:58 (12584): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=11588, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5680, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5680, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4676, iMonCtr=1
Model crash detected, will try to restart...
21:43:15 (5776): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:52:29 (11720): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
22:13:26 (3476): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3192, iMonCtr=1
Model crash detected, will try to restart...
21:20:23 (1600): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3620, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5388, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
29 Nov 2014 11:08:20 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 777,600 | 1,760,332 | 2.2638 |
28 Nov 2014 04:55:02 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 751,680 | 1,700,731 | 2.2626 |
26 Nov 2014 23:19:13 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 725,760 | 1,639,867 | 2.2595 |
25 Nov 2014 04:54:27 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 699,840 | 1,580,198 | 2.2579 |
23 Nov 2014 10:18:15 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 673,920 | 1,517,480 | 2.2517 |
22 Nov 2014 16:34:38 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 648,000 | 1,457,954 | 2.2499 |
21 Nov 2014 23:57:20 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 622,080 | 1,399,718 | 2.2501 |
20 Nov 2014 04:59:44 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 596,160 | 1,338,773 | 2.2457 |
19 Nov 2014 11:35:57 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 570,240 | 1,278,291 | 2.2417 |
18 Nov 2014 05:49:27 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 544,320 | 1,217,724 | 2.2371 |
17 Nov 2014 00:17:28 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 518,400 | 1,159,525 | 2.2367 |
16 Nov 2014 07:31:33 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 492,480 | 1,102,797 | 2.2393 |
15 Nov 2014 15:07:04 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 466,560 | 1,046,259 | 2.2425 |
14 Nov 2014 22:47:56 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 440,640 | 989,906 | 2.2465 |
13 Nov 2014 08:21:01 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 414,720 | 930,673 | 2.2441 |
12 Nov 2014 01:06:32 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 388,800 | 873,822 | 2.2475 |
10 Nov 2014 02:07:48 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 362,880 | 815,175 | 2.2464 |
09 Nov 2014 09:02:04 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 336,960 | 757,618 | 2.2484 |
08 Nov 2014 15:41:44 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 311,040 | 700,254 | 2.2513 |
07 Nov 2014 23:12:56 | 1290798 | 17252377 | hadcm3n_sba7_1940_40_009110606_0 | 285,120 | 643,864 | 2.2582 |
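
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep count. This is an inference from the listed values, not something the page states. A minimal Python sketch of that check follows; the helper names `parse_int` and `seconds_per_timestep` are hypothetical, and the two sample rows are copied from the newest and oldest trickles above.

```python
def parse_int(text):
    """Convert a comma-grouped number such as '1,760,332' to an int."""
    return int(text.replace(",", ""))


def seconds_per_timestep(cpu_time_sec, timestep):
    """Assumed definition of the Average column: CPU seconds per timestep."""
    return cpu_time_sec / timestep


# (Timestep, CPU Time (sec)) pairs copied from the trickle table.
rows = [
    ("777,600", "1,760,332"),  # 29 Nov 2014 trickle
    ("285,120", "643,864"),    # 07 Nov 2014 trickle
]

for timestep_text, cpu_text in rows:
    average = seconds_per_timestep(parse_int(cpu_text), parse_int(timestep_text))
    print(f"{timestep_text} timesteps: {average:.4f} sec/TS")
```

Rounded to four decimal places, both rows reproduce the listed averages (2.2638 and 2.2582), which supports the assumed definition for these rows.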
©2024 cpdn.org