Name | hadcm3n_o1xx_2020_40_007858089_1 |
Workunit | 8013201 |
Created | 5 Apr 2012, 15:13:40 UTC |
Sent | 5 Apr 2012, 15:19:19 UTC |
Report deadline | 5 Jul 2012, 22:46:30 UTC |
Received | 22 May 2012, 17:35:06 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 928765 |
Run time | 29 days 15 hours 17 min 1 sec |
CPU time | 29 days 15 hours 17 min 1 sec |
Validate state | Invalid |
Credit | 5,598.72 |
Device peak FLOPS | 1.82 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.2.19</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 17:01:32 (3180): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 03:01:58 (7196): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 19:36:35 (3472): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
19:36:36 (3472): No heartbeat from core client for 30 sec - exiting 19:36:37 (3472): No heartbeat from core client for 30 sec - exiting 19:36:38 (3472): No heartbeat from core client for 30 sec - exiting 19:36:39 (3472): No heartbeat from core client for 30 sec - exiting 19:36:40 (3472): No heartbeat from core client for 30 sec - exiting 19:36:41 (3472): No heartbeat from core client for 30 sec - exiting 19:36:42 (3472): No heartbeat from core client for 30 sec - exiting 19:36:43 (3472): No heartbeat from core client for 30 sec - exiting 19:36:44 (3472): No heartbeat from core client for 30 sec - exiting 19:36:45 (3472): No heartbeat from core client for 30 sec - exiting 19:39:30 (7948): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:39:31 (7948): No heartbeat from core client for 30 sec - exiting 19:39:32 (7948): No heartbeat from core client for 30 sec - exiting 19:39:33 (7948): No heartbeat from core client for 30 sec - exiting 19:39:35 (7948): No heartbeat from core client for 30 sec - exiting 19:39:36 (7948): No heartbeat from core client for 30 sec - exiting 19:39:37 (7948): No heartbeat from core client for 30 sec - exiting 19:39:38 (7948): No heartbeat from core client for 30 sec - exiting 19:39:39 (7948): No heartbeat from core client for 30 sec - exiting 19:39:40 (7948): No heartbeat from core client for 30 sec - exiting 19:39:41 (7948): No heartbeat from core client for 30 sec - exiting 19:43:15 (3540): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3036, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3036, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish 16:19:53 (3036): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3920, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3920, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3920, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3920, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
18 May 2012 09:17:16 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 466,560 | 2,445,212 | 5.2409 |
16 May 2012 17:26:57 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 440,640 | 2,322,645 | 5.2711 |
12 May 2012 18:54:01 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 414,720 | 2,207,848 | 5.3237 |
10 May 2012 23:04:18 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 388,800 | 2,089,810 | 5.3750 |
06 May 2012 15:01:13 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 362,880 | 1,945,325 | 5.3608 |
03 May 2012 20:46:27 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 336,960 | 1,790,879 | 5.3148 |
01 May 2012 08:57:08 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 311,040 | 1,655,922 | 5.3238 |
29 Apr 2012 02:53:58 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 285,120 | 1,520,443 | 5.3326 |
26 Apr 2012 21:31:43 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 259,200 | 1,386,872 | 5.3506 |
25 Apr 2012 01:23:52 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 233,280 | 1,260,440 | 5.4031 |
23 Apr 2012 04:16:55 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 207,360 | 1,117,363 | 5.3885 |
21 Apr 2012 04:36:33 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 181,440 | 971,640 | 5.3552 |
18 Apr 2012 12:12:03 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 155,520 | 838,503 | 5.3916 |
16 Apr 2012 00:29:05 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 129,600 | 677,910 | 5.2308 |
14 Apr 2012 05:53:19 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 103,680 | 541,505 | 5.2228 |
12 Apr 2012 06:26:45 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 77,760 | 408,582 | 5.2544 |
09 Apr 2012 10:41:34 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 51,840 | 273,473 | 5.2753 |
07 Apr 2012 14:23:04 | 928765 | 14365001 | hadcm3n_o1xx_2020_40_007858089_1 | 25,920 | 138,885 | 5.3582 |
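The page does not say how the Average (sec/TS) column is derived. However, every row is consistent with cumulative CPU time divided by the timestep reached. For example, 2,445,212 / 466,560 ≈ 5.2409. The minimal Python sketch below assumes that relationship and uses a few rows copied from the table.

It also computes a per-interval rate between two consecutive trickles. That rate is not shown on the page. The helper names `average_sec_per_ts` and `interval_sec_per_ts` are hypothetical, chosen for illustration only.

```python
# A few trickle rows copied from the table: (timestep, cumulative CPU seconds).
TRICKLES = [
    (25_920, 138_885),
    (51_840, 273_473),
    (440_640, 2_322_645),
    (466_560, 2_445_212),
]


def average_sec_per_ts(timestep: int, cpu_sec: int) -> float:
    """Assumed definition of the 'Average' column: cumulative CPU s / timesteps."""
    return cpu_sec / timestep


def interval_sec_per_ts(prev: tuple[int, int], curr: tuple[int, int]) -> float:
    """CPU seconds per timestep spent between two consecutive trickles."""
    (ts0, cpu0), (ts1, cpu1) = prev, curr
    return (cpu1 - cpu0) / (ts1 - ts0)


for ts, cpu in TRICKLES:
    print(f"{ts:>7} TS  {average_sec_per_ts(ts, cpu):.4f} sec/TS (cumulative)")

# The last two rows are consecutive trickles (16 May and 18 May 2012).
print(f"interval: {interval_sec_per_ts(TRICKLES[-2], TRICKLES[-1]):.4f} sec/TS")
```

Under this assumption the cumulative average smooths out changes in speed. The per-interval figure for the last two trickles (about 4.73 sec/TS) is noticeably lower than the cumulative 5.2409.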
©2024 cpdn.org