Name | hadcm3n_o3yc_1940_40_008382604_0 |
Workunit | 8533463 |
Created | 1 Jun 2013, 5:27:09 UTC |
Sent | 2 Jun 2013, 15:31:42 UTC |
Report deadline | 1 Sep 2013, 22:58:53 UTC |
Received | 13 Jun 2013, 5:32:34 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1253770 |
Run time | 8 days 6 hours 15 min 27 sec |
CPU time | 8 days 4 hours 54 min 11 sec |
Validate state | Invalid |
Credit | 4,354.56 |
Device peak FLOPS | 1.38 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.64</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> 13:47:08 (4688): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold 13:50:20 (52): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold 03:35:17 (1704): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... 03:12:30 (36): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 22:24:37 (4476): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:18:42 (7368): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:18:43 (7368): No heartbeat from core client for 30 sec - exiting 07:18:44 (7368): No heartbeat from core client for 30 sec - exiting 07:18:45 (7368): No heartbeat from core client for 30 sec - exiting 07:18:46 (7368): No heartbeat from core client for 30 sec - exiting 07:18:47 (7368): No heartbeat from core client for 30 sec - exiting 07:18:49 (7368): No heartbeat from core client for 30 sec - exiting 07:18:50 (7368): No heartbeat from core client for 30 sec - exiting 07:18:51 (7368): No heartbeat from core client for 30 sec - exiting 07:18:52 (7368): No heartbeat from core client for 30 sec - exiting 07:18:53 (7368): No heartbeat from core client for 30 sec - exiting Atmos Hold Restart file rename failed on atmos_restart.hold CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
00:06:29 (4896): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 19:05:44 (6240): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:05:45 (6240): No heartbeat from core client for 30 sec - exiting 19:05:46 (6240): No heartbeat from core client for 30 sec - exiting 19:05:47 (6240): No heartbeat from core client for 30 sec - exiting 19:05:48 (6240): No heartbeat from core client for 30 sec - exiting 19:05:49 (6240): No heartbeat from core client for 30 sec - exiting 19:05:50 (6240): No heartbeat from core client for 30 sec - exiting 19:05:51 (6240): No heartbeat from core client for 30 sec - exiting 19:05:52 (6240): No heartbeat from core client for 30 sec - exiting 19:05:53 (6240): No heartbeat from core client for 30 sec - exiting 19:05:54 (6240): No heartbeat from core client for 30 sec - exiting 23:00:19 (4236): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:31:58 (7056): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:44:16 (5680): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 04:36:17 (4848): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4640, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4640, iMonCtr=1 Model crash detected, will try to restart... 19:11:46 (4640): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
21:02:01 (1792): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:10:53 (4508): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4840, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4840, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4840, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4840, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
11 Jun 2013 10:23:57 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 362,880 | 706,037 | 1.9456 |
10 Jun 2013 19:44:40 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 336,960 | 654,747 | 1.9431 |
10 Jun 2013 06:41:59 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 311,040 | 607,960 | 1.9546 |
09 Jun 2013 17:33:35 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 285,120 | 561,409 | 1.9690 |
09 Jun 2013 05:03:15 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 259,200 | 516,450 | 1.9925 |
08 Jun 2013 15:36:34 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 233,280 | 466,189 | 1.9984 |
07 Jun 2013 22:48:42 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 207,360 | 415,931 | 2.0058 |
07 Jun 2013 08:20:20 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 181,440 | 364,378 | 2.0083 |
06 Jun 2013 17:25:11 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 155,520 | 311,153 | 2.0007 |
05 Jun 2013 20:39:19 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 129,600 | 260,938 | 2.0134 |
05 Jun 2013 05:39:04 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 103,680 | 207,564 | 2.0020 |
04 Jun 2013 12:31:31 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 77,760 | 152,742 | 1.9643 |
03 Jun 2013 20:06:17 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 51,840 | 101,030 | 1.9489 |
03 Jun 2013 06:21:34 | 1253770 | 15814149 | hadcm3n_o3yc_1940_40_008382604_0 | 25,920 | 51,132 | 1.9727 |
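
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. That relationship is inferred from the figures in the table, not from any project documentation, so treat the following as a minimal sketch for checking it. The function name `seconds_per_timestep` and the `trickles` list are illustrative only; the three rows are copied from the table above.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_avg).
# Assumption: Average (sec/TS) = CPU Time (sec) / Timestep, rounded to 4 places.
trickles = [
    (362_880, 706_037, 1.9456),  # 11 Jun 2013 10:23:57
    (336_960, 654_747, 1.9431),  # 10 Jun 2013 19:44:40
    (25_920, 51_132, 1.9727),    # 03 Jun 2013 06:21:34
]


def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Return cumulative CPU seconds per model timestep, rounded to 4 places."""
    return round(cpu_time_sec / timestep, 4)


for timestep, cpu_time_sec, reported in trickles:
    computed = seconds_per_timestep(cpu_time_sec, timestep)
    # Print the computed value next to the value reported in the table.
    print(f"{timestep:>8} TS  computed={computed:.4f}  reported={reported:.4f}")
```

Each trickle in the table is 25,920 timesteps after the previous one, so the same division can be applied to the differences between consecutive rows to estimate the per-interval rate instead of the cumulative one.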