Name | hadcm3n_t61m_1940_40_007316440_1 |
Workunit | 7513870 |
Created | 29 Jun 2011, 3:17:14 UTC |
Sent | 29 Jun 2011, 3:23:22 UTC |
Report deadline | 28 Sep 2011, 10:50:33 UTC |
Received | 24 Jul 2011, 9:07:47 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1122757 |
Run time | 19 days 4 hours 48 min 23 sec |
CPU time | 18 days 23 hours 0 min 48 sec |
Validate state | Invalid |
Credit | 6,531.84 |
Device peak FLOPS | 1.69 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.12.33</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 21:09:57 (4216): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:44:57 (4296): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:45:39 (4148): No heartbeat from core client for 30 sec - exiting 22:45:40 (4148): No heartbeat from core client for 30 sec - exiting 22:45:41 (4148): No heartbeat from core client for 30 sec - exiting 22:45:42 (4148): No heartbeat from core client for 30 sec - exiting 22:45:43 (4148): No heartbeat from core client for 30 sec - exiting 22:45:44 (4148): No heartbeat from core client for 30 sec - exiting 22:45:46 (4148): No heartbeat from core client for 30 sec - exiting 22:45:47 (4148): No heartbeat from core client for 30 sec - exiting 22:45:48 (4148): No heartbeat from core client for 30 sec - exiting 22:45:49 (4148): No heartbeat from core client for 30 sec - exiting 22:45:50 (4148): No heartbeat from core client for 30 sec - exiting 22:45:51 (4148): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:46:31 (3012): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:47:57 (4340): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 10:27:43 (4524): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - No 'heartbeat' from BOINC... 
10:27:44 (4524): No heartbeat from core client for 30 sec - exiting 10:27:45 (4524): No heartbeat from core client for 30 sec - exiting 10:27:46 (4524): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 02:00:17 (6944): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 17:06:13 (10172): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold 17:12:23 (9720): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
17:12:24 (9720): No heartbeat from core client for 30 sec - exiting 17:12:25 (9720): No heartbeat from core client for 30 sec - exiting 17:12:26 (9720): No heartbeat from core client for 30 sec - exiting 17:12:27 (9720): No heartbeat from core client for 30 sec - exiting 17:14:40 (10432): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:15:37 (9584): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold 17:23:51 (9916): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:23:52 (9916): No heartbeat from core client for 30 sec - exiting 17:23:53 (9916): No heartbeat from core client for 30 sec - exiting 17:23:54 (9916): No heartbeat from core client for 30 sec - exiting 17:23:55 (9916): No heartbeat from core client for 30 sec - exiting 17:23:56 (9916): No heartbeat from core client for 30 sec - exiting 17:23:57 (9916): No heartbeat from core client for 30 sec - exiting 17:23:58 (9916): No heartbeat from core client for 30 sec - exiting 17:23:59 (9916): No heartbeat from core client for 30 sec - exiting 17:24:00 (9916): No heartbeat from core client for 30 sec - exiting 17:24:01 (9916): No heartbeat from core client for 30 sec - exiting 17:26:05 (6692): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:27:16 (10388): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
17:27:20 (10388): No heartbeat from core client for 30 sec - exiting 17:27:24 (10388): No heartbeat from core client for 30 sec - exiting 17:27:25 (10388): No heartbeat from core client for 30 sec - exiting 17:27:26 (10388): No heartbeat from core client for 30 sec - exiting 17:27:28 (10388): No heartbeat from core client for 30 sec - exiting 17:27:29 (10388): No heartbeat from core client for 30 sec - exiting 17:27:30 (10388): No heartbeat from core client for 30 sec - exiting 17:27:31 (10388): No heartbeat from core client for 30 sec - exiting 17:27:32 (10388): No heartbeat from core client for 30 sec - exiting 17:27:33 (10388): No heartbeat from core client for 30 sec - exiting 17:27:34 (10388): No heartbeat from core client for 30 sec - exiting Atmos Hold Restart file rename failed on atmos_restart.hold 17:33:57 (6392): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold CPDN Monitor - Quit request from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold 22:59:34 (2724): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:59:35 (2724): No heartbeat from core client for 30 sec - exiting 22:59:37 (2724): No heartbeat from core client for 30 sec - exiting 22:59:38 (2724): No heartbeat from core client for 30 sec - exiting 22:59:39 (2724): No heartbeat from core client for 30 sec - exiting 22:59:40 (2724): No heartbeat from core client for 30 sec - exiting 22:59:41 (2724): No heartbeat from core client for 30 sec - exiting 22:59:42 (2724): No heartbeat from core client for 30 sec - exiting 22:59:43 (2724): No heartbeat from core client for 30 sec - exiting 22:59:44 (2724): No heartbeat from core client for 30 sec - exiting 22:59:45 (2724): No heartbeat from core client for 30 sec - exiting 23:13:04 (6600): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
23:20:10 (5832): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:20:50 (5832): No heartbeat from core client for 30 sec - exiting 23:20:51 (5832): No heartbeat from core client for 30 sec - exiting 23:20:52 (5832): No heartbeat from core client for 30 sec - exiting 23:20:53 (5832): No heartbeat from core client for 30 sec - exiting 23:20:54 (5832): No heartbeat from core client for 30 sec - exiting 23:20:55 (5832): No heartbeat from core client for 30 sec - exiting 23:20:56 (5832): No heartbeat from core client for 30 sec - exiting 23:20:57 (5832): No heartbeat from core client for 30 sec - exiting 23:20:58 (5832): No heartbeat from core client for 30 sec - exiting 23:21:00 (5832): No heartbeat from core client for 30 sec - exiting 23:26:06 (5372): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:33:09 (1900): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:36:20 (2832): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:39:12 (5084): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:39:14 (5084): No heartbeat from core client for 30 sec - exiting 23:39:15 (5084): No heartbeat from core client for 30 sec - exiting 23:39:16 (5084): No heartbeat from core client for 30 sec - exiting 23:40:32 (4176): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:41:59 (1588): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:43:37 (5368): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:45:17 (5960): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
23:45:18 (5960): No heartbeat from core client for 30 sec - exiting 23:45:19 (5960): No heartbeat from core client for 30 sec - exiting 23:45:20 (5960): No heartbeat from core client for 30 sec - exiting 23:45:21 (5960): No heartbeat from core client for 30 sec - exiting 23:45:22 (5960): No heartbeat from core client for 30 sec - exiting 23:45:23 (5960): No heartbeat from core client for 30 sec - exiting 23:45:24 (5960): No heartbeat from core client for 30 sec - exiting 23:45:25 (5960): No heartbeat from core client for 30 sec - exiting 23:45:26 (5960): No heartbeat from core client for 30 sec - exiting 23:45:28 (5960): No heartbeat from core client for 30 sec - exiting 23:48:58 (5208): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:57:01 (6216): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:59:08 (5648): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:04:21 (5784): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:29:47 (6396): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 02:01:50 (4100): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2028, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2028, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2028, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2028, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2028, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2028, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
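The stderr text above repeats a handful of message types many times, so a per-type tally can be easier to read than the raw log. The following is a minimal sketch, not part of BOINC or CPDN tooling: the `PATTERNS` table, the `tally` helper, and the category names are hypothetical, and the message strings are copied from the log above.

```python
import re
from collections import Counter

# Hypothetical category names; the matched strings are taken from the stderr text above.
PATTERNS = {
    "no_heartbeat": re.compile(r"No heartbeat from core client for 30 sec - exiting"),
    "suspend_request": re.compile(r"Suspend request from BOINC"),
    "quit_request": re.compile(r"Quit request from BOINC"),
    "rename_failed": re.compile(r"Restart file rename failed"),
    "model_crash": re.compile(r"Model crash detected"),
}


def tally(stderr_text: str) -> Counter:
    """Count non-overlapping occurrences of each known message type."""
    return Counter(
        {name: len(pattern.findall(stderr_text)) for name, pattern in PATTERNS.items()}
    )


# A short excerpt assembled from messages in the log, used only as sample input.
sample = (
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "21:09:57 (4216): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Atmos Hold Restart file rename failed on atmos_restart.hold "
    "Model crash detected, will try to restart..."
)
counts = tally(sample)
```

Passing the full contents of the `<stderr_txt>` element instead of `sample` would give totals for this result. The CPDN Monitor lines ("No 'heartbeat' from BOINC...") are deliberately not counted as `no_heartbeat`, since they use different wording from the timestamped lines.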
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
25 Jul 2011 21:49:09 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 544,320 | 1,617,587 | 2.9718 |
25 Jul 2011 20:28:46 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 518,400 | 1,537,837 | 2.9665 |
25 Jul 2011 19:21:50 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 492,480 | 1,458,024 | 2.9606 |
25 Jul 2011 19:21:45 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 466,560 | 1,378,733 | 2.9551 |
25 Jul 2011 18:54:19 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 440,640 | 1,299,409 | 2.9489 |
25 Jul 2011 18:01:58 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 414,720 | 1,220,068 | 2.9419 |
25 Jul 2011 17:26:18 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 388,800 | 1,140,914 | 2.9344 |
25 Jul 2011 16:18:06 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 362,880 | 1,061,509 | 2.9252 |
25 Jul 2011 15:41:25 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 336,960 | 982,572 | 2.9160 |
25 Jul 2011 14:37:46 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 311,040 | 903,570 | 2.9050 |
25 Jul 2011 13:15:53 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 285,120 | 838,077 | 2.9394 |
25 Jul 2011 13:15:53 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 259,200 | 774,032 | 2.9862 |
25 Jul 2011 13:15:52 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 233,280 | 704,676 | 3.0207 |
25 Jul 2011 13:15:52 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 207,360 | 627,299 | 3.0252 |
25 Jul 2011 13:15:52 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 181,440 | 548,500 | 3.0230 |
10 Jul 2011 10:07:09 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 155,520 | 469,446 | 3.0186 |
09 Jul 2011 11:51:22 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 129,600 | 390,790 | 3.0154 |
08 Jul 2011 14:34:36 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 103,680 | 311,641 | 3.0058 |
07 Jul 2011 15:49:05 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 77,760 | 234,348 | 3.0137 |
02 Jul 2011 13:23:23 | 1122757 | 13026553 | hadcm3n_t61m_1940_40_007316440_1 | 51,840 | 156,044 | 3.0101 |
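
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the cumulative timestep count. The page itself does not state this; it is inferred from the numbers. A minimal sketch that checks the relationship on three rows copied from the table (the function name `sec_per_timestep` is hypothetical):

```python
# (timestep, cpu_time_sec) pairs copied from three rows of the trickle table
rows = [
    (544_320, 1_617_587),
    (518_400, 1_537_837),
    (51_840, 156_044),
]


def sec_per_timestep(timestep: int, cpu_time_sec: int) -> float:
    """Cumulative CPU seconds per cumulative timestep, rounded to 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


averages = [sec_per_timestep(ts, cpu) for ts, cpu in rows]
print(averages)  # [2.9718, 2.9665, 3.0101]

# Consecutive trickles in the table are 25,920 timesteps apart.
step_between_trickles = rows[0][0] - rows[1][0]
```

The computed values match the table's 2.9718, 2.9665, and 3.0101 for those rows.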