Name | hadcm3n_p1gr_1940_40_007450462_2 |
Workunit | 7647965 |
Created | 13 Sep 2011, 15:58:23 UTC |
Sent | 13 Sep 2011, 16:01:57 UTC |
Report deadline | 13 Dec 2011, 23:29:08 UTC |
Received | 8 Oct 2011, 0:20:01 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 957386 |
Run time | 2 days 4 hours 21 min 9 sec |
CPU time | 1 day 20 hours 22 min 48 sec |
Validate state | Invalid |
Credit | 1,244.16 |
Device peak FLOPS | 2.93 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
12:36:37 (3960): No heartbeat from core client for 30 sec - exiting 12:36:38 (3960): No heartbeat from core client for 30 sec - exiting 12:36:39 (3960): No heartbeat from core client for 30 sec - exiting 12:36:40 (3960): No heartbeat from core client for 30 sec - exiting 12:36:41 (3960): No heartbeat from core client for 30 sec - exiting 12:36:42 (3960): No heartbeat from core client for 30 sec - exiting 12:36:43 (3960): No heartbeat from core client for 30 sec - exiting 12:36:44 (3960): No heartbeat from core client for 30 sec - exiting 12:36:45 (3960): No heartbeat from core client for 30 sec - exiting 12:36:46 (3960): No heartbeat from core client for 30 sec - exiting 12:36:47 (3960): No heartbeat from core client for 30 sec - exiting 12:36:48 (3960): No heartbeat from core client for 30 sec - exiting 12:36:49 (3960): No heartbeat from core client for 30 sec - exiting 12:36:50 (3960): No heartbeat from core client for 30 sec - exiting 12:36:51 (3960): No heartbeat from core client for 30 sec - exiting 12:36:52 (3960): No heartbeat from core client for 30 sec - exiting 12:36:53 (3960): No heartbeat from core client for 30 sec - exiting 12:36:54 (3960): No heartbeat from core client for 30 sec - exiting 12:36:55 (3960): No heartbeat from core client for 30 sec - exiting 12:36:56 (3960): No heartbeat from core client for 30 sec - exiting 12:36:57 (3960): No heartbeat from core client for 30 sec - exiting 12:36:58 (3960): No heartbeat from core client for 30 sec - exiting 12:36:59 (3960): No heartbeat from core client for 30 sec - exiting 12:37:00 (3960): No heartbeat from core client for 30 sec - exiting 12:37:01 (3960): No heartbeat from core client for 30 sec - exiting 12:37:02 (3960): No heartbeat from core client for 30 sec - exiting 12:37:03 (3960): No heartbeat from core client for 30 sec - exiting 12:37:04 (3960): No heartbeat from core client for 30 sec - exiting 12:37:05 (3960): No heartbeat from core client for 30 sec - exiting 12:37:06 (3960): No 
heartbeat from core client for 30 sec - exiting 12:37:07 (3960): No heartbeat from core client for 30 sec - exiting 12:37:08 (3960): No heartbeat from core client for 30 sec - exiting 12:37:09 (3960): No heartbeat from core client for 30 sec - exiting 12:37:10 (3960): No heartbeat from core client for 30 sec - exiting 12:37:11 (3960): No heartbeat from core client for 30 sec - exiting 12:37:12 (3960): No heartbeat from core client for 30 sec - exiting 12:37:13 (3960): No heartbeat from core client for 30 sec - exiting 12:37:14 (3960): No heartbeat from core client for 30 sec - exiting 12:37:15 (3960): No heartbeat from core client for 30 sec - exiting 12:37:16 (3960): No heartbeat from core client for 30 sec - exiting 12:37:17 (3960): No heartbeat from core client for 30 sec - exiting 12:37:18 (3960): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:37:19 (3960): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 06:53:32 (18452): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
10:57:53 (4964): No heartbeat from core client for 30 sec - exiting 10:57:54 (4964): No heartbeat from core client for 30 sec - exiting 10:57:55 (4964): No heartbeat from core client for 30 sec - exiting 10:57:56 (4964): No heartbeat from core client for 30 sec - exiting 10:57:57 (4964): No heartbeat from core client for 30 sec - exiting 10:57:58 (4964): No heartbeat from core client for 30 sec - exiting 10:57:59 (4964): No heartbeat from core client for 30 sec - exiting 10:58:00 (4964): No heartbeat from core client for 30 sec - exiting 10:58:01 (4964): No heartbeat from core client for 30 sec - exiting 10:58:02 (4964): No heartbeat from core client for 30 sec - exiting 10:58:03 (4964): No heartbeat from core client for 30 sec - exiting 10:58:04 (4964): No heartbeat from core client for 30 sec - exiting 10:58:05 (4964): No heartbeat from core client for 30 sec - exiting 10:58:06 (4964): No heartbeat from core client for 30 sec - exiting 10:58:07 (4964): No heartbeat from core client for 30 sec - exiting 10:58:08 (4964): No heartbeat from core client for 30 sec - exiting 10:58:09 (4964): No heartbeat from core client for 30 sec - exiting 10:58:10 (4964): No heartbeat from core client for 30 sec - exiting 10:58:11 (4964): No heartbeat from core client for 30 sec - exiting 10:58:12 (4964): No heartbeat from core client for 30 sec - exiting 10:58:13 (4964): No heartbeat from core client for 30 sec - exiting 10:58:14 (4964): No heartbeat from core client for 30 sec - exiting 10:58:15 (4964): No heartbeat from core client for 30 sec - exiting 10:58:16 (4964): No heartbeat from core client for 30 sec - exiting 10:58:17 (4964): No heartbeat from core client for 30 sec - exiting 10:58:18 (4964): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
10:58:19 (4964): No heartbeat from core client for 30 sec - exiting 11:40:43 (632): No heartbeat from core client for 30 sec - exiting 11:40:44 (632): No heartbeat from core client for 30 sec - exiting 11:40:45 (632): No heartbeat from core client for 30 sec - exiting 11:40:46 (632): No heartbeat from core client for 30 sec - exiting 11:40:47 (632): No heartbeat from core client for 30 sec - exiting 11:40:48 (632): No heartbeat from core client for 30 sec - exiting 11:40:49 (632): No heartbeat from core client for 30 sec - exiting 11:40:50 (632): No heartbeat from core client for 30 sec - exiting 11:40:51 (632): No heartbeat from core client for 30 sec - exiting 11:40:52 (632): No heartbeat from core client for 30 sec - exiting 11:40:53 (632): No heartbeat from core client for 30 sec - exiting 11:40:54 (632): No heartbeat from core client for 30 sec - exiting 11:40:55 (632): No heartbeat from core client for 30 sec - exiting 11:40:56 (632): No heartbeat from core client for 30 sec - exiting 11:40:57 (632): No heartbeat from core client for 30 sec - exiting 11:40:58 (632): No heartbeat from core client for 30 sec - exiting 11:40:59 (632): No heartbeat from core client for 30 sec - exiting 11:41:00 (632): No heartbeat from core client for 30 sec - exiting 11:41:01 (632): No heartbeat from core client for 30 sec - exiting 11:41:02 (632): No heartbeat from core client for 30 sec - exiting 11:41:03 (632): No heartbeat from core client for 30 sec - exiting 11:41:04 (632): No heartbeat from core client for 30 sec - exiting 11:41:05 (632): No heartbeat from core client for 30 sec - exiting 11:41:06 (632): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:41:07 (632): No heartbeat from core client for 30 sec - exiting Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5200, iMonCtr=1 Model crash detected, will try to restart... 
Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5200, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5200, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1908, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1908, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1908, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
25 Sep 2011 14:19:00 | 957386 | 13383812 | hadcm3n_p1gr_1940_40_007450462_2 | 103,680 | 157,753 | 1.5215 |
21 Sep 2011 10:22:36 | 957386 | 13383812 | hadcm3n_p1gr_1940_40_007450462_2 | 77,760 | 119,570 | 1.5377 |
18 Sep 2011 12:15:23 | 957386 | 13383812 | hadcm3n_p1gr_1940_40_007450462_2 | 51,840 | 79,763 | 1.5386 |
15 Sep 2011 13:27:59 | 957386 | 13383812 | hadcm3n_p1gr_1940_40_007450462_2 | 25,920 | 40,744 | 1.5719 |
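
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count at each trickle. The following is a minimal sketch that checks this assumption against the four rows above; the variable and function names are illustrative and are not part of any BOINC or CPDN tooling.

```python
# Trickle rows copied from the table above:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS)
trickles = [
    (103_680, 157_753, 1.5215),
    (77_760, 119_570, 1.5377),
    (51_840, 79_763, 1.5386),
    (25_920, 40_744, 1.5719),
]


def avg_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Assumed formula: cumulative CPU seconds divided by timesteps completed."""
    return round(cpu_time_sec / timestep, 4)


for timestep, cpu_time_sec, reported in trickles:
    computed = avg_sec_per_timestep(cpu_time_sec, timestep)
    # Print the computed value next to the value shown on the page.
    print(f"{timestep:>7} TS  computed={computed:.4f}  reported={reported:.4f}")
```

Under this assumption the computed values agree with the reported column to four decimal places, and the trickles arrive at a constant interval of 25,920 timesteps.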
©2024 cpdn.org