Name | hadcm3n_ydtz_1900_40_007518381_1 |
Workunit | 7715856 |
Created | 28 Oct 2011, 12:57:54 UTC |
Sent | 20 Nov 2011, 21:29:35 UTC |
Report deadline | 20 Feb 2012, 4:56:46 UTC |
Received | 3 Dec 2011, 23:13:39 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 890338 |
Run time | 3 days 6 hours 59 min 22 sec |
CPU time | 3 days 4 hours 3 min 13 sec |
Validate state | Invalid |
Credit | 2,177.28 |
Device peak FLOPS | 2.66 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.12.34</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1760, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3460, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3732, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3164, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3576, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3576, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3864, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3136, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3960, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5856, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2776, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6972, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 17:58:06 (3456): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 17:27:34 (4632): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3524, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3124, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 16:51:14 (3164): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:51:15 (3164): No heartbeat from core client for 30 sec - exiting 16:51:16 (3164): No heartbeat from core client for 30 sec - exiting 16:51:17 (3164): No heartbeat from core client for 30 sec - exiting 16:51:18 (3164): No heartbeat from core client for 30 sec - exiting 16:51:19 (3164): No heartbeat from core client for 30 sec - exiting 16:51:20 (3164): No heartbeat from core client for 30 sec - exiting 16:51:21 (3164): No heartbeat from core client for 30 sec - exiting 16:51:22 (3164): No heartbeat from core client for 30 sec - exiting 16:51:23 (3164): No heartbeat from core client for 30 sec - exiting 16:51:24 (3164): No heartbeat from core client for 30 sec - exiting 16:51:25 (3164): No heartbeat from core client for 30 sec - exiting 16:51:26 (3164): No heartbeat from core client for 30 sec - exiting 16:51:27 (3164): No heartbeat from core client for 30 sec - exiting 16:51:28 (3164): No heartbeat from core client for 30 sec - exiting 16:51:29 (3164): No heartbeat from core client for 30 sec - exiting 16:51:30 (3164): No heartbeat from core client for 30 sec - exiting 16:51:31 (3164): No heartbeat from core client for 30 sec - exiting 16:51:32 (3164): No heartbeat from core client for 30 sec - exiting 16:51:33 (3164): No heartbeat from core client for 30 sec - exiting 16:51:34 (3164): No heartbeat from core client for 30 sec - exiting 16:51:35 (3164): No heartbeat from core client for 30 sec - exiting 
16:51:36 (3164): No heartbeat from core client for 30 sec - exiting 16:51:37 (3164): No heartbeat from core client for 30 sec - exiting 16:51:38 (3164): No heartbeat from core client for 30 sec - exiting 16:51:39 (3164): No heartbeat from core client for 30 sec - exiting 16:51:40 (3164): No heartbeat from core client for 30 sec - exiting 16:51:41 (3164): No heartbeat from core client for 30 sec - exiting 16:51:42 (3164): No heartbeat from core client for 30 sec - exiting 16:51:43 (3164): No heartbeat from core client for 30 sec - exiting 16:51:44 (3164): No heartbeat from core client for 30 sec - exiting 16:51:45 (3164): No heartbeat from core client for 30 sec - exiting 16:51:46 (3164): No heartbeat from core client for 30 sec - exiting 16:51:47 (3164): No heartbeat from core client for 30 sec - exiting 16:51:48 (3164): No heartbeat from core client for 30 sec - exiting 16:51:49 (3164): No heartbeat from core client for 30 sec - exiting 16:51:50 (3164): No heartbeat from core client for 30 sec - exiting 16:51:51 (3164): No heartbeat from core client for 30 sec - exiting 16:51:52 (3164): No heartbeat from core client for 30 sec - exiting 16:51:53 (3164): No heartbeat from core client for 30 sec - exiting 16:51:54 (3164): No heartbeat from core client for 30 sec - exiting 16:51:55 (3164): No heartbeat from core client for 30 sec - exiting 16:51:56 (3164): No heartbeat from core client for 30 sec - exiting 16:51:57 (3164): No heartbeat from core client for 30 sec - exiting 16:51:58 (3164): No heartbeat from core client for 30 sec - exiting 16:51:59 (3164): No heartbeat from core client for 30 sec - exiting 16:52:00 (3164): No heartbeat from core client for 30 sec - exiting 16:52:01 (3164): No heartbeat from core client for 30 sec - exiting 16:52:02 (3164): No heartbeat from core client for 30 sec - exiting 16:52:03 (3164): No heartbeat from core client for 30 sec - exiting 16:52:04 (3164): No heartbeat from core client for 30 sec - exiting
16:52:05 (3164): No heartbeat from core client for 30 sec - exiting 16:52:06 (3164): No heartbeat from core client for 30 sec - exiting 16:52:07 (3164): No heartbeat from core client for 30 sec - exiting 16:52:08 (3164): No heartbeat from core client for 30 sec - exiting 16:52:09 (3164): No heartbeat from core client for 30 sec - exiting 16:52:10 (3164): No heartbeat from core client for 30 sec - exiting 16:52:11 (3164): No heartbeat from core client for 30 sec - exiting 16:52:12 (3164): No heartbeat from core client for 30 sec - exiting 16:52:13 (3164): No heartbeat from core client for 30 sec - exiting 16:52:14 (3164): No heartbeat from core client for 30 sec - exiting 16:52:15 (3164): No heartbeat from core client for 30 sec - exiting 16:52:16 (3164): No heartbeat from core client for 30 sec - exiting 16:52:17 (3164): No heartbeat from core client for 30 sec - exiting 16:52:18 (3164): No heartbeat from core client for 30 sec - exiting 16:52:19 (3164): No heartbeat from core client for 30 sec - exiting 16:52:20 (3164): No heartbeat from core client for 30 sec - exiting 16:52:21 (3164): No heartbeat from core client for 30 sec - exiting 16:52:22 (3164): No heartbeat from core client for 30 sec - exiting 16:52:23 (3164): No heartbeat from core client for 30 sec - exiting 16:52:24 (3164): No heartbeat from core client for 30 sec - exiting 16:52:25 (3164): No heartbeat from core client for 30 sec - exiting 16:52:26 (3164): No heartbeat from core client for 30 sec - exiting 16:52:27 (3164): No heartbeat from core client for 30 sec - exiting 16:52:28 (3164): No heartbeat from core client for 30 sec - exiting 16:52:29 (3164): No heartbeat from core client for 30 sec - exiting 16:52:30 (3164): No heartbeat from core client for 30 sec - exiting 16:52:31 (3164): No heartbeat from core client for 30 sec - exiting 16:52:32 (3164): No heartbeat from core client for 30 sec - exiting 16:52:33 (3164): No heartbeat from core client for 30 sec - exiting
16:52:34 (3164): No heartbeat from core client for 30 sec - exiting 16:52:35 (3164): No heartbeat from core client for 30 sec - exiting 16:52:36 (3164): No heartbeat from core client for 30 sec - exiting 16:52:37 (3164): No heartbeat from core client for 30 sec - exiting 16:52:38 (3164): No heartbeat from core client for 30 sec - exiting 16:52:39 (3164): No heartbeat from core client for 30 sec - exiting 16:52:40 (3164): No heartbeat from core client for 30 sec - exiting 16:52:41 (3164): No heartbeat from core client for 30 sec - exiting 16:52:42 (3164): No heartbeat from core client for 30 sec - exiting 16:52:43 (3164): No heartbeat from core client for 30 sec - exiting 16:52:44 (3164): No heartbeat from core client for 30 sec - exiting 16:52:45 (3164): No heartbeat from core client for 30 sec - exiting 16:52:46 (3164): No heartbeat from core client for 30 sec - exiting 16:52:47 (3164): No heartbeat from core client for 30 sec - exiting 16:52:48 (3164): No heartbeat from core client for 30 sec - exiting Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1740, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1740, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1740, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1740, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1740, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1740, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
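The Stderr field above is a single run of concatenated messages, which makes it hard to see how often each kind of event occurred. The following is a minimal sketch, not part of the original page, of how the message types could be tallied. The `PATTERNS` table and the `count_events` helper are hypothetical names, and the patterns are inferred only from the message text visible in this log, not from any BOINC or CPDN specification.

```python
import re
from collections import Counter

# Hypothetical event patterns, inferred only from the message text in the
# Stderr field of this result page (an assumption, not a documented format).
PATTERNS = {
    "model_crash": re.compile(r"Model crash detected"),
    "suspend": re.compile(r"Suspend request from BOINC"),
    "quit": re.compile(r"Quit request from BOINC"),
    "no_heartbeat": re.compile(r"No heartbeat from core client"),
}


def count_events(stderr_text: str) -> Counter:
    """Count non-overlapping occurrences of each known message type."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(stderr_text))
    return counts


# A short excerpt assembled from messages that appear in the log above.
sample = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=1740, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "CPDN Monitor - Quit request from BOINC... "
    "16:51:14 (3164): No heartbeat from core client for 30 sec - exiting "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)

if __name__ == "__main__":
    for name, n in count_events(sample).items():
        print(f"{name}: {n}")
```

Passing the full contents of the `<stderr_txt>` element to `count_events` instead of `sample` would give the totals for this task.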
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
03 Dec 2011 22:17:07 | 890338 | 13542443 | hadcm3n_ydtz_1900_40_007518381_1 | 181,440 | 270,608 | 1.4914 |
03 Dec 2011 11:16:49 | 890338 | 13542443 | hadcm3n_ydtz_1900_40_007518381_1 | 155,520 | 229,650 | 1.4767 |
01 Dec 2011 22:24:09 | 890338 | 13542443 | hadcm3n_ydtz_1900_40_007518381_1 | 129,600 | 188,961 | 1.4580 |
28 Nov 2011 18:23:52 | 890338 | 13542443 | hadcm3n_ydtz_1900_40_007518381_1 | 103,680 | 150,201 | 1.4487 |
27 Nov 2011 12:01:34 | 890338 | 13542443 | hadcm3n_ydtz_1900_40_007518381_1 | 77,760 | 111,352 | 1.4320 |
26 Nov 2011 18:43:56 | 890338 | 13542443 | hadcm3n_ydtz_1900_40_007518381_1 | 51,840 | 75,960 | 1.4653 |
22 Nov 2011 19:02:50 | 890338 | 13542443 | hadcm3n_ydtz_1900_40_007518381_1 | 25,920 | 38,823 | 1.4978 |
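The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. This is an inference from the figures in the table, not a documented formula. A minimal sketch that checks it against every row (the `seconds_per_timestep` helper and the `TRICKLES` list are hypothetical names introduced here):

```python
def seconds_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Cumulative CPU seconds divided by cumulative timesteps, to 4 decimals."""
    return round(cpu_seconds / timestep, 4)


# (timestep, CPU time in seconds, reported average) copied from the table above.
TRICKLES = [
    (181_440, 270_608, 1.4914),
    (155_520, 229_650, 1.4767),
    (129_600, 188_961, 1.4580),
    (103_680, 150_201, 1.4487),
    (77_760, 111_352, 1.4320),
    (51_840, 75_960, 1.4653),
    (25_920, 38_823, 1.4978),
]

if __name__ == "__main__":
    for timestep, cpu_seconds, reported in TRICKLES:
        computed = seconds_per_timestep(cpu_seconds, timestep)
        print(f"{timestep:>7} timesteps: computed {computed:.4f}, reported {reported:.4f}")
```

The last trickle reports 270,608 CPU seconds, slightly below the task's total CPU time of 3 days 4 hours 3 min 13 sec (273,793 seconds), which is consistent with the task failing shortly after that trickle was sent.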
©2024 cpdn.org