Name | hadcm3n_4n3x_1940_40_008306265_2 |
Workunit | 8457400 |
Created | 26 Mar 2013, 15:03:20 UTC |
Sent | 26 Mar 2013, 15:03:24 UTC |
Report deadline | 25 Jun 2013, 22:30:35 UTC |
Received | 24 Apr 2013, 9:04:24 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1147390 |
Run time | 5 days 19 hours 54 min 18 sec |
CPU time | 4 days 2 hours 23 min 33 sec |
Validate state | Invalid |
Credit | 1,866.24 |
Device peak FLOPS | 2.06 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.12.34</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> 02:11:54 (2588): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:10:51 (4712): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:10:00 (3612): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 09:08:41 (5864): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 17:07:43 (5580): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 17:45:45 (4132): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 11:52:29 (3900): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 18:51:20 (1076): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1312, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1312, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1312, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1312, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1312, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1312, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
Latest Trickles Received | ||||||
---|---|---|---|---|---|---|
Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
24 Apr 2013 09:05:13 | 1147390 | 15683860 | hadcm3n_4n3x_1940_40_008306265_2 | 155,520 | 324,646 | 2.0875 |
02 Apr 2013 12:45:16 | 1147390 | 15683860 | hadcm3n_4n3x_1940_40_008306265_2 | 129,600 | 271,979 | 2.0986 |
01 Apr 2013 04:53:03 | 1147390 | 15683860 | hadcm3n_4n3x_1940_40_008306265_2 | 103,680 | 219,478 | 2.1169 |
29 Mar 2013 21:44:57 | 1147390 | 15683860 | hadcm3n_4n3x_1940_40_008306265_2 | 77,760 | 165,850 | 2.1328 |
28 Mar 2013 11:24:49 | 1147390 | 15683860 | hadcm3n_4n3x_1940_40_008306265_2 | 51,840 | 110,484 | 2.1313 |
27 Mar 2013 12:53:57 | 1147390 | 15683860 | hadcm3n_4n3x_1940_40_008306265_2 | 25,920 | 55,132 | 2.1270 |
©2024 cpdn.org