Name | hadcm3n_y864_1900_40_007522664_4 |
Workunit | 7720139 |
Created | 2 Nov 2011, 9:16:34 UTC |
Sent | 2 Nov 2011, 9:18:02 UTC |
Report deadline | 1 Feb 2012, 16:45:13 UTC |
Received | 12 Dec 2011, 9:54:25 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 887618 |
Run time | 29 days 20 hours 7 min 36 sec |
CPU time | 20 days 0 hours 43 min 38 sec |
Validate state | Invalid |
Credit | 9,331.20 |
Device peak FLOPS | 1.83 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 23:32:27 (4908): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:33:25 (5664): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:35:24 (6092): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:38:38 (4832): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4520, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5284, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3560, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 23:57:53 (4424): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:11:14 (12088): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 04:16:27 (4556): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:21:08 (6888): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
08:25:37 (5632): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:30:26 (9292): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Error converting file to netcdf: dataout/y864ko.pjc5c10 Error converting file to netcdf: dataout/y864ko.pic5c10 Error converting file to netcdf: dataout/y864ko.pfc5c10 Error converting file to netcdf: dataout/y864ka.phc5c10 Error converting file to netcdf: dataout/y864ka.pgc5c10 Error converting file to netcdf: dataout/y864ka.pec5c10 Error converting file to netcdf: dataout/y864ka.pdc5c10 12:34:58 (10560): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:30:59 (12276): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:06:22 (7276): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:30:33 (10100): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
01:23:52 (10772): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 01:23:53 (10772): No heartbeat from core client for 30 sec - exiting 01:23:54 (10772): No heartbeat from core client for 30 sec - exiting 01:23:55 (10772): No heartbeat from core client for 30 sec - exiting 01:23:56 (10772): No heartbeat from core client for 30 sec - exiting 01:23:57 (10772): No heartbeat from core client for 30 sec - exiting 01:23:58 (10772): No heartbeat from core client for 30 sec - exiting 01:23:59 (10772): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9416, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9416, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9416, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9416, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9416, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9416, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
11 Dec 2011 08:51:48 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 777,600 | 1,730,199 | 2.2251 |
10 Dec 2011 02:39:44 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 751,680 | 1,676,859 | 2.2308 |
08 Dec 2011 23:05:14 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 725,760 | 1,620,526 | 2.2329 |
08 Dec 2011 23:05:14 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 699,840 | 1,557,472 | 2.2255 |
08 Dec 2011 23:05:14 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 673,920 | 1,494,015 | 2.2169 |
08 Dec 2011 23:05:14 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 648,000 | 1,433,418 | 2.2121 |
08 Dec 2011 23:05:14 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 622,080 | 1,377,841 | 2.2149 |
05 Dec 2011 00:08:50 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 596,160 | 1,314,957 | 2.2057 |
03 Dec 2011 13:57:26 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 570,240 | 1,256,428 | 2.2033 |
01 Dec 2011 15:28:07 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 544,320 | 1,200,390 | 2.2053 |
29 Nov 2011 16:18:02 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 518,400 | 1,143,855 | 2.2065 |
28 Nov 2011 23:22:50 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 492,480 | 1,086,527 | 2.2062 |
28 Nov 2011 23:22:50 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 466,560 | 1,030,802 | 2.2094 |
28 Nov 2011 23:22:50 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 440,640 | 975,503 | 2.2138 |
25 Nov 2011 03:04:31 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 414,720 | 920,001 | 2.2184 |
24 Nov 2011 10:16:58 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 388,800 | 862,762 | 2.2190 |
22 Nov 2011 17:00:58 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 362,880 | 802,650 | 2.2119 |
21 Nov 2011 11:29:13 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 336,960 | 744,970 | 2.2109 |
20 Nov 2011 23:26:10 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 311,040 | 689,044 | 2.2153 |
20 Nov 2011 23:26:10 | 887618 | 13584977 | hadcm3n_y864_1900_40_007522664_4 | 285,120 | 633,577 | 2.2221 |
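
The "Average (sec/TS)" column in the trickle table above appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimals. That relationship is not stated on the page; it is an assumption inferred from the figures. The sketch below checks it against the three most recent rows. The function name `avg_sec_per_timestep` is hypothetical and used only for illustration.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_avg).
trickles = [
    (777_600, 1_730_199, 2.2251),
    (751_680, 1_676_859, 2.2308),
    (725_760, 1_620_526, 2.2329),
]


def avg_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Assumed definition: cumulative CPU seconds per cumulative timestep."""
    return round(cpu_time_sec / timestep, 4)


for timestep, cpu_time_sec, reported in trickles:
    computed = avg_sec_per_timestep(cpu_time_sec, timestep)
    # Compare with a small tolerance rather than exact float equality.
    matches = abs(computed - reported) < 1e-4
    print(f"timestep={timestep:>7}  computed={computed:.4f}  "
          f"reported={reported:.4f}  match={matches}")
```

For these rows the computed values agree with the reported column, which is consistent with the assumed definition but does not confirm how the server calculates it. Consecutive rows are also 25,920 timesteps apart, so the same division applied to row-to-row differences would give a per-interval rate instead of a cumulative one.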