Name | hadcm3n_o7bh_2140_40_008269505_3 |
Workunit | 8424629 |
Created | 14 Jun 2013, 23:13:59 UTC |
Sent | 14 Jun 2013, 23:39:19 UTC |
Report deadline | 14 Sep 2013, 7:06:30 UTC |
Received | 3 Jul 2013, 23:12:57 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1242215 |
Run time | 10 days 7 hours 22 min 58 sec |
CPU time | 8 days 0 hours 40 min 21 sec |
Validate state | Invalid |
Credit | 9,331.20 |
Device peak FLOPS | 3.54 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.64</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4868, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 08:44:01 (5248): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - No 'heartbeat' from BOINC... 
08:44:04 (5248): No heartbeat from core client for 30 sec - exiting 08:44:05 (5248): No heartbeat from core client for 30 sec - exiting 08:44:06 (5248): No heartbeat from core client for 30 sec - exiting 08:44:07 (5248): No heartbeat from core client for 30 sec - exiting 08:44:08 (5248): No heartbeat from core client for 30 sec - exiting 08:44:09 (5248): No heartbeat from core client for 30 sec - exiting 08:44:10 (5248): No heartbeat from core client for 30 sec - exiting 08:44:11 (5248): No heartbeat from core client for 30 sec - exiting 08:44:12 (5248): No heartbeat from core client for 30 sec - exiting 08:44:13 (5248): No heartbeat from core client for 30 sec - exiting 08:44:14 (5248): No heartbeat from core client for 30 sec - exiting 08:44:15 (5248): No heartbeat from core client for 30 sec - exiting 08:44:16 (5248): No heartbeat from core client for 30 sec - exiting 08:44:17 (5248): No heartbeat from core client for 30 sec - exiting 08:44:18 (5248): No heartbeat from core client for 30 sec - exiting 08:44:19 (5248): No heartbeat from core client for 30 sec - exiting 08:44:20 (5248): No heartbeat from core client for 30 sec - exiting 08:44:21 (5248): No heartbeat from core client for 30 sec - exiting 08:44:22 (5248): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5612, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6708, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4852, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... MainError: 10:11:03 PM No files match the supplied pattern. MainError: 10:11:03 PM No files match the supplied pattern. Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6528, iMonCtr=1 Model crash detected, will try to restart... MainError: 06:31:33 AM No files match the supplied pattern. MainError: 06:31:33 AM No files match the supplied pattern. MainError: 02:26:52 PM No files match the supplied pattern. MainError: 02:26:52 PM No files match the supplied pattern. MainError: 10:17:19 PM No files match the supplied pattern. MainError: 10:17:19 PM No files match the supplied pattern. MainError: 06:03:32 AM No files match the supplied pattern. MainError: 06:03:32 AM No files match the supplied pattern. Suspended CPDN Monitor - Suspend request from BOINC... MainError: 03:36:19 AM No files match the supplied pattern. MainError: 03:36:19 AM No files match the supplied pattern. MainError: 11:50:48 AM No files match the supplied pattern. MainError: 11:50:48 AM No files match the supplied pattern. Suspended CPDN Monitor - Suspend request from BOINC... MainError: 08:27:07 PM No files match the supplied pattern. MainError: 08:27:07 PM No files match the supplied pattern. MainError: 04:33:20 AM No files match the supplied pattern. MainError: 04:33:20 AM No files match the supplied pattern. MainError: 12:39:59 AM No files match the supplied pattern. MainError: 12:39:59 AM No files match the supplied pattern. 
Error converting file to netcdf: dataout/o7bhka.ph11c10 Error converting file to netcdf: dataout/o7bhka.pg11c10 Error converting file to netcdf: dataout/o7bhka.pe11c10 MainError: 08:47:52 PM No files match the supplied pattern. MainError: 08:47:52 PM No files match the supplied pattern. BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed 
length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
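The stderr text above is long and repetitive, so a tally of each message type can make the failure sequence easier to read. The following is a minimal sketch, not part of BOINC or the CPDN tooling: the `MESSAGE_KINDS` labels and the `tally` helper are hypothetical names chosen for illustration, and the substrings are copied from the log above. `SAMPLE` is a short excerpt assembled from messages in that log, not the full text.

```python
from collections import Counter

# Hypothetical labels mapped to substrings copied from the stderr text above.
MESSAGE_KINDS = {
    "suspend": "Suspend request from BOINC",
    "no_heartbeat": "No heartbeat from core client",
    "restart": "Model crash detected, will try to restart",
    "no_files": "No files match the supplied pattern",
    "netcdf_error": "Error converting file to netcdf",
    "buffin_io": "BUFFIN: C I/O Error",
    "model_crashed": "Model crashed:",
}


def tally(stderr_text: str) -> Counter:
    """Count non-overlapping occurrences of each known message substring."""
    # Collapse whitespace so messages wrapped across lines still match.
    text = " ".join(stderr_text.split())
    return Counter({kind: text.count(needle) for kind, needle in MESSAGE_KINDS.items()})


# Short excerpt built from messages in the log above (not the full stderr).
SAMPLE = (
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=4868, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 "
    "Model crashed: STWORK : I/O error - PP fixed\n"
    "length header tmp/pipe_dummy 2048 "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)

if __name__ == "__main__":
    for kind, count in tally(SAMPLE).items():
        print(f"{kind}: {count}")
```

Pasting the full contents of the `<stderr_txt>` element into `tally` in place of `SAMPLE` would give per-message counts for the whole run.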
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
04 Jul 2013 14:02:58 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 777,600 | 838,755 | 1.0786 |
04 Jul 2013 14:02:58 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 751,680 | 809,899 | 1.0775 |
04 Jul 2013 14:02:58 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 725,760 | 781,088 | 1.0762 |
03 Jul 2013 02:03:48 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 699,840 | 752,300 | 1.0750 |
02 Jul 2013 16:06:42 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 673,920 | 723,843 | 1.0741 |
02 Jul 2013 16:06:42 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 648,000 | 694,666 | 1.0720 |
02 Jul 2013 11:59:11 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 622,080 | 667,179 | 1.0725 |
02 Jul 2013 11:07:19 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 596,160 | 639,842 | 1.0733 |
02 Jul 2013 11:07:19 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 570,240 | 612,092 | 1.0734 |
02 Jul 2013 11:07:19 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 544,320 | 584,049 | 1.0730 |
02 Jul 2013 10:23:37 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 518,400 | 555,826 | 1.0722 |
02 Jul 2013 10:12:52 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 492,480 | 527,574 | 1.0713 |
02 Jul 2013 10:07:33 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 466,560 | 499,281 | 1.0701 |
02 Jul 2013 09:52:13 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 440,640 | 471,081 | 1.0691 |
27 Jun 2013 22:57:57 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 414,720 | 443,167 | 1.0686 |
27 Jun 2013 22:57:57 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 388,800 | 415,548 | 1.0688 |
27 Jun 2013 22:57:57 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 362,880 | 387,829 | 1.0688 |
25 Jun 2013 21:26:35 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 336,960 | 360,121 | 1.0687 |
25 Jun 2013 02:48:34 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 311,040 | 332,258 | 1.0682 |
25 Jun 2013 02:48:34 | 1242215 | 15843183 | hadcm3n_o7bh_2140_40_008269505_3 | 285,120 | 304,595 | 1.0683 |
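
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. That reading is inferred from the numbers in the table, not taken from project documentation. A minimal sketch that checks it against a few rows copied from the table (the `sec_per_timestep` helper is a hypothetical name):

```python
# (timestep, cpu_time_sec, reported_average) copied from rows of the table above.
TRICKLES = [
    (777_600, 838_755, 1.0786),
    (751_680, 809_899, 1.0775),
    (725_760, 781_088, 1.0762),
    (285_120, 304_595, 1.0683),
]


def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in TRICKLES:
        computed = sec_per_timestep(cpu_time_sec, timestep)
        status = "ok" if abs(computed - reported) < 1e-9 else "mismatch"
        print(f"{timestep:>8} {cpu_time_sec:>8} {computed:.4f} {reported:.4f} {status}")
```

Consecutive trickles in the table are 25,920 timesteps apart, so the same helper can be applied to differences between adjacent rows to estimate the speed over one reporting interval instead of the running average.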
©2024 climateprediction.net