Name | hadcm3n_o732_2140_40_008269662_3 |
Workunit | 8424786 |
Created | 27 Apr 2013, 13:46:56 UTC |
Sent | 27 Apr 2013, 13:47:01 UTC |
Report deadline | 27 Jul 2013, 21:14:12 UTC |
Received | 18 May 2013, 2:35:26 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 913414 |
Run time | 14 days 6 hours 33 min 30 sec |
CPU time | 14 days 6 hours 33 min 30 sec |
Validate state | Invalid |
Credit | 9,331.20 |
Device peak FLOPS | 2.22 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
07:20:04 (279164): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:20:22 (279164): No heartbeat from core client for 30 sec - exiting
07:20:24 (279164): No heartbeat from core client for 30 sec - exiting
07:20:25 (279164): No heartbeat from core client for 30 sec - exiting
07:20:26 (279164): No heartbeat from core client for 30 sec - exiting
07:20:27 (279164): No heartbeat from core client for 30 sec - exiting
07:20:28 (279164): No heartbeat from core client for 30 sec - exiting
07:20:29 (279164): No heartbeat from core client for 30 sec - exiting
07:20:30 (279164): No heartbeat from core client for 30 sec - exiting
07:20:31 (279164): No heartbeat from core client for 30 sec - exiting
07:20:36 (279164): No heartbeat from core client for 30 sec - exiting
07:20:37 (279164): No heartbeat from core client for 30 sec - exiting
07:20:38 (279164): No heartbeat from core client for 30 sec - exiting
07:20:39 (279164): No heartbeat from core client for 30 sec - exiting
07:20:40 (279164): No heartbeat from core client for 30 sec - exiting
07:20:41 (279164): No heartbeat from core client for 30 sec - exiting
07:20:42 (279164): No heartbeat from core client for 30 sec - exiting
07:20:43 (279164): No heartbeat from core client for 30 sec - exiting
07:20:44 (279164): No heartbeat from core client for 30 sec - exiting
07:20:45 (279164): No heartbeat from core client for 30 sec - exiting
07:20:46 (279164): No heartbeat from core client for 30 sec - exiting
07:20:47 (279164): No heartbeat from core client for 30 sec - exiting
07:20:48 (279164): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
20:08:14 (299860): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:08:16 (299860): No heartbeat from core client for 30 sec - exiting
20:08:17 (299860): No heartbeat from core client for 30 sec - exiting
20:08:18 (299860): No heartbeat from core client for 30 sec - exiting
20:08:19 (299860): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=655592, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=655592, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=655592, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
MainError: 09:46:21 PM No files match the supplied pattern.
MainError: 09:46:21 PM No files match the supplied pattern.
MainError: 12:14:24 AM No files match the supplied pattern.
MainError: 12:14:24 AM No files match the supplied pattern.
MainError: 03:28:21 AM No files match the supplied pattern.
MainError: 03:28:21 AM No files match the supplied pattern.
MainError: 04:47:32 PM No files match the supplied pattern.
MainError: 04:47:32 PM No files match the supplied pattern.
MainError: 05:20:26 AM No files match the supplied pattern.
MainError: 05:20:26 AM No files match the supplied pattern.
MainError: 05:46:04 PM No files match the supplied pattern.
MainError: 05:46:05 PM No files match the supplied pattern.
MainError: 06:09:11 AM No files match the supplied pattern.
MainError: 06:09:11 AM No files match the supplied pattern.
MainError: 07:52:49 PM No files match the supplied pattern.
MainError: 07:52:49 PM No files match the supplied pattern.
MainError: 09:42:22 AM No files match the supplied pattern.
MainError: 09:42:22 AM No files match the supplied pattern.
MainError: 11:14:26 PM No files match the supplied pattern.
MainError: 11:14:26 PM No files match the supplied pattern.
CPDN Monitor - Quit request from BOINC...
Error converting file to netcdf: dataout/o732ka.ph11c10
Error converting file to netcdf: dataout/o732ka.pg11c10
Error converting file to netcdf: dataout/o732ka.pe11c10
MainError: 01:20:52 PM No files match the supplied pattern.
MainError: 01:20:52 PM No files match the supplied pattern.
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]> |
Latest Trickles Received
Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
18 May 2013 02:36:18 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 777,600 | 1,276,765 | 1.6419 |
16 May 2013 11:13:13 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 751,680 | 1,228,563 | 1.6344 |
15 May 2013 09:45:52 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 725,760 | 1,180,870 | 1.6271 |
14 May 2013 19:53:57 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 699,840 | 1,132,116 | 1.6177 |
14 May 2013 06:17:51 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 673,920 | 1,083,518 | 1.6078 |
14 May 2013 02:26:16 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 648,000 | 1,039,402 | 1.6040 |
13 May 2013 11:35:50 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 622,080 | 995,038 | 1.5995 |
12 May 2013 17:26:25 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 596,160 | 950,454 | 1.5943 |
11 May 2013 03:30:44 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 570,240 | 901,858 | 1.5815 |
10 May 2013 22:42:25 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 544,320 | 849,467 | 1.5606 |
10 May 2013 03:39:47 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 518,400 | 799,268 | 1.5418 |
09 May 2013 11:27:03 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 492,480 | 752,329 | 1.5276 |
09 May 2013 00:38:46 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 466,560 | 707,906 | 1.5173 |
08 May 2013 08:34:06 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 440,640 | 663,995 | 1.5069 |
08 May 2013 00:52:00 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 414,720 | 619,925 | 1.4948 |
07 May 2013 07:11:14 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 388,800 | 575,225 | 1.4795 |
07 May 2013 00:14:12 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 362,880 | 533,175 | 1.4693 |
06 May 2013 00:31:04 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 336,960 | 491,585 | 1.4589 |
05 May 2013 10:55:29 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 311,040 | 583,139 | 1.8748 |
04 May 2013 22:01:54 | 913414 | 15753197 | hadcm3n_o732_2140_40_008269662_3 | 285,120 | 538,860 | 1.8899 |
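
In the rows above, the Average (sec/TS) value appears to equal CPU Time (sec) divided by Timestep. The following is a minimal Python sketch for checking that relationship on a single row. It assumes the pipe-delimited layout shown in the table; the function names `parse_trickle_row` and `recomputed_average` are hypothetical helpers and are not part of BOINC or CPDN.

```python
def parse_trickle_row(row: str) -> dict:
    """Split one pipe-delimited trickle row into typed fields.

    Assumes the seven-column layout of the table above.
    """
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    time_sent, host_id, result_id, name, timestep, cpu_sec, avg = cells
    return {
        "time_sent": time_sent,
        "host_id": int(host_id),
        "result_id": int(result_id),
        "result_name": name,
        # Thousands separators are removed before converting to int.
        "timestep": int(timestep.replace(",", "")),
        "cpu_time_sec": int(cpu_sec.replace(",", "")),
        "avg_sec_per_ts": float(avg),
    }


def recomputed_average(row: dict) -> float:
    """Assumed relationship: CPU seconds divided by completed timesteps."""
    return row["cpu_time_sec"] / row["timestep"]


latest = parse_trickle_row(
    "18 May 2013 02:36:18 | 913414 | 15753197 | "
    "hadcm3n_o732_2140_40_008269662_3 | 777,600 | 1,276,765 | 1.6419 |"
)
print(latest["avg_sec_per_ts"], round(recomputed_average(latest), 4))
```

Running the same check over every row is one way to spot entries where the reported CPU time and average do not follow the trend of their neighbours.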