Task 15559916

Name hadcm3n_o1kg_2140_40_008270300_2
Workunit 8425424
Created 28 Jan 2013, 19:00:15 UTC
Sent 28 Jan 2013, 19:00:25 UTC
Report deadline 30 Apr 2013, 2:27:36 UTC
Received 6 Apr 2013, 20:19:21 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 22 (0x00000016) Unknown error code
Computer ID 1206904
Run time 29 days 23 hours 54 min 33 sec
CPU time 22 days 11 hours 30 min 18 sec
Validate state Invalid
Credit 9,331.20
Device peak FLOPS 2.40 GFLOPS
Application version UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr
<core_client_version>7.0.25</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
13:10:34 (5496): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:14:18 (4272): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7260, iMonCtr=1
Model crash detected, will try to restart...
10:31:35 (4276): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3684, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1148, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
11:28:17 (7136): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6340, iMonCtr=1
Model crash detected, will try to restart...
12:47:54 (5612): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6792, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4724, iMonCtr=1
Model crash detected, will try to restart...
14:32:42 (6604): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:21:48 (4276): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:10:22 (4988): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6644, iMonCtr=1
Model crash detected, will try to restart...
08:05:21 (5964): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:06:53 (1520): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:42:30 (1576): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:44:47 (2708): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3064, iMonCtr=1
Model crash detected, will try to restart...
12:38:14 (4824): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5804, iMonCtr=1
Model crash detected, will try to restart...
11:14:23 (5580): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2108, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
09:58:31 (4336): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6448, iMonCtr=1
Model crash detected, will try to restart...
15:05:45 (4212): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6956, iMonCtr=1
Model crash detected, will try to restart...
11:33:44 (5560): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:27:07 (4016): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6096, iMonCtr=1
Model crash detected, will try to restart...
12:15:13 (6792): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:19:15 (5664): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:31:21 (4116): No heartbeat from core client for 30 sec - exiting
16:31:22 (4116): No heartbeat from core client for 30 sec - exiting
16:31:23 (4116): No heartbeat from core client for 30 sec - exiting
16:31:24 (4116): No heartbeat from core client for 30 sec - exiting
16:31:25 (4116): No heartbeat from core client for 30 sec - exiting
16:31:26 (4116): No heartbeat from core client for 30 sec - exiting
16:31:27 (4116): No heartbeat from core client for 30 sec - exiting
16:31:28 (4116): No heartbeat from core client for 30 sec - exiting
16:31:29 (4116): No heartbeat from core client for 30 sec - exiting
16:31:30 (4116): No heartbeat from core client for 30 sec - exiting
16:31:31 (4116): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:31:32 (4116): No heartbeat from core client for 30 sec - exiting
16:31:34 (4116): No heartbeat from core client for 30 sec - exiting
11:19:15 (528): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6760, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5592, iMonCtr=1
Model crash detected, will try to restart...
12:46:06 (3736): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:49:28 (6000): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
15:03:09 (4784): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6448, iMonCtr=1
Model crash detected, will try to restart...
08:07:12 (4480): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7460, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
09:01:53 (4560): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5368, iMonCtr=1
Model crash detected, will try to restart...
10:34:17 (3928): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7072, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6092, iMonCtr=1
Model crash detected, will try to restart...
11:55:51 (6860): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
06:28:13 (6796): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4700, iMonCtr=1
Model crash detected, will try to restart...
08:43:04 (6556): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1828, iMonCtr=1
Model crash detected, will try to restart...
21:21:15 (6912): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4880, iMonCtr=1
Model crash detected, will try to restart...
15:26:44 (3712): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6804, iMonCtr=1
Model crash detected, will try to restart...
MainError:	06:10:21 PM	No files match the supplied pattern.
MainError:	06:10:22 PM	No files match the supplied pattern.
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4008, iMonCtr=1
Model crash detected, will try to restart...
10:01:49 (4968): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:01:51 (4968): No heartbeat from core client for 30 sec - exiting
10:01:52 (4968): No heartbeat from core client for 30 sec - exiting
MainError:	07:20:09 PM	No files match the supplied pattern.
MainError:	07:20:09 PM	No files match the supplied pattern.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7144, iMonCtr=1
Model crash detected, will try to restart...
06:20:51 (5404): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
MainError:	07:57:11 PM	No files match the supplied pattern.
MainError:	07:57:11 PM	No files match the supplied pattern.
13:05:00 (4376): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
C14:48:49 (4864): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
MainError:	04:24:16 PM	No files match the supplied pattern.
MainError:	04:24:16 PM	No files match the supplied pattern.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7500, iMonCtr=1
Model crash detected, will try to restart...
09:41:50 (5152): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:42:45 (6852): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6220, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4480, iMonCtr=1
Model crash detected, will try to restart...
MainError:	08:56:50 PM	No files match the supplied pattern.
MainError:	08:56:51 PM	No files match the supplied pattern.
14:21:09 (5112): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7096, iMonCtr=1
Model crash detected, will try to restart...
11:55:11 (5848): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
MainError:	06:17:54 PM	No files match the supplied pattern.
MainError:	06:17:54 PM	No files match the supplied pattern.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1436, iMonCtr=1
Model crash detected, will try to restart...
MainError:	10:16:11 AM	No files match the supplied pattern.
MainError:	10:16:11 AM	No files match the supplied pattern.
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6560, iMonCtr=1
Model crash detected, will try to restart...
MainError:	03:32:03 PM	No files match the supplied pattern.
MainError:	03:32:03 PM	No files match the supplied pattern.
13:20:11 (4196): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CMainError:	05:36:17 PM	No files match the supplied pattern.
MainError:	05:36:17 PM	No files match the supplied pattern.
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5888, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5560, iMonCtr=1
Model crash detected, will try to restart...
MainError:	08:09:54 PM	No files match the supplied pattern.
MainError:	08:09:54 PM	No files match the supplied pattern.
Error converting file to netcdf: dataout/o1kgka.ph11c10
Error converting file to netcdf: dataout/o1kgka.pg11c10
Error converting file to netcdf: dataout/o1kgka.pe11c10
MainError:	06:04:02 PM	No files match the supplied pattern.
MainError:	06:04:02 PM	No files match the supplied pattern.
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16

Model crashed: STWORK  : I/O error - PP fixed length header                                                                                                                                                                                                                    tmp/pipe_dummy                                                                  2048    
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16

Model crashed: STWORK  : I/O error - PP fixed length header                                                                                                                                                                                                                    tmp/pipe_dummy                                                                  2048    
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16

Model crashed: STWORK  : I/O error - PP fixed length header                                                                                                                                                                                                                    tmp/pipe_dummy                                                                  2048    
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16

Model crashed: STWORK  : I/O error - PP fixed length header                                                                                                                                                                                                                    tmp/pipe_dummy                                                                  2048    
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16

Model crashed: STWORK  : I/O error - PP fixed length header                                                                                                                                                                                                                    tmp/pipe_dummy                                                                  2048    
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16

Model crashed: STWORK  : I/O error - PP fixed length header                                                                                                                                                                                                                    tmp/pipe_dummy                                                                  2048    
Sorry, too many model crashes! :-(
Called boinc_finish

</stderr_txt>
]]>
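The stderr output above repeats a small set of message types many times. As a rough aid for reading it, the following sketch tallies lines by message type. The labels in `PATTERNS` and the `tally` helper are hypothetical names chosen for illustration; only the message texts themselves are taken from the log. It is not part of BOINC or the CPDN application.

```python
import re
from collections import Counter

# Hypothetical labels; the matched message texts come from the stderr log above.
PATTERNS = {
    "no_heartbeat": re.compile(r"No heartbeat from core client"),
    "model_restart": re.compile(r"Model crash detected, will try to restart"),
    "model_crashed": re.compile(r"Model crashed:"),
    "buffin_io_error": re.compile(r"BUFFIN: C I/O Error"),
    "no_files_match": re.compile(r"No files match the supplied pattern"),
    "netcdf_error": re.compile(r"Error converting file to netcdf"),
}


def tally(stderr_text):
    """Count stderr lines per message type; unmatched lines are ignored."""
    counts = Counter()
    for line in stderr_text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break  # each line is counted under at most one label
    return counts


# A few lines copied from the log above, used as sample input.
sample = "\n".join([
    "13:10:34 (5496): No heartbeat from core client for 30 sec - exiting",
    "CPDN Monitor - No 'heartbeat' from BOINC...",
    "Model crash detected, will try to restart...",
    "MainError:\t06:10:21 PM\tNo files match the supplied pattern.",
    "BUFFIN: C I/O Error feof - Unit 67 - Return code = 16",
    "Model crashed: STWORK  : I/O error - PP fixed length header",
])

for label, count in sorted(tally(sample).items()):
    print(label, count)
```

Running `tally` over the full text between `<stderr_txt>` and `</stderr_txt>` would give a count for each message type across the whole run.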
Latest Trickles Received
Time Sent (UTC)       Host ID  Result ID  Result Name                       Timestep  CPU Time (sec)  Average (sec/TS)
06 Apr 2013 18:14:58  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  777,600   1,966,542       2.5290
04 Apr 2013 20:13:45  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  751,680   1,895,871       2.5222
01 Apr 2013 17:37:39  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  725,760   1,827,204       2.5176
30 Mar 2013 15:35:37  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  699,840   1,759,135       2.5136
28 Mar 2013 10:19:10  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  673,920   1,691,459       2.5099
25 Mar 2013 18:22:10  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  648,000   1,624,958       2.5077
22 Mar 2013 21:00:57  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  622,080   1,556,581       2.5022
20 Mar 2013 16:25:48  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  596,160   1,488,148       2.4962
17 Mar 2013 19:57:51  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  570,240   1,426,002       2.5007
16 Mar 2013 19:43:39  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  544,320   1,367,845       2.5129
14 Mar 2013 18:14:11  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  518,400   1,302,709       2.5129
12 Mar 2013 12:02:27  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  492,480   1,234,809       2.5073
10 Mar 2013 09:11:33  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  466,560   1,164,509       2.4959
07 Mar 2013 18:51:22  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  440,640   1,094,259       2.4833
04 Mar 2013 17:54:43  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  414,720   1,025,351       2.4724
02 Mar 2013 10:49:05  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  388,800   957,395         2.4624
27 Feb 2013 21:34:10  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  362,880   888,509         2.4485
24 Feb 2013 20:47:42  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  336,960   820,902         2.4362
23 Feb 2013 07:16:50  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  311,040   753,358         2.4221
21 Feb 2013 18:31:43  1206904  15559916   hadcm3n_o1kg_2140_40_008270300_2  285,120   686,190         2.4067
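
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep count. The sketch below checks that reading against the newest and oldest rows shown; the function name `seconds_per_timestep` is a hypothetical helper for illustration, and the interpretation of the column is an assumption inferred from the numbers, not from CPDN documentation.

```python
def seconds_per_timestep(cpu_seconds, timestep):
    """Cumulative CPU seconds divided by the number of timesteps completed."""
    return cpu_seconds / timestep


# (timestep, CPU time in seconds, reported average) copied from the table above.
rows = [
    (777_600, 1_966_542, 2.5290),  # 06 Apr 2013, newest trickle
    (285_120, 686_190, 2.4067),    # 21 Feb 2013, oldest trickle shown
]

for timestep, cpu_seconds, reported in rows:
    computed = seconds_per_timestep(cpu_seconds, timestep)
    # Compare at the table's four-decimal precision.
    print(f"{timestep:>7}  computed={computed:.4f}  reported={reported:.4f}")
```

Both rows agree with the reported averages to four decimal places, which is consistent with the assumed definition.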

