Task 17262061

Name	hadcm3n_sc31_1940_40_009112682_2
Workunit	9243018
Created	22 Oct 2014, 17:33:36 UTC
Sent	22 Oct 2014, 23:11:17 UTC
Report deadline	22 Jan 2015, 6:38:28 UTC
Received	31 Dec 2014, 15:09:44 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1261147
Run time	11 days 6 hours 42 min 55 sec
CPU time	9 days 21 hours 46 min 15 sec
Validate state	Invalid
Credit	7,464.96
Device peak FLOPS	2.81 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5924, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4944, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4868, iMonCtr=1
Model crash detected, will try to restart...
BUFFIN: C I/O Error feof - Unit 60 - Return code = 16
BUFFIN: C I/O Error feof - Unit 61 - Return code = 16
BUFFIN: C I/O Error feof - Unit 62 - Return code = 16
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/sc31ko.pjf0c10
Error converting file to netcdf: dataout/sc31ko.pif0c10
Error converting file to netcdf: dataout/sc31ko.pff0c10
Error converting file to netcdf: dataout/sc31ko.pcf0c10
Error converting file to netcdf: dataout/sc31ko.pbf0c10
Error converting file to netcdf: dataout/sc31ko.paf0c10
Error converting file to netcdf: dataout/sc31ka.phf0c10
Error converting file to netcdf: dataout/sc31ka.pgf0c10
Error converting file to netcdf: dataout/sc31ka.pef0c10
Error converting file to netcdf: dataout/sc31ka.pdf0c10
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4024, iMonCtr=1
Model crash detected, will try to restart...
21:10:25 (4352): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:22:07 (20264): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3236, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3236, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3236, iMonCtr=1
Model crash detected, will try to restart...
22:10:11 (5756): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
19:06:43 (4344): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:40:02 (4140): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:30:36 (4440): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 11 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6460, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6460, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6460, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6460, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6460, iMonCtr=1
Model crash detected, will try to restart...
Signal 11 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6460, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
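The stderr repeats a small set of messages (restarts, Signal 11 crashes, missing heartbeats, suspend/quit requests, BUFFIN and netCDF conversion errors), so tallying them is a quicker way to see what happened to this task than reading the log line by line. The following is a minimal Python sketch, assuming the message texts appear exactly as printed above; `tally_events`, `MESSAGES` and the category names are hypothetical labels for illustration, not BOINC or CPDN terminology.

```python
from collections import Counter

# Message fragments copied verbatim from the stderr above. The category
# names on the left are hypothetical labels, not BOINC or CPDN terminology.
MESSAGES = {
    "restart": "Model crash detected, will try to restart",
    "signal_11": "Signal 11 received",
    "no_heartbeat": "No heartbeat from core client",
    "suspend": "Suspend request from BOINC",
    "quit": "Quit request from BOINC",
    "buffin_feof": "BUFFIN: C I/O Error feof",
    "netcdf_error": "Error converting file to netcdf",
    "gave_up": "Sorry, too many model crashes!",
}


def tally_events(stderr_text: str) -> Counter:
    """Count non-overlapping occurrences of each known message fragment."""
    return Counter({name: stderr_text.count(fragment)
                    for name, fragment in MESSAGES.items()})


if __name__ == "__main__":
    # Short excerpt taken from the end of the stderr above.
    excerpt = (
        "Signal 11 received, exiting...\n"
        "Called boinc_finish\n"
        "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
        "checkPID=0, selfPID=6460, iMonCtr=1\n"
        "Model crash detected, will try to restart...\n"
        "Sorry, too many model crashes! :-(\n"
        "Called boinc_finish\n"
    )
    # Print only the categories that actually occur in the excerpt.
    for name, n in tally_events(excerpt).most_common():
        if n:
            print(f"{name}: {n}")
```

Plain substring counting is used instead of regular expressions because the messages are fixed strings; note that "No heartbeat from core client" deliberately does not match the companion line "CPDN Monitor - No 'heartbeat' from BOINC...", so each heartbeat timeout is counted once.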

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/timestep)
29 Dec 2014 19:30:46	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	622,080	829,089	1.3328
29 Dec 2014 01:38:02	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	596,160	794,154	1.3321
27 Dec 2014 20:40:42	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	570,240	759,067	1.3311
24 Dec 2014 16:07:59	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	544,320	724,006	1.3301
21 Dec 2014 18:58:05	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	518,400	689,384	1.3298
20 Dec 2014 01:28:58	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	492,480	655,378	1.3308
16 Dec 2014 23:28:19	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	466,560	621,144	1.3313
14 Dec 2014 19:06:24	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	440,640	586,852	1.3318
11 Dec 2014 23:48:52	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	414,720	552,300	1.3317
08 Dec 2014 01:16:52	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	388,800	517,401	1.3308
07 Dec 2014 03:46:49	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	362,880	482,785	1.3304
03 Dec 2014 04:26:52	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	336,960	448,620	1.3314
30 Nov 2014 02:07:26	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	311,040	413,714	1.3301
28 Nov 2014 01:55:41	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	285,120	378,885	1.3289
25 Nov 2014 01:09:47	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	259,200	344,102	1.3276
21 Nov 2014 01:57:33	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	233,280	309,246	1.3256
16 Nov 2014 19:01:21	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	207,360	274,524	1.3239
15 Nov 2014 19:05:25	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	181,440	240,242	1.3241
11 Nov 2014 02:14:19	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	155,520	205,704	1.3227
06 Nov 2014 00:25:21	1261147	17262061	hadcm3n_sc31_1940_40_009112682_2	129,600	171,169	1.3207
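The Average column appears to be cumulative CPU time divided by timestep; for the newest row, 829,089 / 622,080 ≈ 1.3328. The Python sketch below assumes that interpretation and re-derives the column from the rows above, checks the spacing between trickles, and converts the page's duration strings to seconds so the final CPU time can be compared with the last trickle. `TRICKLES`, `check_averages`, `trickle_intervals` and `duration_to_seconds` are hypothetical helpers, not part of BOINC or CPDN.

```python
import re

# (timestep, cumulative CPU time in seconds, reported average sec/timestep),
# copied from the "Latest Trickles Received" table, newest first.
TRICKLES = [
    (622_080, 829_089, 1.3328), (596_160, 794_154, 1.3321),
    (570_240, 759_067, 1.3311), (544_320, 724_006, 1.3301),
    (518_400, 689_384, 1.3298), (492_480, 655_378, 1.3308),
    (466_560, 621_144, 1.3313), (440_640, 586_852, 1.3318),
    (414_720, 552_300, 1.3317), (388_800, 517_401, 1.3308),
    (362_880, 482_785, 1.3304), (336_960, 448_620, 1.3314),
    (311_040, 413_714, 1.3301), (285_120, 378_885, 1.3289),
    (259_200, 344_102, 1.3276), (233_280, 309_246, 1.3256),
    (207_360, 274_524, 1.3239), (181_440, 240_242, 1.3241),
    (155_520, 205_704, 1.3227), (129_600, 171_169, 1.3207),
]

# Averages are printed to 4 decimal places, so rounding alone can
# explain a difference of up to 0.00005.
ROUNDING_TOL = 5e-5

UNIT_SECONDS = {"day": 86_400, "hour": 3_600, "min": 60, "sec": 1}


def check_averages(rows, tol=ROUNDING_TOL):
    """Return rows whose reported average is not cpu_time / timestep within tol."""
    return [row for row in rows if abs(row[1] / row[0] - row[2]) > tol]


def trickle_intervals(rows):
    """(timesteps, CPU seconds) elapsed between consecutive trickles, oldest first."""
    ordered = sorted(rows)  # tuples sort by timestep first
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(ordered, ordered[1:])]


def duration_to_seconds(text):
    """Convert strings like '9 days 21 hours 46 min 15 sec' to seconds."""
    return sum(int(value) * UNIT_SECONDS[unit]
               for value, unit in re.findall(r"(\d+)\s+(day|hour|min|sec)", text))


if __name__ == "__main__":
    print("rows with mismatched averages:", check_averages(TRICKLES))
    print("timesteps between trickles:", {dt for dt, _ in trickle_intervals(TRICKLES)})
    cpu_total = duration_to_seconds("9 days 21 hours 46 min 15 sec")
    run_total = duration_to_seconds("11 days 6 hours 42 min 55 sec")
    last_trickle_cpu = max(TRICKLES)[1]  # row with the largest timestep
    print("CPU seconds after last trickle:", cpu_total - last_trickle_cpu)
    print("CPU time / run time:", round(cpu_total / run_total, 3))
```

With the values on this page, every reported average agrees with CPU time / timestep to within rounding, trickles arrive every 25,920 timesteps, and the task's total CPU time of 855,975 s is 26,886 s more than the 829,089 s reported in the last trickle on 29 Dec 2014.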