Task 13956767

Name	hadcm3n_o3jf_1940_40_007694003_1
Workunit	7849111
Created	23 Jan 2012, 20:08:54 UTC
Sent	23 Jan 2012, 20:14:58 UTC
Report deadline	24 Apr 2012, 3:42:09 UTC
Received	16 Feb 2012, 19:28:21 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	950229
Run time	14 days 5 hours 57 min 39 sec
CPU time	13 days 21 hours 11 min 22 sec
Validate state	Invalid
Credit	5,287.68
Device peak FLOPS	1.98 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3952, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3804, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=216, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4132, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4132, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4132, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4132, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4132, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4132, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6108, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6108, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6108, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6108, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6108, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/ocean_restart.day after 11 attempts
CPDN Monitor - Quit request from BOINC...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o3jf_1940_40_007694003/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6268, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
16 Feb 2012 06:15:37	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	440,640	1,153,422	2.6176
15 Feb 2012 10:43:40	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	414,720	1,084,706	2.6155
14 Feb 2012 15:13:59	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	388,800	1,016,022	2.6132
13 Feb 2012 19:31:26	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	362,880	947,491	2.6110
13 Feb 2012 00:09:53	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	336,960	879,327	2.6096
12 Feb 2012 05:39:47	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	311,040	811,301	2.6083
11 Feb 2012 10:10:17	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	285,120	743,162	2.6065
10 Feb 2012 14:29:24	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	259,200	674,921	2.6039
09 Feb 2012 18:57:39	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	233,280	607,005	2.6020
08 Feb 2012 23:40:59	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	207,360	539,427	2.6014
08 Feb 2012 04:38:02	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	181,440	471,926	2.6010
07 Feb 2012 09:11:42	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	155,520	404,313	2.5997
06 Feb 2012 13:36:48	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	129,600	336,724	2.5982
05 Feb 2012 18:04:56	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	103,680	269,350	2.5979
04 Feb 2012 22:49:13	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	77,760	201,797	2.5951
04 Feb 2012 03:30:50	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	51,840	134,255	2.5898
03 Feb 2012 08:06:55	950229	13956767	hadcm3n_o3jf_1940_40_007694003_1	25,920	67,041	2.5865
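
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The sketch below checks that reading against three rows copied from the table; the function name `sec_per_timestep` and the four-decimal rounding are assumptions made for illustration, not part of the project's own code.

```python
def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Average CPU seconds per model timestep (assumed definition of the column)."""
    return round(cpu_time_sec / timestep, 4)


# (timestep, CPU time in seconds, reported average) copied from the trickle table
rows = [
    (440_640, 1_153_422, 2.6176),
    (414_720, 1_084_706, 2.6155),
    (25_920, 67_041, 2.5865),
]

for timestep, cpu_time, reported in rows:
    computed = sec_per_timestep(cpu_time, timestep)
    print(f"{timestep:>8} steps: computed {computed:.4f}, reported {reported:.4f}")
```

Under this reading, the slow rise from 2.5865 to 2.6176 sec/TS means the later timesteps cost slightly more CPU time than the early ones on this host.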