Task 17306574

Name	hadcm3n_sayd_1940_40_009109754_1
Workunit	9240090
Created	26 Oct 2014, 13:59:22 UTC
Sent	26 Oct 2014, 14:16:41 UTC
Report deadline	25 Jan 2015, 21:43:52 UTC
Received	16 Nov 2014, 13:18:49 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1229213
Run time	5 days 11 hours 23 min 24 sec
CPU time	5 days 1 hours 59 min 59 sec
Validate state	Invalid
Credit	8,709.12
Device peak FLOPS	4.13 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=900, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5828, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5828, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6224, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5652, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5632, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5632, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6100, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6096, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5764, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4532, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4532, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4128, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5372, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5372, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4712, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4712, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3432, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3432, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3152, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3152, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6588, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6588, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6256, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6256, iMonCtr=1
Model crash detected, will try to restart...
Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.
tmp/pipe_dummy 2048
Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.
tmp/pipe_dummy 2048
Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.
tmp/pipe_dummy 2048
Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.
tmp/pipe_dummy 2048
Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.
tmp/pipe_dummy 2048
Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.
tmp/pipe_dummy 2048
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
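The stderr text above repeats a small set of messages many times, so a tally per message type can be easier to read than the raw log. The following is a minimal sketch of such a tally, not part of the BOINC or CPDN tooling. The function name `summarize_stderr` and the dictionary labels are hypothetical; the phrases themselves are copied verbatim from the log, and `sample` is a short excerpt rather than the full output.

```python
import re
from collections import Counter

# Hypothetical labels mapped to literal phrases copied from the stderr text.
PHRASES = {
    "restart_attempts": "Model crash detected, will try to restart...",
    "suspend_requests": "Suspended CPDN Monitor - Suspend request from BOINC...",
    "negative_pressure_crashes": "Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.",
}


def summarize_stderr(text: str) -> Counter:
    """Count occurrences of each known phrase in a stderr dump."""
    # Collapse all whitespace so the count works whether the log is
    # one message per line or flattened into a single long line.
    normalized = re.sub(r"\s+", " ", text)
    return Counter({label: normalized.count(phrase) for label, phrase in PHRASES.items()})


# Short excerpt of the log above, used only to exercise the function.
sample = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=900, iMonCtr=1 Model crash detected, will try to restart... "
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=5652, iMonCtr=1 Model crash detected, will try to restart... "
    "Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. tmp/pipe_dummy 2048 "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)

for label, count in summarize_stderr(sample).items():
    print(f"{label}: {count}")
```

Passing the full contents of the `<stderr_txt>` element instead of `sample` would give the totals for this task.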

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
16 Nov 2014 13:24:10	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	725,760	440,399	0.6068
15 Nov 2014 21:04:32	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	699,840	424,454	0.6065
15 Nov 2014 16:14:27	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	673,920	408,580	0.6063
14 Nov 2014 23:39:41	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	648,000	392,888	0.6063
14 Nov 2014 18:23:28	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	622,080	377,047	0.6061
12 Nov 2014 20:14:18	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	596,160	361,288	0.6060
11 Nov 2014 21:08:11	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	570,240	345,633	0.6061
10 Nov 2014 22:03:45	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	544,320	329,986	0.6062
09 Nov 2014 22:34:58	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	518,400	314,237	0.6062
09 Nov 2014 17:55:23	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	492,480	298,588	0.6063
09 Nov 2014 12:55:08	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	466,560	282,886	0.6063
08 Nov 2014 20:16:09	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	440,640	267,234	0.6065
08 Nov 2014 15:31:24	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	414,720	251,550	0.6066
07 Nov 2014 23:33:37	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	388,800	235,782	0.6064
06 Nov 2014 18:32:27	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	362,880	220,081	0.6065
05 Nov 2014 19:14:28	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	336,960	204,347	0.6064
04 Nov 2014 19:52:44	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	311,040	188,531	0.6061
03 Nov 2014 20:45:57	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	285,120	172,770	0.6060
02 Nov 2014 20:39:18	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	259,200	156,914	0.6054
02 Nov 2014 15:54:29	1229213	17306574	hadcm3n_sayd_1940_40_009109754_1	233,280	141,147	0.6051
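
In the rows above, the Average (sec/TS) column matches CPU Time (sec) divided by Timestep, rounded to four decimal places. The sketch below checks that relationship on three of the rows. The function name `average_sec_per_timestep` is hypothetical, and the rounding rule is inferred from the table values rather than taken from project documentation.

```python
def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Return CPU seconds per model timestep, rounded to 4 decimals."""
    if timestep <= 0:
        raise ValueError("timestep must be positive")
    return round(cpu_time_sec / timestep, 4)


# (timestep, cpu_time_sec, average as listed in the table) for three trickles.
rows = [
    (725_760, 440_399, 0.6068),
    (699_840, 424_454, 0.6065),
    (233_280, 141_147, 0.6051),
]

for timestep, cpu_time_sec, listed in rows:
    computed = average_sec_per_timestep(cpu_time_sec, timestep)
    # Compare with a small tolerance to avoid floating-point surprises.
    matches = abs(computed - listed) < 5e-5
    print(f"timestep={timestep}: computed={computed:.4f} listed={listed:.4f} match={matches}")
```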