Task 16152046

Name	hadcm3n_856k_1980_40_008464672_1
Workunit	8615511
Created	20 Dec 2013, 22:13:16 UTC
Sent	20 Dec 2013, 22:13:23 UTC
Report deadline	22 Mar 2014, 5:40:34 UTC
Received	11 Apr 2014, 10:52:14 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1295275
Run time	10 days 6 hours 44 min 2 sec
CPU time	9 days 20 hours 47 min 39 sec
Validate state	Invalid
Credit	7,153.92
Device peak FLOPS	3.32 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.2.28</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
10:39:39 (5104): No heartbeat from core client for 30 sec - exiting
10:39:40 (5104): No heartbeat from core client for 30 sec - exiting
10:39:41 (5104): No heartbeat from core client for 30 sec - exiting
10:39:42 (5104): No heartbeat from core client for 30 sec - exiting
10:39:43 (5104): No heartbeat from core client for 30 sec - exiting
10:39:44 (5104): No heartbeat from core client for 30 sec - exiting
10:39:45 (5104): No heartbeat from core client for 30 sec - exiting
10:39:46 (5104): No heartbeat from core client for 30 sec - exiting
10:39:47 (5104): No heartbeat from core client for 30 sec - exiting
10:39:48 (5104): No heartbeat from core client for 30 sec - exiting
10:39:49 (5104): No heartbeat from core client for 30 sec - exiting
10:39:50 (5104): No heartbeat from core client for 30 sec - exiting
10:39:51 (5104): No heartbeat from core client for 30 sec - exiting
10:39:52 (5104): No heartbeat from core client for 30 sec - exiting
10:39:53 (5104): No heartbeat from core client for 30 sec - exiting
10:39:54 (5104): No heartbeat from core client for 30 sec - exiting
10:39:55 (5104): No heartbeat from core client for 30 sec - exiting
10:39:56 (5104): No heartbeat from core client for 30 sec - exiting
10:39:57 (5104): No heartbeat from core client for 30 sec - exiting
10:39:58 (5104): No heartbeat from core client for 30 sec - exiting
10:39:59 (5104): No heartbeat from core client for 30 sec - exiting
10:40:00 (5104): No heartbeat from core client for 30 sec - exiting
10:40:01 (5104): No heartbeat from core client for 30 sec - exiting
10:40:02 (5104): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:40:03 (5104): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4856, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5108, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4732, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2888, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3512, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2640, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=924, iMonCtr=1
Model crash detected, will try to restart...
11:24:38 (4900): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4148, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4020, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3656, iMonCtr=1
Model crash detected, will try to restart...
18:59:37 (2548): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4416, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2096, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2508, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4356, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3380, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3912, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4132, iMonCtr=1
Model crash detected, will try to restart...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4260, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1404, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3540, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3540, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3540, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3540, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3540, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3540, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
10 Apr 2014 19:09:38	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	596,160	842,457	1.4131
06 Apr 2014 23:43:58	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	570,240	805,543	1.4126
06 Apr 2014 06:23:48	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	544,320	768,363	1.4116
31 Mar 2014 19:00:33	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	518,400	731,322	1.4107
23 Mar 2014 13:04:30	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	492,480	695,123	1.4115
22 Mar 2014 11:40:31	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	466,560	660,745	1.4162
15 Mar 2014 14:37:30	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	440,640	624,708	1.4177
04 Mar 2014 08:04:12	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	414,720	587,725	1.4172
24 Feb 2014 12:49:26	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	388,800	553,111	1.4226
21 Feb 2014 14:23:13	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	362,880	515,867	1.4216
14 Feb 2014 10:56:16	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	336,960	478,635	1.4205
09 Feb 2014 23:25:25	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	311,040	441,591	1.4197
09 Feb 2014 13:21:29	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	285,120	404,582	1.4190
06 Feb 2014 14:01:34	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	259,200	368,692	1.4224
02 Feb 2014 10:53:50	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	233,280	332,164	1.4239
26 Jan 2014 12:07:34	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	207,360	295,442	1.4248
20 Jan 2014 14:06:38	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	181,440	258,664	1.4256
17 Jan 2014 08:32:08	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	155,520	221,807	1.4262
10 Jan 2014 10:32:53	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	129,600	184,793	1.4259
09 Jan 2014 09:52:25	1295275	16152046	hadcm3n_856k_1980_40_008464672_1	103,680	147,969	1.4272
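
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the cumulative timestep count. That relationship is an assumption inferred from the rows above, not something the page states. A minimal Python sketch that checks it against three rows copied from the table (the function name `average_sec_per_timestep` is hypothetical):

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average).
rows = [
    (596160, 842457, 1.4131),
    (570240, 805543, 1.4126),
    (103680, 147969, 1.4272),
]


def average_sec_per_timestep(cpu_time_sec, timestep):
    """Assumed formula: cumulative CPU seconds per model timestep, to 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


for timestep, cpu_time_sec, reported in rows:
    computed = average_sec_per_timestep(cpu_time_sec, timestep)
    # Show the recomputed value next to the value reported in the table.
    print(f"{timestep:>8}  computed={computed:.4f}  reported={reported:.4f}")
```

If the assumption holds, the computed and reported values agree for each row, which suggests the per-timestep cost stayed near 1.41 to 1.43 seconds for the whole run.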