Task 18120337

Name	hadcm3n_xblf_1940_40_009151321_2
Workunit	9281657
Created	16 Mar 2015, 2:17:30 UTC
Sent	16 Mar 2015, 2:17:40 UTC
Report deadline	15 Jun 2015, 9:44:51 UTC
Received	12 May 2015, 17:39:37 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1327756
Run time	14 days 23 hours 56 min 21 sec
CPU time	14 days 2 hours 14 min 16 sec
Validate state	Invalid
Credit	8,398.08
Device peak FLOPS	2.80 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.4.42</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2584, iMonCtr=1 Model crash detected, will try to restart... 16:11:20 (3648): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5556, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3384, iMonCtr=1 Model crash detected, will try to restart... 18:10:12 (5224): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:10:13 (5224): No heartbeat from core client for 30 sec - exiting 11:36:51 (2708): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1028, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4532, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5908, iMonCtr=1 Model crash detected, will try to restart... CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5924, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 15:06:38 (4756): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3144, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5808, iMonCtr=1 Model crash detected, will try to restart... 19:03:28 (6120): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:31:58 (5072): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 23:59:51 (6968): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:25:35 (4856): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3696, iMonCtr=1 Model crash detected, will try to restart... 10:24:33 (5584): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 00:35:17 (1236): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:18:28 (7808): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 21:31:37 (6624): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3412, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4416, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3892, iMonCtr=1 Model crash detected, will try to restart... 11:35:02 (5484): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 
11:47:51 (5056): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:53:48 (3732): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:19:26 (4024): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5364, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5364, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 04:13:30 (3440): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 10:25:17 (5180): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6008, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 08:58:50 (7276): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
08:58:51 (7276): No heartbeat from core client for 30 sec - exiting 08:58:52 (7276): No heartbeat from core client for 30 sec - exiting 08:58:53 (7276): No heartbeat from core client for 30 sec - exiting 08:58:54 (7276): No heartbeat from core client for 30 sec - exiting 08:58:55 (7276): No heartbeat from core client for 30 sec - exiting 08:58:56 (7276): No heartbeat from core client for 30 sec - exiting 08:58:57 (7276): No heartbeat from core client for 30 sec - exiting 08:58:58 (7276): No heartbeat from core client for 30 sec - exiting 08:58:59 (7276): No heartbeat from core client for 30 sec - exiting 08:59:00 (7276): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5132, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5132, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5132, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5132, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5132, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5132, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! 
:-( cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/ocean_restart.day after 11 attempts Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6012, iMonCtr=1 Model crash detected, will try to restart... cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/ocean_restart.day after 11 attempts Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6012, iMonCtr=1 Model crash detected, will try to restart... cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/ocean_restart.day after 11 attempts Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6012, iMonCtr=1 Model crash detected, will try to restart... cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/ocean_restart.day after 11 attempts Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6012, iMonCtr=1 Model crash detected, will try to restart... cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/ocean_restart.day after 11 attempts CPDN Monitor - Quit request from BOINC... cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/ocean_restart.day after 11 attempts Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4480, iMonCtr=1 Model crash detected, will try to restart... cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_xblf_1940_40_009151321/dataout/ocean_restart.day after 11 attempts Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4480, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
08 May 2015 19:28:03	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	699,840	1,213,006	1.7333
08 May 2015 19:24:28	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	673,920	1,167,895	1.7330
08 May 2015 19:10:29	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	648,000	1,123,024	1.7331
27 Apr 2015 06:22:55	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	622,080	1,077,721	1.7324
26 Apr 2015 00:06:37	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	596,160	1,032,648	1.7322
24 Apr 2015 21:10:03	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	570,240	987,405	1.7316
20 Apr 2015 04:57:03	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	544,320	942,092	1.7308
18 Apr 2015 06:43:46	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	518,400	896,904	1.7301
16 Apr 2015 20:56:56	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	492,480	851,993	1.7300
15 Apr 2015 05:19:18	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	466,560	806,731	1.7291
13 Apr 2015 20:48:56	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	440,640	761,665	1.7285
10 Apr 2015 04:42:27	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	414,720	717,364	1.7298
07 Apr 2015 02:49:48	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	388,800	673,540	1.7324
05 Apr 2015 03:19:07	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	362,880	629,552	1.7349
04 Apr 2015 03:19:36	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	336,960	585,380	1.7372
02 Apr 2015 01:52:11	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	311,040	539,862	1.7357
31 Mar 2015 22:44:16	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	285,120	494,210	1.7333
29 Mar 2015 22:22:34	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	259,200	448,095	1.7288
28 Mar 2015 20:43:22	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	233,280	403,982	1.7317
27 Mar 2015 03:47:33	1327756	18120337	hadcm3n_xblf_1940_40_009151321_2	207,360	359,882	1.7355
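
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. This is inferred from the figures in the table, not stated by the page. A minimal Python sketch checks a few rows copied from above; the function name `sec_per_timestep` is made up for this illustration.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_avg).
rows = [
    (699_840, 1_213_006, 1.7333),
    (673_920, 1_167_895, 1.7330),
    (648_000, 1_123_024, 1.7331),
    (207_360, 359_882, 1.7355),
]


def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds divided by cumulative timesteps (assumed definition)."""
    return cpu_time_sec / timestep


for timestep, cpu_time_sec, reported in rows:
    computed = sec_per_timestep(cpu_time_sec, timestep)
    # Print the computed value next to the value shown in the table.
    print(f"{timestep:>9,}  {computed:.4f}  (table: {reported:.4f})")
```

If the assumed definition holds, each computed value should agree with the table to four decimal places.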