Task 15876419

Name	hadcm3n_39bc_1940_40_008261718_4
Workunit	8416842
Created	2 Jul 2013, 5:00:52 UTC
Sent	2 Jul 2013, 15:57:30 UTC
Report deadline	1 Oct 2013, 23:24:41 UTC
Received	14 Aug 2013, 16:19:03 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1042468
Run time	20 days 19 hours 26 min 59 sec
CPU time	18 days 23 hours 45 min 26 sec
Validate state	Invalid
Credit	12,130.56
Device peak FLOPS	2.47 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
06:15:07 (6148): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:14:06 (12604): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:11:45 (16072): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8852, iMonCtr=1
Model crash detected, will try to restart...
18:55:34 (6308): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:19:07 (5368): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
14:18:06 (1704): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/39bcko.pjh9c10
Error converting file to netcdf: dataout/39bcko.pih9c10
Error converting file to netcdf: dataout/39bcko.pfh9c10
Error converting file to netcdf: dataout/39bcka.phh9c10
Error converting file to netcdf: dataout/39bcka.pgh9c10
Error converting file to netcdf: dataout/39bcka.peh9c10
Error converting file to netcdf: dataout/39bcka.pdh9c10
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6544, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6544, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6544, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6544, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
19:54:03 (4088): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6740, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6740, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6740, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
24 Jul 2013 18:39:07	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	1,010,880	1,624,015	1.6065
24 Jul 2013 06:17:09	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	984,960	1,583,432	1.6076
23 Jul 2013 22:16:05	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	959,040	1,542,974	1.6089
23 Jul 2013 21:50:30	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	933,120	1,502,477	1.6102
23 Jul 2013 21:35:11	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	907,200	1,461,047	1.6105
23 Jul 2013 21:18:39	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	881,280	1,419,386	1.6106
23 Jul 2013 20:51:07	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	855,360	1,377,427	1.6103
23 Jul 2013 20:32:34	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	829,440	1,335,057	1.6096
23 Jul 2013 20:12:29	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	803,520	1,293,823	1.6102
23 Jul 2013 19:50:39	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	777,600	1,252,265	1.6104
23 Jul 2013 19:16:08	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	751,680	1,211,333	1.6115
23 Jul 2013 16:59:46	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	725,760	1,169,998	1.6121
23 Jul 2013 15:37:51	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	699,840	1,128,623	1.6127
23 Jul 2013 15:37:51	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	673,920	1,087,159	1.6132
23 Jul 2013 15:37:51	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	648,000	1,045,276	1.6131
23 Jul 2013 15:37:51	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	622,080	1,002,827	1.6121
23 Jul 2013 15:37:51	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	596,160	958,048	1.6070
23 Jul 2013 15:37:51	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	570,240	911,717	1.5988
23 Jul 2013 15:37:51	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	544,320	869,789	1.5979
23 Jul 2013 15:37:51	1042468	15876419	hadcm3n_39bc_1940_40_008261718_4	518,400	828,167	1.5975
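
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep count, rounded to four decimals. This is an inference from the listed values, not something the page states. A minimal sketch checking that assumption against three rows of the table (the function name `seconds_per_timestep` is hypothetical, chosen for illustration):

```python
def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals.

    Assumption: this is how the 'Average (sec/TS)' column is derived.
    """
    return round(cpu_time_sec / timestep, 4)


# (timestep, cpu_time_sec, listed_average) copied from the trickle table
rows = [
    (1_010_880, 1_624_015, 1.6065),
    (984_960, 1_583_432, 1.6076),
    (518_400, 828_167, 1.5975),
]

for timestep, cpu_time_sec, listed in rows:
    computed = seconds_per_timestep(cpu_time_sec, timestep)
    # Compare with a small tolerance rather than exact float equality
    matches = abs(computed - listed) < 1e-4
    print(timestep, cpu_time_sec, computed, listed, matches)
```

For these three rows the computed ratio agrees with the listed average to four decimals.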