Task 11038498

Name	hadsm3dhet2_jqrf_006597677_0
Workunit	6801050
Created	15 Mar 2010, 12:04:10 UTC
Sent	27 Sep 2010, 5:41:34 UTC
Report deadline	9 Sep 2011, 11:01:34 UTC
Received	21 Dec 2010, 15:11:53 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1093542
Run time	32 days 4 hours 31 min
CPU time	28 days 20 hours 58 min 34 sec
Validate state	Invalid
Credit	1,687.14
Device peak FLOPS	1.60 GFLOPS
Application version	UK Met Office HadSM3 Slab Model v6.07 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3700, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3272, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2672, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2672, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2672, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1388, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1388, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1388, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1736, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1736, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1736, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3408, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2024, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4192, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1900, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1900, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CCPDN Monitor - Quit request from BOINC... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1300, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1300, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3216, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4080, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4080, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4080, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4080, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4080, iMonCtr=1 Model crash detected, will try to restart... CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4080, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
09 Oct 2010 23:31:59	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	183,634	310,199	1.6892
09 Oct 2010 23:31:59	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	172,832	292,324	1.6914
09 Oct 2010 23:31:59	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	162,030	274,801	1.6960
09 Oct 2010 23:31:59	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	151,228	256,996	1.6994
06 Oct 2010 03:45:43	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	140,426	239,285	1.7040
05 Oct 2010 04:42:22	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	129,624	221,581	1.7094
03 Oct 2010 19:27:18	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	118,822	204,210	1.7186
02 Oct 2010 19:03:16	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	108,020	185,824	1.7203
02 Oct 2010 13:18:09	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	97,218	167,692	1.7249
02 Oct 2010 03:07:36	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	86,416	148,909	1.7232
01 Oct 2010 13:46:38	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	75,614	130,141	1.7211
30 Sep 2010 11:07:44	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	64,812	111,969	1.7276
30 Sep 2010 06:08:07	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	54,010	94,070	1.7417
29 Sep 2010 13:13:32	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	43,208	75,546	1.7484
28 Sep 2010 14:12:30	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	32,406	57,479	1.7737
28 Sep 2010 06:39:04	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	21,604	38,267	1.7713
27 Sep 2010 20:57:51	1093542	11038498	hadsm3dhet2_jqrf_006597677_0	10,802	18,745	1.7353