Task 15807285

Name	hadcm3n_n4zl_1920_40_008377808_0
Workunit	8528667
Created	30 May 2013, 13:07:31 UTC
Sent	31 May 2013, 12:27:46 UTC
Report deadline	30 Aug 2013, 19:54:57 UTC
Received	5 Jun 2013, 22:09:03 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1211982
Run time	5 days 5 hours 39 min 6 sec
CPU time	4 days 20 hours 5 min 6 sec
Validate state	Invalid
Credit	4,354.56
Device peak FLOPS	3.54 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
11:22:39 (10188): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
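The stderr log is easier to audit if you count its recurring messages. The following is a minimal Python sketch that assumes the `<stderr_txt>` body is pasted in verbatim. The marker strings are substrings copied from this log, not an official CPDN or BOINC message list. `summarize` and `exit_code_hex` are hypothetical helper names.

```python
# The <stderr_txt> body of this task, copied verbatim, one message per line.
STDERR_TEXT = """\
11:22:39 (10188): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
"""

# Substrings taken from the log above (assumed markers, not an official list).
MARKERS = {
    "crash": "Model crash detected",
    "signal": "Signal 22 received",
    "finish": "Called boinc_finish",
    "suspend": "Suspend request from BOINC",
    "gave_up": "Sorry, too many model crashes!",
}


def summarize(text: str) -> dict:
    """Count how often each marker substring occurs anywhere in the log text."""
    return {key: text.count(marker) for key, marker in MARKERS.items()}


def exit_code_hex(code: int) -> str:
    """Format an exit status the way the page does, e.g. 22 -> '0x00000016'."""
    return f"{code:#010x}"


if __name__ == "__main__":
    print(summarize(STDERR_TEXT))
    print(exit_code_hex(22))  # → 0x00000016
```

For this task the sketch counts 6 detected model crashes, 6 "Signal 22" exits, 7 calls to boinc_finish, and 2 suspend requests before the monitor gave up. The log alone does not say whether six is a fixed restart limit.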

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
05 Jun 2013 22:11:04	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	362,880	417,537	1.1506
05 Jun 2013 09:30:44	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	336,960	387,764	1.1508
05 Jun 2013 00:31:49	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	311,040	357,806	1.1504
04 Jun 2013 15:07:41	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	285,120	327,868	1.1499
04 Jun 2013 06:18:45	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	259,200	297,909	1.1493
03 Jun 2013 21:17:41	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	233,280	267,955	1.1486
03 Jun 2013 12:51:57	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	207,360	238,128	1.1484
03 Jun 2013 03:54:43	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	181,440	208,227	1.1476
02 Jun 2013 17:52:16	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	155,520	178,320	1.1466
02 Jun 2013 08:58:48	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	129,600	148,431	1.1453
02 Jun 2013 00:05:30	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	103,680	118,693	1.1448
01 Jun 2013 15:11:55	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	77,760	89,285	1.1482
01 Jun 2013 06:27:13	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	51,840	59,846	1.1544
31 May 2013 21:28:07	1211982	15807285	hadcm3n_n4zl_1920_40_008377808_0	25,920	29,889	1.1531
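The Average (sec/TS) column appears to be cumulative CPU time divided by cumulative timestep. For example, 417,537 / 362,880 ≈ 1.1506. The minimal Python sketch below checks that reading against every row. It also checks that trickles arrive every 25,920 timesteps. This interpretation of the column and the helper names `seconds_per_timestep` and `check_trickles` are assumptions, not CPDN's documented definition. Rows are listed oldest first, the reverse of the table.

```python
# Rows from the trickle table, oldest first:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS).
TRICKLES = [
    (25_920, 29_889, 1.1531),
    (51_840, 59_846, 1.1544),
    (77_760, 89_285, 1.1482),
    (103_680, 118_693, 1.1448),
    (129_600, 148_431, 1.1453),
    (155_520, 178_320, 1.1466),
    (181_440, 208_227, 1.1476),
    (207_360, 238_128, 1.1484),
    (233_280, 267_955, 1.1486),
    (259_200, 297_909, 1.1493),
    (285_120, 327_868, 1.1499),
    (311_040, 357_806, 1.1504),
    (336_960, 387_764, 1.1508),
    (362_880, 417_537, 1.1506),
]


def seconds_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Assumed definition of the Average column: CPU seconds / timesteps so far."""
    return cpu_seconds / timestep


def check_trickles(rows, tolerance=1e-4):
    """Return rows whose reported average disagrees with CPU/timestep beyond
    the 4-decimal display precision, plus the set of timestep increments
    between consecutive trickles."""
    mismatches = [
        row for row in rows
        if abs(seconds_per_timestep(row[0], row[1]) - row[2]) >= tolerance
    ]
    increments = {later[0] - earlier[0] for earlier, later in zip(rows, rows[1:])}
    return mismatches, increments


if __name__ == "__main__":
    bad, increments = check_trickles(TRICKLES)
    print(bad)         # → []
    print(increments)  # → {25920}
```

Every row agrees with CPU time / timestep to the displayed four decimals, and the timestep advances by a constant 25,920 between trickles.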