Task 12923952

Name	hadcm3n_o4hm_1940_40_007265975_1
Workunit	7464215
Created	2 Jun 2011, 10:14:49 UTC
Sent	2 Jun 2011, 10:14:53 UTC
Report deadline	1 Sep 2011, 17:42:04 UTC
Received	23 Aug 2011, 20:42:08 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	701281
Run time	55 days 7 hours 36 min 33 sec
CPU time	49 days 5 hours 4 min 22 sec
Validate state	Invalid
Credit	10,886.40
Device peak FLOPS	1.81 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048
15:41:59 (3344): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: Not enough quota is available to process this command.
11:26:11 (5916): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:26:12 (5916): No heartbeat from core client for 30 sec - exiting
20:24:39 (2508): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:24:40 (2508): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2388, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2388, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2388, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2388, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2388, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2388, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
13 Aug 2011 14:54:08	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	907,200	4,166,042	4.5922
12 Aug 2011 00:25:03	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	881,280	4,041,346	4.5858
10 Aug 2011 10:07:53	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	855,360	3,911,675	4.5731
08 Aug 2011 19:00:39	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	829,440	3,780,958	4.5584
30 Jul 2011 06:06:28	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	803,520	3,664,600	4.5607
28 Jul 2011 13:42:02	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	777,600	3,543,104	4.5565
27 Jul 2011 02:26:05	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	751,680	3,427,860	4.5603
25 Jul 2011 22:53:14	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	725,760	3,313,665	4.5658
25 Jul 2011 20:54:39	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	699,840	3,188,015	4.5553
25 Jul 2011 19:12:15	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	673,920	3,061,440	4.5427
25 Jul 2011 18:54:35	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	648,000	2,935,287	4.5298
25 Jul 2011 17:38:19	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	622,080	2,817,220	4.5287
25 Jul 2011 16:08:18	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	596,160	2,701,342	4.5312
25 Jul 2011 13:10:43	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	570,240	2,579,855	4.5242
25 Jul 2011 13:10:43	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	544,320	2,453,254	4.5070
25 Jul 2011 13:10:43	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	518,400	2,326,756	4.4883
10 Jul 2011 07:26:14	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	492,480	2,200,803	4.4688
08 Jul 2011 18:07:11	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	466,560	2,074,644	4.4467
07 Jul 2011 15:42:29	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	440,640	1,948,676	4.4224
05 Jul 2011 13:05:45	701281	12923952	hadcm3n_o4hm_1940_40_007265975_1	414,720	1,825,497	4.4018
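
The "Average (sec/TS)" column in the table above appears to be the cumulative CPU time divided by the cumulative timestep count. The page does not state this; it is an assumption that matches the rows checked here. A minimal Python sketch that recomputes the figure for three rows copied from the table (the names `TRICKLES` and `seconds_per_timestep` are illustrative, not part of any BOINC or CPDN tool):

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_avg).
TRICKLES = [
    (907_200, 4_166_042, 4.5922),
    (881_280, 4_041_346, 4.5858),
    (414_720, 1_825_497, 4.4018),
]


def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep (assumed definition)."""
    return cpu_time_sec / timestep


def max_deviation(rows) -> float:
    """Largest absolute gap between the recomputed and reported averages."""
    return max(abs(seconds_per_timestep(cpu, ts) - reported)
               for ts, cpu, reported in rows)


if __name__ == "__main__":
    for ts, cpu, reported in TRICKLES:
        computed = seconds_per_timestep(cpu, ts)
        print(f"{ts:>9,}  computed={computed:.4f}  reported={reported:.4f}")
```

For these rows the recomputed values agree with the reported ones to within rounding at four decimal places. Consecutive trickles in the table are 25,920 timesteps apart.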