Task 12736382

Name	hadcm3n_o1r0_1900_40_007197599_1
Workunit	7395879
Created	28 Mar 2011, 14:01:19 UTC
Sent	1 Apr 2011, 13:24:52 UTC
Report deadline	1 Jul 2011, 20:52:03 UTC
Received	1 May 2011, 20:40:55 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1053444
Run time	12 days 13 hours 57 min 6 sec
CPU time	10 days 17 hours 37 min 50 sec
Validate state	Invalid
Credit	4,976.64
Device peak FLOPS	2.07 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
21:01:19 (7612): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:02:13 (7612): No heartbeat from core client for 30 sec - exiting
21:02:14 (7612): No heartbeat from core client for 30 sec - exiting
21:02:15 (7612): No heartbeat from core client for 30 sec - exiting
21:02:16 (7612): No heartbeat from core client for 30 sec - exiting
21:03:55 (7728): No heartbeat from core client for 30 sec - exiting
21:03:56 (7728): No heartbeat from core client for 30 sec - exiting
21:03:57 (7728): No heartbeat from core client for 30 sec - exiting
21:03:58 (7728): No heartbeat from core client for 30 sec - exiting
21:03:59 (7728): No heartbeat from core client for 30 sec - exiting
21:04:00 (7728): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
forrtl: The requested operation cannot be performed on a file with a user-mapped section open.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1372, iMonCtr=1
Model crash detected, will try to restart...
18:18:04 (2504): No heartbeat from core client for 30 sec - exiting
18:18:05 (2504): No heartbeat from core client for 30 sec - exiting
18:18:06 (2504): No heartbeat from core client for 30 sec - exiting
18:18:07 (2504): No heartbeat from core client for 30 sec - exiting
18:18:08 (2504): No heartbeat from core client for 30 sec - exiting
18:18:09 (2504): No heartbeat from core client for 30 sec - exiting
18:18:10 (2504): No heartbeat from core client for 30 sec - exiting
18:18:11 (2504): No heartbeat from core client for 30 sec - exiting
18:18:13 (2504): No heartbeat from core client for 30 sec - exiting
18:18:14 (2504): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5884, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3512, iMonCtr=1
Model crash detected, will try to restart...
22:49:42 (6012): No heartbeat from core client for 30 sec - exiting
22:49:44 (6012): No heartbeat from core client for 30 sec - exiting
22:49:45 (6012): No heartbeat from core client for 30 sec - exiting
22:49:46 (6012): No heartbeat from core client for 30 sec - exiting
22:49:47 (6012): No heartbeat from core client for 30 sec - exiting
22:49:48 (6012): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
12:28:29 (6108): No heartbeat from core client for 30 sec - exiting
12:28:38 (6108): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:22:56 (2300): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:22:57 (2300): No heartbeat from core client for 30 sec - exiting
09:22:58 (2300): No heartbeat from core client for 30 sec - exiting
09:22:59 (2300): No heartbeat from core client for 30 sec - exiting
09:23:00 (2300): No heartbeat from core client for 30 sec - exiting
09:23:01 (2300): No heartbeat from core client for 30 sec - exiting
09:23:02 (2300): No heartbeat from core client for 30 sec - exiting
09:23:03 (2300): No heartbeat from core client for 30 sec - exiting
09:23:04 (2300): No heartbeat from core client for 30 sec - exiting
09:23:05 (2300): No heartbeat from core client for 30 sec - exiting
09:23:06 (2300): No heartbeat from core client for 30 sec - exiting
09:27:43 (3348): No heartbeat from core client for 30 sec - exiting
09:27:44 (3348): No heartbeat from core client for 30 sec - exiting
09:27:45 (3348): No heartbeat from core client for 30 sec - exiting
09:27:46 (3348): No heartbeat from core client for 30 sec - exiting
09:27:47 (3348): No heartbeat from core client for 30 sec - exiting
09:27:48 (3348): No heartbeat from core client for 30 sec - exiting
09:27:49 (3348): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4068, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4956, iMonCtr=1
Model crash detected, will try to restart...
14:38:51 (5928): No heartbeat from core client for 30 sec - exiting
14:38:57 (5928): No heartbeat from core client for 30 sec - exiting
14:38:58 (5928): No heartbeat from core client for 30 sec - exiting
14:38:59 (5928): No heartbeat from core client for 30 sec - exiting
14:39:00 (5928): No heartbeat from core client for 30 sec - exiting
14:39:01 (5928): No heartbeat from core client for 30 sec - exiting
14:39:02 (5928): No heartbeat from core client for 30 sec - exiting
14:39:03 (5928): No heartbeat from core client for 30 sec - exiting
14:39:04 (5928): No heartbeat from core client for 30 sec - exiting
14:39:05 (5928): No heartbeat from core client for 30 sec - exiting
14:39:06 (5928): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5256, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5992, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5992, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5992, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5992, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5992, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5992, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
30 Apr 2011 08:08:52	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	414,720	896,944	2.1628
29 Apr 2011 13:05:53	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	388,800	842,016	2.1657
28 Apr 2011 18:38:41	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	362,880	785,864	2.1656
25 Apr 2011 19:07:56	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	336,960	729,627	2.1653
23 Apr 2011 21:39:06	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	311,040	675,060	2.1703
21 Apr 2011 17:30:58	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	285,120	624,398	2.1899
20 Apr 2011 21:58:45	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	259,200	566,238	2.1846
20 Apr 2011 18:26:35	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	233,280	506,825	2.1726
20 Apr 2011 18:26:35	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	207,360	448,720	2.1640
20 Apr 2011 18:26:35	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	181,440	389,852	2.1487
20 Apr 2011 18:26:35	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	155,520	335,090	2.1546
12 Apr 2011 08:50:49	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	129,600	279,495	2.1566
10 Apr 2011 17:55:22	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	103,680	223,701	2.1576
08 Apr 2011 19:59:02	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	77,760	166,823	2.1454
04 Apr 2011 15:30:37	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	51,840	110,996	2.1411
03 Apr 2011 23:44:23	1053444	12736382	hadcm3n_o1r0_1900_40_007197599_1	25,920	57,384	2.2139
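
The "Average (sec/TS)" column above appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The sketch below checks that assumption against three rows copied from the table; the function name `seconds_per_timestep` is hypothetical and is not part of BOINC or CPDN.

```python
def seconds_per_timestep(cpu_time_sec: str, timestep: str) -> float:
    """Return CPU seconds per timestep, rounded to 4 decimal places.

    Both arguments are comma-grouped integers as shown in the trickle table.
    Assumption: Average (sec/TS) = CPU Time (sec) / Timestep.
    """
    cpu = int(cpu_time_sec.replace(",", ""))
    ts = int(timestep.replace(",", ""))
    return round(cpu / ts, 4)


# (Timestep, CPU Time (sec), Average (sec/TS)) copied from the table above.
rows = [
    ("414,720", "896,944", 2.1628),
    ("51,840", "110,996", 2.1411),
    ("25,920", "57,384", 2.2139),
]

for timestep, cpu_time, reported in rows:
    computed = seconds_per_timestep(cpu_time, timestep)
    # Compare with a small tolerance instead of exact float equality.
    matches = abs(computed - reported) < 1e-9
    print(timestep, cpu_time, computed, "matches" if matches else "differs")
```

If the assumption holds, each row prints "matches"; a "differs" row would suggest the server computes the average some other way, for example per trickle interval rather than cumulatively.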