Task 12900950

Name	hadcm3n_p7bz_1900_40_007227087_2
Workunit	7425327
Created	23 May 2011, 12:56:40 UTC
Sent	23 May 2011, 12:56:44 UTC
Report deadline	22 Aug 2011, 20:23:55 UTC
Received	1 Aug 2011, 6:38:18 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1073842
Run time	9 days 19 hours 28 min 29 sec
CPU time	9 days 3 hours 11 min 47 sec
Validate state	Invalid
Credit	4,665.60
Device peak FLOPS	2.71 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.43</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3604, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4404, iMonCtr=1
Model crash detected, will try to restart...
08:38:51 (4836): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4652, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4216, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5432, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5432, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
08:50:37 (4636): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5528, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4688, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1944, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5528, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3312, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4764, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3832, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4396, iMonCtr=1
Model crash detected, will try to restart...
07:59:34 (4336): No heartbeat from core client for 30 sec - exiting
07:59:35 (4336): No heartbeat from core client for 30 sec - exiting
07:59:36 (4336): No heartbeat from core client for 30 sec - exiting
07:59:37 (4336): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4432, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4256, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5140, selfPID=5140, iMonCtr=1
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4836, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4836, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4836, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4836, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4836, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4836, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
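
The stderr output above repeats a small set of messages many times. The following is a minimal sketch of how those messages could be tallied, assuming the log has been saved as plain text. The names `EVENT_PATTERNS` and `tally_events` are hypothetical helpers written for this illustration and are not part of BOINC or the CPDN application; the sample string is a short excerpt copied from the log.

```python
# Hypothetical helper for illustration only; not part of BOINC or CPDN.
# Each pattern is a literal substring that appears in the stderr output.
EVENT_PATTERNS = {
    "quit_request": "CPDN Monitor - Quit request from BOINC...",
    "no_heartbeat": "No heartbeat from core client",
    "model_crash": "Model crash detected, will try to restart...",
}


def tally_events(stderr_text):
    """Count how often each known message occurs in the stderr text."""
    return {name: stderr_text.count(pattern)
            for name, pattern in EVENT_PATTERNS.items()}


# Short excerpt copied from the log; works on flattened or line-broken text.
sample = (
    "CPDN Monitor - Quit request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=3604, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "08:38:51 (4836): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC..."
)

counts = tally_events(sample)
for name, count in counts.items():
    print(f"{name}: {count}")
```

Because the matching is by literal substring, the quoted form `No 'heartbeat' from BOINC` is not counted under `no_heartbeat`; it would need its own pattern if it should be tracked separately.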

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
29 Jul 2011 13:20:54	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	388,800	788,417	2.0278
26 Jul 2011 07:51:40	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	362,880	735,416	2.0266
25 Jul 2011 19:06:58	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	336,960	682,253	2.0247
25 Jul 2011 19:06:58	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	311,040	629,833	2.0249
25 Jul 2011 17:15:21	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	285,120	577,182	2.0243
25 Jul 2011 17:15:21	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	259,200	524,484	2.0235
01 Jul 2011 07:03:03	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	233,280	472,207	2.0242
24 Jun 2011 11:43:09	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	207,360	420,160	2.0262
22 Jun 2011 06:09:55	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	181,440	368,357	2.0302
15 Jun 2011 10:37:25	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	155,520	316,207	2.0332
10 Jun 2011 08:03:35	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	129,600	263,672	2.0345
08 Jun 2011 07:04:32	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	103,680	211,189	2.0369
06 Jun 2011 06:19:05	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	77,760	158,760	2.0417
31 May 2011 10:49:17	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	51,840	106,137	2.0474
26 May 2011 13:16:25	1073842	12900950	hadcm3n_p7bz_1900_40_007227087_2	25,920	52,059	2.0084