Task 15523929

Name	hadcm3n_o5gf_2100_40_008254261_4
Workunit	8409385
Created	6 Jan 2013, 0:18:30 UTC
Sent	6 Jan 2013, 0:18:40 UTC
Report deadline	7 Apr 2013, 7:45:51 UTC
Received	18 Feb 2013, 1:00:03 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1253802
Run time	11 days 16 hours 31 min 53 sec
CPU time	11 days 5 hours 42 min 32 sec
Validate state	Invalid
Credit	5,287.68
Device peak FLOPS	3.11 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version> <![CDATA[ <message> - exit code 193 (0xc1) </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8644, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8644, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8644, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 19:40:33 (8844): No heartbeat from core client for 30 sec - exiting 19:40:34 (8844): No heartbeat from core client for 30 sec - exiting 19:40:35 (8844): No heartbeat from core client for 30 sec - exiting 19:40:36 (8844): No heartbeat from core client for 30 sec - exiting 19:40:37 (8844): No heartbeat from core client for 30 sec - exiting 19:40:38 (8844): No heartbeat from core client for 30 sec - exiting 19:40:39 (8844): No heartbeat from core client for 30 sec - exiting 19:40:40 (8844): No heartbeat from core client for 30 sec - exiting 19:40:41 (8844): No heartbeat from core client for 30 sec - exiting 19:40:42 (8844): No heartbeat from core client for 30 sec - exiting 19:40:43 (8844): No heartbeat from core client for 30 sec - exiting 19:40:44 (8844): No heartbeat from core client for 30 sec - exiting 19:40:45 (8844): No heartbeat from core client for 30 sec - exiting 19:40:46 (8844): No heartbeat from core client for 30 sec - exiting 19:40:47 (8844): No heartbeat from core client for 30 sec - exiting 19:40:48 (8844): No heartbeat from core client for 30 sec - exiting 19:40:49 (8844): No heartbeat from core client for 30 sec - exiting 19:40:50 (8844): No heartbeat from core client for 30 sec - exiting 19:40:51 (8844): No heartbeat from core client 
for 30 sec - exiting 19:40:52 (8844): No heartbeat from core client for 30 sec - exiting 19:40:53 (8844): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:24:41 (5932): No heartbeat from core client for 30 sec - exiting 12:24:42 (5932): No heartbeat from core client for 30 sec - exiting 12:24:43 (5932): No heartbeat from core client for 30 sec - exiting 12:24:44 (5932): No heartbeat from core client for 30 sec - exiting 12:24:45 (5932): No heartbeat from core client for 30 sec - exiting 12:24:46 (5932): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 08:55:55 (6520): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 13:05:11 (4812): No heartbeat from core client for 30 sec - exiting 13:05:12 (4812): No heartbeat from core client for 30 sec - exiting 13:05:13 (4812): No heartbeat from core client for 30 sec - exiting 13:05:14 (4812): No heartbeat from core client for 30 sec - exiting 13:05:15 (4812): No heartbeat from core client for 30 sec - exiting 13:05:16 (4812): No heartbeat from core client for 30 sec - exiting 13:05:17 (4812): No heartbeat from core client for 30 sec - exiting 13:05:18 (4812): No heartbeat from core client for 30 sec - exiting 13:05:19 (4812): No heartbeat from core client for 30 sec - exiting 13:05:20 (4812): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5432, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5432, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2072, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2072, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5232, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5232, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4920, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4920, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4920, iMonCtr=1 Model crash detected, will try to restart... 00:17:15 (6104): No heartbeat from core client for 30 sec - exiting 00:17:16 (6104): No heartbeat from core client for 30 sec - exiting 00:17:17 (6104): No heartbeat from core client for 30 sec - exiting 00:17:18 (6104): No heartbeat from core client for 30 sec - exiting 00:17:19 (6104): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Error converting file to netcdf: dataout/o5gfko.pju9c10 Error converting file to netcdf: dataout/o5gfko.piu9c10 Error converting file to netcdf: dataout/o5gfko.pfu9c10 Error converting file to netcdf: dataout/o5gfka.phu9c10 Error converting file to netcdf: dataout/o5gfka.pgu9c10 Error converting file to netcdf: dataout/o5gfka.peu9c10 Error converting file to netcdf: dataout/o5gfka.pdu9c10 CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Signal 11 received, exiting... Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
16 Jan 2013 07:12:13	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	440,640	598,893	1.3591
15 Jan 2013 21:18:21	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	414,720	564,282	1.3606
15 Jan 2013 03:27:20	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	388,800	529,306	1.3614
14 Jan 2013 17:11:26	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	362,880	493,784	1.3607
13 Jan 2013 23:20:11	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	336,960	458,299	1.3601
13 Jan 2013 13:20:08	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	311,040	422,974	1.3599
13 Jan 2013 03:18:36	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	285,120	387,468	1.3590
12 Jan 2013 17:16:07	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	259,200	351,819	1.3573
12 Jan 2013 07:19:01	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	233,280	316,468	1.3566
11 Jan 2013 20:35:54	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	207,360	281,152	1.3559
10 Jan 2013 21:31:46	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	181,440	245,537	1.3533
09 Jan 2013 08:10:06	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	155,520	209,882	1.3495
08 Jan 2013 19:49:31	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	129,600	174,856	1.3492
08 Jan 2013 03:45:55	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	103,680	140,034	1.3506
07 Jan 2013 06:35:33	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	77,760	105,083	1.3514
06 Jan 2013 20:42:03	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	51,840	70,233	1.3548
06 Jan 2013 11:31:43	1253802	15523929	hadcm3n_o5gf_2100_40_008254261_4	25,920	34,997	1.3502
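
The "Average (sec/TS)" column in the table above appears to be the cumulative CPU time divided by the cumulative timestep count. That relationship is inferred from the figures themselves and is not documented on this page. The following minimal sketch checks it against three rows copied from the table. The function name `sec_per_timestep` and the `trickles` list are illustrative only and are not part of any BOINC or CPDN tool.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_avg).
# Assumption: Average (sec/TS) = CPU Time (sec) / Timestep, inferred from the data.
trickles = [
    (440_640, 598_893, 1.3591),  # 16 Jan 2013 07:12:13
    (414_720, 564_282, 1.3606),  # 15 Jan 2013 21:18:21
    (25_920, 34_997, 1.3502),    # 06 Jan 2013 11:31:43
]


def sec_per_timestep(cpu_time_sec, timestep):
    """Return the average CPU seconds spent per model timestep."""
    return cpu_time_sec / timestep


for timestep, cpu_time_sec, reported in trickles:
    computed = sec_per_timestep(cpu_time_sec, timestep)
    print(f"{timestep:>8,d} TS  computed {computed:.4f}  reported {reported:.4f}")
```

Under that assumption, the computed values agree with the reported column to four decimal places for these rows. Consecutive trickles in the table are 25,920 timesteps apart.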