Task 13363822

Name	hadcm3n_t72u_1940_40_007449483_0
Workunit	7646986
Created	10 Sep 2011, 0:27:26 UTC
Sent	10 Sep 2011, 0:31:51 UTC
Report deadline	10 Dec 2011, 7:59:02 UTC
Received	11 Dec 2011, 22:23:09 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	25 (0x00000019) Unknown error code
Computer ID	1096439
Run time	12 days 20 hours 45 min 47 sec
CPU time	12 days 20 hours 45 min 47 sec
Validate state	Invalid
Credit	6,531.84
Device peak FLOPS	1.98 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
The drive cannot locate a specific area or track on the disk. (0x19) - exit code 25 (0x19)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4904, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4904, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2672, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5036, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4872, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2576, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2576, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4200, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4536, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4536, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3396, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3396, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2768, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6844, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5996, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=932, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4588, iMonCtr=1
Model crash detected, will try to restart...
19:10:07 (5880): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4848, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6864, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3864, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6428, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=772, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2996, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=356, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4768, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6672, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6880, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=584, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4664, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4684, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5084, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4248, iMonCtr=1
Model crash detected, will try to restart...
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
09 Dec 2011 03:51:29	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	544,320	1,076,973	1.9786
04 Dec 2011 22:53:29	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	518,400	1,024,577	1.9764
02 Dec 2011 01:09:52	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	492,480	971,836	1.9734
27 Nov 2011 23:37:36	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	466,560	920,598	1.9732
26 Nov 2011 03:18:11	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	440,640	869,179	1.9725
24 Nov 2011 21:57:51	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	414,720	817,624	1.9715
20 Nov 2011 23:41:14	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	388,800	765,304	1.9684
19 Nov 2011 16:30:45	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	362,880	714,530	1.9691
16 Nov 2011 01:56:51	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	336,960	664,341	1.9716
06 Nov 2011 21:21:18	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	311,040	614,116	1.9744
05 Nov 2011 14:21:12	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	285,120	563,986	1.9781
31 Oct 2011 19:21:11	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	259,200	513,115	1.9796
31 Oct 2011 17:21:53	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	233,280	463,012	1.9848
31 Oct 2011 15:25:45	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	207,360	413,036	1.9919
31 Oct 2011 15:25:45	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	181,440	363,354	2.0026
10 Oct 2011 17:31:26	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	155,520	313,072	2.0131
07 Oct 2011 01:17:10	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	129,600	261,996	2.0216
01 Oct 2011 21:20:51	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	103,680	210,099	2.0264
28 Sep 2011 00:33:15	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	77,760	159,554	2.0519
18 Sep 2011 22:35:24	1096439	13363822	hadcm3n_t72u_1940_40_007449483_0	51,840	107,966	2.0827
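
The Average (sec/TS) column in the table above appears to be the cumulative CPU time divided by the cumulative timestep count. That relationship is inferred from the figures themselves, not stated on this page, so treat the following as a minimal sketch for checking it. The function name `sec_per_timestep` is hypothetical, and the three rows are copied from the table.

```python
# Rows copied from the trickle table above:
# (timestep, cpu_time_sec, reported_average_sec_per_ts)
rows = [
    (544_320, 1_076_973, 1.9786),
    (259_200, 513_115, 1.9796),
    (51_840, 107_966, 2.0827),
]


def sec_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds divided by cumulative timesteps (assumed formula)."""
    return cpu_time_sec / timestep


for timestep, cpu_time_sec, reported in rows:
    computed = sec_per_timestep(cpu_time_sec, timestep)
    # Print the recomputed average next to the value shown in the table.
    print(f"{timestep:>9,}  computed {computed:.4f}  reported {reported:.4f}")
```

For these three rows the recomputed value agrees with the reported one to four decimal places, which is consistent with the column being a running average rather than a per-trickle rate.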