Field | Value |
---|---|
Name | famous_wnqe_1199_200_007117869_0 |
Workunit | 7316229 |
Created | 16 Jan 2011, 15:29:25 UTC |
Sent | 18 Jan 2011, 21:23:47 UTC |
Report deadline | 20 Apr 2011, 4:50:58 UTC |
Received | 13 Jun 2011, 22:42:41 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS |
Computer ID | 1103724 |
Run time | 30 days 4 hours 30 min 8 sec |
CPU time | 18 days 19 hours 1 min 15 sec |
Validate state | Invalid |
Credit | 3,675.00 |
Device peak FLOPS | 0.82 GFLOPS |
Application version | UK Met Office FAMOUS v6.11 windows_intelx86 |
Stderr | <core_client_version>6.6.28</core_client_version> <![CDATA[ <message> too many exit(0)s </message> <stderr_txt> 09:51:40 (2168): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3728, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2512, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4484, iMonCtr=1 Model crash detected, will try to restart... 09:52:52 (6088): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 CCPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 07:43:32 (1724): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:44:23 (2704): Can't acquire lockfile (32) - waiting 35s CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4972, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 
BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5588, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2260, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3648, iMonCtr=1 Model crash detected, will try to restart... 16:13:54 (6124): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1660, iMonCtr=1 Model crash detected, will try to restart... C09:35:09 (3552): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4456, iMonCtr=1 Model crash detected, will try to restart... 
BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4708, iMonCtr=1 Model crash detected, will try to restart... 18:45:26 (2072): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5976, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3408, iMonCtr=1 Model crash detected, will try to restart... 15:35:04 (1576): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2096, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5080, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4420, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5468, iMonCtr=1 Model crash detected, will try to restart... 23:17:28 (5536): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
23:17:49 (4772): Can't acquire lockfile (32) - waiting 35s Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4664, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3484, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 09:30:11 (3324): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:20:44 (6040): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 08:54:25 (2620): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6048, iMonCtr=1 Model crash detected, will try to restart... 06:24:24 (4716): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:24:40 (4716): No heartbeat from core client for 30 sec - exiting BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 CPDN Monitor - Quit request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4540, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... C09:20:12 (5656): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:05:13 (4964): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:11:43 (3856): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:11:44 (3856): No heartbeat from core client for 30 sec - exiting 14:11:45 (3856): No heartbeat from core client for 30 sec - exiting 14:11:46 (3856): No heartbeat from core client for 30 sec - exiting 14:11:47 (3856): No heartbeat from core client for 30 sec - exiting 14:11:48 (3856): No heartbeat from core client for 30 sec - exiting 14:11:49 (3856): No heartbeat from core client for 30 sec - exiting 14:11:50 (3856): No heartbeat from core client for 30 sec - exiting 14:11:51 (3856): No heartbeat from core client for 30 sec - exiting 14:11:52 (3856): No heartbeat from core client for 30 sec - exiting 14:11:53 (3856): No heartbeat from core client for 30 sec - exiting 19:45:52 (1560): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:38:55 (4576): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:56:55 (5624): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4772, iMonCtr=1 Model crash detected, will try to restart... 
BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 </stderr_txt> ]]> |
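The stderr dump above is hard to read as one run of text. As a rough reading aid, the Python sketch below counts how often each recurring message occurs. It assumes the log is available as a plain string. The message fragments are copied verbatim from the log; the category labels, `MESSAGES` and `tally_events` are hypothetical names for this sketch. It does not model how the BOINC client decides that there were "too many exit(0)s".

```python
from collections import Counter

# Message fragments copied verbatim from the stderr dump above. The keys are
# hypothetical labels chosen for this sketch, not BOINC or CPDN terminology.
MESSAGES = {
    "heartbeat_timeout": "No heartbeat from core client for 30 sec",
    "crash_restart": "Model crash detected, will try to restart",
    "quit_request": "Quit request from BOINC",
    "lockfile_wait": "Can't acquire lockfile",
    "buffin_read_failed": "BUFFIN: Read Failed",
}


def tally_events(log: str) -> Counter:
    """Count non-overlapping occurrences of each known message in `log`."""
    counts = Counter()
    for label, fragment in MESSAGES.items():
        # str.count scans for exact, non-overlapping matches; zero counts
        # are stored explicitly so every label appears in the result.
        counts[label] = log.count(fragment)
    return counts


if __name__ == "__main__":
    # A short excerpt of the stderr text above, used as sample input.
    excerpt = (
        "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
        "checkPID=0, selfPID=3728, iMonCtr=1 Model crash detected, will try "
        "to restart... 09:52:52 (6088): No heartbeat from core client for "
        "30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... "
        "CPDN Monitor - Quit request from BOINC..."
    )
    for label, n in sorted(tally_events(excerpt).items()):
        print(f"{label}: {n}")
```

Plain substring counting is used instead of regular expressions because every fragment is a fixed string. Note that the separate "CPDN Monitor - No 'heartbeat' from BOINC..." message is deliberately not counted as a heartbeat timeout, since its wording differs.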
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
13 Jun 2011 18:55:41 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,113,866 | 1,617,457 | 1.4521 |
10 Jun 2011 15:27:15 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,104,506 | 1,593,103 | 1.4424 |
08 Jun 2011 06:17:44 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,095,146 | 1,577,461 | 1.4404 |
07 Jun 2011 05:02:26 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,085,786 | 1,563,135 | 1.4396 |
07 Jun 2011 00:28:46 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,076,426 | 1,549,591 | 1.4396 |
06 Jun 2011 17:09:01 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,067,066 | 1,533,186 | 1.4368 |
06 Jun 2011 00:56:30 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,057,706 | 1,518,398 | 1.4356 |
05 Jun 2011 04:09:44 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,048,346 | 1,498,967 | 1.4298 |
03 Jun 2011 17:56:07 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,038,986 | 1,478,304 | 1.4228 |
02 Jun 2011 14:31:46 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,029,626 | 1,461,753 | 1.4197 |
01 Jun 2011 14:21:17 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,020,266 | 1,441,346 | 1.4127 |
31 May 2011 17:13:04 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,010,906 | 1,422,543 | 1.4072 |
28 May 2011 11:25:23 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 1,001,546 | 1,403,478 | 1.4013 |
26 May 2011 13:36:34 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 992,186 | 1,382,632 | 1.3935 |
23 May 2011 15:41:55 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 982,826 | 1,358,343 | 1.3821 |
20 May 2011 13:17:44 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 973,466 | 1,335,840 | 1.3723 |
16 May 2011 14:46:16 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 964,106 | 1,309,072 | 1.3578 |
11 May 2011 16:10:41 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 954,746 | 1,275,726 | 1.3362 |
08 May 2011 02:30:29 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 945,386 | 1,258,786 | 1.3315 |
05 May 2011 19:19:33 | 1103724 | 12492371 | famous_wnqe_1199_200_007117869_0 | 936,026 | 1,237,986 | 1.3226 |
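The table header only says "Average (sec/TS)". The sketch below assumes that this column is the cumulative CPU time divided by the timestep, rounded to four decimal places, and checks that assumption against three rows copied from the table. `ROWS` and `sec_per_timestep` are hypothetical names for this illustration, not part of any CPDN tool.

```python
import math

# (timestep, cpu_time_sec, reported_average) copied from three rows of the
# "Latest Trickles Received" table above.
ROWS = [
    (1_113_866, 1_617_457, 1.4521),
    (1_104_506, 1_593_103, 1.4424),
    (936_026, 1_237_986, 1.3226),
]


def sec_per_timestep(timestep: int, cpu_time_sec: int) -> float:
    """Assumed definition of the table's average: CPU seconds per timestep,
    rounded to four decimal places as displayed in the table."""
    return round(cpu_time_sec / timestep, 4)


for timestep, cpu, reported in ROWS:
    computed = sec_per_timestep(timestep, cpu)
    print(f"TS {timestep:>9,}: computed {computed:.4f}, reported {reported:.4f}")
    assert math.isclose(computed, reported), (timestep, computed, reported)

# Consecutive trickles in the rows shown are 9,360 timesteps apart.
print(ROWS[0][0] - ROWS[1][0])  # prints 9360
```

All three rows agree with the assumed formula to four decimal places. This is consistent with the column being a running average, but the table itself does not confirm it.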
©2024 cpdn.org