Name | famous_v04w_1599_200_006685950_2 |
Workunit | 6889203 |
Created | 26 Aug 2010, 15:39:04 UTC |
Sent | 27 Aug 2010, 9:24:19 UTC |
Report deadline | 26 Nov 2010, 16:51:30 UTC |
Received | 20 Oct 2010, 17:01:56 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS |
Computer ID | 1055314 |
Run time | 8 days 10 hours 0 min 24 sec |
CPU time | 6 days 0 hours 26 min 15 sec |
Validate state | Invalid |
Credit | 555.96 |
Device peak FLOPS | 0.67 GFLOPS |
Application version | UK Met Office FAMOUS v6.11 windows_intelx86 |
Stderr | <core_client_version>6.10.18</core_client_version> <![CDATA[ <message> too many exit(0)s </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4036, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPIController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4444, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1080, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4716, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4028, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4240, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4912, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3548, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4024, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4324, iMonCtr=1 Model crash detected, will try to restart... 
BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3948, iMonCtr=1 Model crash detected, will try to restart... 19:15:18 (4044): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:14:01 (5448): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5552, iMonCtr=1 Model crash detected, will try to restart... 21:08:31 (4760): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:08:29 (3960): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:16:10 (4820): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:07:15 (2352): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2616, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 19:18:48 (4080): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:52:42 (3668): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4068, iMonCtr=1 Model crash detected, will try to restart... 
11:34:29 (5012): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:46:27 (4132): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:45:38 (5124): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:44:14 (5956): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:42:59 (4200): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4156, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3936, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 14:00:34 (5112): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=556, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 10:32:03 (4428): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:08:31 (4512): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:07:25 (4036): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4836, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 19:41:05 (2248): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:40:09 (4452): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:37:30 (4740): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:36:28 (3092): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:35:37 (4284): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... </stderr_txt> ]]> |
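The exit status row gives the same code twice, as a signed integer (-226) and in hexadecimal (0xFFFFFF1E). A minimal sketch of how the two relate, assuming the hex form is the 32-bit two's-complement encoding of the signed value; the helper names `to_signed32` and `to_unsigned32_hex` are hypothetical and not part of BOINC:

```python
def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF                      # keep only the low 32 bits
    if value & 0x80000000:                   # sign bit set: value is negative
        return value - 0x100000000
    return value


def to_unsigned32_hex(value: int) -> str:
    """Format a signed integer as an 8-digit, 32-bit hexadecimal string."""
    return f"0x{value & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    # Values taken from the "Exit status" row above.
    print(to_signed32(0xFFFFFF1E))
    print(to_unsigned32_hex(-226))
```

Both directions round-trip for the value in this report, which is consistent with the assumption that the two forms are the same 32-bit exit code.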
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
20 Oct 2010 11:49:41 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 168,506 | 510,684 | 3.0307 |
17 Oct 2010 11:08:18 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 159,146 | 482,577 | 3.0323 |
16 Oct 2010 07:10:49 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 149,786 | 453,854 | 3.0300 |
23 Sep 2010 06:12:17 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 140,426 | 425,541 | 3.0304 |
21 Sep 2010 20:44:12 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 131,066 | 397,043 | 3.0293 |
20 Sep 2010 11:14:32 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 121,706 | 368,429 | 3.0272 |
15 Sep 2010 14:17:03 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 112,346 | 341,279 | 3.0377 |
14 Sep 2010 15:15:58 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 102,986 | 312,857 | 3.0379 |
13 Sep 2010 10:28:10 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 93,626 | 284,569 | 3.0394 |
12 Sep 2010 10:01:50 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 84,266 | 256,276 | 3.0413 |
11 Sep 2010 13:24:57 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 74,906 | 227,951 | 3.0432 |
09 Sep 2010 17:10:18 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 65,546 | 199,565 | 3.0447 |
08 Sep 2010 12:56:46 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 56,186 | 171,038 | 3.0441 |
07 Sep 2010 09:34:37 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 46,826 | 141,517 | 3.0222 |
04 Sep 2010 08:33:34 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 37,466 | 112,549 | 3.0040 |
04 Sep 2010 06:24:41 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 28,106 | 84,496 | 3.0063 |
30 Aug 2010 05:50:43 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 18,746 | 56,221 | 2.9991 |
28 Aug 2010 16:59:01 | 1055314 | 11689521 | famous_v04w_1599_200_006685950_2 | 9,386 | 28,594 | 3.0465 |
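The Average (sec/TS) column looks like cumulative CPU time divided by the timestep count, rounded to four decimal places. That is inferred from the figures and is not stated on the page. A short sketch that checks the inference against three rows copied from the table; the function name `avg_sec_per_timestep` is hypothetical:

```python
def avg_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


# (timestep, cpu_time_sec, reported_average) copied from the trickle table.
ROWS = [
    (168_506, 510_684, 3.0307),
    (159_146, 482_577, 3.0323),
    (9_386, 28_594, 3.0465),
]

if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in ROWS:
        computed = avg_sec_per_timestep(cpu_time_sec, timestep)
        print(timestep, computed, reported)
```

For these rows the computed value matches the reported average, so the column can be read as a running mean over the whole run, not a per-trickle rate.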