Name | hadcm3n_o772_2060_40_008116481_4 |
Workunit | 8271595 |
Created | 30 Oct 2012, 2:33:11 UTC |
Sent | 30 Oct 2012, 2:33:13 UTC |
Report deadline | 29 Jan 2013, 10:00:24 UTC |
Received | 10 Jan 2013, 21:36:07 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS |
Computer ID | 1183189 |
Run time | 18 days 20 hours 27 min 13 sec |
CPU time | 18 days 6 hours 30 min 2 sec |
Validate state | Invalid |
Credit | 9,953.28 |
Device peak FLOPS | 2.33 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> too many exit(0)s </message> <stderr_txt> 04:04:50 (5736): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:47:55 (5080): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:41:25 (6364): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:40:18 (11228): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:10:43 (5260): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 21:09:39 (8748): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:08:34 (11092): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:07:28 (24040): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:06:19 (25584): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:05:16 (29424): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:04:08 (27104): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=39940, iMonCtr=1 Model crash detected, will try to restart... 19:40:46 (13140): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:57:11 (22628): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 08:56:04 (27692): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
14:54:54 (31768): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:53:52 (35016): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 01:52:49 (40420): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:51:48 (44180): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:50:42 (47320): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:49:40 (51572): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 21:48:36 (56092): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:47:31 (55332): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:46:22 (64600): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 15:45:12 (64020): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 15:31:00 (5372): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:29:47 (11748): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=14784, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 07:35:45 (12892): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:34:46 (15352): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
17:33:35 (18604): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 21:32:30 (23892): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:31:24 (26024): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:30:20 (31384): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:29:20 (33248): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:35:00 (2300): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=11368, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1356, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1356, iMonCtr=1 Model crash detected, will try to restart... 23:09:28 (24860): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:08:56 (26916): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:07:47 (26192): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=33000, iMonCtr=1 Model crash detected, will try to restart... 19:51:51 (4972): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 15:17:45 (10180): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - No 'heartbeat' from BOINC... 
21:16:42 (4068): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10912, iMonCtr=1 Model crash detected, will try to restart... 09:27:09 (18856): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=20188, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5764, iMonCtr=1 Model crash detected, will try to restart... 23:04:42 (4320): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:03:34 (9092): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:02:23 (13728): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:01:15 (17348): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=23212, iMonCtr=1 Model crash detected, will try to restart... 20:52:21 (4272): No heartbeat from core client for 30 sec - exiting 20:52:22 (4272): No heartbeat from core client for 30 sec - exiting 20:52:23 (4272): No heartbeat from core client for 30 sec - exiting 20:52:24 (4272): No heartbeat from core client for 30 sec - exiting 20:52:25 (4272): No heartbeat from core client for 30 sec - exiting 20:52:26 (4272): No heartbeat from core client for 30 sec - exiting 20:52:27 (4272): No heartbeat from core client for 30 sec - exiting 20:52:28 (4272): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4980, iMonCtr=1 Model crash detected, will try to restart... 
15:51:12 (10996): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4884, iMonCtr=1 Model crash detected, will try to restart... 13:38:32 (9024): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5368, iMonCtr=1 Model crash detected, will try to restart... 21:03:35 (20148): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:02:29 (18620): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:01:25 (26396): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:00:16 (31724): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:59:10 (35884): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=35164, iMonCtr=1 Model crash detected, will try to restart... 23:58:08 (40428): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:21:27 (7336): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 18:20:23 (11240): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:19:18 (13668): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:58:49 (13972): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:57:47 (14180): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5596, iMonCtr=1 Model crash detected, will try to restart... 02:42:46 (5132): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:41:41 (11764): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:40:39 (14480): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=22984, iMonCtr=1 Model crash detected, will try to restart... 00:36:10 (2420): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:35:06 (7868): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:34:00 (15676): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:32:58 (22080): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:31:58 (25484): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10796, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 01:42:57 (10020): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 05:41:50 (8456): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - No 'heartbeat' from BOINC... 13:50:41 (3392): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:44:23 (3596): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
03:37:47 (5788): No heartbeat from core client for 30 sec - exiting 03:37:48 (5788): No heartbeat from core client for 30 sec - exiting 03:37:49 (5788): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:43:17 (2804): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... </stderr_txt> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
10 Jan 2013 02:47:38 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 829,440 | 1,577,432 | 1.9018 |
22 Dec 2012 18:41:45 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 803,520 | 1,530,274 | 1.9045 |
19 Dec 2012 22:39:05 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 777,600 | 1,478,292 | 1.9011 |
19 Dec 2012 07:06:38 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 751,680 | 1,422,442 | 1.8924 |
17 Dec 2012 21:14:11 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 725,760 | 1,369,904 | 1.8875 |
17 Dec 2012 05:19:03 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 699,840 | 1,316,525 | 1.8812 |
15 Dec 2012 22:07:05 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 673,920 | 1,262,625 | 1.8736 |
13 Dec 2012 19:28:11 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 648,000 | 1,209,486 | 1.8665 |
13 Dec 2012 17:35:07 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 622,080 | 1,157,840 | 1.8612 |
13 Dec 2012 17:35:07 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 596,160 | 1,105,163 | 1.8538 |
13 Dec 2012 17:35:07 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 570,240 | 1,052,005 | 1.8448 |
03 Dec 2012 03:33:22 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 544,320 | 994,257 | 1.8266 |
28 Nov 2012 20:08:11 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 518,400 | 939,166 | 1.8117 |
28 Nov 2012 01:55:12 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 492,480 | 877,723 | 1.7823 |
25 Nov 2012 14:11:34 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 466,560 | 820,680 | 1.7590 |
22 Nov 2012 21:01:32 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 440,640 | 769,580 | 1.7465 |
18 Nov 2012 01:38:56 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 414,720 | 722,116 | 1.7412 |
15 Nov 2012 13:34:54 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 388,800 | 673,405 | 1.7320 |
15 Nov 2012 00:38:19 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 362,880 | 626,436 | 1.7263 |
14 Nov 2012 10:54:15 | 1183189 | 15419029 | hadcm3n_o772_2060_40_008116481_4 | 336,960 | 580,066 | 1.7215 |
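
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The sketch below checks that reading against three rows copied from the table. The function name `seconds_per_timestep` and the rounding rule are assumptions made for illustration; they are not taken from the CPDN server code.

```python
def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Average CPU seconds per model timestep, rounded to 4 decimals.

    Assumption: the table's "Average (sec/TS)" is cumulative CPU time
    divided by cumulative timesteps. This is inferred from the data,
    not from the project's source.
    """
    return round(cpu_time_sec / timestep, 4)


# (timestep, CPU time in seconds, average reported in the table)
rows = [
    (829_440, 1_577_432, 1.9018),  # 10 Jan 2013 trickle
    (803_520, 1_530_274, 1.9045),  # 22 Dec 2012 trickle
    (336_960, 580_066, 1.7215),    # 14 Nov 2012 trickle
]

for timestep, cpu_time, reported in rows:
    computed = seconds_per_timestep(cpu_time, timestep)
    print(f"{timestep:>9,} steps: computed {computed:.4f}, reported {reported:.4f}")
```

For the sampled rows, the computed values agree with the reported averages, which is consistent with the column being a running mean rather than a per-trickle rate.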
©2024 cpdn.org