Name | hadcm3n_yeji_1980_40_007956541_3 |
Workunit | 8111653 |
Created | 9 May 2012, 7:12:03 UTC |
Sent | 9 May 2012, 8:34:13 UTC |
Report deadline | 8 Aug 2012, 16:01:24 UTC |
Received | 14 Jun 2012, 15:39:30 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 193 (0x000000C1) EXIT_SIGNAL |
Computer ID | 1183651 |
Run time | 16 days 17 hours 34 min 31 sec |
CPU time | 15 days 6 hours 41 min 48 sec |
Validate state | Invalid |
Credit | 12,441.60 |
Device peak FLOPS | 3.25 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.25</core_client_version> <![CDATA[ <message> - exit code 193 (0xc1) </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=21900, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=232, iMonCtr=1 Model crash detected, will try to restart... 13:49:45 (4864): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5580, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6016, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6668, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6856, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6456, iMonCtr=1 Model crash detected, will try to restart... 14:33:10 (4184): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:34:34 (2696): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 11:24:02 (3832): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6068, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7288, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5132, iMonCtr=1 Model crash detected, will try to restart... 
11:10:39 (4052): No heartbeat from core client for 30 sec - exiting 11:10:41 (4052): No heartbeat from core client for 30 sec - exiting 11:10:42 (4052): No heartbeat from core client for 30 sec - exiting 11:10:43 (4052): No heartbeat from core client for 30 sec - exiting 11:10:44 (4052): No heartbeat from core client for 30 sec - exiting 11:10:45 (4052): No heartbeat from core client for 30 sec - exiting 11:10:47 (4052): No heartbeat from core client for 30 sec - exiting 11:10:48 (4052): No heartbeat from core client for 30 sec - exiting 11:10:49 (4052): No heartbeat from core client for 30 sec - exiting 11:10:50 (4052): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:10:51 (4052): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... 08:48:26 (5480): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 11:39:36 (6240): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 12:35:33 (4616): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:35:34 (4616): No heartbeat from core client for 30 sec - exiting 12:35:35 (4616): No heartbeat from core client for 30 sec - exiting 12:35:36 (4616): No heartbeat from core client for 30 sec - exiting 12:35:37 (4616): No heartbeat from core client for 30 sec - exiting 12:35:38 (4616): No heartbeat from core client for 30 sec - exiting 12:35:39 (4616): No heartbeat from core client for 30 sec - exiting 14:19:16 (6676): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6772, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4336, iMonCtr=1 Model crash detected, will try to restart... 06:54:48 (4240): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:54:49 (4240): No heartbeat from core client for 30 sec - exiting 06:54:50 (4240): No heartbeat from core client for 30 sec - exiting Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5468, iMonCtr=1 Model crash detected, will try to restart... 16:56:06 (6836): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 10:27:02 (6776): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 16:28:50 (6976): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1976, iMonCtr=1 Model crash detected, will try to restart... 10:47:24 (6668): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 15:11:39 (3196): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:11:40 (3196): No heartbeat from core client for 30 sec - exiting 15:23:02 (6600): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 17:35:12 (5516): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
17:35:57 (5604): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:36:48 (4888): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 18:45:34 (364): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:45:35 (364): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3348, iMonCtr=1 Model crash detected, will try to restart... 08:49:45 (844): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:49:46 (844): No heartbeat from core client for 30 sec - exiting 08:50:56 (6788): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5664, iMonCtr=1 Model crash detected, will try to restart... 19:09:34 (3184): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7504, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6744, iMonCtr=1 Model crash detected, will try to restart... 09:38:37 (1112): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 09:18:11 (6472): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6492, iMonCtr=1 Model crash detected, will try to restart... 14:17:21 (3628): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:38:43 (2380): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... C00:21:07 (7776): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:13:18 (6668): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 11 received, exiting... Called boinc_finish </stderr_txt> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
14 Jun 2012 15:41:23 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 1,036,800 | 1,320,102 | 1.2732 |
13 Jun 2012 11:23:05 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 1,010,880 | 1,289,549 | 1.2757 |
12 Jun 2012 16:45:33 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 984,960 | 1,257,437 | 1.2766 |
10 Jun 2012 17:13:18 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 959,040 | 1,224,792 | 1.2771 |
09 Jun 2012 19:58:04 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 933,120 | 1,191,199 | 1.2766 |
09 Jun 2012 09:52:52 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 907,200 | 1,156,414 | 1.2747 |
08 Jun 2012 14:57:21 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 881,280 | 1,121,912 | 1.2730 |
04 Jun 2012 15:28:41 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 855,360 | 1,088,384 | 1.2724 |
04 Jun 2012 06:06:01 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 829,440 | 1,055,700 | 1.2728 |
03 Jun 2012 21:23:25 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 803,520 | 1,024,715 | 1.2753 |
03 Jun 2012 11:51:53 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 777,600 | 992,712 | 1.2766 |
02 Jun 2012 13:34:49 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 751,680 | 959,514 | 1.2765 |
01 Jun 2012 16:02:44 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 725,760 | 925,491 | 1.2752 |
31 May 2012 21:46:52 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 699,840 | 892,011 | 1.2746 |
31 May 2012 10:08:04 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 673,920 | 858,591 | 1.2740 |
30 May 2012 14:01:48 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 648,000 | 823,802 | 1.2713 |
29 May 2012 17:29:22 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 622,080 | 790,163 | 1.2702 |
28 May 2012 21:45:08 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 596,160 | 756,425 | 1.2688 |
28 May 2012 11:43:31 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 570,240 | 722,523 | 1.2671 |
27 May 2012 14:53:44 | 1183651 | 14646854 | hadcm3n_yeji_1980_40_007956541_3 | 544,320 | 688,421 | 1.2647 |
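
The Average (sec/TS) column in the table above appears to be the cumulative CPU time divided by the timestep count. The following is a minimal sketch of that arithmetic, checked against the newest and oldest rows listed; the function name `seconds_per_timestep` is hypothetical and not part of any BOINC or CPDN tool.

```python
def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Return cumulative CPU seconds per model timestep, rounded to 4 places.

    Assumption: the table's "Average (sec/TS)" is CPU Time / Timestep.
    """
    if timestep <= 0:
        raise ValueError("timestep must be positive")
    return round(cpu_time_sec / timestep, 4)


# (timestep, cpu_time_sec) pairs copied from the trickle table.
rows = [
    (1_036_800, 1_320_102),  # 14 Jun 2012 15:41:23
    (544_320, 688_421),      # 27 May 2012 14:53:44
]

for timestep, cpu_time_sec in rows:
    avg = seconds_per_timestep(cpu_time_sec, timestep)
    print(f"{timestep:>9,} TS  {cpu_time_sec:>9,} s  {avg:.4f} sec/TS")
```

Under that assumption, both rows reproduce the listed averages (1.2732 and 1.2647).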