Name | hadcm3n_ye77_1980_40_007833382_2 |
Workunit | 7988494 |
Created | 19 Mar 2012, 11:21:05 UTC |
Sent | 19 Mar 2012, 11:21:37 UTC |
Report deadline | 18 Jun 2012, 18:48:48 UTC |
Received | 5 Jun 2012, 14:23:14 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 194 (0x000000C2) EXIT_ABORTED_BY_CLIENT |
Computer ID | 1115924 |
Run time | 18 days 7 hours 25 min 11 sec |
CPU time | 17 days 17 hours 22 min 23 sec |
Validate state | Invalid |
Credit | 12,441.60 |
Device peak FLOPS | 2.65 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> Got ack for job that's till active </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8036, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4748, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4844, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4844, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6132, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4268, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2216, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5848, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=796, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4732, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 08:23:45 (5592): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4996, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5140, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5312, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2676, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3108, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5644, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5340, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5300, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5032, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5916, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4288, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3960, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4156, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4896, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5636, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5556, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4244, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4064, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4064, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4948, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2600, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3964, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3964, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5612, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2784, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2784, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4192, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4192, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4192, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4192, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5856, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5936, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5620, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5620, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2604, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2604, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2604, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3232, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2680, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5476, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5476, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5476, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5628, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5628, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5628, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5916, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5820, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5380, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... zip error: Could not create output file (was replacing the original zip file) 09:05:51 (4424): handle_file_upload_status: can't open boinc_ufs_cpdnout2.zip 09:05:51 (4424): handle_file_upload_status: can't open boinc_ufs_cpdnout3.zip cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_ye77_1980_40_007833382/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_ye77_1980_40_007833382/dataout/ocean_restart.day after 11 attempts CPDN Monitor - Abort request from BOINC... Called boinc_finish </stderr_txt> ]]> |
Latest Trickles Received | ||||||
---|---|---|---|---|---|---|
Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
05 Jun 2012 07:09:50 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 1,036,800 | 1,531,413 | 1.4771 |
01 Jun 2012 13:29:30 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 1,010,880 | 1,493,757 | 1.4777 |
31 May 2012 19:15:26 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 984,960 | 1,455,922 | 1.4782 |
29 May 2012 11:33:03 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 959,040 | 1,417,735 | 1.4783 |
25 May 2012 10:45:55 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 933,120 | 1,381,479 | 1.4805 |
24 May 2012 08:09:07 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 907,200 | 1,341,238 | 1.4784 |
22 May 2012 14:15:55 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 881,280 | 1,301,440 | 1.4768 |
21 May 2012 11:12:14 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 855,360 | 1,263,853 | 1.4776 |
18 May 2012 08:56:49 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 829,440 | 1,225,859 | 1.4779 |
16 May 2012 07:30:49 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 803,520 | 1,187,245 | 1.4776 |
14 May 2012 12:53:10 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 777,600 | 1,148,992 | 1.4776 |
11 May 2012 12:06:26 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 751,680 | 1,111,400 | 1.4786 |
10 May 2012 10:52:15 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 725,760 | 1,073,378 | 1.4790 |
09 May 2012 08:24:00 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 699,840 | 1,035,059 | 1.4790 |
08 May 2012 06:04:27 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 673,920 | 997,458 | 1.4801 |
04 May 2012 13:35:45 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 648,000 | 959,991 | 1.4815 |
03 May 2012 12:39:43 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 622,080 | 921,761 | 1.4817 |
02 May 2012 07:19:10 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 596,160 | 882,435 | 1.4802 |
30 Apr 2012 07:27:55 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 570,240 | 842,899 | 1.4781 |
27 Apr 2012 05:53:29 | 1115924 | 14289644 | hadcm3n_ye77_1980_40_007833382_2 | 544,320 | 804,821 | 1.4786 |
©2024 cpdn.org