Too Many Trickles??

old_user470806
Joined: 6 Sep 07
Posts: 47
Credit: 23,188
RAC: 0
Message 34070 - Posted: 15 Jun 2008, 23:06:01 UTC

Normally, I run BOINC/HadCM3 a total of about 50 hours per week and it sends a trickle about once every 5 or 6 days, and uploads a .ZIP file(s) every few weeks.

However, I ran the computer (and the model) almost continuously from about 11:00 PM on Thursday (June 12) through early Sunday morning (June 15), with one brief interruption to back up the BOINC files and to do some other operations that shouldn't run while the model is executing. Here is an excerpt from stdoutdae.txt (edited to remove extraneous messages, mostly .DLL initialization errors):

(12-Jun-2008 messages and 13-Jun-2008 prior to the following run were routine)

13-Jun-2008 20:34:47 [---] Starting BOINC client version 5.10.45 for windows_intelx86
13-Jun-2008 20:34:47 [---] log flags: task, file_xfer, sched_ops
13-Jun-2008 20:34:47 [---] Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
13-Jun-2008 20:34:47 [---] Data directory: C:\Program Files\BOINC
13-Jun-2008 20:34:47 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) XP 2400+ [x86 Family 6 Model 8 Stepping 1]
13-Jun-2008 20:34:47 [---] Processor features: fpu tsc sse 3dnow mmx
13-Jun-2008 20:34:47 [---] OS: Microsoft Windows XP: Home Edition, Service Pack 3, (05.01.2600.00)
13-Jun-2008 20:34:47 [---] Memory: 1.94 GB physical, 2.29 GB virtual
13-Jun-2008 20:34:47 [---] Disk: 57.26 GB total, 13.24 GB free
13-Jun-2008 20:34:47 [---] Local time is UTC -4 hours
13-Jun-2008 20:34:47 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 753323; location: (none); project prefs: default
13-Jun-2008 20:34:47 [---] General prefs: from climateprediction.net (last modified 18-Sep-2007 17:32:13)
13-Jun-2008 20:34:47 [---] Host location: none
13-Jun-2008 20:34:47 [---] General prefs: using your defaults
13-Jun-2008 20:34:47 [---] Reading preferences override file
13-Jun-2008 20:34:47 [---] Preferences limit memory usage when active to 768.55MB
13-Jun-2008 20:34:47 [---] Preferences limit memory usage when idle to 1031.35MB
13-Jun-2008 20:34:47 [---] Preferences limit disk usage to 8.59GB
13-Jun-2008 20:34:47 [climateprediction.net] Task hadcm3iozn_cpm4_2000_80_45898965_7 is 142.91 days overdue.
13-Jun-2008 20:34:47 [climateprediction.net] You may not get credit for it. Consider aborting it.
13-Jun-2008 20:34:47 [climateprediction.net] Restarting task hadcm3iozn_cpm4_2000_80_45898965_7 using hadcm3i version 544

....

14-Jun-2008 06:10:14 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
14-Jun-2008 06:10:19 [climateprediction.net] Scheduler request succeeded: got 0 new tasks

....

14-Jun-2008 06:37:44 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
14-Jun-2008 06:37:49 [climateprediction.net] Scheduler request succeeded: got 0 new tasks

....

14-Jun-2008 11:25:09 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
14-Jun-2008 11:25:14 [climateprediction.net] Scheduler request succeeded: got 0 new tasks

....

15-Jun-2008 06:38:30 [---] Suspending computation - user request
15-Jun-2008 06:41:17 [---] Exit requested by user


The model ran at least 30 hours before the first trickle-up request "succeeded". Then the next request follows by only 27 minutes, and the third one follows the second after 4 hours 48 minutes.
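For anyone who wants to check these intervals against their own log, here is a minimal Python sketch. It assumes the log keeps the `DD-Mon-YYYY HH:MM:SS` prefix shown in the excerpt; the function names `trickle_times` and `gaps` are made up for illustration and are not part of BOINC.

```python
from datetime import datetime, timedelta

# Scheduler-request lines copied from the log excerpt above.
LOG = """\
14-Jun-2008 06:10:14 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
14-Jun-2008 06:10:19 [climateprediction.net] Scheduler request succeeded: got 0 new tasks
14-Jun-2008 06:37:44 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
14-Jun-2008 11:25:09 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
"""


def trickle_times(log_text):
    """Return the timestamp of every trickle-up scheduler request."""
    times = []
    for line in log_text.splitlines():
        if "To send trickle-up message" in line:
            # The timestamp is the first 20 characters, e.g. "14-Jun-2008 06:10:14".
            times.append(datetime.strptime(line[:20], "%d-%b-%Y %H:%M:%S"))
    return times


def gaps(times):
    """Return the interval between each pair of consecutive timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


trickle_gaps = gaps(trickle_times(LOG))
for gap in trickle_gaps:
    print(gap)

# Total time between the first and the last trickle request.
print(sum(trickle_gaps, timedelta()))
```

With the three requests above, the gaps come out at roughly 27 minutes and 4 hours 47 minutes, in line with the figures in the post.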

I've been running the model since last September, and I've never seen anything like this before (the model is approaching 80% completion!). Does it indicate some sort of problem with execution of the model?


|
| --- Stardance
|
| nil carborundum illegitimi

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 34071 - Posted: 15 Jun 2008, 23:24:01 UTC
Last modified: 15 Jun 2008, 23:27:09 UTC

The first trickle was submitted at 06:10 local time, which is 10:10 UTC - and there is a trickle logged at 10:09:35 UTC on the results page. So it looks as if that first trickle was accepted correctly, but the others may be spurious repeats. Perhaps the model has hit a problem and is looping through the day/month/year cycle, which will all be much the same if the problem is in early December (i.e. just after a trickle).
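The time-zone arithmetic can be reproduced with a short Python sketch. It assumes the fixed UTC-4 offset that the client reports in its startup log ("Local time is UTC -4 hours") and takes the 10:09:35 UTC figure from the results page as quoted above; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# "Local time is UTC -4 hours" according to the client's startup log.
LOCAL_ZONE = timezone(timedelta(hours=-4))

# First trickle-up request as stamped in the client log (local time).
client_local = datetime(2008, 6, 14, 6, 10, 14, tzinfo=LOCAL_ZONE)
client_utc = client_local.astimezone(timezone.utc)

# Trickle time shown on the results page, as quoted in this reply.
server_utc = datetime(2008, 6, 14, 10, 9, 35, tzinfo=timezone.utc)

print(client_utc.strftime("%Y-%m-%d %H:%M:%S UTC"))

# Difference between the two stamps, in seconds.
offset_seconds = (client_utc - server_utc).total_seconds()
print(offset_seconds)
```

The two stamps differ by well under a minute, which is consistent with the first trickle having been accepted; a small difference like this could simply be clock skew between the PC and the server.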

What is happening in the graphics? Is the model time reverting to the beginning of December?

If it is looping, then an exit/restart may get it going again. Failing that, try restoring a backup.

PS The application version is 5.44, which shouldn't loop for long. After six crashes it'll abort with a sensible error message - probably 'negative pressure' or 'negative theta'.

old_user470806
Joined: 6 Sep 07
Posts: 47
Credit: 23,188
RAC: 0
Message 34072 - Posted: 15 Jun 2008, 23:57:20 UTC - in response to Message 34071.  

The first trickle was submitted at 06:10 local time, which is 10:10 UTC - and there is a trickle logged at 10:09:35 UTC on the results page. So it looks as if that first trickle was accepted correctly, but the others may be spurious repeats. ....


Thank you for that information.


What is happening in the graphics? Is the model time reverting to the beginning of December?


The BOINC/HadCM3 graphics displayed on my computer are dismal, which is a bit strange considering that my computer has some powerful graphics display capabilities. "Graphics Available" shows a black panel with a globe and the continents on it drawn with white lines, no other colors or objects. The left-hand column is white and has something indiscernible at the bottom. Pressing the "Z" key removes the left-hand column and displays a temperature scale to the right of the globe, all upon a black background; the scale has colors for different temperature ranges, but they are not displayed on the globe. The software does not respond to pressing the "8" key. So, as far as I can determine, I do not have any way to ascertain the "current date" that is being processed by the model, or whether it has gone into a loop, etc. IMHO, the software probably could use some serious updating with respect to the information it discloses about what the model is doing.

I shut the system down shortly after exiting the model at the time reported by the excerpt from stdoutdae.txt. I started running the model again about an hour before I posted my first message on this thread. So far, it has not sent a trickle-up message today.


|
| --- Stardance
|
| nil carborundum illegitimi

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34073 - Posted: 16 Jun 2008, 1:28:16 UTC
Last modified: 16 Jun 2008, 1:31:50 UTC

Hi Stardance

The abnormal graphics have nothing to do with your graphics card and they're not the fault of the CPDN graphics program. Abnormal graphics (except in the case of HADSM models after a phase change) are in my experience always an indication that the processing of the model has gone wrong in some way. Usually the abnormal graphics are accompanied by much slower or much faster processing (sec/timestep).

This sort of problem is mostly encountered with HADSM models, but another member recently had a HADCM with the same symptoms.

You can't check the model dates in the graphics window, so could you please check the model's % progress in the Task window. Write it down on paper together with wall-clock time. If the % goes back it will indicate looping.

I see your extra 'trickles' haven't appeared on your model's web page, at least not yet:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6800810

This model had been doing about 4 sec/timestep, so it should crunch about 1 model year, i.e. one trickle, approximately every 24 hours.
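As a rough cross-check of that estimate, the sketch below derives the number of timesteps per model year from a figure quoted later in this thread (a trickle at 1632960 timesteps when the 80-year model stood at 78.75%). Treating that point as exactly 63 completed model years is an assumption made here for illustration, not something stated by the project.

```python
# Figures taken from this thread; the 63-year mapping is an assumption
# (78.75% of an 80-year run), not an official project value.
TOTAL_TIMESTEPS_AT_TRICKLE = 1_632_960
MODEL_YEARS_AT_TRICKLE = 63
SECONDS_PER_TIMESTEP = 4.0  # "about 4 sec/timestep"

timesteps_per_year = TOTAL_TIMESTEPS_AT_TRICKLE / MODEL_YEARS_AT_TRICKLE
hours_per_model_year = timesteps_per_year * SECONDS_PER_TIMESTEP / 3600

print(timesteps_per_year)
print(round(hours_per_model_year, 1))
```

Under those assumptions one model year takes a little over a day of continuous crunching, the same order of magnitude as the "approximately every 24 hours" estimate.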

You could certainly try restoring a backup made before the graphics & trickles problem started, but the same problem would probably recur at the same point. I don't think we've seen a case reported where a model with abnormal graphics has been restored from backup and then completed successfully, though most of the cases we've heard of were HADSMs not HADCMs. If you restore and the model completes successfully, this will be very interesting. It's worth a try.
Cpdn news

old_user470806
Joined: 6 Sep 07
Posts: 47
Credit: 23,188
RAC: 0
Message 34074 - Posted: 16 Jun 2008, 10:04:57 UTC - in response to Message 34073.  
Last modified: 16 Jun 2008, 10:12:37 UTC


The abnormal graphics have nothing to do with your graphics card and they're not the fault of the CPDN graphics program. Abnormal graphics (except in the case of HADSM models after a phase change) are in my experience always an indication that the processing of the model has gone wrong in some way. Usually the abnormal graphics are accompanied by much slower or much faster processing (sec/timestep).


BOINC has always displayed the "graphics" that I described since the first day that I began running the HadCM3 model. (AFAIK it has never displayed the "screen saver", which I eventually deleted from the Windows subdirectory where it was stored.)

The current Progress is 78.713% as of 05:25 EDST (USA East Coast) on 06/16/08, and I will keep an eye on it to see whether it is ever reported to be less. It is now higher than it was at the start of the run on Sunday evening.

According to the firewall network log, if memory serves, CPDNet was contacted three times, at the respective time for each trickle that was reported by the BOINC messages. As to why they haven't appeared (yet?) on my model's web page, I have no idea.

I have kept the backup that was made on Friday evening and I have one from the preceding week (before the run that began on 06/08/08), also for each of the three weeks before that (I could go back to the backup made before I installed the current version of the software, or even to a prior version ...).

So far, I have not restored the state of the model's computation from backup and have simply let the model continue whatever it is doing. It has not sent a trickle since the last one that was reported in the message excerpt that I posted. If the software doesn't stop running of its own accord, or the science folks don't tell me that it is producing erroneous output, then I don't plan to return to a previous backup.
|
| --- Stardance
|
| nil carborundum illegitimi

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34075 - Posted: 16 Jun 2008, 10:16:10 UTC
Last modified: 16 Jun 2008, 10:32:01 UTC

Stardance said
BOINC has always displayed the "graphics" that I described in my initial post since the first day that I began running the HadCM3 model. AFAIK it has never displayed the "screen saver", which I eventually deleted from the Windows subdirectory where it was stored.

In this case your abnormal graphics may be related not to a problem with the model's processing, but to you deleting this file from the Windows subdirectory.

Are you sure you deleted a file from Windows itself? Or was it a file in the BOINC folder?

Let the model continue to crunch while we try to work out why your model graphics don't display properly. Your model graph certainly looks perfectly normal for the years crunched so far, so if the graphics have been abnormal since crunching day 1, the graphics problem can't be related to a model processing problem.

If you right-click on an empty area of the computer\'s desktop, then select Properties, then the Screensaver tab, then open the drop-down list of Windows default screensavers, can you select and display any of them? (You also need to click Apply and then OK.)
Cpdn news

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 34076 - Posted: 16 Jun 2008, 11:00:25 UTC
Last modified: 16 Jun 2008, 11:03:05 UTC

One of the things BOINC does rather well is version upgrades. The current BOINC version for Windows is 5.10.45, which is available here (don't attempt the major version number upgrade to 6 - that isn't stable yet). So, to fix any missing files, just suspend and exit your existing BOINC and install the latest version of BOINC in the same folder - the models will be untouched. The advantage will be that with the graphics working you can then see the model date.

I think the installation asks whether you want the screensaver, but if it doesn't then just select another screensaver (or none) using the normal Windows method.

In your situation, I would do what you're doing - let it run for a bit. If it keeps reverting (judged, as Mo says, by the percentage or by a date if you can get one), then restore a backup. Things should be clear in a day or so. Not long to wait after all the processing that has been done so far ...

old_user470806
Joined: 6 Sep 07
Posts: 47
Credit: 23,188
RAC: 0
Message 34079 - Posted: 16 Jun 2008, 20:14:07 UTC - in response to Message 34075.  


In this case your abnormal graphics may be related not to a problem with the model's processing, but to you deleting this file from the Windows subdirectory.

Are you sure you deleted a file from Windows itself? Or was it a file in the BOINC folder?


Please let me emphasize: BOINC has always output the graphics display that I described beginning with the first day that I ran the HadCM3 model. I've never found anything that has changed that display, such as updating the nVidia Force 2 chipset drivers.

BOINC produced such graphics long before I deleted the BOINC screen saver file, which was in a Windows subdirectory with all of the other screen saver choices that are displayed by the Desktop > Properties > Screen Saver > Screen saver dropdown. Everyone here says that we should not choose the BOINC screen saver because updating the data that it displays significantly slows down the progress of the model. BOINC apparently launched the screen saver during each run, according to the Sunbelt (Kerio) Personal Firewall behavior log, but it was never displayed as output on my monitor. Curiously, this only appeared in the log after I added a rule for the firewall to prevent the BOINC screen saver from being run (thus occupying memory). Eventually, I deleted the file because I got tired of seeing the log show BOINC's attempts to run it. (I haven't looked for a copy of the BOINC screen saver in the BOINC directory, but Windows will not look for one there regardless.)

Please note that Windows XP (the OS) has the feature and the sole responsibility for running a screen saver, if any is to be run. I have always chosen "Mystify" as the one to run.

Let the model continue to crunch while we try to work out why your model graphics don't display properly. Your model graph certainly looks perfectly normal for the years crunched so far, so if the graphics have been abnormal since crunching day 1, the graphics problem can't be related to a model processing problem.


Certainly, provided that the HadCM3 output is valid, then there is no reason to stop or to return to a backup.
|
| --- Stardance
|
| nil carborundum illegitimi

old_user470806
Joined: 6 Sep 07
Posts: 47
Credit: 23,188
RAC: 0
Message 34080 - Posted: 16 Jun 2008, 20:26:53 UTC - in response to Message 34076.  

Iain: BOINC/HadCM3 version 5.10.45 is the one that I'm running. As far as I know, there aren't any files missing from its directory and subdirectories. Please see my further reply to mo.v about the graphics.

In your situation, I would do what you're doing - let it run for a bit. If it keeps reverting (judged, as Mo says, by the percentage or by a date if you can get one), then restore a backup. Things should be clear in a day or so. Not long to wait after all the processing that has been done so far ...


That is what I've been thinking, too. Thanks.
|
| --- Stardance
|
| nil carborundum illegitimi

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 34082 - Posted: 16 Jun 2008, 23:19:52 UTC

... you might find that the 'mystify' screen saver also consumes a bit of the processor's time. The advice not to use the BOINC screensaver really ought to be extended to advise against running any screensaver - unless it's really boring.

The graphics problem is new to me. I've certainly had lots of problems, as others have, with HADSM3 graphics after phase changes; and completely messed up HADCM3 graphics on machines managed remotely (via Remote Desktop Connection and the like). A third kind of problem has been graphics popping up in the wrong session or running continuously on a PC with multiple logins. But since you've been through all the suggestions about drivers etc., it'll just have to remain as an oddity unless someone else has another idea.

old_user470806
Joined: 6 Sep 07
Posts: 47
Credit: 23,188
RAC: 0
Message 34091 - Posted: 18 Jun 2008, 2:36:13 UTC
Last modified: 18 Jun 2008, 3:25:43 UTC

The table below shows the percentage of completion of one entire HadCM3 simulation run as I recorded it at the times shown. My time zone is Eastern Daylight Savings Time (USA), which is, I believe, currently UTC-4 (because we set our clocks one hour ahead when Daylight Savings Time begins).
CPDN Progress
Note:  when the time is the same for two percentage values, then the value changed while I was recording it.
Date     Time  per cent  Notes
-------- ----- --------  ---------------------------------
06/16/08 05:25 78.713    End of run (BOINC suspend & exit)         
06/16/08 21:20 78.713    Start of run, initial value
         21:20 78.694    immediate drop
         21:31 78.707 
         21:35 78.710 
         21:44 78.712 
         21:44 78.713    return to initial value
         21:47 78.714
         21:51 78.716 
         22:01 78.719
         22:10 78.720
         22:24 78.723
         22:37 78.727
         22:42 78.729
         22:50 78.731
         22:59 78.734
         23:08 78.737
         23:33 ......    trickle-up sent
06/17/08 00:40 78.776
         01:00 78.782
         01:10 78.785
         01:43 78.792
         02:05 78.797         
         02:16 ......    DLL initialization error!!         
         02:26 78.781    major drop!
         02:28 78.782
         02:32 78.783
         02:41 78.785
         02:45 78.786
         02:50 78.787
         02:56 78.788
         02:56 78.789
         03:03 78.790
         03:08 78.791
         03:09 78.792
         03:15 78.793
         03:23 78.794
         03:27 78.795
         03:30 78.796
         03:35 78.797    return to 02:05 value         
         03:39 78.798
         03:41 78.799
         03:47 78.800    approx.  6-minute interval
         03:50 78.802
         04:03 78.803    approx. 13-minute interval
         04:09 78.804    approx.  6-minute interval
         04:12 78.805
         04:15 78.806
         04:20 78.807
         04:30 78.809    approx. 10-minute interval
         04:35 78.810
         04:45 78.811    approx. 10-minute interval
         04:49 78.812
         04:49 78.813
         ..... ......

06/17/08 05:29 78.828    End of run (BOINC suspend & exit)
________________________________________________________________

First, I have no idea as to why the percentage immediately dropped when the run began, i.e., why the model dropped back to a previous point in time (percentage) from which it returned to the initial point in time (percentage) after about 25 minutes of recalculation.

Second, on the face of it, the error message set:

6/17/2008 2:16:09 AM|climateprediction.net|Task hadcm3iozn_cpm4_2000_80_45898965_7 exited with a DLL initialization error.
6/17/2008 2:16:09 AM|climateprediction.net|If this happens repeatedly you may need to reboot your computer.
6/17/2008 2:16:09 AM|climateprediction.net|Restarting task hadcm3iozn_cpm4_2000_80_45898965_7 using hadcm3i version 544

is really bad news. Regardless of whether any such initialization error actually occurred, the Sunbelt Personal Firewall behavior log shows that BOINC must re-start the HadCM3 model when that "error" occurs, just as the last line in the set implies. As the table above shows, the model required at least one hour and almost twenty minutes to reach the level of completion at which it had arrived BEFORE the "initialization error".
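The recovery times can be pulled out of a progress log like the table above with a few lines of Python. This is only a sketch: `READINGS` is a hand-copied subset of the table, and `recoveries` is a hypothetical helper that measures from the last reading before a drop to the first reading that reaches that value again.

```python
from datetime import datetime

# (wall-clock time, percent complete), a subset of the table above.
READINGS = [
    ("2008-06-16 21:20", 78.713),
    ("2008-06-16 21:20", 78.694),
    ("2008-06-16 21:31", 78.707),
    ("2008-06-16 21:35", 78.710),
    ("2008-06-16 21:44", 78.712),
    ("2008-06-16 21:44", 78.713),
    ("2008-06-17 02:05", 78.797),
    ("2008-06-17 02:26", 78.781),
    ("2008-06-17 03:30", 78.796),
    ("2008-06-17 03:35", 78.797),
    ("2008-06-17 03:39", 78.798),
]


def recoveries(readings):
    """Return (percent before drop, percent after drop, time to recover)."""
    parsed = [(datetime.strptime(t, "%Y-%m-%d %H:%M"), p) for t, p in readings]
    found = []
    for i in range(1, len(parsed)):
        prev_time, prev_pct = parsed[i - 1]
        _, pct = parsed[i]
        if pct < prev_pct:
            # Look ahead for the first reading back at the pre-drop level.
            for later_time, later_pct in parsed[i:]:
                if later_pct >= prev_pct:
                    found.append((prev_pct, pct, later_time - prev_time))
                    break
    return found


for before, after, lost in recoveries(READINGS):
    minutes = int(lost.total_seconds() // 60)
    print(f"{before} -> {after}: back to {before} after {minutes} min")
```

Measured this way the second drop costs about 90 minutes from the 02:05 reading; counting from the 02:16 error message instead gives the "one hour and almost twenty minutes" mentioned above.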

Since I began running version 5.10.45, there has been only one run of about eight hours in which that error did not occur at all. It usually occurs at least once, more often twice, and frequently three times during a run of more than four or five hours. IMHO, it is not acceptable if a large percentage of the computing time is eaten up by recovering from this "error". One must also wonder whether the recalculated data output is always the same at a time to which the model returns after dropping back to a previous time.
|
| --- Stardance
|
| nil carborundum illegitimi

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 34092 - Posted: 18 Jun 2008, 3:49:37 UTC

First, I have no idea as to why the percentage immediately dropped when the run began
This is normal behaviour for a model stopped some distance after the last savepoint (checkpoint).
The model is just re-starting from the last time the data was saved.


DLL initialization error
This is the replacement (BOINC) message for: Result ...... exited with zero status but no 'finished' file.
I forget at what BOINC version this message changed, but there's some notes about the 'original' here.

This error doesn't affect many people, and then not all of the time, according to previous posts.


mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34094 - Posted: 18 Jun 2008, 10:37:05 UTC
Last modified: 18 Jun 2008, 10:37:46 UTC

Hi again Stardance

Your model trickled for 2062 on Saturday 14 June and for 2063 on Tues 17 June at 1632960 timesteps.

It's an 80-year HADCM so each decade is 12.5%, each year is 1.25%, each model day is 0.00347% and each 6-day save/checkpoint period is 0.0208%.
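These percentages follow from simple division, as the sketch below shows. The 360-day model year is an assumption on my part: it is the value that reproduces the 0.00347% per day quoted here (a 365-day year would give about 0.00342%). The helper `checkpoints_after` is made up for illustration.

```python
MODEL_YEARS = 80
DAYS_PER_MODEL_YEAR = 360  # assumption: reproduces the 0.00347% per-day figure
CHECKPOINT_DAYS = 6

year_pct = 100 / MODEL_YEARS                 # 1.25
decade_pct = 10 * year_pct                   # 12.5
day_pct = year_pct / DAYS_PER_MODEL_YEAR     # about 0.00347
checkpoint_pct = day_pct * CHECKPOINT_DAYS   # about 0.0208


def checkpoints_after(start_pct, count):
    """Progress values at the next `count` 6-day checkpoints."""
    return [start_pct + checkpoint_pct * n for n in range(1, count + 1)]


print(decade_pct, year_pct, round(day_pct, 5), round(checkpoint_pct, 4))

# Checkpoints following a trickle at 78.75% (63 model years).
for value in checkpoints_after(78.75, 3):
    print(round(value, 4))
```

Unrounded steps give 78.7708, 78.7917 and 78.8125 for the next three checkpoints; adding a rounded 0.0208 each time gives values a hair lower (78.7916, 78.8124), which is immaterial at the three decimal places BOINC displays.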

On Tuesday when it trickled in early December it had reached 78.75%.

23:08 78.737
23:33 ...... trickle-up sent
06/17/08 00:40 78.776


After the next 6-day checkpoint the model will reach 78.7708% and after the next one, 78.7916%. (I think this will be at the end of 18 Dec.)

I think that after the trickle your model looped after one checkpoint and 5 days:

02:05 78.797
02:16 ...... DLL initialization error!!
02:26 78.781 major drop!


78.78815% is the end of a model day, so it went back I think a day. But it seems to have got through the 78.7916% checkpoint OK and also the 78.8124% checkpoint. So it only looped once, recovered, and then progressed.

I think it probably also looped in early December 2062, but on an earlier day in December, and passed the trickle point several times. It therefore sent the same trickle several times. The server disregards repeat trickles.

You have a problematic model here. But your graph looks fine. So far the model has recovered from its loops and progressed. But if it crashes after failing to progress after looping 6 times, just let it go and don't restore it. In fact if I were you I'd stop spending time on making backups of this model.

Stardance said
Since I began running version 5.10.45, there has been only one run of about eight hours in which that error did not occur at all.


Are you sure you didn't get this error before upgrading to 5.10.45? Would anyone recommend that Stardance should reinstall 5.10.45 or for the time being at least go back to an earlier version of BOINC? I think the problem lies with the model and not with BOINC but I'm not sure. I think it's a coincidence that the model has turned into a frequent looper since you upgraded BOINC.
Cpdn news

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,098,944
RAC: 3,014
Message 34095 - Posted: 18 Jun 2008, 15:10:28 UTC - in response to Message 34094.  

Hi again Stardance

Your model trickled for 2062 on Saturday 14 June and for 2063 on Tues 17 June at 1632960 timesteps.

It's an 80-year HADCM so each decade is 12.5%, each year is 1.25%, each model day is 0.00347% and each 6-day save/checkpoint period is 0.0208%.

On Tuesday when it trickled in early December it had reached 78.75%.

23:08 78.737
23:33 ...... trickle-up sent
06/17/08 00:40 78.776


After the next 6-day checkpoint the model will reach 78.7708% and after the next one, 78.7916%. (I think this will be at the end of 18 Dec.)

I think that after the trickle your model looped after one checkpoint and 5 days:

02:05 78.797
02:16 ...... DLL initialization error!!
02:26 78.781 major drop!


78.78815% is the end of a model day, so it went back I think a day. But it seems to have got through the 78.7916% checkpoint OK and also the 78.8124% checkpoint. So it only looped once, recovered, and then progressed.

I think it probably also looped in early December 2062, but on an earlier day in December, and passed the trickle point several times. It therefore sent the same trickle several times. The server disregards repeat trickles.

You have a problematic model here. But your graph looks fine. So far the model has recovered from its loops and progressed. But if it crashes after failing to progress after looping 6 times, just let it go and don't restore it. In fact if I were you I'd stop spending time on making backups of this model.

Stardance said
Since I began running version 5.10.45, there has been only one run of about eight hours in which that error did not occur at all.


Are you sure you didn't get this error before upgrading to 5.10.45? Would anyone recommend that Stardance should reinstall 5.10.45 or for the time being at least go back to an earlier version of BOINC? I think the problem lies with the model and not with BOINC but I'm not sure. I think it's a coincidence that the model has turned into a frequent looper since you upgraded BOINC.


Hi, everyone.

I don’t know if this is important or not, but, you mentioned that the model in question looped once in early December as it was getting ready to trickle. I have noticed that my model will occasionally loop back to Dec. 1 if I open the graphics or even open “messages” when it is getting ready to trickle. It then recovers just fine. It’s best to leave it alone until after it is done trickling.



mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34096 - Posted: 18 Jun 2008, 15:45:18 UTC
Last modified: 18 Jun 2008, 15:51:29 UTC

What you say is relevant, Jim. HADSM models are serious offenders in this respect, so much so that we've even mentioned this bad behaviour in a News thread post. HADSMs shouldn't be disturbed in any way, e.g. by exiting from BOINC, until after at least one extra 3-day savepoint after they've created an end-of-phase zip file. If they're disturbed they may go back to the beginning of the whole phase. You've mentioned other sorts of disturbances that ideally shouldn't affect models at all.

I don't think we've noticed HADCMs behaving badly in this respect. When Stardance's HADCM looped in Dec 2062 it must have been around the trickle creation point. But its loop in Dec 2063 was after at least one 6-day savepoint after trickle creation. I've lost track of all the decimal numbers now, but I think that in 2063 it looped on 17 Dec. Not exactly one model year after its 2062 loop.

We certainly need to be on the lookout for behaviour usually associated with one model type occurring in another type, but I don't think Stardance's model is an example of this.
Cpdn news

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34097 - Posted: 18 Jun 2008, 16:18:51 UTC
Last modified: 18 Jun 2008, 16:34:02 UTC

Stardance, I'm going back to the problem of your inability to see proper model graphics when you click the View graphics button in BOINC manager.

Jorden (Ageless) who's a moderator on the boinc_dev forum read this thread yesterday and said to the CPDN mods:

He says he updated his chipset drivers, but he doesn't say if he also updated his video card drivers. He mentions nothing about his video card, but for that it has "some powerful graphics display capabilities", which could mean anything.

A lot of the embedded video chips have problems with OpenGL graphics, as the on-board GPU uses the CPU and main memory. Asking him if he has a separate video card or that he uses on-board video is a pre. As is asking if he ever updated his video card drivers and DirectX version.

The default Windows drivers for a lot of video cards and embedded chips do not support full OpenGL compatibility. Some do partially, some don't at all, while the card or chip is capable of doing it.


This may be the cause of your graphics display problem. Could I suggest you read this post by Thyme Lawn which explains what you may need to update and how.

If you want to test whether your computer can display the graphics of another project whose tasks have them, you could try crunching a small number of Seti tasks. Seti has some short tasks.

First limit the amount of work your computer will download from Seti. In Boinc manager's Advanced tab, select Preferences, then Network usage, then edit the Additional work buffer to 0.5 days. Otherwise you could be overwhelmed with too many tasks you don't want. Click OK.

Then attach to Seti through the Tools menu. Use the same email address as for CPDN. As soon as you have 12 hours' work from Seti, in Boinc manager Projects tab set Seti to No new work.
Cpdn news

old_user470806
Joined: 6 Sep 07
Posts: 47
Credit: 23,188
RAC: 0
Message 34100 - Posted: 19 Jun 2008, 1:44:54 UTC - in response to Message 34092.  

Thanks for explaining why, at the start of the run, the model immediately dropped to a previous point and recalculated from it.

... (the DLL initialization error) This is the replacement (BOINC) message for: Result ...... exited with zero status but no \'finished\' file.
I forget at what BOINC version this message changed, but there\'s some notes about the \'original\' here.


The \"DLL initialization error\" message first appeared in version 5.10.30 and the error message Task ... exited with zero status but no finished file, which had occurred with all previous versions that I used, ceased to appear. See http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=5948#32080

Granted, I remarked, at the end of that thread, that it seemed to \"replace\" the \"Task ... exited\" message, but only because one error stopped appearing only to have another error appear. I don\'t know whether both of the respective messages are related to the same operational condition that causes computation of the model to cease and the software to exit.

From my recent examination, resuming computation of the model is affected far more severely when the "DLL initialization error" occurs than it was with the "Task ... exited" error condition. (Please refer to my reply to mo.v's response to my posting of the percentage data on this thread.)

FWIW, I am all too familiar with the "Task ... exited" error message. My computer has an nVidia nForce2 chipset on the Asus A7N8X-VM mainboard, but the primary problem appeared to be that Windows XP was installed with a Hardware Abstraction Layer (HAL) component that doesn't match the mainboard. See:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=5773#31418

|
| --- Stardance
|
| nil carborundum illegitimi
ID: 34100
old_user470806

Joined: 6 Sep 07
Posts: 47
Credit: 23,188
RAC: 0
Message 34101 - Posted: 19 Jun 2008, 2:51:55 UTC - in response to Message 34094.  
Last modified: 19 Jun 2008, 3:00:52 UTC

mo.v:

Thank you for your analysis.

You have a problematic model here. But your graph looks fine. So far the model has recovered from its loops and progressed. But if it crashes after failing to progress after looping 6 times, just let it go and don't restore it. In fact if I were you I'd stop spending time on making backups of this model.

Stardance said
Since I began running version 5.10.45, there has been only one run of about eight hours in which that error did not occur at all.


My statement is true. However, from reviewing the messages that I've posted on this board, the "DLL initialization error" first occurred with version 5.10.30 (upgraded from 5.10.28). The prior versions that I ran produced a "Task ... exited with zero status but no 'finished' file." error message that ceased to be output with version 5.10.30; please see Message #34100 on this thread. I did not attempt to analyze the effects of the change at the time (whether or not I had the time to make the effort).

That said, whatever causes the "DLL initialization error" evidently has a much worse effect on resuming computation by the model than the "Task ... exited" error had. When the "Task ... exited" error occurred, the model, once restarted, would simply return to the most recent checkpoint and proceed with relatively little loss of time. See:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=5773#30654

In contrast, the "DLL initialization error" changes the time per checkpoint. Before it occurs, each checkpoint requires 27 to 33 minutes of computation on my computer, which is the same amount of time as both before and after the "Task ... exited" error occurred. As shown in the following messages for yesterday's run, after the "DLL initialization error" occurs, all checkpoints require more time -- about 50 minutes -- probably because of the "looping" shown in the percentage data that I posted on this thread.

6/17/2008 9:22:49 PM||Starting BOINC client version 5.10.45 for windows_intelx86
6/17/2008 9:22:49 PM||log flags: task, file_xfer, sched_ops, checkpoint_debug
6/17/2008 9:22:49 PM||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
6/17/2008 9:22:49 PM||Data directory: C:\Program Files\BOINC
6/17/2008 9:22:49 PM||Processor: 1 AuthenticAMD AMD Athlon(tm) XP 2400+ [x86 Family 6 Model 8 Stepping 1]
6/17/2008 9:22:49 PM||Processor features: fpu tsc sse 3dnow mmx
6/17/2008 9:22:49 PM||OS: Microsoft Windows XP: Home Edition, Service Pack 3, (05.01.2600.00)
6/17/2008 9:22:49 PM||Memory: 1.94 GB physical, 2.29 GB virtual
6/17/2008 9:22:49 PM||Disk: 57.26 GB total, 13.65 GB free
6/17/2008 9:22:49 PM||Local time is UTC -4 hours
6/17/2008 9:22:49 PM|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 753323; location: (none); project prefs: default
6/17/2008 9:22:49 PM||General prefs: from climateprediction.net (last modified 18-Sep-2007 17:32:13)
6/17/2008 9:22:49 PM||Host location: none
6/17/2008 9:22:49 PM||General prefs: using your defaults
6/17/2008 9:22:49 PM||Reading preferences override file
6/17/2008 9:22:49 PM||Preferences limit memory usage when active to 768.55MB
6/17/2008 9:22:49 PM||Preferences limit memory usage when idle to 1031.35MB
6/17/2008 9:22:49 PM||Preferences limit disk usage to 8.99GB
6/17/2008 9:22:49 PM|climateprediction.net|Task hadcm3iozn_cpm4_2000_80_45898965_7 is 146.94 days overdue.
6/17/2008 9:22:49 PM|climateprediction.net|You may not get credit for it. Consider aborting it.
6/17/2008 9:22:49 PM|climateprediction.net|Restarting task hadcm3iozn_cpm4_2000_80_45898965_7 using hadcm3i version 544

6/17/2008 9:51:43 PM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/17/2008 10:20:21 PM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/17/2008 10:51:46 PM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/17/2008 11:20:35 PM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/17/2008 11:50:55 PM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/18/2008 12:23:22 AM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/18/2008 12:50:56 AM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/18/2008 1:19:08 AM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/18/2008 1:47:53 AM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed

6/18/2008 1:55:08 AM|climateprediction.net|Task hadcm3iozn_cpm4_2000_80_45898965_7 exited with a DLL initialization error.
6/18/2008 1:55:08 AM|climateprediction.net|If this happens repeatedly you may need to reboot your computer.
6/18/2008 1:55:08 AM|climateprediction.net|Restarting task hadcm3iozn_cpm4_2000_80_45898965_7 using hadcm3i version 544

6/18/2008 2:45:47 AM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/18/2008 3:37:25 AM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/18/2008 4:28:51 AM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed

6/18/2008 5:03:26 AM||Suspending computation - user request

------------------------------------------------------------
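The checkpoint spacing described above can be measured directly from the pasted messages. The following is only a minimal sketch, assuming the `timestamp|project|message` layout seen in this log; the function name `checkpoint_gaps` and the choice to measure the first gap from the "Restarting task" line are illustrative assumptions, not part of BOINC.

```python
from datetime import datetime

# A few lines copied from the log above: the first two checkpoints after each restart.
SAMPLE_LOG = """\
6/17/2008 9:22:49 PM|climateprediction.net|Restarting task hadcm3iozn_cpm4_2000_80_45898965_7 using hadcm3i version 544
6/17/2008 9:51:43 PM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/17/2008 10:20:21 PM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/18/2008 1:55:08 AM|climateprediction.net|Task hadcm3iozn_cpm4_2000_80_45898965_7 exited with a DLL initialization error.
6/18/2008 1:55:08 AM|climateprediction.net|Restarting task hadcm3iozn_cpm4_2000_80_45898965_7 using hadcm3i version 544
6/18/2008 2:45:47 AM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
6/18/2008 3:37:25 AM|climateprediction.net|[checkpoint_debug] result hadcm3iozn_cpm4_2000_80_45898965_7 checkpointed
"""


def checkpoint_gaps(log_text):
    """Return (timestamp, minutes since the previous restart or checkpoint)
    for every 'checkpointed' line in the log text."""
    gaps = []
    previous = None
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # blank line or separator
        stamp_text, _project, message = parts
        try:
            stamp = datetime.strptime(stamp_text.strip(), "%m/%d/%Y %I:%M:%S %p")
        except ValueError:
            continue  # not a timestamped message
        if "Restarting task" in message:
            previous = stamp  # start timing afresh after a restart
        elif "checkpointed" in message:
            if previous is not None:
                gaps.append((stamp, (stamp - previous).total_seconds() / 60.0))
            previous = stamp
    return gaps


if __name__ == "__main__":
    for stamp, minutes in checkpoint_gaps(SAMPLE_LOG):
        print(f"{stamp:%m/%d %H:%M:%S}  {minutes:5.1f} min")
```

On these sample lines the gaps come out at roughly 29 minutes before the error and roughly 51 minutes after it. Note that the first gap after a restart also includes any work recomputed since the last saved checkpoint.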

So, now I'm waiting to see what happens when two or more of the "DLL initialization" errors occur.

Your recommendation to continue running the model, and not to restore it if the model reports that computation cannot continue, has merit. It seems likely to me that computation could continue until the end of the last year and a final data file could be produced, but it is going to be a slow crawl.

I don't know whether it would be a good idea to return to BOINC/HadCM3 version 5.10.28 just to speed things up and to see whether the model loops as much with it as with the later versions. I don't know what modifications were made that would make version 5.10.30 and/or 5.10.45 preferable.
|
| --- Stardance
|
| nil carborundum illegitimi
ID: 34101
old_user470806

Joined: 6 Sep 07
Posts: 47
Credit: 23,188
RAC: 0
Message 34103 - Posted: 19 Jun 2008, 10:07:47 UTC - in response to Message 34097.  

Jorden (Ageless), who's a moderator on the boinc_dev forum, read this thread yesterday and said to the CPDN mods:

He says he updated his chipset drivers, but he doesn't say if he also updated his video card drivers. He mentions nothing about his video card, except that it has "some powerful graphics display capabilities", which could mean anything.

A lot of the embedded video chips have problems with OpenGL graphics, as the on-board GPU uses the CPU and main memory. Asking him whether he has a separate video card or uses on-board video is a prerequisite, as is asking whether he ever updated his video card drivers and DirectX version.

The default Windows drivers for a lot of video cards and embedded chips do not support full OpenGL compatibility. Some do partially and some don't at all, even though the card or chip is capable of it.


There are three (or four) things to consider here:

(1) The Asus A7N8X-VM mainboard has an Integrated GeForce4 MX GPU, among many other features that are usually implemented with a "card" or "adapter", such as the built-in Ethernet Network Interface Card (NIC). As far as I can determine, when I update the nForce2 chipset drivers, the drivers for the sub-components are updated, i.e., the "chipset drivers" are drivers for the various sub-components. The mainboard BIOS was updated when the vendor assembled the computer almost five years ago, and I have verified the version as the most recent. I subsequently updated the nForceware drivers twice, to version 93.71, and that is still the most recent. I have allotted the maximum of 128 MB of DRAM for the "Video Frame Buffer" (with 2 GB of DDR DRAM installed, why not?).

(2) The OpenGL Extensions Viewer 3.0 verified that the integrated GPU driver is the most recent (10/22/2006 ver. 6.14.10.9371), actually more recent than in their database of drivers. It also says that the graphics subsystem supports 100% of all "core features" of OpenGL up to and including version 1.5, and five of the ten core features of version 2.0. Apparently, there are many other aspects of OpenGL that the subsystem does or doesn't support, but I'd rather not include the list here, since I can't copy it to the Windows clipboard. By the way, Thyme Lawn implies that the graphics subsystem needs to support only OpenGL version 1.2 for output from BOINC (?) -- has his post become outdated?

I have updated the DirectX drivers from version 9.0 to 9.0c, the most recent available from the Microsoft website -- the rigors of Windows Genuine Advantage notwithstanding. (You don't want to know the routine that they put me through for running Firefox when I went to their website -- ooops!) Furthermore, from the beginning, every version of Microsoft .NET has been installed on my computer and maintained on each and every Patch Tuesday. As far as I know, the OpenGL Extensions Viewer is the only application that I've run which actually requires .NET (version 2.0).

(3) Also be advised that the display monitor is a Hewlett Packard w1907 Nineteen-Inch Wide LCD Display which currently receives "VGA" output from the mainboard. The mainboard Accelerated Graphics Port (AGP) also supports Digital Visual Interface (DVI) cards "for digital display on LCD monitors and projectors"; the w1907 could process that, of course. According to an HP Driver Installer Program that I downloaded, the driver that has been installed since I bought the HP w1907 monitor is the most recent.

Last, but not least, as disclosed in the BOINC messages that I've posted, my computer is running Windows XP with SP2 and SP3. Yes, I've heard that computers with AMD CPU chips have problems with SP3, but mine hasn't shown any, and the description that I gave for the BOINC graphics output has been the case since I began running the climate model last September.

If you will pay for it, then I will gladly install the most recent nVidia "graphics adapter", with the maximum amount of its own memory on the "card", in the AGP slot -- if the latest and greatest from nVidia would run optimally in such an ancient attachment (two PCI slots are also available). But I don't know whether the rest of the mainboard could actually support such a marvelous device. :-) :-)

If you want to test whether your computer can display the graphics of another project whose tasks have them, you could try crunching a small number of Seti tasks. Seti has some short tasks. ....


Thank you for the information and advice. I must admit that trying to get BOINC to properly display its "graphics" on my computer system is not the highest priority right now. I've spent more than enough time and effort on the matter already. .... I prefer to focus upon whether the data that the climate model outputs and sends to the CPDN is valid, and upon completing computation for the model's 80-year span as quickly as possible.
|
| --- Stardance
|
| nil carborundum illegitimi
ID: 34103
Jord
Joined: 5 Aug 04
Posts: 250
Credit: 93,274
RAC: 0
Message 34105 - Posted: 19 Jun 2008, 12:20:03 UTC - in response to Message 34103.  
Last modified: 19 Jun 2008, 12:20:54 UTC

By the way, Thyme Lawn implies that the graphics subsystem needs to support only OpenGL version 1.2 for output from BOINC (?) -- has his post become outdated?
No, it's the version of OpenGL that they made their graphics application with, so it's still correct.

(You don't want to know the routine that they put me through for running Firefox when I went to their website -- ooops!)
It's quite easy to circumvent that, by downloading the full multilingual redistribution version. I always keep a link to it in this FAQ (last link).

LCD
You forgot to say what LCD stands for. Liquid Crystal Display, if anyone's wondering. ;-)

Windows XP with SP2 and SP3.
So which one is it? SP2 or SP3?
I've read that SP3 makes the OS 'dislike' programs actively writing to the \Program Files\ directory, much as Vista does.

This could account for your sudden DLL initialization error messaging. Although this error does (in a way) replace the old message, it's not the only one to do so. I've tried to diagnose it here, but it's still an ongoing investigation. One omission in there is that it also happens when the drive powers down and on laptops running on batteries.
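One way to check the write-permission theory is simply to try creating a file in the BOINC data directory. This is a rough sketch only: the helper name `can_write_to` is made up for illustration, the path is the data directory reported in the log earlier in this thread, and a successful write would not by itself rule out other causes of the DLL initialization error.

```python
import tempfile


def can_write_to(directory):
    """Return True if a temporary file can be created and written in `directory`.

    The file is removed automatically when it is closed.
    """
    try:
        with tempfile.TemporaryFile(dir=directory) as handle:
            handle.write(b"test")
        return True
    except OSError:
        # Covers a missing directory as well as a permission refusal.
        return False


if __name__ == "__main__":
    # Data directory as reported in the BOINC startup messages (an assumption
    # for any other machine; adjust to the local installation).
    print(can_write_to(r"C:\Program Files\BOINC"))
```

Run it under the same Windows account that runs BOINC, since permissions can differ between accounts.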

If you will pay for it, then I will gladly install the most recent nVidia "graphics adapter", with the maximum amount of its own memory on the "card", in the AGP slot -- if the latest and greatest from nVidia would run optimally in such an ancient attachment (two PCI slots are also available). But I don't know whether the rest of the mainboard could actually support such a marvelous device. :-) :-)
Only if you can find one that still has an AGP connector. We hunted one down lately and it's mainly PCIe that's out there. But the mainboard has an AGP 3.0 slot, so any of the latest ones should work. If your PSU can take it...

At least putting in an AGP card could free up the speed of your memory, depending on what speed you actually have in there. If it's DDR400, it's running at DDR333 at most because you're using the integrated video.
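To put a number on that difference, here is a small sketch of the theoretical peak transfer rates. It assumes a single 64-bit memory channel and the standard DDR clocks (200 MHz for DDR400, 166.67 MHz for DDR333); whether the board really caps the memory at DDR333 with integrated video is the claim above, not something this calculation verifies.

```python
BUS_WIDTH_BYTES = 8  # one 64-bit DDR memory channel (assumed)


def peak_bandwidth_mb_s(clock_mhz):
    """Theoretical peak rate in MB/s: two transfers per clock, 8 bytes each."""
    return clock_mhz * 2 * BUS_WIDTH_BYTES


ddr400 = peak_bandwidth_mb_s(200.0)        # DDR400, sold as PC3200
ddr333 = peak_bandwidth_mb_s(500.0 / 3.0)  # DDR333, sold as PC2700
loss_percent = (1 - ddr333 / ddr400) * 100

if __name__ == "__main__":
    print(f"DDR400: {ddr400:.0f} MB/s")
    print(f"DDR333: {ddr333:.0f} MB/s")
    print(f"Reduction: {loss_percent:.1f}%")
```

These are peak figures only; real-world throughput is lower, and the integrated video also takes a share of whatever bandwidth is available.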

Anyway, that's peanuts. An external video card will always run faster than the integrated one, as an external card will not use the CPU as much (it still uses it to transfer graphics maps to the card's memory). The integrated video depends heavily on the PC's CPU to do this, especially the older ones.
Jord.
ID: 34105

©2024 climateprediction.net