climateprediction.net home page
Comments for 'Generic solutions to models' sticky

old_user197776

Send message
Joined: 1 Sep 06
Posts: 11
Credit: 4,627
RAC: 0
Message 24503 - Posted: 2 Oct 2006, 17:38:54 UTC - in response to Message 24375.  
Last modified: 2 Oct 2006, 17:45:59 UTC

Hi,

The temperatures sound OK. I have a similar setup to yours in terms of the paging file and so forth, and it works fine.

It may be worth going through the stability checks; Prime95's Torture Test is very good...

Does the same thing happen with the Seasonal Attribution project? (attribution.cpdn.org)



I have since downloaded the Seasonal Attribution as well and at one stage both were running fine. I also started using Prime95 and did a torture test before starting to search. No problems running it.

Yesterday I downloaded and installed new drivers. Now all my projects are running fine.

I noticed the first improvement when I changed my target CPU temperature settings to 40C (AMD), so it does not reach 50C (before using Prime95). Then the projects detached. Happened a few times with just Climate Prediction. Not sure why.

I also made some other changes to the registry, but think the temperature setting fixed it as other projects also caused it to hang at one stage. It is early days though, but things look fine now.

Thanks
ID: 24503 · Report as offensive     Reply Quote
Profile MikeMarsUK
Volunteer moderator
Avatar

Send message
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 24504 - Posted: 2 Oct 2006, 20:51:08 UTC

Tobie: Excellent, hope it continues to run :-)

Rick: Have you tried Prime95's torture test for a day or so? This will let you confirm that the hardware is OK under stress. The model is pretty much the most stressful software the PC will ever run.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 24504 · Report as offensive     Reply Quote
Profile astroWX
Volunteer moderator

Send message
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 24512 - Posted: 3 Oct 2006, 4:33:40 UTC


. . . in addition to which, some of us have been running this "dicey" scientific software on a variety of hardware, on different flavors of Mac/Windoze/Linux, with few problems. Those are the tens of thousands of participants you don't hear from on these Boards.

I've personally had software troubles -- to be sure -- but they were in CPDN Alpha and Beta tests. The rest? Power problems, hardware issues, self-inflicted wounds. (I'm old, so I don't mind admitting personal imperfections.)

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 24512 · Report as offensive     Reply Quote
Profile Rick D
Avatar

Send message
Joined: 12 Dec 05
Posts: 4
Credit: 93,834
RAC: 0
Message 24777 - Posted: 19 Oct 2006, 2:02:23 UTC - in response to Message 24504.  

Tobie: Excellent, hope it continues to run :-)

Rick: Have you tried Prime95's torture test for a day or so? This will let you confirm that the hardware is OK under stress. The model is pretty much the most stressful software the PC will ever run.


Hi Mike,
Thanks for the suggestion. I waited for CPDN to, uh, disappoint again (got to 6% this time) then downloaded & ran Prime95. 53 hours, no problems.

Whatever is going on here, it just passed my time budget for tending to a screensaver.

Good luck with the climate prediction for those who remain. I'm off to other BOINC projects.

-Rick

ID: 24777 · Report as offensive     Reply Quote
Profile mo.v
Volunteer moderator
Avatar

Send message
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 24778 - Posted: 19 Oct 2006, 3:44:28 UTC
Last modified: 19 Oct 2006, 3:52:16 UTC

Rick, on looking at your results I find that 3 of your past/crashed models show -161 errors. These probably happened because of some incident/event on your computer. Have you tried to identify from Mike's top post in this thread what the problem(s) may have been?

4 of your past models, including the one that's just gone down, crashed with -107 codes. This usually indicates a graphics problem, momentary or longstanding, on your computer. Have you gone to the link Mike provides in his second post up above? Have you, as Thyme Lawn explains in the link, updated your graphics card driver (free download from the web a bit like Windows updates)?

We advise everyone to back up their climate models. The models take upwards of 4 months to complete (up to a year on my computer), running 24/7. It is fairly likely that something will go wrong on the computer during such a long period. So these long workunits/simulations are only suitable for committed members willing to make regular backups.
Cpdn news
ID: 24778 · Report as offensive     Reply Quote
Profile Rick D
Avatar

Send message
Joined: 12 Dec 05
Posts: 4
Credit: 93,834
RAC: 0
Message 24801 - Posted: 20 Oct 2006, 15:42:35 UTC - in response to Message 24778.  

Hi Mo,

Thanks for the research, but I never doubted that the problems involved my computer.

I'm sorry if people are taking my comments personally.

From the early days of SETI@home, my understanding was that these were programs that could use my computers' idle periods, with negligible human input. I am definitely not a "suitable committed member" willing to manually do checkpoints & restarts and shield the program from other tasks. As I said, I have retreated to the other BOINC projects with their Load/Launch/Leave paradigm.

Best wishes,
Rick

Rick, on looking at your results I find that 3 of your past/crashed models show -161 errors. These probably happened because of some incident/event on your computer. Have you tried to identify from Mike's top post in this thread what the problem(s) may have been?

4 of your past models, including the one that's just gone down, crashed with -107 codes. This usually indicates a graphics problem, momentary or longstanding, on your computer. Have you gone to the link Mike provides in his second post up above? Have you, as Thyme Lawn explains in the link, updated your graphics card driver (free download from the web a bit like Windows updates)?

We advise everyone to back up their climate models. The models take upwards of 4 months to complete (up to a year on my computer), running 24/7. It is fairly likely that something will go wrong on the computer during such a long period. So these long workunits/simulations are only suitable for committed members willing to make regular backups.


ID: 24801 · Report as offensive     Reply Quote
old_user20481

Send message
Joined: 23 Sep 04
Posts: 1
Credit: 91,424
RAC: 0
Message 25454 - Posted: 4 Dec 2006, 1:08:22 UTC - in response to Message 24495.  

Hmmmmf.

I just crashed a BBC sim AGAIN. I've read the posts about backups and temperatures and so on, but I think that's all missing the point--

this software is very fragile.

That is not a compliment.

There are many applications that make my machine (AMD 3700+, 3GB ram) work much harder. It has never failed any app due to heat.

I've had four or five sims get up to around 10% and die of something or other. I don't have time to babysit my screensaver. It's frustrating that this app, alone of all the BOINC apps I've tried, is so prone to crashing.

Further, I really don't subscribe to the "some programs crash, you know" proposition. If I had paid for software that behaved like this, I'd be furious. Honestly, wouldn't you?

I would like to help CPDN and BBC et al. save the world from climate change. Seriously. However, if my sims all die, my machine might just as well be folding proteins or tracking mosquitoes or eavesdropping on ET or something.

My preferred solution is robust code from CPDN. My interim solution will be to drop this project after the next crash.

Harrumph.
-Rick



Hi,

I would suggest switching off the screen saver, as it is a waste of your computer's energy.
ID: 25454 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25455 - Posted: 4 Dec 2006, 21:17:33 UTC
Last modified: 4 Dec 2006, 21:19:54 UTC

Rick and others unhappy with this program:

This is NOT a simple little toy that's been designed for the amusement of the public.
It's a huge (1 million lines plus) Fortran program, written over many decades by many software engineers, and normally runs on the Met Office's supercomputers to predict both weather and climate.

Getting it to work on a desktop computer took a couple of years, and it does NOT like working with Windows and the thousands of different programs that people use at the same time.

Of course it's "fragile"! Which is why any computer that attempts to run it needs to be VERY stable.
Where the definition of "stable" is the ability to run these climate models without crashing. And there are a lot of us WITH "stable" computers.

If your computer hardware/software isn't capable of running the models, then there are some simpler, easier projects listed here.

ID: 25455 · Report as offensive     Reply Quote
Profile mo.v
Volunteer moderator
Avatar

Send message
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 25456 - Posted: 4 Dec 2006, 21:43:10 UTC

I'm not even certain that the climate model workunits are more fragile than workunits for any other BOINC-based project. If you run a workunit for some other project and it needs 24 hours of computer time to complete, the likelihood of it crashing will be X. A climate model running on the same computer might take 5 months to complete, i.e. 150+ days. The likelihood of a model crash during this time will be roughly X multiplied by 150 (as long as X is small). This is obviously quite a high-risk venture, so taking the precautions Mike describes above is well worthwhile. Backing up is also essential to avoid disappointment.
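
A rough worked version of that arithmetic, as a sketch only: assume each day of crunching independently has the same small chance p of a crash. Both the independence and the value of p used below are assumptions for illustration, not project figures. The chance of at least one crash in n days is then 1 - (1 - p)^n. For small p this is close to n times p, the "X multiplied by 150" estimate, but it can never exceed 1.

```python
def crash_probability(p_per_day: float, days: int) -> float:
    """Chance of at least one crash over `days` independent days.

    `p_per_day` is a hypothetical per-day crash chance between 0 and 1.
    """
    if not 0.0 <= p_per_day <= 1.0:
        raise ValueError("p_per_day must be between 0 and 1")
    if days < 0:
        raise ValueError("days must not be negative")
    # Probability of surviving every single day, subtracted from 1.
    return 1.0 - (1.0 - p_per_day) ** days


# Example with a made-up value of p:
# crash_probability(0.001, 1)    one-day workunit
# crash_probability(0.001, 150)  five-month climate model
```

The simple product n times p always slightly overstates the risk, and the gap grows as p or n gets larger, but the conclusion is the same: a long model is exposed to far more risk than a one-day workunit.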
Cpdn news
ID: 25456 · Report as offensive     Reply Quote
old_user212146

Send message
Joined: 7 Dec 06
Posts: 3
Credit: 544
RAC: 0
Message 25506 - Posted: 7 Dec 2006, 22:01:56 UTC - in response to Message 25456.  

I'm afraid that I have to agree with the above post. Having been a software engineer for 20 years (5 of them at the Met Office in Bracknell), I know that you cannot write code perfectly. If you are talking about 1 million lines of Fortran code (god, I hated using Fortran when I worked there), then there are bound to be errors in the code; it is just infeasible that the code is 100% perfect. So blaming people's PCs for the problem is a little harsh: it is more likely that a certain combination of data is forcing the code down an unexpected path and causing a crash. The work packages are enormous. I used to run the project but gave up after having it restart itself back at 1920 three times.

I appreciate that it may be difficult to do, but reorganising the code to allow for smaller chunks of work, or at least to fix a "last known good state" at regular intervals, would greatly increase the number of results you are receiving.
ID: 25506 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25507 - Posted: 7 Dec 2006, 22:40:50 UTC
Last modified: 7 Dec 2006, 22:42:44 UTC

"smaller chunks of work" will hopefully become possible sometime next year, once some code is written and tested. This is what the "restart dumps" every 40 years are for.
But currently, the core team have higher priority matters to deal with.
One of which is to solve the problem of the 'climateapps2' server falling over because of the huge work load.
Then there are the programs that will allow the global researchers to access the data for their own analysis without causing problems for the storage servers.
These people are some of those 'paying the piper', so they get a bit of a say in the priorities.

As for blaming people's computers, if you read back through the posts of the last couple of years, and the replies to them, you'll see that a lot of people have been trying to run the program on seriously under-powered machines, which they think will work 'because SETI works on them'.
Such as only having 128 Megs of RAM, a CPU speed of 98 MHz, an OS of Windows 98/ME, etc.
And you'll also find that a lot of people with "faulty computers" have been helped to get the program going with just a bit of advice.

One such from way back was a person who 'laid down the law' about how it was the program, and not his computer. It turned out that he'd had a PSU failure, opened the case to replace it, then hadn't replaced the cover. After some advice from 'yours truly', as they say, when he cleared away a few things so that he could see into the case, he found that the CPU heatsink was covered with a thick layer of dust, from a 'year of neglect'. Or so.
Last I heard, his computer was crunching away successfully, and he was a 'happy little vegemite'. (Local saying.)

So, how may we help you?
(Starting with some advice that you can obtain a 'better' name for yourself by changing it in the preferences on your Account page.)

PS
"last known good state" IS written "at regular intervals".
But making sure that the tyres on your car have a good tread, and are inflated to the correct pressure, is no use if you get hit by a loaded, out-of-control truck.
And similar things can happen to the model/program which make the checkpoint useless.

Which is where backups come into the picture. As an IT person, you'll know about the value of having everything safely backed up every night.


Backups: Here
ID: 25507 · Report as offensive     Reply Quote
old_user212146

Send message
Joined: 7 Dec 06
Posts: 3
Credit: 544
RAC: 0
Message 25511 - Posted: 7 Dec 2006, 23:41:03 UTC - in response to Message 25507.

"smaller chunks of work" will hopefully become possible sometime next year, once some code is written and tested. This is what the "restart dumps" every 40 years are for.
But currently, the core team have higher priority matters to deal with.
One of which is to solve the problem of the 'climateapps2' server falling over because of the huge work load.
Then there are the programs that will allow the global researchers to access the data for their own analysis without causing problems for the storage servers.
These people are some of those 'paying the piper', so they get a bit of a say in the priorities.

Of course - this is understood - business as usual tends to take precedence over this sort of thing.


As for blaming people's computers, if you read back through the posts of the last couple of years, and the replies to them, you'll see that a lot of people have been trying to run the program on seriously under-powered machines, which they think will work 'because SETI works on them'.
Such as only having 128 Megs of RAM, a CPU speed of 98 MHz, an OS of Windows 98/ME, etc.
And you'll also find that a lot of people with "faulty computers" have been helped to get the program going with just a bit of advice.

Agreed - but not everybody - and as was posted here, many people who have had problems have simply not bothered to post. I myself have had crashes, as I say, and my PC is pretty stable: 3.4 GHz dual CPU with 1 GB RAM + 1 terabyte of disk storage. It may not have been your intention, but the responses came across as "well, it can't be us, it must be your fault", which is going to get people's backs up a bit.


One such from way back was a person who 'laid down the law' about how it was the program, and not his computer. It turned out that he'd had a PSU failure, opened the case to replace it, then hadn't replaced the cover. After some advice from 'yours truly', as they say, when he cleared away a few things so that he could see into the case, he found that the CPU heatsink was covered with a thick layer of dust, from a 'year of neglect'. Or so.
Last I heard, his computer was crunching away successfully, and he was a 'happy little vegemite'. (Local saying.)


the joy of users


So, how may we help you?
(Starting with some advice that you can obtain a 'better' name for yourself by changing it in the preferences on your Account page.)

I will get around to this at some point soon


PS
"last known good state" IS written "at regular intervals".
But making sure that the tyres on your car have a good tread, and are inflated to the correct pressure, is no use if you get hit by a loaded, out-of-control truck.
And similar things can happen to the model/program which make the checkpoint useless.

mmmm - well - my definition of regular would be at least once per day. If the last known checkpoint can get destroyed then it is not an effective checkpoint, and how it is done needs to be rethought. I work in the telecoms industry now, where we process realtime telephone data (tens of millions of items of data every day). Every time a process crashes it HAS to know how to recover, otherwise people lose revenue (lots of revenue), so I know these procedures are difficult, but not impossible, to accomplish.
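
For illustration only, one common way to make a checkpoint hard to destroy is to write the new state to a temporary file first and then rename it over the old checkpoint, so that a crash in the middle of a write leaves the previous file untouched. The sketch below is generic: it is not how the CPDN model or BOINC writes its checkpoints, and the file name and the JSON state are hypothetical.

```python
import json
import os
import tempfile


def write_checkpoint(path: str, state: dict) -> None:
    """Write `state` to `path` without ever leaving a half-written file there."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the swap
        os.replace(tmp_path, path)  # swap the new file in over the old one
    except BaseException:
        # Clean up the temporary file; the old checkpoint is still intact.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_checkpoint(path: str) -> dict:
    """Load the most recently completed checkpoint."""
    with open(path) as f:
        return json.load(f)
```

This protects against a crash or power cut during the write itself. It does not protect against the disk failing or the file being deleted afterwards, which is the case that separate backups cover.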


Which is where backups come into the picture. As an IT person, you'll know about the value of having everything safely backed up every night.

I understand this, but you have to realise human nature: probably 95% of the people who attach to this project did so thinking, "ooohhh, that's a pretty screensaver and I'm doing a bit for the environment as well". It is also a screensaver, which by definition is hands off; you are simply not there when it is running. People are not going to remember to do backups, they just aren't going to remember. By insisting on this policy (and with it being unstable as it is, it does become mandatory) you are restricting yourself to the IT literate amongst us.

I am not having a go at the project. I appreciate what it is trying to achieve, but you have to accept that most people are just running this for fun with the hope that it helps understanding of climate change. If it becomes "difficult" they are just gonna quit. This is climateprediction.net's loss, as one more person detaches from the project to run one that is more stable.

I will try and stick with the project and try and remember to do regular backups, but it is the most frustrating screensaver I have ever used :-)
ID: 25511 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25512 - Posted: 8 Dec 2006, 1:11:47 UTC
Last modified: 8 Dec 2006, 1:18:43 UTC

Hi Nazarine

Just to clear up one point first. This program is NOT a screen saver. That concept went out the window when BOINC was created back in 2004. However all of the projects still provide screensavers for people who want to look at the pretty picture for some reason. (Perhaps to get the attention of visitors, who may then get interested in running it themselves.)
What it is now, is a low priority background process.

The core team, as well as us 'old timers', do realise that lots of people run it just for fun, just as lots of people join a marathon 'run' just for fun.
I don't know about the UK, but here in Sydney, we have a City to Surf (14km), in August. This has a category for fun runners; people with babies in strollers, army groups in full uniform, people in gorilla suits, etc.
I've considered it myself a few times, (in the walking 'group'), but only as far as the bottom of Heartbreak Hill, which is a real killer. Literally in a few cases.
If people want to run these climate models for fun, that's fine.

A lot of people have started out that way on the BBC part, and then given up. Then decided to complain that the BBC said that anyone could run it.
After a few questions and answers, they suddenly realised what they were doing wrong, and have continued.
Some have become so good at it, that they are now giving advice to other newcomers.

Back in about 2003, the researchers had hoped that perhaps they might get as many as a thousand people running the models, and hoped that they might get a couple of thousand of the original slab models, (simple static ocean). They eventually got back over 170,000 of them.

Currently, they are hoping for 5,000 of the TCMs, and at the moment have 3526, with, apparently, more than 30 coming in each day.

mmmm - well - my definition of regular would be at least once per day. If the last known checkpoint can get destroyed then it is not an effective checkpoint, and how it is done needs to be rethought. I work in the telecoms industry now, where we process realtime telephone data (tens of millions of items of data every day). Every time a process crashes it HAS to know how to recover, otherwise people lose revenue (lots of revenue), so I know these procedures are difficult, but not impossible, to accomplish.


The checkpoints are written once every 6 model days, which is about every half hour on my 3.2 GHz P4. It was once per model day in the previous simpler models, but people complained about the too frequent hd access.

I had hoped not to have to go through this typing again, which was why I simplified it.

However.
The model data is saved once every 6 days, and various other data are also saved, such as atmosphere restart, ocean restart, day restart, month restart, and year restart.
BOINC also saves its work log regularly, in client_state.xml.

When the program is shut down, the model starts again from the last checkpoint.

If a problem with unrealistic values is encountered during running, the program rewinds one model day, and tries again. If the problem is encountered again, it rewinds a model month and tries again. Finally, it will rewind a model year and try again from there.
If the problem was a momentary computer problem, the model should, during one of these retries, get past the failure point.
If it doesn't, the model will be aborted, BOINC will upload some info to the servers, and then request a new model.
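
The retry sequence just described (one model day, then a model month, then a model year, then abort) can be sketched roughly as below. This is an illustration of the logic only, not the model's actual code: the `run_from` callback, the 30-day month and the 360-day year are all assumptions made for the sketch.

```python
from typing import Callable, Optional

# Rewind distances in model days, tried in this order.
# The 30-day month and 360-day year are assumptions for this sketch.
REWIND_STEPS = [("day", 1), ("month", 30), ("year", 360)]


def recover(failed_day: int, run_from: Callable[[int], bool]) -> Optional[str]:
    """Try progressively longer rewinds after a failure at `failed_day`.

    `run_from(start_day)` is a hypothetical callback that re-runs the model
    from `start_day` and returns True if it gets past `failed_day`.
    Returns the name of the rewind that worked, or None if every retry
    failed and the model should be aborted.
    """
    for name, days in REWIND_STEPS:
        start_day = max(0, failed_day - days)  # never rewind before the start
        if run_from(start_day):
            return name
    return None
```

A momentary hardware glitch would normally be cleared by the first or second retry. Only a problem that repeats on every attempt reaches the abort case.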

However, BOINC and Windows don't get along too well. Some Windows programs, for instance, take over the port used by boinc.exe (the "hidden" worker part) to 'talk' to boincmgr.exe, (the visible part, or gui, which displays what the worker part is 'up to').

Sometimes when this happens, the model just "disappears". After all, if the gui can't 'see' the worker, it can't display anything. (This can usually be cured by menu/Exit from the gui, then re-booting the computer.)
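
As a quick way to look at the port question, the sketch below checks whether anything is accepting TCP connections on a given local port. It assumes the manager talks to the worker on BOINC's usual default GUI RPC port, 31416; that number is an assumption here, since an installation can be configured differently. Note that it only shows that something is listening, not which program it is.

```python
import socket

# Assumed default port for the link between boincmgr.exe and boinc.exe.
ASSUMED_BOINC_GUI_RPC_PORT = 31416


def port_is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return s.connect_ex((host, port)) == 0


# Example: port_is_listening(ASSUMED_BOINC_GUI_RPC_PORT)
```

If this returns True while BOINC has been fully exited, some other program is probably holding the port.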

Sometimes, however, for various reasons, usually hard to pinpoint, BOINC loses track of the 'child process', (the climate program itself), so it thinks that this has stopped. It then assumes (I know, bad word, but it's the best there is), that the model has completed, sets the percentage-to-completion to 100%, ticks the model off its list of 'work units in the queue', and tries to upload the final zip files. Which don't exist, so it returns a -161 error; file not found.

As the model is no longer listed in its work queue, BOINC has now forgotten about the existence of the climate model, which is why the only way to get it back is from a previous backup.

All suggestions for improvements to BOINC should be directed to the BOINC site, at the University of California, Berkeley, here.

If I still haven't explained something, please post again.

ID: 25512 · Report as offensive     Reply Quote
old_user212146

Send message
Joined: 7 Dec 06
Posts: 3
Credit: 544
RAC: 0
Message 25513 - Posted: 8 Dec 2006, 1:46:51 UTC - in response to Message 25512.  

Sorry about this - my final post of the night I promise :-)

Does the checkpoint handle completely unexpected results, i.e. if I suddenly had a power cut would it recover from the last checkpoint (my PC is similar spec to yours, so from about the previous 30 minutes' worth of processing)? This is as opposed to an out of bounds value turning up in the data.

If it does this, why would it ever restart back at 1920? If it does this because it has reached a data dead end (i.e. the model has realised it is spiralling unrealistically out of control and decides to abort), is the data produced up to that point useful?

If the problems are being caused by PC hardware failure, you would expect the checkpoint system to be able to rewind to the recent checkpoint position and just carry on regardless. Hardware problems are extremely unlikely to occur in the same processing place, and so you are always going to steadily increase your progress (albeit with the odd slight rewind).

Thanks for your patience in answering my questions. I am running the Prime95 Torture Test at the moment to ensure I have no hardware issues to explain my resets, and as I say I will be attempting to do backups at regular intervals.
It is way past my bedtime (1:45am) and I need sleep now.
ID: 25513 · Report as offensive     Reply Quote
Profile mo.v
Volunteer moderator
Avatar

Send message
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 25514 - Posted: 8 Dec 2006, 3:58:32 UTC

Hi Nazarine

When you refer to your 'resets' what do you mean? Have you had climate model crashes? On your project pages I can only find one model that you've had. If you've reregistered under a different name but had model crashes previously, we can point you to advice about how to avoid the same problems again.

The model never goes back to 1920. If the model crashes, its data get uploaded to the server and a new model downloads. The way to continue processing the crashed model is to restore a backup, which is almost always successful.

At the moment there can't be a backup/restore button/facility within the boinc manager because to use it you'd have to be using the open boinc GUI, but backups only work if you back up the whole boinc folder - which means that before backing up you have to exit from boinc itself.

So the person has to make regular backups, for which our README compilations describe several methods. Auto backup methods also work, as long as you remember to exit from boinc beforehand.
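
As one possible shape for such a backup, the sketch below copies the whole BOINC folder into a new time-stamped folder. It is a minimal illustration, not one of the README methods: the folder locations are hypothetical and need to match your own installation, and BOINC has to be exited completely before it runs.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(boinc_dir: str, backup_root: str) -> Path:
    """Copy the whole BOINC folder into a new time-stamped folder.

    Exit BOINC completely before calling this, otherwise files may be
    copied while the model is still writing to them.
    """
    source = Path(boinc_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"BOINC folder not found: {source}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"boinc-backup-{stamp}"
    shutil.copytree(source, target)  # fails if the target already exists
    return target


# Example with hypothetical paths:
# backup_boinc_folder(r"C:\Program Files\BOINC", r"D:\Backups")
```

Each run creates a separate copy, so older backups stay available if the most recent one turns out to contain an already-damaged model.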

Carl, who's the chief programmer for the project in Oxford, considers the models to be extremely stable. Almost the only instability problem emanating from the models themselves is that some of them turn out to 'loop', i.e. at a certain point refuse to crunch forward in time. Fortunately, the data created up to that point is useful to the researchers.

As Les has said, most model failures are due to

*hardware that doesn't meet the minimum specs, particularly as regards RAM

*members who are unaware of the basic precautions required during normal computer use to protect the model

*catastrophic or unexpected hardware failure, external events, or software events

Regular backups are the ultimate line of defence against all of these. This is where we are now.




Cpdn news
ID: 25514 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25515 - Posted: 8 Dec 2006, 4:39:31 UTC

Does the checkpoint handle completely unexpected results, i.e. if i suddenly had a powercut would it recover from the last checkpoint


Yes. At least for me.
The half dozen that I've had range from a brownout that lasted a bit too long, to a 2 1/4 hour power failure. (The latter was in summer. And in the 5 minutes that it took me to get a torch, look up a number, find the old rotary dial phone, then an adaptor, and get on to the power company, there was already a recording saying that it would last about 3 hours. I think that someone made a boo-boo at a nearby substation.) I was using the computer at the time, when everything went black. It took a few seconds to get over the shock and work out where the nearest torch was.

I even had one while I was out shopping. I knew this because the F Lock led on the base unit for the wireless mouse/keyboard was on, so the computer had rebooted. A check of the times in the messages of the gui told me when it had happened.

On other occasions of a short failure (nearby electrical storm) while I've been using the computer, I've been able to see/hear what happens.
First the click as the power supply turns off, then the display going black, and the hds winding down.
After a few seconds the computer powers up, followed by Windows, then BOINC, and finally the model.

But I DO have a larger than normal, 400W psu. When I built the computer, I went for overkill, (at a reasonable price), so that in 10 years time it would still have a chance of coping with the programs available then.
But now we have Conroe, and a different philosophy for designing the cpus.
Still, my machine is rock stable. (Helped by not being near an earthquake zone. :) )

PS
I should have mentioned the 4 README files here, which are full of hints and tips, or links to them.

ID: 25515 · Report as offensive     Reply Quote
Profile old_user191612
Avatar

Send message
Joined: 29 Jun 06
Posts: 4
Credit: 1,883,202
RAC: 0
Message 25865 - Posted: 7 Jan 2007, 12:43:02 UTC - in response to Message 21066.  
Last modified: 7 Jan 2007, 12:54:26 UTC

>>... and the second is a -161 error in the log.
I've had this error more than 4 times on one machine. It's an AMD 64 under XP Pro.

>>* If you use Norton or Sophos antivirus,
No.

>>* Before playing games or other heavy duty applications ...
That's possible; sometimes I run games.

>>* Run a stability test on your machine...
That's all OK on my computer.

>>* Overheating can cause instability...
Overheating wasn't the problem.

>>* People who have overclocked their machines...
Nope.

>>* Make backups about once per week
I make them daily, but when I replay one ... Groundhog Day ...

>>* Watch out for firewall messages ...
The firewall wasn't the problem.

>>* Windows 'time sync' messages ...
Hardly probable.

>>* The benchmark boinc runs every 5 days ...
Not the problem.

>>* The Memory requirement for XP machines is now 512MB
1 GB

The error is only on an AMD Athlon 64 under XP Pro. The other machines are Sempron 3000, Sempron 3200, Athlon 3000, T2400 (all XP) and one AMD 64 but under Win 2003 Server. They are not affected.

Oh, by the way: on this machine CPDN keeps running when stopped, using 50% of the remaining capacity. Normally I see 45% in Task Manager when it is stopped. CPDN can't be stopped by the user or by other tasks; it runs all the time. That's not normal.

ID: 25865 · Report as offensive     Reply Quote
Profile MikeMarsUK
Volunteer moderator
Avatar

Send message
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 25869 - Posted: 7 Jan 2007, 13:01:48 UTC


Regarding the problem with the task still running when BOINC is shut down, this can happen if there are problems with the local network stack (you might notice that your web browser sometimes seems to 'stick' on that PC). If there is any unnecessary network software running, try to get rid of it (I uninstalled two different VPNs on my box, and that solved a similar problem).

Could you provide a link to the PC you're having trouble with? You have several in your list.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 25869 · Report as offensive     Reply Quote
old_user197776

Send message
Joined: 1 Sep 06
Posts: 11
Credit: 4,627
RAC: 0
Message 25876 - Posted: 7 Jan 2007, 23:08:39 UTC

I've now run CPDN on and off, sometimes for a few days and other times for half an hour or maybe a few hours before it freezes up. It always starts up again after a reset, so there is no problem with losing WUs when rebooting.

I'm still trying to figure out why I get the problem though. Maybe it has to do with the dual core running two projects all the time, or my motherboard/RAM combination. Sometimes my other projects also freeze the computer, but not as easily as CPDN.
ID: 25876 · Report as offensive     Reply Quote
Profile MikeMarsUK
Volunteer moderator
Avatar

Send message
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 25878 - Posted: 7 Jan 2007, 23:53:57 UTC


By freezing up, do you mean that your PC locks solid and you have to power-cycle it to get control back?

If so, there are several possible causes:

* CPU Overheating (usually slows down a bit first and then powers off itself)
* Memory faults causing the PC to fail (usually blue-screens)
* Insufficient memory (everything slows down as if the PC was in treacle, constant traffic on the hard disk, and switching between applications becomes incredibly slow).

I think the third is most likely on your machine. The recommended memory to run the model is 512MB, and you have less than half of that.

I would recommend disabling the CPDN model until you have more memory or maybe a new PC in the future - there is a risk of hard disk damage from the continual use of the paging file.

You could also try trimming down your machine a bit to remove anything which is unused from memory (i.e., go through the system tray and uninstall or disable anything you don't actually need). This will free up a little space, but it'll be a very tight fit even if you weren't running any other projects.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 25878 · Report as offensive     Reply Quote

©2024 climateprediction.net