Model stopped running

Questions and Answers : Windows : Model stopped running

Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 30863 - Posted: 7 Oct 2007, 21:30:06 UTC
Last modified: 7 Oct 2007, 21:32:59 UTC

One of the 4 models running on my Q6600 suddenly stopped running, or at least it seemed to.
When I turned on the graphics, the earth was entirely blue and I could not change it. It was stuck at 336 timesteps until checkpoint. I suspended BOINC, right-clicked and chose Exit, then restarted the machine. The model started again but stopped at the same point.
The graphics window is also slower to open and close.
Is it a problem with the model or with my PC? All models have run without any problems until now, and the other 3 models are still running without problems.

The model is:
hadcm3iozn_cpyj_2000_80_45899412_7 using hadcm3i version 544

Since I wrote this, the model has advanced 3 timesteps.
I run antivirus, defragment regularly and have updated my drivers, and I had not used the PC for anything else for hours before it happened.

Edit: I took a backup yesterday :-)))
Should I reinstall?

Thx
Steinar
ID: 30863
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 30877 - Posted: 7 Oct 2007, 22:30:40 UTC


There is no definitive answer to this problem yet.
A 'blue world' is what you get initially when starting a model, as the graphics data is not yet available for the display.
Whether this has anything to do with it, I don't know.

There is an info thread here, with a link near the bottom to a discussion thread.
This is for the slab models, but the symptoms are often similar for the Coupled Ocean models.

As to re-running from a backup, there's only one way to find out. :(
However, the problem may have started before the backup was made.

ID: 30877
Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 30883 - Posted: 8 Oct 2007, 6:17:03 UTC - in response to Message 30877.  


The model finished and uploaded tonight. I got the following messages:
08.10.2007 01:28:26|climateprediction.net|Reason: Unrecoverable error for result hadcm3iozn_cpyj_2000_80_45899412_7 (The device does not recognize the command. (0x16) - exit code 22 (0x16))
08.10.2007 01:28:26|climateprediction.net|Computation for task hadcm3iozn_cpyj_2000_80_45899412_7 finished
08.10.2007 01:28:26|climateprediction.net|Output file hadcm3iozn_cpyj_2000_80_45899412_7_5.zip for task hadcm3iozn_cpyj_2000_80_45899412_7 absent
08.10.2007 01:28:26|climateprediction.net|Output file hadcm3iozn_cpyj_2000_80_45899412_7_6.zip for task hadcm3iozn_cpyj_2000_80_45899412_7 absent
08.10.2007 01:28:26|climateprediction.net|Output file hadcm3iozn_cpyj_2000_80_45899412_7_7.zip for task hadcm3iozn_cpyj_2000_80_45899412_7 absent
08.10.2007 01:28:26|climateprediction.net|Output file hadcm3iozn_cpyj_2000_80_45899412_7_8.zip for task hadcm3iozn_cpyj_2000_80_45899412_7 absent
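For anyone who wants to check logs like these mechanically, here is a minimal Python sketch (not a BOINC tool) that splits each line on `|` and pulls out the exit code and the absent output files. It assumes every line has the `timestamp|project|message` shape seen above; the helper name `summarise_log` and the regular expressions are illustrative only. It also confirms that the decimal and hex figures agree: 22 is 0x16, the Windows system error code whose standard text is "The device does not recognize the command."

```python
import re

# Lines copied from the log above; the "timestamp|project|message" layout is assumed.
LOG = """\
08.10.2007 01:28:26|climateprediction.net|Reason: Unrecoverable error for result hadcm3iozn_cpyj_2000_80_45899412_7 (The device does not recognize the command. (0x16) - exit code 22 (0x16))
08.10.2007 01:28:26|climateprediction.net|Computation for task hadcm3iozn_cpyj_2000_80_45899412_7 finished
08.10.2007 01:28:26|climateprediction.net|Output file hadcm3iozn_cpyj_2000_80_45899412_7_5.zip for task hadcm3iozn_cpyj_2000_80_45899412_7 absent
08.10.2007 01:28:26|climateprediction.net|Output file hadcm3iozn_cpyj_2000_80_45899412_7_6.zip for task hadcm3iozn_cpyj_2000_80_45899412_7 absent
08.10.2007 01:28:26|climateprediction.net|Output file hadcm3iozn_cpyj_2000_80_45899412_7_7.zip for task hadcm3iozn_cpyj_2000_80_45899412_7 absent
08.10.2007 01:28:26|climateprediction.net|Output file hadcm3iozn_cpyj_2000_80_45899412_7_8.zip for task hadcm3iozn_cpyj_2000_80_45899412_7 absent
"""

# "exit code 22 (0x16)": decimal value followed by the same value in hex.
EXIT_RE = re.compile(r"exit code (\d+) \((0x[0-9a-fA-F]+)\)")
# "Output file NAME for task TASK absent": capture the missing file name.
ABSENT_RE = re.compile(r"Output file (\S+) for task \S+ absent")


def summarise_log(text):
    """Return (exit_code, absent_files) from BOINC-style message lines (hypothetical helper)."""
    exit_code = None
    absent = []
    for line in text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not in the assumed timestamp|project|message form
        message = parts[2]
        m = EXIT_RE.search(message)
        if m:
            decimal, hexadecimal = int(m.group(1)), int(m.group(2), 16)
            if decimal != hexadecimal:
                raise ValueError("decimal and hex exit codes disagree")
            exit_code = decimal
        m = ABSENT_RE.search(message)
        if m:
            absent.append(m.group(1))
    return exit_code, absent


code, missing = summarise_log(LOG)
print(code)  # 22
print(missing)
```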

Exit code 22 is a failure on "my side", isn't it?
Should I restore from the (thank god) newly created backup I took after uploading the .zip files from all 4 models?
I have suspended BOINC just to be sure.
ID: 30883
Iain Inglis

Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 30890 - Posted: 8 Oct 2007, 9:05:59 UTC - in response to Message 30883.  
Last modified: 8 Oct 2007, 9:07:44 UTC


Yes, the exit code comes from the PC. Restore the model as quickly as possible: since all the models on your quad will be restored, the sooner you restore, the less time is wasted on the other models. I don't know of any way to restore just one of the four.

Before restoring, it might be a courtesy to the project to abort the new model that has downloaded but not yet trickled. That way, they know for sure that the model isn't going to restart at a later date.
ID: 30890
Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 30917 - Posted: 9 Oct 2007, 17:24:35 UTC

After reinstalling following the model crash, the model has now passed the point where it crashed and runs fine. All the other models also work fine :-)

I deleted the model that downloaded after the crashed model uploaded. I did so before I thought of aborting it, so the project probably doesn't know it's not going to finish. Sorry :/
I hope the models will finish without any problems.

Steinar
ID: 30917
Iain Inglis

Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 30918 - Posted: 9 Oct 2007, 17:58:37 UTC

Good to hear that it all worked out.
ID: 30918
Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 31051 - Posted: 21 Oct 2007, 11:53:00 UTC - in response to Message 30918.  

The same model stopped running again. I have 4 models running, and I backed up yesterday after 3 of them had sent their zip files. The fourth zip file uploaded tonight, so when I restore from the backup it will upload again. Will that cause any problems?
I have looked in the forums for a solution but didn't find anything.

Thx
Steinar
ID: 31051
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 31052 - Posted: 21 Oct 2007, 12:28:57 UTC


No problems.
The trickles also contain the timestep, so the server will be able to compare them with what it already has, and ignore duplicates.
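A minimal sketch of the kind of duplicate check described above, assuming the server keys each report on the task name plus the timestep it carries. The `Trickle` and `TrickleStore` names, their fields and the timestep value are hypothetical; the actual CPDN server code is not shown in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trickle:
    # Hypothetical fields; the real trickle format is not described here.
    task: str
    timestep: int


class TrickleStore:
    """Hypothetical server-side store that keeps one record per (task, timestep)."""

    def __init__(self):
        self._seen = set()

    def receive(self, trickle):
        """Return True if the trickle is new, False if it duplicates one already held."""
        key = (trickle.task, trickle.timestep)
        if key in self._seen:
            return False  # same task and timestep already stored: ignore it
        self._seen.add(key)
        return True


# A restore from backup re-sends a report the server already has.
# The timestep 100 is an arbitrary illustrative value.
store = TrickleStore()
first = store.receive(Trickle("hadcm3iozn_cpyj_2000_80_45899412_7", 100))
second = store.receive(Trickle("hadcm3iozn_cpyj_2000_80_45899412_7", 100))
print(first, second)  # True False
```

Keying on the pair rather than on the timestep alone keeps reports from different tasks that happen to reach the same timestep from being treated as duplicates.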

ID: 31052
Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 31074 - Posted: 23 Oct 2007, 14:39:54 UTC - in response to Message 31052.  

The same model crashed again, and right afterwards another model crashed for the first time. I get the "blue screen" or ice world all the time. I don't know what is happening here, but my PC is the same as before and is not used for much else. All drivers are updated, antivirus is running, etc. It is not overclocked either.

The models ran fine in the beginning, but since the first crash it happens more and more often. No model had problems before 50% was crunched. I restore from backups, but now there is a crash every day. The models have run 75% to 80%, and it would be nice to see them reach the end.
Should I reinstall BOINC (it is version 5.10.20), and/or should I terminate the models and reinstall the PC? Is it better to run only 3 models at a time? I think it is sad not to finish them.

Steinar
ID: 31074
astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 31075 - Posted: 23 Oct 2007, 18:23:57 UTC

Grasping at straws with this: the following is recorded for OS and installed memory:
Operating System: Microsoft Windows XP Professional Edition, Service Pack 2 (05.01.2600.00)
Memory: 3318.04 MB

I have no experience with 4 GB installed under a 32-bit OS, but am aware that WinXP_x32 can't address all of it, which probably explains the 3318 MB entry. Perhaps someone with experience of that configuration can help...
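The 3318 MB figure can be sanity-checked with a little arithmetic. This sketch assumes the reported "MB" means 2^20 bytes; the remaining ~778 MB of the 4 GiB 32-bit address space is typically claimed by hardware mappings such as the graphics card and other devices, though that attribution is an assumption rather than something recorded for this host.

```python
# Size of a 32-bit address space, expressed in MiB (2**20 bytes).
ADDRESS_SPACE_MIB = 2**32 // 2**20

# Figure recorded for this host (assumed to be in MiB).
reported_mib = 3318.04

# Address space not visible to the OS as RAM (assumed: reserved for hardware).
reserved_mib = ADDRESS_SPACE_MIB - reported_mib

print(ADDRESS_SPACE_MIB)       # 4096
print(round(reserved_mib, 2))  # 777.96
```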

Is it possible that, as the Models progress, 'memory leaks' develop, migrate above the 3 GB level and eventually Windoze, with its legendary space-management abilities, trips over itself? (I said I was grasping at straws!)

Expert help, anyone...?


(My quad has 4 GB but has 64-bit OSes installed: Vista Home Premium_x64 [so far, unused] and openSuSE 10.3_x64. I have a lock-up issue between BOINC and openSuSE 10.3 in the 'Tools' and 'Advanced' options, possibly related to new Linux security measures, but no other BOINC or CPDN problems.)
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 31075
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 31076 - Posted: 23 Oct 2007, 19:38:06 UTC


Hi Steinar

There are a lot of 'blue worlds' lately. They may be related to the values being used in the recent batches of models.
I wouldn't start reinstalling things; it's much more likely to be the models.

************

A note for those people who aren't aware: the object of the climate models is NOT to run them to the due completion date. It's to run them as far as they'll go.
The researchers need to know which combinations of values produce a model that stays stable over a long run and which don't, just as much as they want to know the final 'climate' result.

The object of the project is to improve climate forecasting, so it's just as important to know what works as it is to know what doesn't.


Backups: Here
ID: 31076
Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 31094 - Posted: 24 Oct 2007, 8:17:02 UTC - in response to Message 31076.  


OK. I am running the Prime95 torture test at the moment to check that everything is OK. After that I will reinstall, and if the first model keeps crashing, should I abort it and keep running the others?
ID: 31094
MikeMarsUK
Volunteer moderator

Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 31098 - Posted: 24 Oct 2007, 12:28:05 UTC


If the model continues to run at a normal speed, then I'd keep it running regardless of whether it turns into an iceworld or not. The only case in which I'd recommend aborting it is if the model grinds to a halt.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 31098
Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 31100 - Posted: 24 Oct 2007, 16:33:01 UTC - in response to Message 31098.  

I noticed something that might be interesting: the backup folder is 1.37 GB, but the folder that contains the crashed models, in C:\Program Files, is 2.12 GB. Does that indicate anything relevant?
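To compare folder sizes like these directly, here is a small Python sketch that sums file sizes with `os.walk`. It reports in units of 1024**3 bytes, which is what Windows labels GB. The paths in the commented usage lines are hypothetical placeholders, and Explorer's "Size on disk" figure can differ from the plain file-size total computed here.

```python
import os


def folder_size_gb(path):
    """Total size of the files under `path`, in units of 1024**3 bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so the same data is not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total / 1024**3


# Hypothetical locations; substitute the real backup and BOINC data folders.
# print(folder_size_gb(r"D:\boinc_backup"))
# print(folder_size_gb(r"C:\Program Files\BOINC"))
```

A nonexistent path simply yields 0, because `os.walk` ignores errors by default.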

I have just finished 24 hours of the Prime95 torture test, and all iterations passed.
I will now restore from the backup and start again. If the model(s) fail again after a short time, should I abort them and finish the others?
ID: 31100
Steinar1965

Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 31284 - Posted: 5 Nov 2007, 21:08:27 UTC

After some problems with 2 of the models stopping, all 4 models have now finished computation and reached the end. Nice to see :)
ID: 31284
mo.v
Volunteer moderator

Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 31285 - Posted: 5 Nov 2007, 22:15:27 UTC

Congratulations! These models are so long that every one completed is a personal success.
Cpdn news
ID: 31285
astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 31286 - Posted: 6 Nov 2007, 3:45:38 UTC

Good show, Steinar! Congratulations.

For what it's worth, I've had more problems with my Q6600 (G0 stepping) than with any earlier build. Issues range from the Fedora 7 installation twice trashing the Master Boot Record (preventing Windoze from booting) to BSODs when pushing updates and utilities into Vista Home Premium.

The box tests 'stable' with Memtest and four copies of Prime95, is not overclocked, yet seems to be a pit that collects all ills.

Perhaps there are issues with BOINC running certain combinations of four CPDN Models...? One wonders. (Not to mention questionable Linux distros and M$ inadequacies.)


Regardless, job well done in completing the four Models. Many more successes!

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 31286

©2024 climateprediction.net