climateprediction.net home page
System crashed, on restart BOINC downloads new model, won't work on old one

Questions and Answers : Macintosh : System crashed, on restart BOINC downloads new model, won't work on old one

old_user35795

Joined: 12 Jan 05
Posts: 13
Credit: 1,884,525
RAC: 0
Message 7320 - Posted: 16 Jan 2005, 21:02:35 UTC
Last modified: 17 Jan 2005, 4:12:17 UTC

I downloaded my first model and was happily crunching for a few days, up to about timestep 21,000. My system (G4 tower, 10.3.7) froze for reasons unrelated to BOINC.

After rebooting, I restarted BOINC, but it downloaded a new model rather than resuming the old one. It also mixed the new model's files in with the old ones; among other things, it created a new projects directory inside the old model's 'jobs' directory.

I eventually tossed the entire project directory and restarted BOINC. It downloaded a third model and everything is fine again, although I guess the 21,000 timesteps already done were pointless.

For future reference, can somebody please let me know how to get BOINC to resume the old model after a crash? It would be appreciated if this were explained in terms comprehensible to somebody who doesn't typically use a command line interface.

EH
Andrew Hingston
Volunteer moderator

Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 7323 - Posted: 16 Jan 2005, 21:48:14 UTC

Sadly, there is no way to return to the old model unless you kept a backup of the BOINC folder. The program is designed to rewind itself if there is a possible computing error, but in some situations it will simply crash and there is nothing the user can do about it. It can be very frustrating, so let us hope that you were unlucky this time and the experience does not repeat.
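A backup only helps if it exists before something goes wrong. Below is a minimal, hypothetical shell helper that makes one. The name `backup_boinc` and both folder arguments are illustrative and not part of BOINC. It copies the whole BOINC folder (the directory the client is run from, which holds `projects/`) into a new timestamped folder. It assumes BOINC has been quit first, so that no file is copied while the model is still writing it.

```shell
#!/bin/sh
# Hypothetical helper (not part of BOINC): copy a BOINC folder into a new,
# timestamped backup folder. Run it only while BOINC is quit, so that no
# file is copied half-written.
# Usage: backup_boinc SOURCE_DIR BACKUP_ROOT   (prints the new backup path)
backup_boinc() {
    src=$1
    root=$2
    if [ ! -d "$src" ]; then
        echo "backup_boinc: no such folder: $src" >&2
        return 1
    fi
    dest="$root/boinc-$(date +%Y%m%d-%H%M%S)"
    # Refuse to overwrite an existing backup (e.g. two runs in one second).
    if [ -e "$dest" ]; then
        echo "backup_boinc: $dest already exists" >&2
        return 1
    fi
    mkdir -p "$root" || return 1
    # -R copies the whole tree; -p keeps permissions and modification times.
    cp -Rp "$src" "$dest" || return 1
    echo "$dest"
}
```

For example, with the folder used later in this thread, `backup_boinc ~/Documents/Dev/ClimatePrediction ~/BOINC-backups` would leave a dated copy under `~/BOINC-backups` (that backup location is only an example).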
old_user34261

Joined: 29 Dec 04
Posts: 9
Credit: 32,552
RAC: 0
Message 9101 - Posted: 10 Feb 2005, 19:48:45 UTC
Last modified: 12 Feb 2005, 17:41:49 UTC

Hello,

I have just experienced the same situation, which is indeed rather frustrating considering how long a model takes to complete. The Mac had been running quite happily for just over a month!
I had restarted it a few times, and it had always resumed without any particular problem.

Tonight, however, after installing the Mac OS X updates and restarting the machine, I got the following when starting the program:

[HomeG4:~/documents/dev/climateprediction] davidcar% ./boinc
2005-02-10 19:50:07 [---] Starting BOINC client version 4.13 for powerpc-apple-darwin
2005-02-10 19:50:07 [climateprediction.net] Project prefs: no separate prefs for home; using your defaults
2005-02-10 19:50:07 [climateprediction.net] Host ID is 81718
2005-02-10 19:50:07 [---] General prefs: from climateprediction.net (last modified 2005-01-05 22:57:39)
2005-02-10 19:50:07 [---] General prefs: no separate prefs for home; using your defaults
2005-02-10 19:50:07 [climateprediction.net] Resuming computation for result 2s21_100150982_2 using hadsm3 version 4.03
Starting model in /Users/dc/Documents/Dev/ClimatePrediction/projects/climateprediction.net...
Created shared memory region key = 24545
Env Used=DYLD_LIBRARY_PATH=/Users/dc/Documents/Dev/ClimatePrediction/projects/climateprediction.net:../
Starting model ID 2s21_100150982 Phase 2
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
2s21_100150982 - PH 2 TS 091873 - 00/00/0000 00:00 - H:M:S=0931:23:14 AVG= 9.55 DLT= 0.00
Model crashed...retrying...restart level 0
Preparing for restart...
Rewinding a model-day...
Starting model ID 2s21_100150982 Phase 2
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
2s21_100150982 - PH 2 TS 091873 - 00/00/0000 00:00 - H:M:S=0931:23:14 AVG= 9.55 DLT= 0.00
Model crashed...retrying...restart level 1
Preparing for restart...
Rewinding a model-month...
Copying restart files for model retry...
Starting model ID 2s21_100150982 Phase 2
Waiting for model startup, this may take a minute...
Stack size=48.00 MB
2s21_100150982 - PH 2 TS 091873 - 00/00/0000 00:00 - H:M:S=0931:23:14 AVG= 9.55 DLT= 0.00
Model crashed...retrying...restart level 2
Preparing for restart...
Rewinding a model-year...
Copying restart files for model retry...
Starting model ID 2s21_100150982 Phase 2
Waiting for model startup, this may take a minute...
Stack size=48.00 MB
2s21_100150982 - PH 2 TS 091873 - 00/00/0000 00:00 - H:M:S=0931:23:14 AVG= 9.55 DLT= 0.00
Model crashed...retrying...restart level 3
Preparing for restart...
Error: Restart files for not found
Giving up, this result exceeded crash count for available restart files.
adding: 2s21aa.pa.gmts.x1.nc (deflated 35%)
adding: 2s21aa.pa.rmts.x1.nc (deflated 36%)
adding: 2s21aa.pc.gmts.x1.nc (deflated 53%)
adding: 2s21aa.pc.rmts.x1.nc (deflated 40%)
adding: 2s21aa.pd.gmts.x1.nc (deflated 53%)
........

and then it started from scratch on a new model!
I am only posting this in case it helps find a solution to this issue. It seems a great waste to lose the benefit of over a month of computation just because of a "crash".
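The log shows an escalating recovery sequence. After each crash the client retries from further back: restart level 0 rewinds a model-day, level 1 a model-month, and level 2 a model-year. At level 3 it reports that no restart files remain, gives up, and zips up the output files. The short shell sketch below only restates that observed ladder. The function name `rewind_action` is hypothetical, and this is a reading of the log messages, not climateprediction.net's actual code.

```shell
#!/bin/sh
# Hypothetical restatement of the recovery ladder visible in the log above.
# This is NOT the client's code; it only mirrors the printed messages.
rewind_action() {
    case $1 in
        0) echo "rewind a model-day" ;;
        1) echo "rewind a model-month" ;;
        2) echo "rewind a model-year" ;;
        *) echo "give up" ;;
    esac
}

# Replay the sequence from the log, where every retry crashed again.
level=0
while :; do
    action=$(rewind_action "$level")
    echo "restart level $level: $action"
    [ "$action" = "give up" ] && break
    level=$((level + 1))
done
```

Running it prints one line per restart level, ending with `restart level 3: give up`. This is the point at which the client in the log above abandoned the result and fetched a new model.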

It looks like I have indeed lost the files, as the project folder has shrunk to 400 MB (from about 600 MB before the event), so I guess there is no way back.

I can hardly believe this has happened. Is there really no way of preventing it? Surely the program should never replace an existing set of data without at least giving the operator the opportunity to make that decision.

Any comment or potential solution will be highly appreciated.

Thank you.

David.
old_user31747

Joined: 29 Nov 04
Posts: 7
Credit: 66,811
RAC: 0
Message 9613 - Posted: 20 Feb 2005, 10:58:43 UTC - in response to Message 7323.  

> Sadly, there is no way to return to the old model unless you kept a backup of
> the BOINC folder. The program is designed to rewind itself if there is a
> possible computing error, but in some situations it will simply crash and
> there is nothing the user can do about it. It can be very frustrating, so let
> us hope that you were unlucky this time and the experience does not repeat.
>
It seems to me there are two situations: (1) where the computer crashes, and (2), much more frequently, where there is a need to shut down the program in order to power off or restart to complete the installation of some other program. I have been able to shut down BOINC 2.13 by pressing Control-C repeatedly over a couple of months, and the run has always restarted successfully. But I have just downloaded 2.19 to run on a new dual-processor computer, and both times I have shut it down in the same way, the models have crashed, leaving a message much like the one described by Frenchy (9101). (I have not been able to find a log file on my computer; is there one?) I am obviously worried that the same will happen again, but at least I could plan ahead by backing up the BOINC folder. Could you describe the process of restoring from it? Presumably one would have to do the backup before shutting down BOINC.
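Restoring is essentially the reverse copy. Below is a minimal, hypothetical shell sketch; the name `restore_boinc` and its arguments are illustrative, not part of BOINC. With BOINC quit, it moves the current (possibly damaged) folder aside rather than deleting it, then copies the saved folder back into the original location. You then start BOINC again yourself. It assumes the backup was taken while BOINC was not running. Whether the project will still accept a result resumed from an older copy is not something this thread settles.

```shell
#!/bin/sh
# Hypothetical helper (not part of BOINC): put a saved BOINC folder back.
# Quit BOINC before running it, and start BOINC again afterwards.
# Usage: restore_boinc BACKUP_DIR TARGET_DIR
restore_boinc() {
    backup=$1
    target=$2
    if [ ! -d "$backup" ]; then
        echo "restore_boinc: no such backup: $backup" >&2
        return 1
    fi
    # Keep the current folder under a dated name instead of deleting it.
    if [ -e "$target" ]; then
        mv "$target" "$target.before-restore.$(date +%Y%m%d-%H%M%S)" || return 1
    fi
    # -R copies the whole tree; -p keeps permissions and modification times.
    cp -Rp "$backup" "$target" || return 1
}
```

Moving the old folder aside, rather than overwriting it, means a bad restore can itself be undone.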

Thank you


old_user116558

Joined: 25 Nov 05
Posts: 1
Credit: 23,295
RAC: 0
Message 18102 - Posted: 12 Dec 2005, 14:49:02 UTC - in response to Message 7323.  

> Sadly, there is no way to return to the old model unless you kept a backup of
> the BOINC folder. The program is designed to rewind itself if there is a
> possible computing error, but in some situations it will simply crash and
> there is nothing the user can do about it. It can be very frustrating, so let
> us hope that you were unlucky this time and the experience does not repeat.
>
Hi,

I still have some data from my model on my hard drive, but I don't know what to change. It seems that all the data is still there and only a status needs to be changed. Is there a brief description of what each of the files means?



©2024 climateprediction.net