Have I mixed up Tasks / Work units

old_user519896

Joined: 28 May 08
Posts: 16
Credit: 32,985
RAC: 0
Message 34252 - Posted: 10 Jul 2008, 8:51:53 UTC

Hello

I was happily computing until a few days ago, when I forgot that BOINC was running and shut my computer down. Ouch. I had reached 4% of my task :-(. The next morning I started my computer, restarted BOINC, saw it resume the computation, and forgot about it. But when I checked again a few hours later, I discovered that BOINC had decided my previous task was invalid and had loaded a new one. Losing days of computation felt unbearable, so I stopped BOINC, restored the last backup, and restarted it. What I hadn't noticed, though, is that the old task/work unit was also flagged "over"/"client error"/"compute error" on the server. Now I have a strange situation: BOINC is trickling, but for an invalid task/work unit. Is this any use?
Frederic
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 34253 - Posted: 10 Jul 2008, 9:03:28 UTC - in response to Message 34252.  

Is this any use?

Absolutely. As long as your task continues to return trickles and upload result files, the CPDN server will accept them and grant you credits (the page for your task shows a trickle was received last night).
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
old_user519896

Joined: 28 May 08
Posts: 16
Credit: 32,985
RAC: 0
Message 34254 - Posted: 10 Jul 2008, 9:21:25 UTC - in response to Message 34253.  

Absolutely. As long as your task continues to return trickles and upload result files, the CPDN server will accept them and grant you credits (the page for your task shows a trickle was received last night).


Thanks. I don't care about the credits; I am doing it for the results (as we say in French, we are all in the same boat here). I had seen that the server had received the trickle, but I was worried the server might not use my results, which would mean I was burning watts for nothing and wasting calculation time.

Thanks for your answer.
Frederic
astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 34261 - Posted: 11 Jul 2008, 1:36:24 UTC

The server will forever keep the Model flagged as 'errored'. There will be some ugly messages when it finishes, things like the result not being accepted. Ignore them. They're BOINC messages and exist for other projects; they are meaningless at CPDN. This Project only cares about receiving your work, BOINC error messages notwithstanding. Best of luck with it.
I don't care about the credits

A kindred spirit!
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
old_user519896

Joined: 28 May 08
Posts: 16
Credit: 32,985
RAC: 0
Message 34501 - Posted: 5 Aug 2008, 8:06:21 UTC

Sorry, but I must say you are probably both wrong. Here is the abridged sequence of messages I got a week ago:
26/07/2008 14:45:20|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
26/07/2008 14:45:35|climateprediction.net|Scheduler request succeeded: got 0 new tasks
26/07/2008 14:47:43|climateprediction.net|Computation for task hadcm3istd_0qlf_1920_160_15989616_4 finished
26/07/2008 14:47:43|climateprediction.net|Output file hadcm3istd_0qlf_1920_160_15989616_4_2.zip for task hadcm3istd_0qlf_1920_160_15989616_4 absent
(...)
26/07/2008 14:47:43|climateprediction.net|Output file hadcm3istd_0qlf_1920_160_15989616_4_16.zip for task hadcm3istd_0qlf_1920_160_15989616_4 absent
26/07/2008 14:48:45|climateprediction.net|Sending scheduler request: To fetch work. Requesting 30240 seconds of work, reporting 1 completed tasks
26/07/2008 14:48:55|climateprediction.net|Scheduler request succeeded: got 1 new tasks
26/07/2008 14:48:55|climateprediction.net|Message from server: Completed result hadcm3istd_0qlf_1920_160_15989616_4 refused: result already reported as error
26/07/2008 14:48:57|climateprediction.net|Started download of hadsm3fub_jux2_005958201.zip
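
For anyone who wants to check their own message log for this kind of refusal, here is a minimal Python sketch. It assumes the log lines have the `date|project|message` layout shown above; the names `parse_line`, `refused_results` and `SAMPLE_LOG` are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# Three lines copied from the log above; "(...)" stands for the omitted lines.
SAMPLE_LOG = [
    "26/07/2008 14:47:43|climateprediction.net|Computation for task hadcm3istd_0qlf_1920_160_15989616_4 finished",
    "(...)",
    "26/07/2008 14:48:55|climateprediction.net|Message from server: Completed result hadcm3istd_0qlf_1920_160_15989616_4 refused: result already reported as error",
]


def parse_line(line):
    """Split one 'timestamp|project|message' line into its three fields."""
    stamp, project, message = line.split("|", 2)
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), project, message


def refused_results(lines):
    """Return the names of results that the server reported as refused."""
    marker = "Completed result "
    names = []
    for line in lines:
        if line.count("|") < 2:
            continue  # skip anything that is not a three-field log line
        _, _, message = parse_line(line)
        if marker in message and " refused" in message:
            # The result name sits between the marker and the word "refused".
            names.append(message.split(marker, 1)[1].split(" refused", 1)[0])
    return names


print(refused_results(SAMPLE_LOG))
```

This only finds the message; it says nothing about whether the project made use of the uploaded data.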

I don't know whether the way the server handles results from "Compute error" tasks has changed since you both checked this, or whether I had another problem, but I know one thing: if a task ever crashes and BOINC loads a new task, I won't try to restore; I'll let BOINC work on its new one!

One question: is the partially accomplished work of any use to CPDN?
Frederic
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 34502 - Posted: 5 Aug 2008, 8:43:22 UTC
Last modified: 5 Aug 2008, 8:46:29 UTC

Message from server: Completed result hadcm3istd_0qlf_1920_160_15989616_4 refused: result already reported as error

This has been the normal response to the completion of a Coupled Ocean model (also known as a TCM: Transient Coupled Model) after a failure/restore, ever since they were released in 2006.
It's a "BOINC thing", and, while it may seem alarming to crunchers, it only really applies to other projects with quorums and is meaningless on this project, because the project people don't rely on BOINC messages to decide which models are OK and which aren't.

As has already been said, the data from partially completed models IS of use.
It's been posted in lots of places many times, as well as in the READMEs, but once again:
Data from TCMs is returned once per model year for that year (in the trickle files), with larger amounts returned every 10 model years by means of a zip file.

All of which can tell the project people about the model. But the further a model can run, the more useful the data.
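
As a rough illustration of that schedule, the sketch below counts how many trickles and zip files a model would have returned after a given number of completed model years. It assumes exactly one trickle per model year and one zip file per 10 model years, as described above; the function name `returned_data` is hypothetical.

```python
def returned_data(model_years_completed, years_per_zip=10):
    """Count the trickles and zip files returned after whole model years.

    Assumes one trickle per completed model year and one zip file
    per `years_per_zip` completed model years.
    """
    trickles = model_years_completed
    zips = model_years_completed // years_per_zip
    return trickles, zips


print(returned_data(160))  # -> (160, 16)
print(returned_data(25))   # -> (25, 2)
```

So a model that stops after 25 model years would still have sent 25 trickles and 2 zip files under these assumptions.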
Backups: Here
old_user519896

Joined: 28 May 08
Posts: 16
Credit: 32,985
RAC: 0
Message 34503 - Posted: 5 Aug 2008, 9:40:42 UTC

Thanks for the answer. I'm a bit unsure what you meant by "completion", since before switching to the new task BOINC was telling me that the old one would need a few years to complete. But I am happy because

- you confirmed that what had been accomplished could be of use

- the new task has already reached 30% in one week, which means I am pretty sure I will complete it without any problem :-)
Frederic
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 34504 - Posted: 5 Aug 2008, 9:52:10 UTC

On fast desktops, running 24/7, with no other projects, and little other work, the long TCMs can be completed in about 3 months.
And there's a lot of us doing that. :)

old_user519896

Joined: 28 May 08
Posts: 16
Credit: 32,985
RAC: 0
Message 34505 - Posted: 5 Aug 2008, 10:49:20 UTC

... "on fast desktops"! Mine is a 2-year-old Pentium M. It is my best computer. At that time, dual cores were very new and very hot - physically. The laptop is working 24/7 and is doing absolutely nothing else, so I don't think I can do much better now.

The new task is running very fast, so I guess it is a very small one, but the previous task was so long that I wasn't even sure it would be finished by the limit (around 10 years). I believe the limit does not really exist, so I was not worried about that, but frankly, who can tell whether CPDN will still be doing something in 10 years? I wonder whether the announced estimated duration was wrong: I started the task on an even older (5-year-old) and slower laptop. Although the trickles were more frequent after I switched, the estimated duration stayed about the same.
Frederic
mo.v
Volunteer moderator

Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34508 - Posted: 5 Aug 2008, 14:51:15 UTC
Last modified: 5 Aug 2008, 14:57:57 UTC

Hi Frédéric

BOINC's estimate of how long a task will take on a particular computer isn't always very good, but BOINC keeps revising and correcting its estimate. Your very long 160-year model (the one Thyme Lawn linked to in his post above) certainly wouldn't have taken 10 years to complete. Crunching at about 4.5 sec/ts a lot of the time, you would have completed it before the end of 2009.
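
To see how a figure like 4.5 sec/ts turns into a completion time, here is a back-of-the-envelope Python sketch. The number of timesteps per model year is not given in this thread, so `ts_per_model_year` is left as a parameter, and the value 10000 used in the example is a placeholder for the arithmetic only, not the real figure for this model. The function name `estimate_days` is hypothetical, and the sketch assumes the machine crunches 24/7 at a constant speed.

```python
SECONDS_PER_DAY = 86400


def estimate_days(sec_per_ts, ts_per_model_year, model_years):
    """Wall-clock days needed to run `model_years` at a constant speed.

    sec_per_ts        -- seconds of crunching per timestep
    ts_per_model_year -- timesteps in one model year (assumption, see above)
    model_years       -- model years still to be computed
    """
    total_seconds = sec_per_ts * ts_per_model_year * model_years
    return total_seconds / SECONDS_PER_DAY


# Placeholder timestep count, chosen only to show the calculation.
print(round(estimate_days(4.5, 10000, 160), 1))
```

Plugging in the real timestep count for a given model type would give a figure that can be compared with BOINC's own estimate.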

It's definitely possible, and a good idea, to restore crashed models from backup and continue them.

But I clicked on + and looked at the stderrout messages on the page for your long model. They include the line

Model crashed: umshell1.f: ATM_DYN : NEGATIVE THETA DETECTED.

This means your model developed an atmospheric value that's impossible in the real world. If this occurs, the models are designed to stop and crash. If you had restored it again, the same problem would most probably have occurred again on the same model date. A few models develop negative pressure and also crash.

The cause may be the particular combination of initial parameter values for this model. But the results of the processed years will still be added to the data set for the researchers.

If you always want to crunch shorter models you should deselect HADCM in the CPDN preferences section of your account. HADAMs and HADSMs are both relatively short compared with HADCMs.

If you disable the screensaver, the models process faster.

As you're running models on a laptop, it would be a good idea to raise the whole laptop slightly above the table surface - not just the little feet at the back. This allows more air underneath and helps to keep the laptop cool.

If you are not sure whether it's a good idea to restore a particular model from a backup and continue it, you can ask on the forum. We can look at the reason why it crashed, which usually gives a good indication of whether it can be successfully restored and completed.
Cpdn news
old_user519896

Joined: 28 May 08
Posts: 16
Credit: 32,985
RAC: 0
Message 34513 - Posted: 5 Aug 2008, 16:05:40 UTC

Thanks for all this information!
Frederic
