How much of a Compute error task is useful?

Digby Joined: 17 Feb 06 Posts: 89 Credit: 4,309,159 RAC: 0
Message 52509 - Posted: 9 Sep 2015, 13:10:17 UTC Last modified: 9 Sep 2015, 13:10:53 UTC

Hi,

I recently had a task crash after restarting... :( I believe it was approx. 95% complete, having run for 1,367,209 seconds...

Can anyone suggest how much of this Compute error task the project team can still use?

The task is: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=18678670

Thanks for any feedback.

Digby

Alan K Joined: 22 Feb 06 Posts: 484 Credit: 29,600,814 RAC: 2,129
Message 52510 - Posted: 9 Sep 2015, 15:46:00 UTC - in response to Message 52509.

Essentially anything already reported by trickles is useful data - or so I am led to believe.

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Message 52512 - Posted: 9 Sep 2015, 23:21:40 UTC

I suspect that the answer is "it depends".

To start with, it depends on what the researchers at a given research centre are trying to achieve: either short segments of data (which is what all of the models are these days), or joining up the segments to make a model run of, e.g., a hundred years. To join them up, they need ALL of the zips, because the last one (usually zip 13) contains the data to start up the next segment.

If the data from a failed model is considered important, then they can re-issue that data set with a new name. (All of the data sets now come directly from the researchers.)

Then there's the other way of looking at it: if you're looking for, say, peaches, and there are some that have started going bad, then you'll usually pick one that isn't. The researchers can do the same with the model data.

Also, some of the modelling doesn't use the trickle_up files to return small amounts of data; they're just there to let the server know that the model is still alive, and to create credits.

If there are lots of failures for a given batch, then the trickle_up files plus the zips can give them a clue as to what went wrong, and where it happened.

Digby Joined: 17 Feb 06 Posts: 89 Credit: 4,309,159 RAC: 0
Message 52537 - Posted: 11 Sep 2015, 13:47:01 UTC

Thanks for the feedback.

OK, so the gist is basically to do what you can to complete a task, but if that fails then sometimes something might be salvaged from the trickles already received.

(FWIW I had another task error this morning when restarting the PC: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=18759508)

Running 24/7 would be more stable for task completion, but it seems ironic that on a desktop PC used only during the day this will consume more energy and ultimately contribute more to climate change... I would like to shut down at night and back up tasks as well.

So I am now taking the following steps to help complete my tasks:

- I just upgraded BOINC to 7.6.7 from 7.4.23 using https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/locutusofborg-ppa
- Every time I shut down or reboot I have always suspended the CPDN project, but from now on I will ALSO suspend each task individually.
- I have also unchecked 'Leave non-GPU tasks in memory while suspended'.

Let's see how it goes.

Cheers
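
For anyone who wants to script the suspend-before-shutdown routine rather than click through the Manager each time, here is a rough sketch in Python. It only builds and prints the `boinccmd` command lines (a dry run); it assumes `boinccmd` is installed and on the PATH, and that its `--task URL task_name suspend` and `--project URL suspend` operations behave as listed by `boinccmd --help` on your version. The project URL and task names below are hypothetical placeholders, not real values; use the ones shown by your own client.

```python
import subprocess

# Hypothetical placeholder: replace with the project URL exactly as your
# BOINC client lists it.
PROJECT_URL = "http://example.org/project/"


def suspend_commands(project_url, task_names):
    """Build boinccmd invocations: suspend each task, then the whole project."""
    cmds = [
        ["boinccmd", "--task", project_url, name, "suspend"]
        for name in task_names
    ]
    cmds.append(["boinccmd", "--project", project_url, "suspend"])
    return cmds


def run_all(cmds):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical task names, for illustration only.
    tasks = ["example_task_1", "example_task_2"]
    for cmd in suspend_commands(PROJECT_URL, tasks):
        print(" ".join(cmd))  # dry run: show what would be executed
```

To actually suspend, pass the list to `run_all(...)` before shutting down. Whether suspending tasks individually helps beyond suspending the project is exactly what is being tried out here, so treat this as a convenience rather than a fix.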