Workunit uses lots of disk space+restart/recover the same task

Questions and Answers : Unix/Linux : Workunit uses lots of disk space+restart/recover the same task

bernard_ivo
Joined: 18 Jul 13
Posts: 252
Credit: 5,903,045
RAC: 23
Message 51473 - Posted: 26 Feb 2015, 16:55:31 UTC

Dear all,

Recently I resumed contributing my computers to CPDN, so I'm still a bit rusty and need a refresher. It would be great if someone could help me.

One of the problems I'm encountering is that my current model, UK Met Office HadAM3P (global only) with MOSES II landsurface scheme (workunit: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=9546200), already occupies 3.1 GB of disk space in hadam3pm2_k243_1959_10_009463966/dataout at only 6% progress.
Shouldn't it be much less?
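To see how much space a model folder is actually using, `du -sh` on the folder does the job. The Python sketch below does the same walk portably. The helper names are hypothetical, and it only counts regular files, skipping symlinks.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path (roughly like `du -sb`)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            # Skip symlinks so a linked file is not counted twice.
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def human(n: float) -> str:
    """Format a byte count with binary (1024-based) units."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1024 or unit == "TB":
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TB"  # not reached; keeps type checkers happy
```

Usage: `print(human(dir_size_bytes(".../projects/climateprediction.net/hadam3pm2_k243_1959_10_009463966/dataout")))`. The BOINC data directory prefix (for example `/var/lib/boinc-client` on Debian/Ubuntu packages) and the project subfolder name are assumptions. Check your own install.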

Update: While writing this, I accidentally deleted the model directory, and BOINC exited with an error when it tried to write at the CPU checkpoint. I recovered the folder, but it seems I cannot restart the same task. Can I? Any suggestions? Should I simply delete all the files and start a new task?

Cheers

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1670
Credit: 32,083,245
RAC: 31,083
Message 51474 - Posted: 26 Feb 2015, 18:37:42 UTC - in response to Message 51473.

I'm not sure on the typical size of these folders as I am away from my computers now. However, the model folders do get big.

Unfortunately, you cannot recover your model despite restoring that folder. Your client_state.xml file no longer contains any information about it, since BOINC considered that task errored out. Go ahead and delete that folder and start a new task.
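Whether the client still knows about a task can be checked by reading client_state.xml in the BOINC data directory. This is a minimal sketch. It assumes each task appears as a `<result>` element with a `<name>` child, and treats that layout as an assumption rather than a documented guarantee. It lists the task names that start with a given prefix, such as the model's folder name. The helper names are hypothetical, and the sketch only reads the file.

```python
import xml.etree.ElementTree as ET


def result_names(client_state_xml: str) -> set[str]:
    """Names of all <result> entries (assumed layout) in a client_state.xml text."""
    root = ET.fromstring(client_state_xml)
    return {r.findtext("name", default="").strip() for r in root.iter("result")}


def tasks_matching(client_state_xml: str, prefix: str) -> list[str]:
    """Sorted task names that start with prefix, e.g. a model folder name."""
    return sorted(n for n in result_names(client_state_xml) if n.startswith(prefix))
```

Usage: `tasks_matching(open("client_state.xml").read(), "hadam3pm2_k243_1959_10_009463966")`. An empty list would be consistent with the task no longer being tracked.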

bernard_ivo
Joined: 18 Jul 13
Posts: 252
Credit: 5,903,045
RAC: 23
Message 51475 - Posted: 26 Feb 2015, 18:46:45 UTC - in response to Message 51474.
Last modified: 26 Feb 2015, 18:48:20 UTC

Thanks. Let's see how it goes. My other Linux machine errored out earlier on another hadam3p model.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1670
Credit: 32,083,245
RAC: 31,083
Message 51476 - Posted: 26 Feb 2015, 19:46:23 UTC - in response to Message 51475.

If you don't like the idea of errors, I'd run the hadcm3s models instead of the MOSES ones. The MOSES ones hate being interrupted for any reason, which rather goes against the idea of distributed computing: running science apps when your computer is otherwise idle.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6408
Credit: 16,839,542
RAC: 21,887
Message 51477 - Posted: 26 Feb 2015, 20:16:29 UTC

And the model you linked to had lots of suspends.

See my post here for comments about that.

bernard_ivo
Joined: 18 Jul 13
Posts: 252
Credit: 5,903,045
RAC: 23
Message 51479 - Posted: 26 Feb 2015, 21:54:57 UTC - in response to Message 51477.

Thanks, but I still use these machines and need to shut them down almost every day. So I wait for a CPU checkpoint, then suspend, exit BOINC and shut down. I can't leave a machine running at 100% CPU for the 450+ CPU hours it takes to finish a hadam3p model uninterrupted! On a laptop, CPUs running at 100% all the time get far too hot, and since these machines have little idle time, I try to give CPDN some CPU time while I'm working. I read some time ago that there is no way to make some of these models smaller and less error-prone. But if I manage to complete less than 20% of tasks, then 80% of the computing time is simply lost.
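The arithmetic behind the interruption problem can be sketched briefly. A model needing about 450 CPU hours, run on a machine that is shut down every day, takes weeks and goes through a stop/restart cycle each day. This is a minimal sketch. The hours-per-day values and the one-shutdown-per-day assumption are illustrative, not CPDN figures, and the function names are hypothetical.

```python
import math


def days_to_finish(cpu_hours_total: float, cpu_hours_per_day: float) -> int:
    """Whole days needed if the model gets cpu_hours_per_day of CPU time each day."""
    return math.ceil(cpu_hours_total / cpu_hours_per_day)


def restarts_needed(cpu_hours_total: float, cpu_hours_per_day: float) -> int:
    """Shutdown/restart cycles, assuming one shutdown at the end of every day but the last."""
    return max(days_to_finish(cpu_hours_total, cpu_hours_per_day) - 1, 0)


for per_day in (24, 8, 4):
    print(per_day, days_to_finish(450, per_day), restarts_needed(450, per_day))
```

For example, at 8 CPU hours a day a 450-hour model needs 57 days and 56 restarts. Each of those restarts is a chance for a MOSES model to fail.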

Nevertheless I will set some of the preferences as suggested.

Thanks mates





Copyright © 2017 climateprediction.net