climateprediction.net home page
Cluster startup: no more work units

old_user474753

Joined: 28 Sep 07
Posts: 1
Credit: 215,245
RAC: 0
Message 30761 - Posted: 29 Sep 2007, 1:46:23 UTC

Hi all:

I'm bringing a new computational cluster online, and I wanted to "stress-test" it by running BOINC/climateprediction.net on it for a while. We have 128 processors on this cluster, so I set it up to run climateprediction.net on all of them. It got up to about 37 running before all of the remaining processors started returning:

Fri 28 Sep 2007 11:26:22 AM PDT|climateprediction.net|Requesting 60480 seconds of new work
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Scheduler RPC succeeded [server version 509]
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Message from server: No work sent
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Deferring communication for 44 min 36 sec
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Reason: no work from project
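For anyone checking many client logs for this condition, here is a minimal Python sketch. It assumes every log line has the three pipe-separated fields seen in the excerpt above (timestamp, project, message) and that the deferral message is worded exactly as shown. The helper names `parse_line`, `deferral_seconds` and `summarize` are made up for illustration and are not part of BOINC.

```python
import re

# Sample lines copied from the log excerpt above.
LOG = """\
Fri 28 Sep 2007 11:26:22 AM PDT|climateprediction.net|Requesting 60480 seconds of new work
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Scheduler RPC succeeded [server version 509]
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Message from server: No work sent
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Deferring communication for 44 min 36 sec
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Reason: no work from project
"""

# Assumed wording, taken from the excerpt: "Deferring communication for <M> min <S> sec".
DEFER_RE = re.compile(r"Deferring communication for (\d+) min (\d+) sec")


def parse_line(line):
    """Split one log line into (timestamp, project, message)."""
    timestamp, project, message = line.split("|", 2)
    return timestamp, project, message


def deferral_seconds(message):
    """Return the deferral in seconds, or None if the message is not a deferral."""
    match = DEFER_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)) * 60 + int(match.group(2))


def summarize(log_text):
    """Count 'No work sent' replies and report the last deferral seen, in seconds."""
    no_work_count = 0
    last_deferral = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        _, _, message = parse_line(line)
        if "No work sent" in message:
            no_work_count += 1
        seconds = deferral_seconds(message)
        if seconds is not None:
            last_deferral = seconds
    return no_work_count, last_deferral


print(summarize(LOG))  # (1, 2676)
```

Here the client was refused once and told to wait 2676 seconds (44 min 36 sec) before contacting the project again.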

My tests require that I fully load the cluster...what can I do? I have 4 nodes (8 cores/node) fully loaded, and a few more with only a single thread. Is there some setting I can set to allow more work units? Did I really empty the work unit queue for the project?

Thanks!
--Jim
Linux Cluster running ROCKS
24 nodes with 2 quad-core 2.33 GHz CPUs each
(This is for the Lab for Atmospheric Research when it goes fully online, so I thought it appropriate to test with climateprediction.net)
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 30762 - Posted: 29 Sep 2007, 10:31:00 UTC
Last modified: 29 Sep 2007, 10:52:18 UTC


Hi,

The Server Status page says that there is still more work to give out. You might be able to get more models on the following day, and so forth (I think there is a limit to the number of models you can pick up per 24-hour period on a given host).

You can also pick up more work units by attaching to the SAP project (http://attribution.cpdn.org). SAP is a shorter-duration but higher-resolution model: it takes around 450 MB of RAM per model and should take around 11 days to run on the quad-core Xeon. They have a limited number of work units remaining, so please only attach to it if you plan to run the models to completion.
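As a rough check on whether a node can hold one SAP model per core, here is a small Python sketch of the arithmetic. It uses the roughly 450 MB per model figure above and the 8 cores per node from the original post. The RAM installed per node is not stated in this thread, so the 2048 MB value below is purely a placeholder, and the sketch ignores memory used by the operating system.

```python
MODEL_RAM_MB = 450    # approximate per-model figure quoted above
CORES_PER_NODE = 8    # 2 quad-core CPUs per node, from the original post


def ram_needed_mb(models):
    """Approximate RAM needed to run the given number of SAP models at once."""
    return models * MODEL_RAM_MB


def models_that_fit(node_ram_mb, cores=CORES_PER_NODE):
    """Models a node can hold, capped at one per core (OS overhead ignored)."""
    return min(cores, node_ram_mb // MODEL_RAM_MB)


print(ram_needed_mb(CORES_PER_NODE))  # 3600
print(models_that_fit(2048))          # 4 (2048 MB is a hypothetical node size)
```

So fully loading all 8 cores of a node with SAP models needs about 3.6 GB of RAM for the models alone.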

I'm not sure if APS has work to give out, but if they do, connect to http://apsathome.org/. APS is hosted at the University of Manchester and they run atmospheric physics jobs (typically lasting 30 minutes to 5 hours each).

Looks like an interesting machine... How long are you stress-testing it for? Is it long enough to complete the models, or will they be discarded? (If the latter, running shorter work units would be better.)

I prefer to use Prime95's torture test for short-term stress testing, because the climate projects try to silently recover from errors, whereas the GIMPS project has a mode specifically designed to tell the user if a floating-point error occurred. My Linux host is running mprime right now to check whether the PC is stable enough to run the climate project. Note that you'll need to use the -An flag to assign each job to a specific core (n = 0-7), then select 'N' for 'join gimps', '16' for 'set CPU options' (800 for 800MB per job), and finally '18' for 'torture test'.
I'm a volunteer and my views are my own.


©2024 climateprediction.net