climateprediction.net home page
WU's stick for ages on the % done

Profile old_user2697
Joined: 29 Aug 04
Posts: 11
Credit: 1,281,270
RAC: 0
Message 33076 - Posted: 25 Mar 2008, 14:00:27 UTC

I'm running a couple of WUs on three different machines, and all of them seem to be stuck at the same % completion. Looking at the trickles sent, I see that CPDN has received no new trickles since November.

How come?
Simmel

Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33079 - Posted: 25 Mar 2008, 17:56:37 UTC
Last modified: 25 Mar 2008, 18:13:40 UTC

Hi Simmel

So far I've found one:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6818233
Look at the speed: the sec/timestep of the most recent trickle. That's a cumulative average, so the model's real speed must have slowed to almost no progress after the second-last timestep. The model has become a slow-processing 'iceball'. There are threads about this problem with HADSM slab models in the Number Crunching section of the forum. If you leave the model running it will eventually complete, but it isn't worth it because its results will almost certainly be abnormal. Look at the model's graphics; I expect they are also abnormal, with all-blue 'temperatures'. Don't waste any more computer time on it. Abort it.
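
[Editor's sketch] Because the reported sec/timestep is a cumulative average, a sharp slowdown is diluted by all the earlier fast timesteps. The speed over the most recent interval can be recovered from two consecutive trickles. The following is a minimal Python sketch, assuming each trickle reports the total timesteps completed and the cumulative average sec/timestep; the function name and all numbers are made up for illustration and are not taken from the results linked in this thread.

```python
def interval_speeds(trickles):
    """Recover per-interval sec/timestep from cumulative averages.

    trickles: list of (timesteps_done, cumulative_avg_sec_per_timestep),
    ordered from oldest to newest (hypothetical input format).
    """
    speeds = []
    for (ts0, avg0), (ts1, avg1) in zip(trickles, trickles[1:]):
        total0 = ts0 * avg0  # CPU seconds spent up to the earlier trickle
        total1 = ts1 * avg1  # CPU seconds spent up to the later trickle
        speeds.append((total1 - total0) / (ts1 - ts0))
    return speeds


# Hypothetical trickles: the cumulative average only triples (2.0 -> 6.0),
# but the last interval alone ran at 14 sec/timestep.
trickles = [(10_000, 2.0), (20_000, 2.0), (30_000, 6.0)]
print(interval_speeds(trickles))  # [2.0, 14.0]
```

In this made-up case the model is seven times slower over the last interval than the earlier ones, even though the headline figure has only tripled.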

The other two models are probably also iceballs, but I'll look for them to check. You can easily check yourself by looking at their graphics. If you have 3 iceballs, that's very bad luck. It isn't your fault or your computers' fault.

Edit: You need to check the graphics of this HADSM slab. Maybe it's progressing so slowly that it can't make its next trickle yet:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6935131

New edit: I think you have 2 HADSM iceballs running together on the same computer:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7106657
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7085115 (this one became an iceball before its second trickle, possibly before its first trickle)

If I haven't found every model you're worried about, could you please post a link? I hope there are no more of these. That's really bad luck.

You've crunched a lot for CPDN. Thanks for your contribution.

Profile old_user2697
Joined: 29 Aug 04
Posts: 11
Credit: 1,281,270
RAC: 0
Message 33095 - Posted: 26 Mar 2008, 20:00:26 UTC

Hi mo.v,

Well, I guess I just have bad luck ...

Checked my three crunchers:
Host 1 (mtf-ams-srv-001) is running results 6818233 and 7040879. Both iceballs ...
Host 2 (mtf-ams-lt-104) is running 7085115 and 7106657. Both iceballs ...
Host 3 (wks02) is running 6935131 and 6962734. Only the last one is not an iceball ...

So 5 out of 6 are iceballs ... Man !

Simmel

Profile MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 33096 - Posted: 26 Mar 2008, 20:49:26 UTC


Five out of six is the worst I've ever come across! Although the more HadSM3s you run, the more likely you are to encounter one that iceballs and then ties up the PC. The other types of model aren't affected by iceballs.
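
[Editor's sketch] The "more models, more likely" point can be put in numbers with a binomial calculation. This is only a sketch: the thread gives no actual iceball rate, so the 5% per-model probability below is a purely hypothetical figure, and the models are assumed to iceball independently of each other.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of at least k iceballs among n independent models,
    each iceballing with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.05  # hypothetical per-model iceball rate, NOT a measured CPDN figure

print(prob_at_least(1, 6, p))  # chance of at least one iceball in six models
print(prob_at_least(5, 6, p))  # chance of five or more iceballs in six models
```

Under these assumptions, at least one iceball in six models is fairly common (roughly a one-in-four chance), while five or more would be around two in a million. That gap is one reason it is also worth ruling out a problem with the machines themselves.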
I'm a volunteer and my views are my own.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 33097 - Posted: 26 Mar 2008, 20:50:16 UTC


Overclocked?

Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33098 - Posted: 27 Mar 2008, 1:07:07 UTC
Last modified: 27 Mar 2008, 1:37:22 UTC

Now that we have a definite list of these 5 iceballs, I'm going to check the 5 complete shared workunits they belong to, to see if I find any more. If I do, I'll send private messages to the crunchers.

Edit later:

In those 5 workunits I've found one more iceball and will send a message to its owner. It belongs to the same workunit as Simmel's 6935131 on host 3.

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6935132
Profile Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 33101 - Posted: 27 Mar 2008, 12:56:39 UTC

On host 844370, both results have been taken further on the same platform (Intel+Windows). There may be something wrong with that machine.

Result 7040879 in work unit 6112139 has also been taken further by an Intel+Vista host. Again that suggests a PC problem as Windows variety is not relevant (?). The models themselves are identical and differences should appear only between platforms.
Profile old_user2697
Joined: 29 Aug 04
Posts: 11
Credit: 1,281,270
RAC: 0
Message 33102 - Posted: 27 Mar 2008, 13:26:36 UTC

Hi Guys,

None of the hosts I use are overclocked. I do not use any of those hardware acceleration utils. Just the iron out of the box, and then the installation of BOINC.

I disconnected all the stalling units and got myself 5 new ones. Those are now running and proceeding as expected.

Thanks for all the info. I will be more alert next time.

Greetz, Simmel

btw: I was hovering around rank 4200 in November. Need to fight the whole way back now :-(
Simmel

Profile MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 33106 - Posted: 27 Mar 2008, 18:29:43 UTC


There is a 'short cut' to getting back to your old RAC quickly: just run offline for about 15 or 16 days, and then let all your PCs talk to the network. Once the credit job runs (overnight), you'll jump straight back up to your original RAC.

I'm a volunteer and my views are my own.