climateprediction.net home page
Folding@Home not compatible with HadCM3 shorts

Message boards : Number crunching : Folding@Home not compatible with HadCM3 shorts

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 51995 - Posted: 28 May 2015, 3:04:39 UTC

I have found that running Folding@Home on GPUs and CPDN on the CPUs causes the HadCM3 short work units to error out after about 10 to 13 minutes. At least that is the way it is on my dual GTX 750 Tis that I use for Folding, while running CPDN with BOINC 7.6.1 (Win7 64-bit). Normally, Folding gets along fine with the various BOINC projects I run, so that is somewhat of a surprise. I have not tried to isolate it further to see if it is the Folding client software or the Folding cores and work units themselves (Core 17 at the moment) that cause the problem.

But this raises the possibility that other activity on the GPUs (games, video editing, etc.) could cause problems too, so if you seem to be getting too many errors, you might try disabling it and see whether that reduces the CPDN error rate. I haven't looked into the other CPDN work unit types, but there might be problems there too. Normally, I like to run BOINC as a service, which pretty much isolates it from other software, but that does not work with the HadCM3 shorts or, as I recall, the HadAM3P-HadRM3P Pacific North West work units.
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 51996 - Posted: 28 May 2015, 4:33:50 UTC - in response to Message 51995.  

Your problem could be the version of Boinc that you are running. I have no problem running Seti (1 task at a time) on the GPU and CPDN (4 tasks on 2 hyperthreaded cores) using Boinc 7.4.42. It may be that Boinc 7.6.1 isn't stable under that kind of load. Also, have you checked core temps? Excessive heat buildup might cause instability.

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 51997 - Posted: 28 May 2015, 5:13:12 UTC - in response to Message 51996.  
Last modified: 28 May 2015, 5:13:33 UTC

I should have mentioned that running BOINC projects on the GPU does not cause the problem. At least Einstein and POEM don't, and I have run a lot of GPUGrid too without incident. It is just Folding, which uses its own client. That normally is an advantage, since I can then run BOINC as a service, which is a bit more stable in some cases unrelated to the current situation. (The core temps etc. are fine.)

This machine runs Folding (FAH Client 7.4.4) on two GTX 750 Tis: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1363431

This machine does not run Folding, but runs Einstein and POEM on two GTX 750 Tis (BOINC 7.6.1): http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1349694

They are otherwise similar Haswell boards (Z87/Z97), with nothing overclocked. If you check, most of the errors are "no resubmissions", which I aborted in some cases.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 51998 - Posted: 28 May 2015, 5:55:20 UTC - in response to Message 51997.  

"No resubmission" means that the task isn't needed/required.
BOINC has a problem recognizing this, and after a period of nnn days it re-issues them.

If you get any, abort them.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 925
Credit: 34,100,818
RAC: 11,270
Message 51999 - Posted: 28 May 2015, 7:02:48 UTC - in response to Message 51996.  

Your problem could be the version of Boinc that you are running. I have no problem running Seti (1 task at a time) on the GPU and CPDN (4 tasks on 2 hyperthreaded cores) using Boinc 7.4.42. It may be that Boinc 7.6.1 isn't stable under that kind of load. Also have you checked core temps? Excessive heat buildup might cause instability.

v7.6.1 was a botched release, and has already been replaced by v7.6.2 - but I think it was more to do with Manager compatibility with Windows 10, rather than any problem with the client running applications.
currob
Joined: 23 Jul 13
Posts: 5
Credit: 176,000
RAC: 0
Message 52500 - Posted: 31 Aug 2015, 6:44:42 UTC

Could this be the reason for my errors on this machine: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1373589

It is running FAH on the 780 Ti and BOINC on the CPU. It is a 24/7 machine with low CPU/system temps and uptimes of around 45 days, with no other errors or issues.

Is there a solution for this compatibility issue yet?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 52501 - Posted: 31 Aug 2015, 7:34:43 UTC - in response to Message 52500.  

No.
The error (from stderr on the Task ID page for each of those models) is:
ATM_DYN : INVALID THETA DETECTED

This means that the physics of that particular theoretical world went beyond known limits, and the program terminated the run.

They're all "short" models, which were/are set to run near the limits of stability to test something that I've forgotten.

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 52503 - Posted: 31 Aug 2015, 12:56:33 UTC - in response to Message 52500.  

While Les is undoubtedly correct that the "Invalid Theta Detected" message usually indicates a bad model, as I have posted before, that is not always the case:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=8003&nowrap=true#51187

Each of the three machines that failed had that error message, even though the same model completed successfully on my machine. So it appears that something else was the cause; otherwise, I would think, it would have failed on my machine too.

I don't know whether Folding could trigger that particular error, but as Les says, probably not, though I have not investigated the Folding problems to that extent; I just avoid FAH on that machine now.
currob
Joined: 23 Jul 13
Posts: 5
Credit: 176,000
RAC: 0
Message 52504 - Posted: 31 Aug 2015, 16:58:40 UTC

Thanks for the replies. I've removed BOINC from that machine and will keep it for FAH; that BOINC client wasn't getting any more work after those failures either. At least the 2 better machines are running error free. :)

©2024 climateprediction.net