Lots of tasks end with "Error While Computing". Is there a problem at my end?

Questions and Answers : Unix/Linux : Lots of tasks end with "Error While Computing". Is there a problem at my end?

old_user701332
Joined: 27 Jul 13
Posts: 14
Credit: 100,367
RAC: 0
Message 49517 - Posted: 9 Jul 2014, 5:12:02 UTC

Most of my CPDN tasks have ended with "Error While Computing." I'm wondering whether this is caused by something on my computer, or by something I am doing or did wrong.

I don't run my computer 24/7. I start it up and shut it down about twice a day. I try to protect my BOINC work units by suspending projects before I shut down, and resuming them after I boot my computer.

I understand that some work units will end with "Error While Computing", but I've noticed that the most likely time for a work unit to end this way is just after I resume CPDN.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 49518 - Posted: 9 Jul 2014, 6:18:15 UTC - in response to Message 49517.  

Well, one thing you could try is to set "Suspend work if CPU usage is above" to zero.
In other words, don't constantly stop and start BOINC (and the climate program).

This non-zero setting may be fine for other projects, but the climate models don't like it, and sooner or later ...

old_user701332
Joined: 27 Jul 13
Posts: 14
Credit: 100,367
RAC: 0
Message 49519 - Posted: 9 Jul 2014, 6:35:36 UTC - in response to Message 49518.  

I don't have an option called exactly "Suspend work if CPU usage is above". I did recently change the option "Use at most ___ % of CPU time" to 85%. My CPU tends to run hot when I'm running BOINC. As spring was ending, it was getting warm in this room; it is poorly insulated and has lots of glass, so there was some danger of my computer overheating. So I set it to 90% and then to 85%.
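As far as I know, BOINC implements "Use at most X % of CPU time" by suspending and resuming computation every few seconds; the BOINC Manager's own tooltip describes 75% as computing for 3 seconds and then waiting for 1 second. If that is right, an 85% setting interrupts a running climate model constantly. The Python sketch below only does that arithmetic. The 4-second cycle is an assumption taken from the tooltip example, not a documented constant, and the function names are illustrative.

```python
# Sketch: how often does CPU-time throttling pause a task?
# ASSUMPTION: throttling uses a fixed compute+pause cycle; 4 s comes from the
# "75% = compute 3 s, wait 1 s" tooltip example and may not match your client.
from __future__ import annotations

CYCLE_SECONDS = 4.0  # assumed length of one compute+pause cycle


def throttle_split(cpu_time_pct: float, cycle_s: float = CYCLE_SECONDS) -> tuple[float, float]:
    """Return (compute_seconds, pause_seconds) for one throttle cycle."""
    if not 0 < cpu_time_pct <= 100:
        raise ValueError("cpu_time_pct must be in (0, 100]")
    compute = cycle_s * cpu_time_pct / 100.0
    return compute, cycle_s - compute


def pauses_per_hour(cpu_time_pct: float, cycle_s: float = CYCLE_SECONDS) -> int:
    """Suspend/resume events per hour (0 when not throttled at all)."""
    if cpu_time_pct >= 100:
        return 0
    return int(3600 // cycle_s)


if __name__ == "__main__":
    for pct in (100, 90, 85, 50):
        run, pause = throttle_split(pct)
        print(f"{pct:3d}%: compute {run:.1f} s, pause {pause:.1f} s, "
              f"{pauses_per_hour(pct)} pauses/hour")
```

With the assumed 4-second cycle, 85% works out to about 3.4 s of computing and 0.6 s of pause, repeated 900 times an hour, while 100% means no throttling pauses at all.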
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 49520 - Posted: 9 Jul 2014, 8:10:49 UTC - in response to Message 49519.  

The option that I mentioned is in the Computing preferences of your Account page on the project's server.

These climate models DON'T like being interrupted. Sooner or later they'll crash.
LOTS of your models are crashing.
If it's too hot to run your computer any other way, perhaps you shouldn't run climate models.
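If you would rather set these options on the computer itself than on the Account page, the BOINC client also reads a global_prefs_override.xml file in its data directory, and values in it take precedence over the web preferences. A minimal Python sketch that writes the two settings discussed here, assuming the tag names suspend_cpu_usage (for "Suspend work if CPU usage is above", where 0 means no restriction) and cpu_usage_limit (for "Use at most X % of CPU time") match your client version; check the BOINC preferences documentation before relying on it.

```python
# Sketch: generate a BOINC global_prefs_override.xml for the two CPU options.
# ASSUMPTION: the tag names below match your BOINC client version.
import xml.etree.ElementTree as ET


def build_prefs_override(suspend_cpu_usage: float = 0.0,
                         cpu_usage_limit: float = 100.0) -> str:
    """Return override XML; the defaults never suspend and never throttle."""
    root = ET.Element("global_preferences")
    # 0 = no restriction for "Suspend work if CPU usage is above"
    ET.SubElement(root, "suspend_cpu_usage").text = f"{suspend_cpu_usage:.6f}"
    # 100 = no on/off throttling for "Use at most X % of CPU time"
    ET.SubElement(root, "cpu_usage_limit").text = f"{cpu_usage_limit:.6f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_prefs_override())
```

Save the output as global_prefs_override.xml in the BOINC data directory, then run `boinccmd --read_global_prefs_override` so the running client reloads it.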

old_user701332
Joined: 27 Jul 13
Posts: 14
Credit: 100,367
RAC: 0
Message 49525 - Posted: 9 Jul 2014, 19:04:36 UTC - in response to Message 49520.  
Last modified: 9 Jul 2014, 19:06:05 UTC

I do have that option; I didn't notice it "yesterday" when I looked. I have always had it set to zero.

I'm going to put CPDN on "no new tasks" until I can look into my computer's temperature issues. If I get them fixed, it might not be for several months. If I don't, I'll detach from the project. I'll let the remaining work unit run when I have the computer on at night.
old_user701332
Joined: 27 Jul 13
Posts: 14
Credit: 100,367
RAC: 0
Message 49526 - Posted: 9 Jul 2014, 19:37:59 UTC - in response to Message 49525.  

I'll detach from the project when the current work unit finishes. Even if I resolve the cooling issue, I won't be running my computer 24/7.
Profile Ron Crouch
Joined: 24 Feb 05
Posts: 45
Credit: 11,332,534
RAC: 0
Message 49528 - Posted: 10 Jul 2014, 4:34:25 UTC - in response to Message 49526.  

You could set your CPU usage down to 50%, or even lower, to see if it runs cooler for now. Other than that, it's time to save up for liquid cooling!
6,000?? Give it a rest.

Göbekli Tepe is more than 10,000 years old. And quite intricate I might add.

Explain that!
Profile astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 49536 - Posted: 10 Jul 2014, 21:51:12 UTC - in response to Message 49517.  

There is another possibility: dust. I find it necessary to vacuum the innards of my machines occasionally and to blow dust (with pressurized air in cans) from the vanes of the CPU heat sink. These always-running machines are not only effective heaters, they are also effective air filters...

The additional heat is welcome in Winter but not in Summer!

Hope that helps. We don't want to lose you.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
old_user701332
Joined: 27 Jul 13
Posts: 14
Credit: 100,367
RAC: 0
Message 49544 - Posted: 11 Jul 2014, 19:00:10 UTC

That my CPU runs hot is why I lowered the maximum CPU use of BOINC to 90% and then 85% last month. In cooler seasons, I let BOINC use 100% of my CPU. Even allowing BOINC 100% of my CPU, I still had a lot of work units end with "Error While Computing." So, my CPU running hot is a side issue.

Based on what was posted here before I mentioned the temperature issue, and lacking any suggestions for a software or hardware fix other than those related to CPU temperature, I have to conclude that the problem is most likely the result of my not running my computer 24/7.
Profile Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4341
Credit: 16,497,933
RAC: 6,477
Message 49545 - Posted: 11 Jul 2014, 19:40:24 UTC - in response to Message 49544.  

I run Linux, and most nights I turn my computer off. But I use the suspend-to-disk option, and with that, most tasks run to completion. The computer running hot could be a factor in the number of errors you get. Another option is to go into the BIOS and underclock it a bit.
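On Linux, the kernel lists the sleep states it supports in /sys/power/state, and "disk" in that list means suspend-to-disk (hibernation) is built in. That is necessary but not sufficient: hibernation also needs enough swap space and a working resume setup. A small Python sketch of the check (the function name is illustrative):

```python
# Sketch: does this Linux kernel advertise suspend-to-disk (hibernation)?
# "disk" being listed is necessary but not sufficient: hibernation also
# needs enough swap and a correct resume configuration.
from pathlib import Path

POWER_STATE = Path("/sys/power/state")


def supports_suspend_to_disk(state_text: str) -> bool:
    """True if 'disk' appears among the space-separated sleep states."""
    return "disk" in state_text.split()


if __name__ == "__main__":
    if POWER_STATE.exists():
        print("suspend-to-disk listed:",
              supports_suspend_to_disk(POWER_STATE.read_text()))
    else:
        print(f"{POWER_STATE} not found (not Linux, or sysfs not mounted)")
```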
WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,421,805
RAC: 1,225
Message 49556 - Posted: 13 Jul 2014, 13:03:52 UTC

This is a different problem but fits the thread subject exactly.

Most, but not all, of the recent batch of work units give me the following error after about 1.5 minutes on one of my computers (1267447).

Any ideas as to what the problem might be or how to find the problem?


<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (13 frames):
/home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x836e1cf]
[0xb0f9d400]
/home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8136129]
/home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x813c074]
/home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8131c87]
/home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x813d6aa]
/home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8133fca]
/home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8078e6f]
/home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82d73ae]
/home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f8867]
/home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f14bb]
/home/bob/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f97f6]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0xb0dac4d3]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7054, selfPID=7050, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
Profile Ron Crouch
Joined: 24 Feb 05
Posts: 45
Credit: 11,332,534
RAC: 0
Message 49560 - Posted: 15 Jul 2014, 20:34:57 UTC
Last modified: 15 Jul 2014, 21:03:46 UTC

CPDN on Linux or Windows with Intel chips doesn't like either hot operating conditions or overclocked chips. AMDs are fine overclocked as long as the heat issue is addressed.

Solve either or both, as applicable, and you're not likely to have as many failures.

The options are either oversized air coolers, which may or may not allow the side cover to be reinstalled (and can be quite loud), or liquid cooling, which runs fairly quietly (and will most likely reduce your core temps by 20 C, but might require a new tower).

I have both an AMD 8350 and an Intel 4770K. Both are fitted with a Seidon M120, and both run CPDN 24/7. The AMD, being more power hungry, runs at approx. 58 C; the Intel, being less power hungry, runs at approx. 48 C. The AMD runs at 4.4 GHz (overclocked, no problem); the Intel runs at 3.5 GHz (it produces lots of failures if overclocked, but only on CPDN).

Obviously, though, different configurations using either chip will vary with other variables, such as the motherboard and RAM, and so produce different results from mine. However, the end does support the mean, regardless.
WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,421,805
RAC: 1,225
Message 49561 - Posted: 15 Jul 2014, 21:57:53 UTC

Thanks Ron Crouch for the reply.

While I won't completely dismiss your suggestion, let me add the following:

The CPU is a Phenom II X4 945 Quad. It is NOT overclocked. The CPU temperature is running at 55C, which I think is on the reasonably low side.

All of the CPDN tasks that have errored out end after about 90 seconds with the same traceback. I would think that if heat were the problem, the failures would happen at random points.
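That reasoning can be checked mechanically: reduce each failed task's stack trace to its frame addresses and see whether the lists match. A Python sketch, assuming the stderr text of each failed task has been copied from its result page into a string; the names are illustrative. Frames in shared libraries and the anonymous signal-handler frame are ignored because address-space layout randomisation can move them between runs, while addresses inside a non-PIE executable (an assumption about this 32-bit build) stay fixed.

```python
# Sketch: compare the stack traces of several failed tasks.
# Identical executable addresses across tasks suggest a deterministic
# (software) fault; scattered, differing traces fit hardware trouble better.
from __future__ import annotations

import re

# One frame per line, e.g. "/path/app(sym+0x6f)[0x836e1cf]",
# "/path/app[0x8136129]" or "[0xb0f9d400]".
FRAME = re.compile(
    r"^(?P<module>[^\[(]*)(?:\([^)]*\))?\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)
SHARED_LIB = re.compile(r"\.so(\.\d+)*$")  # e.g. libc.so.6


def trace_frames(stderr_text: str) -> list[tuple[str, str]]:
    """Return (module, address) pairs for the lines after 'Stack trace'."""
    frames: list[tuple[str, str]] = []
    in_trace = False
    for raw in stderr_text.splitlines():
        line = raw.strip()
        if line.startswith("Stack trace"):
            in_trace = True
            continue
        if in_trace:
            match = FRAME.match(line)
            if match is None:
                break  # first non-frame line ends the trace
            frames.append((match.group("module"), match.group("addr").lower()))
    return frames


def trace_signature(stderr_text: str) -> tuple[str, ...]:
    """Addresses of frames inside the main executable only.

    Shared-library and anonymous frames are skipped because address-space
    layout randomisation can move them between runs.
    """
    return tuple(
        addr
        for module, addr in trace_frames(stderr_text)
        if module and not SHARED_LIB.search(module)
    )


def same_crash(*stderr_texts: str) -> bool:
    """True if every task shares one non-empty executable signature."""
    signatures = {trace_signature(text) for text in stderr_texts}
    return len(signatures) == 1 and () not in signatures
```

If same_crash(task1_text, task2_text, task3_text) returns True, all three tasks died at the same place in the model code, which fits a software cause better than heat.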

I am thinking I might have some out-of-date or missing libraries, but I don't have the knowledge to figure that out. If this is the case, it must be some missing library function that isn't called or used in most of the tasks.

I have checked (tried to update) libc.so.6 (the last entry in the traceback), and the Update Manager indicates that it is up to date.

So, for now I will run some Einstein tasks.

Profile Ron Crouch
Joined: 24 Feb 05
Posts: 45
Credit: 11,332,534
RAC: 0
Message 49562 - Posted: 16 Jul 2014, 1:07:33 UTC - in response to Message 49561.  

My suggestion wasn't intended to necessarily address your particular problem.

If you are running 64-bit (x86_64) Linux, for instance, then you would also need to install the 32-bit version of libc.so.6.

The fact that your client is segfaulting on startup indicates that some libraries may be missing or incompatible.
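One related check is whether the model executable itself is 32-bit, because a 32-bit program needs 32-bit copies of libc and the other libraries even on a 64-bit system. The ELF header answers this: byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects, which is also what the `file` command reports. A Python sketch (the function name is illustrative):

```python
# Sketch: report whether an ELF binary is 32- or 64-bit from its header.
import sys

ELF_MAGIC = b"\x7fELF"
EI_CLASS = {1: "32-bit", 2: "64-bit"}  # value of header byte 4


def elf_class(header: bytes) -> str:
    """Classify the first bytes of an ELF file; ValueError if not ELF."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return EI_CLASS.get(header[4], f"unknown class {header[4]}")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, elf_class(f.read(5)))
```

Pass it the path of the model executable from the stack trace; the i686 in that file name already hints at a 32-bit build.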
WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,421,805
RAC: 1,225
Message 49563 - Posted: 16 Jul 2014, 1:13:22 UTC

Ron -

Missing or incompatible libraries is my thinking too, but I don't know how to find out which one or ones. I am using 32-bit Ubuntu 12.04.
Profile Ron Crouch
Joined: 24 Feb 05
Posts: 45
Credit: 11,332,534
RAC: 0
Message 49564 - Posted: 16 Jul 2014, 1:28:10 UTC - in response to Message 49563.  
Last modified: 16 Jul 2014, 1:33:45 UTC

Yes, it can sometimes be a royal pain trying to sort some things out.

I don't use Ubuntu, so I would suggest trying their forums.

And 55 C is fine as long as that's the maximum under full system load. It would be very bad if that were the idle temperature (which should be around 23 C).
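To compare idle and full-load temperatures without extra tools, Linux exposes sensor readings in sysfs: /sys/class/thermal/thermal_zone*/temp and /sys/class/hwmon/hwmon*/temp*_input, both in millidegrees Celsius. Which files exist depends on the hardware and drivers (on an AMD Phenom II the reading usually comes from the k10temp driver under hwmon), so this Python sketch simply reads whatever is present; the function names are illustrative.

```python
# Sketch: print the temperatures the Linux kernel exposes via sysfs.
# Both interfaces below report millidegrees Celsius.
from __future__ import annotations

import glob
from pathlib import Path

SOURCES = [
    "/sys/class/thermal/thermal_zone*/temp",
    "/sys/class/hwmon/hwmon*/temp*_input",
]


def millideg_to_c(raw: str) -> float:
    """Convert a sysfs reading such as '55000\\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_temps() -> dict[str, float]:
    """Map each readable sensor file to its temperature in degrees C."""
    temps: dict[str, float] = {}
    for pattern in SOURCES:
        for name in glob.glob(pattern):
            try:
                temps[name] = millideg_to_c(Path(name).read_text())
            except (OSError, ValueError):
                continue  # unreadable or non-numeric sensor
    return temps


if __name__ == "__main__":
    for name, celsius in sorted(read_temps().items()):
        print(f"{celsius:5.1f} C  {name}")
```

Run it once at idle and again with four tasks running; the `sensors` command from lm-sensors shows the same readings with labels.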

You may need to do a major upgrade of your Ubuntu version to bring it up to 14.04 LTS.
Profile Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 49565 - Posted: 16 Jul 2014, 1:52:03 UTC - in response to Message 49563.  

WB8ILI: use the `ldd' command as described in this thread.

That tells you which shared library files CPDN applications are using.

Then use `dpkg-query --search filename' to find the owning package, and check for updates.

(No doubt there's a quicker way to do this, but I don't know it off the top of my head.)
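Greg's two steps can be combined in a small script. This Python sketch parses `ldd` output into library paths, flags any library reported as "not found" (the usual sign of a missing library), and asks `dpkg-query --search` which package owns each file. It assumes a Debian/Ubuntu system with both tools on the PATH; the function names are illustrative.

```python
# Sketch: list the shared libraries a binary uses and the packages owning them.
# Assumes Debian/Ubuntu with `ldd` and `dpkg-query` available.
from __future__ import annotations

import subprocess
import sys


def parse_ldd(output: str) -> tuple[list[str], list[str]]:
    """Return (resolved library paths, names reported 'not found')."""
    paths: list[str] = []
    missing: list[str] = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            name, target = (part.strip() for part in line.split("=>", 1))
            if target.startswith("not found"):
                missing.append(name)
            elif target.startswith("/"):
                paths.append(target.split()[0])
            # anything else, e.g. "linux-gate.so.1 => (0x...)", is the vDSO: no file
        elif line.startswith("/"):
            paths.append(line.split()[0])  # the dynamic loader itself
    return paths, missing


def owning_package(path: str) -> str:
    """Package name from `dpkg-query --search PATH`, e.g. 'libc6:i386'."""
    result = subprocess.run(["dpkg-query", "--search", path],
                            capture_output=True, text=True)
    if result.returncode != 0 or ": " not in result.stdout:
        return "(no package found)"
    return result.stdout.splitlines()[0].split(": ", 1)[0]


if __name__ == "__main__":
    for binary in sys.argv[1:]:
        ldd = subprocess.run(["ldd", binary], capture_output=True, text=True)
        paths, missing = parse_ldd(ldd.stdout)
        for name in missing:
            print(f"MISSING  {name}")
        for path in paths:
            print(f"{owning_package(path):20s} {path}")
```

Run it with the path of the model executable as the argument; every file that maps to a package can then be checked for updates in the usual way.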

By the way, I'm also using 32-bit 12.04 LTS. It's fine; there's no need to upgrade to 14.04 yet, if you don't want to.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 49566 - Posted: 16 Jul 2014, 3:42:08 UTC

There's also the opposite problem, where an older version of a program is needed by some of the models.

I've come across this with the graphics of one model type, but I forget the details. I think that may be in another thread from early in the year.


WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,421,805
RAC: 1,225
Message 49569 - Posted: 16 Jul 2014, 13:11:14 UTC

Greg -

I ran the ldd and dpkg-query commands. All the libraries shown were part of libc6, so I reinstalled that. There was no option for an older version.

It will be a day or two before I can download another CPDN model (I already have enough work).

55C is my temperature when running 4 tasks.
Profile Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4341
Credit: 16,497,933
RAC: 6,477
Message 49572 - Posted: 16 Jul 2014, 14:53:44 UTC - in response to Message 49569.  

55C isn't likely to cause problems.

©2024 climateprediction.net