Reporting misconfigured computers

Message boards : climateprediction.net Science : Reporting misconfigured computers

Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 10,773,446
RAC: 2,347
Message 47149 - Posted: 21 Sep 2013, 19:27:45 UTC

If you come across computers that crash large numbers of models you may wish to let us know so that Jonathan can 'minus' the computer (temporarily prevent it from receiving new models) and send the owner an email asking them to post on the forum and ask for help to fix the problem.

Please bear in mind that:

* download problems are not usually the fault of the computer

* models whose stderr+ messages include 6 instances of a problem identified in capital letters are usually defective models. This isn't the fault of the computer. Examples are: INITTIME, INVALID THETA, NEGATIVE PRESSURE.

* the computer must have crashed a lot of models and not be running models successfully at the moment

* the crashes must have been happening for quite a long time, definitely more than a month

* it goes without saying that the owner will not have posted about how to solve the problem!
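The checklist above can be read as a simple filter. Here is a minimal Python sketch, assuming a hypothetical `HostSummary` record (not a real CPDN or BOINC structure) and made-up thresholds for "a lot" and "more than a month". It only illustrates the reasoning and is not how the project actually decides.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class HostSummary:
    """Hypothetical summary of one host's task history."""
    crashed: int                   # failures caused by the computer itself
                                   # (download errors and defective models excluded)
    first_crash: date              # start of the current run of crashes
    last_success: Optional[date]   # most recent completed model, if any
    owner_posted: bool             # owner has already asked for help on the forum


def worth_reporting(host: HostSummary, today: date,
                    min_crashes: int = 50, min_days: int = 31) -> bool:
    """Apply the checklist. Both thresholds are assumptions, not project policy."""
    if host.owner_posted:
        return False
    if host.crashed < min_crashes:
        return False
    # The crashes must have been going on for more than a month.
    if today - host.first_crash <= timedelta(days=min_days):
        return False
    # The host must not be completing models at the moment; approximated
    # here as "no success since the crashes started".
    if host.last_success is not None and host.last_success >= host.first_crash:
        return False
    return True
```

A host with hundreds of crashes, no completions since the crashes began, and no forum post from the owner would pass this filter. A host that only started failing a couple of weeks ago would not.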
____________
Cpdn news

Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 10,773,446
RAC: 2,347
Message 47150 - Posted: 21 Sep 2013, 19:37:32 UTC
Last modified: 21 Sep 2013, 19:37:53 UTC

Lockleys have (has?) reported the following computers:

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=985494
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=1286362

Thank you, I'll let Jonathan know.
____________
Cpdn news

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 47186 - Posted: 27 Sep 2013, 18:26:47 UTC

One here. For some reason I cannot open the stderr messages for most of the recent tasks.

Profile Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 47192 - Posted: 28 Sep 2013, 7:27:33 UTC
Last modified: 28 Sep 2013, 7:28:22 UTC

... aaaaand another.

64-bit Linux, missing the 32-bit libstdc++.so.6. 310 crashed; 8 completed back in 2011.
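For anyone wanting to check their own 64-bit Linux machine for this problem, here is a minimal Python sketch. It looks for a 32-bit copy of libstdc++.so.6 by reading the class byte of the ELF header (1 means 32-bit, 2 means 64-bit). The directory list is an assumption; distributions put 32-bit libraries in different places, so adjust it for your system.

```python
import os

# Assumed search locations; these vary between distributions.
CANDIDATE_DIRS = ["/usr/lib32", "/usr/lib/i386-linux-gnu", "/usr/lib", "/lib"]


def elf_bits(path):
    """Return 32 or 64 for an ELF file, or None if the file is not ELF."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 of the ELF header is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    return {1: 32, 2: 64}.get(header[4])


def find_32bit_lib(name="libstdc++.so.6", dirs=CANDIDATE_DIRS):
    """Return the first path holding a 32-bit copy of `name`, else None."""
    for d in dirs:
        path = os.path.join(d, name)
        if os.path.isfile(path) and elf_bits(path) == 32:
            return path
    return None


if __name__ == "__main__":
    found = find_32bit_lib()
    print(found or "no 32-bit libstdc++.so.6 found in the assumed directories")
```

If nothing is found, the usual fix is to install your distribution's 32-bit compatibility package for libstdc++. The package name differs between distributions.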

Ingleside
Joined: 5 Aug 04
Posts: 95
Credit: 9,596,251
RAC: 0
Message 47193 - Posted: 28 Sep 2013, 12:31:37 UTC

Uhm, why is this thread hidden-away under Science?

Profile JIM
Joined: 31 Dec 07
Posts: 982
Credit: 14,320,108
RAC: 19,627
Message 47195 - Posted: 28 Sep 2013, 19:49:15 UTC - in response to Message 47193.

Profile ritterm
Joined: 29 May 08
Posts: 124
Credit: 5,461,846
RAC: 11,439
Message 47197 - Posted: 29 Sep 2013, 14:30:47 UTC - in response to Message 47195.
Last modified: 29 Sep 2013, 14:31:05 UTC

Ingleside
Joined: 5 Aug 04
Posts: 95
Credit: 9,596,251
RAC: 0
Message 47203 - Posted: 30 Sep 2013, 0:00:54 UTC
Last modified: 30 Sep 2013, 0:01:11 UTC

Broken Linux host 1140495: missing library since an upgrade/downgrade/whatever in either December 2012 or January 2013. (One model that was trickling until 21.12.2012 was reported as failing with a missing library in January.)

old_user101069
Joined: 4 Oct 05
Posts: 12
Credit: 610,967
RAC: 0
Message 47210 - Posted: 30 Sep 2013, 17:44:11 UTC

(x-post from the thread in number crunching)

My computer, 1281635, fails the vast majority of its tasks. The failure differs wildly from unit to unit (the most recent ones are the disk error, but I've received a number of different non-model errors), and I have not been able to identify a pattern. I've always assumed that the work completed was still valuable, so I let it keep chugging away. Way back when I was a student, I used to back up my tasks and restore them after an error, but I just don't have time to manage that anymore. If someone wants to take a look and tell me if I should just abandon the project, I'd be interested to hear it. (This only affects this project. My PC is otherwise stable and goes for weeks without a reboot.)


Thanks!

Profile Dave Jackson
Joined: 15 May 09
Posts: 1790
Credit: 2,671,578
RAC: 898
Message 47217 - Posted: 30 Sep 2013, 20:46:02 UTC

uioped1

The bottom task on the page shows invalid theta, which means an impossible climate has been produced. That is not a computer problem. The others all show the "unable to locate track on disk" error. The one that completed is a regional model as opposed to a full resolution ocean one. The full resolution ocean models are much more demanding and more likely to crash. The first thing I would look at is ensuring that the BOINC data directory is excluded from any virus checking.
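The rule of thumb from the first post (six instances of a problem named in capital letters usually means a defective model, not a faulty computer) can be sketched in Python. The marker list holds only the three examples given in that post and is not complete. The threshold of six comes from the same post, and the function names are made up for illustration.

```python
# Markers named in the first post of this thread; the real list is longer.
KNOWN_MARKERS = ("INITTIME", "INVALID THETA", "NEGATIVE PRESSURE")


def count_markers(stderr_text, markers=KNOWN_MARKERS):
    """Count case-sensitive occurrences of each marker in the stderr text."""
    return {marker: stderr_text.count(marker) for marker in markers}


def looks_like_defective_model(stderr_text, threshold=6):
    """True if any known marker appears at least `threshold` times."""
    counts = count_markers(stderr_text)
    return any(n >= threshold for n in counts.values())
```

A task flagged this way would not count against the computer. Errors such as the disk one mentioned above contain none of these markers, so they would still need to be looked at by hand.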

old_user671679
Joined: 30 Jan 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 47219 - Posted: 30 Sep 2013, 22:03:00 UTC

This computer had its last completion in September 2010.

Eirik Redd
Joined: 31 Aug 04
Posts: 343
Credit: 73,848,322
RAC: 147,536
Message 47331 - Posted: 17 Oct 2013, 2:31:06 UTC

Another Linux box missing some library and downloading and failing 400+ models so far.

Profile astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1459
Credit: 76,183,576
RAC: 71,704
Message 47337 - Posted: 17 Oct 2013, 18:36:44 UTC

Thanks, Eirik. Andy notified via email.

____________
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 10,773,446
RAC: 2,347
Message 47338 - Posted: 18 Oct 2013, 1:07:18 UTC
Last modified: 18 Oct 2013, 1:08:35 UTC

Uioped

The Hadcm model that failed with 6,220 credits will have crashed at 50% when it was creating the second decadal file. This is a known weakness of this model type (though it hardly ever happens to me).

As your models mostly crashed with error code 25, please make sure you suspend crunching using the BOINC Manager activity menu and then exit completely from BOINC (File > Exit) before shutting down the computer. This doesn't seem to be very important for tasks from other projects but climate models ARE NOT HAPPY if the computer is shut down while the models and BOINC are running and sometimes they will crash - but not always.
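For those who prefer a script to the Manager menus, the same two steps (suspend, then exit) can be sketched in Python around BOINC's `boinccmd` command-line tool. This is only a sketch. It assumes `boinccmd` is on the PATH and is allowed to talk to the local client (it may need to be run from the BOINC data directory or given a password), and the function names are made up.

```python
import subprocess


def shutdown_commands(boinccmd="boinccmd"):
    """Build the commands that suspend computation and then stop the client."""
    return [
        [boinccmd, "--set_run_mode", "never"],  # suspend all computation
        [boinccmd, "--quit"],                   # ask the client to exit
    ]


def stop_boinc(run=subprocess.run):
    """Run each command in order; an exception stops at the first failure."""
    for cmd in shutdown_commands():
        run(cmd, check=True)


if __name__ == "__main__":
    stop_boinc()
```

Note that a run mode set without a duration stays in force, so after restarting you would need `boinccmd --set_run_mode auto` (or the Manager's activity menu) to resume crunching.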


This thread is in Science because the Misconfigured Computers thread is also in this section. Milo, who was then our programmer and sysadmin, placed it here about three years ago. He also put a link to the Misconfigured Computers thread in the email that's sent to the owners of all the model-crashing computers, so if we move the two threads to Number Crunching (where they should of course really be) we'd have to ask for that link in the email to be changed.
____________
Cpdn news

MyLittleBoinc
Joined: 31 Mar 13
Posts: 44
Credit: 6,950,896
RAC: 14
Message 47353 - Posted: 19 Oct 2013, 15:03:09 UTC - in response to Message 47149.

This one has about 200 Errors.
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1208986

Alex Plantema
Joined: 3 Sep 04
Posts: 97
Credit: 14,754,828
RAC: 43,978
Message 47358 - Posted: 20 Oct 2013, 12:57:12 UTC

This computer processed 861 tasks without any valid result.

Profile astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1459
Credit: 76,183,576
RAC: 71,704
Message 47361 - Posted: 20 Oct 2013, 20:07:50 UTC
Last modified: 20 Oct 2013, 20:44:58 UTC

Thanks Alex, and MyLittleBoinc. Andy notified.

Good to have willing eyes and hands to help. (It's a pity, though, so few of those with misconfigured machines respond to Andy's emails. At least, Andy sees to it that the offending machines sin-no-more.)

[Edited for typo.]
____________
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Profile MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 13,504,741
RAC: 13,691
Message 47390 - Posted: 22 Oct 2013, 11:58:52 UTC


Here's one - a Linux box instantly crashing models.

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1235466


____________
I'm a volunteer and my views are my own.
News and Announcements and FAQ

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 47394 - Posted: 23 Oct 2013, 0:55:13 UTC

A newer machine here with random crash times. For some reason I can't open the stderr messages.

Profile Dave Jackson
Joined: 15 May 09
Posts: 1790
Credit: 2,671,578
RAC: 898
Message 47594 - Posted: 18 Nov 2013, 6:45:03 UTC

Another Linux box missing 32-bit libs. Computer 1185724


Copyright © 2017 climateprediction.net