Orphaned Work Units...

Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 55501 - Posted: 17 Jan 2017, 17:05:47 UTC

I have 3 work units which CPDN has as "in process" status but which are no longer visible in my BOINC CPDN task list. These have apparently resulted from processing failures on computer ID 1266353. (However, they are not listed as CPDN processing errors.) All three of these failed after sending three trickles, sometime after Dec 17, 2016.

Wondering if there is any way to determine how and why this kind of processing error occurs -- where the failure does not result in a CPDN error that would allow the work units to be resent. This seems to be a BOINC processing problem, possibly caused by a CPU failure, power failure or computer restart. With the current status, these units will remain "In Process" until Nov 22, 2017 (one year after I received them) -- but obviously they can no longer be processed in the meantime.

The three work units in this status are:

hadam3p_eu_lxif_201611_3_482_010809922_1
hadam3p_eu_lygr_201611_3_482_010811158_0
hadam3p_eu_lum4_201611_3_481_010802543_1


I realize there have been discussions about these and Les has suggested we just don't worry about them...but if others are seeing this, perhaps we can somehow find the cause....

Thoughts anyone?
Art Masson
ID: 55501
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 55504 - Posted: 17 Jan 2017, 18:48:50 UTC

I sent another email a day or so back, and it's being discussed.
ID: 55504
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 24,488,575
RAC: 2,962
Message 55510 - Posted: 17 Jan 2017, 22:57:18 UTC - in response to Message 55501.  

I have 3 work units which CPDN has as "in process" status but which are no longer visible in my BOINC CPDN task list. These have apparently resulted from processing failures on computer ID 1266353.
I realize there have been discussions about these and Les has suggested we just don't worry about them...but if others are seeing this, perhaps we can somehow find the cause....

Thoughts anyone?
Art Masson


Hi Art,
Did you see any errors in the log file? Are you sure it was a CPU failure? The ones reported in the other thread are mostly hadam3p_eu like yours (3-month models), but we haven't noticed any (obvious) errors with them, and these have been reported by Les.
ID: 55510
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 55532 - Posted: 20 Jan 2017, 22:21:21 UTC - in response to Message 55510.  

No..no error messages at all...they just "disappeared" after some failure on my machine...with no update/error status to let CPDN know...
ID: 55532
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 24,488,575
RAC: 2,962
Message 55538 - Posted: 21 Jan 2017, 9:03:52 UTC - in response to Message 55532.  

No..no error messages at all...they just "disappeared" after some failure on my machine...with no update/error status to let CPDN know...


As far as I understand it, yours did not finish successfully but failed. You could try detaching and reattaching CPDN, then check the orphaned WUs (I just noticed that reattaching to CPDN with https://www.cpdn.org did not work for me, but http://climateprediction.net did).
ID: 55538
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 55545 - Posted: 21 Jan 2017, 22:40:46 UTC - in response to Message 55538.  

I think I'll wait to hear back from Les -- since he sent another note to the CPDN folks and perhaps they will have another suggestion. Would be great just to have a way on the web site to advise CPDN to reissue a work unit that has failed this way....
ID: 55545
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 55547 - Posted: 22 Jan 2017, 1:51:13 UTC

It would be best if not too many reposts of the new url are made. Too many hackers about.

Using an Account manager was/is intended to make it easier to join multiple projects.
If it doesn't work with BAM, then it's not doing its job.

*********

Manually, I'd suggest logging into the new address first.
On the old account page, replace the url up to and including UK with the new part, and then try.
It should ask you to log in.
The only things that should stop you are not having cookies enabled and perhaps some anti-malware settings.
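
For anyone unsure what that replacement looks like, here is a minimal sketch in Python. The old address below is made up for illustration (the real one is not quoted in this thread), the helper name swap_url_prefix is hypothetical, and the new prefix is the one given later in this thread. It only builds the new address as a string; it does not contact the site.

```python
def swap_url_prefix(old_url, new_prefix, marker=".uk"):
    """Replace everything up to and including the first `marker` with `new_prefix`."""
    head, sep, tail = old_url.partition(marker)
    if not sep:
        # The marker is matched case-sensitively; fail loudly rather than guess.
        raise ValueError(f"{marker!r} not found in {old_url!r}")
    return new_prefix.rstrip("/") + tail


# Hypothetical old account-page address, for illustration only.
old = "http://old-host.example.ac.uk/cpdnboinc/home.php"
print(swap_url_prefix(old, "https://www.cpdn.org"))
```

The part of the address after the marker (the page path) is kept unchanged, so the same helper should work for any page on the old site, assuming the paths are the same on the new one.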
ID: 55547
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 24,488,575
RAC: 2,962
Message 55548 - Posted: 22 Jan 2017, 9:47:22 UTC - in response to Message 55547.  
Last modified: 22 Jan 2017, 9:47:50 UTC

If it doesn't work with BAM, then it's not doing its job.


I tried on all 3 machines, not via BAM! (I deliberately switched it off) but via the Add project function of BOINC, and it did not work with SSL. I did not try to attach via BAM! After reattaching, I switched back to using BAM!

I guess the manual instructions are for the site only, not for attaching via BOINC, and I'm properly logged in.
ID: 55548
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 55549 - Posted: 22 Jan 2017, 9:56:30 UTC - in response to Message 55548.  

I think that the BOINC sign up may still have the old url.
I'll ask.
ID: 55549
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 55553 - Posted: 22 Jan 2017, 15:51:37 UTC - in response to Message 55547.  

Sorry if I'm being dumb...what is "BAM" ?
ID: 55553
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1079
Credit: 6,904,878
RAC: 6,593
Message 55554 - Posted: 22 Jan 2017, 16:22:42 UTC - in response to Message 55553.  

Sorry if I'm being dumb...what is "BAM" ?

It's an "account manager", which consolidates various interactions with a set of BOINC (and possibly other) projects. The Web site is here.
ID: 55554
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 55567 - Posted: 23 Jan 2017, 23:10:43 UTC

OK, the new secure site is undergoing "stress testing". The BOINC sign up for it won't become active until enough people are running on the new site without problems.
But probably few have found out about it yet. Certainly not the set-and-forget people.

In the meantime, if anyone wants to Disconnect / re-attach to get rid of uncompleted tasks, and they also want to be on the new secure site, it's DIY time.

Releasing old uncompleted tasks will not necessarily mean they'll get run by someone else, or be useful if they are. As I've said before, if a researcher doesn't get all of their data back in a reasonable time, there's nothing to stop them from putting out another batch with the missing bits in it, and then ignoring any of the original tasks if/when they eventually show up.
ID: 55567
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 24,488,575
RAC: 2,962
Message 55569 - Posted: 24 Jan 2017, 8:15:34 UTC - in response to Message 55567.  

Thanks Les,
DIY reattaching for the moment will still be under the old URL, won't it?

It might be useful, if possible, to use BOINC Notices for very important messages to users (e.g. when the stress tests are over, a detach-and-reattach message could make more people switch to SSL and clear up the In Progress queue a bit).
ID: 55569
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 55570 - Posted: 24 Jan 2017, 9:02:18 UTC - in response to Message 55569.  

DIY reattaching for the moment will still be under the old URL, won't it?

Yes. Then you'll need to change the url to the new one again (which is the DIY part).
I don't use detach/reattach, so I'm guessing with a lot of this.

The "testing" is waiting for people to break it. Or complain about something.
Which means that people need to be using the new url.

So far, the only problem has been with this detach business.
ID: 55570
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,378,503
RAC: 3,632
Message 55572 - Posted: 24 Jan 2017, 9:45:30 UTC
Last modified: 24 Jan 2017, 9:47:53 UTC

I did wonder about editing account_climateprediction.net.xml

<account><master_url>http://climateprediction.net/</master_url><authenticator>blankedoutforobviousreasons</authenticator><project_name>climateprediction.net</project_name>

But I thought it probably best to try it out first on a machine which is out of work. Though I could try it on my resurrected ageing netbook, which only has one six-month-old task running. Not sure if there are other places where it might need changing as well, though.

Edit: Looking further down that file, I see there are a lot of places where http would need replacing with https.
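
A cautious first step might be a dry run that only lists the plain http URLs in the file, without changing anything. The sketch below is in Python and assumes the file content is passed in as text; the sample string is a hypothetical excerpt modelled on the snippet above (with a closing tag added so it stands alone), and find_plain_http_urls is an illustrative name, not part of BOINC. Whether editing the file by hand is actually safe is not something this thread establishes.

```python
import re

# Hypothetical excerpt modelled on the snippet in this post; a real file has more elements.
SAMPLE = (
    "<account>"
    "<master_url>http://climateprediction.net/</master_url>"
    "<authenticator>REDACTED</authenticator>"
    "<project_name>climateprediction.net</project_name>"
    "</account>"
)


def find_plain_http_urls(text):
    """Return every plain http:// URL in the text, in order of appearance."""
    # "http://" never matches inside "https://", so secure URLs are skipped.
    return re.findall(r"http://[^\s<>]+", text)


for url in find_plain_http_urls(SAMPLE):
    print(url)
```

To check a real file, read it with pathlib.Path(...).read_text() and pass the result in. The sketch never writes anything back, so it cannot damage the account file.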
ID: 55572
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 19
Message 55573 - Posted: 24 Jan 2017, 15:39:57 UTC - in response to Message 55570.  

Hmmm...I think I'll hold off detaching/reattaching until the new secure site is fully up/running. I'm not into DIY much and wouldn't want to lose the processing on CPDN WUs in process. Does this make sense, Les?
Art
ID: 55573
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 55575 - Posted: 24 Jan 2017, 20:00:36 UTC - in response to Message 55573.  

I don't really know, as I don't need to do that.
ID: 55575
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 55576 - Posted: 24 Jan 2017, 20:01:41 UTC

Has anyone been having problems with the new site/url?
ID: 55576
Alex Plantema

Joined: 3 Sep 04
Posts: 126
Credit: 26,363,193
RAC: 0
Message 55577 - Posted: 24 Jan 2017, 23:16:44 UTC
Last modified: 24 Jan 2017, 23:17:08 UTC

These tasks were reported on 21 January but are still marked as in progress on the website:
https://www.cpdn.org/cpdnboinc/result.php?resultid=20118807 (wah2_nawa25_a27i_209912_13_491_010823332_0)
https://www.cpdn.org/cpdnboinc/result.php?resultid=20117448 (wah2_nawa25_a15r_209512_13_491_010821973_0)
https://www.cpdn.org/cpdnboinc/result.php?resultid=20118957 (wah2_nawa25_a1hs_209612_13_491_010822406_1)

What's the new url? http://www.climateprediction.net/getting-started/ and BOINC still show the old url.
ID: 55577
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 55578 - Posted: 25 Jan 2017, 0:42:13 UTC

What's the new url?


The first part is https://www.cpdn.org

See my post way down near the start of this thread.
ID: 55578