New error message

Kevin

Joined: 5 Jul 09
Posts: 63
Credit: 6,091,274
RAC: 0
Message 53486 - Posted: 20 Feb 2016, 20:31:49 UTC

Just out of curiosity, what is the definition around here of a hoarder?


Kevin

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 53487 - Posted: 20 Feb 2016, 20:42:01 UTC - in response to Message 53486.  

The scientists would have to define it, but I would say something along the lines of someone who holds more work units in their buffer than they can finish in the time it takes another user to do them. In other words, hoarding slows down the science rather than expediting it.
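
As a rough illustration of that rule of thumb (not a project definition), a hoarding check could compare how long a host needs to clear its buffer with how long a typical user takes to return a task. The function name, its parameters and the numbers in the usage lines are all made up for this sketch.

```python
def is_hoarding(buffered_tasks: int, hours_per_task: float,
                tasks_in_parallel: int, typical_turnaround_hours: float) -> bool:
    """Return True if the host cannot clear its buffer within the time a
    typical user needs to return a task (all inputs are assumptions)."""
    if tasks_in_parallel < 1:
        raise ValueError("tasks_in_parallel must be at least 1")
    # Ceiling division: number of sequential rounds needed to run every task.
    rounds = -(-buffered_tasks // tasks_in_parallel)
    hours_to_clear = rounds * hours_per_task
    return hours_to_clear > typical_turnaround_hours


# Hypothetical host: 40 buffered models, 200 hours each, 4 running at once,
# measured against a 30-day (720-hour) typical turnaround.
big_buffer = is_hoarding(40, 200.0, 4, 720.0)
small_buffer = is_hoarding(4, 200.0, 4, 720.0)
```

With these invented figures the first host needs 10 rounds of 200 hours, so it counts as hoarding, while the second clears its buffer in a single round.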

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 53488 - Posted: 20 Feb 2016, 22:26:33 UTC - in response to Message 53484.  

I recently suggested shorter times, but didn't get anywhere, which is why I think what I said in my post may be "how things are".

Given that we have a "distributed input" from all over the planet, it may be simpler than trying to get the work to reliable computers.
Although it should be possible to write scripts to look for hoarders and serial crashers, implementing a way for fast, reliable machines to get more of the work load may be tricky.

Yes, all you need for a start is just to shorten the deadlines. The reliable computers are used when they need a quorum of two results from different users to agree (i.e., be validated against each other). The reliable machine is just a way to identify a third party who usually gets the results back in time to meet the deadline (usually seven days in their case), in case one of the first users does not return the result in a timely manner. But given the long deadlines here, and that you don't use the quorum system anyway, it is quite unnecessary. I just mention it to point out some of the techniques that some of the other projects use.

Alex Plantema
Joined: 3 Sep 04
Posts: 126
Credit: 26,363,193
RAC: 0
Message 53489 - Posted: 21 Feb 2016, 0:20:41 UTC

It might help giving a bonus for a completed result. Now it doesn't matter for the credits if one finishes tasks or not.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53490 - Posted: 21 Feb 2016, 0:31:07 UTC - in response to Message 53489.  

It might help giving a bonus for a completed result.

This has been brought up a few times over the years.
And if you stand well back, and look at the matter sideways, (sort of thing), then there IS a bonus for finishing quickly - you get to run more models in a given time, thus increasing your monopoly money faster.

Unlike most, if not all, other projects, cpdn is all statistics: throw a lot of data sets at the problem, wait to see how they fare, and then throw more data sets at it if necessary.


Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53491 - Posted: 21 Feb 2016, 0:36:06 UTC - in response to Message 53486.  

Yes, Jim's pretty right with that.
I recently came across one model that had been sent in early February last year, and was only run and completed in January this year. And looking at the dates/times of trickle returns it was run fast.

And all of my recent posts in this thread are just my thoughts on the matter.
I could be way out.



Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53492 - Posted: 21 Feb 2016, 0:46:53 UTC - in response to Message 53485.  

47an

That's the well-known Intel FORTRAN error. There are a number of threads about it on these boards. One is in the Windows section: Visual Fortran Runtime error



astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 53516 - Posted: 24 Feb 2016, 21:07:52 UTC - in response to Message 53488.  

Hi, Jim,

Yes, all you need for a start is just to shorten the deadlines.

We went through that battle early in the project and a few times since. The problem with shorter deadlines, given relatively long CPDN tasks (though they are MUCH shorter now), is that CPDN gets to "High Priority" for CPU time too quickly with short deadlines. People running multiple projects objected -- strenuously. The 'solution' was longer deadlines for CPDN -- good for other projects, not so good for CPDN.

Jim
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Harri Liljeroos
Joined: 9 Dec 05
Posts: 111
Credit: 12,038,780
RAC: 1,393
Message 53519 - Posted: 25 Feb 2016, 11:04:25 UTC

If I have understood correctly, the current versions of BOINC want to work through all (or almost all) of the work from one project before switching projects. That is at least what I am seeing for CPDN. If I have CPDN work on my host, BOINC will crunch it in a single go until there are only one or two hours of calculation left on that WU before switching to other projects. Projects are usually not switched every 60 minutes like the preferences are set to do.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53520 - Posted: 25 Feb 2016, 21:38:46 UTC - in response to Message 53519.  

The new versions initially only run a single task from a newly added project.
And I think that it also only runs one project at a time.

This is so that BOINC can get an idea of how long these new-to-it tasks will take to run.
With the climate models taking so long to complete, this learning can take a long time, especially if the computer is not on all the time, or is heavily used by the owner.

Projects are usually not switched every 60 minutes like the preferences are set to do.

The "switch time" is only a timer that tells BOINC the earliest moment that it can start to consider switching. When it gets to this time, other factors come into play which can delay the switching for longer, so, yes, the projects don't get switched at exactly that interval (e.g. every 60 minutes).
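
To put that in toy form: the timer only decides whether a switch may be considered at all, and something else decides whether it actually happens. This is a deliberately simplified sketch, not BOINC's real scheduler; the function and the `other_factors_allow` flag are invented stand-ins for whatever the client actually weighs.

```python
def should_switch(minutes_since_switch: float,
                  switch_interval_minutes: float,
                  other_factors_allow: bool) -> bool:
    """Toy model: the switch interval is a lower bound, not a schedule."""
    if minutes_since_switch < switch_interval_minutes:
        # Too early: a switch is not even considered yet.
        return False
    # Past the interval: the decision now rests on the other factors
    # (a hypothetical flag here, standing in for the client's own logic).
    return other_factors_allow


too_early = should_switch(30, 60, True)
allowed = should_switch(60, 60, True)
delayed = should_switch(90, 60, False)
```

The third call is the case described above: 90 minutes have passed with a 60-minute setting, yet no switch takes place.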


astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 53710 - Posted: 20 Mar 2016, 20:24:22 UTC

I'm stealing your Thread for the moment, Les, because the Board wouldn't allow me to create a new one (Missing parameter error -- though nothing was missing.)

Re.: wah2_sas50_
This error sequence occurred on two machines (I didn't give a third box the satisfaction.)

Staff notified.

3/20/2016 12:30:30 PM|climateprediction.net|Sending scheduler request: To fetch work. Requesting 2322726 seconds of work, reporting 0 completed tasks
3/20/2016 12:30:35 PM|climateprediction.net|[error] Can't parse file info in scheduler reply: file name is empty or has '..'


(Ditto -- for another 13 lines)

3/20/2016 12:30:35 PM|climateprediction.net|Scheduler request succeeded: got 5 new tasks
3/20/2016 12:30:35 PM|climateprediction.net|[error] State file error: missing file
3/20/2016 12:30:35 PM|climateprediction.net|[error] Can't handle task wah2_sas50_fpqd_201412_13_367_010380127_1 in scheduler reply
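
For anyone wanting to sift a long message log for lines like these, here is a minimal sketch. It assumes only the `timestamp|project|message` layout visible in the lines above; the helper names are made up, and the sample holds five of the quoted lines.

```python
import re
from typing import NamedTuple, Optional


class LogLine(NamedTuple):
    timestamp: str
    project: str
    message: str


def parse_line(line: str) -> Optional[LogLine]:
    """Split one 'timestamp|project|message' line; None if it does not fit."""
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    return LogLine(*parts)


def requested_seconds(message: str) -> Optional[int]:
    """Pull the number out of 'Requesting N seconds of work', if present."""
    match = re.search(r"Requesting (\d+) seconds of work", message)
    return int(match.group(1)) if match else None


SAMPLE = [
    "3/20/2016 12:30:30 PM|climateprediction.net|Sending scheduler request: To fetch work. Requesting 2322726 seconds of work, reporting 0 completed tasks",
    "3/20/2016 12:30:35 PM|climateprediction.net|[error] Can't parse file info in scheduler reply: file name is empty or has '..'",
    "3/20/2016 12:30:35 PM|climateprediction.net|Scheduler request succeeded: got 5 new tasks",
    "3/20/2016 12:30:35 PM|climateprediction.net|[error] State file error: missing file",
    "3/20/2016 12:30:35 PM|climateprediction.net|[error] Can't handle task wah2_sas50_fpqd_201412_13_367_010380127_1 in scheduler reply",
]

entries = [e for e in map(parse_line, SAMPLE) if e is not None]
errors = [e.message for e in entries if e.message.startswith("[error]")]
seconds = requested_seconds(entries[0].message)
days = seconds / 86400  # seconds in a day
```

On this sample, `errors` collects the three `[error]` messages, and the 2322726 seconds requested works out to a little under 27 days of work.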


"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

ritterm
Joined: 29 May 08
Posts: 128
Credit: 6,289,876
RAC: 0
Message 53711 - Posted: 20 Mar 2016, 22:22:45 UTC - in response to Message 53710.  

astroWX wrote:

Re.: wah2_sas50_
This error sequence occurred on two machines (I didn't give a third box the satisfaction.)...

I see that I've returned two of these with similar stderr output. Any advice on what to do with any in our queues ready to start?

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 53713 - Posted: 20 Mar 2016, 23:30:51 UTC - in response to Message 53711.  

In my case, nothing can be done because they don't exist on my machines -- though my server account shows them alive and well -- and assigned to me.

However, a rather serious indication that they are not confused with anything else on the machine -- one box had no main-site work on it and still has no main-site work on it, despite server evidence that that machine owns one. (The other box supposedly caught five new tasks - only four are listed on the machine [three EU and one AFR]...)

If you have (non-returned) live ones on your machine, my suggestion is to hold them until we hear from staff, probably Monday.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53714 - Posted: 21 Mar 2016, 0:17:04 UTC

There's been an email back about it - looks like a template error.
So, dump any of these you have and try again.
If trouble persists, post back here.


JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,100,600
RAC: 2,970
Message 53716 - Posted: 21 Mar 2016, 3:56:32 UTC - in response to Message 53714.  

There's been an email back about it - looks like a template error.
So, dump any of these you have and try again.
If trouble persists, post back here.



What batches are the bad WUs from, so they can be dumped?


Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53718 - Posted: 21 Mar 2016, 4:30:31 UTC - in response to Message 53717.  

Looking at the last line in Astro's post, they're sas50, batch 367.
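
If you have a lot of tasks to check, the same reading of the name can be scripted. This sketch assumes the task name always has eight underscore-separated fields, with the region code second and the batch number sixth, as in the one name quoted in this thread; that layout is an inference from a single example, and the function name is made up.

```python
def parse_task_name(name: str) -> dict:
    """Extract region and batch from a name like
    'wah2_sas50_fpqd_201412_13_367_010380127_1' (field layout assumed)."""
    fields = name.split("_")
    if len(fields) != 8:
        raise ValueError(f"unexpected task name layout: {name!r}")
    return {
        "region": fields[1],      # e.g. 'sas50'
        "batch": int(fields[5]),  # e.g. 367
    }


info = parse_task_name("wah2_sas50_fpqd_201412_13_367_010380127_1")
is_bad_batch = info["region"] == "sas50" and info["batch"] == 367
```

Names with a different number of fields raise an error rather than being guessed at.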


JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,100,600
RAC: 2,970
Message 53725 - Posted: 21 Mar 2016, 15:05:59 UTC

Thanks. Fortunately I only had 1 from that batch. It has been aborted.


nairb
Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 53780 - Posted: 24 Mar 2016, 2:52:55 UTC
Last modified: 24 Mar 2016, 2:56:12 UTC

test post... unable to create new thread ... But I can post to this thread??

I get
Unable to handle request
missing or bad parameter: id; supplied:

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53781 - Posted: 24 Mar 2016, 5:48:43 UTC - in response to Message 53780.  

Yes, that's a known problem. It's been on the back burner while the credits problem was/is being worked on.
