climateprediction.net
HadAM3P-PNW disappeared?

Message boards : Number crunching : HadAM3P-PNW disappeared?

Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 42008 - Posted: 22 Apr 2011, 20:50:21 UTC

As of yesterday the "Server Status" page has been showing 0 HadAM3P-PNW tasks. The day before, there were 50,000-odd.

Should there be an announcement?

BTW, I have just received 6 HadAM3P-PNWs, at least two of which are 'new': the first task for each of those work units was issued after the Server Status count changed to 0.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6231
Credit: 14,607,204
RAC: 543
Message 42009 - Posted: 22 Apr 2011, 21:39:34 UTC - in response to Message 42008.

No news at the moment.
It's the Easter long weekend, so most of Oxford would be closed, and I've been sleeping for the last few hours.
I've asked about it, but it may take a while for a reply.

The EU pool is down to 9 at the moment as well.



____________
Backups: Here

Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1266
Credit: 12,068,949
RAC: 1,197
Message 42010 - Posted: 22 Apr 2011, 22:15:50 UTC
Last modified: 22 Apr 2011, 22:39:15 UTC

It looks like there were download problems earlier today. hadam3p_pnw_yyam_2005_1_006899510_0 (from the same WU as one of Greg's tasks) reported the following error at 22 Apr 2011 20:14:20 UTC:

app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>hadam3p_pnw_graphics_6.09_i686-pc-linux-gnu</file_name>
<error_code>-224</error_code>
<error_message>file not found</error_message>
</file_xfer_error>

I've run a browser check on http://climateapps2.oucs.ox.ac.uk/cpdnboinc/download/mirror.php?file=/hadam3p_pnw_graphics_6.09_i686-pc-linux-gnu and it now seems to be available on the 3 mirror servers I'm aware of (http://uploader1.atm.ox.ac.uk, http://climateprediction.net and http://climateapps2.oucs.ox.ac.uk).
____________
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 42012 - Posted: 23 Apr 2011, 4:16:16 UTC
Last modified: 23 Apr 2011, 4:17:06 UTC

Oh, OK. The drop from 50,000-plus down to 0 was so sudden, I thought that someone had "pulled the plug" on the PNW project. But I'd expect that they would tell the moderators if so. :-) If you guys don't know anything about the project being cancelled, it must be just a glitch in the server status page.

Nigel Garvey
Joined: 5 May 10
Posts: 46
Credit: 762,215
RAC: 0
Message 42013 - Posted: 23 Apr 2011, 7:04:48 UTC

The PNW app has also disappeared from the Applications page.

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/apps.php

Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 42028 - Posted: 26 Apr 2011, 19:44:55 UTC
Last modified: 26 Apr 2011, 19:48:06 UTC

Now all my HadAM3P-PNWs have been marked "Didn't need". What's going on?

Edit: correction. Two of them are still "in progress", but the other four are "didn't need". Do I cancel the "didn't need"s?

I see HadCM3N is back, on the "server status" page. Did I miss the memo?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6231
Credit: 14,607,204
RAC: 543
Message 42029 - Posted: 26 Apr 2011, 21:32:01 UTC - in response to Message 42028.

Presumably, when the data sets were removed from the download data pool, the BOINC server software took that to mean that none of the unreturned results were needed, and wrote the Not needed message into everyone's model pages.

Note however, that Not needed isn't the same as not wanted by the researchers, who would still like to get their hands on as much data as possible, please.

So if the models are still running, and haven't been killed off by some downloaded signal from Oxford, you should continue to crunch them.

******************

There's been no memo, possibly because it was still the 'weekend' in the UK.
What's going on is anyone's guess.

There is, however, THIS memo about upgraded security measures on the alternative PHP board.



astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1426
Credit: 61,828,043
RAC: 11,478
Message 42033 - Posted: 27 Apr 2011, 1:57:54 UTC

I received this message after the #13 upload to Oxford. None of the earlier uploads (to U.Oregon) triggered a red message.

____________
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Jonathan Miller
Volunteer moderator
Project administrator
Project developer
Volunteer developer
Joined: 28 Mar 11
Posts: 35
Credit: 82,588
RAC: 0
Message 42044 - Posted: 27 Apr 2011, 12:40:43 UTC
Last modified: 27 Apr 2011, 13:15:00 UTC

DIDN'T NEED flag

Hi Everyone,

Some of the work units that get processed contain particular parameters that are of interest to the CPDN project. The BOINC system has a method for allowing us to gather more info on certain parameter sets by resubmitting a work unit to the pool of available work units.

The DIDN'T NEED flag means that the CPDN project did/do not need to resubmit the work unit for additional processing.

The flag can mean a number of things, and is combined with other flags in the database to determine exactly why we don't need to reprocess it. One of the common reasons is that the current run gives us exactly the info we need.

It is unfortunate that the flag gives the impression that we are not interested in the work unit - we certainly are interested.
We are looking into how we can make this more clear on the work unit info pages.

Please accept our apologies for any confusion or consternation this may have caused.

EDIT: We have now altered this flag to read "No Resubmission" which is a more accurate reflection of the status of the work unit.

Jonathan
CPDN SysAdmin

Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 42056 - Posted: 28 Apr 2011, 23:44:20 UTC

Re: Didn't need / No resubmission:

Fri 29 Apr 2011 10:02:11 NZST climateprediction.net Started upload of hadam3p_pnw_zjca_1969_1_006969986_2_13.zip
Fri 29 Apr 2011 10:02:13 NZST climateprediction.net Computation for task hadam3p_pnw_zjca_1969_1_006969986_2 finished
Fri 29 Apr 2011 10:10:59 NZST climateprediction.net Finished upload of hadam3p_pnw_zjca_1969_1_006969986_2_13.zip
Fri 29 Apr 2011 11:19:59 NZST climateprediction.net Sending scheduler request: To send trickle-up message.
Fri 29 Apr 2011 11:19:59 NZST climateprediction.net Reporting 1 completed tasks, not requesting new tasks
Fri 29 Apr 2011 11:20:04 NZST climateprediction.net Scheduler request completed
Fri 29 Apr 2011 11:20:04 NZST climateprediction.net Message from server: Completed result hadam3p_pnw_zjca_1969_1_006969986_2 refused: this result wasn't sent (not needed)

That suggests that completed work is not getting through, whether or not the scientists want it. I'm still confused. I think I'd rather crunch something else.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6231
Credit: 14,607,204
RAC: 543
Message 42059 - Posted: 29 Apr 2011, 4:22:11 UTC - in response to Message 42056.

I've just received the same message.
It looks like the attempt to introduce a new label into the BOINC system hasn't worked. :(

I'll inform the project people.



DaveG27
Joined: 8 Nov 06
Posts: 18
Credit: 2,425,895
RAC: 0
Message 42062 - Posted: 29 Apr 2011, 17:05:23 UTC

I have had the same.
When I look at my tasks, instead of "Completed" they say "No Resubmission". I get the feeling I am completely wasting my computer time, and the same can be seen on other work units.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6231
Credit: 14,607,204
RAC: 543
Message 42063 - Posted: 29 Apr 2011, 17:30:23 UTC

Those people who feel that this new message means that they're wasting their time should stop crunching climate models, leave the project, and not come back!

For the rest of us, the data is stored on the servers, but it's a 4-day long weekend in the UK, so we have to wait until Tuesday morning UK time for the project people to return and kick the BOINC code until it behaves. :)



DaveG27
Joined: 8 Nov 06
Posts: 18
Credit: 2,425,895
RAC: 0
Message 42064 - Posted: 29 Apr 2011, 17:53:59 UTC

"Those people who feel that this new message means that they're wasting their time should stop crunching climate models, leave the project, and not come back!"

I have crunched this project since the BBC days, but if this is the new attitude I will take your advice, as the messages are no longer in plain English.
I have had very few failures, most of my models have been successful, and I have put up with some of the project's problems.

Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 42066 - Posted: 29 Apr 2011, 21:58:48 UTC - in response to Message 42064.

Dave, it's likely that this is just a 'learning the ropes' problem for the new project staff. And possibly Les was short of painkillers when he wrote that.

The HadCM3N models are working well. Just a couple of niggles: the initial duration estimate is about double the true figure (530 - 600 hours for your C2Qs), and they've a short deadline, which is really only indicative - the researchers will still use models that finish after the deadline. I'm crunching them, now.

DaveG27
Joined: 8 Nov 06
Posts: 18
Credit: 2,425,895
RAC: 0
Message 42067 - Posted: 29 Apr 2011, 22:19:12 UTC

Greg, I've calmed down now, but I was annoyed.
I had noticed that I had 3 completed models with downloads stuck in the Transfers tab (I do not check it often).
I quickly realised it was due to a PNW running under Linux with the handler problem. I edited the client_state.xml file, which solved the problem but took hours off the download. Anyway, when they were reported as completed I got that message, so I went to my account to see whether they were completed, and that left me confused.

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1426
Credit: 61,828,043
RAC: 11,478
Message 42068 - Posted: 30 Apr 2011, 2:17:00 UTC
Last modified: 30 Apr 2011, 2:27:17 UTC

PNW's first twelve uploads go to the science database at U.Oregon. No red messages from them, eh?

#13 upload, after task completion and full credits are awarded, is a restart dump sent to Oxford so the next segment of the sequence can start, supposedly where the last one ended. (Work was chopped into segments because people accustomed to tasks taking from minutes to a few hours elsewhere whined at length across the boards. Hence some of the current difficulties. [I don't envy the scientists working to understand segment differences run on different CPUs and OSs ...])

As I understand it, the new red message is a consequence of recent security changes to the boards to inhibit spam registrations. Not a secret, explanations were posted.

I've been with CPDN since Original Beta, July 2003, and though, early-on (pre-boinc), we had to do some manual uploads, I don't recall any work being lost. (Early 14-day boinc timeout is another thing.)

Despite a long history of under-staffing and a plethora of problems, many boinc-related, the Project has a good record of saving all our work.

Hang in with Oxford's new IT team through its learning curve or bail out of a wounded but still-flying bird. Your choice.

BigMike
Joined: 6 Apr 05
Posts: 17
Credit: 744,057
RAC: 0
Message 42116 - Posted: 5 May 2011, 4:36:33 UTC

Messages and server problems aside, I'm still wondering why there is no work for PNW, while there is for the other regionals.

Any explanation from the science group?

=Mike
____________

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1426
Credit: 61,828,043
RAC: 11,478
Message 42117 - Posted: 5 May 2011, 5:35:15 UTC

I have no new skinny but there was an issue with Linux tasks. Perhaps the new support team felt it safer to throttle PNW flavor (the one for my area of the planet) rather than tweak possibilities. I have all confidence that Andy and Jonathan will sort it all out in due time. Please hang in there!



MacRonin
Joined: 24 Apr 08
Posts: 6
Credit: 176,830
RAC: 0
Message 42125 - Posted: 5 May 2011, 20:54:56 UTC - in response to Message 42117.

Just verifying: it sounds like the msg I got is not quite accurate and is being updated for items downloaded in the future. But in the meantime, scary msg or not, the work is useful to you and should still be allowed to run if it's already going?

After 24-48 hours of being unable to upload this task (other items uploaded OK, trickles I think), I finally got:

[code]Thu May 5 16:21:52 2011 climateprediction.net Message from server: Completed result hadam3p_eu_wczh_1988_1_006821781_0 refused: this result wasn't sent (not needed)[/code]

And I should let my 3 other tasks go through to completion and not get nervous if I get the same (or a similar) error msg? The website doesn't show anything labeled "In progress", but does show 4 tasks labeled "No Resubmission".

BTW, referring back to a comment earlier in the thread. I personally have absolutely no problem with really long tasks, as long as they are labeled as such and have appropriate deadlines.

Now another project I worked with for a bit would give estimates of 5-7 hours and deadlines of a week and then give you tasks that literally ran for weeks with systems dedicated 100% to them and didn't take any checkpoints for most of that time. But you guys don't do that :-)






Copyright © 2016 climateprediction.net