Trickles not being reported for one model

Thunder
Joined: 1 Sep 04
Posts: 42
Credit: 6,475,117
RAC: 0
Message 34199 - Posted: 30 Jun 2008, 22:18:35 UTC

This model run:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7384116

has apparently not reported any trickles since 23 Apr 08, yet the client thinks it's sending trickles just fine.

As recently as ~10 minutes ago, it sent another:

climateprediction.net 6/30/2008 5:03:42 PM Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
climateprediction.net 6/30/2008 5:03:47 PM Scheduler request succeeded: got 0 new tasks

(The preceding is from BoincView, so it doesn't look precisely like the format from the BOINC client.)

In all other respects the client appears to be running fine. Other projects are humming along, model appears to be crunching, etc.

Any idea what's going on?

ID: 34199
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 34201 - Posted: 30 Jun 2008, 23:54:52 UTC
Last modified: 1 Jul 2008, 0:16:01 UTC

Scheduler request succeeded

That usually means that the server has the trickles, but on some occasions, when the server is very busy, it doesn't actually accept them.
In that case, they usually finish uploading on a later contact soon after. In the meantime, the trickle files still show on the user's computer, but greyed out and with a different icon.

Another possibility, which doesn't seem to apply, is that a new computer ID has been issued, which is usually caused by restoring from a backup. In this case, the trickles will be logged against the 'old' (original) ID. But you don't have another appearance of that computer.

The only other thing that I can think of is that you created a new account at about that time, and the trickles since then have been going to the new account.
As we haven't a clue about the ID of any such account, it would be up to you to find it.

edit
If there is a possibility of a second account, then a way to look for it would be:
On the computer in question, use Notepad to open client_state.xml
Use Find to look for <project>
Check the next couple of lines to see if they both mention this project name (climateprediction.net)
Otherwise, do Find next

If it's the right project, a few lines below will be: <hostid>
Compare the number with the one in this thread, just below your name, to the left of the posts.
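
The manual search above could also be scripted. What follows is only a rough sketch, assuming client_state.xml is well-formed XML and that each <project> block carries its project details and <hostid> as direct child elements. The function name find_host_ids and the sample data (including the host ID 12345) are made up for illustration; the real file contains many more fields.

```python
import xml.etree.ElementTree as ET


def find_host_ids(xml_text, project_name="climateprediction.net"):
    """Return the <hostid> of every <project> block that mentions project_name."""
    root = ET.fromstring(xml_text)
    host_ids = []
    for project in root.iter("project"):
        # Mirror the manual check: does any direct child of this block mention the project?
        mentions = any(project_name in (child.text or "") for child in project)
        host_id = project.findtext("hostid")
        if mentions and host_id is not None:
            host_ids.append(host_id.strip())
    return host_ids


# Hypothetical, heavily trimmed stand-in for a real client_state.xml.
SAMPLE = """<client_state>
  <project>
    <master_url>http://example.org/other/</master_url>
    <project_name>Some other project</project_name>
    <hostid>111</hostid>
  </project>
  <project>
    <master_url>http://climateprediction.net/</master_url>
    <project_name>climateprediction.net</project_name>
    <hostid>12345</hostid>
  </project>
</client_state>"""

print(find_host_ids(SAMPLE))
```

To try it on a real machine, read the file's contents into a string and pass that to find_host_ids instead of SAMPLE, then compare the result with the computer ID shown on the website.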
ID: 34201
Thunder
Joined: 1 Sep 04
Posts: 42
Credit: 6,475,117
RAC: 0
Message 34204 - Posted: 1 Jul 2008, 17:14:10 UTC - in response to Message 34201.  

Well, I checked and the <hostid> shown in client_state.xml is still 221046. Another strong indication that this is not a problem with hostid or userid is that the "Last Contact" column on the site updates each time the computer sends a trickle. For example, it presently reads 1 Jul 2008 16:00:29 UTC, and the message log of the client states:

climateprediction.net 7/1/2008 11:59:36 AM Scheduler request succeeded: got 0 new tasks

Other than telling me that our clocks are about 1 minute off, that pretty much tells me that the client is communicating with CPDN on the correct hostid.

I'm fairly sure this is a problem with the CPDN database and not an issue with the client.


ID: 34204
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 34205 - Posted: 1 Jul 2008, 21:32:31 UTC - in response to Message 34204.  

Open up the graphics window and type 'Z' to hide the sidebar and '8' to display the timestep. What phase and timestep number are shown? If it's anything less than phase 3 and timestep 75,614, the model has rewound and the server is ignoring your trickles because they've already been received.

If you can't run the graphics, have a look at the file projects/climateprediction.net/hadsm3fub_0169_005941516.xml instead. The phase number and timestep at the last checkpoint are in <PH> and <TS> tags.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 34205
Thunder
Joined: 1 Sep 04
Posts: 42
Credit: 6,475,117
RAC: 0
Message 34206 - Posted: 1 Jul 2008, 21:43:57 UTC - in response to Message 34205.  

Sure, here's a copy straight from it:

<V>520</V> 
  <MD>HADSM3</MD> 
  <N>hadsm3fub_0169_005941516</N> 
  <PH>3</PH> 
  <TS>79311</TS> 
  <DAY>3</DAY> 
  <MTH>7</MTH> 
  <YR>2055</YR> 
  <HR>7</HR> 
  <MIN>30</MIN> 
  <SEC>0</SEC> 
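
For anyone who wants to automate the comparison suggested earlier in the thread, here is a minimal sketch. It assumes only that the <PH> and <TS> tags hold plain integers, and it uses a regular expression because the rest of the file's structure isn't shown here. The function names read_checkpoint and is_behind are made up for illustration.

```python
import re


def read_checkpoint(fragment):
    """Extract (phase, timestep) from the <PH> and <TS> tags of a status fragment."""
    phase = re.search(r"<PH>\s*(\d+)\s*</PH>", fragment)
    timestep = re.search(r"<TS>\s*(\d+)\s*</TS>", fragment)
    if phase is None or timestep is None:
        raise ValueError("fragment has no <PH>/<TS> pair")
    return int(phase.group(1)), int(timestep.group(1))


def is_behind(checkpoint, last_reported):
    """True when the checkpoint precedes the last point the server has on record."""
    # Tuples compare element by element: phase first, then timestep.
    return checkpoint < last_reported


FRAGMENT = "<PH>3</PH>\n<TS>79311</TS>"  # the two values from the copy above
LAST_REPORTED = (3, 75614)  # phase 3, timestep 75,614 quoted earlier in the thread

checkpoint = read_checkpoint(FRAGMENT)
print(checkpoint, is_behind(checkpoint, LAST_REPORTED))
```

For these values the checkpoint (phase 3, timestep 79311) is not behind phase 3, timestep 75,614, so this one comparison does not by itself show a rewind. A single snapshot only shows where the last checkpoint was, not whether the model keeps returning to it.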


ID: 34206
Thunder
Joined: 1 Sep 04
Posts: 42
Credit: 6,475,117
RAC: 0
Message 34207 - Posted: 1 Jul 2008, 22:03:28 UTC

Of course, I just noticed something that's pretty obviously "not right"...

It's been sending trickles all right... Heh... It's been trying to send a trickle approximately once every hour of computational time for... oh, the last two and a half months or so. :O

Thyme, I think you've hit the likely scenario. It's in a loop that's going back to before the last (checkpoint? trickle point?) and then crossing it again and again and again. Looks like around 12-13 hundred times if I were to do some rough math.

It's showing 1439 hours of computation time, and by my rough estimate it shouldn't take more than 800-900 hours to complete a slab model, even running hyperthreaded.

Think I should abort this model?
ID: 34207
Thunder
Joined: 1 Sep 04
Posts: 42
Credit: 6,475,117
RAC: 0
Message 34208 - Posted: 1 Jul 2008, 22:23:13 UTC
Last modified: 1 Jul 2008, 22:26:10 UTC

And for the final piece of evidence, you'll note that the only other computer running a task from the same workunit is stuck at exactly the same timestep and hasn't trickled for about one month, despite having contacted the server in the last couple of hours.

WU = 6153695

Other poor cruncher schlepping the same data over and over: Worldwidewog

I'm going to hold off on aborting this model until I find out for sure that there's no info stored on the client that the project gurus would find useful. Can someone with more knowledge than me let me know when the right time to kill it is?
ID: 34208
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 34209 - Posted: 1 Jul 2008, 22:30:15 UTC
Last modified: 1 Jul 2008, 22:51:48 UTC

That seems as conclusive as can be, Thunder!

Under the circumstances the best thing you can do is abort the model, but it would be really helpful if you could back up your projects/climateprediction.net and slots directories first, in case the project team wants a copy to investigate why the model is behaving this way.

I've sent a PM to Worldwidewog to pass on the bad news.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 34209
Thunder
Joined: 1 Sep 04
Posts: 42
Credit: 6,475,117
RAC: 0
Message 34210 - Posted: 2 Jul 2008, 6:50:34 UTC - in response to Message 34209.  

Thanks for the assistance and advice. I probably would have scratched my head for a while without it. :)

ID: 34210