LOOPING IN 2040

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,097,287
RAC: 2,957
Message 34340 - Posted: 21 Jul 2008, 21:03:43 UTC

I have a problem at 74.9%. I have been running a CM model for several months. It has been doing fine up till now, sending regular trickles. Yesterday it crashed. I restored it from a backup made that morning. Now it is looping. Every time it reaches 03/12/2040 it loops back to 01/12/2040. It has done this at least 3 or 4 (probably more) times. Could the fact that 2040 is a 40 year mega-trickle year have anything to do with the problem?

I plan to try shifting the backup to my other machine (AMD processor to Intel) and seeing if I can get it past the sticking point. If that doesn't work I will try restoring from an earlier backup. I hate to abort it with so much time invested. Wish me luck.

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,097,287
RAC: 2,957
Message 34341 - Posted: 22 Jul 2008, 6:05:31 UTC

EUREKA!!!

Changing machines worked! The WU is now past the loop point and crunching on. The Messages tab confirms that the 2040 trickle was sent and received. Apparently there really is a difference between the way the model runs on AMD and Intel processors. The model is now crunching its way through March of 2041. Maybe I shouldn't crow about it until it has trickled at least once more in 2041?

I will post in this thread if there are any further problems.

old_user219190
Joined: 14 Jan 07
Posts: 52
Credit: 284,001
RAC: 0
Message 34342 - Posted: 22 Jul 2008, 7:14:01 UTC

"I love it when a plan comes together" :)
Congratulations Jim.
Chris.


astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 34348 - Posted: 22 Jul 2008, 20:49:20 UTC

Well done!

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34359 - Posted: 23 Jul 2008, 16:54:50 UTC
Last modified: 23 Jul 2008, 16:55:49 UTC

Well done, Jim!

Do the two computers crunch at a fairly similar speed? I'm asking because if you move a model from a slow machine to a much faster one (over twice as fast), there's a potential problem that can crop up later on. But regular backups allow a solution. If this is the case let us know.
Cpdn news

Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 34366 - Posted: 23 Jul 2008, 19:51:03 UTC

I moved a Spinup from a Pentium III to an E8500, no noticeable problems so far.
Does this "later" refer to <rsc_fpops_bound> somehow?

In this case I guess I'll have to edit init_data.xml and reduce <wu_cpu_time> a bit.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34370 - Posted: 23 Jul 2008, 21:02:24 UTC
Last modified: 23 Jul 2008, 21:06:26 UTC

It's the rsc_fpops_bound value that's the problem, so that's the number that needs to be changed. There are instructions here, intended to be detailed enough to enable almost any member to edit the file.

Members who think their model may hit this problem would probably do well to edit the file soon after the move, just in case. But you can also do nothing and wait to see whether the problem occurs, as long as you back up regularly.

If a model is moved to another computer of roughly the same speed, or one that is slightly faster or slower, there's no problem.
Cpdn news

Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 34371 - Posted: 23 Jul 2008, 21:12:01 UTC
Last modified: 23 Jul 2008, 21:38:43 UTC

Increasing the allowed fpops or reducing the time that the model has already used up should be about the same.

rsc_fpops_bound is part of client_state.xml and has a copy in init_data.xml; wu_cpu_time is only contained in init_data.xml.

I will make a backup and see what happens if I change the CPU time in init_data.xml.


Edit:

Result: it restored the original WU time from somewhere, so editing the time doesn't help. I guess BOINC does this as a cheat prevention. I have now increased rsc_fpops_bound instead, which should help.

Thanks for the information. I would not have wanted to lose that model; it is already almost halfway through (~40%).
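
For anyone who would rather script the edit than do it by hand, here is a minimal sketch of the idea, not the official procedure. It assumes the value appears as a plain number between <rsc_fpops_bound> tags, that BOINC is fully shut down before the file is touched, and that a backup exists. The function name and the factor of 10 are made up for illustration, and the function scales every occurrence in the text it is given, so it is a blunt tool if the file holds more than one workunit.

```python
import re

# Matches <rsc_fpops_bound>NUMBER</rsc_fpops_bound> and captures the number.
_BOUND = re.compile(r"(<rsc_fpops_bound>)\s*([0-9.eE+-]+)\s*(</rsc_fpops_bound>)")


def raise_fpops_bound(xml_text: str, factor: float = 10.0) -> str:
    """Return xml_text with every rsc_fpops_bound value multiplied by factor.

    The factor is an arbitrary illustration value, not a recommendation.
    """
    def _scale(match: re.Match) -> str:
        new_value = float(match.group(2)) * factor
        # Write the number back in fixed-point form with six decimals.
        return f"{match.group(1)}{new_value:f}{match.group(3)}"

    return _BOUND.sub(_scale, xml_text)


if __name__ == "__main__":
    # Hypothetical fragment, only to show the effect of the function.
    sample = "<workunit>\n<rsc_fpops_bound>1000000.000000</rsc_fpops_bound>\n</workunit>"
    print(raise_fpops_bound(sample))
```

To use it on a real file you would read client_state.xml into a string, pass it through the function and write the result back, keeping the untouched original somewhere safe.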

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,097,287
RAC: 2,957
Message 34377 - Posted: 24 Jul 2008, 2:19:47 UTC - in response to Message 34359.  

Yes, they are roughly the same speed. The AMD machine has a 1.7 GHz single-core processor. The Intel machine has a Core 2 Duo with two 1.5 GHz cores. Both are laptops. I transferred the WU back to the AMD when it reached March 2041 and it has since trickled successfully. It is now crunching happily through Sept. of 2042. It should trickle again about 2am local time. I wish I had a machine that was more than twice the speed of the 1.7 GHz. :)

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34383 - Posted: 24 Jul 2008, 16:21:23 UTC
Last modified: 24 Jul 2008, 16:23:32 UTC

I only had 1.33GHz for several years of crunching but the computer still did lots of useful CPDN processing. Though not very fast! I kept it working for nearly seven years until so many things went wrong that it eventually couldn't be used. But some parts from it are still inside my current AMD which is a hybrid of bits recovered from other computers. Long may it last.

The day you do replace it you may find that the price of fast 4 or even 8 cores has come right down.
Cpdn news

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,097,287
RAC: 2,957
Message 34790 - Posted: 27 Aug 2008, 15:05:17 UTC

Hi, everyone.

This is an update on the earlier post in this thread. The CM model that I am running on my HP machine with the AMD processor still loops occasionally when it gets to the trickle. It will loop from 4/12 back to 1/12 for hours (real time) and then crash. This has happened 5 times now. The looping seems to happen about every 4 model years.

To fix this I make a backup on 1/12 and move it to my Acer machine with the Intel Core 2 Duo processor. I then let it run past 12/7 (to set a save point after the problem) and move it back to the HP. It works like a charm.

This workaround has worked very well on BOINC Manager 5.10.45. I was concerned that it might not work as well on the new 6.2.18. I am happy to report that it works equally well on 6.2.18. The model is now in 2061.
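
For the backup step, a rough sketch of how the copy could be scripted rather than done by hand. The function name, paths and label are hypothetical, and it assumes BOINC has been completely shut down before the copy starts, since copying a folder that is still being written to may give an unusable backup.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_boinc_dir(data_dir, backup_root, label=""):
    """Copy the whole data_dir into a new timestamped folder under backup_root.

    data_dir and backup_root are whatever paths apply on your machine;
    label is an optional free-text tag such as the model date.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no such directory: {src}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    name = f"{src.name}-{stamp}" + (f"-{label}" if label else "")
    dest = Path(backup_root) / name
    # copytree refuses to write into an existing folder, so an older
    # backup is never silently overwritten.
    shutil.copytree(src, dest)
    return dest
```

Calling something like backup_boinc_dir("path/to/BOINC", "path/to/backups", label="2040-12-01") would leave one dated folder per backup, which makes it easy to move a particular save point to the other machine or to fall back to an earlier one.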

The model is 87% complete and I am determined that I am going to ride this nag across the finish line if I have to whip it every step of the way.

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 34807 - Posted: 27 Aug 2008, 21:38:58 UTC

Good for you, Jim. (The satisfaction in herding one of these large beasts to the finish line is worth the effort.)
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34810 - Posted: 27 Aug 2008, 23:30:53 UTC
Last modified: 27 Aug 2008, 23:36:42 UTC

Hi Jim

I hate to say this, but I think there's something seriously wrong with this model and it may have been flawed from the outset. Like the Greek tragedies where, given the starting situation, a tragic ending is inevitable. But with the difference that while Greek audiences knew in advance whether they were going to see a tragedy or a comedy, on CPDN the suspense is greater because we never know beforehand how a model will turn out.

Here are the AMD's tasks.

Here's Task 7202686 which is of course a 160-year HADCM.

The graph only shows years to 2019 though Jim says it's reached 2061. Here is its graph as far as it goes:

That seems to have been abnormal from the start.

I've looked at the whole workunit 6133966. Two other members made progress with their tasks from this WU but both models crashed. Here are Indefual's model and graph:

That temperature rise from 1920-1945 looks crazy to me. The rise in total precipitation early in the model looks equally bad.

Nixniz ran the model on his Intel. Here it is and here's its graph:

I don't like all those ups and downs and it shows the same early extreme temperature rise.

I think that in spite of all Jim's efforts he should abort this model but I'd appreciate some other opinions.
Cpdn news

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,097,287
RAC: 2,957
Message 34826 - Posted: 28 Aug 2008, 21:05:40 UTC - in response to Message 34810.  

Hi Mo, this is Jim.

So you think the model is abnormal and should be aborted? I hate to give up on this one. I have been running it for 7 months!

I'm no expert at reading the graphs, but the steep rise in temp between 1920 and 1945 does seem somewhat extreme. Did the model stop transmitting temp data in 2019? That would mean that I have been running it for the last 5 weeks for nothing.

There is no hurry making a decision. Right now the WU is sitting inactive in a backup file and I am running a slab model in its place.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 34828 - Posted: 28 Aug 2008, 22:07:26 UTC

You could continue on and spend months more on it. But the researchers could then run their statistical analysis programs on it and throw it out, and you'd never know.
Given the results so far, I'd be inclined to abort it.

And if it takes you that long to only get as far as you have with that model, I'd suggest that you stick to the shorter slab models.


Backups: Here

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 34834 - Posted: 29 Aug 2008, 1:57:07 UTC
Last modified: 29 Aug 2008, 1:58:59 UTC

Hi Jim

I've looked at about 8 of the model's graphs and none look normal. There's missing data at the beginning and something strange happened about 1933.



I think you need to merge the records for your AMD - I think you have a phantom record, probably as a result of restoring a backup. It could be that after merging the records, the missing years after 2019 will show up on the graphs. Trickles may have been going to the phantom computer.

The computer time hasn't been completely wasted because the researchers also need to know what parameter sets are unviable.

I know it really hurts to abandon a model after looking after it so carefully and for so long, but I think you're going to have to make yourself abort it.

I think you should keep a close eye on the models your AMD runs because a slab HADSM model you ran earlier has missing precipitation data for the last few years of phase 3, whereas the other two members who completed it on an AMD and an Intel produced normal graphs. Usually, though not always, a slab model that goes wrong produces the same abnormality on all the computers that run it.

Here is that slab. It speeded up for its last few trickles because it wasn't processing all the data.

Keeping an eye on models means

* looking at the globe every few days to check it still looks normal with all the colours, not a monochrome display
* noticing the crunching speed (sec/timestep) which shouldn't slow down or speed up very much as the model progresses
* looking at the model\'s graphs on its web page to check that data is there
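
The speed check in the list above can be sketched roughly as follows, assuming you jot down the sec/timestep figure every so often. The function name and the 25% tolerance are arbitrary choices for illustration, not a CPDN rule.

```python
from statistics import median


def flag_speed_changes(sec_per_timestep, tolerance=0.25):
    """Return (index, value) pairs that stray from the median by more than tolerance.

    sec_per_timestep is a list of hand-recorded sec/timestep readings;
    tolerance is a fraction of the median (0.25 means 25%), chosen arbitrarily.
    """
    if not sec_per_timestep:
        return []
    baseline = median(sec_per_timestep)
    flagged = []
    for index, value in enumerate(sec_per_timestep):
        # A reading is suspicious if it is much faster or much slower than usual.
        if abs(value - baseline) > tolerance * baseline:
            flagged.append((index, value))
    return flagged


if __name__ == "__main__":
    # Made-up readings: the last one is far quicker than the rest.
    print(flag_speed_changes([2.0, 2.1, 1.9, 2.0, 0.9]))
```

A sudden speed-up like the last made-up reading is the kind of change worth checking against the globe display and the graphs.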

Jim, is your AMD overclocked, or have the settings and timings been altered in any way? I'm asking this because a beta HADSM slab I ran on my AMD speeded up massively early on and the globe blanked out. Another mod, PeteB, got me to check the settings using CPU-Z. It turned out that the settings had been speeded up by 2½%, probably at the shop that had sold the CPU & motherboard to my son a couple of years earlier as part of a barebones package. As soon as I got it back to factory settings the computer behaved perfectly and I was able to rerun the same slab with normal results.

I'm not suggesting that this problem is anything to do with AMD as opposed to Intel. But if there's any instability in a computer, it's likely to affect climate models. It may be significant that your HADCM is much more abnormal than the two run by other crunchers.

If you want to download CPU-Z, which is a freebie, to the AMD, and check the actual settings against stock settings for your CPU and RAM, one of us should be able to advise you how to use the tool.
Cpdn news

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,097,287
RAC: 2,957
Message 34835 - Posted: 29 Aug 2008, 4:23:09 UTC - in response to Message 34834.  

Hi Mo and Les.
Thanks for the advice. I will abandon the defective CM WU. Since it is not presently installed on my machine (it's sitting in a backup file), do I need to reinstall it and formally abort it? I know that one of my past slab models went iceball at 92%.

The AMD computer is a laptop with factory settings as far as I know.


JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,097,287
RAC: 2,957
Message 34840 - Posted: 29 Aug 2008, 21:01:59 UTC - in response to Message 34835.  

Hi, Guys. This is Jim again.

I took your advice and merged the two records for my HP computer. Did it make the temp results for 2019 to 2060 appear?

