climateprediction.net home page
FAMOUS SUCCESS/FAILURE RATIO

Message boards : Number crunching : FAMOUS SUCCESS/FAILURE RATIO

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,483,778
RAC: 4,361
Message 40202 - Posted: 22 Jul 2010, 3:49:39 UTC - in response to Message 40115.  

Updated as of July 21, on my systems here at cpdn...

Core i7 920 in Linux
8 completed, 10 failed, 4 in progress
Phenom II X4 940 in Linux
11 completed, 7 failed, 4 in progress
Core 2 E6420 in Windows
3 completed, 1 failed, 1 in progress
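
The tallies above can be turned into per-host completion rates. This is a minimal Python sketch, assuming that tasks still in progress are simply left out of the ratio. The helper name `completion_rate` is hypothetical and not part of BOINC or CPDN.

```python
# Tallies from the post above: (completed, failed, in progress) per host.
TALLIES = {
    "Core i7 920 (Linux)": (8, 10, 4),
    "Phenom II X4 940 (Linux)": (11, 7, 4),
    "Core 2 E6420 (Windows)": (3, 1, 1),
}


def completion_rate(completed: int, failed: int) -> float:
    """Hypothetical helper: fraction of finished tasks that completed.

    Tasks still in progress are ignored because their outcome is unknown.
    """
    finished = completed + failed
    if finished == 0:
        return float("nan")  # nothing has finished yet, so no ratio exists
    return completed / finished


if __name__ == "__main__":
    total_done = total_failed = 0
    for host, (done, failed, _running) in TALLIES.items():
        total_done += done
        total_failed += failed
        print(f"{host}: {completion_rate(done, failed):.0%} completed")
    print(f"All hosts: {completion_rate(total_done, total_failed):.0%} completed")
```

On these figures the three hosts together completed 22 of 40 finished tasks (55%), with the two Linux machines at 44% and 61%.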
old_user114060
Joined: 24 Nov 05
Posts: 1
Credit: 612,262
RAC: 0
Message 40208 - Posted: 22 Jul 2010, 13:32:36 UTC

Q6600 at 2.4 GHz, running at stock speed.
Windows XP 64-bit.
One FAMOUS model ran to completion,
and then on the second FAMOUS model:
7/22/2010 7:02:13 AM climateprediction.net Started upload of famous_u01x_1799_200_006632920_5_6.zip
7/22/2010 7:02:36 AM climateprediction.net Finished upload of famous_u01x_1799_200_006632920_5_6.zip
7/22/2010 8:12:12 AM climateprediction.net Sending scheduler request: To send trickle-up message.
7/22/2010 8:12:12 AM climateprediction.net Not reporting or requesting tasks
7/22/2010 8:12:14 AM climateprediction.net Scheduler request completed
7/22/2010 8:38:35 AM climateprediction.net Resuming task famous_u01x_1799_200_006632920_5 using famous version 611
7/22/2010 9:10:01 AM climateprediction.net Computation for task famous_u01x_1799_200_006632920_5 finished
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_7.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_8.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_9.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_10.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_11.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_12.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_13.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_14.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_15.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_16.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_17.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_18.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_19.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_20.zip for task famous_u01x_1799_200_006632920_5 absent

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 40210 - Posted: 22 Jul 2010, 13:49:04 UTC
Last modified: 22 Jul 2010, 13:50:06 UTC

Mike, all those messages about the missing files just mean that the model crashed before it could generate those files.

Here's the crashed model's web page. If you click the + beside stderr out, you'll see that it crashed because of NEGATIVE THETA, i.e. because of the model's initial parameters. Nothing to worry about. The researchers want us to run them whether they crash or complete.
Cpdn news
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,060,840
RAC: 733
Message 40213 - Posted: 22 Jul 2010, 18:37:08 UTC

Famous_up5n_1599_200_00665446_1 and Famous_umvv_1999_200_006662502_5 both completed successfully.

Famous_up5n_1599_200_00665446_1 was run on Windows 7 32-bit on a Core 2 Duo 1.5 GHz processor with 2 GB of RAM.

Famous_umvv_1999_200_006662502_5 was run on Windows 7 64-bit on a Core 2 Duo 2.2 GHz processor with 4 GB of RAM.

This makes 5 successful completions in a row. Have they done something to improve stability, or did the scientists just front-load the very early batches with most of the WUs that have extreme parameters (which are more likely to fail)?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 40215 - Posted: 22 Jul 2010, 21:42:55 UTC - in response to Message 40213.  

As I said somewhere, the new model type takes us back to 2003-4, when the original 'slab' model was used.
The only way to find out which values led to a long run was to try them, 'mark off' those values that caused early failures, and keep those that lasted the distance.
And this is what is happening with these totally different Millennium models: try everything and see what happens.

As it says here:
Slogan : Historical climate records tell various stories — Let's test them all.

And it also says:
In addition to perturbations for internal physics parameters of the model and initial condition, this experiment requires a large number of forcing perturbations to deal with the large uncertainty in the historical forcings.


The very first versions were more unstable, so more testing was done to find failure points, and compiler options were also changed.
And the different type 'name' series use different ranges of values, which affects stability.

Remember, this is a short term project, and lots of climatologists are poring over the results as they come in. On beta, Hiro is watching as each new trickle arrives. Well, several times a day. :)
The current 'test' version, which has a name starting with s2..., is producing 'hot' results, and Hiro knows about these before they fail/complete.
No doubt something similar is happening on this main site as well.


Backups: Here
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,060,840
RAC: 733
Message 40227 - Posted: 23 Jul 2010, 21:03:06 UTC

Famous_up23_1999_200_006665318 finished successfully.
OS is Windows 7 64 bit running on a Core 2 Duo 2.2 GHz processor with 4 GB of RAM.


JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,060,840
RAC: 733
Message 40269 - Posted: 29 Jul 2010, 15:05:05 UTC

Famous_uky9_1599_200_006659996_3 failed at approx. 80% completion. OS is Windows 7 64 bit running on a Core 2 Duo 2.2 GHz processor with 4 GB of RAM.

old_user92639
Joined: 13 Aug 05
Posts: 54
Credit: 117,227
RAC: 0
Message 40289 - Posted: 3 Aug 2010, 7:16:44 UTC

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11460092

process exited with code 22 (0x16, -234)

Suspended CPDN Monitor - Suspend request from BOINC...

Model crashed: ATM_DYN : INVALID THETA DETECTED.

error
Overtonesinger
Joined: 30 Dec 05
Posts: 5
Credit: 986,440
RAC: 0
Message 40294 - Posted: 3 Aug 2010, 9:33:19 UTC

I have also had many FAMOUS models crash in the last few days, on an Intel Pentium Dual CPU "E2200" at 2.2 GHz (native clock speed).
I have never had any models crash before, and nothing has changed in the computer. It is perfectly stable.
Is there some workaround for these crashes?

computer:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=1051527

Peace and Love!
Filip
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 40296 - Posted: 3 Aug 2010, 11:13:13 UTC
Last modified: 3 Aug 2010, 11:14:28 UTC

Hello Overtonesinger

I've looked at the results for computer 1051527 which has an excellent list of model completions.

If you look at the web pages for the crashed FAMOUS models here and here, and for each model click the + beside stderr, you will see extra messages. Both models have exit code 22 and messages including NEGATIVE PRESSURE or INVALID THETA.

FAMOUS models are experimenting with some very extreme parameter values. In some cases this causes model crashes. It is not the fault of the computer; it's part of the experiment and even the crashed models are useful for Hiro, the researcher. If the crash is caused by the model parameter values you usually see NEGATIVE PRESSURE or INVALID THETA messages.

If you look at the workunit page for each crashed model (each model/task belongs to a workunit containing several copies of the same task) you see for example this. The processing of models depends on a combination of the computer's CPU (Intel or AMD) and its operating system (Windows, Linux or Mac/Darwin). Computers with the same combination usually all complete or all crash at the same processing moment. You will see that the two computers with Darwin crashed at the same moment, but not at the same moment as your computer which has Windows. The computer with Linux may complete the model.

But if we look at the other workunit, we find two computers that couldn't start the model. Their models have -226 and -185 exit codes. These mean there's a problem in those computers: their firewall or antivirus is probably blocking BOINC.

Don't try to back up or restore FAMOUS models that crash on a stable computer. They would crash again at the same processing moment.
Cpdn news
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,060,840
RAC: 733
Message 40322 - Posted: 6 Aug 2010, 19:20:50 UTC

Famous_u1eh-1199_200_006634660_4 completed successfully.

Famous_u42q_1799_200_006638125_5 failed, and Famous_u57z_1799_200_006639610_3 failed at approx. 45% completion. OS is Windows 7 64-bit running on a Core 2 Duo 2.2 GHz processor with 4 GB of RAM.

old_user92639
Joined: 13 Aug 05
Posts: 54
Credit: 117,227
RAC: 0
Message 40330 - Posted: 8 Aug 2010, 22:08:02 UTC - in response to Message 40289.  

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11460092

process exited with code 22 (0x16, -234)

Suspended CPDN Monitor - Suspend request from BOINC...

Model crashed: ATM_DYN : INVALID THETA DETECTED.

error


Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy

update ^^

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 40335 - Posted: 9 Aug 2010, 3:56:25 UTC

At risk of jinxing myself (Superstitious? Who? Me?), I've had more FAMOUS successes than failures lately, both here and on Beta. (Fingers crossed ...) May that be, or soon become, true for everyone.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,060,840
RAC: 733
Message 40336 - Posted: 9 Aug 2010, 5:09:17 UTC - in response to Message 40335.  

I hate to say this but I felt the same way a little while back. Had 5 successes in a row. Since then I have had 3 crashes, with only 1 successful completion. I guess the law of averages is catching up with me.
old_user26115
Joined: 21 Oct 04
Posts: 24
Credit: 207,633
RAC: 0
Message 40350 - Posted: 11 Aug 2010, 7:46:28 UTC
Last modified: 11 Aug 2010, 7:54:21 UTC

First task was a success

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11432527

but the second crashed:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11474733

I got a message that a certain .zip file was missing (I can't remember the full file name :( )

greetz from Switzerland
littleBouncer
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 40351 - Posted: 11 Aug 2010, 9:33:01 UTC

Missing zip file messages are normal when a model crashes - if the model hasn't progressed to the point where the file is created, then BOINC can't find it to upload it.

The real messages about the failure are on the web page for the model; click the + sign alongside stderr to see them.
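
To see at a glance which outputs BOINC could not find, the "absent" lines from a client log like the one quoted earlier in this thread can be grouped by task. This is a minimal Python sketch, assuming the log lines use exactly the `Output file ... for task ... absent` wording shown there; the function name `missing_outputs` is hypothetical.

```python
import re

# Assumed log wording, taken from the client log quoted in this thread:
# "... Output file famous_..._7.zip for task famous_..._5 absent"
ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")


def missing_outputs(log_lines):
    """Hypothetical helper: group the files BOINC reported as absent by task."""
    missing = {}
    for line in log_lines:
        match = ABSENT.search(line)
        if match:
            file_name, task = match.group(1), match.group(2)
            missing.setdefault(task, []).append(file_name)
    return missing


if __name__ == "__main__":
    sample = [
        "7/22/2010 9:10:01 AM climateprediction.net Computation for task "
        "famous_u01x_1799_200_006632920_5 finished",
        "7/22/2010 9:10:01 AM climateprediction.net Output file "
        "famous_u01x_1799_200_006632920_5_7.zip for task "
        "famous_u01x_1799_200_006632920_5 absent",
        "7/22/2010 9:10:01 AM climateprediction.net Output file "
        "famous_u01x_1799_200_006632920_5_8.zip for task "
        "famous_u01x_1799_200_006632920_5 absent",
    ]
    for task, files in missing_outputs(sample).items():
        print(f"{task}: {len(files)} output file(s) never written")
```

The first absent file suggests roughly where the model stopped progressing; the actual cause of the crash is still only visible in stderr on the model's web page.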

DaveG27
Joined: 8 Nov 06
Posts: 18
Credit: 2,425,895
RAC: 0
Message 40352 - Posted: 11 Aug 2010, 22:11:33 UTC

So far I have 21 successes and 10 failures (all "negative theta"), with 6 in progress and 4 waiting to run (a reserve supply because of download problems).
As failures run for a shorter time, this will skew the results in the short term: failures will appear more frequent than they actually are, and the true ratio will only become apparent over the longer term.
Two of my machines run Linux and one runs Windows 7; the failure rates seem about the same.
When checking the other hosts in the workunits of failed models, I was surprised by the differences between XP, Vista and 7 on Windows systems, both in whether a model failed and in how far it got. One would expect them all to fail at the same point, which they generally did when running the same OS.
Two of the workunits had a lot of Linux hosts: in one they all failed at the same point, in the other they all behaved differently (you can't win).
Perhaps more research needs to be done to see whether this is a real effect and not just a coincidence in the ones I looked at.
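
The short-term skew described above can be illustrated with a toy model. This is a minimal Python sketch with entirely hypothetical numbers (none come from CPDN data): ten tasks all start on day 0, every successful task needs 12 days, and four tasks crash on days 2, 5, 8 and 11. Counting outcomes at a given day shows failures dominating the early snapshots even though most tasks eventually succeed.

```python
# Hypothetical illustration only: none of these numbers come from CPDN data.
COMPLETE_DAYS = 12  # assumed run time of a model that completes

# Each task is (fails, crash_day); crash_day is None for tasks that complete.
TASKS = [(False, None)] * 6 + [(True, 2), (True, 5), (True, 8), (True, 11)]


def snapshot(tasks, day):
    """Return (completed, failed, running) as seen on the given day."""
    completed = failed = running = 0
    for fails, crash_day in tasks:
        if fails and crash_day <= day:
            failed += 1  # this task has already crashed
        elif not fails and day >= COMPLETE_DAYS:
            completed += 1  # this task has run the full length
        else:
            running += 1  # outcome not yet visible
    return completed, failed, running


if __name__ == "__main__":
    for day in (3, 6, 9, 12):
        print(f"day {day:2d}: completed, failed, running = {snapshot(TASKS, day)}")
```

Until day 12 every finished task in this toy example is a failure; only the final snapshot shows the eventual 6:4 ratio.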
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,060,840
RAC: 733
Message 40377 - Posted: 17 Aug 2010, 17:53:10 UTC

Famous_u6f5_1399_200_006641826_6 failed at approx. 95% on a machine running Windows 7 64-bit with a 2.2 GHz Core 2 Duo processor and 4 GB of RAM.

Famous_1399_200_006641826_6 completed successfully on a machine running Windows 7 64-bit with a 2.2 GHz Core 2 Duo processor and 4 GB of RAM.

Famous_u34a_1799_200_006636885_2 completed successfully on a machine running Windows 7 32-bit with a Core 2 Duo 1.5 GHz processor and 2 GB of RAM.

Famous_u34a_1799_200_006636891_1 completed successfully on a machine running Windows 7 32-bit with a Core 2 Duo 1.5 GHz processor and 2 GB of RAM.

Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 40378 - Posted: 17 Aug 2010, 18:02:17 UTC
Last modified: 17 Aug 2010, 18:03:33 UTC

I don't know if this is of any interest, but it might be, because Team Scotland members have quite a good record of completing long models, from BBC onwards. From this page of Iansm's brilliant stats for the team, it can be seen that, of 484 FAMOUS models issued to team members to date, 210 have completed and 170 have failed.
Visit the Scotland team
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,060,840
RAC: 733
Message 40385 - Posted: 19 Aug 2010, 7:02:42 UTC

Famous_ua0y_1799_200_006636885_2 failed at approx. 12%, running on a 2.2 GHz Core 2 Duo processor with Windows 7 64-bit. At least this one had the good grace to fail early (36 hours) and not after 11 days (95%) of running.

©2024 climateprediction.net