FAMOUS SUCCESS/FAILURE RATIO

Author	Message
geophi Volunteer moderator Send message Joined: 7 Aug 04 Posts: 2167 Credit: 64,483,778 RAC: 4,361	Message 40202 - Posted: 22 Jul 2010, 3:49:39 UTC - in response to Message 40115. Updated as of July 21, on my systems here at cpdn... Core i7 920 in Linux 8 completed, 10 failed, 4 in progress Phenom II X4 940 in Linux 11 completed, 7 failed, 4 in progress Core 2 E6420 in Windows 3 completed, 1 failed, 1 in progress ID: 40202 · Reply Quote

old_user114060 Send message Joined: 24 Nov 05 Posts: 1 Credit: 612,262 RAC: 0	Message 40208 - Posted: 22 Jul 2010, 13:32:36 UTC Q6600 2.4 GHz running stock. Windows XP 64 bit. 1 Famous ran to completion, and then on the second Famous:
7/22/2010 7:02:13 AM climateprediction.net Started upload of famous_u01x_1799_200_006632920_5_6.zip
7/22/2010 7:02:36 AM climateprediction.net Finished upload of famous_u01x_1799_200_006632920_5_6.zip
7/22/2010 8:12:12 AM climateprediction.net Sending scheduler request: To send trickle-up message.
7/22/2010 8:12:12 AM climateprediction.net Not reporting or requesting tasks
7/22/2010 8:12:14 AM climateprediction.net Scheduler request completed
7/22/2010 8:38:35 AM climateprediction.net Resuming task famous_u01x_1799_200_006632920_5 using famous version 611
7/22/2010 9:10:01 AM climateprediction.net Computation for task famous_u01x_1799_200_006632920_5 finished
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_7.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_8.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_9.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_10.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_11.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_12.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_13.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_14.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_15.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_16.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_17.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_18.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_19.zip for task famous_u01x_1799_200_006632920_5 absent
7/22/2010 9:10:01 AM climateprediction.net Output file famous_u01x_1799_200_006632920_5_20.zip for task famous_u01x_1799_200_006632920_5 absent
ID: 40208 · Reply Quote

mo.v Volunteer moderator Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0	Message 40210 - Posted: 22 Jul 2010, 13:49:04 UTC Last modified: 22 Jul 2010, 13:50:06 UTC Mike, all those messages about the missing files just mean that the model crashed before it could generate those files. Here's the crashed model's web page. If you click on stderr out + you'll see that it crashed because of NEGATIVE THETA, i.e. the crash was caused by the model's initial parameters. Nothing to worry about. The researchers want us to run them whether they crash or complete. Cpdn news ID: 40210 · Reply Quote

JIM Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,060,840 RAC: 733	Message 40213 - Posted: 22 Jul 2010, 18:37:08 UTC Famous_up5n_1599_200_00665446_1 and Famous_umvv_1999_200_ 006662502_5 both completed successfully. Famous_up5n_1599_200_00665446_1 was run on Windows 7 32 bit on a Core 2 Duo 1.5 GHz processor with 2 GB of RAM. Famous_umvv_1999_200_ 006662502_5 was run on Windows 7 64 bit on a Core 2 Duo 2.2 GHz processor with 4 GB of RAM. This makes 5 successful completions in a row. Have they done something to improve stability, or did the scientists just front-load most of the WUs with extreme parameters (that are more likely to fail) in the very early batches? ID: 40213 · Reply Quote

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 40215 - Posted: 22 Jul 2010, 21:42:55 UTC - in response to Message 40213. As I said somewhere, the new model type takes us back to 2003-4, when the original 'slab' model was used. The only way to find out which values lead to a long run was to try them, 'mark off' those values that caused early failures, and keep those that lasted the distance. And this is what is happening with these totally different Millennium models: try everything and see what happens. As it says here: Slogan : Historical climate records tell various stories - Let's test them all. And it also says: In addition to perturbations for internal physics parameters of the model and initial condition, this experiment requires a large number of forcing perturbations to deal with the large uncertainty in the historical forcings. The very first versions were more unstable, so more testing was done to find failure points, and compiler options were also changed. And the type 'name' series are using different degrees of values, and this affects the stability. Remember, this is a short term project, and lots of climatologists are poring over the results as they come in. On beta, Hiro is watching as each new trickle arrives. Well, several times a day. :) The current 'test' version, which has a name starting with s2..., is producing 'hot' results, and Hiro knows about these before they fail/complete. No doubt something similar is happening on this main site as well. Backups: Here ID: 40215 · Reply Quote

JIM Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,060,840 RAC: 733	Message 40227 - Posted: 23 Jul 2010, 21:03:06 UTC Famous_up23_1999_200_006665318 finished successfully. OS is Windows 7 64 bit running on a Core 2 Duo 2.2 GHz processor with 4 GB of RAM. ID: 40227 · Reply Quote

JIM Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,060,840 RAC: 733	Message 40269 - Posted: 29 Jul 2010, 15:05:05 UTC Famous_uky9_1599_200_006659996_3 failed at approx. 80% completion. OS is Windows 7 64 bit running on a Core 2 Duo 2.2 GHz processor with 4 GB of RAM. ID: 40269 · Reply Quote

old_user92639 Send message Joined: 13 Aug 05 Posts: 54 Credit: 117,227 RAC: 0	Message 40289 - Posted: 3 Aug 2010, 7:16:44 UTC http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11460092 process exited with code 22 (0x16, -234) Suspended CPDN Monitor - Suspend request from BOINC... Model crashed: ATM_DYN : INVALID THETA DETECTED. error ID: 40289 · Reply Quote

Overtonesinger Send message Joined: 30 Dec 05 Posts: 5 Credit: 986,440 RAC: 0	Message 40294 - Posted: 3 Aug 2010, 9:33:19 UTC I have also had many FAMOUS models crashing in the last few days, on an Intel Pentium Dual CPU "E2200" at 2.2 GHz (native). I have never had any crashing models before and nothing has changed in the computer. It is perfectly stable. Is there some workaround for those crashes? computer: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=1051527 Peace and Love! Filip ID: 40294 · Reply Quote

mo.v Volunteer moderator Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0	Message 40296 - Posted: 3 Aug 2010, 11:13:13 UTC Last modified: 3 Aug 2010, 11:14:28 UTC Hello Overtonesinger, I've looked at the results for computer 1051527, which has an excellent list of model completions. If you look at the web pages for the crashed FAMOUS models here and here and for each model click on + beside stderr you will see extra messages. Both models have exit code 22 and messages including NEGATIVE PRESSURE or INVALID THETA. FAMOUS models are experimenting with some very extreme parameter values. In some cases this causes model crashes. It is not the fault of the computer; it's part of the experiment and even the crashed models are useful for Hiro, the researcher. If the crash is caused by the model parameter values you usually see NEGATIVE PRESSURE or INVALID THETA messages. If you look at the workunit page for each crashed model (each model/task belongs to a workunit containing several copies of the same task) you see for example this. The processing of models depends on a combination of the computer's CPU (Intel or AMD) and its operating system (Windows, Linux or Mac/Darwin). Computers with the same combination usually all complete or all crash at the same processing moment. You will see that the two computers with Darwin crashed at the same moment, but not at the same moment as your computer, which has Windows. The computer with Linux may complete the model. But if we look at the other workunit we find two computers that couldn't start the model. Their models have -226 and -185 exit codes. These mean there's a problem in those computers. Their firewall or antivirus is probably blocking BOINC. Don't try to back up or restore FAMOUS models that crash on a stable computer. They would crash again at the same processing moment. Cpdn news ID: 40296 · Reply Quote
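The rule of thumb in the post above can be sketched as a small Python helper. This is only an illustration, not anything that ships with BOINC or CPDN: the function name `classify_crash` and the three category labels are made up here, and the only markers and exit codes it knows are the ones quoted in this thread (NEGATIVE THETA comes from other posts in the thread).

```python
# Hypothetical helper, not part of BOINC or CPDN. It only encodes the
# rule of thumb from this thread; real stderr output may contain other cases.
PARAMETER_MARKERS = ("NEGATIVE PRESSURE", "INVALID THETA", "NEGATIVE THETA")
HOST_EXIT_CODES = {-226, -185}  # the two exit codes named in the post above


def classify_crash(exit_code: int, stderr_text: str) -> str:
    """Return a rough guess at why a FAMOUS task failed."""
    text = stderr_text.upper()
    # A physics error message points at the model's parameter values.
    if any(marker in text for marker in PARAMETER_MARKERS):
        return "model parameters"
    # These codes were described as a problem on the computer itself.
    if exit_code in HOST_EXIT_CODES:
        return "host problem"
    return "unknown"
```

For example, `classify_crash(22, "Model crashed: ATM_DYN : INVALID THETA DETECTED.")` gives "model parameters", while an exit code of -226 with no such message gives "host problem". Anything else is left as "unknown" rather than guessed.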

JIM Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,060,840 RAC: 733	Message 40322 - Posted: 6 Aug 2010, 19:20:50 UTC Famous_u1eh-1199_200_006634660_4 completed successfully. Famous_u42q_1799_200_006638125_5 failed and Famous_u57z_1799_200_ 006639610_3 failed at approx. 45% completion. OS is Windows 7 64 bit running on a Core 2 Duo 2.2 GHz processor with 4 GB of RAM. ID: 40322 · Reply Quote

old_user92639 Send message Joined: 13 Aug 05 Posts: 54 Credit: 117,227 RAC: 0	Message 40330 - Posted: 8 Aug 2010, 22:08:02 UTC - in response to Message 40289. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11460092 process exited with code 22 (0x16, -234) Suspended CPDN Monitor - Suspend request from BOINC... Model crashed: ATM_DYN : INVALID THETA DETECTED. error Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy update ^^ ID: 40330 · Reply Quote

astroWX Volunteer moderator Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0	Message 40335 - Posted: 9 Aug 2010, 3:56:25 UTC At risk of jinxing myself (Superstitious? Who? Me?), I've had more FAMOUS successes than failures lately, both here and on Beta. (Fingers crossed ...) May that be, or soon become, true for everyone. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. ID: 40335 · Reply Quote

JIM Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,060,840 RAC: 733	Message 40336 - Posted: 9 Aug 2010, 5:09:17 UTC - in response to Message 40335. I hate to say this but I felt the same way a little while back. Had 5 successes in a row. Since then I have had 3 crashes, with only 1 successful completion. I guess the law of averages is catching up with me. ID: 40336 · Reply Quote

old_user26115 Send message Joined: 21 Oct 04 Posts: 24 Credit: 207,633 RAC: 0	Message 40350 - Posted: 11 Aug 2010, 7:46:28 UTC Last modified: 11 Aug 2010, 7:54:21 UTC First task was a success http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11432527 but the second crashed: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11474733 I had the message that a certain .zip-file wasn't there (I can't remember the full file-name :( ) greetz from Switzerland littleBouncer ID: 40350 · Reply Quote

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 40351 - Posted: 11 Aug 2010, 9:33:01 UTC Missing zip file messages are normal when a model crashes - if the model hasn't progressed to the point where the file is created, then BOINC can't find it to upload it. The real messages about the failure are on the web page for the model; click the + sign alongside stderr to see them. ID: 40351 · Reply Quote

DaveG27 Send message Joined: 8 Nov 06 Posts: 18 Credit: 2,425,895 RAC: 0	Message 40352 - Posted: 11 Aug 2010, 22:11:33 UTC So far I have 21 successes, 10 failures (all "negative theta"), 6 in progress and 4 waiting to run (a reserve supply because of download problems). As failures run for a shorter time, this will skew the results in the short term: failures will appear more frequent than they actually are, and a true ratio will only become apparent over the longer term. Two of my machines run Linux and one runs Windows 7; failure rates seem about the same. When checking the other computers in the workunits of failed models, I was surprised by the differences on Windows systems between XP, Vista and 7 in whether the model failed and how far it got. One would expect them all to fail at the same point, which they generally did when running the same OS. Two of the workunits had a lot of Linux machines: in one they all failed at the same point, in the other they were all different (you can't win). Perhaps more research needs to be done to see whether this is true and not just a coincidence in the ones I looked at. ID: 40352 · Reply Quote
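The short-term skew described in the post above can be shown with a toy calculation in Python. Everything in it is an assumption made for illustration: the batch of ten tasks, the 12-day run time and the crash days are invented numbers, not CPDN data, and `observed_failure_share` is a hypothetical helper.

```python
def observed_failure_share(tasks, day):
    """Share of failures among tasks that have finished by `day`.

    `tasks` is a list of (duration_days, failed) pairs for tasks that all
    started on day 0. Returns None while nothing has finished yet.
    """
    finished = [failed for duration, failed in tasks if duration <= day]
    if not finished:
        return None
    return sum(finished) / len(finished)


# Hypothetical batch (made-up numbers): 7 models that complete after
# 12 days and 3 models that crash after 2, 5 and 11 days.
batch = [(12.0, False)] * 7 + [(2.0, True), (5.0, True), (11.0, True)]

early = observed_failure_share(batch, day=6)   # only crashes have reported
late = observed_failure_share(batch, day=12)   # every task has reported
```

In this made-up batch the true failure share is 3 out of 10, but on day 6 the only finished tasks are two crashes, so the observed share is 1.0. Only once the long runs have reported does the figure settle at 0.3.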

JIM Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,060,840 RAC: 733	Message 40377 - Posted: 17 Aug 2010, 17:53:10 UTC Famous_u6f5_1399_200_006641826_6 failed at approx. 95% on a machine running Windows 7 64 bit with a 2.2 GHz Core 2 Duo processor and 4 GB of RAM. Famous_1399_200_006641826_6 completed successfully on a machine running Windows 7 64 bit with a 2.2 GHz Core 2 Duo processor and 4 GB of RAM. Famous_u34a_1799_200_006636885_2 completed successfully on a machine running Windows 7 32 bit with a Core 2 Duo 1.5 GHz processor and 2 GB of RAM. Famous_u34a_1799_200_006636891_1 completed successfully on a machine running Windows 7 32 bit with a Core 2 Duo 1.5 GHz processor and 2 GB of RAM. ID: 40377 · Reply Quote

Strathpeffer Send message Joined: 9 Jan 07 Posts: 497 Credit: 342,899 RAC: 0	Message 40378 - Posted: 17 Aug 2010, 18:02:17 UTC Last modified: 17 Aug 2010, 18:03:33 UTC Don't know if this is of any interest but it might be, because team Scotland members have quite a good record of completing long models, from BBC onwards. From this page of Iansm's brilliant stats for the team, it can be seen that, of 484 FAMOUS models issued to team members to date, 210 have completed and 170 have failed. Visit the Scotland team ID: 40378 · Reply Quote
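To turn a tally like the one above into the ratio this thread is about, a minimal Python sketch could look like this. The function `summarize` and its field names are invented for this example; treating "issued minus completed minus failed" as still unresolved (in progress or not yet reported) is an assumption, since the post does not say what happened to the remaining models.

```python
def summarize(issued: int, completed: int, failed: int) -> dict:
    """Derive simple success/failure figures from a tally of models."""
    finished = completed + failed
    return {
        # Assumption: whatever has neither completed nor failed is still open.
        "unresolved": issued - finished,
        "success_share": completed / finished if finished else None,
        "success_failure_ratio": completed / failed if failed else None,
    }


# Figures quoted in the post above for team Scotland.
scotland = summarize(issued=484, completed=210, failed=170)
```

With those figures, 104 models are unresolved, about 55% of the finished models completed (210 of 380), and the success/failure ratio is roughly 1.24. Because failures report sooner than completions, the share may shift as the unresolved models finish.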

JIM Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,060,840 RAC: 733	Message 40385 - Posted: 19 Aug 2010, 7:02:42 UTC Famous_ua0y_1799_200_006636885_2 failed at approx. 12% on a 2.2 GHz Core 2 Duo processor running Windows 7 64 bit. At least this one had the good grace to fail early (36 hours) and not after 11 days (95%) of running. ID: 40385 · Reply Quote