HadSM3MH Performance

Author	Message
geophi Volunteer moderator Joined: 7 Aug 04 Posts: 2169 Credit: 64,555,907 RAC: 5,858	Message 36532 - Posted: 28 Mar 2009, 19:15:23 UTC - in response to Message 36524.

> Looks like I may have answered my own question. My Core i7 920 is up running a HadSM3MH model at 0.5134 s/TS. Note that this model was 2/3 done by the older computer. This is slightly faster than my old Core2Duo E8400 (that ran 333MHz faster).

The CPU-time durations of your last several trickles indicate about 0.45 s/TS for that model since the 920 took over.

> Are CPDN models floating-point bound processes? Or would it make more sense to have HT disabled for best performance? By performance, I mean RAC.

I was actually hoping someone would experiment with that. I would suggest running 4 HadSM3s with HT off, then 8 with HT on, and seeing whether the 8 take less than twice as long to complete. Alternatively, you could get an estimate by running 4 with HT off through several trickles and writing down the average s/TS for those 4 models, then turning HT on, letting BOINC download 4 more, and running them through several trickles alongside the first 4. Then check the average s/TS of the 4 new models. If it is less than twice the average the first 4 achieved when running by themselves, HT improves throughput.

We have a Core i7 920 at work that runs a high-resolution meteorological model. Running one instance with 8 threads (HT on) or with 4 threads (HT off) takes almost exactly the same time either way. But running 4 vs. 8 instances of CPDN would no doubt tax the memory bandwidth in different ways.
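The decision rule above is a throughput comparison: with HT off, 4 models each at t_off s/TS finish 4/t_off timesteps per second between them; with HT on, 8 models at t_on s/TS finish 8/t_on. HT wins when 8/t_on > 4/t_off, which is the same as t_on < 2 × t_off. Below is a minimal Python sketch of that test, assuming a 4-core/8-thread chip like the i7 920 and that you have already collected the s/TS readings yourself; the function names and sample numbers are hypothetical, not taken from this thread.

```python
from statistics import mean


def aggregate_rate(avg_s_per_ts, n_models):
    """Combined timesteps per second for n_models running concurrently,
    each at roughly avg_s_per_ts seconds per timestep."""
    return n_models / avg_s_per_ts


def ht_improves_throughput(ht_off_readings, ht_on_readings,
                           physical_cores=4, logical_cpus=8):
    """Apply the 'under twice the s/TS' test described in the post.

    ht_off_readings: s/TS values for models run alone, one per physical
                     core, with HT off.
    ht_on_readings:  s/TS values measured while one model per logical CPU
                     runs with HT on (e.g. just the 4 newly downloaded ones).
    """
    rate_off = aggregate_rate(mean(ht_off_readings), physical_cores)
    rate_on = aggregate_rate(mean(ht_on_readings), logical_cpus)
    # With logical_cpus == 2 * physical_cores this reduces to
    # mean(ht_on_readings) < 2 * mean(ht_off_readings).
    return rate_on > rate_off


if __name__ == "__main__":
    # Hypothetical readings, not measurements from this thread.
    off = [0.45, 0.46, 0.44, 0.45]
    on = [0.80, 0.82, 0.79, 0.81]
    print(ht_improves_throughput(off, on))
```

If the HT-on models each slow to exactly twice the HT-off s/TS, the two setups tie. Because the sketch averages s/TS arithmetically, it is only an approximation when models run at noticeably different speeds; the exact combined rate is the model count divided by the harmonic mean of the readings (`statistics.harmonic_mean`).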

DJStarfox Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370	Message 36540 - Posted: 29 Mar 2009, 4:52:20 UTC - in response to Message 36532.

One of my concerns is that I run multiple projects. With no way to assign projects to CPU cores in a logical fashion, I could not guarantee that two climate models wouldn't fight for the same floating-point units. Also, another project's code may compete with Climate. To make matters worse, I have to run my memory at 1066MHz (CAS 7) because the cheap DDR3 I bought needs more than the 1.65V allowed for the Core i7's memory controller. Therefore, the reduced memory bandwidth may limit performance when running several models at once.

I'm having a hard time getting BOINC to download more than two models at once. When I have more time, I'll try editing BOINC's long/short-term debts to force more models to download. It took nearly an hour to download the latest model type, HadAM3P.

JIM Joined: 31 Dec 07 Posts: 1152 Credit: 22,114,703 RAC: 2,578	Message 36554 - Posted: 29 Mar 2009, 18:57:16 UTC - in response to Message 36540.

I had the same experience with the HadAM3P. It took me 44 minutes to download the first WU on a 15 Mbps cable connection. This very long download seems to happen only on the first HadAM3P download; the second WU downloaded in only about 20 minutes.

DJStarfox Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370	Message 36730 - Posted: 17 Apr 2009, 19:32:37 UTC

Running four HadAM3P models at the same time resulted in speeds between 3.2 and 3.7 s/TS each. Running one model alongside other tasks such as SETI@Home resulted in a speed of 2.3 s/TS. That's quite a difference. I'm migrating my system to the new Fedora 11 Beta, so it'll be June before I can test with 8 CPDN models at once (with HT on).
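The per-model slowdown can hide a gain in combined CPDN output. Here is a rough worked calculation from the figures in the post, assuming each of the four models held a steady speed somewhere in the 3.2 to 3.7 s/TS band and treating s/TS as elapsed seconds per timestep for each model; the helper names are hypothetical. Keep in mind that the 2.3 s/TS figure was measured while other projects such as SETI@Home shared the CPU, so this compares CPDN output only, not total work done by the machine.

```python
def timesteps_per_second(s_per_ts):
    """Convert a seconds-per-timestep reading into timesteps per second."""
    return 1.0 / s_per_ts


def aggregate_bounds(n_models, fastest_s_per_ts, slowest_s_per_ts):
    """Lower/upper bounds on combined timesteps per second for n_models
    running concurrently, assuming each model's speed stays inside the
    reported [fastest, slowest] s/TS band."""
    low = n_models * timesteps_per_second(slowest_s_per_ts)
    high = n_models * timesteps_per_second(fastest_s_per_ts)
    return low, high


if __name__ == "__main__":
    # Figures from the post: four HadAM3P models at 3.2 to 3.7 s/TS each,
    # versus one model at 2.3 s/TS sharing the CPU with other projects.
    low, high = aggregate_bounds(4, 3.2, 3.7)
    single = timesteps_per_second(2.3)
    print(f"4 concurrent models: {low:.2f} to {high:.2f} TS/s combined")
    print(f"1 model:             {single:.2f} TS/s")
    print(f"ratio: {low / single:.1f}x to {high / single:.1f}x")
```

With those numbers, four concurrent models complete roughly 1.08 to 1.25 timesteps per second between them, against about 0.43 for the single model: roughly 2.5 to 2.9 times as much CPDN work per unit of time, even though each individual model runs slower.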