Message boards : Number crunching : New Work Announcements 2024
Author | Message |
---|---|
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Copied from old thread from Glen. Forthcoming batches The following batches are planned for Jan (or early Feb). a/ Weather@Home (Windows)* NZ25 - New Zealand 25km grid, natural forcings. EAS25 - East Asia 25km grid, range of different forcings. b/ HadAM4 (Linux) N216 climatological runs producing high frequency northern-hemisphere output. c/ OpeniFS (Linux) Low resolution batch to look at variation of model results across different hardware *We'll also roll out updated versions of the apps for Weather@Home, HadAM4, & HadSM4 to fix issues with the models failing, particularly on restarts. Although we hope to get these out before the Weather@Home batches it may not happen due to time pressure from the projects funding these batches. Hoping some of these might come sooner rather than later but I have given up holding my breath! |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,575,067 RAC: 15,735 |
Two Weather@Home batches (Windows only) will be going out from today: NZ25 - New Zealand 25km grid, natural forcings. EAS25 - East Asia 25km grid, range of different forcings. Please note that these will still use the same app that has difficulties with the EAS25 grid when the model is restarted. The only workaround is to minimise the number of times the model is restarted, to reduce the risk that the model will crash. The NZ25 case is less affected. I've been working on the app code for some time, correcting various memory issues. Although I have a running version on Windows built with the latest compilers, there are still a few model code changes to be made to overcome the remaining memory access issues affecting these runs. --- CPDN Visiting Scientist |
Send message Joined: 22 Feb 11 Posts: 32 Credit: 226,546 RAC: 4,080 |
And all of them are already taken. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"And all of them are already taken." I have four of the East Asia tasks downloading currently. Edit: should be 6048 of the EAS ones. The others haven't gone out yet. Edit2: I think you must have posted before they went out. There were also some micro batches of four or five tasks each for Linux. |
Send message Joined: 5 Jun 09 Posts: 97 Credit: 3,736,855 RAC: 4,073 |
Just got 6 EAS25. Two of them died in just over 2 minutes, the other four have been running about 5 minutes and are behaving OK (so far). |
Send message Joined: 1 Sep 04 Posts: 3 Credit: 6,372,887 RAC: 6,035 |
I got 4 new tasks and ALL errored out after 11-13 seconds. Am I holding my tongue wrong? 5 computers running Win11 (2/12900KS, 2/13700KF, 1/14700KF) / 1 running Win10 (X99 CPU). ALL have a minimum of 32 GB RAM and ALL are set to leave tasks in memory. Suggestions? |
Send message Joined: 22 Feb 11 Posts: 32 Credit: 226,546 RAC: 4,080 |
I got only one and server requests wait for 1 hour. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,575,067 RAC: 15,735 |
"I got 4 new tasks and ALL errored out 11-13 seconds" Nothing you can do. They tend to fail less often on older hardware. It's a known issue that I am fixing as I type this. Unfortunately we couldn't delay the scientist's project work any longer. --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"I got 4 new tasks and ALL errored out 11-13 seconds" Very much luck of the draw, I think, unless yours are all from the second batch of EAS tasks, which has the same number of tasks as the first batch. I don't use virtual cores or I would try to get some more to check. Tomorrow, when there will be enough data to check, I will have a look to see if it is batch 1002 causing the problems and 1001 is relatively OK. Even if it is one batch causing problems, it is luck of the draw as to which batch you get tasks from. I have eight from 1001 and all have gotten past the 1% mark without problems, but I have suspended half so as not to slow down my tasks from the testing branch. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Worth noting, unlike past practice with CPDN these have a 3 month deadline rather than a year or more. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,126,826 RAC: 15,188 |
Have got 7 of the EAS25 batch. 4 going OK - other 3 not yet started. For info - i7-4790K 4.00GHz CPU, 24Gb RAM, Gigabyte m/b as this is quite old, W10 O/S. |
Send message Joined: 22 Feb 11 Posts: 32 Credit: 226,546 RAC: 4,080 |
They can't use more than 2 GB each. They are still 32-bit. |
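To put rough numbers on the 2 GB point: a 32-bit Windows process gets 2 GiB of user address space by default (assuming the app is not built large-address-aware), so the per-task ceiling is the same however much RAM the host has. The sketch below only works through that arithmetic; the names are hypothetical and the 24 GB host is borrowed from an earlier post as an example.

```python
# Illustrative arithmetic only; these names are hypothetical and not part of BOINC or CPDN.

GIB = 2**30  # bytes in one gibibyte

# Default user-mode address space of a 32-bit Windows process:
# half of the 2**32-byte range (the other half is reserved for the kernel).
CAP_32BIT_BYTES = 2**32 // 2


def max_tasks_by_memory(host_ram_gib: int, per_task_cap_bytes: int = CAP_32BIT_BYTES) -> int:
    """Upper bound on how many tasks fit in RAM if each one used its full cap."""
    return (host_ram_gib * GIB) // per_task_cap_bytes


if __name__ == "__main__":
    print(CAP_32BIT_BYTES / GIB)     # per-task ceiling in GiB
    print(max_tasks_by_memory(24))   # bound for a host with 24 GiB of RAM
```

In practice the number of tasks a host runs is limited by core count and BOINC preferences well before this bound; the point is only that extra RAM does not raise the per-task limit.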
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Have got 7 of the EAS25 batch. 4 going OK - other 3 not yet started. For info - i7-4790K 4.00GHz CPU, 24Gb RAM, Gigabyte m/b as this is quite old, W10 O/S." There are two EAS batches, 1001 and 1002. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,126,826 RAC: 15,188 |
"Have got 7 of the EAS25 batch. 4 going OK - other 3 not yet started. For info - i7-4790K 4.00GHz CPU, 24Gb RAM, Gigabyte m/b as this is quite old, W10 O/S." Eight 1002 and two 1001 (picked up an extra 2, not repeats). |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
"Worth noting, unlike past practice with CPDN these have a 3 month deadline rather than a year or more." About time! I hate when they're 99% done and BOINC just leaves them there suspended. If I didn't intervene they'd get done a year later. |
Send message Joined: 2 Oct 06 Posts: 54 Credit: 27,309,613 RAC: 28,128 |
"Worth noting, unlike past practice with CPDN these have a 3 month deadline rather than a year or more." This is great news! |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Cool, three months is a reasonable compromise. Though I should be able to put a lot more hours/day of compute into the project once our days get sunnier and longer this year. :) |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
The third and fourth batches, 1003 and 1004, were released earlier today. That may well have filled up most of the Windows machines, so the number of tasks available to send will be dropping more slowly now. |
Send message Joined: 24 Dec 19 Posts: 32 Credit: 41,360,512 RAC: 50,408 |
When I first started getting tasks from these latest batches, I noticed that a lot of them failed fairly quickly for some reason. Now, according to the error log, a few of my computers are limited to a quota of 1 task per day. I don't believe my computers are the issue. I believe a batch of bad tasks is what put me in this position. Any way to fix this? What didn't help is that these tasks are for Windows. I was behind on Windows updates, so I had to shut down the BOINC client to install them. Upon restart even more tasks failed. A bunch failed before the restart as well. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
These tasks are prone to failure if BOINC needs to be restarted for any reason. There are 4 batches out there at the moment. I will have a look in the morning to see if there is a difference between the batches. I have noticed some computers seem to crash them all for no apparent reason. Suspending computation before closing down BOINC seems to reduce the failure rate. Once Glen has finished rewriting parts of the code to address the memory issues, the failure rate on subsequent batches should be greatly reduced. |
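For anyone who wants to script the "suspend first, then shut down" routine described above, here is a minimal sketch using `boinccmd`, the command-line tool that ships with the BOINC client. It assumes `boinccmd` is on the PATH and is allowed to talk to the local client (normally by running it from the BOINC data directory or supplying the GUI RPC password). The function names are made up for illustration, and whether this lowers the failure rate is only the observation reported in this thread, not a guarantee.

```python
import shutil
import subprocess

BOINCCMD = "boinccmd"  # assumption: on PATH; adjust to your install location


def shutdown_commands(exe: str = BOINCCMD) -> list[list[str]]:
    """Commands to suspend computation and then ask the client to quit."""
    return [
        [exe, "--set_run_mode", "never"],  # suspend all computation
        [exe, "--quit"],                   # then tell the client to exit
    ]


def resume_command(exe: str = BOINCCMD) -> list[str]:
    """Command to go back to preference-based running after the restart."""
    return [exe, "--set_run_mode", "auto"]


def suspend_then_quit(dry_run: bool = True) -> None:
    """Run (or, by default, just print) the shutdown sequence in order."""
    for cmd in shutdown_commands():
        if dry_run or shutil.which(cmd[0]) is None:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    suspend_then_quit(dry_run=True)
```

Because the run mode is set without a duration, it stays on "never" after the client comes back up, so the command from `resume_command()` (or the Activity menu in BOINC Manager) is needed to start crunching again.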
©2024 cpdn.org