Message boards : Number crunching : Erroneous disk space notices
Author | Message |
---|---|
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
"1. The computer may be crashing lots of tasks which are not being cleared out, thus slowly taking up disk space." My computer is not crashing these WUs; they are crashing because their parameters are faulty. But either way, you're saying this project is incapable of cleaning up behind itself. "2. If the computer is being allowed to continue running tasks while the re-cabling of Oxford is being done and we're 'off the air', then it will also fill up with files waiting to be sent back." Everything has been sent back and the problem persists. Besides, I've never come close to filling my SSDs: they have anywhere from 74 GB to 700 GB available, and this problem still persists. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
Exactly. The message seems to be about storage memory, of which I have no shortage. Maybe RAM is the problem. I think this program reserves memory somehow and hogs it for itself. Problem is, I don't know how to get Linux to give me a report showing what it's reserved for itself, which is obviously too much. "You may be right, but even then, given that as far as I know this exact problem hasn't appeared before, that implies there is something about your setup that triggers the problem. I, like Les, have been crunching since the early days of the project and have not encountered it either personally or on the message boards until now." "Do I misunderstand something, or is something else wrong? The original post shows complaints that the O.P. does not have enough DISK SPACE. And the responses seem to be about the amount of RAM needed." |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Exactly. The message seems to be about storage memory of which I have no shortage. Maybe RAM is the problem. I think this program reserves memory somehow and hogs it for itself. Problem is I don't know how to get Linux to give me a report showing what it's reserved for itself, which is obviously too much." If RAM is the problem, why is the complaint: "climateprediction.net: Notice from server UK Met Office HadAM4 at N216 resolution needs 133.09MB more disk space. You currently have 1774.26 MB available and it needs 1907.35 MB. 9/19/2021 3:57:48 AM Rig-45 -------------------------------------------------------------------------------- climateprediction.net: Notice from server UK Met Office HadAM4 at N216 resolution needs 1907.35MB more disk space. You currently have 0.00 MB available and it needs 1907.35 MB. 9/18/2021 3:45:56 PM Rig-17, Rig-36" Since the disk space you report differs from the disk space whatever process is reporting the shortages (above), why are they different? Just what is the process that makes these error messages? "Rig-45 has 338 GB available on its SSD with 22.5 GiB used of 31.1 GiB RAM and a barely used 16 GiB swap file. Rig-17 has 89 GB available on its SSD with 23.9 GiB used of 31.1 GiB RAM and a barely used 16 GiB swap file." I would think your computer would complain about RAM shortage. But I have never seen that. OTOH, if you did not have enough RAM, would not the Boinc Client just refrain from running the program until enough RAM were available? The Linux command free -h will tell you a lot about RAM and swap space usage. On my machine with 64 GBytes RAM and running Red Hat Enterprise Linux release 8.4 (Ootpa), I get: $ free -h total used free shared buff/cache available Mem: 62Gi 10Gi 2.5Gi 120Mi 49Gi 50Gi Swap: 15Gi 12Mi 15Gi |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
I just doubled the RAM in Rig-47 from 32 GB to 64 GB. Rig-47: https://www.cpdn.org/show_host_detail.php?hostid=1521364 Note that this computer has Free Disk Space = 69.52 GB and yet WCG won't run because it says it needs 500 MB more. 134 World Community Grid 9/28/2021 10:15:20 AM Message from server: OpenPandemics - COVID 19 needs 200.00MB more disk space. You currently have 0.00 MB available and it needs 200.00 MB. 135 World Community Grid 9/28/2021 10:15:20 AM Message from server: Mapping Cancer Markers needs 500.00MB more disk space. You currently have 0.00 MB available and it needs 500.00 MB. aurum@Rig-47:~$ free -h total used free shared buff/cache available Mem: 62Gi 15Gi 25Gi 126Mi 21Gi 46Gi Swap: 15Gi 0B 15Gi Rig-47 has 10 hadam4h WUs and one hadam4 WU running. Hyperthreading is disabled, so this i9-10980XE CPU has 18 CPU cores available. I think the problem lies entirely with the code for hadam. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
Rig-34 had this problem when it had 10 hadam4h WUs running but after completing one it's behaving normally and allowing WCG WUs to run. So it can handle 9 but not 10 WUs. Rig-08 runs as expected with 4 hadam4h WUs plus 9 hadam4 WUs. This is what makes me think the problem is coded in hadam4h. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I just doubled the RAM in Rig-47 from 32 GB to 64 GB. Rig-47: https://www.cpdn.org/show_host_detail.php?hostid=1521364 It seems to me that the generators of these two sets of readings of disk space consumption are reporting about two different machines. |
Send message Joined: 4 Oct 15 Posts: 34 Credit: 9,075,151 RAC: 374 |
"I just doubled the RAM in Rig-47 from 32 GB to 64 GB. Rig-47: https://www.cpdn.org/show_host_detail.php?hostid=1521364" This message shows you exactly where the problem is. If you are right, and you have this much free space on your rigs, then you have configured BOINC wrong in terms of how much disk space it is allowed to use. Please double-check the following options, and test how changes affect your message logs: (tickbox) Use no more than XX GB; (tickbox) Leave at least XX GB free; (tickbox not ticked) Use no more than XX% of total. The above works fine for me. Greets Felix |
Send message Joined: 9 Oct 04 Posts: 82 Credit: 70,017,155 RAC: 3,100 |
"134 World Community Grid 9/28/2021 10:15:20 AM Message from server: OpenPandemics - COVID 19 needs 200.00MB more disk space. You currently have 0.00 MB available and it needs 200.00 MB. 135 World Community Grid 9/28/2021 10:15:20 AM Message from server: Mapping Cancer Markers needs 500.00MB more disk space. You currently have 0.00 MB available and it needs 500.00 MB." You do not have enough disk space available. You might reconfigure your BOINC options as indicated by the post above. And you are right: if a climateprediction.net WU crashes, the zip files will not be cleaned up afterwards. So you will have a lot of worthless information eating up your disk space. Someone mentioned it before; there are two solutions: restart the project, or clean up all the crashed WUs by hand in the corresponding project folder. Yes, it is not easy or hassle-free to run climateprediction.net, but it is fun and will help future generations! |
Send message Joined: 15 May 09 Posts: 4542 Credit: 19,039,635 RAC: 18,944 |
If only running ten, then memory wouldn't be a problem. If using hyperthreading, and all 36 cores then it would, though a bigger problem then would be the massive slow down due to lack of space in Cache memory and swapping from there to RAM slowing things down. I am at a loss to explain the disk space being misreported though as I have for experiment only run 16 of the N216 tasks at once with broadly similar disk space available with no issues. With respect to crashed tasks not cleaning up after themselves, this seems to me much less of a problem than it used to be and it only rarely seems to happen to me now whereas it used to happen frequently. That may be because outside of testing branch, I only get the very occasional crashed task these days. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
Well then it seems you're wrong. "I just doubled the RAM in Rig-47 from 32 GB to 64 GB. Rig-47: https://www.cpdn.org/show_host_detail.php?hostid=1521364" "It seems to me that the generators of these two sets of readings of disk space consumption are reporting about two different machines." |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
I already said they're set to 95%. I presented that in different ways. Exactly what should these settings be??? "I just doubled the RAM in Rig-47 from 32 GB to 64 GB. Rig-47: https://www.cpdn.org/show_host_detail.php?hostid=1521364" |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
One computer submitted its last WU and I detached it. My available disk space increased by 60 GB!!! Yes, this is a buggy program that cannot clean up behind itself. "134 World Community Grid 9/28/2021 10:15:20 AM Message from server: OpenPandemics - COVID 19 needs 200.00MB more disk space. You currently have 0.00 MB available and it needs 200.00 MB. 135 World Community Grid 9/28/2021 10:15:20 AM Message from server: Mapping Cancer Markers needs 500.00MB more disk space. You currently have 0.00 MB available and it needs 500.00 MB." "You do not have enough disk space available. You might reconfigure your BOINC options as indicated by the post above." I've tried this command several times and it has no effect whatsoever: /etc/init.d/boinc-client restart. I assume what you mean is the Reset in BOINCmgr: "Reset project: Stop the project's current work, if any, and start from scratch. Use this if BOINC has become stuck for some reason. Any unreported results and tasks in progress will be discarded." I don't want to wipe out all current work, having committed weeks to it. Better to let them finish and Detach CP. As for doing it by hand, how can I know for sure what is garbage and what is still in use??? I agree climate studies should be first or second priority. EDIT: I looked in the CP project folder and it's obvious there are many outdated folders. I deleted them and it started playing nice with others again. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"It seems to me that the generators of these two sets of readings of disk space consumption are reporting about two different machines." I see I am making the same mistake that some others may be making: confusing RAM and DISK SPACE. The UNIX and Linux free command deals with RAM, and the df command deals with disk space. And with the large amount of RAM some of us have, which exceeds the amount of disk space that used to be common (20 years ago), it is an easy mistake to make. I remember how thrilled I was when someone came out with a 2 GByte hard drive. So if the free command says you have 46 GBytes of RAM available, and whatever source you get your error messages from says it needs 200 MBytes or 500 MBytes of disk space, then what I said previously, while not wrong, is nonsense: a clear case of comparing apples to oranges. So we should be comparing apples to apples, and the Linux command for that is df. On my machine, for example, I run the Boinc client in a dedicated partition. So my total disk space looks enormous to me, but the amount dedicated to Boinc is much more modest. The boinc partition is 118 GBytes, and I am using only 23% of it. There are four N216 CPDN tasks up in there and some WCG, Rosetta, and Universe ones as well. $ df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/rhel-root 50G 9.7G 41G 20% / /dev/mapper/rhel-home 410G 19G 392G 5% /home /dev/nvme0n1p2 1014M 349M 666M 35% /boot /dev/nvme0n1p1 599M 17M 583M 3% /boot/efi /dev/sdb3 118G 25G 87G 23% /var/lib/boinc <---<<< /dev/sdb1 92G 60M 87G 1% /D3P1 /dev/sda2 98G 19G 79G 20% /home/jeandavid8/Sound /dev/sda1 489G 204G 285G 42% /home/jeandavid8/Videos /dev/sdb2 92G 13G 75G 15% /D3P2 /dev/sdb7 196G 1.8G 194G 1% /D3P7 /dev/sdb6 196G 2.4G 193G 2% /D3P6 /dev/sdb5 387G 16G 371G 5% /home/margaret |
Send message Joined: 9 Oct 04 Posts: 82 Credit: 70,017,155 RAC: 3,100 |
"With respect to crashed tasks not cleaning up after themselves, this seems to me much less of a problem than it used to be and it only rarely seems to happen to me now whereas it used to happen frequently. That may be because outside of testing branch, I only get the very occasional crashed task these days." I would not say so: if Aurum has the problem with disk space and, as it seems to me, a lot of crashed models, these crashed models will eat up a lot of space quite fast! Since I got WSL working on two Win10 computers, I have had to clean up crashed WUs by hand every time Win10 decided to restart my computer after the monthly update cycle without my intervention. And I remember well going around my Linux computers with a written list of the WU numbers reported on climateprediction.net as crashed, and cleaning them up on the hard disk so new ones could be downloaded again. This is the reason I do not run climateprediction.net on my server. |
Send message Joined: 9 Oct 04 Posts: 82 Credit: 70,017,155 RAC: 3,100 |
"EDIT: I looked in the CP project folder and it's obvious there's many outdated folders. I deleted them and it started playing nice with others again." Great that it worked! Unfortunately, this is a little housekeeping one has to do on climateprediction.net when there is no disk space left. |
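
The gap discussed in this thread, between the free space that df reports and the "0.00 MB available" in the server notices, can be illustrated with a small sketch. It assumes, as an approximation of the BOINC client's behaviour and not a copy of its source, that BOINC applies the most restrictive of the three disk preferences listed earlier ("Use no more than XX GB", "Leave at least XX GB free", "Use no more than XX% of total") and then subtracts what its own data directory already occupies. The function names, the default path /var/lib/boinc (taken from the df listing in the thread) and the 95% setting are illustrative assumptions.

```python
import os
import shutil
import sys

GB = 1000 ** 3  # decimal gigabytes; an assumption about how the limits are counted


def dir_size(path):
    """Total size in bytes of regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def boinc_disk_available(total, free, boinc_used,
                         max_used_gb=0.0, min_free_gb=0.0, max_used_pct=100.0):
    """Estimate how many more bytes BOINC would allow itself to use.

    Hypothetical model: the allowance is the smallest of the three
    preference limits, minus what BOINC already occupies.
    max_used_gb == 0 is treated as "limit not set".
    """
    limits = [
        total * max_used_pct / 100.0,            # "Use no more than XX% of total"
        boinc_used + free - min_free_gb * GB,    # "Leave at least XX GB free"
    ]
    if max_used_gb > 0:
        limits.append(max_used_gb * GB)          # "Use no more than XX GB"
    return max(0.0, min(limits) - boinc_used)


if __name__ == "__main__":
    # Path is an assumption; pass your own BOINC data directory as an argument.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/boinc"
    usage = shutil.disk_usage(data_dir)
    used_by_boinc = dir_size(data_dir)
    room = boinc_disk_available(usage.total, usage.free, used_by_boinc,
                                max_used_pct=95.0)
    print(f"free on filesystem : {usage.free / GB:8.2f} GB")
    print(f"used by BOINC data : {used_by_boinc / GB:8.2f} GB")
    print(f"estimated allowance: {room / GB:8.2f} GB")
```

Under this model, a hypothetical 500 GB disk with 70 GB free and 60 GB of BOINC data yields an allowance of zero as soon as "Use no more than" is set to 50 GB, even though df still shows plenty of room. That would be consistent with the notices quoted above, and with the space recovered after the outdated project folders were deleted, but it is an illustration and not a diagnosis of any particular rig.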