FreeMan

Members
  • Posts

    1520

Everything posted by FreeMan

  1. Due to overheating issues, my data disk rebuild is pausing (using Parity Check Tuning) when drives get too hot.
     * At 00:21 this morning, the rebuild was paused due to heat at 57.7% complete
     * At 00:30 the normally scheduled parity check was initiated by the scheduler
     * At 00:35 the parity check (which does seem to be rebuilding the data) paused again due to temperatures, now at 0.4% complete
     Screenshots of the Pushover notifications showing this: I reported this in the General forum, where JorgeB reproduced the issue by manually pausing a check on a test system without PCT installed. That confirms it's a bug in the base Parity Check launch logic, which allows the scheduled check to restart a running-but-paused check or (much more critically) rebuild. (Also, I pointed out the "%%" typo and itimpi is going to fix that in PCT.) I agree with JorgeB that this is something of an edge case, but for those running with single parity on less than the highest-end hardware, it does extend the "at risk" time of running one drive down. I'm glad I was "only" at 58% complete, not 90+% complete - that would have been extremely frustrating! The diagnostics are from when I discovered the problem earlier today. nas-diagnostics-20210701-0749.zip
  2. Unless this can be moved to the bug thread, I'll post an issue later today. Thanks for duplicating it.
  3. It adds nothing to the discussion, but it's quite sad: (Wow, sorry that's so BIG!) Those are the Pushover notices from when it reset. I was wrong in my initial report - the Parity Check is scheduled for 00:30, not 03:30, on the first of the month. There is a minor issue of the extra "%" sign in there, but I think this is the first time I've ever noticed it, so it's definitely a minor issue.
  4. Thanks, @itimpi. Not trying to pick on you with these PCT reports lately, I promise! I've discovered that I'm pretty good at finding bugs in others' code, not so good at finding them in my own. Frankly, if it weren't for PCT automatically pausing the rebuild, I may well have cooked a drive by now, so I don't really mind all that much. I have no idea why it's running so hot after installing this one additional drive. Fixing that is my next highest priority.
  5. Sigh. OK. Before I do anything else, I need to get this disk rebuilt. Then I need to get the cooling sorted. If I remember, I'll get back to checking this out after those are all sorted out.
  6. The "minimum space free" setting is the minimum free space that UNRAID looks for on a particular disk available to a share before copying a file from the cache to that disk. It does not ensure that this amount of space is always available on a share. For example (all numbers made up, with minimum free space set to 5GB): if you rip a BluRay, create a 4.9GB file, then write it to the server into the cache-enabled plexMedia share, UNRAID will actually write it directly to the cache in the plexMedia directory. When the mover kicks off, it will scan Disk1, discover that it has a plexMedia directory that could be used, and initiate the minimum-disk-space-free check. It discovers that there is 5.1GB of free space on Disk1, so the min free space check passes (more than 5GB free is "good to go"), and it moves your new 4.9GB file to Disk1 in the plexMedia share, leaving 0.2GB free on Disk1. Obviously, your tolerances were even tighter than my made-up example; however, it's not a bug.
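
A rough sketch of the check described above, using the same made-up numbers. This is not UNRAID's actual mover code; the function name and the MB units are assumptions for illustration only. The point is that the check looks at free space before the move and ignores the size of the file.

```python
def free_after_move(disk_free_mb, file_mb, min_free_mb):
    """Hypothetical model of the minimum-free-space check.

    Assumption: the disk is accepted whenever its current free space is
    above the threshold, regardless of how big the incoming file is.
    Returns the free space left after the move, or None if the disk is
    skipped. Does not model a file larger than the free space.
    """
    if disk_free_mb <= min_free_mb:
        return None  # disk fails the check and is skipped
    return disk_free_mb - file_mb


# 5.1GB free, 4.9GB file, 5GB minimum: the check passes,
# and the disk ends up with far less than the "minimum" free.
remaining = free_after_move(disk_free_mb=5100, file_mb=4900, min_free_mb=5000)
print(remaining)
```

Setting the minimum to something larger than the biggest file you expect to write is the usual way to avoid ending up with a nearly full disk.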
  7. I believe this is a bug, but thought I'd first post here in General Support for confirmation. I had to replace a disk and I kicked off the disk rebuild yesterday. My server is suddenly having overheating issues like never before, so Parity Check Tuning has been pausing the disk rebuild to allow the drives to cool back down. Therefore, the rebuild has been going very slowly (it runs for about 30-45 minutes, the drive overheats, it pauses for 15-20 minutes. Lather. Rinse. Repeat.) When I went to bed last night, I snuck a quick peek and it was about 55% complete on the disk rebuild. (Yes, I'll be looking into the overheating as soon as I've got the disk rebuilt. In the meantime, I've got extra fans blowing, though I'm not sure it's helping a lot...) When I woke up this morning and checked on the rebuild, I was quite shocked to find that it was in the high single-digit percentage (7-9% somewhere). It seems that my regularly scheduled monthly parity check kicked off at about 3:30 this morning. The parity check history does not show anything from the data rebuild (TBH, I don't know if it should): And, more importantly, the drive is still showing emulated: I'd imagine that whatever happened will show up here: nas-diagnostics-20210701-0749.zip If this is indeed a bug, is it possible to just move this to the Bug Reports section? If not, I'll retype it all.
  8. TBH, it only seemed to get out of sync when I manually resumed the rebuild. I've left the Main page up (mostly) as this has been doing its thing, and it seems to be OK. I'll be sure to leave it up for a while to see if it does get out of sync again.
  9. (Originally posted for support) The Parity-Sync/Data-Rebuild button text toggles between "Pause" and "Resume", depending on the state of the currently running parity check. Using the Parity Check Tuning parameters, I have my parity check (actually a data rebuild in this case) set to pause on reaching the warning temperature and resume when dropping 2 degrees. Unfortunately, the new disk that's having the data rebuilt onto it is getting rather hot, so it triggered the thermal pause. I manually spun down disks to let everything cool down a bit, then hit the "Resume" button to manually restart the data rebuild. A while later, I think after it had automatically paused/resumed again (though I don't recall at this point), I looked at the Main screen again and the button text read "Pause" while the "Elapsed time" display read "50 minutes (paused)". i.e. the button text read "Pause" when the process was actually paused, and it should have read "Resume". To reproduce:
     1) Set Parity Check Tuning to pause during a check/rebuild based on temperature
     2) Shut off air flow through your test server
     3) Start a parity check
     4) Wait for a drive to start cooking
     5) Manually resume the check while the drive is still hotter than PCT's "resume" temperature
     6) Leave the Main page open, monitoring temps and check progress
     7) At some point, the drive will get hot, PCT will pause the check and the text will be out of sync with the current check activity
     Priority level set to "annoyance" because it does seem to get itself back in sync if PCT is allowed to do its thing without interference. I did not think to grab diagnostics while the display was out of sync; these are fresh as this report is being created, while the display is in sync. nas-diagnostics-20210630-0957.zip
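
For anyone trying to picture the pause/resume behaviour in the report above, here is a minimal sketch of that kind of temperature hysteresis. It is not PCT's code: the function names, the 45C warning temperature, and the reading of "dropping 2 degrees" as "2 degrees below the warning temperature" are all assumptions.

```python
def should_be_paused(paused, temp_c, pause_at_c, resume_drop_c=2):
    """Hypothetical hysteresis rule: pause at the warning temperature,
    resume only once the drive is resume_drop_c degrees below it."""
    if not paused and temp_c >= pause_at_c:
        return True
    if paused and temp_c <= pause_at_c - resume_drop_c:
        return False
    return paused  # in between the two thresholds, keep the current state


def simulate(temps_c, pause_at_c, resume_drop_c=2):
    """Run the rule over a series of readings and return the paused flags."""
    paused = False
    states = []
    for temp_c in temps_c:
        paused = should_be_paused(paused, temp_c, pause_at_c, resume_drop_c)
        states.append(paused)
    return states


# Made-up readings with a made-up 45C warning temperature.
print(simulate([40, 45, 44, 43, 42], pause_at_c=45))
```

A manual resume while the drive sits between the two thresholds (step 5 above) corresponds to forcing `paused` to False at, say, 44C; under this rule the operation then runs until the drive reaches the warning temperature again.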
  10. Somehow, the button label has gotten itself out of sync with what's actually happening: @bonienl does this fall under your purview? Is there anything else I can get you to help sort it out?
  11. I replaced a drive and I'm rebuilding. I've got this rather odd-looking display of info on the Array Operation portion of the Main display: It appears that the rebuild has paused, but the button option is to "Pause", not "Resume". Unfortunately, the current server location is not ideal for airflow, so the new drive got a bit toasty and Parity Check Tuning kicked in and paused the rebuild based on temp. I'm about 99% sure that I then went in and told it to resume (instead of letting it do so automatically - I know...). Is this to be expected based on my actions? If I click "Pause", should it change to "Resume" and allow me to manually resume the operation? I'm getting a fan to improve air flow to help keep the server a bit cooler, so I may disable Parity Check Tuning's ability to pause the rebuild, since I also want it to complete...
  12. Did you do an upgrade recently? There was a change in the default behavior of docker networking - even for existing docker installs. I don't recall if it was the 6.8 or 6.9 release that did this, but I noted similar issues. If you look at the Docker page, you should see that many/all of them are on a 172.* subnet. Can't explain all the details to you, I'm sure someone will stop by who can, but it was intentional and it did cause me a few headaches. Ended up having to reconfigure a few things from the 172.* network to the 192.168.* IP address to talk "back" to the server that way.
  13. Since parity is calculated across all disks, you can't "just" have UNRAID remove parity info about one disk. If you don't need the disks, stop the array, do a new config, then add in the 10 disks you still need (they can be disk1 - disk10, no need to keep them in the same Disk# position they were in before), and let it rebuild parity. The only "non-parity-rebuild" option is to skip parity entirely. You'll then have 14 disks available to the Unassigned Devices plugin (assuming that's installed). Or, you can physically pull the drives. BTW- nice server names. Exactly the same as mine!
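
To illustrate why parity can't simply "forget" one disk, here is a small sketch assuming plain XOR parity across the data disks, which is how single parity is commonly computed. The byte values are made up and the `parity` helper is hypothetical, not anything from UNRAID itself.

```python
from functools import reduce
from operator import xor


def parity(disks):
    """XOR the disks together byte by byte (all disks the same length)."""
    return bytes(reduce(xor, column) for column in zip(*disks))


disk1 = bytes([0x0F, 0xF0])
disk2 = bytes([0x33, 0x55])
disk3 = bytes([0xAA, 0x01])

full_parity = parity([disk1, disk2, disk3])

# Any one missing disk can be rebuilt from parity plus the remaining disks.
rebuilt_disk3 = parity([full_parity, disk1, disk2])
print(rebuilt_disk3 == disk3)

# Drop disk3 from the array and the old parity no longer matches the
# remaining disks, so parity has to be recalculated.
print(parity([disk1, disk2]) == full_parity)
```

Every parity byte depends on the corresponding byte of every data disk, which is why removing disks ends with a parity rebuild.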
  14. As with many other web display issues: have you tried clearing the browser cache and cookies? Have you tried a different browser? Also, I strongly recommend enabling system notifications - they'll give you early warnings (via email, browser pop-ups, or notification app to your phone or other computer) about any issues that may impact your server.
  15. For future reference when UNRAID indicates issues with a disk:
      1. STOP all activity on the server
      2. Post a full diagnostics (Tools -> Diagnostics) zip file here with an explanation of the situation
      3. WAIT until someone responds with instructions
      3a. KEEP waiting until someone responds - general tinkering tends to lead to data loss
      4. Follow them TO THE LETTER
  16. At the time, there was nothing that I'm aware of that was writing to Disk8. I had a file downloading, but that had completed before I witnessed the CPU pinned for a minute or two. Unless, of course, it was caching at the server or drive level and the download had completed but it wasn't finished actually writing to disk. As related to the other issue I posted (that you addressed a couple of hours ago), I think I'm going to swap my new 8TB drive for the current disk8, just to get it out of the mix and see what happens. If all goes well in that scenario, I might consider adding this current disk8 back into the array, moving the contents of the 2, quite old, 4TB drives to it (controlled moves overnight where there should be nothing else going on) and excluding this particular drive from all shares to prevent additional writes from going to it. Or, I may just bite the bullet, pick up another non-SMR drive, and replace the two 4TBs with that.
  17. Again this afternoon, I've run into this: and it's been like that for several minutes - no jumping around, just pegged. I grabbed a couple of screen shots from top showing shfs taking a fair amount of CPU: and I grabbed diagnostics again. nas-diagnostics-20210628-1344.zip I discovered this page which seems to indicate that shfs may no longer be relevant. I don't recall what distro UNRAID is based on, but Arch, at least, seems to be deprecating it. Also, the server's been quite busy most of the day today. Any insight whatsoever into what may be causing this, or how to figure out what's causing it, would be most appreciated!
  18. Thanks! I will do. And, I'll keep an eye out for a good deal on a new drive.
  19. Disk1 is at 7.8TB full - I don't recall having had issues like this as it was filling up - while Disk8 is at only 3.15TB full. Of course, I had been using cache for all shares for a very long time, and recently (about 6-9 months ago) switched to not caching writes to most of my shares, so I may have had writes this slow and just never noticed because it was in the middle of the night. I've switched the one share that's getting the majority of the writes back to using cache, so maybe the problem will appear to go away. Are the 2 Reported Uncorrect errors enough of a worry to swap the drive right now, or would you suggest letting it ride for now while keeping an eye on it? (Maybe run a short or even long SMART test monthly or so, after the parity check is complete.)
  20. Diskspeed shows that it's not entirely unreasonable in terms of read speed. And, it's on par with Disk1, which is the other Barracuda drive I've got. Unfortunately, that's only read speed being tested, not write speed. I've not specifically noticed issues with writing to Disk1, but this is really the first time I've tied some of the system issues I've had to writing to a specific disk, so I'm not sure if Disk1 is contributing to the general slowness I'm having. The drive's less than 18 months old, but, of course, the warranty was only 12 months, so I guess I'm a bit hosed on this one. I do have a new drive (Iron Wolf, I believe) that was going to go in place of one of the rather old 4TB drives. Would I be better off replacing this disk instead, and replacing the apparently otherwise functional 4TBs later?
  21. I have 2 "reported uncorrect" errors on my reasonably new disk 8. It also seems that any time I try to read/write to this disk, especially if it's more than one file at a time, I'm getting extremely slow response from the server. Is this an indication that the disk is failing or has other issues that need to be addressed immediately? nas-diagnostics-20210628-0744.zip
  22. CPU utilization since 07:00 today: I just rebooted the server. The wife wants to watch a race, and that takes a higher priority than gathering more info. Anyone have any ideas what might be causing this or suggestions on how to find out what is?