FreeMan

Members · 1,520 posts

Posts posted by FreeMan

  1.  

    11 hours ago, AndrewMc said:

Only 1MB left, when the limit is clearly 5GB.

The "minimum space free" setting is the minimum free space that UNRAID looks for before copying a file from the cache to a particular disk available to a share. It does not guarantee that a share always has at least that much space free.

     

For example (all numbers made up for this example): if you rip a BluRay, creating a 4.9GB file, and write it to the server's cache-enabled plexMedia share, UNRAID will actually write it directly to the cache in the plexMedia directory. When the mover kicks off, it scans Disk1, discovers that it has a plexMedia directory that could be used, and runs the minimum-free-space check. It finds 5.1GB of free space on Disk1, so the check passes (more than 5GB free is "good to go"), and the mover moves your new 4.9GB file into the plexMedia share on Disk1, leaving 0.2GB free. Obviously, your tolerances were even tighter than my made-up example; however, it's not a bug.
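The key point is that the mover only checks free space before the copy; the file's size never enters the test. A minimal sketch (this is my illustration, not UNRAID's actual code; the function name and GB units are mine):

```python
def disk_eligible(free_gb, min_free_gb):
    """Sketch of the mover's check: it compares the disk's free space
    against the min-free setting BEFORE the copy; the size of the file
    being moved is never part of the test."""
    return free_gb >= min_free_gb

# Numbers from the example above: Disk1 has 5.1GB free, min-free is 5GB
print(disk_eligible(5.1, 5.0))   # True - the 4.9GB file gets moved anyway
print(round(5.1 - 4.9, 1))       # 0.2 - free space left on Disk1 afterwards
```

That's why a disk can end up with far less free space than the setting: the check passes the instant free space exceeds the threshold, regardless of how large the incoming file is.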

     

2. I believe this is a bug, but thought I'd first post here in General Support for confirmation.

     

I had to replace a disk and I kicked off the disk rebuild yesterday. My server is suddenly having overheating issues like never before, so Parity Check Tuning has been pausing the disk rebuild to allow the drives to cool back down. Therefore, the rebuild has been going very slowly (it runs for about 30-45 minutes, the drive overheats, it pauses for 15-20 minutes. Lather. Rinse. Repeat.) When I went to bed last night, I snuck a quick peek and it was about 55% complete on the disk rebuild. (Yes, I'll be looking into the overheating as soon as I've got the disk rebuilt. In the meantime, I've got extra fans blowing, though I'm not sure it's helping a lot...)

     

    When I woke up this morning and checked on the rebuild, I was quite shocked to find that it was in the high single-digit percentage (7-9% somewhere). It seems that my regularly scheduled monthly parity check kicked off at about 3:30 this morning.

     

    The parity check history does not show anything from the data rebuild (TBH, I don't know if it should):

    image.png.3dc65005a8564d833edfe91bb1a42988.png

     

    And, more importantly, the drive is still showing emulated: image.png.7433a00901dd4e7579127cf87fb76e7d.png

    I'd imagine that whatever happened will show up here:

    nas-diagnostics-20210701-0749.zip

     

    If this is indeed a bug, is it possible to just move this to the Bug Reports section? If not, I'll retype it all. :/

  3. 7 hours ago, itimpi said:

    If the parity check is paused or resumed by the Parity Check Tuning plugin then the GUI can get out of sync if left on the Main tab until you either do a refresh on the page or navigate away and back.

     

    I would like to fix this with an automated refresh if the GUI is left on the Main page but do not know of any way to achieve this from within the plugin.

     

    TBH, it only seemed to get out of sync when I manually resumed the rebuild. I've left the Main page up (mostly) as this has been doing its thing, and it seems to be OK. I'll be sure to leave it up for a while to see if it does get out of sync again.

  4. I replaced a drive and I'm rebuilding.

     

    I've got this rather odd looking display of info on the  Array Operation portion of the Main display:

    image.png.736d9775bd5fdb7c97dfc2f36ab7031b.png

    It appears that the rebuild has paused, but the button option is to "Pause", not "Resume".

     

    Unfortunately, the current server location is not ideal for airflow, so the new drive got a bit toasty and the Parity Check Tuning kicked in and paused the rebuild based on temp. I'm about 99% sure that I then went in and told it to resume (instead of letting it do so automatically - I know...). Is this to be expected based on my actions?

     

    If I click "Pause", should it change to "Resume" and allow me to manually resume the operation?

     

    I'm getting a fan to improve air flow to help keep the server a bit cooler, so I may disable the Parity Check Tuning's ability to pause the rebuild, since I also want it to complete...

  5. Did you do an upgrade recently? There was a change in the default behavior of docker networking - even for existing docker installs. I don't recall if it was the 6.8 or 6.9 release that did this, but I noted similar issues.

     

    If you look at the Docker page, you should see that many/all of them are on a 172.* subnet.

     

    Can't explain all the details to you, I'm sure someone will stop by who can, but it was intentional and it did cause me a few headaches. Ended up having to reconfigure a few things from the 172.* network to the 192.168.* IP address to talk "back" to the server that way.

  6. Since parity is calculated across all disks, you can't "just" have UNRAID remove parity info about one disk.

     

    If you don't need the disks, stop the array, do a new config, then add in the 10 disks you still need (they can be disk1 - disk10, no need to keep them in the same Disk# position they were in before), and let it rebuild parity. The only "non-parity-rebuild" option is to skip parity entirely.
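The "parity covers every disk" point can be illustrated with a toy single-parity (XOR) sketch; this is a simplification for illustration, not UNRAID's implementation, and the block values are made up:

```python
# Toy single-parity example: parity is the XOR of every data disk,
# so dropping disks from the array invalidates it.
data_disks = [0b1010, 0b0110, 0b1111]

parity = 0
for block in data_disks:
    parity ^= block           # parity depends on ALL disks

# Remove one disk: the old parity no longer matches the remaining data,
# which is why a new config is followed by a full parity rebuild.
remaining = data_disks[:-1]
new_parity = 0
for block in remaining:
    new_parity ^= block

print(bin(parity), bin(new_parity))  # the two values differ
```

Since every parity bit mixes in every disk, there is no way to "subtract" just one disk's contribution without recomputing across the rest, hence the rebuild.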

     

    You'll then have 14 disks available to the Unassigned Devices plugin (assuming that's installed). Or, you can physically pull the drives.

     

    BTW- nice server names. Exactly the same as mine! :)

  7. Like many other web display issues, have you tried clearing the browser cache and cookies? Have you tried a different browser?

     

    Also, strongly recommend enabling system notifications - it'll give you early warnings (via email, browser pop-ups, or notification app to your phone or other computer) about any issues that may impact your server.

  8. For future reference when UNRAID indicates issues with a disk:

     

    1. STOP all activity on the server

    2. Post a full diagnostics (Tools -> Diagnostics) zip file here with an explanation of the situation

    3. WAIT until someone responds with instructions

    3a. KEEP waiting until someone responds - general tinkering tends to lead to data loss

    4. Follow them TO THE LETTER

  9. 6 minutes ago, JorgeB said:

    Does it get better if you stop all writing to disk8? Dashboard CPU usage will show i/o load.

     

     

    At the time, there was nothing that I'm aware of that was writing to Disk8.  I had a file downloading, but that had completed before I witnessed the CPU pinned for a minute or two. Unless, of course, it was caching at the server or drive level and the download had completed but it wasn't finished actually writing to disk.

     

    As related to the other issue I posted (that you addressed a couple of hours ago), I think I'm going to swap my new 8TB drive for the current disk8, just to get it out of the mix and see what happens.

     

    If all goes well in that scenario, I might consider adding this current disk8 back into the array, moving the contents of the 2, quite old, 4TB drives to it (controlled moves overnight where there should be nothing else going on) and excluding this particular drive from all shares to prevent additional writes from going to it.

     

    Or, I may just bite the bullet, pick up another non-SMR drive, and replace the two 4TBs with that.

  10. Again this afternoon, I've run into this:

    image.png.53731fd8ab9c166470ded3f3ff43e3d4.png

    and it's been like that for several minutes - no jumping around, just pegged.

    I grabbed a couple of screen shots from top showing shfs taking a fair amount of CPU:

    image.png.e7b48a248a32430e5cab90bf22462ce3.png

    image.png.800ceb3d3441e1b1efb07604db9b7104.png

    image.png.b56da152bfae3727c83f637bc567e99b.png

     

    and I grabbed diagnostics again.

    nas-diagnostics-20210628-1344.zip

     

    I discovered this page which seems to indicate that shfs may no longer be relevant. I don't recall what distro UNRAID is based on, but Arch, at least, seems to be deprecating it.

     

    Also, the server's been quite busy most of the day today

    image.png.9ad43a8932ae074c389683a579dde74d.png

     

    Any insight whatsoever to what may be causing this or how to figure out what's causing it would be most appreciated!

     

     

  11. Disk1 is at 7.8TB full - I don't recall having had issues like this as it was filling up, while Disk8 is at only 3.15TB full. Of course, I had been using cache for all shares for a very long time, and recently (about 6-9 months ago) switched to not caching writes to most of my shares, so I may have had writes this slow and just never noticed because it was in the middle of the night. I've switched the one share that's getting the majority of the writes back to using cache, so maybe the problem will appear to go away.

     

Are the 2 Reported Uncorrect errors worth enough worry to swap the drive right now, or would you suggest letting it ride for now while keeping an eye on it? (Maybe run a short or even long SMART test monthly or so, after the parity check is complete.)

  12. Diskspeed shows that it's not entirely unreasonable in terms of read speed

    image.png.ae80794879851ffdf89feb2a2ecd3a50.png

    And, it's on par with Disk1 which is the other Barracuda drive I've got

    image.png.15164aa9fdce44ba0fdbb59c1cd22e9b.png

     

    Unfortunately, that's only read speed being tested, not write speed. I've not specifically noticed issues with writing to Disk1, but this is really the first time I've tied some of the system issues I've had to writing to a specific disk, so I'm not sure if Disk1 is contributing to the general slowness I'm having.

     

    The drive's less than 18 months old, but, of course, the warranty was only 12 months, so I guess I'm a bit hosed on this one. I do have a new drive (Iron Wolf, I believe) that was going to go in place of one of the rather old 4TB drives. Would I be better off replacing this disk, instead and replacing the apparently otherwise functional 4TBs later?

  13. OK, I don't have the CPU completely pinned at the moment, but the server's becoming unusable.

     

I've got 2 torrents downloading at ~1Mb/s or less, and I'm trying to play a video on my Kodi box, but it simply won't start - it spins for 30-60 seconds then goes right back to the menu like nothing ever happened. I did just talk to my son who is upstairs watching videos on his laptop. He's using a web-based Emby client, watching over WiFi with absolutely no problems whatsoever.

     

    I'm seeing a lot of this in top:

    image.png.6d2079e555b8a90ae956e29504ab65d9.png

     

    Here are the diagnostics that I just pulled while this is going on: nas-diagnostics-20210621-2313.zip

     

    Any suggestions?

  14. 6 hours ago, JorgeB said:

If it's working correctly now it's difficult to say, but maybe it was some docker running in the background. If it happens again (the parity check being very slow), post diags grabbed at that time.

    I appreciate that and will do so. That's also what I've done for the initial issue in this thread of the very high CPU utilization.

     

    I seem to be able to work around these issues (reboot the server fixes the high CPU, pause the parity check fixes the very slow speed) but that's doing me nothing to track down the actual cause of the issues and come up with some sort of resolution.

The parity check finally completed. It's reporting 600MB/s because the last bit of the run was only 3 hours and it was on a set of 8TB disks. The maths are a bit off... :/ It doesn't seem to have noticed that I manually paused/resumed several times.
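An inflated average like that is consistent with the whole array size being divided by only the final stretch of run time, with earlier paused/resumed portions not counted. A quick back-of-the-envelope check (the figures and the averaging method are my guesses, not how UNRAID necessarily computes it):

```python
# Hypothetical: average speed computed as (array size) / (final active run),
# ignoring the earlier paused/resumed portions of the check.
array_size_mb = 8_000_000        # 8TB parity disk, in MB
final_run_seconds = 3 * 3600     # the last ~3-hour stretch

reported_speed = array_size_mb / final_run_seconds   # MB/s
print(round(reported_speed))     # ~741 MB/s: far beyond what spinning disks can do
```

Dividing the full array by a fraction of the real elapsed time lands in the same implausible several-hundred-MB/s ballpark as the 600MB/s figure shown.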

     

    This did pop up within the last hour or so of the parity check:

    image.png.38707057f74cca94b0e43016c4c96d8a.png

     

I can acknowledge it, I know, but is that something to be concerned about? It's the only error that came out of the parity check.

nas-diagnostics-20210612-1956.zip

  16. Extended SMART tests finally completed... No errors on any HDD or SSD.

    nas-diagnostics-20210612-1050.zip

     

The correcting parity check from the unclean shutdown has been resumed with about 3.6TB to complete. Since the initial pause of this check:

    On 6/10/2021 at 10:52 AM, FreeMan said:

    I've never had a parity check run anywhere near this slow. I mean I don't have the fastest setup in the world, but my last 21 checks (what fits on the first page of the parity check display) averaged 106MB/sec. Now it's running at 19.4? It slows down as it nears the 4TB mark because of a couple of older 4TB drives in the mix, but never that slow...

The check seems to be running at a more normal speed.

     

    Any other suggestions of what to do or where to look to determine what the issue may be?

     

    I've got a pre-cleared 8TB drive that's ready to go in for a replacement for either Drive 3 or 4 (whichever has had the most spinning hours - they're close). Would it make sense at this point to do the disk replacement (after the parity check completes), or should I try to sort this out before risking anything?

  17. I've got the extended SMART tests running now.

    I'll be gone most of the rest of the day, but if that finishes before I leave, I'll resume the parity check before I go.

     

    UPDATE: Eh, decided to resume the parity check as the SMART tests are only at about 10% completion. As of now, it's running at ~110 MB/s. It'll be late this evening before I'm back home to check on it (though I'll touch base via ControlR & WireGuard, when possible). All dockers are shut down for now, so that may well be helping, too.

OK. I had issues with Speed Gaps on Drive 3 yesterday, to the point where it stopped the test and wouldn't continue. I shut down the DiskSpeed docker, watched some TV, and resumed the parity check overnight.

     

This morning, I shut down all dockers, fired up DiskSpeed and ran a test again. Everything seems normal to me:

    image.png.c23f27f8689b8707ef4c858fcb988df1.png

    image.png.cac9ece41e68b1fde76cb835df5e0ff8.png

     

    The small dip at ~4.5TB/60% is on Disk4, but doesn't seem particularly significant:

    image.png.8543573b6a0a9c6a453cb8b360ecd4d5.png

     

    FYI: Drive/controller arrangement: (I'm sure there's a more compact/text representation somewhere, but this was quick & easy)

    image.png.0cb20b38d84749fa98d758a5e232ea9b.png

     

    Despite the fact that all the SMART tests seem to have aborted, it appears that the Extended SMART test completed on Disks 3 & 4 w/o error:

image.png.e407f4f0770633a846e02b7555df9cc4.png
image.png.add0bf5e032e039d281ca5d5acbd4a4f.png

     

    Note that the power on hours are within a few hours of current - the test #1 results are the runs from yesterday afternoon. I am, however, starting an Extended Test on all drives again, just to get complete data.

     

    Anything else I should be looking into?

     

     
