Am I likely to suffer Plex issues during parity checks?



I'm going to wanna run scheduled parity checks once a month, both to read all sectors (triggering any lurking pre-fail SMART attributes like pending or reallocated sectors) and, of course, to make sure parity is sane. I have a couple of questions in relation to it, and in general.

 

 

 

1) My setup is up to 24 internal drives on a single 2400MB/s SAS2 connection, because the backplane's expander has a single SFF-8087 connector. So theoretically all drives have 100MB/s if they are all used at once. I have a gigabit network and expect speeds along those lines when reading from a share. Will I suffer hard here during a check? I suppose I could leave ~4 bays unused for spares etc., which would open up some bandwidth, hopefully at least 10-20MB/s for high-bitrate movies.

 

 

2) Does Unraid have something like smartd running that can notify me within minutes when a bad attribute is discovered? And if I'm vigilant about replacing any such drive, can I expect the array to stay intact - even during sudden power loss? As in, if no drives went bad during the power loss, Unraid will be able to fix any parity problems and at least have all other data intact except the file or data that was being written. (Yes, I've got a UPS, but I'm wondering either way. I've had old units ironically cause power loss themselves when the battery gets old.)

 

 

3) One of the reasons I haven't tried Unraid before is the USB drive aspect. So I just wanna ask straight up, is there any scenario where I can lose my array or data on it from a catastrophically defective USB boot drive? Disregarding any human errors I may make myself when replacing it, I mean any action it may erroneously take by itself while booting up. Simply put, can I risk the array due to a bad boot USB drive? While we all praise the need for backups, this would probably be a deal breaker for me.

 

 

 

My only need for this server is 4 user shares for backing up and storing documents and videos, as well as the Plex media, plus a single VM running Windows Server 2019 on an SSD cache pool (I guess this is the common way?) of 2-3 SSDs. I've got like 5-6 lying around. That single VM img or qcow2 (does Unraid support qcow2?) will serve any and all services in that VM, including Plex and its metadata (trying to keep things simple). The only load on the array will be reading and writing backup files and media. It won't host games, databases, VMs or anything intensive. I just need storage and that single VM to be stable without hiccups or weird freezes (as long as my hardware is OK of course).

 

I'm on the fence about keeping the StableBit stack of software with all its quirks and limitations, being a filesystem filter (community and support are also becoming slower and less active these days), where I'd do pool:cloud realtime duplication. Or I could roll my own minimal Debian install (it's the distro I'm most comfortable with) with ZFS and KVM (minimal GUI with Xorg and Openbox for the comfort of using virt-manager to handle the VM), which I've already battle tested just to see what I can expect, so I'm getting comfortable with it. Or go Unraid and be able to expand more easily, without having to pay up front and plan big 8x raidz2 vdevs at once. Also at the cost of more parity...

 

Regardless, there will be rclone backups to my G Suite, where everything is today (managing it is slow, so I want it local again). Backup is covered, but I'd rather not download ~60-100TB in the long run unless I really have to, due to losing ALL data on striped conventional RAIDs (the reason I'm not going with hardware RAID, even though I like Adaptec). With Unraid I'm most worried about losing just SOME data, but not knowing exactly WHAT I've lost (would seriously strain my OCD). But a simple rclone sync operation should fix that. That's another thing that's dodgy with StableBit, unless they've fixed it recently.

 

This ended up being a wall of text... Sorry. I just got a lot on my mind. Tend to get obsessed when faced with a challenge/problem/issue.

 

This community is mega responsive, and I love you all for it. There's a huge chance I'll buy a pro license before the weekend knocks.

Edited by Corvinus
Link to comment
1 hour ago, Corvinus said:

So theoretically all drives have 100MB/s if they are all used at once

My parity check ran yesterday (scheduled for the 15th of every month).  My average speed on a parity check is in the 133-136 MB/s range.  My parity and data drives are 8TB and the parity check takes ~16.5 hours.  Speeds start out around 180 MB/s and drop over time to about 85 MB/s as it moves to the inner tracks.  My drives are connected to a PCIe 2.0 HBA in an x8 slot, so 4000 MB/s total bandwidth (500 MB/s x 8).  Reality is that with overhead, I probably have 400 MB/s available to each drive.  For most HDDs, 200 MB/s is the upper end of the performance spectrum.

 

With 100 MB/s bandwidth to each drive, your parity checks will take longer, since all drives are spun up and read at once during the check.
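
As a rough back-of-the-envelope check on those numbers, the sketch below splits a shared link evenly across drives and estimates how long it takes to read the largest drive end to end. The function names are made up for illustration, and it assumes the check runs at one steady average speed, which real drives do not do.

```python
def per_drive_bandwidth(link_mb_s: float, drives: int) -> float:
    """Even split of a shared link across drives that are all read at once."""
    return link_mb_s / drives


def parity_check_hours(largest_drive_tb: float, avg_mb_s: float) -> float:
    """Hours to read the largest drive end to end at a given average speed."""
    total_mb = largest_drive_tb * 1_000_000  # decimal TB -> MB
    return total_mb / avg_mb_s / 3600


if __name__ == "__main__":
    share = per_drive_bandwidth(2400, 24)
    print(f"Per-drive share of the link: {share:.0f} MB/s")
    print(f"8 TB at 135 MB/s: {parity_check_hours(8, 135):.1f} h")
    print(f"8 TB capped at {share:.0f} MB/s: {parity_check_hours(8, share):.1f} h")
```

With these inputs, 8 TB at 135 MB/s comes out at about 16.5 hours, in line with the figure above, and about 22 hours if every drive is held to 100 MB/s. Treat the second number as a floor, since the inner tracks fall below 100 MB/s anyway.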

 

Fortunately, unRAID will allow you to pause a parity check.  You can run it at night when no one is using the system, pause it during the day, and resume it again at night.  There is even a plugin to help manage that.

 

I did have an issue yesterday afternoon with Plex streaming during a parity check.  It was buffering, which it never does.  It was close to the end of the check so I was at a bandwidth bottleneck.  A parity check pause quickly rectified that.

 

1 hour ago, Corvinus said:

can I expect the array to stay intact - even during sudden power loss? As in, if no drives went bad during the power loss, Unraid will be able to fix any parity problems and at least have all other data intact except the file or data that was being written.

Lacking a UPS (which you say you have) to do a clean shutdown, a power loss will result in an unclean shutdown even if no data was being written to the array at the time of the power loss.  unRAID will automatically start a parity check when the server is powered on again.  A parity check can be set to non-correcting (the default) or correcting.  A non-correcting check just alerts you if there are any mismatches between what parity says and the data drives.  A correcting check will actually "fix" any problems it finds.  Keep in mind that "fixing" just means that it makes sure parity accurately reflects what is on the data drives.  Problems with data on the drives will be reflected in the parity calculation as well.  A sudden power loss will cause anything in the drive cache buffer that has not been written to disk to be lost, so there will likely be problems with any files being written at the time of power loss (as you know).

 

Parity is as good as the drives from which it is being calculated.
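
To make the "fixing" point concrete, here is a minimal sketch of XOR-style single parity. It assumes single parity is a plain byte-wise XOR across the data drives, and the function names are hypothetical, not anything from unRAID itself. It shows that a correcting check only rewrites parity to agree with whatever is on the data drives, good or bad.

```python
from functools import reduce
from operator import xor


def compute_parity(data_blocks):
    """XOR the same byte position across every data drive."""
    return bytes(reduce(xor, column) for column in zip(*data_blocks))


def check_parity(data_blocks, parity, correct=False):
    """Return (mismatched byte count, parity to keep afterwards)."""
    expected = compute_parity(data_blocks)
    mismatches = sum(a != b for a, b in zip(expected, parity))
    # A correcting check trusts the data drives and rewrites parity.
    return mismatches, (expected if correct else parity)


def rebuild_drive(surviving_blocks, parity):
    """Reconstruct one missing drive from the others plus parity."""
    return compute_parity(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = compute_parity(disks)

    # Silent corruption of one byte on the second drive.
    damaged = [disks[0], b"\x11\x20", disks[2]]
    errors, parity = check_parity(damaged, parity, correct=True)
    print("mismatches found:", errors)

    # Parity now matches the damaged data, so a rebuild returns the bad byte.
    rebuilt = rebuild_drive([damaged[0], damaged[2]], parity)
    print("rebuild matches damaged drive:", rebuilt == damaged[1])
```

After the correcting pass, parity and data agree again, but the corrupted byte is still corrupted. That is why a non-correcting check plus a look at which drive is misbehaving is often the safer first step.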

 

The short answer to your question above is that, yes, generally this is how it works.  You just need to understand exactly what is happening in a parity check.

 

1 hour ago, Corvinus said:

So I just wanna ask straight up, is there any scenario where I can lose my array or data on it from a catastrophically defective USB boot drive?

In the boot up process the array is not started until the very end when everything else has checked out.  Even if you have problems with the flash drive they are limited to what the OS is trying to do as it loads into RAM and boots up; things such as loading hardware drivers, mounting the file system and checking disk availability, etc.  In fact, you can even have the system configured to not start the array automatically after the server has booted.  There is no interaction with the data on the data drives or parity directly during the boot process.

 

A corrupt flash drive results in the failure to load drivers, load the OS, etc.  It does not result in any data corruption.   If you have data corruption, it is far more likely that it is the result of the same thing that caused the flash drive issue such as a sudden power loss, bad RAM, failed power supply, etc.  A bad flash drive itself does not cause any data issues. 

 

Perhaps there are very extreme edge cases of that happening, but if so I am unaware of corrupt data being the direct result of anything wrong on the flash drive (other than user caused errors by putting something dangerous in the "go" file which executes at the end of the boot process).

Edited by Hoopster
Link to comment
On 2/16/2021 at 4:19 PM, Hoopster said:

 Speeds start out around 180 MB/s and drop over time to about 85 MB/s as it moves to the inner tracks. 

Outer tracks. Your parity check slows as you move out from the center of the platter. The distance needed to travel for the inner sectors versus the sectors on the outer edge is much smaller.

 

Think about it this way: If you watch a car race, do the cars hug the inside or the outside of the curve?

Link to comment
37 minutes ago, eagle470 said:

Outer tracks. Your parity check slows as you move out from the center of the platter. The distance needed to travel for the inner sectors versus the sectors on the outer edge is much smaller.

 

Think about it this way: If you watch a car race, do the cars hug the inside or the outside of the curve?

Read up on CLV vs CAV.

Link to comment
48 minutes ago, eagle470 said:

Outer tracks. Your parity check slows as you move out from the center of the platter. The distance needed to travel for the inner sectors versus the sectors on the outer edge is much smaller.

Nope, it's slower on inner tracks due to constant angular velocity.

 

From an HDD benchmarking site:

 

"Hard drives run at a constant angular velocity (e.g. 7200 or 5400 rpm), and more linear track area runs under the heads at the outer edges of the disk than at the innermost tracks. The manufacturers take full advantage of those two facts to pack more data into the outer tracks, plus the data will be transferred faster in those portions of the disk since the longer outer tracks pass under the head in the same amount of time as the inner tracks with less data on them.

 

Running a typical hard drive benchmark that checks tracks across the entire disk will show a falloff in sequential read/write performance as the head moves towards the inside."
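
The quoted explanation can be sketched as a simple model: at constant angular velocity, the bytes passing under the head per second scale with track circumference. This is only an illustration. It assumes an idealized constant linear density along every track, and the radii and density below are hypothetical values picked for the example, not measurements from any real drive.

```python
import math


def track_throughput_mb_s(radius_mm, rpm, bytes_per_mm):
    """Sequential throughput for one track at constant angular velocity."""
    track_bytes = 2 * math.pi * radius_mm * bytes_per_mm  # data on one revolution
    revs_per_s = rpm / 60
    return track_bytes * revs_per_s / 1e6


if __name__ == "__main__":
    density = 5200  # hypothetical bytes per mm of track
    outer = track_throughput_mb_s(46, 7200, density)  # hypothetical outer radius
    inner = track_throughput_mb_s(22, 7200, density)  # hypothetical inner radius
    print(f"outer: {outer:.0f} MB/s, inner: {inner:.0f} MB/s, "
          f"ratio: {inner / outer:.2f}")
```

In this model the inner-to-outer speed ratio is just the ratio of the two radii, so an inner track at roughly half the outer radius reads at roughly half the speed. That is the same shape as the 180 MB/s to 85 MB/s falloff reported earlier in the thread.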

Link to comment

I have a followup if I may. It seems like a rule of thumb to check integrity every month or so, I just want to understand it more.

 

In what situations will parity get corrupted? I'm just wondering how important the parity check really is. What is it that can actually corrupt it? Is it more likely to get corrupted than e.g. an mdadm RAID5/6, or equally so? As far as I know, there's no such checking there, though ZFS does recommend its scrubs.

 

Is it just about bit rot? Don't modern drives have internal ECC or CRC mechanisms that use redundant correcting data stored on the same sectors as the file data and let SMART know if there is uncorrectable data?

 

Thanks.

Edited by Corvinus
Link to comment

If you ever have an unclean shutdown or a write to an array drive fails then parity is probably going to be out-of-sync with the array drives. 

 

You are correct that most of the time the hardware will detect that a problem has occurred.   However, many people basically run their servers with minimal attention, so they often miss or ignore an error message, and the periodic parity check stops such errors from accumulating over time without the user being aware.  As such you can consider it a form of preventive maintenance.

 

The impact on daily use can be minimised by using the Parity Check Tuning plugin to run such checks in increments outside prime time.

 

Link to comment

I see. Thanks. I think I may just run it manually then.

 

Does unraid have SMART monitoring by default (like the smartmontools daemon/smartd)? That could e-mail whenever an uncorrectable attribute has been triggered by a drive. I tend to install it on Debian servers and be on top of drive health early. When testing ZFS I've also used ZED (ZFS event daemon) for its own notification purposes in addition to smartd. Smartd has a default SMART check interval of 30 minutes, which I tend to adjust down to 1-5 minutes. 

 

Does unraid have something equivalent already in place? If not, can I install it on unraid as well and make it stick persistently through future reboots? I'm not entirely sure how the OS configuration and customization works yet. 

Link to comment
3 minutes ago, Corvinus said:

Does unraid have SMART monitoring by default (like the smartmontools daemon/smartd)? That could e-mail whenever an uncorrectable attribute has been triggered by a drive. I tend to install it on Debian servers and be on top of drive health early. When testing ZFS I've also used ZED (ZFS event daemon) for its own notification purposes in addition to smartd. Smartd has a default SMART check interval of 30 minutes, which I tend to adjust down to 1-5 minutes. 

 

Unraid will attempt to raise a notification whenever an attribute that is being monitored changes value.  It is up to you to configure how you want such notifications to be handled.

 

5 minutes ago, Corvinus said:

I'm not entirely sure how the OS configuration and customization works yet

 

Anything configured via the GUI will stick.  Unraid stores such configuration information on the flash drive and re-applies it any time the system is booted.

 

Changes made from the command line, bypassing the GUI, need some additional action to make them stick, as such changes are only made to the running copy of Unraid that is in RAM.

Link to comment
  • 4 months later...
5 hours ago, captainnapalm said:

If I can tack onto this, I'd love if there were a way to pause the parity check when there's Plex activity, and resume when there is no Plex activity. Can anyone think of a way to do this?

Not directly in terms of checking for Plex activity, but as long as you have periods when you know the server will not be used for Plex, you can use the Parity Check Tuning plugin to set parity checks to run in increments outside prime time.

Link to comment

Just realized how old this post is. Lol Oh well. ;)

 

I have my system set up to run a parity check on the 1st of every month starting at midnight, and it normally runs until 9AM. I have (5) 4TB drives, so it's pretty quick and normally done without me even noticing, other than a message I get via email or the webgui when it starts and when it's finished.

 

I used to use the Parity Check Tuning plugin because I was running some slower WD drives. I had it set to start at midnight, stop at 6AM, then restart the next day at midnight and repeat this process until it was done. Never had an issue, and it's pretty customizable.

 

I do have a bunch of scripts on my machine that use the User Scripts plugin to move files here and there and do various other things, but I have them all set to NOT run during a parity check. In User Scripts, you just add a little line to each script that tells it to check whether a parity check is running first.
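
For anyone wondering what that kind of check might look like, here is a minimal sketch. It is not the exact line from my scripts. It assumes the array status is available as `key=value` text (for example from `/proc/mdstat` on Unraid) and that a non-zero `mdResyncPos` field means a check is in progress. Both are assumptions to verify on your own system, and the function names are made up for the example.

```python
def parse_status(text):
    """Turn key=value lines into a dict, ignoring anything else."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parity_check_active(status_text, key="mdResyncPos"):
    """True when the position counter is present and non-zero.

    The default key name is an assumption about Unraid's status output.
    """
    value = parse_status(status_text).get(key, "0")
    return value.isdigit() and int(value) > 0


if __name__ == "__main__":
    # Hypothetical sample status text, not captured from a real server.
    sample = "mdState=STARTED\nmdResyncPos=123456\n"
    if parity_check_active(sample):
        print("Parity check in progress, skipping this run.")
    else:
        print("No parity check running, carrying on.")
```

A script would read the real status text instead of the sample string and exit early when the function returns True.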

 

USB drive failure: well, I've heard of people having issues, but it's simply a matter of backing up your /config folder literally anywhere, or you could use the My Servers plugin and back it up automatically to the cloud. Don't like the cloud idea? Then anywhere you choose. I've been running a 2GB Cruzer stick for 9 years now and it's never troubled me once. The OS boots off the USB and loads the entire OS into RAM. Even if you nuke your OS for some odd reason, it'll just reload on boot from the USB stick again. ;)  If your USB stick dies for some reason, you simply request a new key, plug in a replacement stick and you're good to go. There is a manual way and an automatic way to speed up recovery.

 

Unraid started from humble beginnings and has been growing at a rapid pace thanks to users who request or build things, devs who volunteer a lot of their time, and some paid staff who keep the wheels turning.  We have a lot of active users who simply want to help, and moderators who volunteer a lot of their time as well.  I can nearly guarantee you will not find this level of pride, and of wanting to see you succeed in your build, with any other solution.

Link to comment
  • 2 weeks later...
On 7/2/2021 at 5:27 AM, itimpi said:

Not directly in terms of checking for Plex activity, but as long as you have periods when you know the server will not be used for Plex, you can use the Parity Check Tuning plugin to set parity checks to run in increments outside prime time.

Yeah, that's my current solution, just want to ideally finish the parity check as quickly as possible by running whenever there's no Plex activity. As it stands, it takes almost 2 days with a 14TB and 12TB dual parity.

Edited by captainnapalm
Link to comment
9 hours ago, captainnapalm said:

Yeah, that's my current solution, just want to ideally finish the parity check as quickly as possible by running whenever there's no Plex activity. As it stands, it takes almost 2 days with a 14TB and 12TB dual parity.

 

Fair enough.  I tend to think that since a parity check (assuming it is going to return the expected 0 errors) is basically just system housekeeping, avoiding impact on other usages of the system is far more important than elapsed time.

 

One problem with your original question is that, as far as I know, there is no reliable way to detect if an application such as Plex is being actively used.  This applies to other usages that I can think of as well.

Link to comment
7 hours ago, itimpi said:

 

Fair enough.  I tend to think that since a parity check (assuming it is going to return the expected 0 errors) is basically just system housekeeping, avoiding impact on other usages of the system is far more important than elapsed time.

 

One problem with your original question is that, as far as I know, there is no reliable way to detect if an application such as Plex is being actively used.  This applies to other usages that I can think of as well.

For sure, it's just a routine maintenance type activity. For a few months I had disabled the parity check tuning plugin to try to expedite the process a bit, but my users were seeing heavy lag while navigating libraries, and other various quirks with plex while the parity check was running. Re-enabled the parity check tuning to run overnight from like midnight to 9am until completed. It also didn't help that in this past July 1st parity check we were in a heat wave and the server was running very hot as it was.

 

Yeah, I'm curious if maybe there's a way to use Tautulli to detect streams, and send a command to Unraid to pause/restart the parity check.
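
A rough sketch of how that idea could hang together, untested against a real server. It assumes Tautulli's `get_activity` API call returns a `stream_count` field under `response.data`, and the `mdcmd` invocations in `COMMANDS` are an assumption about what pauses and resumes a check on Unraid, so verify both before relying on them. The function names are made up for the example.

```python
import json
import urllib.parse
import urllib.request

# Assumed Unraid commands for pausing/resuming a check; verify before use.
COMMANDS = {
    "pause": ["mdcmd", "nocheck", "pause"],
    "resume": ["mdcmd", "check", "resume"],
}


def tautulli_stream_count(base_url, api_key, opener=urllib.request.urlopen):
    """Ask Tautulli how many streams are active right now."""
    query = urllib.parse.urlencode({"apikey": api_key, "cmd": "get_activity"})
    with opener(f"{base_url}/api/v2?{query}", timeout=10) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"response": {"data": {"stream_count": ...}}}
    return int(payload["response"]["data"]["stream_count"])


def next_action(streams, check_active, check_paused):
    """Decide whether to pause, resume, or leave the parity check alone."""
    if not check_active:
        return None  # nothing to manage
    if streams > 0 and not check_paused:
        return "pause"
    if streams == 0 and check_paused:
        return "resume"
    return None


if __name__ == "__main__":
    # Dry run over a few hypothetical states; no network or commands involved.
    for streams, paused in [(2, False), (0, True), (0, False)]:
        action = next_action(streams, check_active=True, check_paused=paused)
        print(streams, paused, "->", COMMANDS.get(action, "leave as is"))
```

Run from cron or the User Scripts plugin every few minutes, something like this would pause when a stream starts and resume once Plex goes quiet. The sketch does not cover how to tell whether a check is active or paused, and it would not catch people who are only browsing libraries.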

Link to comment

You all are far nicer than I am.  My parity check running full out takes almost a day (have a lot of different disk sizes and some disks are quite old).  The people that have access to my server have been told about the parity check that happens at the beginning of the month and since they do not pay anything for the "service" I am providing they do not get to complain when it is slow or goes down for periods of time.

Link to comment
