New heavy user - Some questions


Robot


Hi all! I wanted to build a NAS/Server for a long time, and a couple of months ago I came across unRAID which seemed like the best option.

 

For the past month I've been experimenting with a very simple (and not ideal by any means) setup, with just one 512GB SSD as cache and one 2TB WD Blue drive as disk1.

I liked it, so I bought seven 4TB WD Red and a couple of NVME 1TB drives to work as a cache pool (raid1), built it yesterday and left it overnight to do the parity sync. Today everything is OK, I'm configuring my VMs, etc.

 

BUT!! I got some basic questions I didn't manage to solve reading the FAQs, maybe the info isn't there or maybe I'm just blind. Sorry if it's the latter.

 

1. I understand how parity works, but I don't understand how errors are treated. Let's assume one file has an error in one drive for whatever reason. Will the parity drive rebuild as soon as the error happens? Or will it wait for the parity check to even realize there's an error and then fix it?

 

2. I'm running dual parity, which from what I've read is mostly needed if a drive fails during the rebuild of another drive. My question is, is it normal for drives to fail during rebuild of other drives? Wouldn't this imply parity drives are more prone to fail?

 

3. During the building of the server, I plugged in one drive at a time to see which was which in order to label them. Motherboard connections were pretty easy, since SATA1 was sdb, SATA2 was sdc, etc. The problem is that I installed half the drives in a PCI-E SATA card, and I tried one port at a time but unRAID always assigned sdf for the new drive, so I ended up adding all of them. If a drive fails down the road, how will I know which physical drive is it?

 

4. Continuing last question (3), let's assume DISK6 fails. I buy a new drive, turn off the system, disconnect Drive 6 and boot up again (without the new drive, just to see if I'm working on the correct one). What if it wasn't the failing drive? What if I now see DISK6 FAIL and DISK4 missing? Can I just plug back in Drive 4 and try the next one until I know for a fact it is the failing one? And then replace it with the new one bought?

 

5. I'm using two NVME drives in a RAID1 config as cache pool, so when writing to a share my W/R speeds are capped by my gigabit connection, I assume though that I'd get to 1GB/s if I had a 10gbe network. That being said, during the parity-sync process I saw it did it at an average of 101MB/s, that's because it does it directly on the WD REDs. Would this speed be higher if I didn't have parity? (This one is just out of curiosity).

 

Thank you very much to anyone who read all this and especially to those who -I hope- will take some time to answer :)

Edited by Robot
Link to comment
1 hour ago, Robot said:

Let's assume one file has an error in one drive for whatever reason. Will the parity drive rebuild as soon as the error happens? Or will it wait for the parity check to even realize there's an error and then fix it?

If a data drive has a read error, then the system (parity + other drives) will calculate what should be there and then write the appropriate data back to the drive.  All done in real time without your knowledge (short of a read error appearing on the Main tab)
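
For anyone who wants to see the idea in miniature, here is a rough sketch of how a single unreadable sector can be recomputed from parity plus the other disks. It assumes plain XOR parity, which is how single parity is usually described; the second parity disk uses different math that is not shown here. The byte values and the helper name `xor_blocks` are made up for illustration and are not Unraid's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three toy "data disks", each holding one 4-byte sector (made-up values).
disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity sector is the XOR of the same sector on every data disk.
parity = xor_blocks(disks)

# Pretend disk index 1 returns a read error for this sector: rebuild it
# from the parity sector plus the same sector on every other disk.
survivors = [d for i, d in enumerate(disks) if i != 1]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == disks[1])
```

The rebuilt bytes are what would then be written back to the drive that reported the error.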

 

1 hour ago, Robot said:

My question is, is it normal for drives to fail during rebuild of other drives? Wouldn't this imply parity drives are more prone to fail?

No.  By and large, the most common cause of drive "failures" is improper connections to them.  This commonly happens when changing out drives.  I'm a big fan of Hot Swap cases for this reason, but barring that, make sure that everything is pushed in correctly on all drives whenever touching a cable to one of them.  Because of the crappy nature of the SATA connector, it is so easy to slightly move one of them when playing with another if you touch it.

 

1 hour ago, Robot said:

unRAID always assigned sdf for the new drive

The sdX labels are immaterial, and no guarantee that they will stay the same from one boot to another.  It's assigned by the OS in the order that the drives are enumerated.

1 hour ago, Robot said:

If a drive fails down the road, how will I know which physical drive is it?

Beyond putting the drives into your case in a logical order (Parity, Disk 1, Disk 2, etc.) (which I can't be bothered to do), label the outside of the drives with either what disk # they are, or with the last 4 characters of the serial number.  It will make it super easy to find.
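
If it helps, here is a small sketch for matching serial numbers to whatever sdX letter a drive happens to have on the current boot. It assumes the box exposes the usual Linux `/dev/disk/by-id` directory, where each entry is a symlink whose name normally contains the model and serial number. The function name `map_ids_to_devices` is made up for this example.

```python
import os
import re

def map_ids_to_devices(by_id_dir="/dev/disk/by-id"):
    """Return {persistent id: current kernel device name}, e.g. 'ata-MODEL_SERIAL' -> 'sdf'."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping  # directory not present on this system (assumption failed)
    for name in sorted(os.listdir(by_id_dir)):
        if re.search(r"-part\d+$", name):
            continue  # skip partition entries, keep whole disks only
        target = os.path.realpath(os.path.join(by_id_dir, name))
        mapping[name] = os.path.basename(target)
    return mapping

if __name__ == "__main__":
    for disk_id, dev in map_ids_to_devices().items():
        print(f"{dev}\t{disk_id}")
```

Run it once after boot and compare the serials with the labels on the drives; the sdX column is only valid until the next reboot.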

 

1 hour ago, Robot said:

, during the parity-sync process I saw it did it at an average of 101MB/s, that's because it does it directly on the WD REDs. Would this speed be higher if I didn't have parity?

At that point you'd be doing a read check of the drives, and hoping that during normal operation nothing would ever fail.

 

Parity Check speeds are dictated by the drives, the controllers, and the design of the motherboard (specifically how you arrange the controllers on the PCIe slots) to maximize the speeds (and also by the tunables in Disk Settings)

Link to comment
5 hours ago, Robot said:

1. I understand how parity works, but I don't understand how errors are treated. Let's assume one file has an error in one drive for whatever reason. Will the parity drive rebuild as soon as the error happens? Or will it wait for the parity check to even realize there's an error and then fix it?

> Actually, most of the time the system doesn't know whether the data or the parity is correct. Whenever a data disk is written to, the system just updates parity at the same time. That is why some people run a periodic parity check (with correction) to make sure everything is in sync and ready in case a disk fails.

 

Quote

2. I'm running dual parity, which from what I've read is mostly needed if a drive fails during the rebuild of another drive. My question is, is it normal for drives to fail during rebuild of other drives? Wouldn't this imply parity drives are more prone to fail?

> In fact, any drive can fail at any time, even though in theory the parity drives carry a heavier write load than the others.

 

Quote

 

3. During the building of the server, I plugged in one drive at a time to see which was which in order to label them. Motherboard connections were pretty easy, since SATA1 was sdb, SATA2 was sdc, etc. The problem is that I installed half the drives in a PCI-E SATA card, and I tried one port at a time but unRAID always assigned sdf for the new drive, so I ended up adding all of them. If a drive fails down the road, how will I know which physical drive is it?

> The sdX device letter for a drive can change; Unraid identifies disks by serial number, not by drive letter.

 

Quote

4. Continuing last question (3), let's assume DISK6 fails. I buy a new drive, turn off the system, disconnect Drive 6 and boot up again (without the new drive, just to see if I'm working on the correct one). What if it wasn't the failing drive? What if I now see DISK6 FAIL and DISK4 missing? Can I just plug back in Drive 4 and try the next one until I know for a fact it is the failing one? And then replace it with the new one bought?

> I misunderstood the question at first. Let's say you have dual parity. If you unplug disk 4 by mistake, the array will still start, and disks 4 and 6 will both drop out and be emulated.

 

Quote

5. I'm using two NVME drives in a RAID1 config as cache pool, so when writing to a share my W/R speeds are capped by my gigabit connection, I assume though that I'd get to 1GB/s if I had a 10gbe network. That being said, during the parity-sync process I saw it did it at an average of 101MB/s, that's because it does it directly on the WD REDs. Would this speed be higher if I didn't have parity? (This one is just out of curiosity).

An "average" of 101MB/s is a bit slow for WD Red; something else may be causing this. If a disk's maximum read speed is ~168MB/s (i.e. 3TB or 4TB size), the average speed should be ~120MB/s.

Edited by Benson
Link to comment
20 hours ago, Squid said:

If a data drive has a read error, then the system (parity + other drives) will calculate what should be there and then write the appropriate data back to the drive.  All done in real time without your knowledge (short of a read error appearing on the Main tab)

 

20 hours ago, Benson said:

> Actually, most of the time the system doesn't know whether the data or the parity is correct. Whenever a data disk is written to, the system just updates parity at the same time. That is why some people run a periodic parity check (with correction) to make sure everything is in sync and ready in case a disk fails.

 

Squid you say that read errors are corrected without me even knowing but you Benson say that system might not even know there's an error? These replies seem a little contradictory, don't they? Or maybe you guys are talking about different stuff and I just assumed it's the same?

 

20 hours ago, Squid said:

No.  By and large, the most common cause of drive "failures" is improper connections to them.  This commonly happens when changing out drives.  I'm a big fan of Hot Swap cases for this reason, but barring that, make sure that everything is pushed in correctly on all drives whenever touching a cable to one of them.  Because of the crappy nature of the SATA connector, it is so easy to slightly move one of them when playing with another if you touch it.

 

Yeah, I read about that. I bought "clicky" SATA cables, I hope they stay in place.

 

20 hours ago, Squid said:

The sdX labels are immaterial, and no guarantee that they will stay the same from one boot to another.  It's assigned by the OS in the order that the drives are enumerated.

Beyond putting the drives into your case in a logical order (Parity, Disk 1, Disk 2, etc.) (which I can't be bothered to do), label the outside of the drives with either what disk # they are, or with the last 4 characters of the serial number.  It will make it super easy to find.

 

20 hours ago, Benson said:

> The sdX device letter for a drive can change; Unraid identifies disks by serial number, not by drive letter.

 

Oh man... I didn't consider labeling them using serial numbers... I did label them with "Disk 1", "Disk 2", etc. But since I assigned them using sdx label... I guess they might all be mixed up. Plus, it's not a hot-swappable case so in order to see their serial number I must completely remove them and install them back in.

 

I guess this'll be a pending job for the next system maintenance. I plan on installing Noctua's noise reduction adaptors for all fans, so I'll probably do it then. Will wait one month or so and do a parity check after it (since I'll be moving the system and all).

 

20 hours ago, Benson said:

> I misunderstood the question at first. Let's say you have dual parity. If you unplug disk 4 by mistake, the array will still start, and disks 4 and 6 will both drop out and be emulated.

 

Ok, then I guess it could work. unplug/plug one by one until the faulty disk is missing.

 

20 hours ago, Squid said:

Parity Check speeds are dictated by the drives, the controllers, and the design of the motherboard (specifically how you arrange the controllers on the PCIe slots) to maximize the speeds (and also by the tunables in Disk Settings)

 

20 hours ago, Benson said:

An "average" of 101MB/s is a bit slow for WD Red; something else may be causing this. If a disk's maximum read speed is ~168MB/s (i.e. 3TB or 4TB size), the average speed should be ~120MB/s.

 

They are indeed 4TB Western Digital Red, not Pro, 5400rpm afaik. The arrangement is simple, both cache drives are M.2 so they are where they belong (second M.2 slot disables SATA3 on the motherboard). Then the REDs, 3 of them are plugged into the motherboard and 4 of them to a PCI x1 controller with 4 SATA connections. I wanted to set the parity drives to be SATA1 and 2 on the motherboard, but since I used sdx... I guess they could be any.

 

Should they be doing more than ~100MB/s with dual parity? Looking at this benchmark from the wiki, second to last from the parity check table has 6x REDs and he's getting 105MB/s.


If they should indeed be faster, I really don't know how to check or what tuning I can do... help?

 

 

Thank you very much guys!!

 

EDIT: As for the last thing, drives speed, I created a test share disabling cache use, and write/read speeds saturate my gigabit network, so they are indeed faster than 101MB/s. Is it normal then for the parity sync to be slower?

Edited by Robot
Link to comment
1 hour ago, Robot said:

Or maybe you guys are talking about different stuff and I just assumed it's the same?

Yes, I was talking about reading from a data disk without any error, and @squid was describing what happens when a read error occurs.

 

Anyway, simply put, in both cases there is no guarantee that the data is consistent. Inconsistencies can, however, be flagged by a filesystem that has checksumming, e.g. btrfs.

 

This is just to let you know what parity does and doesn't do.
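
As a rough illustration of what checksumming adds on top of parity, here is a file-level sketch: record a hash while the file is known to be good, then compare later. This is not how btrfs implements it (a checksumming filesystem does this per block and automatically); the function names `sha256_of` and `changed_files` and the idea of keeping a manifest dictionary are assumptions made for the example.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def changed_files(manifest):
    """manifest maps path -> hex digest recorded earlier.

    Returns the paths whose current content no longer matches the record.
    """
    return [path for path, old in manifest.items() if sha256_of(path) != old]
```

A mismatch only tells you that the content changed since the hash was recorded; getting the good version back still needs a backup.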

 

Edited by Benson
Link to comment
1 hour ago, Robot said:

4 of them to a PCI x1

This will be a bottleneck: all disks are accessed at the same time, so each disk gets less than 125MB/s and the parity check takes longer (assuming PCIe 2.0 x1).

 

Because it takes longer, the average speed, calculated as size / time, comes out lower.
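
The arithmetic behind those two statements, as a quick sketch. The figures are assumptions, not measurements: roughly 500MB/s of usable bandwidth for a PCIe 2.0 x1 link, four disks sharing it, and a hypothetical 11-hour sync of a 4TB disk.

```python
# Assumed figures, not measurements from this server.
LINK_MB_S = 500      # rough usable bandwidth of one PCIe 2.0 lane
DISKS_ON_CARD = 4    # disks read simultaneously through that one lane

# Best case each disk can get while all four are being read at once.
per_disk_ceiling = LINK_MB_S / DISKS_ON_CARD

def average_speed_mb_s(disk_size_tb, elapsed_hours):
    """Average speed computed as size / time."""
    size_mb = disk_size_tb * 1_000_000  # decimal TB -> MB
    return size_mb / (elapsed_hours * 3600)

print(per_disk_ceiling)                       # 125.0
print(round(average_speed_mb_s(4, 11), 1))    # 101.0
```

So a 4TB sync that takes about 11 hours works out to roughly the 101MB/s reported, which sits under the shared-link ceiling.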

 

1 hour ago, Robot said:

so in order to see their serial number I must completely remove them

Most disks should have a small manufacturer label on the edge for identification.

Edited by Benson
Link to comment
46 minutes ago, Benson said:

This will be a bottleneck: all disks are accessed at the same time, so each disk gets less than 125MB/s and the parity check takes longer (assuming PCIe 2.0 x1).

 

According to manufacturer, all 4 SATA lanes should have a ~500MB/s limit each.

 

46 minutes ago, Benson said:

Most disks should have a small manufacturer label on the edge for identification.

 

It's not in the side, it's only in the "front" of the disk, thus I can't really see them except for two (the ones which don't have another disk covering it). I'd need to unscrew all disks in order to see the other five disks' serial number. That's why I say I'll do it during my next maintenance. 

 

Thanks!

Link to comment
2 hours ago, Robot said:

Or maybe you guys are talking about different stuff and I just assumed it's the same?

It's complicated. If the drive can't read a sector and returns an error instead of data, unraid reconstructs that data from all the other drives and writes it back to the drive that returned the error. If the write succeeds, the only indication of an issue is the error column in the GUI is incremented. If the write fails, the drive is disabled from the array, and all further operations involving that slot are emulated using the rest of the drives.

 

If, on the other hand, something writes corrupt data to a file, the drive or unraid doesn't know the data isn't proper, and writes it to the slot without notification or correction. Unraid has no concept of files or file systems when it comes to parity and drive emulation, it just emulates whatever was written. If the drive fails unraid emulates whatever is there, regardless of file system or corruption issues.
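
A toy demonstration of that point, assuming simple XOR parity (single parity only; the byte values and the helper name `xor_blocks` are invented for the example). The array never sees the "good" bytes, so parity is computed from the corrupt ones and emulation hands back exactly the corrupt data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

good = b"GOOD"      # what the application meant to write
corrupt = b"GOOX"   # what actually arrived, e.g. after a bad-RAM bit flip
others = [b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]  # same sector on the other disks

# Parity is updated from the bytes that were really written.
parity = xor_blocks([corrupt] + others)

# Later the disk dies; emulation rebuilds the sector from parity + the others.
emulated = xor_blocks(others + [parity])

print(emulated == corrupt, emulated == good)
```

Parity and data agree perfectly here, so a parity check would report nothing wrong.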

 

Unraid parity can't fix bad RAM, controller failures, incomplete writes from power issues, user errors, or the myriad of other issues that can corrupt files. You still need a backup to another location of any files that you can't afford to lose.

 

Unraid will help you recover from a failed drive, as long as you follow procedure.

 

2 hours ago, Robot said:

Ok, then I guess it could work. unplug/plug one by one until the faulty disk is missing.

Yeah, don't do that. Because of the way unraid handles disks that don't respond, you will end up needing to rebuild any drives that are disconnected when the array is started, or else you could end up with corrupt or missing data. I suppose you could make sure the array is not set to start automatically and do the one by one thing, but you must be sure all the drives are connected and assigned properly before starting the array.

Link to comment
1 hour ago, jonathanm said:

Be sure your drives are compatible with latched cables, or the cables include the internal retention pieces as well as the latches.

https://support-en.wd.com/app/answers/detail/a_id/15954

 

To be honest I don't know how to check if my drives are one way or the other. Cables do work, it was easy plugging them and I did hear the "click". I also could unplug them pretty easily. I'm assuming they do work as intended.

 

1 hour ago, jonathanm said:

It's complicated. If the drive can't read a sector and returns an error instead of data, unraid reconstructs that data from all the other drives and writes it back to the drive that returned the error. If the write succeeds, the only indication of an issue is the error column in the GUI is incremented. If the write fails, the drive is disabled from the array, and all further operations involving that slot are emulated using the rest of the drives.

 

In that last case I assume then that unRAID would mark the disk as failure? Asking me to replace it, right?

 

1 hour ago, jonathanm said:

If, on the other hand, something writes corrupt data to a file, the drive or unraid doesn't know the data isn't proper, and writes it to the slot without notification or correction. Unraid has no concept of files or file systems when it comes to parity and drive emulation, it just emulates whatever was written. If the drive fails unraid emulates whatever is there, regardless of file system or corruption issues.

 

You mean that if I write an "original" which is already corrupted, then unRAID will just see it correct, meaning that it's exactly what I wrote initially, right? 

 

1 hour ago, jonathanm said:

Yeah, don't do that. Because of the way unraid handles disks that don't respond, you will end up needing to rebuild any drives that are disconnected when the array is started, or else you could end up with corrupt or missing data. I suppose you could make sure the array is not set to start automatically and do the one by one thing, but you must be sure all the drives are connected and assigned properly before starting the array.

 

Mmm OK, this I need to be clear of. So my idea of how to do it is a NO. If I understand you correctly, steps would be:

1. Write down serial number of failing disk.

2. Disable auto-start of array.

3. Clean shut down of system.

4. Unplug one drive (the one I think has failed)

5. Turn on the system and see if the failing disk is indeed the one I unplugged. If so, replace disk.

6. If it is not, turn off again, plug that one back in and unplug another one.

7. Same as 5.

8. Repeat until failing disk is the one I unplugged and replace.

 

Is that correct? Actually I could use that same method just to label all disks with their serial prior to any failure, right? Since I didn't do it properly upon building the system...

 

Thank you very much!

Link to comment
39 minutes ago, Robot said:

You mean that if I write an "original" which is already corrupted, then unRAID will just see it correct, meaning that it's exactly what I wrote initially, right? 

Yep, or if the data is corrupted during the process, like with bad RAM or SATA card, unraid has no way of knowing. It will quite happily emulate corrupt data. The disk is only one part of a long chain of places that the data goes through.

48 minutes ago, Robot said:

Actually I could use that same method just to label all disks with their serial prior to any failure, right? Since I didn't do it properly upon building the system...

That sounds like the best option. While you're at it, confirm that your SATA cables have the correct retention method. The link I posted shows what to look for, in a nutshell, if the latch on the outside of the cable doesn't firmly engage a piece of plastic directly, the cable must have 2 bumps that pinch inside the SATA slot. The internal 2 bump cable is actually preferred, as all drives should have the corresponding notches behind the conductors at the edge.

 

57 minutes ago, Robot said:

In that last case I assume then that unRAID would mark the disk as failure? Asking me to replace it, right?

Yep, a write failure means the drive is marked with a red X and no longer participates in any way.

Link to comment
