SSDs seem to work fine within unRAID but probably kill Parity when thrashed



I'd like to share my experience of using SSDs and VMs on unRAID.

 

I put off using SSDs in my rig because the Wiki said that SSDs should not be used within the array. Because of this I was asking for a new feature - multiple cache pools - but @garycase responded in an earlier thread suggesting that SSDs should work even though they aren't officially supported in unRAID, so I went for it. And indeed they do work, BUT I suspect that Parity can't keep up with the SSDs' throughput.

 

I wonder what everyone else thinks. Here's my setup/experience.

 

I have 6.1.3 Pro running on a Gigabyte mini-ITX board with an Intel G3258 processor, 4GB RAM and an LSI/Avago 16i HBA. Hanging off this are 3x 5-way hot-swap bays holding 10x 4TB WD Reds, a couple of smaller WD Red disks and a couple of fast Samsung 840/850 Pro SSDs (250GB). If anyone is interested, I also replaced all the fans in my rig with Noctua super-silent fans; even with all the disks spinning and several case fans running I still can't hear the rig from a few feet away. Despite having only 2 cores and 4GB RAM while running a VM, this all works fine.

   

The VM is running Windows 8.1, on which I'm hosting BlueIris IP security camera software - unfortunately they don't make it for Linux. I've tried others but haven't found anything comparable yet.

 

BlueIris constantly displays what all the cameras see, and every time a camera is triggered by activity within its detection zone it generates a video file (in a native BlueIris format). It needs to be a VM so that it keeps 'watching' what's going on when I'm not there, as I often work away from home. I could also have the cameras FTP files out if I wanted to, but I don't at the moment.

 

However, what seems to happen with this SSD is that after a few hours/days, Parity stops working. The VM is still running and recording, but obviously if I lost a disk and had no parity I'd be 'scr3w3d'.     

 

Whereas the SSD is showing as having had 11,843,000 reads and 1,200,000 writes, Parity is showing as having had 47,975,640,533,690 reads and  47,975,619,457,344 writes. 

 

If I don't run the VM, parity is fine ad infinitum.

 

So I'm thinking of moving the SSD which runs my camera software off my array and onto a SNAP-managed disk, as I've already had to re-prep (pre-clear) my Parity disk several times. But doing this would also leave me back where I started - needing multiple SSDs in a separate RAID array of some kind (e.g. software RAID off the mobo) to protect my VM/data - which makes me wish I could have a separate cache pool instead: supported within unRAID but outside of the array.

 

Anyone have any thoughts or suggestions?

Link to comment

Parity is showing as Faulty on the WebGUI 'Dashboard', and on the 'Main' tab there is an 'X' against the disk indicating 'Parity Device is Disabled'.

I definitely haven't disabled it ... so something else has.

Syslog attached but I can't spot anything obvious in there either.

 

 

Rgds

Duncan

unRAID has disabled it for the same reason it disables any disk: a write to the disk failed. Go to Tools - Diagnostics and post the complete diagnostics zip file. For V6 the diagnostics zip is always preferred over just the syslog.
Link to comment

SMART for parity drive WD-WCC4EHJF6896 looks OK, but the syslog does indicate that a write to parity failed, so the disk was disabled.

 

Check connections. What is your power supply?

 

Your idea in the subject of this thread doesn't really make sense, but you should be aware that writes to any disk in the array, SSD or spinner, are limited by the speed of your parity drive: unRAID reads the existing parity and data, calculates how the data change affects parity, and then writes both parity and data back.
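To make that read-modify-write concrete, here's a minimal Python sketch (purely illustrative - not unRAID's actual code) of a single-parity update, where the new parity is the old parity XORed with the old and new data:

```python
# Minimal single-parity read-modify-write sketch (illustrative only):
# new_parity = old_parity XOR old_data XOR new_data.
# Every write therefore costs a read and a write on BOTH the data disk and
# the parity disk, so the slower of the two sets the pace.

def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Return the updated parity block after a data block changes."""
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))

old_parity = bytes([0b10101010])
old_data   = bytes([0b11110000])
new_data   = bytes([0b00001111])
# All data bits flipped, so all parity bits flip too: 0b10101010 -> 0b01010101
print(bin(update_parity(old_parity, old_data, new_data)[0]))
```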

 

Link to comment

Power supply is an ATX 850W.

All connections are solid

I'm only trying to share my experiences here .. not expecting a solution.

In my experience, although SSDs will work in some circumstances (e.g. read-based workloads), it would appear that the parity disk gets disabled when (presumably) it can't keep up any more. So unless an SSD within the array is doing very little writing, it's going to become a problem for maintaining parity at some point - and a 4/6/8/10 TB SSD is not going to become available/affordable any time soon.

So in my case I will move the SSD which is receiving a lot of writes out of my array, giving my parity disk a fair chance of doing what it's supposed to do.

 

 

Link to comment

Nope. That isn't the case.

 

AFAWK, there is no such thing as the parity drive 'not being able to keep up' causing issues, since writes are simply throttled to the pace of the slowest drive involved in the process of reading the old data and then writing the new data.

 

You have an issue in your setup. The community is trying to help you fix your issue. Others do not have similar issues.

 

Link to comment

A couple of thoughts ...

 

First, as already noted, the fact that the parity drive is much slower than the SSD should NOT cause any issue ... it will just slow writes down to the speed of the parity drive. If the goal of using an SSD was to eliminate drive spin-ups, then you should use SSDs in a btrfs cache pool (which is also fault tolerant) and a cache-only share for your VM's storage ... OR use the SSD mounted outside the array (as you're already considering).

 

Second, although it's true that read/write counts aren't generally something to be concerned about, the magnitude of the counts on the parity drive is troubling => nearly 48 TRILLION operations vs. a few MILLION on your other drives (at least the SSDs). That's a factor of well over a million to one!! Since UnRAID has disabled your parity drive, I'd swap in a new parity drive before concluding that the other drives are causing this.
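Just to put a rough number on that ratio, using the counts quoted earlier in the thread:

```python
# Rough ratio of parity operations to SSD operations, using the counts
# quoted earlier in this thread.
parity_ops = 47_975_640_533_690 + 47_975_619_457_344  # parity reads + writes
ssd_ops    = 11_843_000 + 1_200_000                   # SSD reads + writes
print(parity_ops // ssd_ops)  # roughly 7.4 million to one
```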

 

As for SSDs working in UnRAID => one of the forum members (Johnnie.Black) has a very nice all-SSD test setup that has helped diagnose a lot of issues, and it has no issues whatsoever with performance or reliability ... but it's sure neat to see parity checks measured in MINUTES instead of hours  :)

 

One very interesting bit of info Johnnie.Black has been able to provide us, thanks to his all-SSD array, is the performance of various controllers, since his SSDs can stress them to their limits:

http://lime-technology.com/forum/index.php?topic=43026.msg410578#msg410578

 

 

 

Link to comment

Thanks for your feedback/consideration of my setup issue.

I already changed the parity disk once, to one I had pre-cleared, because I couldn't clear the 'disabled' status until I had replaced the disk and I wanted to use one I knew was good, as I assumed the previous one had actually failed. I haven't yet had a chance to see whether the disk that went this way can be pre-cleared again, but as I now have two disks in this state I'll kick that off today.

 

 

Link to comment

It's not necessary to preclear a parity drive, or any drive that will be used for a rebuild. Only disks added to a new slot need to be cleared so parity remains valid. Preclear can still be used to test drives even when they don't need to be cleared. You can also rebuild parity or data onto a drive that was disabled, instead of rebuilding to a new disk, if there is nothing wrong with it. Disks are disabled whenever a write to them fails, for any reason - often the disk itself is not the problem.

Link to comment

Trurl: I was aware that you don't need to pre-clear a parity disk, but the reason for doing this was purely to check whether the disks are OK or have been damaged in some way.

If there's a quicker way to check whether a disabled disk is still 'good', can someone please tell me? It sounds like there's a pre-clear test mode (switch) I could use.

 

Also, is there a way to re-enable/reset a disk without bouncing the server several times as per the wiki, or changing the disk?

 

Should I perhaps try hanging the parity disk off the mobo's own on-board SATA controller to rule out anything dodgy with the HBA/hot-swap cage? And how would this impact throughput compared with all the disks being on the same (PCIe2) controller?

 

Thanks

 

Link to comment

In fact, IIRC, the disks not being on the same controller is a good thing for throughput, since it helps avoid channel saturation. unRAID isn't doing straight device-to-device transfers within the controller: it has to read data in from the controller, calculate parity, and then send data back out, and it is doing that on two devices (data and parity). By putting parity on a different controller you should avoid that bottleneck. I'm also pretty sure on-board ports are generally best, especially for the parity drive.

 

Or my memory is off from when I made the decisions I did with my own system.

 

 

Link to comment
  • 2 weeks later...

As for SSDs working in UnRAID => one of the forum members (Johnnie.Black) has a very nice all-SSD test setup that has helped diagnose a lot of issues, and it has no issues whatsoever with performance or reliability ... but it's sure neat to see parity checks measured in MINUTES instead of hours  :)

 

Sorry, I was going to reply to this earlier but got sidetracked and forgot.

 

This has nothing to do with the OP's issues, but I would not recommend using SSDs in the protected array. I really like my SSD-only test server, as it is very fast for testing various issues and procedures; most of the time I use precleared SSDs because this allows me to change the number of disks and choose any disk as parity, and parity is always valid.
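(For anyone wondering why precleared disks make parity trivially valid: a cleared disk is all zeros, and parity over all-zero disks is zero no matter how many there are. A tiny illustrative sketch of the idea:)

```python
# Why an array of precleared (all-zero) disks always has valid parity:
# XOR across any number of zeroed disks is zero, so a zeroed parity disk matches.
from functools import reduce
from operator import xor

for n in (2, 5, 12):
    disks = [0x00] * n              # n precleared disks, every block zero
    print(reduce(xor, disks) == 0)  # True each time
```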

 

When the SSDs contain data, however, I experience some unexpected sync errors. I can do 10 consecutive parity checks with 0 errors, but if I power off and do a parity check the next day it will almost always find some errors - sometimes 2 or 3, other times 20 or more. I believe this is caused by the SSDs' garbage collection and/or wear-levelling algorithms, and it may differ with other SSD models. For anyone wanting to use them for data with parity protection, I would recommend at the very least doing a lot of parity checks in the beginning to make sure parity stays in sync.

 

Link to comment

Thanks for the post Johnnie => apparently TRIM is indeed causing some issues with SSDs, so I'm definitely going to change my suggestions vis-à-vis using them in the array (at least until LT indicates they have TRIM support).  I had assumed there was no issue primarily because of you and your all-SSD test array  :)

 

 

... I wonder if this is also an issue for btrfs cache pools.

 

Link to comment

Hi,

I thought I should update you guys: based on some of the points and feedback posted, I moved my (CCTV VM) output folder to the btrfs cache, left the vdisks on an SSD within the array, and all seems fine now. I have 2x SSDs in the array and parity is no longer getting hammered.

I take the point that TRIM may ultimately be the problem, but as a read-mostly resource it seems to work OK.

I'll monitor parity check errors for any spurious results as suggested by johnnie.black, and if necessary move the SSDs off the array pending 'support' for SSDs.

Thanks all.. 

Link to comment

... as a read-mostly resource it seems to work OK.

 

I suspect this is because if you're not actively writing to the SSDs, there's nothing to "TRIM" ... so there are no changes that could impact parity.  Note also that any errors that DO occur as a result of TRIM activity are legitimately on the parity disk, so as long as you do correcting checks they'll be properly corrected and the SSD will actually work just fine.  So if writes to them are rare, and you understand that you may occasionally get a few sync corrections on a check after those writes, then it should be fine to keep them in the array.    But clearly it'd be best if there was actual TRIM support in UnRAID.

 

Link to comment
So if writes to them are rare, and you understand that you may occasionally get a few sync corrections on a check after those writes, then it should be fine to keep them in the array.
I'm still unclear on that. If parity is randomly incorrect when an SSD is involved, then a rebuilt failed drive will be corrupt if those incorrect bits happen to correspond to a file location. Personally I wouldn't keep an SSD in a single-parity protected array until there is more exhaustive testing.

 

Randomly incorrect parity is not good - it doesn't matter which disk it's on, because ALL disks are used to reconstruct a failed one.
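A quick sketch (illustrative Python, not unRAID code) of why a stale parity bit ends up in the rebuilt data:

```python
# A rebuilt disk is the XOR of parity with every surviving data disk,
# so any wrong bit in parity lands directly in the rebuilt data.
from functools import reduce
from operator import xor

def rebuild(parity: int, surviving: list[int]) -> int:
    return reduce(xor, surviving, parity)

d1, d2, d3 = 0b1100, 0b1010, 0b0110
parity = d1 ^ d2 ^ d3
print(rebuild(parity, [d2, d3]) == d1)   # True: d1 reconstructed exactly
stale = parity ^ 0b0001                  # one parity bit drifted out of sync
print(bin(rebuild(stale, [d2, d3])))     # 0b1101: the "rebuilt" d1 now has a flipped bit
```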

Link to comment

Now I'm confused. If TRIM on an SSD can move data around and possibly corrupt parity, then that would mean the SSD knows and understands the filesystem present on it and updates the file allocation tables and B-trees accordingly. In other words, TRIM would be actively modifying your filesystem, and I find that very hard to believe.

 

Link to comment
