
"Fun" with Dual Parity


doron


Hi all,

 

During a reconfiguration project, I managed to get an Unraid system into an "interesting" state. I'm looking for a safe way to recover from it; I haven't figured one out yet.

 

This is kind of long-winded, sorry about that.

 

Previous state: A small array of five 4TB (WD Se) drives, three for data and two for dual parity (yes, I know). Very stable, no issues.

Acquired three new 12TB HGST HC520 SAS drives, to expand the array. The plan was to first replace the parity drives with two large ones, and subsequently add the third, plus the two 4TB former parity drives, as data drives.

 

After testing and zeroing the new drives, project commences.

 

Step 1: Stop array. Insert one large drive in the parity2 slot in place of the existing one. Start array to have Unraid sync parity2 onto this drive. All seems fine (takes 22-23 hours, but that's expected for 12TB; average speed 260MB/s, which is nice).

 

Now we have an array with all data drives being 4TB, parity is 4TB, and parity2 is 12TB. Looking good. Let's call this STATE A from here on.

 

Step 2: Stop array. Insert the second large drive in the parity slot in place of the existing one. Start array to have it sync parity. It starts to churn nicely, but after a couple of hours the drive seems to develop a problem: the parity sync grinds to (almost) a halt, and I could swear the (faint) sound I hear from the drive is subtly not the right one. Uh-oh. After a long wait with no progress, Unraid moves the disk to the dreaded "dsbl" state and I'm staring at a red x.

 

Looking at the drive with smartctl, I find a large number (tens of thousands) of "Correction algorithm invocations" on write, plus hundreds of "Total uncorrected errors" and "Elements in grown defect list" (these are SAS drives), so I understand there's an actual disk problem. Checking (and swapping) cabling to debug does not change anything.
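
(For reference, those counters come straight out of smartctl's SCSI output. A quick Python sketch along these lines pulls the relevant lines; it assumes smartctl is installed and uses a placeholder /dev/sdX for the device:)

```python
# Quick sketch for eyeballing the SAS error counters (assumes smartctl is in
# PATH and the drive is at /dev/sdX -- substitute your own device node).
import subprocess

out = subprocess.run(["smartctl", "-x", "/dev/sdX"],
                     capture_output=True, text=True).stdout
lines = out.splitlines()

for i, line in enumerate(lines):
    if "Elements in grown defect list" in line:
        print(line)
    elif line.startswith("Error counter log"):
        # Table with corrected/uncorrected errors and correction algorithm invocations
        print("\n".join(lines[i:i + 7]))
```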

 

Now, the next plan is to fall back. Sounds simple enough. Ha.

 

The thought process was: since all data drives are still 4TB, a 4TB parity drive should be large enough per Unraid rules. So we'll put the 4TB drive back in the parity slot, sync it to stabilize the array, and then suspend the project and deal with warranty replacement etc.

 

What we did is this:

1. Stop array. Change the first parity slot to "no parity". Start array (to get rid of the dsbl in the slot). Array starts happily, with three data drives and only a parity2. Not a usual sight but not a pathological one either (or so I thought). All appears normal, Unraid's not complaining.

2. Stop array. Insert the previous 4TB drive into the parity slot. Start array to sync parity. The expectation is to go back to STATE A.

At this stage, the UI is telling me, in the parity sync progress display, that Unraid is syncing 12TB. This is a red flag. Why would it think it is syncing 12TB when in fact the drive in the slot (and also the largest data drive) is 4TB?

I let that one slide, hoping it was a presentation quirk. It was not. Sure enough, after 8 hours and 4TB of syncing, it started writing outside of the drive... And sure enough, it hit the brakes (at 33.3%...), saying there's a read error (sure there is!...). So it wasn't lying - it really was trying to sync 12TB of parity onto a 4TB drive.

 

Pause: Would you consider this a bug? I mean, the drive is 4TB, parity sync starts and it says it's gonna sync 12TB. I presume (not sure) this is because of the weird direction - adding a first parity to an array that already has a synced parity2 - but still?

 

Anyway, the array is now back in the not-very-normal (although all "green") state of having three 4TB data drives and one 12TB parity2 drive, only.

 

Thanks for reading this far!

 

Questions:

1. Is the array protected, albeit against just one failure, in this unorthodox state? I assume it is, since a data drive failure now would be equivalent to losing one data drive plus first parity, which is (or should be) a recoverable state (see the toy sketch after these questions). Am I missing something?

 

2. Is there a way to recover from this state, safely, back into a dual parity state? "Safely" means remaining protected throughout the process. I know I can do this: Stop array, remove parity2, start array, stop array, add 4TB drive as parity, start array, allow sync, stop array, add parity2, start/allow sync. I can even sync both 4TB parity drives in parallel w/ New Config. But -- during the middle of that sequence (roughly from the moment parity2 is removed until the new 4TB parity finishes syncing), which is about 8-9 hours of massive drive churn, the array is unprotected, which is a problem (backups for critical stuff are there, but still). Is there a path I'm missing?
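
To make question 1 concrete to myself, here's a toy sketch (plain Python, not Unraid's code - just the same kind of GF(2^8) "Q" syndrome that raid6-style second parity uses, which is my understanding of what parity2 is) showing that a single missing data drive can be rebuilt from parity2 alone:

```python
# Illustrative sketch only -- not Unraid's actual code. Same idea as the Linux
# raid6 "Q" syndrome: Q[k] = sum over i of g^i * D_i[k] in GF(2^8), g = 2.

def gf_mul(a, b, poly=0x11d):
    """Multiply two bytes in GF(2^8) with the raid6 polynomial 0x11d."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    # Brute-force inverse is fine for a toy example.
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def q_parity(drives):
    """Q[k] = XOR over i of g^i * D_i[k]."""
    q = bytearray(len(drives[0]))
    for i, d in enumerate(drives):
        gi = gf_pow(2, i)
        for k, byte in enumerate(d):
            q[k] ^= gf_mul(gi, byte)
    return bytes(q)

def rebuild_from_q(drives, missing, q):
    """Recover drive index `missing` from Q plus the surviving data drives."""
    out = bytearray(len(q))
    inv_gj = gf_inv(gf_pow(2, missing))
    for k in range(len(q)):
        acc = q[k]
        for i, d in enumerate(drives):
            if i != missing:
                acc ^= gf_mul(gf_pow(2, i), d[k])
        out[k] = gf_mul(inv_gj, acc)   # D_j = g^-j * (Q minus everyone else)
    return bytes(out)

# Tiny demo: three "4TB data drives" shrunk to 8 bytes each.
data = [bytes([i * 10 + k for k in range(8)]) for i in range(3)]
q = q_parity(data)
assert rebuild_from_q(data, missing=1, q=q) == data[1]
print("one data drive recovered from Q parity alone")
```

If that logic carries over to the real parity2, the current state should indeed still tolerate any single data drive failure.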

 

Thanks for any insight!

 

Link to comment

Thanks for responding @trurl.

 

5 hours ago, trurl said:

No it wasn't. It was just continuing to sync the 12TB parity2.

I'm not sure what you mean by "continuing". The parity sync of the 12TB parity2 had been completed previously, after 23 hours and 12TB of reported progress. Are you saying that it started re-calculating parity2, as a function of a first parity drive joining in?

If so, that would make sense; the observed data points are (a) it reported "syncing 12TB" from the moment it started building the 4TB parity, and (b) it was trying to access (read or write, I guess) the 4TB drive beyond its physical limits.
 

5 hours ago, trurl said:

Why not just use that additional 12TB disk you already have for parity1?

Fair question. I was hoping to pull all three out of the game and run extended tests on them, to see whether I have a batch of lemons or this is a one-off (they aren't cheap). One of the other two does report a small number (15 or so) of "Correction algorithm invocations" upon write that were successfully corrected. Probably normal and not a cause for alarm, but due to the one lemon I want to be extra careful. So I'm trying to return to 4TB parity.

But -- agreed, if all else fails, I can probably do that. Prefer not to if at all possible.

 

Link to comment
2 hours ago, doron said:

I'm not sure what you mean by "continuing". The parity sync of the 12TB parity2 had been completed previously, after 23 hours and 12TB of reported progress. Are you saying that it started re-calculating parity2, as a function of a first parity drive joining in?

If so, that would make sense; the observed data points are (a) it reported "syncing 12TB" from the moment it started building the 4TB parity, and (b) it was trying to access (read or write, I guess) the 4TB drive beyond its physical limits.

When you build parity, there is no way to specify not building both. And it's probably best that way.

Link to comment
7 hours ago, trurl said:

When you build parity, there is no way to specify not building both. And it's probably best that way.

Agreed. In this case, parity is built for an array in which the largest data disk is size X and the two parity disks are of sizes X and Y >> X, and it fails. It might be something worth fixing.

 

I'm still unsure as to my previous questions - (a) is the array protected in its current state (data drives and parity2 only), and (b) is there any safe way to return to 4TB parity without going through an unprotected period?

 

Thanks!!

Link to comment
13 hours ago, trurl said:

When you build parity, there is no way to specify not building both. And it's probably best that way.

5 hours ago, doron said:

Agreed... It might be something worth fixing.

Not sure I understand what you mean here. You agree but then you say something needs fixing.

 

5 hours ago, doron said:

I'm still unsure as to my previous questions - (a) is the array protected in its current state (data drives and parity2 only), and (b) is there any safe way to return to 4TB parity without going through an unprotected period?

I don't see anything in your previous description that suggested it didn't sync parity1. Just that you thought that it was wrong that it continued past 4TB. Except you did say

23 hours ago, doron said:

Anyway, the array is now back in the not-very-normal (although all "green") state of having three 4TB data drives and one 12TB parity2 drive, only.

Do you mean parity1 isn't part of the array? Is it disabled? If it is, what do you mean by all "green"? Or is it unassigned? Maybe a screenshot and diagnostics would clear things up. Did you leave anything out in your description?

Link to comment

 

37 minutes ago, trurl said:

Not sure I understand what you mean here. You agree but then you say something needs fixing.

Let me clarify. I agreed with you that it might be best to have both parity drives sync simultaneously. However, in the case where there are two drives of different sizes in the two parity slots, the process fails and one of the drives gets a red x (dsbl). This is the part that needs fixing, IMHO.

(e.g. one way to address it would be to make sure the smaller of the two is still at least as large as the largest data drive, and then proceed to the ends of both drives - but not beyond - so that the process completes successfully.)
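
Something like this, conceptually (a toy Python sketch of the suggested behaviour, not Unraid's code; sizes are in arbitrary units and the parity math itself is omitted - the only point is the per-drive clamping):

```python
# Toy sketch of the suggested behaviour (not Unraid's code): one sync pass
# covering both parity drives, but never touching either one past its own end.

def sync_both_parities(data_sizes, parity_sizes):
    """Yield (parity_index, offset) for every parity write the pass would make."""
    assert min(parity_sizes) >= max(data_sizes), \
        "each parity drive must still cover the largest data drive"
    for off in range(max(parity_sizes)):        # run to the end of the larger parity drive...
        for p_idx, p_size in enumerate(parity_sizes):
            if off < p_size:                    # ...but clamp each drive to its own length
                yield p_idx, off

# Three 4-unit data drives, a 4-unit parity1 and a 12-unit parity2:
writes = list(sync_both_parities(data_sizes=[4, 4, 4], parity_sizes=[4, 12]))
assert max(off for idx, off in writes if idx == 0) == 3    # parity1 never written past its end
assert max(off for idx, off in writes if idx == 1) == 11   # parity2 synced to its full size
```

That way the sync can run to the end of the larger parity drive without ever issuing I/O past the end of the smaller one, so the smaller drive never gets falsely red-x'ed.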

37 minutes ago, trurl said:

I don't see anything in your previous description that suggested it didn't sync parity1. Just that you thought that it was wrong that it continued past 4TB. Except you did say

I didn't think that; this is what happened. It tried to read past the end of the smaller drive, failed, and placed the parity1 drive in red x disabled state.

37 minutes ago, trurl said:

Do you mean parity1 isn't part of the array? Is it disabled? If it is, what do you mean by all "green"? Or is it unassigned? Maybe a screenshot and diagnostics would clear things up. Did you leave anything out in your description?

It is now not part of the array (it is unassigned). The array now has parity2 and three data drives. Nothing left out of the description 🙂

 

42 minutes ago, trurl said:

Let me put back the bit I elided in quoting you in the previous post.

What failed?

The parity sync process failed. First parity was placed in red x after Unraid tried to read past its end. See item 2 under "What we did" in the first post. Sorry if this wasn't clear from my post.

 

Link to comment
1 minute ago, doron said:

However, in the case where there are two drives of different sizes in the two parity slots, the process fails and one of the drives gets a red x (dsbl). This is the part that needs fixing, IMHO.

1 minute ago, doron said:

It tried to read past the end of the smaller drive, failed, and placed the parity1 drive in red x disabled state.

If parity1 was disabled it was not for any of these reasons. It will not try to read past the end of the smaller drive.

 

5 minutes ago, doron said:

placed the parity1 drive in red x disabled state.

6 minutes ago, doron said:

It is now not part of the array (it is unassigned)

5 minutes ago, doron said:

Nothing left out of the description

 

How did it get from disabled to unassigned?

 

54 minutes ago, trurl said:

Maybe a screenshot and diagnostics would clear things up.

 

Link to comment
1 minute ago, trurl said:

If parity1 was disabled it was not for any of these reasons. It will not try to read past the end of the smaller drive.

Nor will it try to write past the end of the smaller drive. It is possible to get a write error if you try to write a file that a disk doesn't have enough space for, but this will not disable a disk. Writing files is at a higher level than the actual disk I/O. And parity doesn't have any files anyway.

 

And not that it has anything to do with this particular case, but read failures will not disable a disk. Unraid disables a disk when a write to it fails.

 

If a read fails, Unraid may use parity plus all other disks to calculate the data that couldn't be read, and try to write that calculated data back to the disk. If that write fails then the disk will be disabled.
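
In pseudo-code terms, the flow is roughly this (a sketch of the behaviour just described, not Unraid's actual code; the array/disk objects are hypothetical stand-ins):

```python
# Rough sketch of the described behaviour: a failed read is reconstructed and
# written back; only a failed *write* disables the disk.

def handle_read(array, disk, offset, length):
    try:
        return disk.read(offset, length)
    except IOError:
        # Read failed: rebuild the data from parity plus all the other disks...
        data = array.reconstruct(disk, offset, length)
        try:
            disk.write(offset, data)        # ...and try to put it back in place.
        except IOError:
            array.disable(disk)             # only the failed write-back disables the disk
        return data
```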

 

But none of this has anything at all to do with trying to access beyond the end of the disk. That just doesn't happen.

 

Link to comment
1 hour ago, trurl said:

If parity1 was disabled it was not for any of these reasons. It will not try to read past the end of the smaller drive.

s/will not/should not/ 🙂

But it did. This is what I was reporting.

1 hour ago, trurl said:

How did it get from disabled to unassigned?

Stop array; unassign red x (smaller) drive from parity slot; start array.

 

1 hour ago, trurl said:

Nor will it try to write past the end of the smaller drive. It is possible to get a write error if you try to write a file that a disk doesn't have enough space for, but this will not disable a disk. Writing files is at a higher level than the actual disk I/O. And parity doesn't have any files anyway.

Indeed. And no file i/o was happening anyway at that time. Only parity sync. 

 

What you're saying - "it will not" and "nor will it" - should really be "it should not" and "nor should it". My detailed problem report above asserts that it did exactly that, at exactly 4TB (33.3%) into the parity sync. I checked the logs at that point, and this is what they were saying (unfortunately the machine has been rebooted a couple of times since then so I can't provide the syslog, but trust me on that).

1 hour ago, trurl said:

And not that it has anything to do with this particular case, but read failures will not disable a disk. Unraid disables a disk when a write to it fails.

It was trying to access the disk beyond its end. Got a failure and error sense. Disabled the parity drive.

1 hour ago, trurl said:

If a read fails, Unraid may use parity plus all other disks to calculate the data that couldn't be read, and try to write that calculated data back to the disk. If that write fails then the disk will be disabled.

 

But none of this has anything at all to do with trying to access beyond the end of the disk. That just doesn't happen.

See above. Unfortunately it does. Which is why I said this is something that needs fixing.

Link to comment

I have just done a test of adding a parity1 disk alongside a larger existing parity2 disk and did not get any error. It is possible that there really was a write error right near the end of the 4TB parity1 disk. We would need the system diagnostics to be sure one way or the other.

Edited by itimpi
Link to comment
7 minutes ago, itimpi said:

I have just done a test of adding a parity1 disk alongside a larger existing parity2 disk and did not get any error. It is possible that there really was a write error right near the end of the 4TB parity1 disk. We would need the system diagnostics to be sure one way or the other.

And there have been no other reports of this happening.

Link to comment
18 minutes ago, trurl said:

Maybe you could repeat the experiment and then you would have diagnostics to go with a bug report.

Following your suggestion, I just did. And - lo and behold - it now reports that it is doing (Total size:) 4TB of sync (not 12TB as it did the previous time around). Judging by this, I'm quite sure it will also complete successfully (we'll know in 8 hours or so...).

So it may have been a one-off quirk. Which is probably why no one else reported this happening.

 

Link to comment

Since you were only replacing parity1 then it will only rebuild parity1. Not sure what was going on.

 

I'm surprised it let you replace 12TB parity1 with the old 4TB parity1 without going through New Config, and that is what I thought you must have done which resulted in it rebuilding both.

Link to comment
41 minutes ago, trurl said:

Since you were only replacing parity1 then it will only rebuild parity1. Not sure what was going on.

 

I'm very sure it was only rebuilding parity[1]. However, probably due to some leftover state somewhere, it decided that "Total size: 12TB" and was apparently trying to do, well, exactly that.

If I were to try to reproduce this (I wish I had a test system, as I had with 5.x - a testing license would have been great), I would try having two parity drives of the same size, and then inserting a smaller drive into the parity[1] slot to see if it's reproduced. @itimpi - if you have the bandwidth and interest...

41 minutes ago, trurl said:

I'm surprised it let you replace 12TB parity1 with the old 4TB parity1 without going through New Config, and that is what I thought you must have done which resulted in it rebuilding both.

It usually allows replacing one drive in one slot, I believe.

See above - 'twas not rebuilding both. It was rebuilding one, assuming an incorrect size.

Link to comment
1 hour ago, doron said:

It usually allows replacing one drive in one slot, I believe.

Of course it does, but it usually doesn't allow you to replace a disk with a smaller one. It makes sense that parity would be an exception to that rule though. So it appears you can replace a parity disk with a smaller disk as long as the replacement is still at least as large as any data disk.

Link to comment

I think it's entirely possible that you ran into a corner case, where a failed second parity drive is assumed to be replaced by the same size or larger, regardless of the largest data drive. Maybe restarting the server with valid single parity would have reset it, and simply starting with no drive assigned wasn't enough.

 

Diagnostics collected immediately after the event would have been valuable. If 6.8.x rc wasn't hot and heavy right now, I'd ping Tom and see if he could reproduce. Maybe after things calm down and 6.8 is out this should be revisited, to test if it's possible to reproduce.

 

Should be pretty simple, build an example STATE A array as you outlined in your first post, swap parity1 for larger drive, kill power on hot swap bay for parity1 to simulate drive failure, reinsert original STATE A parity1 drive and see what happens.

Link to comment
5 minutes ago, jonathanm said:

Diagnostics collected immediately after the event would have been valuable.

I know, sorry about that. I was debugging the situation, shut down to check hardware, and the logs and diags are gone. I need to see to it that at least the syslog persists on my server.

5 minutes ago, jonathanm said:

If 6.8.x rc wasn't hot and heavy right now, I'd ping Tom and see if he could reproduce. Maybe after things calm down and 6.8 is out this should be revisited, to test if it's possible to reproduce.

I'd try to reproduce this on a test server if I could run one... Side note: a test/debug license would be a great feature, e.g. make that license type do something like shut the server down after 24 hours, or after two hours of idling, or some other arbitrary action that is very annoying for a production server but harmless for testing. I for one would have used it a lot 🙂

5 minutes ago, jonathanm said:

Should be pretty simple, build an example STATE A array as you outlined in your first post, swap parity1 for larger drive, kill power on hot swap bay for parity1 to simulate drive failure, reinsert original STATE A parity1 drive and see what happens.

Exactly. Thanks!

Link to comment
