Drive Failure and Parity Upgrade


I am just looking for some advice on how to proceed. I had a drive I planned on using to upgrade my second parity, but another drive failed before I was able to put it in.

 

Current situation: 2 parity drives and 1 failed drive, on Unraid 6.6.6 (going to update to 6.7.2 once the array is healthy again).

Parity 1 is a WD gold

Parity 2 is a Seagate Archive 

Spare is a WD Red/White label

The failed drive is a WD green

 

Where I want to end up:

Parity 1 remains unchanged 

Parity 2 is the WD Red/White label

The failed drive is replaced with the Seagate Archive (which was previously Parity 2)

 

Size-wise, I'm fine to move the drives where I want.

 

I want to know the "safest" way to accomplish my goal, which to me means maintaining dual parity while swapping out the failed drive.

 

I think what I want to do is a Parity Swap/Swap Disable, as outlined here: https://wiki.unraid.net/The_parity_swap_procedure

Would that be correct? 


It likely has. I do hate that you can't see the SMART attributes after a disk has been disabled. For the most part I would say it's unlikely to be a cable issue, as nothing has been touched internally in a long time; the case has hot-swap bays.

So far, every drive of mine that has failed I have confirmed bad by plugging it into a separate system and running preclear as a stress test. Most just die part way through; others have the sector count increase, etc. With one exception, if my memory is right: one was disabled by Unraid and, I think, tested OK. It might have been in this slot, so there is a chance something is going on there. Thanks for the tip/idea. I guess the easiest thing to do would be to move it to another slot.

One reason is that I don't like poking around inside and causing more problems, like knocking a cable loose, while the array isn't healthy. Secondly, the server is a bit remote at the moment, at a family member's house; they are doing the drive replacements for me and won't be going inside the case.

 

But hypothetically, if the green drive is dead (which is still likely, as I see in my notes it did have a few errors in the past), would the parity swap be the right path? Does it even work with dual parity?

On 9/4/2019 at 7:46 AM, trurl said:

That isn't true. Diagnostics usually contain SMART for a disabled disk as long as the disk can be communicated with.

Well, I was referring to the GUI: going into the properties of the drive shows nothing. It says the disk needs to be spun up, but clicking spin up does nothing.
 

Anyhow, I moved the drive just to see, and after rebooting I could get the attributes. This disk had:

Current pending sector: 618

Offline uncorrectable: 617

 

Not sure if I knew it was high before, acknowledged it, and went on my way wondering how much longer it would last, or if it shot up quickly. But I'm pretty sure it was the former. Not surprised, knowing the drive's history.
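For what it's worth, those attributes can also be pulled on the command line even when the GUI won't show them. Below is a minimal sketch that parses the ATA attribute table printed by smartmontools' `smartctl -A /dev/sdX` (the column layout assumed here is the standard smartmontools one: ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE):

```python
import re

# Attributes worth watching for a dying disk (names as smartmontools prints them).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

def parse_smart_attributes(text: str) -> dict:
    """Return {attribute_name: raw_value} for the watched attributes
    from `smartctl -A` output."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # An ATA attribute row has 10+ whitespace-separated columns;
        # column 1 is the attribute name, column 9 the raw value.
        if len(fields) >= 10 and fields[1] in WATCHED:
            # Raw values can carry vendor suffixes like "617 (0 4)";
            # keep only the leading integer.
            m = re.match(r"\d+", fields[9])
            if m:
                counts[fields[1]] = int(m.group())
    return counts
```

Feeding it the table for the failed green drive would report the 618 pending / 617 uncorrectable sectors seen above.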

 

Parity swap running now; so far initializing it was easy, now it's just time to wait. Thanks guys, and thanks limetech/bonienl for making this an easy task in the GUI.

 

5 hours ago, trurl said:

Do you have Notifications setup to alert you immediately by email or other agent as soon as a problem is detected?

Yup, that's how I was aware of the drive failure. ***

 

So I decided to do something that I guess was kind of dumb, and now I'm not sure how to proceed again. I think I know, but would like to be sure.

 

I did the parity swap, and Unraid copied the data from my old parity to my new parity with the array offline. That operation completed, and then I was presented with a stopped array. I had valid parity, and starting the array would start the rebuild of disk 15 (overwriting my old parity). That's all well and good. But I moved the 2 drives around the server physically, with the server on and the array stopped (as I have hot-swap bays). That pissed off Unraid. In hindsight, I think if I had done it with the server off it would have been fine; moving them physically shouldn't matter to Unraid, correct?

So if I assign my old parity back to the parity slot and leave disk 15 unassigned, it's happy. It's back to before I started the parity swap and copy procedure. I would then start the array, stop the array, and continue with the parity swap/copy procedure again. I suspect this will work 100%, as it will just copy the parity again (unnecessarily, really). After the copy is complete, start the array and let it rebuild disk 15.

 

But I think there is a way to assign the new parity drive and tell Unraid to trust it, then build disk 15.

 

Essentially, I know I have 2 parity disks that are the same due to the parity copy procedure. My new and old parity drives should be identical. But Unraid says "All existing data on this device will be OVERWRITTEN when array is Started" on my new parity, even though it should be valid since it's a clone of the old parity. I'm about 99% sure I can't use New Config, as I'll lose parity and the data on my failed disk. I was thinking more of a trust-parity option, but I don't see one.

 

 

Option 1 seems a lot safer but a bit longer, so I'll likely end up there, but I would like to know if option 2 really exists.

 

 

 

***

Side note: I was just thinking it would be nice to have a history of "acknowledged" drive stats. You get a notification of, say, 5 reallocated sectors. While that's not great, as long as it stays at a steady state it's not a big concern. But then some time later you get a notification for 6; you acknowledge it and move on. What I'm getting at is that it's hard to tell whether a drive's health is slowly diminishing if you don't recall when you acknowledged the errors. Sure, there are other solutions, but a history of the error and when you said "yeah, I saw it" would be nice to have in the drive information page.
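The feature request above could be sketched roughly like this (illustrative Python only, not an Unraid API): keep each acknowledged reading with a timestamp, so a later notification can be compared against the whole acknowledgment history rather than just the latest value.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple

@dataclass
class AttributeHistory:
    """Acknowledged readings for one SMART attribute on one disk.
    Hypothetical names; a sketch of the 'acknowledged history' idea."""
    acks: List[Tuple[datetime, int]] = field(default_factory=list)

    def acknowledge(self, value: int, when: Optional[datetime] = None) -> None:
        """Record that the user saw and acknowledged this count."""
        self.acks.append((when or datetime.now(), value))

    def delta_since_first_ack(self) -> int:
        """Total growth since the first acknowledged reading."""
        if len(self.acks) < 2:
            return 0
        return self.acks[-1][1] - self.acks[0][1]

    def is_creeping(self) -> bool:
        """True if every acknowledgment recorded a strictly higher count
        than the previous one, i.e. the attribute is steadily rising."""
        values = [v for _, v in self.acks]
        return len(values) >= 2 and all(b > a for a, b in zip(values, values[1:]))
```

With this, acknowledging 5 reallocated sectors and later 6 would show a delta of 1 and flag the attribute as creeping, which is exactly the trend that is hard to spot from one-off notifications.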

 

4 hours ago, bnevets27 said:

But I moved the 2 drives around the server physically, with the server on and array stopped (as I have hot swap bays). That pissed off unraid

There seem to be some details implied in the "pissed off Unraid" that I am missing. How did you get from there to it wanting to rebuild parity?


That's it, nothing else happened. But I haven't talked about rebuilding parity.

 

Parity 2 was in bay 2

Disk 15 was in bay 19 (The replacement for the dead disk 15)

 

Parity swap/copy was performed. All I had to do was hit start and Unraid would build the data onto disk 15.

 

But before I hit start, I moved the disks:

 

Parity 2 (which after parity swap became disk 15) was moved to bay 15

Disk 15 (which after parity swap became Parity 2) was moved to bay 2

 

Unraid then marked both disks as new (blue square) and said "All existing data on this device will be OVERWRITTEN when array is Started" on both disks (parity 2 and disk 15).

 

The only way to correct the above issue is to change the assignments back to before the parity swap happened. If I assign disk 15 back to the parity 2 slot and unassign disk 15 from slot 15, then Unraid reports "Configuration valid".

 

So I think the obvious process would be to basically start over: start the array without disk 15 assigned to a slot, stop the array, add the new disk, and perform the parity swap/copy. Then start the array as I should have done, without moving anything.

 

What I think happened is Unraid saw the disks disappear when the drives were pulled out. For some reason it seems to have forgotten the parity swap operation happened. I suspect if I had powered down the server and then done the move, Unraid would not have cared. But that's only a theory.

 

Not sure if something weird just happened, if this is a "bug", or if a note should be added to the procedure. It would need to be replicated of course, but that would be simple: do a parity swap, then unplug and replug while powered on, and see if Unraid forgets the parity swap happened. The note would just be to not make any changes until the array has been started at least once.

 

 

Well, I decided to try to start over. I started the array with a valid config, which as I said meant going back to before the parity swap: the old parity drive back in the parity 2 slot and disk 15 left unassigned. With the array up everything was fine; disk 15 was emulated and I had valid parity on both parity drives. Now I was going to start the parity swap again, but I don't know if this looks right. I didn't think disk 15 had the "data will be overwritten" warning (disk 15 being the "old" parity 2).

 

Please confirm this is expected on a parity swap and I'll proceed. It looks wrong, like I would have data loss (loss of the data on disk 15).

[Screenshot: device assignments showing the "data will be overwritten" warning on both parity 2 and disk 15]

 

*Note: this is what I saw after moving the disks around, and why I was worried to start the array. The data on disk 15 (old parity 2) needs to be copied onto parity 2; if the data on disk 15 is overwritten, then the parity info is lost. So this looks wrong.

 

The ironic thing here to me is that both disk 15 and parity 2 are currently identical, as the parity copy operation finished. Assigning either to parity 2 should be fine, but Unraid won't allow the new drive to be assigned.

7 hours ago, bnevets27 said:

Parity 2 (which after parity swap became disk 15) was moved to slot 15

Disk 15 (which after parity swap became Parity 2) was moved to slot 2

Just to clarify, and possibly make a point that might have been missed.

 

People sometimes use words differently. I try to say "port" when I mean how the disk is attached, and "bay" for the physical location of the disk within the case, and "slot" for the actual Unraid disk assignment, since that is how Unraid refers to that in the logs.

 

When you use the word "slot", do you mean the actual Unraid disk assignment? Rearranging those invalidates parity2.

 

7 hours ago, johnnie.black said:

Yes, any interruption of the parity swap procedure requires starting over, same if you reboot/powerdown after the parity copy.

Ah, OK, that's exactly where I went wrong. Thanks. Maybe adding a warning to the wiki or GUI might not be a bad idea.

3 hours ago, trurl said:

Just to clarify, and possibly make a point that might have been missed.

 

People sometimes use words differently. I try to say "port" when I mean how the disk is attached, and "bay" for the physical location of the disk within the case, and "slot" for the actual Unraid disk assignment, since that is how Unraid refers to that in the logs.

 

When you use the word "slot", do you mean the actual Unraid disk assignment? Rearranging those invalidates parity2.

 

You make a good point, and I was using the wrong terminology, which I'm usually a stickler for, so thank you for setting me straight. I edited my last post to reflect the proper wording.

 

 

Since I know I'm starting over, I'm going to perform the parity swap/copy again. The server is currently in the above state, waiting to start the process. But I want to confirm that seeing "All existing data on this device will be OVERWRITTEN when array is Started" on both disks is the expected behaviour. It really feels like it shouldn't be.

2 minutes ago, bnevets27 said:

But I want to confirm that seeing "All existing data on this device will be OVERWRITTEN when array is Started" on both disks is the expected behaviour. It really feels like it shouldn't be what is expected.

It is; first the parity disk will be overwritten with the old parity, then disk15 will be overwritten by the rebuild.

30 minutes ago, johnnie.black said:

It is; first the parity disk will be overwritten with the old parity, then disk15 will be overwritten by the rebuild.

Thanks, Johnnie. That makes sense, but I thought I had not seen it shown like that (both disks as "new"/blue) the first time I did the parity swap/copy. From what I've gathered, the copy operation happens, then the array stops, and then you have to bring it online to rebuild.

 

So during the copy, the old parity (which in this case is now in slot 15) is written to the new parity drive in the parity 2 slot; disk 15 isn't written to during this operation. After the copy completes, the array needs to be started, and that is when the data on disk 15 is overwritten. I know these are minor details, but having the correct information in the GUI would help in understanding what is really happening.
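The sequence discussed in this thread can be modelled as a small state machine. This is a conceptual sketch only, not Unraid's actual implementation; it just captures the two points established above: the copy and the rebuild are separate phases, and any device change (or reboot) between them discards the pending swap so you must start over.

```python
from enum import Enum, auto

class SwapState(Enum):
    PENDING_COPY = auto()   # new disk assigned to the parity slot, copy not done
    COPY_DONE = auto()      # old parity copied onto new parity; array still stopped
    REBUILDING = auto()     # array started; old parity drive being rebuilt as a data disk
    DONE = auto()

class ParitySwap:
    """Conceptual model of the parity swap/copy procedure (hypothetical names)."""

    def __init__(self):
        self.state = SwapState.PENDING_COPY

    def finish_copy(self):
        assert self.state is SwapState.PENDING_COPY
        self.state = SwapState.COPY_DONE

    def device_change(self):
        # Pulling/moving disks or rebooting before the array is started
        # invalidates the pending swap: back to square one.
        if self.state is SwapState.COPY_DONE:
            self.state = SwapState.PENDING_COPY

    def start_array(self):
        # Only starting the array after an uninterrupted copy begins the rebuild.
        assert self.state is SwapState.COPY_DONE
        self.state = SwapState.REBUILDING
```

In this model, calling `device_change()` after `finish_copy()` drops you back to `PENDING_COPY`, which mirrors what happened when the two drives were physically moved between bays before the array was started.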

 

Started the parity copy now, and I won't be making any changes until I start the array this time. Thanks to both of you.
