Failing Drive - Shrinking Array.



So, I know this has been covered 1000s of times all over the web. It's just that there are a few scenarios/options out there and I want to make sure I understand them.

 

------------------------------------------------------------------------------

** The Set up **

- 8 disks in the array.

- 1 parity, 7 data. 

- As well as a few empty bay slots.

------------------------------------------------------------------------------

 

------------------------------------------------------------------------------

- Disk 5 started throwing errors. 

 

** Option 1: Shrink Array - Less safe**

Rather than replacing disk 5, just shrink the array to 6 data disks and 1 parity.

 

- Stop the array

- Select Disk 5 as "unassigned" under the "Main" tab

- Start the array 

- Stop the array

- Tools > New Config > "Preserve current assignments: ALL" > Apply

- Main > Start array. 

- **(One worrisome thing I see is that next to the parity drive; it says "All existing data on this device will be OVERWRITTEN when array is Started"). 

- ** I'm assuming it means JUST the parity drive? Because to make the new array it has to redo the parity right? But no data on the array will be lost? Right? 

- Completed after rebuild. 

 

** Option 1: Pros and Cons **

- Pro: It's fast and easy. 

- Con: It leaves your array vulnerable during the rebuild process. And since the rebuild process is more taxing than normal, another drive may fail.

 

------------------------------------------------------------------------------

 

** Option 2: Failing Drive - Shrink - More safe **

- Use something like unBALANCE to move data off disk 5 to the other drives. 

- Then follow steps in Option 1. 

 

** Option 2: Pros and Cons **

- Pro: Doesn't leave it unprotected like Option 1. 

- Con: Takes a bit longer to transfer the files off. Especially if the drive is failing and has really shitty speeds... (My situation).

 

Does this all sound about right? I'm currently about to start the array again. But seeing the "All existing data on this device will be OVERWRITTEN when array is Started" Warning scares me. So just want to double check. 

 

**EDIT** Or should I stick Disk 5 back in and do something else? I did start to use unBALANCE, but the speeds would drop too low and I would have to cancel it. I also then got some "Duplicated file errors" from the "Fix Common Problems" plugin even though I was using "Move" and not "Copy" in unBALANCE... But I figured that would go away once disk 5 was out of the picture. No?

 

-Thanks,

 

 


Edited by Maximus01701
Added additional info.
1 hour ago, Maximus01701 said:

I ended up starting the array... but paused it as soon as you posted... 

Sorry I've been busy with other things, and may be in and out for a while. I am concerned about what else you might have done besides starting the array. Did you do anything else?

 

Option 1 above gets you an array that no longer contains disk5, though it will have a gap in the disk5 position, so that could have been done better. But most importantly, Option 1 gets you an array that no longer contains any of the data from disk5.

 

Option 2 above does get the data from disk5 onto other disks in the array, but I don't know why it says there is no rebuild, because it then goes on to tell you to follow the steps in option1 which rebuilds parity.

 

A quick glance at the diagnostics isn't showing disk5 in the SMART folder, but it is showing invalid parity, which suggests you are now rebuilding parity without disk5.

 

Syslog indicates problems communicating with at least one disk, so the parity rebuild isn't likely to be any good anyway.

 

Stop the array and wait for further advice.

 

7 minutes ago, trurl said:

Is disk5 still installed? If not, go to Settings - Disk Settings and turn off autostart. Then shutdown, reinstall disk5, double check all connections, boot up again but don't assign disk5 and don't start the array. Post new diagnostics so we can check the health of disk5.

 Disk 5 is still in the server. (I have not physically removed any drives). 

 

The errors from disk 5 were UDMA errors. And from my understanding it's a hardware issue (bad cable or something along those lines).

So the plan was to just remove the drive from the array, then when I got more time go in and take a physical look.

 

- I shut down the array like you said. 

- Disk 5 is still in here. 

- Auto Start is already set to "no"

 

** EDIT** The advice was pieced together from reading different places online, like the wiki. So I paraphrased as best as I could here to make sure I understood it correctly. Apparently not...

 

-Thanks,

 

 

 

Edited by Maximus01701
More info
3 minutes ago, Maximus01701 said:

The errors from disk 5 were UDMA errors. And from my understanding it's a hardware issue (bad cable or something along those lines).

So the plan was to just remove the drive from the array, then when I got more time go in and take a physical look.

So was the disk actually disabled? If not, no reason to do anything except fix the connection.

 

If it was disabled, then by far the simplest and most correct thing to do would have been to rebuild it, to the same or a different disk. Removing disks from the array can certainly be done, but it is more complicated and more prone to mistakes. You are almost certain to make mistakes unless you understand what you're doing or you carefully follow instructions exactly.

 

Is this the disk that was previously assigned as disk5: WD-WCC4M2248130 ? That disk looks OK.

 

 


Probably your data is OK on that disk, but it is no longer part of the array. And your parity is invalid. Since the data is probably OK and parity needs to be rebuilt anyway, just go to Tools - New Config, keep all assignments and reassign disk5. Then start the array and let parity rebuild.

 

If it looks like it is having problems, stop and post new diagnostics. Otherwise let it complete and post new diagnostics and we will see where we are.

 

VERY IMPORTANT. Do NOT FORMAT anything.

 

3 minutes ago, trurl said:

Probably your data is OK on that disk, but it is no longer part of the array. And your parity is invalid. Since the data is probably OK and parity needs to be rebuilt anyway, just go to Tools - New Config, keep all assignments and reassign disk5. Then start the array and let parity rebuild.

 

If it looks like it is having problems, stop and post new diagnostics. Otherwise let it complete and post new diagnostics and we will see where we are.

 

VERY IMPORTANT. Do NOT FORMAT anything.

 

 

Doing that now. 

 

I agree, the data is still on disk 5. And the disk is 90% probably okay.

I am doing what you said now. This is what I figured I would have to do if I was unable to shrink it.

 

I will post the logs back up in the morning.

 

HOWEVER - hypothetically speaking... if a drive dies. (100% dead). And you don't have access to another drive but don't want to be unprotected. What would be the best way to shrink the array without that drive? 

 

-Thanks,

 

 

 

 

 


Disk still looks fine. Are you sure you did this?

1 hour ago, trurl said:

double check all connections

Both ends, power and SATA, including any power splitters. Change the cable if you have another. The connector must rest squarely on the connector on the disk, with no tension on the cable that might pull it out of alignment. Also, if you are bundling your SATA cables, don't.

58 minutes ago, Maximus01701 said:

HOWEVER - hypothetically speaking... if a drive dies. (100% dead). And you don't have access to another drive but don't want to be unprotected. What would be the best way to shrink the array without that drive? 

Until you get more experience, probably the first step would be to ask for advice by posting your diagnostics. Disk problems are not as common as connection problems.

 

Going back to a point I made earlier though:

1 hour ago, trurl said:

If it was disabled, then by far the simplest and most correct thing to do would have been to rebuild it, to the same or a different disk.

This is basically the whole point of having parity in the first place.

 

As for shrinking, perhaps one of the things you were paraphrasing was this wiki (which I helped write):

 

https://wiki.unraid.net/Shrink_array

 

Read it carefully and see if you can see how it differs from your paraphrase. If you are capable of writing the correct instructions yourself, then it is fine to paraphrase, since you will know if your other wording is still correct.

 

The reason some of us know what to do in these situations is because we completely understand how parity works. Even though I was able to write those instructions, I have never done it myself. I just know how it needs to work.

 

Parity isn't very complicated. It is basically the same concept wherever it is used in computers and communications. Parity is just an extra bit that allows a missing bit to be calculated from all the other bits.
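
To make that concrete, here is a minimal sketch of single parity using XOR, the usual way a "missing bit" is calculated from the others. The three toy disks and the helper names `compute_parity` and `rebuild_missing` are made up for illustration; this is not Unraid's actual code, just the same idea at a tiny scale.

```python
from functools import reduce
from operator import xor


def compute_parity(disks):
    """XOR the bytes at each offset across all the given disks."""
    return bytes(reduce(xor, column) for column in zip(*disks))


def rebuild_missing(surviving_disks, parity):
    """Recover the one missing disk by XOR-ing parity with the survivors."""
    return compute_parity(list(surviving_disks) + [parity])


# Three toy "data disks" of equal size (hypothetical contents).
disk1 = bytes([0b1010, 0b0001, 0xFF])
disk2 = bytes([0b0110, 0b0011, 0x0F])
disk3 = bytes([0b0000, 0b0111, 0xF0])

# Parity built while all three disks are in the array.
parity = compute_parity([disk1, disk2, disk3])

# disk2 "fails": its contents can be calculated from everything else.
recovered = rebuild_missing([disk1, disk3], parity)
print(recovered == disk2)  # True

# Parity rebuilt WITHOUT disk2 (the New Config case) no longer knows about it.
new_parity = compute_parity([disk1, disk3])
print(rebuild_missing([disk1, disk3], new_parity) == disk2)  # False
```

The last two lines are the reason a disk's data has to be moved elsewhere before the disk is dropped and parity is rebuilt: once parity is recalculated over the remaining disks, the removed disk's contents no longer contribute to it.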

 

Here is the wiki on parity:

 

https://wiki.unraid.net/UnRAID_6/Overview#Parity-Protected_Array

 


I did not see that page of the wiki.. thank you for sending it. It does clear up a lot.. 

 

I powered down the server and swapped the cable. On boot it hung on that USB error I mentioned in the other thread a few weeks ago (if you have a solution for that yet).

Started the array, and it is rebuilding at 102 MB/s now. I will keep an eye out and see if the drive starts to error out again...

 

-Thanks,


So... the rebuild finished without any errors, and the drive did not put out any errors either. So the eSATA cable was it. It was just my own laziness to not swap it, and to try to shrink the array and deal with testing the drive later...

 

So one issue I've caused in doing all of this is that now I've got duplicated files...

 

I was using unBALANCE to move stuff from drive 5 (the one giving errors) to drive 7.

I did the dry run and fixed some permissions, so I figured I was good to go. I started the move, but the drive's speed went to shit and it started giving out errors, so I canceled it and tried again. (This happened a few more times.)

 

Then we fixed my fuck up.. (Trying to shrink the array). 

 

However, now I've got some duplicated files (according to "Fix Common Problems"). The error says:

Quote

"The following files exist within the same folder on more than one disk.  This duplicated file means that only the version on the lowest numbered disk will be readable, and the others are only going to confuse unRaid and take up excess space:"

 

I haven't done much research yet, but figured I would post here in conjunction with googling around.

Any ideas/recommendations on correcting that?
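
As a starting point for seeing what is actually duplicated, here is a rough read-only sketch that lists files sitting at the same relative path on more than one disk, which is the condition the message above describes. The mount points `/mnt/disk5` and `/mnt/disk7` are assumptions based on the two disks involved in the unBALANCE move, and `find_duplicates` is a made-up helper name. It only reports; it does not delete or move anything.

```python
import os
from collections import defaultdict


def find_duplicates(disk_roots):
    """Return {relative path: [disk roots]} for files present on 2+ disks."""
    seen = defaultdict(list)
    for root in disk_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                seen[rel].append(root)
    return {rel: roots for rel, roots in seen.items() if len(roots) > 1}


if __name__ == "__main__":
    # Assumed mount points; adjust to the disks actually involved.
    duplicates = find_duplicates(["/mnt/disk5", "/mnt/disk7"])
    for rel, roots in sorted(duplicates.items()):
        print(rel, "->", ", ".join(roots))
```

Note that this matches on path only. Since the moves were canceled partway, a copy on the destination disk may be incomplete, so it is worth comparing sizes or contents before deciding which copy to keep.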

 

-Thanks,

 
