Disk quickly deteriorating (bad sectors growing)



In Covecube's StableBit DrivePool I can simply decide to remove a drive, with these options (see attached).

Is there something similarly easy in unRAID to kick a bad disk out of the array? I tried using the Unbalance plugin for that, but it gives me so many errors I can't resolve (they're probably not even valid, since they describe conditions that aren't possible) that I'm not sure it's the right tool for this. Sure, I could copy or move the data from /mnt/disk5 to /mnt/cache or similar on the command line, but that doesn't seem like the way to go either, since I'm not sure how unRAID would then know what happened.

[Attachment: 2019-10-06_135804.png]


This is really a missing feature!

Knowing that the array has more than enough free storage space to entirely decommission a drive, it would be best to be able to invoke moving its data off before taking out the bad drive. This would also greatly speed up data restoration when a new drive is put in, since there is less reading and writing to be done.

11 minutes ago, fluisterben said:

This is really a missing feature!

Knowing that the array has more than enough free storage space to entirely decommission a drive, it would be best to be able to invoke moving its data off before taking out the bad drive. This would also greatly speed up data restoration when a new drive is put in, since there is less reading and writing to be done.

I agree that having the ability to move data in preparation for removing a drive would be a nice feature. However, I am not sure it is that critical, as moving the data off the drive is not needed if you are intending to replace it and rebuild its contents (in fact this may be something you do NOT want to do, as it may hasten the demise of a drive that is failing). Have you tried using the Unbalance plugin to achieve this?

 

Note, however, that the point about it speeding up the replacement is wrong. When you replace a drive, every sector on the replacement drive is written, regardless of whether it contains data or not.


Yes, I did try the Unbalance plugin, but it keeps reporting permission problems and errors that simply aren't valid (I've checked thoroughly), and then it won't let me move a drive's data out of the array.

 

Still, the array rewriting every sector of a new drive seems like horrible overkill. I prefer the way StableBit DrivePool does it: it basically lets you specify how many copies each directory should have in the pool, and each drive's content remains accessible on its own. In fact, when I first started with unRAID, I thought it was more similar to Covecube's DrivePool. It turns out it isn't; it's just another RAID array, and frankly even the name 'unRAID' isn't really appropriate. All it is is a GUI for two RAID arrays (the cache and the parity-protected array).

 

On 10/9/2019 at 10:01 AM, itimpi said:

in fact this may be something you do NOT want to do as it may hasten the demise of a drive that is failing

 

This is what I think is really missing in unRAID: I get notices that a drive has a growing number of bad sectors and errors, but nothing tells me how to save the files on that drive that aren't corrupted yet. It just leaves me with "bad drive, bad drive, red alert!", and honestly, that's not the way to go. There should be a button to safely decommission the drive and safeguard its contents. Instead, the GUI suddenly isn't friendly anymore, and we have to drop to a shell with dd or ddrescue and the like. It's such a Linux disease: pretending to offer a GUI and user-friendly everything, and when push comes to shove we all need to be sysadmins and go shell-scripting again. Don't get me wrong, I like being on a shell over ssh, but that's not unRAID's intended use case.

1 hour ago, fluisterben said:

I get notices that a drive has a growing amount of bad sectors and errors, and then there's nothing that tells me how to save the files that are not corrupted yet on that drive

As long as parity is valid, there are no corrupted files. If a read operation gives an error, the rest of the array drives are accessed and the data that should be there is calculated from all the other array drives. That value is written to the drive to repair the read error. If that write operation fails, the disk is red balled, and is not used again. All subsequent reads and writes to that data slot are calculated and emulated using all the remaining array drives.
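That emulation can be shown with a toy sketch (the drive contents below are made-up byte strings standing in for sectors; this is an illustration of single-parity XOR, not unRAID's actual code):

```python
# Toy sketch of single-parity emulation: with XOR parity, any one
# missing drive can be recomputed from parity plus the survivors.
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Three tiny "drives" of 8 byte-sized "sectors" (made-up values).
disk1 = bytes([1, 2, 3, 4, 5, 6, 7, 8])
disk2 = bytes([9, 8, 7, 6, 5, 4, 3, 2])
disk3 = bytes([0, 0, 255, 255, 16, 32, 64, 128])

# The parity drive holds the XOR of every data drive, sector by sector.
parity = reduce(xor, [disk1, disk2, disk3])

# disk2 "fails": every read of its slot now hits ALL surviving devices,
# because its contents are recomputed from parity and the other drives.
emulated_disk2 = reduce(xor, [parity, disk1, disk3])
assert emulated_disk2 == disk2
```

This also makes it clear why reads from an emulated slot are expensive: each one requires a read from every remaining array device.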

 

"Moving" data off of that emulated drive slot involves reading and writing from all the array devices, so the least stressful way to keep that data safe is to rebuild it back to a new device. Since all sectors of all array drives are needed to emulate a missing device, all sectors must be put back on the rebuilt device to allow the new device to participate in the array parity calculation. If only the sectors containing active data were written, then parity would be wrong at the other sectors on the disk, and parity would need to be rebuilt. If a different drive were to fail, it could no longer be emulated accurately and data would be lost.

 

The GUI is all that is needed to replace a drive; there is no need to go to the shell.

 

Removing a drive slot, even when all drives are perfectly healthy, is far more complicated. To remove a data slot you have to reverse what was done to add it: first all data must be moved to other drives, then every sector of the drive to be removed must be set to zero so the parity calculation is not affected when the drive is removed. That is a long process on a healthy array; on one that is having problems it may not even be feasible in a timely fashion.
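The zeroing step follows from the XOR math (again a toy sketch with invented byte values, not unRAID internals): zeros are the identity under XOR, so an all-zero drive contributes nothing to parity and its slot can be dropped without a parity rebuild.

```python
# Toy sketch: why a data drive must be zeroed before its slot is removed.
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

disk1 = bytes([10, 20, 30, 40])
disk2 = bytes([7, 7, 7, 7])   # the drive we want to remove
disk3 = bytes([1, 2, 3, 4])
parity = reduce(xor, [disk1, disk2, disk3])

# Writing zeros to disk2 updates parity as any write would:
# new_parity = old_parity XOR old_data XOR new_data (all zeros).
parity_after_zeroing = xor(xor(parity, disk2), bytes(4))

# With disk2 all zeros, it no longer affects the XOR, so the slot can
# vanish: parity over the remaining drives already matches.
assert parity_after_zeroing == reduce(xor, [disk1, disk3])
```

Skipping the zeroing and just yanking the drive would leave parity accounting for data that is no longer there, which is exactly the state that makes a later failure unrecoverable.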

 

 


So, basically, you're saying:

Remove the drive, put a replacement in, let it do a parity rebuild. Done.

If that is the procedure, why isn't unRAID just telling me so while it happens?

The way things are portrayed, I'm not sure whether the data in the array is intact or complete when I just kick that drive out.

 

Here's my advice to the unRAID devs:

 

I get warnings that a drive is going bad: more failures, more SMART errors, slowly deteriorating. I want to replace it.

The first thing a user wants to do is have unRAID READ whatever is still intact (and readable) from that dying disk, move it off, and then discard the drive.

Or at the very least be truly assured that intact copies of whatever may be bit-rotting on that drive exist somewhere outside of it, so the user is not losing data.

