(Halp): During a Data rebuild (added a new drive), one of the drives started showing errors



First of all Hi,

As the title already says, during a data rebuild (added a new drive, not the failing drive), one of the drives started showing errors (the bad kind, and quite a high count). The drive might be failing; I would need to power off and check all the cables in case one came loose, or it could be a PSU issue, which I suspect with this system (old-ish PSU). The rebuild speed dropped to 1 MB/s, but the data rebuild continued and is now running at normal speed again.

After reading through different posts, I see that some suggest letting the data rebuild finish and then swapping the failing HDD for a new/working one. But I am afraid of the drive failing during the rebuild (not sure if 1 parity drive is enough to emulate the failing drive and protect me from a failing drive). The other option would be to stop the data rebuild, put the old drive back, and replace the failing drive with the new one (the one that is currently being rebuilt).

My question is: does single parity protect me in this scenario (1 drive emulated during the data rebuild, and 1 drive that may soon be unmountable but currently is not)?

Also, what is the best way to proceed in this scenario: continue, or stop and put the old drive back?

Thank you for any clarification. This is the first time I have had an issue with two drives at the same time.

*Attached diagnostics.

thetower-diagnostics-20240126-2211.zip

Link to comment

Disk3 rebuilding, disk2 read errors, all disks currently still mountable. Disk2 errors are not connection issues, should probably have been replaced before now.

 

Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected? Don't let one unnoticed problem become multiple problems and data loss.

 

Do you have another copy of anything important and irreplaceable?

 

Do you still have the original disk3? Why exactly were you replacing it?

Link to comment

Will definitely replace disk2. The question is what to do until then. Do you perhaps know whether single parity protects me in this scenario (1 drive emulated during the data rebuild, and 1 failing drive)?

For the important stuff, yes, I have a backup. But not of the media, which I guess is not critical, but it would still be a lot of work to redo it all and I would like to avoid that, as it would take years :/

I still have the original disk3; I only unplugged it and it is still sitting in the same spot. I am just replacing it because of its age and to get higher capacity, and I will be replacing two more in a short while.

Update: data rebuild is at 26%, and the error count has stayed the same at OU 1032.

Link to comment

So original disk3 should still have its contents if disk3 rebuild doesn't go well, which it may not.

 

Single parity requires all other disks to be reliably read in order to reliably rebuild a disk. Parity contains none of your data, and by itself it can't rebuild anything. So not only is single parity not protecting you from disk2 failing, it really can't be expected to rebuild disk3 since disk2 isn't working well.
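To make the point above concrete, here is a minimal sketch, assuming plain XOR parity (which is how single parity is usually described). The disk contents and the `xor_blocks` helper are made up for illustration; this is not how the server itself is implemented. It shows that rebuilding one disk needs parity plus a good read from every other data disk, and that a single bad read on disk2 ends up as bad data on the rebuilt disk3.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy array: three data disks holding one small block each (made-up contents).
disk1 = b"\x11\x22\x33\x44"
disk2 = b"\xa0\xb0\xc0\xd0"
disk3 = b"\x0f\x0e\x0d\x0c"

# Parity is derived from all data disks; on its own it holds none of the data.
parity = xor_blocks([disk1, disk2, disk3])

# Rebuilding disk3 needs parity AND a correct read of every other data disk.
rebuilt_ok = xor_blocks([parity, disk1, disk2])

# Same rebuild, but disk2 returns one wrong byte (simulated bad read).
disk2_bad = b"\xa0\xb0\xff\xd0"
rebuilt_bad = xor_blocks([parity, disk1, disk2_bad])

print(rebuilt_ok == disk3)   # rebuild matches the original disk3
print(rebuilt_bad == disk3)  # rebuild is silently wrong at the bad position
```

With every other disk read correctly, the rebuilt block is identical to the original. With one bad byte from disk2, the rebuilt disk3 differs at exactly that position, and nothing in the parity alone can reveal it.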

 

 

Leave Docker and VM Manager disabled and don't write anything to your server until we get your array stable again.

 

Do you know if anything has been written to your server since you removed original disk3? It might be better to New Config original disk3 back into the array and rebuild disk2 instead.

 

Since you didn't answer this I assume the answer is no. Might have made you decide to replace disk2 instead of disk3.

5 hours ago, trurl said:

Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected? Don't let one unnoticed problem become multiple problems and data loss.

You should do something about that.

 

 

Link to comment

 

7 hours ago, trurl said:

So original disk3 should still have its contents if disk3 rebuild doesn't go well, which it may not.

Correct.

 

Quote

Single parity requires all other disks to be reliably read in order to reliably rebuild a disk. Parity contains none of your data, and by itself it can't rebuild anything. So not only is single parity not protecting you from disk2 failing, it really can't be expected to rebuild disk3 since disk2 isn't working well.


Crap. So I need to set up dual parity. Is there a chance that the rebuild will still work?

 

7 hours ago, trurl said:

You should do something about that.

I only have local notifications and haven't looked into email notifications yet, but I am on the server every day, and if something comes up I check it. The HDD looked okay before I removed the drive. I even checked the stats beforehand to pick the oldest drive, and all drives showed 0 critical errors.
 

 

7 hours ago, trurl said:

Do you know if anything has been written to your server since you removed original disk3? It might be better to New Config original disk3 back into the array and rebuild disk2 instead.

Before running the rebuild I disabled Docker, and no VM was running. I am not sure, though, whether anything got written: the drives show writes since the last boot, but I can't tell if any of them happened during the rebuild. Everything (VM/Docker) was turned off.

image.png.822863271129e0c639625d9f7640c555.png

I assume we are waiting for the rebuild to finish. If it finishes successfully, I replace disk2 and rebuild it. If it does not, we try "It might be better to New Config original disk3 back into the array and rebuild disk2 instead."

Could you let me know if this is what you meant: I remove the failing disk2, re-add the original disk3, put the new disk in place of the failing disk2, and do a data rebuild?


P.s. The rebuild is in progress and is now at 46%, another 12-15 hours and it should finish. Error count is not increasing.




 

Edited by Matthews
Link to comment
3 hours ago, Matthews said:

If it does not, we try "It might be better to New Config original disk3 back into the array and rebuild disk2 instead."

Could you let me know if this is what you meant: I remove the failing disk2, re-add the original disk3, put the new disk in place of the failing disk2, and do a data rebuild?

Basically correct, but there are specific steps involved to make this work. Somewhat different from a "standard" New Config since you don't want to affect parity in any way, and you can't do the drive replacement until all that is done correctly.

 

Post new diagnostics when rebuild completes or if there seem to be problems.

Link to comment
3 hours ago, Matthews said:

So I need to set up dual parity

Always recommended if you have a lot of drives, whether or not it is worth the cost for a small array is debatable. I don't have dual parity on either of my systems, but I have email notifications and good (enough) backups.

Link to comment
Quote

Post new diagnostics when rebuild completes or if there seem to be problems.


Rebuild just finished. As requested attached diagnostics after the rebuild.

The notification reports 0 errors (perhaps parity calculated corrections, or am I not understanding it correctly?), yet the Main tab reports 7907 errors. I am a bit confused.



image.png.cd6d7b6a20b2b511f2c31d881c46d3ed.png

image.png.2d4e262e104db0af7c37808211c32aaf.png
 

Quote

Always recommended if you have a lot of drives, whether or not it is worth the cost for a small array is debatable. I don't have dual parity on either of my systems, but I have email notifications and good (enough) backups.


Well, I think that for ease of use dual parity makes sense for exactly such occasions, in combination with a local backup of only the important files. I think I will order a new drive next week and go dual parity.

 

afterRebuildtower-diagnostics-20240128-0251.zip

Link to comment

It seems okay. I have not checked everything, only the important stuff, so I can't say with 100% certainty, but it looks good so far.

I went into the Unraid docs and found this, for anyone who faces a similar issue with single parity. If the disk is not disabled, it means parity reconstructed the data. If a write to the disk fails, the disk is automatically disabled.
 

Quote

Errors counts the number of unrecoverable errors reported by the device I/O drivers. Missing data due to unrecoverable array read errors is filled in on-the-fly using parity reconstruct (and we attempt to write this data back to the sector(s) which failed). Any unrecoverable write error results in disabling the disk.



@trurl It seems the rebuild worked and only some minor files were corrupted because disk2 is failing. I am not sure to what extent, as I need to check all the files, but so far so good. I will let the rebuild finish so as not to stress the HDD more.

 

13 hours ago, trurl said:

You need to replace disk2 ASAP


disk2 has been replaced and the data rebuild is under way. It should finish in about 5 hours, and then we will see.
 

Link to comment

Just wanted to report that the rebuild went through smoothly. I ran a parity check afterwards (without writing corrections to parity). It looked good, so I ran it again with writing corrections to parity.

It finished today and I re-enabled Docker/VMs. Everything is working. I will keep an eye on it for a few days and look into setting up a notification system, as suggested by @trurl.

@trurl Thank you for your guidance!

P.S. I am now looking into either a small low-power backup NAS or going for dual parity. I was lucky that the 2nd HDD did not fail on me during the rebuild.

Link to comment
1 hour ago, trurl said:

You must always have another copy of anything important and irreplaceable. You get to decide what qualifies.

 

I do. But I don't like the solution I am using; it's not very failproof. Do you have any good suggestions as to what to use for local backup? I prefer non-cloud solutions, though cloud is probably the cheapest.

 

Best practice, AFAIK, is: main server, a second server in a different location, and copies in the cloud for extra protection.

Link to comment

I have a backup server which is made entirely from parts that used to be part of my main server, including its disks. I only boot that up occasionally and rsync the less important files from my main server, media and such. It doesn't have the capacity to take all of my main server, and some of my main server contents are just there for convenience and not important enough to backup, such as things I can easily download again.
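For anyone wanting to try something similar, here is a small sketch of how such an occasional rsync run could be scripted. The paths, the host name `backup-server`, and the `build_rsync_command` helper are hypothetical, and the flags are only one reasonable choice (`-a` for archive mode, `-v` for verbose, `--dry-run` to preview); the post above does not say which options are actually used.

```python
def build_rsync_command(source, destination, dry_run=True):
    """Build an rsync command as an argument list (nothing is executed here)."""
    command = ["rsync", "-a", "-v"]
    if dry_run:
        # Only report what would be copied; change nothing on the destination.
        command.append("--dry-run")
    # A trailing slash on the source copies its contents, not the directory itself.
    command += [source.rstrip("/") + "/", destination]
    return command


# Hypothetical share and host, not taken from the post.
cmd = build_rsync_command("/mnt/user/media", "backup-server:/mnt/user/media")
print(" ".join(cmd))

# To actually run it once the preview looks right:
#   import subprocess
#   subprocess.run(build_rsync_command(src, dst, dry_run=False), check=True)
```

Starting with a dry run makes it easy to see what would be transferred before anything is written to the backup server.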

 

The important things on my main server are really just backups from our PCs. I image the main PC weekly, and our personal files are copied nightly to the main server. Those personal files are backed up monthly to external disks.  All of that personal stuff is what I really consider important, and it will all fit on a single 2TB disk so far. I have a few of these 2TB disks that were once part of my servers. These are in rotation with the 2 most recent of these stored in an offsite location.

 

We do have some things in the cloud, mostly for mobile convenience, nothing important.

Link to comment
