Need Advice on Replacing 2 Disks


hadek

I had a bad SSD serving as cache, and I chose an M.2 drive to replace it. Once I had moved everything off the cache (there were no files left on the drive), I stopped the array, shut down the system, and replaced the failed drive. On startup I assigned the new cache drive and noticed that the parity drive was down. (Being the scrooge I am, my system runs on a single parity drive.)

I powered the system down, and during hardware inspection I noticed that power plug slipped out of parity drive. I reconnected it and rebooted the system. Of course the parity drive would not resume normal operation, and to make matters worse one of the data disks in the array started acting up with write errors. I figured the parity drive was good, so I used this article to force the parity drive back and accepted the parity as good. I then started the array and kicked off a parity check.

 

As you may have guessed, the parity drive started reporting parity errors and, to make matters worse, started reporting write problems too.

 

The data drive that failed held the data I had moved off the failing cache drive. I believe it was data related to Docker/VMs and the 'system' share.

 

I know I've lost some of the data.

 

Now my plan is as follows:

I'm getting 2 new drives and plan to use them as parity drives. I need to remove the failed data drive and make the array one drive smaller. The goal is not to lose the data on the remaining data drives.

 

How do I execute the switch and trim without losing data on the good drives? I can recreate my VMs and Dockers, but could someone chime in on importance on 'system' share and whether losing it will cause any other issues?

 

thanks in advance,

hdk

 

 

Link to comment
10 hours ago, hadek said:

during hardware inspection I noticed that power plug slipped out of parity drive

You must always double check all connections when mucking about in the case. What you experienced here may be the most common cause of disks becoming disabled.

 

Parity is disabled again. SMART for that disk looks OK.

 

Disk3 isn't disabled since parity was already disabled. SMART for that disk also looks OK.

 

You almost certainly have connection issues with both of these disks. Check all connections, all disks, both ends, SATA and power, including any splitters.

 

Disk3 and parity aren't showing as assigned in SMART folder. Maybe they both show up in the Unassigned area.

 

I see no reason to replace (or remove) either, and it's probably simpler if you don't, at least until everything is stable. You can run extended SMART tests if you want, but I expect they will pass.

 

10 hours ago, hadek said:

importance on 'system' share

Your system share is the default location of docker.img and libvirt.img, and that is indeed where you have these configured, but since you moved it off cache and onto the missing disk, it doesn't currently exist.

 

After getting your connections fixed, rebuild parity again, then run a non-correcting parity check. After you confirm that you have no sync errors, you can consider whether you want to change anything, and get your system share back on cache where it belongs.

 

Single parity is probably fine with so few disks.

 

Link to comment

Many thanks for the analysis. You were 100% right, the drives are good. Turns out my motherboard bit the dust - not even 4 months old. I'm still assessing the carnage the failure caused.

 

Is it OK to use the originally referenced procedure to bring back the parity drive? Or is there a better way?

 

Thanks,

hdk

Link to comment

It is usually better to rebuild than to assume a disabled disk is correct, because it is probably out-of-sync.

 

Unraid disables a disk when a write to it fails. A disabled disk isn't used again until rebuilt (or New Config as you did before). But writes to the array continue.

 

In the case of a disabled data disk, the disk is emulated from the parity calculation. That original failed write, and any subsequent writes, update parity, so those writes can be recovered by rebuilding. The physical disabled disk never received those writes, so it is out-of-sync with parity.

 

In the case of disabled parity, any subsequent writes to the array are not written to parity, so it is out-of-sync with the array.
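
To make the two cases above concrete, here is a minimal sketch in Python. It assumes single parity is a plain bytewise XOR across the data disks, and it uses three toy 4-byte "disks"; the names (`xor_parity`, `disks`, and so on) are made up for illustration and have nothing to do with Unraid's actual code.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks", each holding a single 4-byte block (made-up data).
disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(disks)

# Case 1: data disk 2 is disabled. Its contents are emulated from parity
# plus the remaining data disks.
emulated = xor_parity([parity, disks[0], disks[1]])

# A write to the emulated disk only updates parity. The physical disk never
# sees it, so the physical disk is now stale.
new_block = b"\x00\x00\x00\xff"
parity_after_write = xor_parity([disks[0], disks[1], new_block])
emulated_after_write = xor_parity([parity_after_write, disks[0], disks[1]])
stale_physical = disks[2]

# Case 2: parity is disabled. A write lands on data disk 0, but the parity
# disk is not updated, so the old parity no longer matches the array.
stale_parity = parity
disks_after = [b"\x0f\x0f\x0f\x0f", disks[1], disks[2]]
parity_in_sync = xor_parity(disks_after) == stale_parity

print("emulated matches original:", emulated == disks[2])
print("emulated holds new write:", emulated_after_write == new_block)
print("physical disk is stale:", stale_physical != new_block)
print("old parity still in sync:", parity_in_sync)
```

In both cases the disabled device is the one left holding old information, which is why rebuilding it is safer than trusting it as-is.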

 

To rebuild to the same disk, whether data or parity:

  1. Stop array
  2. Unassign disabled disk
  3. Start array with disabled disk unassigned
  4. Stop array
  5. Reassign disabled disk
  6. Start array to begin rebuild

 

 
