Possible invalid parity, and failing data drives...what to do next?



This all started with a parity upgrade. I wanted to swap out a 4tb for an 8tb, so I just replaced the old drive with the new one and rebuilt parity. When it was done, my drive 8 (hereafter md8) was red-balled. The filesystem was corrupt and couldn't be repaired in place. Using UFS I was able to save some data, but not all; what I recovered is currently sitting on a spare 8tb that is also eventually supposed to make its way into the array. The array is currently in maintenance mode while I try to figure out what's next (re-formatting md8, I'd guess).

 

During the parity rebuild, disk 7 also showed read errors, so my new parity is most likely corrupt.  What are my best steps from here?

 

I still have the original parity drive which may or may not be valid since I've written nothing to the array since the upgrade.  I also have a spare 8tb (currently holding the recovered data from md8) and I will pick up a replacement 4tb for disk 7 tonight.

 

Can I tank the whole array by trying to revert everything back and rebuilding md8 from original parity? What impact will the read errors on disk 7 have (it currently passes the short SMART test but will not complete the extended one)? Seeing as I might not have valid parity, should I try to mirror the contents from disk 7 onto the replacement and then proceed?

 

The compound issues have me questioning what to do next, as I don't want to do more damage by doing the wrong thing.

 

I've attached two diagnostics: one following the parity rebuild, and one with the array in maintenance mode after recovering some data from md8. I've posted about each issue previously, and johnnie.black replied to those posts (thanks!), but I figured I should ask about the whole picture.

apollo-diagnostics-20191016-2013.zip apollo-diagnostics-20191027-0357.zip

Link to comment

Disk8 looks fine but disk7 is failing. The problem is that if parity isn't valid there aren't many options, and although the diags don't show the parity sync, it likely isn't valid based on your description and the failing disk7.

 

You could first see if the actual disk8 mounts correctly, since the emulated disk might be corrupted because of invalid parity. If it does mount, all data there should be OK, and you can then use ddrescue on disk7.

 

With array stopped:
 

mkdir /temp
mount -o ro /dev/sdd1 /temp

 

Check that disk8 is still sdd, and note the 1 after the device. You can then browse /temp using the console or e.g. Midnight Commander and safely check the disk contents. When done, unmount the disk with:
 

umount /temp
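
For the ddrescue step suggested above, here is a minimal sketch. It assumes GNU ddrescue is installed on the server, and the device names (/dev/sdX for the failing disk7, /dev/sdY for the replacement) and the map file path are placeholders, not values taken from the diagnostics. The function only prints the command unless RUN=1 is set, because swapping source and destination would overwrite the failing disk.

```shell
#!/bin/sh
# Sketch only: clone a failing disk onto a replacement with GNU ddrescue.
# The source, destination and map file below are hypothetical placeholders.
clone_disk() {
    src="$1"; dst="$2"; map="$3"
    if [ "${RUN:-0}" = "1" ]; then
        # -f allows writing to a block device,
        # -n skips the slowest recovery phase on the first pass.
        ddrescue -f -n "$src" "$dst" "$map"
    else
        # Dry run: show the command so the device names can be double-checked.
        echo "ddrescue -f -n $src $dst $map"
    fi
}

clone_disk /dev/sdX /dev/sdY /boot/ddrescue-disk7.map
```

Keeping the map file means an interrupted copy can be resumed by running the same command again, and a later pass without -n can retry the bad areas.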

 

Link to comment

Having a similar issue.
Disk 1: "Device is disabled, contents emulated"
Error: UDMA CRC error count
2019-11-01_023014-BIGNAS_Device.jpg

So far I have done:

0. Replaced PSU with 650W unit that has SATA power connectors, 3 drives/power cable.
1. Formatted Disk 1.
2. xfs_repair status now reports no faults.
3. Smart Check: Completed without error.
4. Reboot, start array.

 

However,  the Disk1 is still showing as Disabled. How to enable it?

2019-11-01_022412-BIGNAS_Main.jpg

Edited by lemoncurry
Link to comment
6 minutes ago, lemoncurry said:

However,  the Disk1 is still showing as Disabled. How to enable it?

Since you formatted the disk, if you rebuild, all data will be gone, i.e., the disk will be empty. Is that what you want?

 

Also, there are read errors on disk2, so a rebuild might result in corruption, but without diags there is no way to see how the disk is.

Link to comment

Thanks.
Where are you seeing errors on disk2?

I assumed that "contents emulated" meant the data had been shifted to other drives before unraid disabled the drive. Is that not the case? If not, what does "contents emulated" mean then?
Parity check completed without Disk1 active in the array, so I assumed formatting Disk1 should not lose data from the array. Am I wrong?
 

Link to comment
52 minutes ago, lemoncurry said:

I assumed that "contents emulated" meant the data had been shifted to other drives , before unraid disabled the drive, is that not the case? If not what does "contents emulated" mean then

No.    What UnRAID is telling you is that it is acting as if the disk is present by reconstructing on-the-fly the contents using the combination of the other drives plus the parity drive.    Unraid never moves data between array drives on its own - it always takes manual action to achieve that.
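
To make "reconstructing on-the-fly" concrete, here is a toy sketch of single (XOR) parity. The byte values are made up, and the helper name xor_all is only for illustration; a real array does this for every position on the disks rather than for one byte.

```shell
#!/bin/sh
# Toy model of single (XOR) parity; the byte values are invented for illustration.
# xor_all prints the XOR of all its integer arguments.
xor_all() {
    acc=0
    for v in "$@"; do
        acc=$(( acc ^ v ))
    done
    echo "$acc"
}

disk1=165   # the disk that will be disabled
disk2=60
disk3=15

# Parity is computed across all data disks while they are healthy.
parity=$(xor_all "$disk1" "$disk2" "$disk3")

# disk1 is now disabled: its byte is recomputed from parity plus the surviving disks.
emulated_disk1=$(xor_all "$parity" "$disk2" "$disk3")

echo "emulated disk1 byte: $emulated_disk1"
```

Nothing is copied anywhere in this model: the data of the disabled disk exists only as a calculation over the other disks, which is why losing a second disk (or having bad parity) breaks the emulation.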

 

54 minutes ago, lemoncurry said:

I assumed formatting Disk1 should not lose data from the array,  am I wrong?
 

That was an incorrect assumption. When you told UnRaid to format the disk it was an instruction to write an empty file system to the disk and update parity appropriately. If the disk was disabled then the format took place against the emulated drive and not the physical drive, so you may still have retrievable data on the physical drive that was disabled.

Link to comment
9 hours ago, lemoncurry said:

Where are you seeing errors on disk2?

Ignore that, was thinking of another thread.

 

See itimpi's reply to the other questions, basically to recover the data from disk1 you'd need to do a new config and re-sync parity (assuming disk1 is indeed healthy).

 

 

Edited by johnnie.black
Link to comment

Thanks.
Well, only half of that makes logical sense to me. To my logic, if unraid disables a drive then it should be assumed as not part of the array, thus formatting a disabled drive should Not remove any content being emulated! Logically, a parity update after disabling a disk should convert emulated content to real! Not just dump emulated content associated with the disabled disk. Otherwise what's the point of having parity?!

Anyway, the drive is still showing as "Device is disabled, contents emulated" so data may not be lost.

How do I rebuild the array to real content from emulated content? i.e. exercise the redundancy protection function, recreating the content on the enabled drives.

Edited by lemoncurry
Link to comment
50 minutes ago, lemoncurry said:

formatting a disabled drive should Not remove any content being emulated!

The word "format" has always meant, since the days of floppy disks, "create an empty file system" and it comes with a warning that you'll lose whatever is currently stored there. It has nothing to do with parity or arrays or whether a disk is disabled and emulated.

Link to comment
Quote

Well, only half of that makes logical sense to me. To my logic, if unraid disables a drive then it should be assumed as not part of the array, thus formatting a disabled drive should Not remove any content being emulated! Logically, a parity update after disabling a disk should convert emulated content to real! Not just dump emulated content associated with the disabled disk. Otherwise what's the point of having parity

Once Unraid disables a drive it stops using the physical drive and only uses the ‘emulated’ version. The advantage of this to the end-user is that they can continue to use the array as if the drive was still present (albeit with a loss of protection against further failures). If you issue a format at this stage then it is assumed that it is the ‘emulated’ drive showing in the GUI that is to be formatted. If you now replace the physical drive then it is the contents of the ‘emulated’ drive that are rebuilt onto the physical drive. Since a rebuild rewrites every sector on a drive, it is irrelevant whether that drive was formatted before the start of the rebuild. I therefore cannot see any obvious reason why you would want to format the disabled physical drive unless you intend to RMA it, and in such a case there are better ways to erase data than a format.

 

One side-effect of this behavior is that the physical drive is left in the state it was at the point UnRAID disabled it.    If you rebuild the ‘emulated’ drive onto a new disk you still have the original physical drive intact.    If you now mount that original drive outside the array you can frequently access most of its content which can be useful in a data recovery scenario.

Link to comment

@itimpi Thanks for the detailed explanation. That's clear now; unfortunately I thought a disabled disk was treated differently. Will replace the drive, rebuild and then recover data outside of the array.

Quote

If you issue a format at this stage then it is assumed that it is the ‘emulated’ drive showing in the GUI that is to be formatted.

The key factor here is that the content of the ‘emulated’ drive is really still "linked" to the physical drive even if disabled, i.e. a disabled drive is still part of the array. Format a disabled drive and you affect the array content likewise.

I think it would be useful for the unraid GUI to offer a tool that simplifies and streamlines the process of removing and/or replacing a drive. As someone who only logs on to unraid when there is an issue, I find the current processes and docs a bit ambiguous, and one tends to forget how it all works.

Cheers

 

Link to comment

Dammit, this is confusing! I've followed the docs procedure for "Remove a disk":

1. Removed faulty Disk 1.
2. Preserved the Parity and Cache assignments.
3. Reassigned Disk 2 and 3 as per previous assignment, leaving out Disk 1.

I'm now presented with a warning for the Parity disk:
"All existing data on this device will be OVERWRITTEN when array is Started"
Which means the parity will be recreated for the Disk 2 & 3 data, the exact opposite of what I need!

880098443_2019-11-04_144248-PageInfo-http___faq.out-club.ru_download_pajero_sport_2008_maintenance_Servic.jpg.52de30f69b332d45b2e0c0ce5cf3d9c7.jpg
What the!  This process is supposed to preserve Content,  not overwrite it!
What do I do to rebuild the array from current emulated content ?

 

Edited by lemoncurry
Link to comment

You've misunderstood what I am saying. 

One purpose of having Parity "Content Emulation" is to recover from Disk failure.
The failed disk is a "small" 4TB disk, so I don't want to replace it, just convert the Emulated Content of the removed disk back to Real content, placed instead on the remaining disks.
Can this not be done?
 

Edited by lemoncurry
Link to comment
17 minutes ago, lemoncurry said:

The failed disk is a "small" 4TB disk,  so I dont want to replace it,  just convert the Emulated Content of the removed disk back to Real content ,  placed instead on the remaining disks.
Can this not be done?

Not automatically. When the drive is being emulated, you can copy the data to another disk slot, however this is a slow process, because it involves reading and writing all the other array disks. Normally you would replace the drive with another, typically larger drive.
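
As a rough sketch of that manual copy: Unraid mounts array disks under /mnt/diskN, so the emulated disk can be read like any other. The slots /mnt/disk1 and /mnt/disk2 in the usage comment are examples only, and the helper names has_room and plan_copy are made up for this illustration. The script checks free space first and then prints the rsync command instead of running it.

```shell
#!/bin/sh
# Sketch only: check that the target disk has room, then print the copy command.
# has_room and plan_copy are hypothetical helper names, not Unraid tools.
has_room() {
    src="$1"; dst="$2"
    need_kb=$(du -sk "$src" | cut -f1)                  # space used by the source
    free_kb=$(df -Pk "$dst" | awk 'NR==2 {print $4}')   # space available on the target
    [ "$need_kb" -le "$free_kb" ]
}

plan_copy() {
    src="$1"; dst="$2"
    if has_room "$src" "$dst"; then
        # Trailing slashes copy the contents of src into dst.
        echo "rsync -av $src/ $dst/"
    else
        echo "not enough free space on $dst" >&2
        return 1
    fi
}

# Example (slots are placeholders): plan_copy /mnt/disk1 /mnt/disk2
```

Copying disk to disk under /mnt/diskN, rather than through a user share, keeps it unambiguous which physical disk the data lands on.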

 

In your case, since you formatted the emulated content, it's no longer rebuildable from parity, so the only path forward is to try to recover the data from the physical disk.

Link to comment

Actually, I thought I'd formatted it, but somehow it didn't do it; the data is still there. Just not sure if it's all there.

So I reckon it's still rebuildable and worth trying. Speed won't be an issue.
Does it need to be copied to an unassigned disk, or can it be copied to the array? I'm guessing the former; if so, that means buying a new disk anyway, so I might as well replace rather than remove Disk1.

 

Edited by lemoncurry
Link to comment

!!! This whole process is ambiguous and illogical; there's got to be a better way! Well, unraid decided the disk was not healthy, hence all this trouble! The documentation on this is poor, to say the least. I have limited time to waste on this!
I appreciate the help, thanks.
Just ordered a replacement disk, and now discover it's pointless.. faarrrk!
 

 

Link to comment
