(Solved) 6.4.1 Parity errors, Disabled drive, SMART errors



For the first time, a monthly parity check returned errors and a drive is now showing as Disabled. I am unable to read SMART attributes on this drive although the SMART status is green. A couple of other drives have some SMART reallocated sector errors but have been holding steady for some time now.

 

My plan of action is to first move the data off the emulated/disabled drive and then remove the drive from the array, as I have plenty of spare capacity on the remaining drives.

 

Should I upgrade to 6.5 first? Any steps I should be mindful of? Do my logs indicate what happened to cause the Disabled drive? I'm assuming simple failure from an old drive.

 


Rebooted and the SMART attributes from disk8 are now available. However Disk9 is now reporting as unmountable with no file system.

 

When I was moving files from disk8, I first copied to disk3 with no problems. When I was moving a folder to disk9, a few files copied before the read/write errors hit; then disk9 became inaccessible via MC and all its files disappeared. There were only about 80G of files, nothing irreplaceable, but now I am worried about the other drives. If possible, it would be nice to recover at least a file list so I know which files were on that drive beforehand. Actually, it would be nice to do that for all the drives before proceeding. Is there an easy way to get a full file list?
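One way to capture per-disk file lists before touching anything else is a short loop over the standard Unraid per-disk mount points. This is a minimal sketch; it assumes disks are mounted at /mnt/disk1, /mnt/disk2, etc., and it writes the lists to /tmp here for safety. On a real server, /boot/ is a better destination, since the flash drive survives array problems:

```shell
#!/bin/sh
# Save a file listing for every mounted array disk.
# /mnt/disk* are the standard Unraid per-disk mount points (assumption).
OUT=/tmp/filelists            # on a real server, /boot/filelists is safer
mkdir -p "$OUT"
for d in /mnt/disk*/; do
    [ -d "$d" ] || continue   # skip cleanly when no disks are mounted
    find "$d" -type f > "$OUT/$(basename "$d").txt"
done
```

Each disk then gets its own text file (e.g. filelists/disk9.txt), which can later be diffed against what xfs_repair recovers.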

tower-diagnostics-20180403-0732.zip

Link to comment

Disk8 is also failing and needs to be replaced; disk9 seems fine, it looks like a filesystem problem only.

 

Since you have two failing disks, IMO your best way forward would be to do a new config with a new parity drive and a new disk8 (or without one if you don't need the space), run xfs_repair on disk9, let parity re-sync, and then connect the old disk8 with the UD plugin and try to copy any important data.
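For the last step, once the old disk8 is mounted with the Unassigned Devices plugin, the copy itself can be done with rsync. A sketch, where both paths are examples and not your actual mount points, so adjust them to your setup:

```shell
#!/bin/sh
# Copy salvageable data from the old (failing) disk8 after Unassigned
# Devices has mounted it. Both paths below are assumptions/examples.
SRC=/mnt/disks/old_disk8      # UD mount point (assumption)
DST=/mnt/disk3/from_disk8     # any array disk with free space (assumption)
if [ -d "$SRC" ]; then
    mkdir -p "$DST"
    # -a preserves attributes, -v is verbose, -P shows progress and
    # lets an interrupted copy resume partial files
    rsync -avP "$SRC/" "$DST/"
fi
```

rsync is preferable to a plain cp here because a failing drive often throws read errors mid-copy, and rsync can be re-run to pick up where it left off.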

 

P.S.: disk5 is not currently failing but has a lot of reallocated sectors; keep an eye on it or preemptively replace it.


Thank you, Johnnie. So the parity drive is the other failing drive?

 

So I should set up a new config with a new parity drive first and then run xfs_repair on disk9 as part of the array?  Or try xfs_repair first, with disk9 in array?  I was thinking to use disk9 as the new parity drive if I can recover and move the files. Sounds like it would be better to get a new drive for parity before attempting xfs_repair?

43 minutes ago, mfort312 said:

So the parity drive is the other failing drive?

Yes

 

43 minutes ago, mfort312 said:

So I should set up a new config with a new parity drive first and then run xfs_repair on disk9 as part of the array?  Or try xfs_repair first, with disk9 in array?

Either way will work, but since there's a disabled disk you can't unassign the failing parity drive, so do the new config first.


OK, good news: I managed to copy everything off the disabled disk8, while it was still being emulated, to a drive outside the Unraid array.

 

Next, to fix disk9, I started in maintenance mode and from a terminal attempted:

 

xfs_repair -v /dev/md9

ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
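For reference, the mount-to-replay step the error message asks for looks roughly like this. The mount point is just an example, and on Unraid starting the array normally performs the equivalent mount:

```shell
#!/bin/sh
# Let XFS replay its journal by mounting the filesystem once, then
# unmounting cleanly, as xfs_repair's error message suggests.
# Guarded so it only runs where the device exists and we are root.
if [ -b /dev/md9 ] && [ "$(id -u)" = 0 ]; then
    mkdir -p /mnt/tmp9
    mount -t xfs /dev/md9 /mnt/tmp9 && umount /mnt/tmp9
fi
```

If the mount itself fails (as it did here, with an unmountable filesystem), that is exactly the situation where the message says -L becomes the fallback.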

I tried mounting and unmounting, but re-running xfs_repair gave the same error, so back in maintenance mode I next tried:

 

xfs_repair -vL /dev/md9

ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.

After it finished, I stopped and restarted the array in normal mode and, bingo, there were all my missing files. lost+found had only a few files from the failed MC copy yesterday; everything else is in its place. I am now copying everything from disk9 to a drive outside the Unraid array.

 

With disk9 freed up, I will use it to replace the failing parity drive, and then work on replacing disk5.

 

 

Disk5's SMART status looks similar to, if not worse than, the failing parity drive's. How can I tell the difference between a drive that's currently failing and one that's still hanging on?

 

 

Thanks again for your help and advice.

 

3 hours ago, mfort312 said:

Disk5's SMART status looks similar to, if not worse than, the failing parity drive's. How can I tell the difference between a drive that's currently failing and one that's still hanging on?

Parity has pending sectors; disk5 doesn't, at least not in the report.
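To compare those attributes directly from the command line, something like this works. A sketch only: smartctl ships with Unraid, but the device name below is a placeholder, so substitute your actual parity disk and disk5 devices:

```shell
#!/bin/sh
# Print the SMART attributes most relevant to impending failure.
# Current_Pending_Sector > 0 means unreadable sectors waiting to be
# remapped -- the usual sign a drive is actively failing -- whereas a
# stable Reallocated_Sector_Ct means past damage already remapped.
check_sectors() {
    smartctl -A "$1" | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
}
# /dev/sdf is an example device name -- substitute your own drives.
if command -v smartctl >/dev/null 2>&1 && [ -b /dev/sdf ]; then
    check_sectors /dev/sdf || true   # grep returns nonzero if none match
fi
```

Watching whether Current_Pending_Sector climbs between monthly parity checks is a reasonable way to decide when "hanging on" turns into "replace now".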

 

3 hours ago, mfort312 said:

Also, what will happen with my Docker apps (on cache drive) and User Shares with a New Config? Will I need to rebuild them?

No, not as long as the cache drive remains the same.

