sts Posted November 13, 2018

I wish I had started by posting here, but instead I tried to figure it out myself. I'll do my best to explain what has happened and what I've done over the past couple of days. To add to the headache, the server is built on an ASRock C2550D4I, so I'm guessing a combination of faulty cabling and/or bad Marvell controllers is at the root of my issues.

It started weeks ago with disk5 turning up Unmountable: No file system. I assumed this was a bad drive; since the system had no trouble emulating it with parity, I shut everything down and ordered a new drive. When it arrived I started rebuilding, but the process was slow and various disks were showing Hard Resetting Link errors. I stopped the rebuild, shut the system down, replaced all the SATA cables, and reorganized the SATA power cabling. That seemed to help, except disk4 was now also showing as Unmountable: No file system.

After looking at the forums I ran xfs_repair -v -L on disk4, making it mountable again. I began rebuilding disk5 again, and disk4 started producing errors. Then I noticed both disk4 and disk3 have SMART errors. I allowed the rebuild of disk5 to finish, only to discover it is of course also unmountable.

With the array in maintenance mode I ran xfs_repair -n on all disks and discovered that disk3, disk5, and disk7 have issues. I ran xfs_repair -v on disk3 and disk7, which appeared to work, but the xfs_repair -n of disk5 (emulated, assigned, or unassigned) shows a lot of "out-of-order bno btree" and "data fork in ino" errors and skips phases 5, 6, and 7.

I've included as much of my idiotic fumbling as I can remember above in case it's helpful. So I'm looking for advice: is there a way to make this drive mountable again, or is the file structure too damaged? Should I even attempt to run xfs_repair -v or -L on disk5? Unfortunately the system has been powered down and rebooted numerous times through this process, so the diagnostics file is what it is, sorry about that.
Is there a best method moving forward for rebuilding and mitigating data loss? I do have space on a second server for copying.

Attachments: antron-diagnostics-20181113-1252.zip, virtual disk5 xfs_repair status -nv.txt
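For anyone following along, the usual cautious xfs_repair sequence on an Unraid array disk looks roughly like this (a sketch; /dev/md5 is the array/emulated device for disk5 in maintenance mode, and the -L step is a last resort because zeroing the log can lose recently written metadata):

```shell
# Dry run first: -n reports problems without writing anything to the disk.
xfs_repair -n /dev/md5

# If the dry-run output looks sane, run the real repair verbosely.
xfs_repair -v /dev/md5

# Last resort only: -L zeroes the metadata log when it cannot be replayed.
# This can discard the most recent metadata updates, so use it only when
# a plain repair refuses to proceed.
xfs_repair -vL /dev/md5
```

Running against /dev/md5 (rather than /dev/sdX1) keeps parity in sync, so the repair is reflected on the emulated disk too.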
JorgeB Posted November 14, 2018

First thing to do would be to stop any more repair attempts while using the Marvell controllers; replace them first, then attempt to repair the damaged filesystems.
sts Posted November 14, 2018

Thanks for having a look, Johnnie. I had a spare M1015 / LSI SAS9220-8i which I've installed, so no Marvell controllers are being used. In maintenance mode now, with disk5 emulated, I ran another xfs_repair -n, which produced what appears to be the same output as before. How should I proceed? Is there a repair I should attempt, or can I offer more information?
JorgeB Posted November 15, 2018

Run xfs_repair without -n or nothing will be fixed.
sts Posted November 15, 2018

OK, in maintenance mode with disk5 emulated I ran xfs_repair -v. I've attached the output.

Attachment: disk 5 xfs_repair -v.txt
JorgeB Posted November 16, 2018

The emulated disk looks to be corrupt beyond repair. Do you still have the old disk5 intact?
JorgeB Posted November 16, 2018

Use the UD plugin and see if it mounts correctly. If it does, and assuming the disk is OK (if in doubt, post a SMART report), instead of rebuilding do a new config with it and resync parity.
sts Posted November 17, 2018

From UD the disk doesn't want to mount; after each attempt it reverts back. Would there be any benefit to running the File System Check offered in blue under the drive? Or could I do an xfs_repair -v /dev/sdm1?
JorgeB Posted November 17, 2018

2 hours ago, sts said:
Quote: or could I do an xfs_repair -v /dev/sdm1?

Yes, do that.
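A sketch of that repair on an unassigned disk, with a sanity check first (the device letter here is only an example; note that xfs_repair runs against the partition, sdX1, not the whole disk):

```shell
# Confirm the device really is the disk you think it is before touching it:
# lsblk shows size, serial, and any current mountpoint.
lsblk -o NAME,SIZE,SERIAL,MOUNTPOINT /dev/sdm

# Repair the filesystem on the first partition, verbosely.
xfs_repair -v /dev/sdm1
```

Checking the serial via lsblk first matters because /dev/sdX letters can change between boots, and running a repair on the wrong disk is destructive.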
sts Posted November 18, 2018

When I do xfs_repair -v /dev/sdm1 in maintenance mode, it wants to run like this for a long time. I have both the original disk5 and the replacement disk in UD; neither wants to mount. I'll attach a new diagnostic so you can see the SMART values of the original disk (ST8000DM004-2CX188_WCT06DMZ-20181117-1918 (sdn)).

Attachment: antron-diagnostics-20181117-1918.zip
sts Posted November 18, 2018

I just noticed you've said in another thread with a similar issue:

Quote: xfs_repair is searching the disk for a backup superblock. I remember it can take a while on big disks; unless there is something wrong with the disk, like pending sectors, just let it run.

So I'll just let it run overnight and cross my fingers it doesn't end in an error. I'll write again when I see a result.
JorgeB Posted November 18, 2018

It's not good; it means the primary superblock is damaged. But let it run, it may find a backup superblock.
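While that scan runs, the state of the primary superblock can be inspected read-only with xfs_db (a sketch; the device name is an example, and -r opens the device without writing anything):

```shell
# Read-only look at superblock 0 of the partition. On a healthy XFS
# filesystem magicnum prints 0x58465342 ("XFSB"); anything else is
# consistent with a damaged primary superblock.
xfs_db -r -c "sb 0" -c "print magicnum" /dev/sdn1
```

This doesn't fix anything, but it confirms whether xfs_repair's long secondary-superblock search is expected rather than a hang.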
sts Posted November 18, 2018

It completed with the above message. It does not look good. Any other options, or am I looking at a new config without the data on disk5? And is there a way to rescue the data on the bad disk outside of Unraid and reintegrate it?
JorgeB Posted November 19, 2018

Very strange that the emulated disk has a valid superblock and the actual disk does not. Also, in your diags sdm is the parity disk; are you sure you ran xfs_repair on the correct disk?
sts Posted November 20, 2018

Oh wow... yeah, no, sorry. Fat thumbs. Ah, I guess when I didn't have the replacement disk installed the assignments were different? Let me try that again on the correct disk... it appears to hit an error and stop:

xfs_dir_ino_validate: XFS_ERROR_REPORT
fatal error -- couldn't map inode 65763233, err = 117

Here is the full output.

Attachment: xfs_repair dev sdn1.txt
JorgeB Posted November 20, 2018

Like the emulated disk, it looks very corrupt. But xfs_repair should not abort; it should run and fix what it can, with more or less data loss. You can ask for help on the XFS mailing list, they might be able to help more.
trurl Posted November 20, 2018

3 hours ago, sts said:
Quote: ah, I guess when I didn't have the replacement disk installed the assignments were different?

Unraid identifies disks by their serial number when it assigns a number to them, because the disk letters are not guaranteed to stay the same between boots. You must always confirm the letter if you need to use one for anything.
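One way to do that confirmation (a sketch; the serial and by-id name below are examples taken from this thread's disk, /dev/disk/by-id symlink names embed the model and serial and are stable across boots):

```shell
# Serial as shown in the Unraid GUI (example value).
serial="WCT06DMZ"

# Find the by-id symlink containing that serial; its target is the
# sdX device the disk has *right now*.
ls -l /dev/disk/by-id/ | grep -i "$serial"

# Or resolve a known by-id name straight to today's device node.
readlink -f /dev/disk/by-id/ata-ST8000DM004-2CX188_WCT06DMZ
```

Doing this immediately before any xfs_repair invocation removes the risk of repairing the wrong disk after a reboot reshuffles the letters.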
sts Posted November 21, 2018

Thanks for the clarification, trurl. I really should have known better. Thanks again for the help, Johnnie; I'll look into the XFS mailing list.

I do have another question. Out of curiosity I've found that UFS file explorer has no problem browsing and copying off the replacement drive, and I'm assuming it could also read the original failed disk5. Would it be possible to back up the content of the failed drive to another location and then select New Config in Unraid with the existing array plus the replacement drive, rebuild the server with that empty disk (obviously abandoning emulation and the media that was on disk5), and then re-add the media that was backed up using UFS file explorer? Or am I asking for a whole new set of problems?
JorgeB Posted November 21, 2018

3 hours ago, sts said:
Quote: rebuild the server with that empty disk, obviously abandoning emulation and the media that was on disk5, but then re-add the media that was backed up from using UFS file explorer?

You don't need a new config for that; just rebuild the disk as is, format it, and restore the data from the external disk.
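That restore step might look like this (a sketch; the paths are assumptions, with /mnt/disks/backup standing in for wherever the recovered files were copied and /mnt/disk5 being the freshly formatted array disk):

```shell
# Copy the recovered data back onto the rebuilt, formatted disk.
# -a (archive) preserves permissions, ownership, and timestamps;
# the trailing slash on the source copies its contents, not the
# directory itself.
rsync -av --progress /mnt/disks/backup/ /mnt/disk5/
```

Writing to /mnt/disk5 (rather than a user share) keeps the restored files on that specific disk, and because it goes through the array, parity stays in sync during the copy.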
This topic is now archived and is closed to further replies.