sts Posted November 13, 2018

I wish I had started by posting here, but instead I tried to figure it out myself. I'll do my best to explain what has happened and what I've done over the past couple of days. To add to the headache, the server is built on an ASRock C2550D4I, so I'm guessing a combination of faulty cabling and/or bad Marvell controllers is at the root of my issues.

It started weeks ago with disk5 turning up Unmountable: No file system. I assumed this was a bad drive; since the system had no trouble emulating it with parity, I shut everything down and ordered a new drive. When it arrived I started rebuilding, but the process was slow and various disks were showing Hard Resetting Link errors. I stopped the rebuild, shut the system down, replaced all the SATA cables, and reorganized the SATA power cabling. That seemed to help, except disk4 was now also showing as Unmountable: No file system.

After looking at the forums I ran xfs_repair -v -L on disk4, making it mountable again. I began rebuilding disk5 again, and disk4 started producing errors. Then I noticed both disk4 and disk3 have SMART errors. I allowed the rebuild of disk5 to finish, only to discover it is of course also unmountable.

With the array in maintenance mode I ran xfs_repair -n on all disks and discovered that disk3, disk5, and disk7 have issues. I ran xfs_repair -v on disk3 and disk7, which appeared to work, but the xfs_repair -n of disk5 (emulated, assigned, or unassigned) shows a lot of "out-of-order bno btree" and "data fork in ino" errors and skips phases 5, 6, and 7.

I've included as much of my idiotic fumbling as I can remember above in case it's helpful. So I'm looking for advice: is there a way to make this drive mountable again, or is the file structure too damaged? Should I even attempt to run xfs_repair -v or -L on disk5? Unfortunately the system has been powered down and rebooted numerous times through this process, so the diagnostics file is what it is, sorry about that.
Is there a best method moving forward for rebuilding and mitigating data loss? I do have space on a second server for copying.

Attachments: antron-diagnostics-20181113-1252.zip, virtual disk5 xfs_repair status -nv.txt
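For anyone following along, the usual cautious xfs_repair sequence on an Unraid array disk looks roughly like this (a sketch; /dev/md5 is the array/emulated device for disk5 in maintenance mode, and the -L step is a last resort because zeroing the log can lose recently written metadata):

```shell
# Dry run first: -n reports problems without writing anything to the disk.
xfs_repair -n /dev/md5

# If the dry-run output looks sane, run the real repair verbosely.
xfs_repair -v /dev/md5

# Last resort only: -L zeroes the metadata log when it cannot be replayed.
# This can discard the most recent metadata updates, so use it only when
# a plain repair refuses to proceed.
xfs_repair -vL /dev/md5
```

Running against /dev/md5 (rather than /dev/sdX1) keeps parity in sync, so the repair is reflected on the emulated disk too.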
JorgeB Posted November 14, 2018

First thing to do would be to stop any more repair attempts while using the Marvell controllers; replace them first, then attempt to repair the damaged filesystems.
sts Posted November 14, 2018

Thanks for having a look, Johnnie. I had a spare M1015 / LSI SAS9220-8i which I've installed, so no Marvell controllers are being used. In maintenance mode now, with disk5 emulated, I ran another xfs_repair -n, which produced what appears to be the same output as before. How should I proceed? Is there a repair I should attempt, or can I offer more information?
JorgeB Posted November 15, 2018

Run xfs_repair without -n or nothing will be fixed.
sts Posted November 15, 2018

OK, in maintenance mode with disk5 emulated I ran xfs_repair -v. I've attached the output.

Attachment: disk 5 xfs_repair -v.txt
JorgeB Posted November 16, 2018

The emulated disk looks to be corrupt beyond repair. Do you still have the old disk5 intact?
JorgeB Posted November 16, 2018

Use the UD plugin and see if it mounts correctly. If it does, and assuming the disk is OK (if in doubt, post a SMART report), instead of rebuilding do a new config with it and resync parity.
sts Posted November 17, 2018

From UD the disk doesn't want to mount; after each attempt it reverts back. Would there be any benefit to running the File System Check offered in blue under the drive? Or could I do an xfs_repair -v /dev/sdm1?
JorgeB Posted November 17, 2018

2 hours ago, sts said:
Quote: or could I do an xfs_repair -v /dev/sdm1?

Yes, do that.
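A sketch of that repair on an unassigned disk, with a sanity check first (the device letter here is only an example; note that xfs_repair runs against the partition, sdX1, not the whole disk):

```shell
# Confirm the device really is the disk you think it is before touching it:
# lsblk shows size, serial, and any current mountpoint.
lsblk -o NAME,SIZE,SERIAL,MOUNTPOINT /dev/sdm

# Repair the filesystem on the first partition, verbosely.
xfs_repair -v /dev/sdm1
```

Checking the serial via lsblk first matters because /dev/sdX letters can change between boots, and running a repair on the wrong disk is destructive.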
sts Posted November 18, 2018

When I do xfs_repair -v /dev/sdm1 in maintenance mode, it wants to run like this for a long time. I have both the original disk5 and the replacement disk in UD; neither wants to mount. I'll attach a new diagnostic so you can see the SMART values of the original disk (ST8000DM004-2CX188_WCT06DMZ-20181117-1918 (sdn)).

Attachment: antron-diagnostics-20181117-1918.zip
sts Posted November 18, 2018

I just noticed you've said in another thread with a similar issue:

Quote: xfs_repair is searching the disk for a backup superblock. I remember it can take a while on big disks; unless there is something wrong with the disk, like pending sectors, just let it run.

So I'll just let it run overnight and cross my fingers it doesn't end in an error. I'll write again when I see a result.
JorgeB Posted November 18, 2018

It's not good; it means the primary superblock is damaged. But let it run, it may find a backup superblock.
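While that scan runs, the state of the primary superblock can be inspected read-only with xfs_db (a sketch; the device name is an example, and -r opens the device without writing anything):

```shell
# Read-only look at superblock 0 of the partition. On a healthy XFS
# filesystem magicnum prints 0x58465342 ("XFSB"); anything else is
# consistent with a damaged primary superblock.
xfs_db -r -c "sb 0" -c "print magicnum" /dev/sdn1
```

This doesn't fix anything, but it confirms whether xfs_repair's long secondary-superblock search is expected rather than a hang.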
sts Posted November 18, 2018

It completed with the above message. It does not look good. Any other options, or am I looking at a new config without the data on disk5? And is there a way to rescue the data on the bad disk outside of Unraid and reintegrate it?
JorgeB Posted November 19, 2018

Very strange that the emulated disk has a valid superblock and the actual disk does not. Also, in your diags sdm is the parity disk; are you sure you ran xfs_repair on the correct disk?
sts Posted November 20, 2018

Oh wow... yeah, no, sorry. Fat thumbs. Ah, I guess when I didn't have the replacement disk installed the assignments were different? Let me try that again on the correct disk... it appears to hit an error and stop:

xfs_dir_ino_validate: XFS_ERROR_REPORT
fatal error -- couldn't map inode 65763233, err = 117

Here is the full output.

Attachment: xfs_repair dev sdn1.txt
JorgeB Posted November 20, 2018

Like the emulated disk, it looks very corrupt. But xfs_repair should not abort; it should run and fix what it can, with more or less data loss. You can ask for help on the XFS mailing list, they might be able to help more.
trurl Posted November 20, 2018

3 hours ago, sts said:
Quote: ah, I guess when I didn't have the replacement disk installed the assignments were different?

Unraid identifies disks by their serial number when it assigns a number to them, because the disk letters are not guaranteed to stay the same between boots. You must always confirm the letter if you need to use one for anything.
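One way to do that confirmation (a sketch; the serial and by-id name below are examples taken from this thread's disk, /dev/disk/by-id symlink names embed the model and serial and are stable across boots):

```shell
# Serial as shown in the Unraid GUI (example value).
serial="WCT06DMZ"

# Find the by-id symlink containing that serial; its target is the
# sdX device the disk has *right now*.
ls -l /dev/disk/by-id/ | grep -i "$serial"

# Or resolve a known by-id name straight to today's device node.
readlink -f /dev/disk/by-id/ata-ST8000DM004-2CX188_WCT06DMZ
```

Doing this immediately before any xfs_repair invocation removes the risk of repairing the wrong disk after a reboot reshuffles the letters.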
sts Posted November 21, 2018

Thanks for the clarification, trurl. I really should have known better. Thanks again for the help, Johnnie; I'll look into the XFS mailing list.

I do have another question. Out of curiosity I've found that UFS file explorer has no problem browsing and copying off the replacement drive, and I'm assuming it could also read the original failed disk5. Would it be possible to back up the content of the failed drive to another location and then select New Config in Unraid with the existing array plus the replacement drive, rebuild the server with that empty disk (obviously abandoning emulation and the media that was on disk5), and then re-add the media that was backed up using UFS file explorer? Or am I asking for a whole new set of problems?
JorgeB Posted November 21, 2018

3 hours ago, sts said:
Quote: rebuild the server with that empty disk, obviously abandoning emulation and the media that was on disk5, but then re-add the media that was backed up from using UFS file explorer?

You don't need a new config for that; just rebuild the disk as is, format it, and restore the data from the external disk.
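That restore step might look like this (a sketch; the paths are assumptions, with /mnt/disks/backup standing in for wherever the recovered files were copied and /mnt/disk5 being the freshly formatted array disk):

```shell
# Copy the recovered data back onto the rebuilt, formatted disk.
# -a (archive) preserves permissions, ownership, and timestamps;
# the trailing slash on the source copies its contents, not the
# directory itself.
rsync -av --progress /mnt/disks/backup/ /mnt/disk5/
```

Writing to /mnt/disk5 (rather than a user share) keeps the restored files on that specific disk, and because it goes through the array, parity stays in sync during the copy.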
This topic is now archived and is closed to further replies.