September 27, 2024 Hello, I had to buy a new HBA because my old one was defective and the drive connected to it kept disconnecting. I properly turned off my array, shut down the server and put the new stuff in. Now both of the old drives (just connected to a new controller) are showing "Unmountable: Unsupported or no file system". How do I best rebuild this without data loss? This is what it looks like right now: Do I just format them both and let it rebuild? I'm really not sure what the best way to fix this mess is.
September 27, 2024 Community Expert 46 minutes ago, Yasuman said: "Do I just format them both and let it rebuild?" Formatting will delete all the data. Post the diagnostics.
September 27, 2024 Community Expert There appears to be a cable problem with disk5. The problem is that you now have two invalid disks with single parity. Replace the cables for that disk, then try this:
-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and assign any missing disk(s) if needed
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on parity disk(s) will be overwritten; this is normal as it doesn't account for the checkbox, but it won't be as long as it's checked)
-Stop array
-Unassign disk4
-Start array (in normal mode now) and post new diags
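A note on why two invalid disks are a problem with single parity: as far as I understand it, a single parity disk holds the bitwise XOR of all data disks. Any one missing disk can be recomputed from parity plus the others, but with two missing you only get the XOR of the two, which cannot be separated. A minimal Python sketch with made-up 4-byte "disks" (the byte values and the helper name xor_blocks are purely illustrative, nothing here is read from the actual array):

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical contents of three data disks (illustrative values only).
disk3 = bytes([0x11, 0x22, 0x33, 0x44])
disk4 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
disk5 = bytes([0x0F, 0x1E, 0x2D, 0x3C])

# Single parity: XOR across all data disks.
parity = xor_blocks([disk3, disk4, disk5])

# One disk missing: parity XOR the surviving disks gives it back.
rebuilt_disk4 = xor_blocks([parity, disk3, disk5])
assert rebuilt_disk4 == disk4

# Two disks missing: all that can be recovered is disk4 XOR disk5,
# which is not enough to tell the two apart.
combined = xor_blocks([parity, disk3])
assert combined == xor_blocks([disk4, disk5])
```

This is also why the procedure above needs disk5 to be readable: emulating disk4 means reading every other disk, disk5 included.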
September 27, 2024 Author Looks like this now: And I'll attach the new diagnostics. yeji-diagnostics-20240927-1651.zip
September 27, 2024 Author I swapped cables around. I'm curious where/how you see issues with the drive? I'll try some more changes.
September 27, 2024 Community Expert 5 minutes ago, Yasuman said: "I'm curious where/how you see issues with the drive?" In the syslog:
Sep 27 16:50:50 Yeji kernel: md: disk5 read error, sector=4294967424
Sep 27 16:50:50 Yeji kernel: XFS (md4p1): metadata I/O error in "xfs_da_read_buf+0x9a/0xff [xfs]" at daddr 0x100000080 len 8 error 5
Sep 27 16:50:50 Yeji kernel: sd 1:0:0:0: [sdf] tag#2076 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
Sep 27 16:50:50 Yeji kernel: sd 1:0:0:0: [sdf] tag#2076 Sense Key : 0x2 [current]
Sep 27 16:50:50 Yeji kernel: sd 1:0:0:0: [sdf] tag#2076 ASC=0x4 ASCQ=0x0
Sep 27 16:50:50 Yeji kernel: sd 1:0:0:0: [sdf] tag#2076 CDB: opcode=0x88 88 00 00 00 00 02 80 00 00 68 00 00 00 08 00 00
Sep 27 16:50:50 Yeji kernel: I/O error, dev sdf, sector 10737418344 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
Sep 27 16:50:50 Yeji kernel: md: disk5 read error, sector=10737418280
Sep 27 16:50:50 Yeji kernel: XFS (md4p1): metadata I/O error in "xfs_da_read_buf+0x9a/0xff [xfs]" at daddr 0x280000028 len 8 error 5
Sep 27 16:50:50 Yeji kernel: sd 1:0:0:0: [sdf] tag#2077 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
Sep 27 16:50:50 Yeji kernel: sd 1:0:0:0: [sdf] tag#2077 Sense Key : 0x2 [current]
Sep 27 16:50:50 Yeji kernel: sd 1:0:0:0: [sdf] tag#2077 ASC=0x4 ASCQ=0x0
Sep 27 16:50:50 Yeji kernel: sd 1:0:0:0: [sdf] tag#2077 CDB: opcode=0x88 88 00 00 00 00 03 1c 40 4d 80 00 00 00 08 00 00
Sep 27 16:50:50 Yeji kernel: I/O error, dev sdf, sector 13358878080 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
If new cables don't help, run an extended SMART test.
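For anyone wanting to cross-check log lines like these: the CDB bytes are the SCSI command that failed. Opcode 0x88 is READ(16), where bytes 2-9 carry the logical block address and bytes 10-13 the transfer length, both big-endian. A minimal Python sketch (the function name decode_read16 is mine and purely illustrative) that decodes the first CDB from the log and compares it with the sector numbers the kernel printed:

```python
def decode_read16(cdb_hex):
    """Return (lba, block_count) from a 16-byte READ(16) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # bytes 2-9: logical block address
    count = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, count

# CDB copied from the tag#2076 syslog line.
lba, count = decode_read16("88 00 00 00 00 02 80 00 00 68 00 00 00 08 00 00")
print(lba, count)

# Sector reported by the md driver for the same failed read.
md_sector = 10737418280
print(lba - md_sector)
```

The decoded address is the same number as in the "I/O error, dev sdf, sector 10737418344" line, and the md-level sector is 64 lower. That would be consistent with the partition starting at sector 64, though this is an inference and not something the log states. Sense Key 0x2 with ASC 0x4 / ASCQ 0x0 is, in the SCSI tables, "not ready, cause not reportable", so the drive is failing to respond rather than reporting a specific bad sector.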
September 27, 2024 Author I can't get the SMART test to finish at all, or even progress past 10%. I imagine that means the drive is broken/done for? I've also swapped all cables around, and other drives are working just fine on what Disk5 was connected to before.
September 27, 2024 Community Expert That still looks more like a connection issue, but if swapping cables doesn't help it may be a bad drive. The problem is that without that disk working, you also won't be able to recover disk4, so you will lose the data from two disks.
September 27, 2024 Author Yeah, I've not only swapped the cables but also the ports around. I've now moved disk5 from the dedicated HBA to the mainboard. Not really sure what else I could do at this point?
September 27, 2024 Community Expert Is there an old disk4, e.g. if that was an upgrade or replacement? Or why was it being rebuilt initially?
September 27, 2024 Author It's because of my old, faulty HBA that caused the drive to be disabled.
September 27, 2024 Community Expert Try doing this, but now to see if you can rebuild disk5 instead, nothing to lose at this point:
-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and assign any missing disk(s) if needed
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on parity disk(s) will be overwritten, this is normal as it doesn't account for the checkbox, but it won't be as long as it's checked)
-Stop array
-Unassign disk5
-Start array (in normal mode now) and post new diags
September 27, 2024 Community Expert Disk4 mounted and it's showing data, so that's good. Now check the filesystem on the emulated disk5, and run it without -n.
September 27, 2024 Author
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
Should I just run it with -L then?
September 28, 2024 Community Expert 12 hours ago, Yasuman said: "Should I just run it with -L then?" Yep.
September 28, 2024 Author Well, that's looking better now: I'm gonna let it rebuild for now.
September 28, 2024 Community Expert Check the syslog for errors with disk6. If they still appear now, or when reading the data after the rebuild, you may need to replace it.
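One way to check the syslog for per-disk errors without scrolling through it is to tally the md error lines by disk. A minimal Python sketch, assuming the line format shown in the syslog excerpt earlier in this thread ("md: disk5 read error, sector=..."). The matching "write error" wording is my assumption, and the name count_md_errors is illustrative:

```python
import re
from collections import Counter

# Matches lines such as "kernel: md: disk5 read error, sector=4294967424".
# The "write" alternative is an assumption about the log wording.
MD_ERROR = re.compile(r"kernel: md: (disk\d+) (read|write) error, sector=(\d+)")

def count_md_errors(lines):
    """Count md read/write error lines per (disk, operation)."""
    counts = Counter()
    for line in lines:
        match = MD_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# Sample lines copied from the syslog excerpt posted earlier.
sample = [
    "Sep 27 16:50:50 Yeji kernel: md: disk5 read error, sector=4294967424",
    "Sep 27 16:50:50 Yeji kernel: I/O error, dev sdf, sector 10737418344 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2",
    "Sep 27 16:50:50 Yeji kernel: md: disk5 read error, sector=10737418280",
]
print(count_md_errors(sample))
```

On the server itself the same function could be fed the lines of /var/log/syslog. A count for disk6 that keeps growing during the rebuild would be the sign to watch for.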
September 28, 2024 Author I still get quite a lot of errors, but so far it is still rebuilding:
September 28, 2024 Author I should be able to replace it and just restart the build process, right? It dropped down to less than 1Mb/s now and that would take years to build.
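As a rough sanity check on how long a rebuild takes at a given speed, the arithmetic is just disk size divided by throughput. A small Python sketch under stated assumptions: the 8 TB disk size is hypothetical (the thread does not give the size), the 150 MB/s comparison figure is only an example of a healthy rate, and the reported speed is read here as megabytes per second:

```python
SECONDS_PER_DAY = 86400

def rebuild_days(disk_bytes, bytes_per_second):
    """Days needed to write the whole disk at a constant rate."""
    return disk_bytes / bytes_per_second / SECONDS_PER_DAY

ASSUMED_DISK_BYTES = 8 * 10**12  # hypothetical 8 TB disk, not from the thread

slow = rebuild_days(ASSUMED_DISK_BYTES, 1 * 10**6)      # 1 MB/s
normal = rebuild_days(ASSUMED_DISK_BYTES, 150 * 10**6)  # 150 MB/s, example rate
print(round(slow, 1), "days at 1 MB/s")
print(round(normal, 2), "days at 150 MB/s")
```

Under these assumptions the slow case already runs to months, and longer if the rate keeps dropping or the figure was in megabits, so replacing the failing disk and restarting the rebuild is the practical route.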
September 28, 2024 Community Expert 11 minutes ago, Yasuman said: "I should be able to replace it and just restart the build process, right?" Yes.
September 28, 2024 Author Okay, last question for now since I need to get another drive anyway. How beneficial would it be for me to also get a 2nd parity drive?