Unmountable Disk (SOLVED)



Hi,

 

I made a bit of a mess. I had a red X for disk 2. I ran xfs_repair, which seemed OK, but I still couldn't mount the disk.

Tried:  xfs_admin -U generate /dev/sde1

but got:

/dev/sde1 : No such file or directory

fatal error -- couldn't initialize XFS library
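(For reference, this is roughly how I'd now check things before retrying — just a sketch; the /dev/md2 path is my assumption about how Unraid exposes array disk 2 when the array is started in maintenance mode:)

# Verify the device node actually exists before pointing xfs_admin at it
ls -l /dev/sde1
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT
# With the array in maintenance mode the slot should appear as /dev/md2,
# which is presumably the device a UUID change should target
xfs_admin -U generate /dev/md2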

 

I did a new config (probably a mistake), preserved all assignments, and started the array WITH "parity is ok", but the disk is still unmountable.

However, I can mount it through UD and browse its data through the GUI, so I probably didn't lose the data.

How do I resolve this?

Attached diagnostics.

 

juno-diagnostics-20190308-2209.zip

Edited by Gico

You didn't use the correct procedures and could lose data, but after the new config disk2 wasn't mounting because of this:

Mar  8 21:47:19 Juno kernel: XFS (md2): Filesystem has duplicate UUID 98e2e722-1e0f-4db0-9449-4a444af07a0e - can't mount

This means there is a clone of that disk already mounted, or the same disk is mounted in UD, for example. Unmount that one and try again. You'll also need to do a new config again, since you have since unassigned the disk.
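To confirm which device is actually holding that UUID before unmounting, something like this should show it (a quick sketch, using the UUID from your log):

# List every block device reporting that filesystem UUID
blkid | grep 98e2e722-1e0f-4db0-9449-4a444af07a0e
# Show what is currently mounted and where
lsblk -o NAME,UUID,MOUNTPOINT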

 

P.S.: all your cache devices have read/write errors:

Mar  8 21:32:17 Juno kernel: BTRFS info (device sdo1): bdev /dev/sdo1 errs: wr 251000, rd 264252, flush 27, corrupt 36174, gen 1
Mar  8 21:32:17 Juno kernel: BTRFS info (device sdo1): bdev /dev/sdm1 errs: wr 29454, rd 31202, flush 2, corrupt 0, gen 0
Mar  8 21:32:17 Juno kernel: BTRFS info (device sdo1): bdev /dev/sdr1 errs: wr 194380, rd 205820, flush 85, corrupt 63, gen 58
Mar  8 21:32:17 Juno kernel: BTRFS info (device sdo1): bdev /dev/sdp1 errs: wr 6, rd 1, flush 0, corrupt 7, gen 0
Mar  8 21:32:17 Juno kernel: BTRFS info (device sdo1): bdev /dev/sdq1 errs: wr 44286, rd 45178, flush 14, corrupt 1, gen 0

Probably a record :). See here for more info on how to clear the counters and monitor for future errors.
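For reference, clearing and checking the counters boils down to something like this (a sketch, assuming the pool is mounted at the usual /mnt/cache):

# Show the per-device error counters for the cache pool
btrfs device stats /mnt/cache
# Reset the counters so any new errors stand out
btrfs device stats -z /mnt/cache
# Run a scrub and check the result
btrfs scrub start /mnt/cache
btrfs scrub status /mnt/cache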

 

 

Edited by johnnie.black

Unmounted, assigned, did a new config, and the array is fully online, but I ran about a minute of a parity check (without correction) and 47 errors were found.

I stopped it for now. Is it probably because of the xfs_repair, and should I let the parity check run with corrections?

[Edit: Running parity check with corrections]

 

Why did the xfs_admin command I ran fail?

 

As for the cache, I'll run a scrub, but I'm planning to replace all the cache drives soon, although I'm not convinced that they are to blame for the errors.

Edited by Gico

There appears to be an HBA problem; it could also be power/cable related.

 

41 minutes ago, Gico said:

generate new  UUIDs

There's no need to generate new UUIDs. Either rebuild the disks, or, if you think they are fine and no new data was written to any emulated disk, do a new config and resync parity.


Neither disk was written to with new data.

 

Disk 3 was repaired by xfs_repair and seems OK now. It's still disabled.

 

Disk 9 dropped from the array (it was in "Not installed" status).

After a reboot it's present. I assigned it to the array and started maintenance mode. It's "Content emulated" now.

xfs_repair is running and it hasn't finished even an hour later.

The last line of the xfs_repair output doesn't change: "resetting inode 7732495730 nlinks from 3 to 2"

Should I leave it to finish?

 

epMsFiP.jpg

 

juno-diagnostics-20190309-1730.zip


If you're going to do a new config there's not much point in fixing the filesystem on the emulated disks. Though, like mentioned, a new config should only be done if you're pretty sure the disks are OK; the safer way in these cases is to rebuild to spare disks, keeping the old ones intact. And you still need to fix the underlying problem.


Emulated disk9 is unmountable. Any other option?

juno-diagnostics-20190309-1934.zip

 

Edit: The reconstruct ended, but the disk isn't mounting, and the reconstructed disk gives the same error with xfs_repair.

I consulted with someone who told me that I also have a memory issue.

Part of dmesg command output:

 

[ 4160.946042] XFS (md9): Metadata CRC error detected at xfs_agi_read_verify+0x89/0xd2 [xfs], xfs_agi block 0x1ffffffe2
[ 4160.946667] XFS (md9): Unmount and run xfs_repair
[ 4160.947018] XFS (md9): First 128 bytes of corrupted metadata buffer:
[ 4160.947373] 00000000d50ca9ae: 58 41 47 49 00 00 00 01 00 00 00 04 0f ff ff ff  XAGI............
[ 4160.947990] 000000002fa5e4f9: 00 00 0e 80 00 00 00 03 00 00 00 01 00 00 00 e8  ................
[ 4160.948600] 000000000b6b644d: 09 3e 3e 07 ff ff ff ff ff ff ff ff ff ff ff ff  .>>.............
[ 4160.949217] 00000000d8d5886f: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 4160.949827] 00000000d1ba6cc7: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 4160.950447] 000000007c95e4e3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 4160.951061] 000000009f7ac1b5: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 4160.951672] 0000000068cf2da9: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 4160.952355] XFS (md9): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x1ffffffe2 len 1 error 74
[ 4160.968428] XFS (md9): Metadata CRC error detected at xfs_agi_read_verify+0x89/0xd2 [xfs], xfs_agi block 0x27fffffda
[ 4160.969070] XFS (md9): Unmount and run xfs_repair
[ 4160.969411] XFS (md9): First 128 bytes of corrupted metadata buffer:
[ 4160.969758] 00000000d50ca9ae: 58 41 47 49 00 00 00 01 00 00 00 05 07 54 1e 8e  XAGI.........T..
[ 4160.970384] 000000002fa5e4f9: 00 00 0e 80 00 3e 1b 5b 00 00 00 01 00 00 00 8d  .....>.[........
[ 4160.971012] 000000000b6b644d: 17 c8 5e a2 ff ff ff ff ff ff ff ff ff ff ff ff  ..^.............
[ 4160.971625] 00000000d8d5886f: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 4160.972253] 00000000d1ba6cc7: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 4160.972872] 000000007c95e4e3: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 4160.973492] 000000009f7ac1b5: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 4160.974163] 0000000068cf2da9: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
[ 4160.974833] XFS (md9): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x27fffffda len 1 error 74
[ 4160.975474] XFS (md9): Error -117 reserving per-AG metadata reserve pool.
[ 4160.975481] XFS (md9): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c.  Return address = 00000000445d3cbe

[ 4160.975490] XFS (md9): Corruption of in-memory data detected.  Shutting down filesystem
[ 4160.975836] XFS (md9): Please umount the filesystem and rectify the problem(s)

Edited by Gico
15 hours ago, Gico said:

Edit: Reconstruct ended, but not mounting, and the reconstructed disk has the same error for xfs_repair

That's expected, since it will rebuild what is currently on the emulated disk. I hope you did it on a new disk.

 

15 hours ago, Gico said:

Emulated disk9 is unmountable. Any other option?

Upgrade to v6.7rc, which includes a newer xfsprogs. If that still fails, the best bet is using the old disk, in case it wasn't rebuilt on top.
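You can confirm which xfsprogs version is bundled before and after the upgrade (a quick sketch):

# Print the xfs_repair / xfsprogs version currently on the system
xfs_repair -V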

 

 

16 hours ago, Gico said:

Edit: Reconstruct ended, but not mounting, and the reconstructed disk has the same error for xfs_repair.

I consulted with someone who told me that I also have a memory issue.

Part of dmesg command output:

....

[ 4160.975490] XFS (md9): Corruption of in-memory data detected.  Shutting down filesystem
[ 4160.975836] XFS (md9): Please umount the filesystem and rectify the problem(s)

Do you also see a memory issue? Probably better to get a new memory stick first, right?

 

Yes, I reconstructed disk9 on a new disk.


I've done nothing yet. I booted up the server and, although the reconstruct ended, something is not right with the disk assignments.

Do I need to address this before v6.7rc? New config?

Edit: Disk3 was repaired by xfs_repair, and I trust its contents.

 

VzX7RDp.jpg

Edited by Gico

The problem was that, since emulated disk9 was unmountable, Unraid couldn't correctly resize the filesystem during the previous rebuild, so it's still the old size. But, like mentioned, there's no point in rebuilding an unmountable emulated disk. Unassign disk9 and start the array to again use the emulated disk, then upgrade to v6.7 and run xfs_repair on the emulated disk. If xfs_repair still fails there's no point in rebuilding; the best bet is to use the old disk after a new config.
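For the repair itself it would look something like this, with the array started in maintenance mode so the emulated disk is exposed as /dev/md9 (a sketch; the exact device path is an assumption):

# Dry run first to see what xfs_repair would change on the emulated disk
xfs_repair -n /dev/md9
# If that looks sane, run the actual repair
xfs_repair -v /dev/md9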


Finally it's fixed! The logical disk9 fs is OK now. Thank you!

disk3: Had 5.31 TB in use before this event, now 4.65 TB, out of which 199 GB are in lost+found. Disk is disabled & emulated.

disk9: Had 4.70 TB in use before this event, now 4.04 TB in use, out of which 1.64 TB are in lost+found. Disk is disabled, emulated, & not installed.

How to proceed?

lost+found are good files that only "lost" their folder?

 

After this is done, I will try to see if the original disk9 has anything else to offer with xfs_repair -P.

juno-diagnostics-20190311-0309.zip

Edited by Gico
6 hours ago, Gico said:

How to proceed?

You need to rebuild both disks; it can be done at the same time. If rebuilding disk3 on top of the old disk, you might want to check the old disk first, as it might be less corrupt. You can mount it with UD, but only while the array is stopped, or it won't mount.

 

6 hours ago, Gico said:

lost+found are good files that only "lost" their folder?

Usually yes, you'll need to go through them and get them sorted.
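A quick way to get an overview of what landed in there (a sketch; adjust the disk path to whichever disk you're checking):

# Count the recovered entries and sample what kind of files they are
find /mnt/disk9/lost+found -type f | wc -l
find /mnt/disk9/lost+found -type f | head -n 20 | xargs file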


The old (physical) disk3 fs seems to be in perfect shape: it mounted successfully through UD, the Used/Free capacity is as it was before this event, there's no lost+found folder, and I can browse it through the Unraid GUI.

 

I wrote this earlier in the thread, and the FS had a perfect result in xfs_repair, so I didn't understand why I had problems later.

On 3/10/2019 at 12:36 PM, Gico said:

Edit: Disk3 repaired by xfs_repair, and I trust it's content.

 

So I will leave disk3 unassigned, assign the replacement disk9, rebuild it from the xfs_repaired logical disk9,

then assign original disk3, new config and rebuild parity. Correct?

 

Later I will check if original disk9 is in a better state.

 

Edit: Why can't I mount the original disk9 through UD when the array is stopped? It says:

"Mar 12 22:17:17 Juno unassigned.devices: Mount of '/dev/sdu1' failed. Error message: mount: /mnt/disks/WDC_WD6002FRYZ-01WD5B0_K1GGUUVD: /dev/sdu1 already mounted or mount point busy."

Edited by Gico
11 hours ago, Gico said:

So I will leave disk3 unassigned, assign the replacement disk9, rebuild it from the xfs_repaired logical disk9,

then assign original disk3, new config and rebuild parity. Correct?

Yes.

 

11 hours ago, Gico said:

Edit: Why can't I mount the original disk9 through UD when the array is stopped? It says:

"Mar 12 22:17:17 Juno unassigned.devices: Mount of '/dev/sdu1' failed. Error message: mount: /mnt/disks/WDC_WD6002FRYZ-01WD5B0_K1GGUUVD: /dev/sdu1 already mounted or mount point busy."

Try rebooting.
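If it happens again after the reboot, you can check what's holding the partition or mount point (a sketch using the paths from your log; assumes fuser is available):

# See if the partition is already mounted somewhere
grep sdu1 /proc/mounts
# See which processes are keeping the mount point busy
fuser -vm /mnt/disks/WDC_WD6002FRYZ-01WD5B0_K1GGUUVD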

