xjumper84 Posted January 20 (edited)

I use this box for Frigate and Plex in Docker containers. A week ago I ran a parity check and everything was good. Then a 4TB drive started showing a ton of errors, so I pulled it, replaced it with a 10TB drive, and rebuilt the array. After the rebuild no shares show up, even though all the files are physically on the drives. In a terminal, the drive I rebuilt reports "structure needs cleaning", and the other drives give "cannot access 'folder name': Input/output error". I had a billion errors during the rebuild. I still have the drive I took out. Diagnostics attached. Any help would be appreciated. Thank you.

nas-diagnostics-20240119-1721.zip
JorgeB Posted January 20

Looks like a controller issue: there are errors on all connected disks, which can happen with onboard Ryzen SATA controllers under heavy load:

Jan 19 04:57:31 NAS kernel: md: disk0 read error, sector=1728810592
Jan 19 04:57:31 NAS kernel: md: disk3 read error, sector=1728810784
Jan 19 04:57:31 NAS kernel: md: disk1 read error, sector=1728810520
Jan 19 04:57:31 NAS kernel: md: disk2 read error, sector=1728810504

So the disk4 rebuild would not have been successful. Reboot and post new diagnostics after starting the array.
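A quick way to see that the read errors span every disk (which points at the controller or cabling rather than any one drive) is to tally them per device. A minimal sketch, fed with sample lines like the excerpt above; on a live box you would feed it /var/log/syslog instead:

```shell
# Count "md: diskN read error" lines per disk. Several disks erroring at
# the same moment usually implicates the shared controller, not the drives.
count_read_errors() {
  awk '/md: disk[0-9]+ read error/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^disk[0-9]+$/) n[$i]++
       }
       END { for (d in n) print d, n[d] }' "$@" | sort
}

# Sample input mimicking the syslog excerpt above:
count_read_errors <<'EOF'
Jan 19 04:57:31 NAS kernel: md: disk0 read error, sector=1728810592
Jan 19 04:57:31 NAS kernel: md: disk3 read error, sector=1728810784
Jan 19 04:57:31 NAS kernel: md: disk1 read error, sector=1728810520
Jan 19 04:57:31 NAS kernel: md: disk2 read error, sector=1728810504
EOF
```

On a real diagnostics zip you would run it against the syslog file inside it, e.g. `count_read_errors logs/syslog` (path is an assumption; the layout varies by Unraid version).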
xjumper84 Posted January 20

It had been trying to reboot for almost 10 minutes, stuck on disk5 with 'target is busy', so I did a hard power-off and checked that the SATA cables were seated at the drives and the motherboard. The reboot came up fine; the array wants to do a parity check, which I paused so I could create the diagnostics file (attached).

Question: if Frigate was running while the drive was rebuilding, could that have caused issues with parity, since Frigate was saving camera files to the other disks?

nas-diagnostics-20240120-0817.zip
xjumper84 Posted January 20

I'm now seeing a lot of errors on the disk I just rebuilt. Should I stop the parity check, unmount the drive, and run xfs_repair?

Jan 20 08:18:58 NAS kernel: XFS (md4): Metadata corruption detected at xfs_buf_ioend+0xac/0x386 [xfs], xfs_inode block 0x15c3e0ef0 xfs_inode_buf_verify
Jan 20 08:18:58 NAS kernel: XFS (md4): Unmount and run xfs_repair
Jan 20 08:18:58 NAS kernel: XFS (md4): First 128 bytes of corrupted metadata buffer:
Jan 20 08:18:58 NAS kernel: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
(the remaining 112 bytes of the buffer dump are likewise all zeros)
Jan 20 08:18:58 NAS kernel: XFS (md4): metadata I/O error in "xfs_imap_to_bp+0x50/0x70 [xfs]" at daddr 0x15c3e0ef0 len 32 error 117

The same "Metadata corruption detected / Unmount and run xfs_repair / metadata I/O error … error 117" sequence then repeats from 08:19:11 through 08:23:15 for inode blocks 0x173f2aae8, 0x174129f48, 0x18141b988, 0x18bb364e8, 0x16ffeca48, and 0x15c3e0ef0 again, with every dumped metadata buffer all zeros. Interleaved with those:

Jan 20 08:19:37 NAS kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jan 20 08:19:37 NAS kernel: ata9.00: configured for UDMA/133
Jan 20 08:21:48 NAS ntpd[1447]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Jan 20 08:23:15 NAS kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jan 20 08:23:15 NAS kernel: ata10.00: configured for UDMA/133
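The recurring "error 117" in those traces is the errno behind the "structure needs cleaning" message from the first post: on Linux, errno 117 is EUCLEAN, which XFS returns when it detects on-disk corruption. A quick one-liner to confirm the mapping (using Python's strerror, since there is no standard shell utility for errno lookups):

```shell
# errno 117 on Linux is EUCLEAN -> "Structure needs cleaning",
# the same message the terminal was printing for the rebuilt disk.
python3 -c 'import os; print(os.strerror(117))'
# -> Structure needs cleaning
```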
trurl Posted January 20

Those log entries are not in your most recently posted diagnostics. Post new diagnostics.
xjumper84 Posted January 20

New diagnostics attached.

nas-diagnostics-20240120-0846.zip
trurl Posted January 20

24 minutes ago, xjumper84 said:
unmount the drive and run xfs_repair?

Those ideas are missing some details. Check the filesystem on disk4. Be sure to use the webUI and not the command line. Capture the output and post it.
trurl Posted January 20

15 hours ago, xjumper84 said:
I still have the drive I took out.

Can you mount this disk as an Unassigned Device?
xjumper84 Posted January 20

4 minutes ago, trurl said:
Check filesystem on disk4. Be sure to use the webUI and not the command line. Capture the output and post it.

The xfs_repair -n output is in the attached text file.

xfs_repair -n disk4.txt
xjumper84 Posted January 20 (edited)

11 minutes ago, trurl said:
Can you mount this disk as an Unassigned Device?

Yes, but I'll need to power off to physically install it. Should I do that now, or attempt the xfs_repair first?
itimpi Posted January 20

15 hours ago, xjumper84 said:
I had a billion errors during the rebuild. I still have the drive I took out.

Is this the rebuild onto the 10TB drive? If so, I would not expect the rebuilt drive to have usable content.
xjumper84 Posted January 20 (edited)

5 minutes ago, itimpi said:
Is this the rebuild onto the 10TB drive? If so I would not expect the rebuilt drive to have usable content.

The billion errors were on disks 1-3. The rebuild went onto the 10TB drive (which is disk4).
trurl Posted January 20

7 minutes ago, xjumper84 said:
The billion errors were on disks 1-3. The rebuild went onto the 10TB drive.

Every bit of all the other disks must be read reliably in order to reliably rebuild every bit of the new disk.

56 minutes ago, xjumper84 said:
Question - if frigate was running while the drive was attempting to rebuild, would that have caused issues with parity, because Frigate was saving camera files to other disks?

Ordinarily this is OK because parity is updated in real time, so parity would still be valid after rebuilding the data disk. On the other hand, it means parity is no longer valid for the original disk. The results of the check don't look good. Let's see what can be read from the original as an Unassigned Device. It's not entirely clear there was anything actually wrong with that disk, since "a ton of errors" is not very specific and we don't have any diagnostics from then.
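The "every bit of all other disks must be reliably read" point follows directly from how single-parity works: parity is the XOR of all data disks, and a missing disk is reconstructed by XOR-ing parity with the surviving disks, so one bad read elsewhere lands directly in the rebuilt disk. A toy sketch with one made-up byte per disk:

```shell
# Toy illustration of single-parity rebuild (one byte per "disk").
# Parity is the XOR of all data disks; a missing disk is reconstructed
# by XOR-ing parity with the remaining disks.
d1=0xA5; d2=0x3C; d3=0xF0
parity=$(( d1 ^ d2 ^ d3 ))

rebuilt_d2=$(( parity ^ d1 ^ d3 ))       # clean reads -> correct data back
printf 'correct rebuild: 0x%02X\n' "$rebuilt_d2"    # matches d2 = 0x3C

bad_d1=$(( d1 ^ 0x01 ))                  # a single flipped bit on ANOTHER disk...
bad_rebuild=$(( parity ^ bad_d1 ^ d3 ))  # ...corrupts the rebuilt disk instead
printf 'rebuild after a read error: 0x%02X\n' "$bad_rebuild"
```

Which is why read errors on disks 1-3 during the rebuild mean the freshly rebuilt disk4 contains garbage at those stripes, even though disk4's own hardware is fine.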
xjumper84 Posted January 20 (edited)

2 minutes ago, trurl said:
Every bit of all other disks must be reliably read to reliably rebuild every bit of the new disk. ... Let's see what can be read from the original as an Unassigned Device.

Understood.

Edit: the errors showed on md1, md2, and md3 (same as shown in the picture before the last reboot), reading something similar to:

metadata I/O error in "xfs_imap_to_bp+0x50/0x70 [xfs]" ... len 32 error 5

So I'll power off the machine, connect the drive, and start up. Please refresh my memory: with the extra drive connected, will the array still start, and will I then be able to mount the original 4TB drive in Unassigned Devices?
xjumper84 Posted January 20

OK. I re-added the original disk4 and it showed up in Unassigned Devices. I mounted it and all my files are there. What is the proper way to proceed?

I would think (please let me know if this is correct): unmount the drive from Unassigned Devices, stop the array, swap the 10TB drive for the original 4TB drive, and restart the array. Check the files to make sure all is good, then run a parity check to ensure parity is good. Once it is: in Unassigned Devices, format the 10TB drive to XFS, copy the 4TB drive's contents to the 10TB drive, and unmount the 10TB. Then stop the array, swap the 4TB drive for the 10TB drive, set up the 4TB drive in Unassigned Devices, and restart the array?
trurl Posted January 20

1 minute ago, xjumper84 said:
unmount the drive from Unassigned Devices. Stop the array, change the 10tb drive to the original 4tb drive and then restart the array. Check the files to make sure all good, then do a parity check, ensure we're good there.

No, this is not correct. The only way to get the original back in the array is New Config and rebuild parity.

2 minutes ago, xjumper84 said:
Then once we are, in unassigned devices - format the 10tb drive to xfs, copy the 4tb drive to the 10tb drive, unmount the 10tb. stop the array, change the 4tb drive to the 10tb drive, setup the 4tb drive in unassigned devices and restart the array?

And all of this is wrong too. If you replace the drive, it will just be rebuilt again.
trurl Posted January 20

Just now, trurl said:
The only way to get the original back in the array is New Config and rebuild parity.

You should disable Docker and the VM Manager in Settings until you get everything working right again. Stop anything that might write to your array; then New Config might be the safest approach, since only the parity disk would be written, and parity contains none of your data.
trurl Posted January 20

13 minutes ago, xjumper84 said:
disk4 back, it showed up in Unassigned Devices

Post new diagnostics so we can check the health of that disk.
xjumper84 Posted January 20

New diagnostics attached.

nas-diagnostics-20240120-0951.zip
xjumper84 Posted January 20

11 minutes ago, trurl said:
No this is not correct. The only way to get the original back in the array is New Config and rebuild parity. And all of this is wrong too. If you replace the drive it will rebuilt again.

6 minutes ago, trurl said:
You should disable Docker and VM Manager in Settings until you get everything working right again. Stop anything that might write to your array, then New Config might be the safest approach, since only the parity disk would be written, and parity contains none of your data.

OK, so I should:

1. Go to Settings and set Enable Docker to No - this keeps the Docker containers I have, right?
2. Go to Settings and set Enable VMs to No - this also keeps the VM files I already have.

Then:

3. Stop the array.
4. Swap the 10TB out and the 4TB in.
5. Start the array with no parity check.
6. Go to Tools -> New Config.
7. Rebuild parity and wait - no Dockers running, nothing copied in or out until it finishes.

Is that correct?
trurl Posted January 20

Jan 20 09:29:06 NAS unassigned.devices: Mounting partition 'sdi1' at mountpoint '/mnt/disks/WD-WCC4E4TSK55Y'...

sdi has:

ID# ATTRIBUTE_NAME          FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K  200   200   051    -    177
...
197 Current_Pending_Sector  -O--CK  200   200   000    -    3
...
200 Multi_Zone_Error_Rate   ---R--  200   200   000    -    12

You should click on each of your WD disks to get to its settings and add attributes 1 and 200 for monitoring. They all look good right now, unlike that disk.

7 minutes ago, xjumper84 said:
go to settings set enable docker to no - this will keep the docker containers I have though right? go to settings, set enable VM's to no - this will also keep the VM files I already have.

This will just disable the Docker and VM Manager services, so none of them can run. They will not go away. Do that much while we consider how to proceed.
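If you also want to watch those attributes from the console, a small sketch that filters SMART output down to IDs 1, 197, and 200. The sample text below mirrors the excerpt above; on the live system you would pipe in `smartctl -A /dev/sdi` instead (device name as in the log, adjust to your box):

```shell
# Print ID, attribute name, and raw value for the three attributes of
# interest on WD drives: 1 (Raw_Read_Error_Rate), 197
# (Current_Pending_Sector), 200 (Multi_Zone_Error_Rate).
watch_attrs() {
  awk '$1 == 1 || $1 == 197 || $1 == 200 { print $1, $2, $NF }' "$@"
}

# Sample rows copied from the smartctl excerpt above:
watch_attrs <<'EOF'
ID# ATTRIBUTE_NAME          FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K  200   200   051    -    177
197 Current_Pending_Sector  -O--CK  200   200   000    -    3
200 Multi_Zone_Error_Rate   ---R--  200   200   000    -    12
EOF
```

Non-zero raw values on 197 and 200, as on this disk, are the ones worth alerting on; the webUI per-disk attribute monitoring trurl describes does the same thing automatically.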
trurl Posted January 20

Another possible approach is to try the rebuild again. We could do that first, see how it goes, and keep the original for later. I am leaning towards this approach. Anybody else @JorgeB @itimpi have any opinions or other ideas?
xjumper84 Posted January 20

18 minutes ago, trurl said:
You should click on each of your WD disks to get to its settings, and add attributes 1 and 200 for monitoring. They all look good right now, unlike that disk.

I checked the other WD disks; they are all zeros on IDs 1, 197, and 200. Docker and VMs are set to No.
trurl Posted January 20

1 minute ago, xjumper84 said:
I checked the other WD disks, they are all zeros on IDs 1, 197 and 200.

As I said, but did you

21 minutes ago, trurl said:
add attributes 1 and 200 for monitoring
xjumper84 Posted January 20

17 minutes ago, trurl said:
Another possible approach is to try the rebuild again. We could do that first, see how it goes, keep the original for later.

Could I copy the data off the original 4TB drive currently in Unassigned Devices to my desktop, just to have a good copy of the data, then erase the rebuilt data on disk4, run a parity check to ensure parity is good, and then copy the data back to disk4?