[7.0.0] disk error - XFS (dm-2): metadata I/O error

February 6, 2025

I got an error on one of my disks. I replaced it with a new one and rebuilt the array successfully. After one day I started getting this:

Feb  6 00:00:33 nedio-server Docker Auto Update: Installing Updates for borgmatic immich mariadb Minio netdata nextcloud
Feb  6 00:00:33 nedio-server kernel: XFS (dm-2): Metadata corruption detected at xfs_buf_ioend+0xa9/0x384, xfs_inode block 0xe4343410 xfs_inode_buf_verify
Feb  6 00:00:33 nedio-server kernel: XFS (dm-2): Unmount and run xfs_repair
Feb  6 00:00:33 nedio-server kernel: XFS (dm-2): First 128 bytes of corrupted metadata buffer:
Feb  6 00:00:33 nedio-server kernel: 00000000: b4 cb fc 73 d1 4d 06 7b e3 4c 72 74 f9 72 b9 42  ...s.M.{.Lrt.r.B
Feb  6 00:00:33 nedio-server kernel: 00000010: f1 19 35 c4 a7 ce b4 66 36 dc 1f f6 88 e1 ea ad  ..5....f6.......
Feb  6 00:00:33 nedio-server kernel: 00000020: 13 28 1e 6a a1 1c e5 46 6c 9f ff 94 7a 5d e1 12  .(.j...Fl...z]..
Feb  6 00:00:33 nedio-server kernel: 00000030: 9b 97 0b 2c 47 92 6d 19 f3 84 fa d6 22 0b a3 3e  ...,G.m....."..>
Feb  6 00:00:33 nedio-server kernel: 00000040: 4b 51 0d 41 a3 8e 59 f9 09 13 ce 3f c1 a2 57 87  KQ.A..Y....?..W.
Feb  6 00:00:33 nedio-server kernel: 00000050: f9 c6 30 b4 bc 30 65 b1 0f 86 aa 06 6f 35 57 cc  ..0..0e.....o5W.
Feb  6 00:00:33 nedio-server kernel: 00000060: c6 33 3c 33 9c 11 cc 2c d3 fc 97 10 61 ea 1f 7c  .3<3...,....a..|
Feb  6 00:00:33 nedio-server kernel: 00000070: bc 50 a2 fe bf b9 23 36 b3 60 bc 01 60 c0 c3 d7  .P....#6.`..`...
Feb  6 00:00:33 nedio-server kernel: XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0x52/0x74" at daddr 0xe4343410 len 32 error 117
Feb  6 00:00:34 nedio-server emhttpd: read SMART /dev/sdd
Feb  6 00:00:41 nedio-server emhttpd: read SMART /dev/sde
Feb  6 00:00:45 nedio-server kernel: XFS (dm-2): Metadata corruption detected at xfs_buf_ioend+0xa9/0x384, xfs_inode block 0x1467bfac8 xfs_inode_buf_verify
Feb  6 00:00:45 nedio-server kernel: XFS (dm-2): Unmount and run xfs_repair

I have stopped the array. Now I can't start it. Please help. My data is only partially backed up.. 😱

nedio-server-diagnostics-20250206-0928.zip
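For anyone reading the kernel lines in this post: error 117 is EUCLEAN on Linux ("Structure needs cleaning"), which XFS uses to report corruption, and daddr is a disk address counted in 512-byte blocks. Below is a minimal sketch for pulling those fields out of a syslog. The function name xfs_io_errors is made up for illustration, and the pattern is only written against the message format shown above.

```shell
# Hypothetical helper: summarize XFS "metadata I/O error" lines read from stdin.
# Output per match: <device> <daddr> <byte offset> <errno>
xfs_io_errors() {
  sed -n -E 's#.*XFS \(([^)]+)\): metadata I/O error in .* at daddr (0x[0-9a-fA-F]+) len [0-9]+ error ([0-9]+).*#\1 \2 \3#p' |
  while read -r dev daddr err; do
    # daddr is in 512-byte units, so multiply to get a byte offset on the device.
    printf '%s %s %s %s\n' "$dev" "$daddr" "$(($daddr * 512))" "$err"
  done
}
```

Something like xfs_io_errors < /var/log/syslog should print one line per I/O error. For the 00:00:33 line above that would be "dm-2 0xe4343410 1960256741376 117", i.e. the bad inode buffer sits roughly 1.96 TB into the device.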

February 6, 2025

Community Expert

Check filesystem on disk3.

February 6, 2025

Author

I can't start the array in maintenance mode... The previous HDD (3TB), which used to be Disk 3, is currently unplugged from the Unraid server. The new HDD (4TB) is the one I used to replace it: I precleared it and then put it into the Disk 3 slot. The array rebuild completed without errors, but now the array does not start, even if I check the maintenance mode box.

Please advise what I should do.

Edited February 6, 2025 by Nedio Server Unraid

February 6, 2025

Community Expert

5 minutes ago, Nedio Server Unraid said:

I can't start the array in maintenance mode

Why not?

February 6, 2025

Author

The log that I am getting is:

Feb 6 13:31:19 nedio-server emhttpd: cmdStart: already started

Yet on the Main page the array looks like it is not started: the slots still show drop-down boxes for disk selection. Should I restart the server, maybe?

February 6, 2025

Community Expert

Try rebooting.

February 6, 2025

Author

OK, after rebooting the server I was able to start the array in Maintenance mode, and I clicked the Check File System button on disk 3. The Unraid spinner has been running for 25 minutes now.

image.png.d319e7152e085f0969fb68a06da23d23.png

Here is the log output:

Feb  6 14:28:53 nedio-server rc.nginx: Reloading Nginx server daemon configuration...
Feb  6 14:28:54 nedio-server nginx: 2025/02/06 14:28:54 [alert] 11134#11134: worker process 25079 exited on signal 6
Feb  6 14:28:56 nedio-server nginx: 2025/02/06 14:28:56 [alert] 11134#11134: worker process 25363 exited on signal 6
Feb  6 14:28:57 nedio-server nginx: 2025/02/06 14:28:57 [alert] 11134#11134: worker process 25511 exited on signal 6
Feb  6 14:29:06 nedio-server nginx: 2025/02/06 14:29:06 [alert] 11134#11134: worker process 25549 exited on signal 6
Feb  6 14:29:07 nedio-server ool www[15128]: /usr/local/emhttp/plugins/dynamix/scripts/xfs_check 'start' '/dev/mapper/md3p1' 'ST4000NE001-2MA101_WS25BT0Z' '-n'
Feb  6 14:29:07 nedio-server nginx: 2025/02/06 14:29:07 [alert] 11134#11134: worker process 26094 exited on signal 6
Feb  6 14:29:08 nedio-server nginx: 2025/02/06 14:29:08 [alert] 11134#11134: worker process 26284 exited on signal 6
...
Feb  6 14:31:02 nedio-server nginx: 2025/02/06 14:31:02 [alert] 11134#11134: worker process 33870 exited on signal 6
Feb  6 14:31:04 nedio-server nginx: 2025/02/06 14:31:04 [alert] 11134#11134: worker process 33950 exited on signal 6
Feb  6 14:31:06 nedio-server nginx: 2025/02/06 14:31:06 [alert] 11134#11134: worker process 34135 exited on signal 6
Feb  6 14:31:06 nedio-server publish: curl to arraymonitor failed
Feb  6 14:31:08 nedio-server nginx: 2025/02/06 14:31:08 [alert] 11134#11134: worker process 34242 exited on signal 6
Feb  6 14:31:10 nedio-server nginx: 2025/02/06 14:31:10 [alert] 11134#11134: worker process 34321 exited on signal 6
...
Feb  6 14:34:48 nedio-server nginx: 2025/02/06 14:34:48 [alert] 11134#11134: worker process 48053 exited on signal 6
Feb  6 14:34:49 nedio-server nginx: 2025/02/06 14:34:49 [alert] 11134#11134: worker process 48151 exited on signal 6
Feb  6 14:39:14 nedio-server emhttpd: spinning down /dev/sdf
Feb  6 14:39:14 nedio-server emhttpd: spinning down /dev/nvme0n1
Feb  6 14:39:15 nedio-server emhttpd: sdspin /dev/nvme0n1 down: 25
Feb  6 14:43:48 nedio-server emhttpd: spinning down /dev/sdj
Feb  6 14:43:48 nedio-server emhttpd: spinning down /dev/sdh
Feb  6 14:43:48 nedio-server emhttpd: spinning down /dev/sdg
Feb  6 14:43:48 nedio-server emhttpd: spinning down /dev/sdd
Feb  6 14:43:48 nedio-server emhttpd: spinning down /dev/sde
Feb  6 14:43:48 nedio-server emhttpd: spinning down /dev/sdb
Feb  6 14:43:48 nedio-server emhttpd: spinning down /dev/sdi
Feb  6 14:44:17 nedio-server emhttpd: spinning down /dev/sdc

Should I wait longer? I do not see xfs_check in htop.

Edited February 6, 2025 by Nedio Server Unraid

February 6, 2025

Community Expert

Try running it using the CLI:

xfs_repair -v /dev/mapper/md3p1
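Note that the GUI check in the log above was started with -n (no modify), which only reports problems, while the command here will actually write repairs. Below is a small sketch of a report-only wrapper, assuming the array is in maintenance mode and the device path is the one from this thread. The name xfs_dry_run is hypothetical.

```shell
# Hypothetical wrapper around a read-only check. -n means "no modify":
# xfs_repair only reports what it would fix and exits non-zero if it found corruption.
xfs_dry_run() {
  local dev="$1"
  if xfs_repair -n "$dev"; then
    echo "clean: $dev"
  else
    echo "needs repair: $dev"
    return 1
  fi
}
```

Calling xfs_dry_run /dev/mapper/md3p1 would be roughly what the GUI button does. Dropping -n, as in the command above, is what applies the fixes.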

February 6, 2025

Author

I've run it. Got this:

user@nedio-server:~# xfs_repair -v /dev/mapper/md3p1
Phase 1 - find and verify superblock...
        - block cache size set to 4590072 entries
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
ERROR: The log head and/or tail cannot be discovered. Attempt to mount the
filesystem to replay the log or use the -L option to destroy the log and
attempt a repair.

February 6, 2025

Community Expert

Use -L
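For context, -L forces xfs_repair to zero the log even though it still holds metadata changes, so anything that only existed in the log is lost. That is why the tool suggests trying a mount first to replay it. Below is a hedged sketch of a guard that makes the destructive step explicit. Both zero_log_repair and the confirmation word are invented for this example.

```shell
# Hypothetical guard: refuse to zero the XFS log unless the caller confirms.
zero_log_repair() {
  local dev="$1" confirm="${2:-}"
  if [ "$confirm" != "destroy-log" ]; then
    echo "refusing: pass 'destroy-log' to run xfs_repair -L on $dev" >&2
    return 2
  fi
  # -L zeroes the dirty log, then the normal repair phases run.
  xfs_repair -v -L "$dev"
}
```

With this, zero_log_repair /dev/mapper/md3p1 does nothing except print the warning, and zero_log_repair /dev/mapper/md3p1 destroy-log runs the same repair as suggested here.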

February 6, 2025

Author

Done. Here are the last lines of the console output:

Format log to cycle 2069032622.
cache_purge: shake on cache 0x4d67c0 left 1 nodes!?
cache_purge: shake on cache 0x4d67c0 left 1 nodes!?
cache_zero_check: refcount is 1, not zero (node=0x14bf180fb010)

        XFS_REPAIR Summary    Thu Feb  6 16:03:17 2025

Phase           Start           End             Duration
Phase 1:        02/06 16:00:31  02/06 16:00:31
Phase 2:        02/06 16:00:31  02/06 16:01:00  29 seconds
Phase 3:        02/06 16:01:00  02/06 16:01:36  36 seconds
Phase 4:        02/06 16:01:36  02/06 16:01:36
Phase 5:        02/06 16:01:36  02/06 16:01:37  1 second
Phase 6:        02/06 16:01:37  02/06 16:01:37
Phase 7:        02/06 16:01:37  02/06 16:01:37

Total run time: 1 minute, 6 seconds
done
user@nedio-server:~#

Thanks for helping me out JorgeB. I am a bit nervous, hoping I did not lose my data..

What's next?

Edited February 6, 2025 by Nedio Server Unraid

February 6, 2025

Community Expert

Start the array in normal mode. The disk should mount; if it does, check its contents, and also look for a lost+found folder.
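xfs_repair moves files it could not reconnect to a directory into lost+found, named by inode number, so a quick count gives a feel for how much was orphaned. Below is a minimal sketch, assuming disk3 is mounted at /mnt/disk3 as usual on Unraid. The name lost_found_summary is made up.

```shell
# Hypothetical helper: report how many files ended up in lost+found and their total size.
lost_found_summary() {
  local dir="$1/lost+found"
  if [ ! -d "$dir" ]; then
    echo "no lost+found under $1"
    return 0
  fi
  # Count regular files at any depth (recovered directories keep their subtree).
  echo "files: $(find "$dir" -type f | wc -l | tr -d ' ')"
  # Total size in KiB as reported by du.
  echo "size_kib: $(du -sk "$dir" | cut -f1)"
}
```

Usage would be lost_found_summary /mnt/disk3. Comparing the reported size with the used space on the disk shows how much of the surviving data is sitting in lost+found rather than in its original folders.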

February 6, 2025

Author

It mounted. The array has started. But there is a lot in the lost+found directory on disk3... 😨

Also, disk 3 shows only 49 GB used, whereas I remember having 2+ TB there... BTW: is there a way to know from the diagnostic how much data was on that HDD before XFS restore?

Now, this disk (4TB) is the one that was rebuilt from parity when I replaced the originally failing HDD, which was 3TB.

I have not tried to check or repair XFS on that 3TB disk.

Do you think it is safe to plug the HDD-3TB back, start the array in maintenance mode and try to check the XFS there? Could it restore more data?

February 6, 2025

Community Expert
Solution

6 minutes ago, Nedio Server Unraid said:

is there a way to know from the diagnostic how much data was on that HDD before XFS restore?

Since the array was not started in normal mode when the diagnostics were taken, there is no way to know anything about the filesystems on any of the disks.

Leave the array as it currently is. If you have a spare port, plug the original disk in and see if it will mount as an Unassigned Device.

Either way post new diagnostics.

February 9, 2025

Author

ok, I have plugged in the original (3TB) disk. It shows up in the unassigned devices.

Should I start the array (with the new 4TB disk) before posting the diagnostics?

Should I try to mount the original disk outside of the array?

February 9, 2025

Author

I have started the array in maintenance mode (so that I can pass the encryption passphrase to unraid).

Then I pressed the [mount] button next to the original disk in unassigned devices.

Here is the log I see:

Feb  9 14:46:56 nedio-server unassigned.devices: Mounting partition 'sdb1' at mountpoint '/mnt/disks/WD-WMC4N0F887V2'...
Feb  9 14:46:56 nedio-server emhttpd: shcmd (4857): /usr/sbin/cryptsetup luksOpen '/dev/sdb1' 'WD-WMC4N0F887V2'
Feb  9 14:46:58 nedio-server nginx: 2025/02/09 14:46:58 [alert] 11265#11265: worker process 90202 exited on signal 6
Feb  9 14:46:58 nedio-server unassigned.devices: Mount cmd: /sbin/mount -t 'xfs' -o rw,relatime '/dev/mapper/WD-WMC4N0F887V2' '/mnt/disks/WD-WMC4N0F887V2'
Feb  9 14:46:58 nedio-server kernel: XFS (dm-8): Mounting V5 Filesystem f185aae9-b6c9-4eb8-b7dc-23ae79ee4411
Feb  9 14:46:58 nedio-server kernel: XFS (dm-8): Starting recovery (logdev: internal)
Feb  9 14:46:59 nedio-server kernel: XFS (dm-8): Ending recovery (logdev: internal)
Feb  9 14:47:00 nedio-server unassigned.devices: Successfully mounted '/dev/mapper/WD-WMC4N0F887V2' on '/mnt/disks/WD-WMC4N0F887V2'.
Feb  9 14:47:00 nedio-server unassigned.devices: Device '/dev/sdb1' is not set to be shared.
Feb  9 14:47:00 nedio-server nginx: 2025/02/09 14:47:00 [alert] 11265#11265: worker process 90521 exited on signal 6

February 9, 2025

Author

nedio-server-diagnostics-20250209-1456.zip

February 9, 2025

Community Expert

Check the contents of the old disk. If they are correct and more complete, you can copy them to the array.
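One way to judge "more complete" is to list what exists on the old disk but not on the rebuilt one before copying anything. Below is a rough sketch, assuming the old disk is mounted at /mnt/disks/WD-WMC4N0F887V2 (as in the log above) and the rebuilt disk at /mnt/disk3. The name missing_in_dest is hypothetical, and it only compares paths, not file contents.

```shell
# Hypothetical helper: print files that exist under SRC but not under DST (by relative path).
missing_in_dest() {
  local src="$1" dst="$2"
  (cd "$src" && find . -type f) | sort | while IFS= read -r f; do
    # ${f#./} strips the leading "./" that find adds.
    [ -e "$dst/$f" ] || printf '%s\n' "${f#./}"
  done
}
```

For example, missing_in_dest /mnt/disks/WD-WMC4N0F887V2 /mnt/disk3 | wc -l gives the number of files that would need to be copied over. Files that xfs_repair moved into lost+found will show up as missing too, since their original paths are gone.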

February 9, 2025

Author

I am considering running a full parity check on my array first, with corrections written. I have somehow lost faith in it. Once I know my array is in good shape, I will try to read from the original disk and write to the array. Please let me know if any part of this plan is a bad idea. 🙏

Then I will set up a proper 3-2-1 backup urgently...

February 10, 2025

Community Expert

I would first look at recovering the data from that disk, then you can run a check.

February 10, 2025

Author

My array was messed up:

 Duration: 12 hours, 49 minutes, 35 seconds. Average speed: 86.6 MB/s
 Finding 487755460 errors

All of that is supposed to be corrected now. I will start copying the data from the original disk to the array.
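As a sanity check on those numbers, duration times average speed gives roughly the size the check covered. Here is a quick sketch in integer shell arithmetic, with the speed passed in tenths of MB/s because the shell has no floating point. The name scanned_mb is invented here.

```shell
# Hypothetical helper: MB read = seconds elapsed * average MB/s.
# Usage: scanned_mb HOURS MINUTES SECONDS SPEED_IN_TENTHS_OF_MB_PER_S
scanned_mb() {
  local secs=$(( $1 * 3600 + $2 * 60 + $3 ))
  echo $(( secs * $4 / 10 ))
}

scanned_mb 12 49 35 866   # 12 h 49 min 35 s at 86.6 MB/s
```

That prints 3998755, i.e. about 4.0 TB, which is consistent with a full pass over a 4 TB disk.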

March 6, 2025

Author

Managed to copy all my data from the old disk 🍀. Have set up the backup ⛑️. A bit of a bitter aftertaste: why the hell my XFS file system got corrupted - still no answer 🤷‍♂️.

March 6, 2025

Community Expert

59 minutes ago, Nedio Server Unraid said:

why the hell my XFS file system got corrupted - still no answer

That can be difficult to answer. Typically it's caused by an unclean shutdown or a flush issue, but if it keeps happening it may indicate an underlying issue.

Solved by trurl
