Shares disappeared, new cache I/O errors

March 24, 2024

Hello!

I tried to access some Docker services today and realized that the cache drive was "out". The device shows as "Active, normal operation", but there are several errors in its log and most shares are missing. After a clean reboot, a notification reported that an unclean shutdown was detected, and a parity check is currently running. The shares are still missing after the reboot.

The cache drive is new (~10 days) and was operating with no issues for the past week.

Here's the cache drive log:

Mar 24 15:56:05 Tower kernel: nvme0n1: p1
Mar 24 15:57:28 Tower emhttpd: Samsung_SSD_990_PRO_with_Heatsink_1TB_S73JNJ0W605701A (nvme0n1) 512 1953525168
Mar 24 15:57:28 Tower emhttpd: import 30 cache device: (nvme0n1) Samsung_SSD_990_PRO_with_Heatsink_1TB_S73JNJ0W605701A
Mar 24 15:57:32 Tower emhttpd: read SMART /dev/nvme0n1
Mar 24 15:57:47 Tower emhttpd: shcmd (57): mount -t xfs -o noatime,nouuid /dev/nvme0n1p1 /mnt/cache
Mar 24 15:57:47 Tower kernel: XFS (nvme0n1p1): Mounting V5 Filesystem
Mar 24 15:57:48 Tower kernel: XFS (nvme0n1p1): Starting recovery (logdev: internal)
Mar 24 15:57:48 Tower kernel: XFS (nvme0n1p1): Ending recovery (logdev: internal)
Mar 24 16:01:55 Tower kernel: nvme0n1: I/O Cmd(0x2) @ LBA 516769048, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Mar 24 16:01:55 Tower kernel: I/O error, dev nvme0n1, sector 516769048 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
Mar 24 16:01:55 Tower kernel: nvme0n1: detected capacity change from 1953525168 to 0
Mar 24 16:01:55 Tower kernel: nvme0n1p1: writeback error on inode 1086305362, offset 1630208, sector 989418728
Mar 24 16:01:55 Tower kernel: nvme0n1p1: writeback error on inode 1076556609, offset 0, sector 979578768
Mar 24 16:01:55 Tower kernel: XFS (nvme0n1p1): log I/O error -5
Mar 24 16:01:55 Tower kernel: XFS (nvme0n1p1): Filesystem has been shut down due to log error (0x2).
Mar 24 16:01:55 Tower kernel: XFS (nvme0n1p1): Please unmount the filesystem and rectify the problem(s).
Mar 24 16:01:55 Tower kernel: XFS (nvme0n1p1): metadata I/O error in "xfs_imap_to_bp+0x50/0x70 [xfs]" at daddr 0x587d2230 len 32 error 5
Mar 24 16:01:55 Tower kernel: nvme0n1p1: writeback error on inode 555942864, offset 0, sector 507454752
Mar 24 16:01:55 Tower kernel: nvme0n1p1: writeback error on inode 536871051, offset 73728, sector 498262064
Mar 24 16:01:55 Tower kernel: nvme0n1p1: writeback error on inode 556136556, offset 86016, sector 507648792
Mar 24 16:01:55 Tower kernel: nvme0n1p1: writeback error on inode 1076717193, offset 0, sector 979739088
Mar 24 16:01:55 Tower kernel: nvme0n1p1: writeback error on inode 536871051, offset 77824, sector 498262072
Mar 24 16:01:55 Tower kernel: nvme0n1p1: writeback error on inode 1626563962, offset 4128768, sector 1508843144
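For anyone triaging a similar log, here is a minimal Python sketch that scans syslog lines for the three events visible above: block-layer I/O errors, the device capacity changing to 0, and the XFS shutdown. The function name `summarize` and the regular expressions are illustrative assumptions written against the exact message wording in this log; other kernel versions may phrase these messages differently.

```python
import re

# Patterns written against the message wording in the log above (assumption:
# other kernel versions may word these messages differently).
CAPACITY_RE = re.compile(
    r"kernel: (?P<dev>nvme\d+n\d+): detected capacity change "
    r"from (?P<old>\d+) to (?P<new>\d+)"
)
IO_ERROR_RE = re.compile(r"kernel: I/O error, dev (?P<dev>\S+), sector (?P<sector>\d+)")
XFS_SHUTDOWN_RE = re.compile(
    r"kernel: XFS \((?P<part>\S+)\): Filesystem has been shut down"
)


def summarize(lines):
    """Collect drop-related events from an iterable of syslog lines."""
    summary = {"dropped": [], "io_errors": 0, "xfs_shutdown": []}
    for line in lines:
        m = CAPACITY_RE.search(line)
        if m and int(m["new"]) == 0:
            # Capacity going to 0 means the kernel no longer sees the device.
            summary["dropped"].append(m["dev"])
            continue
        if IO_ERROR_RE.search(line):
            summary["io_errors"] += 1
            continue
        m = XFS_SHUTDOWN_RE.search(line)
        if m:
            summary["xfs_shutdown"].append(m["part"])
    return summary


if __name__ == "__main__":
    sample = [
        "Mar 24 16:01:55 Tower kernel: I/O error, dev nvme0n1, sector 516769048 "
        "op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2",
        "Mar 24 16:01:55 Tower kernel: nvme0n1: detected capacity change "
        "from 1953525168 to 0",
        "Mar 24 16:01:55 Tower kernel: XFS (nvme0n1p1): Filesystem has been "
        "shut down due to log error (0x2).",
    ]
    print(summarize(sample))
```

A non-empty `dropped` list is the key signal here: the writeback and XFS errors that follow are a consequence of the device disappearing, not separate faults.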

Any ideas?

Thanks!

tower-diagnostics-20240324-1802.zip

Edited March 24, 2024 by Tzundoku

Solved by JorgeB (March 25, 2024)

March 24, 2024

Community Expert

The NVMe device dropped offline. Power cycle the server (don't just reboot), then post new diagnostics.

March 25, 2024

Author

20 hours ago, JorgeB said:

The NVMe device dropped offline. Power cycle the server (don't just reboot), then post new diagnostics.

Thanks for the prompt reply JorgeB.

I power cycled as directed, but after going through a couple of other posts I also added the following kernel parameters to the syslinux config:

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
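After editing the syslinux config, it can be worth confirming that the running kernel actually booted with these parameters. The sketch below reads `/proc/cmdline` (the standard Linux location for the booted command line) and reports any of the two parameters that are missing or set differently. The names `EXPECTED`, `parse_cmdline`, and `missing_params` are hypothetical helpers for illustration, not part of any Unraid tooling.

```python
from pathlib import Path

# The two parameters added to the syslinux config in this post.
EXPECTED = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, expected=EXPECTED):
    """Return the expected parameters that are absent or have another value."""
    params = parse_cmdline(cmdline)
    return {k: v for k, v in expected.items() if params.get(k) != v}


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    cmdline = Path("/proc/cmdline").read_text()
    missing = missing_params(cmdline)
    if missing:
        print(f"not active: {missing}")
    else:
        print("all parameters active")
```

If a parameter shows up as not active, the edit probably went into a boot entry other than the one actually selected at startup.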

Initially everything worked fine, but while I was watching a movie through Jellyfin the shares went out again.

Diagnostics taken immediately after.

tower-diagnostics-20240325-1615.zip

March 25, 2024

Author

Should I post diagnostics taken immediately after power cycling, or are the ones above adequate?

Thanks

tower-diagnostics-20240325-1636.zip

Edited March 25, 2024 by Tzundoku (reason: diagnostics right after a power cycle)

March 25, 2024

Community Expert

The device dropped offline again. Try a different M.2 slot if one is available; if the issues continue, I would recommend using a different brand/model device.

March 25, 2024

Author

3 hours ago, JorgeB said:

The device dropped offline again. Try a different M.2 slot if one is available; if the issues continue, I would recommend using a different brand/model device.

Any chance the issue could be related to something else, e.g. the motherboard?

I was using a Samsung 870 EVO with months of uptime before replacing it with the drive that is now going offline. About a month ago, the 870 EVO started showing similar I/O errors whenever I started a Windows VM (which uses a vfio-bound SSD that had also been working fine until last month). I assumed the 870 EVO cache drive was at fault, hence the replacement.

March 25, 2024

Community Expert
Solution

23 minutes ago, Tzundoku said:

I.e. the mobo?

It's possible, but since you have another NVMe device, swap slots between them and re-test to see whether the issue follows the device or the slot.
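The reasoning behind the swap test can be written down as a small decision function. This is only an illustrative Python sketch: `interpret_swap` is a hypothetical helper, and the slot labels `M2_1` and `M2_2` are placeholders, since the thread does not name the slots.

```python
def interpret_swap(before, after):
    """Interpret a slot-swap test.

    `before` and `after` are (device, slot) pairs naming which device dropped
    offline and which slot it was in, before and after swapping the drives.
    """
    dev_before, slot_before = before
    dev_after, slot_after = after
    same_device = dev_before == dev_after
    same_slot = slot_before == slot_after

    if same_device and not same_slot:
        # The same drive fails in a different slot.
        return "follows device: suspect the drive"
    if same_slot and not same_device:
        # A different drive fails in the same slot.
        return "follows slot: suspect the slot or board"
    return "inconclusive: repeat the test or look at other causes"


if __name__ == "__main__":
    # Placeholder slot names; "other NVMe" stands for the second drive.
    print(interpret_swap(("990 PRO", "M2_1"), ("990 PRO", "M2_2")))
    print(interpret_swap(("990 PRO", "M2_1"), ("other NVMe", "M2_1")))
```

No drop-offs at all after the swap is also inconclusive on its own, particularly if something else (such as an OS update) changed at the same time.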

4 weeks later...

April 22, 2024

Author

On 3/25/2024 at 8:53 PM, JorgeB said:

It's possible, but since you have another NVMe device, swap slots between them and re-test to see whether the issue follows the device or the slot.

I swapped the drives, and the latest stable release dropped the same day, so I updated as well. No issues so far, so either could be the fix.

Thanks for your time!

2 weeks later...

April 30, 2024

Author

On 3/25/2024 at 8:53 PM, JorgeB said:

It's possible, but since you have another NVMe device, swap slots between them and re-test to see whether the issue follows the device or the slot.

Update: unfortunately, the issue has been back since yesterday.

May 1, 2024

Community Expert

Is the problem with the same device or the same slot?
