NVME Drive with I/O Errors

February 21, 2023

I have had my Unraid server running for about 2 months now, and this has happened once before. The fix then was simply to restart, and everything was fine. Now that it has happened again, I am more concerned.

root@UnraidServer:~# btrfs device stats /mnt/cache/
[/dev/nvme0n1p1].write_io_errs    1
[/dev/nvme0n1p1].read_io_errs     1046565
[/dev/nvme0n1p1].flush_io_errs    1
[/dev/nvme0n1p1].corruption_errs  0
[/dev/nvme0n1p1].generation_errs  0

When this happens, almost all of my containers become unresponsive, stop functioning, and then will not start or restart. In the containers' logs, I get errors like these:

grep: (standard input): I/O error
/usr/bin/wg-quick: line 50: read: read error: 0: I/O error
Sonarr failed to start: AppFolder /config is not writable

The NVMe in question is an SK hynix Platinum P41 1TB. It is installed in the M.2_1 slot, as seen in the image below (taken from my TUF GAMING Z690-PLUS WIFI D4 manual):

I have attached my system log / diagnostics zip file. As you can see, it contains no SMART log for the NVMe in question, so I rebooted the server and then attached the SMART report separately.

Again, everything is back to normal now that I have rebooted. The syslog.txt is no longer being flooded with

BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 1, rd 519402, flush 1, corrupt 0, gen 0

like it is in the attached syslog.txt (within the diagnostics zip file), and all my containers are running again without issue.

After rebooting, I did a scrub on the nvme drive:

UUID:             7579a732-bdd9-4dac-b182-f01dbb08f3c7
Scrub started:    Tue Feb 21 08:19:52 2023
Status:           finished
Duration:         0:00:32
Total to scrub:   143.52GiB
Rate:             4.48GiB/s
Error summary:    no errors found

and reran this:

root@UnraidServer:~# btrfs device stats /mnt/cache/
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  0
[/dev/nvme0n1p1].generation_errs  0

I feel like in another week or so, this issue is going to pop up again. Any ideas on what I can do to resolve this so I don't get these errors anymore? Thanks
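To catch a recurrence before the containers fall over, the counters shown above could be checked periodically. The following is only a minimal sketch: the function name `sum_btrfs_errors` is made up for illustration, and it assumes the `btrfs device stats` output keeps the same "name, then count" layout as in this post.

```shell
#!/bin/bash
# Sum every counter in `btrfs device stats` output read from stdin.
# Each input line is expected to look like:
#   [/dev/nvme0n1p1].read_io_errs     1046565
# sum_btrfs_errors is a hypothetical helper name, not part of btrfs-progs.
sum_btrfs_errors() {
    # The count is the last whitespace-separated field on each line.
    awk '{ total += $NF } END { print total + 0 }'
}

# Example use on the server (run as root; /mnt/cache is the pool from this post):
#   total=$(btrfs device stats /mnt/cache | sum_btrfs_errors)
#   [ "$total" -gt 0 ] && echo "btrfs reports $total device errors on /mnt/cache"
```

A result of 0 means all five counters are clean. Recent versions of btrfs-progs also document a `--check` option for `btrfs device stats` that returns a non-zero exit status when any counter is non-zero, which may be simpler if it is available on your release.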

unraidserver-diagnostics-20230221-0720.zip SHPP41-1000GM_SSB6N82781170710H-20230221-0812.txt

Edited February 21, 2023 by halexh

February 21, 2023

Community Expert

See if this helps: on the main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

Reboot and see if it makes a difference; if it doesn't, I would try a different model NVMe device, if that's a possibility.
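After the reboot, it may be worth confirming that the kernel actually picked up both parameters. This is a rough sketch rather than an official Unraid procedure: `has_boot_param` is a hypothetical helper name, and it simply looks for an exact match among the space-separated words of the kernel command line.

```shell
#!/bin/bash
# Succeed if the kernel command line read from stdin contains the exact
# parameter given as $1. has_boot_param is a made-up helper name.
has_boot_param() {
    # Put each space-separated word on its own line, then match whole lines only.
    tr -s ' ' '\n' | grep -qxF -- "$1"
}

# Example use on the running server:
#   has_boot_param pcie_aspm=off < /proc/cmdline && echo "ASPM disabled at boot"
#   has_boot_param nvme_core.default_ps_max_latency_us=0 < /proc/cmdline \
#       && echo "NVMe power state latency limit set to 0"
#
# The value the driver is using can also be read directly:
#   cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
```

If either check fails, the edit probably went into a boot entry other than the default one.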

February 21, 2023

Author

JorgeB said:
See if this helps [...] add this to your default boot option [...] nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

So it's going into a power-saving state, which causes the read/write failures? Just curious about your thinking on how that could be the issue.

February 22, 2023

Community Expert

I'm not sure that is the problem in this case, but it's a known issue with some NVMe devices and Linux, so it's worth trying.
