February 21, 2023

I have had my Unraid server running for about 2 months now, and this has happened once before. The solution then was just to restart, and everything was fine. Now that it has happened again, I am more concerned.

    root@UnraidServer:~# btrfs device stats /mnt/cache/
    [/dev/nvme0n1p1].write_io_errs 1
    [/dev/nvme0n1p1].read_io_errs 1046565
    [/dev/nvme0n1p1].flush_io_errs 1
    [/dev/nvme0n1p1].corruption_errs 0
    [/dev/nvme0n1p1].generation_errs 0

The issue shows up as almost all of my containers becoming unresponsive, ceasing to function, and then refusing to start or restart. In the containers' logs, I get errors like these:

    grep: (standard input): I/O error
    /usr/bin/wg-quick: line 50: read: read error: 0: I/O error
    Sonarr failed to start: AppFolder /config is not writable

The NVMe in question is a SK hynix Platinum P41 1TB. It is installed in the M.2_1 slot (per the image from my TUF GAMING Z690-PLUS WIFI D4 manual).

I have attached my system log / diagnostics zip file. As you can see, there is no SMART log for the NVMe in question, so I rebooted the server and then attached it separately. Again, everything is back to normal now that I have rebooted. The syslog is no longer being flooded with

    BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 1, rd 519402, flush 1, corrupt 0, gen 0

like it is in the attached syslog.txt (within the diagnostics zip file), and all my containers are running again without issue.
After rebooting, I ran a scrub on the NVMe drive:

    UUID: 7579a732-bdd9-4dac-b182-f01dbb08f3c7
    Scrub started: Tue Feb 21 08:19:52 2023
    Status: finished
    Duration: 0:00:32
    Total to scrub: 143.52GiB
    Rate: 4.48GiB/s
    Error summary: no errors found

and reran this:

    root@UnraidServer:~# btrfs device stats /mnt/cache/
    [/dev/nvme0n1p1].write_io_errs 0
    [/dev/nvme0n1p1].read_io_errs 0
    [/dev/nvme0n1p1].flush_io_errs 0
    [/dev/nvme0n1p1].corruption_errs 0
    [/dev/nvme0n1p1].generation_errs 0

I feel like this issue is going to pop up again in another week or so. Any ideas on what I can do to resolve this so I don't get these errors anymore?

Thanks

Attachments:
unraidserver-diagnostics-20230221-0720.zip
SHPP41-1000GM_SSB6N82781170710H-20230221-0812.txt

Edited February 21, 2023 by halexh
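Since the counters only show up when someone looks, a small check could be run on a schedule to catch the next occurrence early. The following is a minimal sketch, not something from the thread: the helper name `sum_btrfs_errs` is hypothetical, and it assumes the output format shown above, where the counter is the last field on each line.

```shell
#!/bin/sh
# Hypothetical helper: sum every counter in `btrfs device stats` output
# read from stdin. Assumes the counter is the last field on each line,
# as in the output posted above.
sum_btrfs_errs() {
    awk '{ total += $NF } END { print total + 0 }'
}

# Example usage (mount point taken from the post above):
#   total=$(btrfs device stats /mnt/cache/ | sum_btrfs_errs)
#   [ "$total" -gt 0 ] && echo "btrfs reports $total device errors on /mnt/cache"
```

Fed the first output in this post, the helper would print the sum of the five counters; fed the post-reboot output, it would print 0.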
February 21, 2023

See if this helps: on the main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

    nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

    append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

Reboot and see if it makes a difference. If it doesn't, I would try a different model NVMe device, if that's a possibility.
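After the reboot, it may be worth confirming that the kernel actually booted with both options. A minimal sketch, assuming a standard Linux `/proc/cmdline`; the helper name `has_boot_param` is hypothetical and not part of Unraid:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line ($1) contains
# the exact parameter ($2) as a whole space-separated word.
has_boot_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage after rebooting:
#   cmdline=$(cat /proc/cmdline)
#   has_boot_param "$cmdline" "pcie_aspm=off" || echo "pcie_aspm=off missing"
#   has_boot_param "$cmdline" "nvme_core.default_ps_max_latency_us=0" \
#       || echo "NVMe latency option missing"
```

Matching on whole words avoids a false positive from a longer value such as `pcie_aspm=offx`.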
February 21, 2023 (Author)

6 minutes ago, JorgeB said:

    See if this helps: on the main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":
    nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
    e.g.:
    append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
    Reboot and see if it makes a difference. If it doesn't, I would try a different model NVMe device, if that's a possibility.

So it's going into a power-saving state, causing a write/read failure? Just curious about your thinking as to how that could be the issue.
February 22, 2023

Not sure that is the problem in this case, but it's a known issue with some NVMe devices and Linux, so it's worth trying.