November 3, 2024

I am having a cache drive issue for the second time. I have two Samsung 990 Pro NVMe SSDs in a cache pool. The first time, everything worked for a few days, and then the cache pool reported that it had entered read-only mode. I'm new to Unraid and had been messing around with a lot of settings, so I figured I had broken something and started from scratch. The second time through, I got everything set up, and it took less than a day for the errors to come back. Following some posts online, I ran btrfs dev stats /mnt/cache and saw a lot of errors:

[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 0
[/dev/nvme0n1p1].generation_errs 0
[/dev/nvme1n1p1].write_io_errs 4699665
[/dev/nvme1n1p1].read_io_errs 70894
[/dev/nvme1n1p1].flush_io_errs 53487
[/dev/nvme1n1p1].corruption_errs 0
[/dev/nvme1n1p1].generation_errs 0

I tried scrubbing the pool, but it reports that all of the errors are uncorrectable. That makes me think it's a drive or port issue, but I wanted to ask for advice, and whether there is anything I can do to check or repair things before opening up the case.

If I just move the drive to a new port, will it automatically resync with the one that is still working? Do I need to move everything to the array first? I'm also hoping to get the right order of operations (I thought I read somewhere that with cache pool errors you don't want to power off the machine).

Thanks for any help.

bearcave-diagnostics-20241102-2350.zip
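Because the counters are reported per device, one quick way to read output like the above is to group it by device and flag any nonzero counter. The sketch below assumes the "[device].counter value" line format shown in the post; parse_dev_stats and failing_devices are hypothetical helper names, and it assumes a Python 3 interpreter is available (stock Unraid may not ship one, so you could also paste the output and run this on another machine).

```python
import re
from collections import defaultdict

# Matches lines such as "[/dev/nvme1n1p1].write_io_errs 4699665"
# (format assumed from the btrfs dev stats output quoted above).
LINE_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")

# Counters copied from the post above.
SAMPLE = """\
[/dev/nvme0n1p1].write_io_errs 0
[/dev/nvme0n1p1].read_io_errs 0
[/dev/nvme0n1p1].flush_io_errs 0
[/dev/nvme0n1p1].corruption_errs 0
[/dev/nvme0n1p1].generation_errs 0
[/dev/nvme1n1p1].write_io_errs 4699665
[/dev/nvme1n1p1].read_io_errs 70894
[/dev/nvme1n1p1].flush_io_errs 53487
[/dev/nvme1n1p1].corruption_errs 0
[/dev/nvme1n1p1].generation_errs 0
"""


def parse_dev_stats(text: str) -> dict[str, dict[str, int]]:
    """Group counters by device; lines that don't match the format are skipped."""
    stats: dict[str, dict[str, int]] = defaultdict(dict)
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            stats[m["dev"]][m["counter"]] = int(m["value"])
    return dict(stats)


def failing_devices(stats: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Keep only devices that have at least one nonzero counter."""
    result = {}
    for dev, counters in stats.items():
        nonzero = {name: val for name, val in counters.items() if val}
        if nonzero:
            result[dev] = nonzero
    return result


if __name__ == "__main__":
    for dev, counters in failing_devices(parse_dev_stats(SAMPLE)).items():
        print(dev, counters)
```

With the counters from the post, only /dev/nvme1n1p1 is reported, with nonzero write, read, and flush error counts, which points at one device (or its slot) rather than the pool as a whole.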
November 3, 2024 (Community Expert, accepted solution)

The first thing to resolve is the NVMe device dropping offline. Some NVMe devices have issues with power states on Linux, so try this: on the Main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off

Reboot, run a correcting scrub, and post the results.
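After rebooting, it can be worth confirming that the new parameters actually reached the kernel, since a typo or an edit to the wrong boot entry would silently leave the old behavior in place. On Linux, the running kernel's boot parameters are exposed in /proc/cmdline (cat /proc/cmdline shows the same information). The sketch below is a hypothetical helper: the names missing_params and build_append_line are illustrative, not part of Unraid, and it assumes Python 3 is available on the server, which stock Unraid may not provide.

```python
from pathlib import Path

# Kernel parameters suggested above for NVMe power-state problems.
REQUIRED_PARAMS = (
    "nvme_core.default_ps_max_latency_us=0",
    "pcie_aspm=off",
    "pcie_port_pm=off",
)


def missing_params(cmdline: str, required=REQUIRED_PARAMS) -> list[str]:
    """Return the required parameters that are not whole tokens of cmdline."""
    tokens = set(cmdline.split())
    return [p for p in required if p not in tokens]


def build_append_line(base: str = "append initrd=/bzroot", extra=REQUIRED_PARAMS) -> str:
    """Add any of the extra parameters that the boot line does not already contain."""
    present = set(base.split())
    return " ".join([base, *[p for p in extra if p not in present]])


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        missing = missing_params(cmdline_path.read_text())
        print("all parameters active" if not missing else f"missing: {missing}")
    print(build_append_line())
```

Checking whole tokens rather than substrings avoids a false match such as pcie_aspm=off being "found" inside a longer, different parameter.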
November 3, 2024 (Author)

I hadn't done that initially because I thought my device was still connected, so I assumed it wouldn't apply. Thank you for suggesting it. After a reboot, the scrub was able to correct all of the errors. I cleared the stats and there are no new errors yet. It's still early, but I'm hopeful that everything stays well.