December 24, 2025

Good day,

I have a problem with my cache pool. I am using two brand-new 2TB SSDs with the BTRFS filesystem in RAID1 mode, but every couple of days or weeks one or the other SSD starts showing write errors. Since the SMART values are always OK, I get no notification from Unraid that something is wrong. A few days later, the failing drive seems to be completely disconnected: temperature and power are no longer reported, and there is no read/write activity on that drive. Still, Unraid reports everything as OK. The nightly scrub shows unrecoverable errors, but Unraid gives me no notification about that either, which is very annoying.

The weird part is that it's not always the same SSD; sometimes it's the first one, sometimes the second one!

To recover, I normally have to restart the server, which brings both SSDs online again. However, scrubbing the filesystem then shows a lot of unrecoverable errors on the previously failed drive. So I have to remove the failing SSD from the pool, save, and then re-add the SSD to force a complete re-sync of all the data from the healthy drive, which always succeeds without any errors.

Both SSDs then keep working without errors for some days or weeks, until one of them shows write errors again and the whole story starts over.

I'm running out of ideas about what to do. Can anyone give me a hint, or has anyone had similar issues? Should I re-create the entire pool with a different filesystem? Are there any BIOS settings or boot options that might be relevant? Auto-Trim is enabled and Spin-Down is disabled.

Thanks in advance for any help.
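Since the dashboard stays green while the pool is already logging errors, one workaround is to poll the BTRFS error counters directly. Below is a minimal Python sketch of that idea, not a tested Unraid plugin. It assumes the pool is mounted at `/mnt/cache` (adjust to your pool name) and that `btrfs device stats` prints lines of the form `[/dev/...].write_io_errs   0`. The function names are made up for illustration.

```python
import re
import subprocess

# Matches lines such as "[/dev/nvme0n1p1].write_io_errs    0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Turn `btrfs device stats` output into {device: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            counters = stats.setdefault(match["dev"], {})
            counters[match["counter"]] = int(match["value"])
    return stats


def failing_devices(stats):
    """Keep only devices that have at least one non-zero error counter."""
    return {
        dev: {name: value for name, value in counters.items() if value}
        for dev, counters in stats.items()
        if any(counters.values())
    }


def check_pool(mount_point="/mnt/cache"):
    """Run `btrfs device stats` on the pool (mount point is an assumption)."""
    result = subprocess.run(
        ["btrfs", "device", "stats", mount_point],
        capture_output=True, text=True, check=True,
    )
    return failing_devices(parse_device_stats(result.stdout))
```

Calling `check_pool()` on a schedule and sending a notification whenever it returns a non-empty dict would at least flag the write errors before the drive drops out completely. How the notification is sent is left open here.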
December 28, 2025 Author

I have the issue with two brand-new NVMe SSDs. After upgrading to 7.2.3 the issues got worse: devices now go offline within 2 hours. It seems there is a problem with the kernel/board/NVMe combination.

What I find most frustrating is that Unraid doesn't even see a problem and reports all drives in the pool as "OK" until the server crashes. Even read/write errors are only shown if you go into the pool settings, not on the dashboard or in notifications.

It seems the issues are fixed by disabling the NVMe low-power states, which I did by appending "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" to the boot parameters.

Edited December 28, 2025 by phoenixtech
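For anyone applying the same workaround, it is worth confirming after a reboot that the running kernel actually picked up both parameters. The following is a small Python sketch of such a check, under the assumption that the active kernel command line is readable from `/proc/cmdline` as on a standard Linux system. The function names are hypothetical.

```python
from pathlib import Path

# The two parameters quoted in the post above.
REQUIRED = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")


def missing_boot_params(cmdline, required=REQUIRED):
    """Return the required parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


def check_running_kernel(path="/proc/cmdline"):
    """Check the command line of the currently running kernel."""
    return missing_boot_params(Path(path).read_text())
```

An empty list from `check_running_kernel()` means both parameters are active. Anything else lists what is still missing, for example because the boot entry that was edited is not the one the server actually boots from.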