July 2, 2025

Hello,

This morning I received notifications from this BTRFS monitoring script from JorgeB that I had BTRFS errors on my cache drive. The Main page shows the NVMe device is offline.

nas2-diagnostics-20250702-0815.zip

This is my third time having this issue (previous threads here and here). Every time this happened, it was the NVMe drive with the errors. After the last occurrence, I replaced the drive with a new Samsung 990 Pro, so I think it's unlikely to be a problem with the drive itself.

What's the best course of action for figuring out why this keeps happening and how to fix it?

Edited July 2, 2025 by projectsunset
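For readers unfamiliar with how BTRFS errors get flagged, here is a minimal sketch of one way to detect them: scanning the per-device counters printed by `btrfs device stats` for non-zero values. This is not the monitoring script referenced above. The sample text, the device paths, and the counter values are all hypothetical and are used only to show the parsing.

```python
import re

# Hypothetical sample in the layout printed by `btrfs device stats <mountpoint>`.
# Device paths and counts are made up for illustration.
SAMPLE_STATS = """\
[/dev/nvme0n1p1].write_io_errs    12
[/dev/nvme0n1p1].read_io_errs     3
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  0
[/dev/nvme0n1p1].generation_errs  0
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
"""

# One counter per line: "[device].counter_name   value"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<count>\d+)$")


def failing_devices(stats_text):
    """Map each device with at least one non-zero counter to those counters."""
    failures = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        count = int(match["count"])
        if count > 0:
            failures.setdefault(match["dev"], {})[match["name"]] = count
    return failures


if __name__ == "__main__":
    for device, counters in failing_devices(SAMPLE_STATS).items():
        print(f"{device}: {counters}")
```

In practice the text would come from running the command against the pool's mount point. A device that has dropped offline usually shows up as growing read and write I/O error counts rather than corruption errors.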
July 2, 2025

The device is dropping offline. This appears to be an issue with some board/NVMe/kernel combinations. Was the previous device the same model?
July 2, 2025 Author

Thanks for your help Jorge. The previous NVMe was a Samsung 960 Pro 512GB. The current one is a 990 Pro 2TB.

Some more info...

Rebooted the server and checked the BIOS, and the NVMe drive was missing. Re-seated the NVMe and booted up without issue.

Short and long SMART self tests, no errors.

Samsung_SSD_990_PRO_with_Heatsink_2TB-20250702-0932.txt

Ran a scrub, no uncorrectable errors.

    UUID:             6fa168e8-ff47-4535-bd33-2e3897328848
    Scrub started:    Wed Jul  2 09:18:32 2025
    Status:           finished
    Duration:         0:03:06
    Total to scrub:   179.68GiB
    Rate:             989.14MiB/s
    Error summary:    verify=1964 csum=563248
      Corrected:      565212
      Uncorrectable:  0
      Unverified:     0

Having encountered the same problem with two separate NVMe drives, it seems unlikely that the issue is with the drives themselves. It also seems unlikely that the NVMe wasn't seated properly, as both times the server ran for months without issue before the NVMe dropped offline.
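The scrub summary above can be sanity-checked with a little arithmetic: the detected errors (verify=1964 plus csum=563248) add up to 565212, which matches the Corrected count, and nothing is left uncorrectable. The sketch below parses the posted summary and makes that check. The function names are hypothetical, and treating "detected equals corrected" as the test for a fully repaired scrub is an assumption made for illustration.

```python
import re

# Scrub summary as posted in the thread (values copied verbatim).
SCRUB_OUTPUT = """\
UUID:             6fa168e8-ff47-4535-bd33-2e3897328848
Scrub started:    Wed Jul  2 09:18:32 2025
Status:           finished
Duration:         0:03:06
Total to scrub:   179.68GiB
Rate:             989.14MiB/s
Error summary:    verify=1964 csum=563248
  Corrected:      565212
  Uncorrectable:  0
  Unverified:     0
"""

RESULT_KEYS = ("corrected", "uncorrectable", "unverified")


def parse_scrub_summary(text):
    """Extract the error counters from a scrub status summary."""
    counters = {}
    for line in text.splitlines():
        # Split on the first colon only, so time values stay intact.
        key, _, value = line.partition(":")
        key = key.strip().lower()
        value = value.strip()
        if key == "error summary":
            # Counters appear as name=count pairs, e.g. "verify=1964".
            for name, count in re.findall(r"(\w+)=(\d+)", value):
                counters[name] = int(count)
        elif key in RESULT_KEYS:
            counters[key] = int(value)
    return counters


def all_errors_repaired(counters):
    """Assumed check: every detected error was corrected, none left over."""
    detected = sum(v for k, v in counters.items() if k not in RESULT_KEYS)
    return (
        counters["uncorrectable"] == 0
        and counters["unverified"] == 0
        and detected == counters["corrected"]
    )


if __name__ == "__main__":
    summary = parse_scrub_summary(SCRUB_OUTPUT)
    print(summary)
    print("all errors repaired:", all_errors_repaired(summary))
```

A result like this shows the pool recovering from a device that went away and came back. It does not explain why the device dropped in the first place.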
July 2, 2025

23 minutes ago, projectsunset said:
    Rebooted the server and checked the BIOS and the NVME drive was missing. Re-seated the NVME and booted up without issue.

Typically, when this happens, you need to power cycle the server; just rebooting is not enough.

I would recommend using a different brand of device. If it's easier to use a different board, that may also help. Also look for a BIOS update, which can help in rare cases.
July 2, 2025 Author

52 minutes ago, JorgeB said:
    Typically, when this happens, you need to power cycle the server, just rebooting is not enough. I would recommend using a different brand device, or if it's easier to use a different board, it may also help, and look for a BIOS update, in rare cases it can also help.

I'll try a power cycle next time instead of a reboot or re-seating the NVMe.

Unfortunately, there isn't much selection of available boards that fit the bill for my 9th gen CPU build. The most recent BIOS update was released 4 years ago, so no luck there.

I'm planning on building a new NAS later this year. For now, I'll look into purchasing a different brand of NVMe that I can move into the new server once that's built.

Thanks for your help Jorge.