December 21, 2023
Hello, a while ago I set up error monitoring on my cache pool, using the steps outlined in this post, because I had problems with Docker not starting. The Docker image was corrupt and needed to be recreated. I did that and things have been fine since.
Unfortunately, I woke up this morning to more errors on my cache pool. I ran a scrub (and then a scrub with "repair corrupted blocks" enabled), but many of the errors were uncorrectable. I also ran btrfs dev stats /mnt/cache twice, a few minutes apart. I tried running Fix Common Problems, but it reported nothing.
Is nvme0n1p1 failing? If so, what steps should I take? I have the two drives running as a mirror. I'm attaching diagnostics and would appreciate any help. Thank you!
noumenon-diagnostics-20231221-0604.zip
Edited December 21, 2023 by rud
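For anyone comparing two runs of btrfs dev stats as described above, here is a minimal shell sketch. It is not from the original post: the helper name nonzero_counters and the /tmp file names are made up for illustration, and it assumes the usual two-column output (counter name, then value).

```shell
#!/bin/bash
# Print only the btrfs device counters whose value is not zero.
# Reads `btrfs dev stats` output on stdin; assumes two columns per
# line: "[device].counter_name" followed by the value.
nonzero_counters() {
    awk 'NF == 2 && $2 != 0'
}

# Usage sketch on the server (file names are placeholders):
#   btrfs dev stats /mnt/cache | nonzero_counters > /tmp/stats_1.txt
#   sleep 300
#   btrfs dev stats /mnt/cache | nonzero_counters > /tmp/stats_2.txt
#   diff /tmp/stats_1.txt /tmp/stats_2.txt
```

If the diff shows counters that grew between the two runs, errors are still being logged. Identical output only means nothing new was counted in that interval.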
December 21, 2023
One of the NVMe devices dropped offline. Power cycle the server (a full power off, not just a reboot) to see if it comes back. If it does, run another scrub.
December 21, 2023 Author
1 hour ago, JorgeB said: One of the NVMe devices dropped offline. Power cycle the server (a full power off, not just a reboot) to see if it comes back. If it does, run another scrub.
Firstly, thank you so much for the quick reply. I power cycled the server and ran another scrub, and this time it was able to correct the errors. I ran one more and it seems good now.
Although it seems fixed, should I be concerned about the NVMe device dropping offline? Is there more I should investigate or be cautious about?
**Update** Unfortunately, while browsing the web UI I noticed that the Docker service had failed to start. I assume my Docker image got corrupted when the drive went offline, so I recreated it. Everything seems okay on *that* end now, but my question about the long-term concern over the NVMe device dropping offline still stands.
Thank you again!
Edited December 21, 2023 by rud (updates)
December 21, 2023
This can help sometimes: on the main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
e.g.:
append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
Reboot and see if it makes a difference. If not, the best bet is to try a different NVMe model.
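After the reboot, it may be worth confirming that both options actually reached the kernel. The following is a minimal sketch, not something from the thread: the function names has_boot_param and check_boot_params are made up, and it relies only on /proc/cmdline, the standard Linux file that holds the running kernel's command line.

```shell
#!/bin/bash
# Succeed if the command line string in $1 contains the exact,
# whitespace-separated parameter given in $2.
has_boot_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check both parameters suggested above. Reads /proc/cmdline by
# default; a different file can be passed as $1 for testing.
# Returns 0 if both are present, 1 if any is missing, 2 on read error.
check_boot_params() {
    local cmdline p status=0
    cmdline=$(cat "${1:-/proc/cmdline}") || return 2
    for p in nvme_core.default_ps_max_latency_us=0 pcie_aspm=off; do
        if has_boot_param "$cmdline" "$p"; then
            echo "$p: present"
        else
            echo "$p: MISSING"
            status=1
        fi
    done
    return "$status"
}

# Usage on the server after rebooting:
#   check_boot_params
```

A "MISSING" line usually means the options were added to a boot entry other than the one that was actually booted. As an extra check, /sys/module/nvme_core/parameters/default_ps_max_latency_us should read 0 once the option is in effect.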
December 21, 2023 Author
29 minutes ago, JorgeB said: This can help sometimes: on the main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
e.g.:
append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
Reboot and see if it makes a difference. If not, the best bet is to try a different NVMe model.
I went ahead and added that. I also went into the BIOS and made sure ASPM was off. Once again, thank you for the help and quick replies!