AshleyS Posted January 1, 2023

Happy new year! I have tried looking up these errors and am not sure exactly what steps to take, so I have turned to the forums.

Setup

I am running Unraid version 6.11.5.
I am using Community Applications, Dynamix File Manager, My Servers and User Scripts.
Docker: Cloudflare-DDNS, mariadb, nextcloud, plex, swag and vaultwarden.
I've got two Sabrent Rocket 4.0 1TB drives running in RAID 1 as cache.
3x 18TB Seagate Exos HDDs in dual parity (I plan to add more in the future).

The Problem

Last month I was listening to music on plexamp and it suddenly stopped. I checked the server and it said cache drive nvme0n1 was missing. Thoughtlessly, I decided to reboot the system (through the Unraid option). Once it had booted back up the drive appeared again, and that's when I decided to get a second nvme and run them in RAID 1. After rebooting again the drive showed up as missing; after a few more reboots (I realise this probably isn't too smart) the drive showed up again. This is still the case on reboots now: when the server starts back up, sometimes the nvme is missing and sometimes it's not.

I checked my syslog and it keeps showing errors on my cache drive (see image).

I searched for this error and found a post saying to run this command (btrfs dev stats /mnt/cache), so I did (see image). The post said all of the counters in that output are meant to show 0.

Plan

From my understanding this is a problem with the cache drive itself, and I am planning on replacing the nvme. Is this the correct thing to do, or is there something I can do in software?

Thank you for your time - Ashley

pacific-diagnostics-20230101-1312.zip

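For anyone repeating the check described in the post above, here is a minimal sketch that filters the `btrfs dev stats` output down to the counters that are not zero. It assumes the usual one-counter-per-line output of the form `[device].counter  value`; the helper name `nonzero_btrfs_stats` is made up for this example and is not part of btrfs-progs or Unraid.

```shell
#!/bin/bash
# Hypothetical helper: print only the btrfs error counters that are not zero.
# Reads `btrfs dev stats` output on stdin, one "[device].counter  value" per line.
nonzero_btrfs_stats() {
    # $NF is the last field (the counter value); $1 is "[device].counter".
    awk '$NF ~ /^[0-9]+$/ && $NF != 0 { print $1 " = " $NF }'
}

# Usage on the server (run as root; /mnt/cache is the pool from the post above):
#   btrfs dev stats /mnt/cache | nonzero_btrfs_stats
```

An empty result means every counter reads 0, which is the healthy state the post refers to. Any line printed names the device and the counter that has recorded errors.
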
JorgeB Posted January 2, 2023
Swap the M.2 slots between the devices and see if the issue follows the device. You should also run a scrub once both are online.

AshleyS Posted January 2, 2023 (Author)
Thank you so much for the reply! I have just run a scrub (before swapping the slots) and this is the result. Doesn't look good. Is it still safe to swap the slots over?

JorgeB Posted January 2, 2023
Was this a correcting scrub? If yes, run another to confirm all errors are fixed.

AshleyS Posted January 2, 2023 (Author)
It was not a correcting scrub. I checked "Repair corrupted blocks" and this was the result 😬

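The two scrub modes discussed above can also be started from a terminal. This is a minimal sketch, assuming the pool is mounted at /mnt/cache as in the first post, and assuming that the GUI's "Repair corrupted blocks" checkbox corresponds to leaving out the read-only flag. `scrub_pool` is a hypothetical helper name, not an Unraid or btrfs command.

```shell
#!/bin/bash
# Hypothetical wrapper around `btrfs scrub start`.
#   check  -> read-only scrub (-r): reports errors but does not try to fix them
#   repair -> correcting scrub: rewrites bad blocks from a good copy when one exists
# -B keeps the scrub in the foreground and prints statistics when it finishes.
scrub_pool() {
    local mode="$1"
    local pool="${2:-/mnt/cache}"   # assumed mount point, taken from the first post
    case "$mode" in
        check)  btrfs scrub start -B -r "$pool" ;;
        repair) btrfs scrub start -B "$pool" ;;
        *)      echo "usage: scrub_pool check|repair [pool]" >&2; return 2 ;;
    esac
}

# Usage (as root):
#   scrub_pool check            # non-correcting pass
#   scrub_pool repair           # correcting pass; run it again to confirm the errors are gone
```

If a correcting pass still reports uncorrectable errors, no good copy of those blocks exists on either device, and a scrub cannot repair them.
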
JorgeB Posted January 2, 2023
Since the errors are uncorrectable, you'll need to nuke the pool. You can try to back up what you can first.

AshleyS Posted January 2, 2023 (Author)
Alright, will do! Just one more thing: is this a btrfs issue, an nvme issue, or me being silly? From what I've seen, people with similar issues have swapped to xfs as a "solution".

Thank you for your help. I checked your profile and the number of people you help is crazy! Thanks again 😄

JorgeB Posted January 2, 2023 (Solution)
The problem was caused by the device dropping offline. Because of that, the second device likely never fully synced, so the pool is not able to repair the errors. btrfs won't make devices drop offline; that's a hardware issue, either with the device itself or a board problem/compatibility issue.

semioniy Posted September 28, 2023

Hi. I wanted to give a heads-up to anyone experiencing this problem from time to time.

Problem: My cache pool consists of 3 SSDs, 2 of which are fairly old (they lived through 2 laptops). One of the SSDs kept getting disconnected in the middle of pool operation, and errors would accumulate until I rebooted the system and ran a scrub.

Suggested solution / investigation: Some forum posts suggested that the issue lies in the SSDs themselves, the cables, or even the backplane of the motherboard. I swapped cables and tried 2 different PCIe-to-SATA adapters; nothing helped.

Actual solution (in my specific case): TLDR - I bought and installed a UPS for the server. All the errors stopped appearing.

Longer version - I noticed that my 3D printer sometimes had a horizontal shift in the prints in one of the directions. That's apparently a sign of a short power outage (possibly due to voltage fluctuations): not so long that the printer would stop and force me to restart the print, but long enough for it to lose its position. I decided that losing my data because of a voltage spike would defeat the purpose of a NAS, so I bought a UPS (the first one I found that satisfied my wattage needs - APC Back UPS BX - BX950MI-GR - 950 VA). The errors haven't appeared since. As an added benefit, it has a battery, and Unraid can shut down safely in the event of a power outage if you connect the UPS to the server via USB.

semioniy Posted November 26, 2023
The issue of cache drives dropping offline has returned despite the UPS. The UPS added some stability for sure but, apparently, didn't solve the problem altogether.

JorgeB Posted November 27, 2023
This sometimes helps: on the main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

Reboot and see if it makes a difference.

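After rebooting, it can be worth confirming that the two boot parameters above actually reached the kernel. This is a minimal sketch that checks a kernel command line string for them; `cmdline_has` is a hypothetical helper written for this example. It only tells you the parameters were passed at boot, not that they cure the drops.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every given parameter appears as a
# whole, space-separated word in the supplied kernel command line.
cmdline_has() {
    local cmdline="$1"
    shift
    local p
    for p in "$@"; do
        case " $cmdline " in
            *" $p "*) ;;                           # parameter present, keep checking
            *) echo "missing: $p"; return 1 ;;     # report the first one not found
        esac
    done
}

# Usage after rebooting (the running kernel's command line is in /proc/cmdline):
#   cmdline_has "$(cat /proc/cmdline)" \
#       nvme_core.default_ps_max_latency_us=0 pcie_aspm=off && echo "both set"
```

If a parameter is reported missing, the edit was probably made to a boot entry other than the default one.
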
semioniy Posted November 27, 2023
Thanx, but my SSDs are SATA, not nvme. I'll add it regardless; maybe I'll install an nvme drive in the future 😁

JorgeB Posted November 27, 2023
2 hours ago, semioniy said: "Thanx, but my SSDs are SATA"

Then it won't help. This can happen when you reply in someone else's thread; we usually recommend that you always start your own.