mikela Posted July 14, 2020

When I started up unRAID, I got the following email message:

unRAID Status: Warning [TOWER] - Cache pool BTRFS missing device(s)
Event: Unraid Cache disk message
Subject: Warning [TOWER] - Cache pool BTRFS missing device(s)
Description: Samsung_SSD_960_PRO_512GB_S3EWNX0J411884E (nvme0n1)
Importance: warning

I ran diagnostics when I first turned on unRAID. I then ran a scrub, which reported a large number (over 1K) of uncorrectable errors, and then ran diagnostics again. Both are attached.

I am using unRAID version 6.8.3 with 2 dockers, binhex-krusader and steefdebruijn/docker-roonserver. I also have a Windows 10 VM.

I am using the following apps: CA Appdata Backup/Restore v2, CA Auto Update, CA Fix Common Problems, Community Applications, Dynamix Local Master, Dynamix SSD Trim, Dynamix System Buttons, Dynamix System Info, Dynamix System Stats, Dynamix System Temp, NerdPack GUI, Preclear Disk, Unassigned Devices & Unassigned Devices Plus (Addon).

Here is my equipment list. Any assistance would be greatly appreciated.

tower-diagnostics-20200714-0831.zip
tower-diagnostics-20200714-0846.zip
JorgeB Posted July 15, 2020

Problems with the NVMe device:

Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 698 QID 3 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 128 QID 4 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 0 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 1 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 2 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 3 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 5 QID 5 timeout, aborting
Jul 14 08:33:13 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, reset controller
Jul 14 08:33:44 Tower kernel: nvme nvme0: I/O 0 QID 0 timeout, reset controller

This can sometimes help: some NVMe devices have issues with power states on Linux. Try this: on the main GUI page click on flash, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right), and add this to your default boot option, after "append":

nvme_core.default_ps_max_latency_us=0

Reboot and see if it makes a difference.
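For anyone checking their own syslog for the same symptom, here is a minimal Python sketch that counts NVMe timeout lines like the ones quoted in this post. The regular expression is only fitted to the lines shown here, and the function name `summarize_timeouts` is made up for illustration; other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Fitted to lines such as:
#   "... kernel: nvme nvme0: I/O 769 QID 1 timeout, aborting"
# This pattern is an assumption based only on the log excerpt in this thread.
TIMEOUT_RE = re.compile(r"nvme (nvme\d+): I/O (\d+) QID (\d+) timeout, (.+)$")


def summarize_timeouts(lines):
    """Count NVMe timeout events per (device, action) pair."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line.strip())
        if match:
            device, _io_tag, _queue_id, action = match.groups()
            counts[(device, action)] += 1
    return counts


# A few of the lines from the diagnostics quoted in this thread.
sample = [
    "Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, aborting",
    "Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 698 QID 3 timeout, aborting",
    "Jul 14 08:33:13 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, reset controller",
    "Jul 14 08:33:44 Tower kernel: nvme nvme0: I/O 0 QID 0 timeout, reset controller",
]

for (device, action), n in sorted(summarize_timeouts(sample).items()):
    print(f"{device}: {n} x {action}")
```

To run it against a real log, pass an open file instead of `sample`. Seeing "reset controller" lines after the "aborting" lines, as in this thread, means the aborts alone did not clear the stalled I/O.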
mikela Posted July 15, 2020 (edited)

Thanks! Hopefully, I added this to the right location.

Should I be concerned with this: "EDAC sbridge: Failed to register device with error -19."? Also wondering if the cache media and data integrity errors are an issue?

tower-diagnostics-20200715-1110.zip
JorgeB Posted July 16, 2020

12 hours ago, mikela said:
> I added this to the right location.

No, it's after append and before initrd, like so: (screenshot not preserved)

12 hours ago, mikela said:
> "EDAC sbridge: Failed to register device with error -19."? Also wondering if the cache media and data integrity errors are an issue?

That should be harmless.
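To make the placement concrete (after append and before initrd), here is a minimal Python sketch that edits a boot stanza as plain text. The stanza in `example` is an assumed default and yours may differ; `add_kernel_param` is a hypothetical helper, not part of Unraid. It only transforms a string, so the result still has to be pasted into "Syslinux Configuration" by hand.

```python
PARAM = "nvme_core.default_ps_max_latency_us=0"


def add_kernel_param(stanza: str, param: str = PARAM) -> str:
    """Return the stanza with `param` placed right after `append`.

    Hypothetical helper for illustration: it edits text only and does not
    touch the flash drive.
    """
    out = []
    for line in stanza.splitlines():
        words = line.split()
        # Only change the append line, and only if the parameter is missing.
        if words and words[0] == "append" and param not in words:
            indent = line[: len(line) - len(line.lstrip())]
            line = indent + " ".join(["append", param] + words[1:])
        out.append(line)
    return "\n".join(out)


# Assumed example of a default boot stanza; check your own configuration.
example = """label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot"""

print(add_kernel_param(example))
```

With the assumed stanza above, the last line becomes `append nvme_core.default_ps_max_latency_us=0 initrd=/bzroot`, which matches the placement described in this post. Running the function a second time leaves the stanza unchanged.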
mikela Posted July 16, 2020 (edited July 17, 2020)

23 hours ago, johnnie.black said:
> No, it's after append and before initrd, like so:

Thanks for the clarification! Here is the diagnostic file. Does this look OK?

tower-diagnostics-20200716-1635.zip

Given my concern about the SSD, I tried to copy files over using the "Replace A Cache Drive" instructions. I was able to copy everything off the server to a separate hard drive, with the exception of my "Windows 10" vdisk (vdisk1.img), which is 75.2 GB. I got this error for that file: (screenshot not preserved)

I'm not sure whether my Windows 10 VM is recoverable at this point, or what my next steps should be. Here is the full diagnostic file.

tower-diagnostics-20200716-1707.zip
JorgeB Posted July 17, 2020

That suggests there could be a real problem with the NVMe device. Try the copy again; if you get the same errors, there is probably not much more you can do. You can try reformatting the device, but if you see any more similar errors it is probably best to replace it.
mikela Posted July 17, 2020 (edited)

19 minutes ago, johnnie.black said:
> That suggests there could be a real problem with the NVMe device, try the copy again, if you get the same errors probably not much more you can do, you can try reformatting the device but any more similar errors probably best to replace it.

Thanks, I was afraid that might be the case. It made me realize that I need to take backing up my cache files more seriously. The irony is that I have Appdata Backup / Restore V2 installed and never used it.
mikela Posted July 30, 2020 (edited)

Okay, so Samsung replaced my 512GB 960 Pro with a 970 Pro. I really want to just start over and use the XFS file system; however, the only options I see are "Auto", "btrfs" and "btrfs - encrypted". What am I doing wrong? It shows 2 cache slots when I stop the array, and it won't allow me to reduce it to just one.
mikela Posted July 31, 2020

OK, I found a check box to delete the saved cache. Once I rebooted, I was able to select just one cache slot, and then xfs appeared as an option. I was able to reformat.