July 14, 2020
When I started up unRAID I got the following email message:

unRAID Status: Warning [TOWER] - Cache pool BTRFS missing device(s)
Event: Unraid Cache disk message
Subject: Warning [TOWER] - Cache pool BTRFS missing device(s)
Description: Samsung_SSD_960_PRO_512GB_S3EWNX0J411884E (nvme0n1)
Importance: warning

I ran diagnostics when I first turned on unRAID. I then ran a scrub, which reported a large number (over 1K) of uncorrectable errors, and then ran diagnostics again. Both are attached.

I am using unRAID version 6.8.3 with 2 dockers, binhex-krusader and steefdebruijn/docker-roonserver. I also have a Windows 10 VM.

I am using the following apps: CA Appdata Backup/Restore v2, CA Auto Update, CA Fix Common Problems, Community Applications, Dynamix Local Master, Dynamix SSD Trim, Dynamix System Buttons, Dynamix System Info, Dynamix System Stats, Dynamix System Temp, NerdPack GUI, Preclear Disk, Unassigned Devices & Unassigned Devices Plus (Addon).

Here is my equipment list. Any assistance would be greatly appreciated.
tower-diagnostics-20200714-0831.zip
tower-diagnostics-20200714-0846.zip
July 15, 2020 Community Expert
Problems with the NVMe device:

Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 698 QID 3 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 128 QID 4 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 0 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 1 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 2 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 3 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 5 QID 5 timeout, aborting
Jul 14 08:33:13 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, reset controller
Jul 14 08:33:44 Tower kernel: nvme nvme0: I/O 0 QID 0 timeout, reset controller

This can sometimes help:

Some NVMe devices have issues with power states on Linux. Try this: on the main GUI page click on flash, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right) and add this to your default boot option, after "append":

nvme_core.default_ps_max_latency_us=0

Reboot and see if it makes a difference.
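For anyone checking their own syslog for the same symptom, here is a minimal Python sketch that counts NVMe timeout messages per device and action. The regular expression is modelled only on the log lines quoted above, and the name summarize_nvme_timeouts is made up for illustration; other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Modelled on lines such as:
#   Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, aborting
# (assumption: other kernels may format these messages differently)
TIMEOUT_RE = re.compile(
    r"kernel: nvme (?P<dev>nvme\d+): I/O (?P<tag>\d+) QID (?P<qid>\d+) "
    r"timeout, (?P<action>.+)$"
)


def summarize_nvme_timeouts(lines):
    """Count NVMe timeout messages, keyed by (device, action)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line.strip())
        if match:
            counts[(match.group("dev"), match.group("action"))] += 1
    return counts


# Three of the lines from the diagnostics quoted above.
sample = [
    "Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, aborting",
    "Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 698 QID 3 timeout, aborting",
    "Jul 14 08:33:13 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, reset controller",
]

summary = summarize_nvme_timeouts(sample)
for (device, action), count in sorted(summary.items()):
    print(f"{device}: {count} x {action}")
```

To run it against a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.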
July 15, 2020 Author
Thanks! Hopefully, I added this to the right location.

Should I be concerned with this: "EDAC sbridge: Failed to register device with error -19."? Also wondering if the cache media and data integrity errors are an issue?
tower-diagnostics-20200715-1110.zip
Edited July 15, 2020 by mikela
July 16, 2020 Community Expert
12 hours ago, mikela said:
I added this to the right location.

No, it's after append and before initrd, like so:

12 hours ago, mikela said:
"EDAC sbridge: Failed to register device with error -19."? Also wondering if the cache media and data integrity errors are an issue?

That should be harmless.
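To illustrate the placement described in this reply (right after "append", before "initrd"), here is a rough Python sketch that patches only the default boot entry of a syslinux configuration. The SAMPLE text is an assumed, stock-looking Unraid syslinux.cfg and not the poster's actual file, and add_param_to_default_entry is a hypothetical helper; editing through the GUI as described above gives the same result.

```python
PARAM = "nvme_core.default_ps_max_latency_us=0"

# Assumed example of a stock-style syslinux.cfg (not taken from the thread).
SAMPLE = """\
default menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
"""


def add_param_to_default_entry(cfg_text, param):
    """Insert `param` right after 'append' in the entry marked 'menu default'."""
    lines = cfg_text.splitlines()
    # Line indices where each boot entry starts, plus an end sentinel.
    starts = [i for i, line in enumerate(lines) if line.strip().startswith("label ")]
    starts.append(len(lines))
    for begin, end in zip(starts, starts[1:]):
        entry = range(begin, end)
        if not any(lines[i].strip() == "menu default" for i in entry):
            continue  # leave non-default entries untouched
        for i in entry:
            words = lines[i].split()
            if words and words[0] == "append" and param not in words:
                indent = lines[i][: len(lines[i]) - len(lines[i].lstrip())]
                lines[i] = indent + " ".join(["append", param] + words[1:])
    return "\n".join(lines)


patched = add_param_to_default_entry(SAMPLE, PARAM)
print(patched)
```

With this sample, the default entry's line becomes "append nvme_core.default_ps_max_latency_us=0 initrd=/bzroot", while the safe mode entry is left alone. Running the function a second time changes nothing, because it skips an append line that already contains the parameter.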
July 16, 2020 Author
23 hours ago, johnnie.black said:
No, it's after append and before initrd, like so:

Thanks for the clarification! Here is the diagnostic file. Does this look OK?
tower-diagnostics-20200716-1635.zip

Given my concern about the SSD, I tried to copy files off it using the "Replace A Cache Drive" instructions. I was able to copy everything off the server to a separate hard drive, with the exception of my Windows 10 VM image (vdisk1.img), which is 75.2 GB. I got this error for that file. I'm not sure whether my Windows 10 VM is recoverable at this point, or what my next steps should be. Here is the full diagnostic file.
tower-diagnostics-20200716-1707.zip
Edited July 17, 2020 by mikela
July 17, 2020 Community Expert
That suggests there could be a real problem with the NVMe device. Try the copy again; if you get the same errors, there is probably not much more you can do. You can try reformatting the device, but if any similar errors appear after that, it is probably best to replace it.
July 17, 2020 Author
19 minutes ago, johnnie.black said:
That suggests there could be a real problem with the NVMe device. Try the copy again; if you get the same errors, there is probably not much more you can do. You can try reformatting the device, but if any similar errors appear after that, it is probably best to replace it.

Thanks, I was afraid that might be the case. It made me realize that I need to take backing up my cache files more seriously. The irony is that I have Appdata Backup/Restore v2 installed and never used it.
Edited July 17, 2020 by mikela
July 30, 2020 Author
Okay, so Samsung replaced my 512GB 960 Pro with a 970 Pro. I really want to just start over and use the XFS file system; however, the only options I see are "Auto", "btrfs" and "btrfs - encrypted". What am I doing wrong? It shows 2 cache slots when I stop the array and it won't allow me to reduce it to just one.
Edited July 30, 2020 by mikela
July 31, 2020 Author
OK, I found a check box to delete the saved cache configuration. Once I rebooted, I was able to select just one cache slot, and then xfs appeared as an option. I was able to reformat.