unRAID Status: Unable to write to cache



When I started up unRAID, I received the following email notification: unRAID Status: Warning [TOWER] - Cache pool BTRFS missing device(s)

Event: Unraid Cache disk message
Subject: Warning [TOWER] - Cache pool BTRFS missing device(s)
Description: Samsung_SSD_960_PRO_512GB_S3EWNX0J411884E (nvme0n1)
Importance: warning

I ran diagnostics when I first turned on unRAID.

I then ran a scrub, and it reported a large number (over 1,000) of uncorrectable errors.

I then ran diagnostics again.

Both are attached.

I am running unRAID version 6.8.3 with two Docker containers, binhex-krusader and steefdebruijn/docker-roonserver. I also have a Windows 10 VM.

I am using the following apps: CA Appdata Backup/Restore v2, CA Auto Update, CA Fix Common Problems, Community Applications, Dynamix Local Master, Dynamix SSD Trim, Dynamix System Buttons, Dynamix System Info, Dynamix System Stats, Dynamix System Temp, NerdPack GUI, Preclear Disk, Unassigned Devices & Unassigned Devices Plus (Addon).

Here is my equipment list.

Any assistance would be greatly appreciated.

tower-diagnostics-20200714-0831.zip, tower-diagnostics-20200714-0846.zip

Problems with the NVMe device:

Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 698 QID 3 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 128 QID 4 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 0 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 1 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 2 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 3 QID 5 timeout, aborting
Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 5 QID 5 timeout, aborting
Jul 14 08:33:13 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, reset controller
Jul 14 08:33:44 Tower kernel: nvme nvme0: I/O 0 QID 0 timeout, reset controller
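To check whether a later syslog still shows these errors (for example after the boot change suggested below), the timeout lines can be tallied. This is a minimal Python sketch, assuming syslog lines formatted exactly like the excerpt above; `summarize_nvme_timeouts` is a hypothetical helper name, and the pattern only recognizes the two actions that appear here ("aborting" and "reset controller").

```python
import re
import sys
from collections import Counter

# Matches kernel lines such as:
# "Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, aborting"
NVME_TIMEOUT = re.compile(
    r"kernel: nvme (?P<dev>nvme\d+): I/O (?P<io>\d+) QID (?P<qid>\d+) "
    r"timeout, (?P<action>aborting|reset controller)"
)


def summarize_nvme_timeouts(lines):
    """Count NVMe timeout events per (device, action) pair; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = NVME_TIMEOUT.search(line)
        if match:
            counts[(match.group("dev"), match.group("action"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python nvme_timeouts.py <syslog saved as plain text>
    with open(sys.argv[1], errors="replace") as log:
        for (dev, action), n in sorted(summarize_nvme_timeouts(log).items()):
            print(f"{dev}: {n} x '{action}'")
```

Run against the ten lines quoted above, this would report 8 "aborting" events and 2 "reset controller" events for nvme0; a clean log after the change would report nothing.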

This can sometimes help:

Some NVMe devices have issues with power states on Linux. Try this: on the Main page of the GUI, click on the flash device, scroll down to "Syslinux Configuration", make sure it's set to "Menu View" (at the top right), and add this to your default boot option, after "append":

nvme_core.default_ps_max_latency_us=0

Reboot and see if it makes a difference.
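The placement matters: the parameter goes directly after the `append` keyword, ahead of the existing arguments such as `initrd=` (as clarified further down in the thread). The following minimal Python sketch performs that same edit on a single `append` line. It assumes a stock-looking entry such as `append initrd=/bzroot` (an assumption, not copied from this thread's screenshot), and `add_kernel_param` is a hypothetical helper; in practice you make the edit in the GUI as described above.

```python
def add_kernel_param(append_line: str, param: str) -> str:
    """Return append_line with param inserted directly after the 'append' keyword.

    append_line is one line without its trailing newline. Existing arguments
    (e.g. initrd=/bzroot) keep their order after the new parameter. If param is
    already present, the line is returned unchanged.
    """
    stripped = append_line.lstrip()
    indent = append_line[: len(append_line) - len(stripped)]  # preserve leading spaces
    parts = stripped.split(None, 1)
    if not parts or parts[0] != "append":
        raise ValueError("expected a line starting with 'append'")
    args = parts[1].split() if len(parts) > 1 else []
    if param in args:
        return append_line  # already added; avoid duplicating it
    return indent + " ".join(["append", param] + args)


if __name__ == "__main__":
    print(add_kernel_param("  append initrd=/bzroot",
                           "nvme_core.default_ps_max_latency_us=0"))
    # prints "  append nvme_core.default_ps_max_latency_us=0 initrd=/bzroot"
```

Making the helper a no-op when the parameter is already present means running it twice cannot produce a duplicated argument.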

12 hours ago, mikela said:

I added this to the right location.

No, it's after append and before initrd, like so:

[Screenshot of the Syslinux configuration]

12 hours ago, mikela said:

"EDAC sbridge: Failed to register device with error -19."? I'm also wondering whether the cache media and data integrity errors are an issue.

That should be harmless.

23 hours ago, johnnie.black said:

No, it's after append and before initrd, like so:

Thanks for the clarification!

[Screenshot: Revised add.png]

Here is the diagnostic file.  Does this look OK?

tower-diagnostics-20200716-1635.zip

Given my concerns about the SSD, I tried to copy the files off using the "Replace A Cache Drive" instructions. I was able to copy everything off the server to a separate hard drive except my "Windows 10" vdisk (vdisk1.img), which is 75.2 GB. I got this error for that file:

[Screenshot: Windows 10 move failure 1.png]

I'm not sure whether my Windows 10 VM is recoverable at this point, or what my next steps should be. Here is the full diagnostic file.

tower-diagnostics-20200716-1707.zip

Edited by mikela

19 minutes ago, johnnie.black said:

That suggests there could be a real problem with the NVMe device. Try the copy again; if you get the same errors, there's probably not much more you can do. You can try reformatting the device, but if you get any more similar errors, it's probably best to replace it.

Thanks, I was afraid that might be the case. It made me realize that I need to take backing up my cache files more seriously. The irony is that I have CA Appdata Backup/Restore v2 installed and never used it.
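When retrying a copy like the failed vdisk1.img transfer, it can help to see where reads fail instead of having the whole copy abort on the first error. The idea is the same as `dd conv=noerror,sync`: copy in fixed-size chunks, and when a chunk cannot be read, zero-fill it, note its offset, and keep going. This is a minimal Python sketch, assuming a Linux host where unreadable data surfaces as `OSError` (typically EIO) and assuming the failure in the screenshot is a read error; `copy_with_error_report` and the 1 MiB chunk size are hypothetical choices. Any zero-filled range means the copy is not a faithful image, so treat it as a last-resort salvage, not a backup.

```python
import os
import sys


def copy_with_error_report(src_path, dst_path, chunk=1024 * 1024):
    """Copy src_path to dst_path in fixed-size chunks.

    A chunk that raises OSError on read is replaced with zeros and its
    (offset, length) is recorded, so the copy continues past bad areas.
    Returns the list of unreadable ranges (empty if everything was read).
    """
    bad_ranges = []
    size = os.path.getsize(src_path)
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                length = min(chunk, size - offset)
                try:
                    # pread reads at an explicit offset, so a failed read does
                    # not leave the file position in an unknown state.
                    data = os.pread(src_fd, length, offset)
                except OSError:
                    data = b"\0" * length
                    bad_ranges.append((offset, length))
                if not data:
                    break  # source ended earlier than its reported size
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src_fd)
    return bad_ranges


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python salvage_copy.py <source file> <destination file>
    for off, n in copy_with_error_report(sys.argv[1], sys.argv[2]):
        print(f"unreadable: {n} bytes at offset {off}")
```

If repeated runs keep failing in the same ranges, that fits the advice above that the device probably needs replacing.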

Edited by mikela

2 weeks later...

Okay, so Samsung replaced my 512GB 960 Pro with a 970 Pro. I really want to start over and use the XFS file system; however, the only options I see are "Auto", "btrfs" and "btrfs - encrypted". What am I doing wrong? It shows 2 cache slots when I stop the array, and it won't let me reduce it to one.

Edited by mikela