
Suddenly unable to write to cache



Hi all, I've been running UnRAID for about a year now. My current uptime is 25 days. Last night around 10pm MT (90 minutes after my parity check started), all of my dockers suddenly stopped working. Below are all of the funky things going on in the server, which I'm guessing are due to the Cache drive not being accessible (or something similarly wrong with the Cache drive).

 

--

I checked out Fix Common Problems and I have these two issues:

 

Error Found: Unable to write to cache. Suggested Fix: Drive mounted read-only or completely full

Error Found: Unable to write to Docker Image Suggested Fix: Docker Image either full or corrupted.

 

--

On my dashboard, I see my Cache as available, SMART shows the healthy thumbs up, and the utilization is 44% [453GB / 1TB] (which is normal as I have Plex writing thumbnails to the Cache). So nothing wrong here.

 

--

When clicking on my Cache drive and scrolling to the Self-Test section, I can download the SMART report (attached) but the SMART self-test history, error logs, short self-test, and extended self-test buttons are grayed out. The "Last SMART test result" shows "SPIN UP" with a note reading "Unavailable - disk must be spun up." After clicking it, it takes me back to the top of the page and nothing further seems to happen as the SPIN UP is still there.

 

--

When I go into the web terminal and type 

ls /mnt

I see 

/bin/ls: cannot access '/mnt/user': Input/output error
/bin/ls: cannot access '/mnt/cache': Input/output error

and my other disks are available like normal.

 

--

In my settings --> Docker, I have just disabled the docker so the status is stopped. I see the following:

 

Docker vDisk location: /mnt/user/system/docker/docker.img (orange triangle with a white exclamation mark) Path does not exist

Default appdata storage location: /mnt/user/appdata/ (orange triangle with a white exclamation mark) Path does not exist

 

When attempting to re-enable the docker, the docker-tab in the top bar pops back up but the message shows "Docker Service failed to start."

 

--

When I click on the Shares tab at the top, I see the following:

 

There are no exportable user shares

There are no exportable disk shares

 

--

When I click Main at the top and then view each of the individual array devices' contents, I see that all of my array disks still have their folders and data. However, the Cache disk shows 0 objects: 0 directories, 0 files

 

--

The parity check is still running and is about 92% done. It has found 0 errors so far.

 

--

Other information

The cache drive is a 1 TB M.2 NVME

The file system type for the Cache drive is xfs

 

--

I believe that's just about everything I've looked at. I haven't rebooted the server since it is going through the parity check right now. I haven't deleted / recreated my docker image since the problem seems to be more related to the Cache drive (even though the docker problems are what tipped me off to it). I've attached the diagnostics as well as the SMART report for the cache drive.

 

Thanks in advance for the help. I'm not really sure where to start with correcting this problem and any help / advice / pointers are appreciated!

tower-diagnostics-20200902-1017.zip tower-smart-20200902-1033.zip

Edited by dapiedude

Cache was not giving SMART in those diagnostics, which suggests it was disconnected. You do have some things about your docker configuration that probably should be reconsidered, but I can't know the full extent of that since user shares are broken without cache.

 

Since the parity check is nearly complete, let it finish, then shut down, reconnect cache, reboot, and post new diagnostics.


NVMe dropped offline:

 

Sep  1 22:15:36 Tower kernel: nvme nvme0: I/O 92 QID 1 timeout, aborting
Sep  1 22:15:36 Tower kernel: nvme nvme0: I/O 224 QID 2 timeout, aborting
Sep  1 22:15:36 Tower kernel: nvme nvme0: I/O 116 QID 3 timeout, aborting
Sep  1 22:15:36 Tower kernel: nvme nvme0: I/O 90 QID 6 timeout, aborting
Sep  1 22:16:06 Tower kernel: nvme nvme0: I/O 92 QID 1 timeout, reset controller
Sep  1 22:16:37 Tower kernel: nvme nvme0: I/O 1 QID 0 timeout, reset controller
Sep  1 22:17:39 Tower kernel: nvme nvme0: Device not ready; aborting reset

 

There are some reports that the newer kernel in the latest Unraid beta helps with this on Ryzen boards.
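
For anyone who wants to check their own syslog for the same pattern, here is a rough Python sketch. It only counts the three message shapes quoted above (timeout + abort, timeout + controller reset, and the final failed reset); other kernel versions may word these differently, and the function name `summarize_nvme_events` is just something made up for this example.

```python
import re

# Matches only the message shapes quoted in this thread; this is an
# assumption, not an exhaustive list of NVMe kernel messages.
NVME_EVENT = re.compile(
    r"kernel: nvme (?P<dev>nvme\d+): "
    r"(?:I/O \d+ QID \d+ timeout, (?P<action>aborting|reset controller)"
    r"|(?P<dead>Device not ready; aborting reset))"
)


def summarize_nvme_events(lines):
    """Count abort, controller-reset, and failed-reset messages per device."""
    summary = {}
    for line in lines:
        match = NVME_EVENT.search(line)
        if match is None:
            continue
        counts = summary.setdefault(
            match.group("dev"),
            {"aborting": 0, "reset controller": 0, "failed reset": 0},
        )
        if match.group("dead"):
            counts["failed reset"] += 1
        else:
            counts[match.group("action")] += 1
    return summary


# The seven lines from the diagnostics quoted above.
SAMPLE_LOG = """\
Sep  1 22:15:36 Tower kernel: nvme nvme0: I/O 92 QID 1 timeout, aborting
Sep  1 22:15:36 Tower kernel: nvme nvme0: I/O 224 QID 2 timeout, aborting
Sep  1 22:15:36 Tower kernel: nvme nvme0: I/O 116 QID 3 timeout, aborting
Sep  1 22:15:36 Tower kernel: nvme nvme0: I/O 90 QID 6 timeout, aborting
Sep  1 22:16:06 Tower kernel: nvme nvme0: I/O 92 QID 1 timeout, reset controller
Sep  1 22:16:37 Tower kernel: nvme nvme0: I/O 1 QID 0 timeout, reset controller
Sep  1 22:17:39 Tower kernel: nvme nvme0: Device not ready; aborting reset
"""

if __name__ == "__main__":
    print(summarize_nvme_events(SAMPLE_LOG.splitlines()))
```

To run it against a live log instead of the sample, pass an open file object (for example the syslog from a diagnostics zip) to `summarize_nvme_events`. A non-zero "failed reset" count is the line that matters: it means the controller never came back after the reset attempts.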


@trurl - I'll post new diagnostics when it's finished, about an hour or so left. Thanks!!

 

@JorgeB - does this suggest that it dropped offline due to a hardware disconnect or is this a software drop? Also, would I be able to reconnect the dropped NVME from the Tower GUI, or is it something a reboot should fix, or simply a full shut-down then disconnect + reconnect of the NVME? Thanks a lot :)

Edited by dapiedude
accidentally pressed publish

I noticed in earlier diagnostics you had allocated 80G for docker.img

 

Have you had problems filling it? 20G should be more than enough, and making it larger won't fix anything; it will just make it take longer to fill.

 

I am running 17 dockers and they are using less than half of 20G.

 

The usual reason for filling docker.img is an application writing to a path that doesn't correspond to a container path in the mappings. Common mistakes are not using the same upper/lower case as in the mappings, or using a relative path.


I hadn't filled up the 20G, but I was at around 90% of it and kept getting notifications that the docker.img was close to full. Honestly, I was too lazy to find that notification and turn it off, so I just upped it to 80G.

 

Currently I'm using 23.4G with 30 dockers. I just checked the mappings for each of them and they all seemed correct. I've turned down the allocation to 30G though, that way I can watch for bloat! Is there a good way to see how much space each of my dockers are using within the docker.img? I have cadvisor but that isn't helpful when looking at the general docker allocations.

 

Thanks for helping me with all of this and being so forward thinking too!

On 9/3/2020 at 9:31 AM, dapiedude said:

I just checked the mappings for each of them

It isn't so much about the mappings. It is about whether the application is using the container paths in the mappings or instead using some other paths.

 

For example, suppose you have a mapping with a container path of "/download", but the application is using the path "/Download". Since Linux is case-sensitive, these are different paths. Or the application might be using "download" instead of "/download". Since "download" is a relative path, it is going to be relative to something inside docker.img.
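
 

To make that concrete, here is a small Python sketch of the same check. It is only an illustration: the helper names and the `/app` working directory are assumptions made up for the example, and a real container's working directory depends on the image.

```python
import posixpath


def resolve_in_container(app_path, workdir="/app"):
    """Return the absolute path an application path would resolve to.

    workdir is a hypothetical container working directory; relative
    paths are resolved against it, absolute paths ignore it.
    """
    return posixpath.normpath(posixpath.join(workdir, app_path))


def is_under_mapping(app_path, container_path, workdir="/app"):
    """True if app_path lands inside the mapped container path.

    The comparison is case-sensitive, as on Linux.
    """
    resolved = resolve_in_container(app_path, workdir)
    mapping = posixpath.normpath(container_path)
    return resolved == mapping or resolved.startswith(mapping.rstrip("/") + "/")


if __name__ == "__main__":
    for path in ("/download/file.bin", "/Download/file.bin", "download/file.bin"):
        print(path, "->", resolve_in_container(path), is_under_mapping(path, "/download"))
```

Only the first path ends up inside the "/download" mapping. The other two resolve to locations that are not mapped to any host path, so whatever gets written there stays inside docker.img.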

 

On 9/3/2020 at 9:31 AM, dapiedude said:

see how much space each of my dockers are using

Container Size button at the bottom of the Docker page.
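
 

If you would rather look from the terminal, `docker ps --size` reports a similar per-container figure. The sketch below ranks containers by their writable-layer size; it assumes tab-separated output from `docker ps --size --format '{{.Names}}\t{{.Size}}'` with sizes shaped like "2.1MB (virtual 512MB)", and the container names and numbers in the sample are made up.

```python
import re

# Decimal units as printed in docker's human-readable sizes (assumption).
_UNITS = {"B": 1, "kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}
_SIZE = re.compile(r"^\s*([\d.]+)\s*([kMGT]?B)")


def writable_bytes(size_field):
    """Convert the leading size in e.g. '2.1MB (virtual 512MB)' to bytes."""
    match = _SIZE.match(size_field)
    if match is None:
        raise ValueError(f"unrecognized size field: {size_field!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def rank_containers(ps_output):
    """Return (name, bytes) pairs sorted largest writable layer first."""
    rows = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, size_field = line.split("\t", 1)
        rows.append((name, writable_bytes(size_field)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical output; real names and sizes will differ.
SAMPLE_PS = "plex\t350MB (virtual 1.1GB)\nsonarr\t1.2GB (virtual 1.6GB)\nnginx\t12kB (virtual 133MB)\n"

if __name__ == "__main__":
    for name, size in rank_containers(SAMPLE_PS):
        print(f"{name}: {size / 1000**2:.1f} MB")
```

A container whose writable layer keeps growing is the one to check for an application path that doesn't match its mappings.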

 

