Probable cache pool issues

December 31, 2024

I have an Unraid instance with 7 HDDs and 2 SSDs, the SSDs forming a cache pool. Current version is 6.12.11. The whole thing has been running fine for about a year. A month or so ago, the cache pool dropped out of nowhere, and every share that uses the cache pool stopped working. That includes docker, etc.

I had shut the system down in preparation for some kind of diagnostic. Wasn't sure where to start, but before that, I restarted the machine, and everything came back fine. Obviously I wasn't entirely comfortable that everything was actually OK, but life got in the way, and I never got around to looking into it further.

Seemed OK for a month or two, then the same happened again. A reboot did nothing, but shutting down and then starting the machine again seemed to bring the pool back up. However, this time docker doesn't start, and I assume there are other issues I haven't noticed yet. The data drive share doesn't involve the cache, and that seems to work fine.

Logs show errors when trying to mount. That's about all I know right now. I haven't gone deeper, and I'm not really sure how I might try to identify what/if anything is wrong with the cache nor how to repair the situation.

Attaching diagnostic download.

The server does not get heavy use. It's mostly hosting a few docker instances that get relatively light use, and the big data share. If I could remove the cache from the shares as a temporary measure, that would get me through temporarily, but I don't know if that just loses data that possibly wasn't written when the cache pool went down, or if removing the cache might leave those shares broken entirely. Losing some data from the cache wouldn't be a big deal, but rebuilding the whole docker setup would be kind of a bummer, putting it mildly.

If one of the cache drives looks bad, replacing one wouldn't be a big deal, but if one of them was bad, I would expect that to be the message, rather than the whole pool simply going down.

Anyway, thanks in advance.

mediabox-diagnostics-20241230-2026.zip

Solved by JorgeB

December 31, 2024

Community Expert

Log shows that both pool devices dropped offline in the past:

Dec 30 17:14:59 MediaBox kernel: BTRFS info (device sdd1): bdev /dev/sdd1 errs: wr 706980, rd 428, flush 123639, corrupt 0, gen 0
Dec 30 17:14:59 MediaBox kernel: BTRFS info (device sdd1): bdev /dev/sde1 errs: wr 373662, rd 8, flush 32835, corrupt 0, gen 0
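For anyone reading these lines later: the counters are cumulative per-device error totals that btrfs keeps across reboots. Large write/flush counts with `corrupt 0` usually point at the device dropping offline rather than at checksum corruption. Below is a minimal Python sketch for pulling those counters out of a syslog, assuming the line format shown above; the function name `parse_btrfs_errs` is made up for illustration and is not part of any Unraid tooling.

```python
import re

# Matches the kernel's per-device btrfs error summary, e.g.
# "... bdev /dev/sdd1 errs: wr 706980, rd 428, flush 123639, corrupt 0, gen 0"
BTRFS_ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (device, counters) for a btrfs 'errs:' line, or None if no match."""
    match = BTRFS_ERRS.search(line)
    if match is None:
        return None
    counters = {
        name: int(value)
        for name, value in match.groupdict().items()
        if name != "dev"
    }
    return match.group("dev"), counters


# The two lines quoted in this post.
log_lines = [
    "Dec 30 17:14:59 MediaBox kernel: BTRFS info (device sdd1): "
    "bdev /dev/sdd1 errs: wr 706980, rd 428, flush 123639, corrupt 0, gen 0",
    "Dec 30 17:14:59 MediaBox kernel: BTRFS info (device sdd1): "
    "bdev /dev/sde1 errs: wr 373662, rd 8, flush 32835, corrupt 0, gen 0",
]

for line in log_lines:
    parsed = parse_btrfs_errs(line)
    if parsed is not None:
        device, counters = parsed
        # Any non-zero counter means the device has logged errors at some point.
        bad = {name: count for name, count in counters.items() if count > 0}
        print(device, bad)
```
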

Run a correcting scrub on the pool and post the results.
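On Unraid the scrub is normally started from the pool's page in the GUI. For reference, a rough command-line equivalent is sketched below in Python. It only builds the `btrfs scrub` command lines; the mount point `/mnt/cache` is an assumption (the default name for a pool called "cache"), and the helper names are hypothetical. A scrub corrects errors by default; the `-r` flag makes it read-only.

```python
def scrub_command(mount_point, read_only=False):
    """Build a 'btrfs scrub start' command for the given pool mount point.

    -B keeps the scrub in the foreground and prints statistics when it ends.
    -r makes the scrub read-only (report errors, do not repair them).
    """
    cmd = ["btrfs", "scrub", "start", "-B"]
    if read_only:
        cmd.append("-r")
    cmd.append(mount_point)
    return cmd


def scrub_status_command(mount_point):
    """Build the command that reports on the last or currently running scrub."""
    return ["btrfs", "scrub", "status", mount_point]


# /mnt/cache is an assumed mount point; adjust it to the actual pool name.
print(" ".join(scrub_command("/mnt/cache")))
print(" ".join(scrub_status_command("/mnt/cache")))
```

The resulting lists can be passed to `subprocess.run` on the server itself, or the printed commands can simply be typed into a terminal.
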

December 31, 2024

Author

Quite a few errors. A scrub was never run before, AFAIK. I've scheduled it to run regularly now.

December 31, 2024

Author

Ran the scrub again and nothing needed correcting, then rebooted. Attaching diagnostics again.

mediabox-diagnostics-20241231-0859.zip

December 31, 2024

Community Expert
Solution

All errors were corrected, which is good, but the docker image is corrupt. You can recreate it:

https://docs.unraid.net/unraid-os/manual/docker-management/#re-create-the-docker-image-file
Then:
https://docs.unraid.net/unraid-os/manual/docker-management/#re-installing-docker-applications
Also see below if you have any custom docker networks:
https://docs.unraid.net/unraid-os/manual/docker-management/#docker-custom-networks

Also recommend taking a look here, to reset the current pool stats and keep monitoring:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/#findComment-700582
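As a rough illustration of the monitoring side: `btrfs device stats <mount point>` prints the per-device error counters, and adding `-z` resets them to zero after printing. The Python sketch below parses that output and reports any counter that is not zero, so new errors stand out after a reset. The sample text is illustrative only and not taken from the attached diagnostics, `/mnt/cache` is an assumed mount point, and `nonzero_stats` is a hypothetical helper name.

```python
import re

# One line of 'btrfs device stats' output looks like:
# [/dev/sdd1].write_io_errs    0
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def nonzero_stats(stats_output):
    """Return {device: {counter: value}} for every counter greater than zero."""
    problems = {}
    for line in stats_output.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue
        value = int(match.group("value"))
        if value > 0:
            problems.setdefault(match.group("dev"), {})[match.group("name")] = value
    return problems


# Illustrative sample only; real output comes from running, on the server:
#   btrfs device stats /mnt/cache      (print counters)
#   btrfs device stats -z /mnt/cache   (print counters, then reset them)
sample = """\
[/dev/sdd1].write_io_errs    0
[/dev/sdd1].read_io_errs     0
[/dev/sdd1].flush_io_errs    0
[/dev/sdd1].corruption_errs  0
[/dev/sdd1].generation_errs  0
[/dev/sde1].write_io_errs    12
[/dev/sde1].read_io_errs     0
[/dev/sde1].flush_io_errs    3
[/dev/sde1].corruption_errs  0
[/dev/sde1].generation_errs  0
"""

print(nonzero_stats(sample))
```

After a reset, an empty result means no new errors have been logged; anything else is worth a fresh set of diagnostics.
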

December 31, 2024

Author

Thanks so much! That seems to have worked, at least for Docker. If you (or anybody) have some tips for things I should look into WRT the cache pool, that would be much appreciated. I don't exactly trust the stability of the system at this point.

When the pool went down in the past, both drives seem to be "gone", and a power-down/boot was required to make the drives visible again. That does make me think there's something deeper going on, but I don't know Unraid well enough to say for sure that's the case.

For now, I have the cache scrub scheduled, and will schedule the occasional shutdown/boot.

January 1, 2025

Community Expert

16 hours ago, Kevin G said:

When the pool went down in the past, both drives seem to be "gone", and a power-down/boot was required to make the drives visible again.

If that happens again, post the diagnostics before rebooting.
