  • [6.8.0] Removed Cache Disk + Cache Enabled -> Kernel Crash


    streaky81
    • Retest Minor

    In my Unraid box I used to have an SSD cache disk. Some months back I removed the cache disk from cache duty and let it be mounted as just a disk directly on the filesystem. Evidently at this point I left cache enabled, just with no disks, and mover still appears to be scheduled to run. Everything has been fine with the server for all these months. Then a week back I upgraded to 6.8.0 and it started randomly crashing about an hour after booting, completely unresponsive over the network. I didn't even really notice at first - I have been quite busy and haven't been using my server much.

     

    Router errors suggested to me that I had some sort of network issue. After tearing my hair out reconfiguring my network and buying a new NIC (I wanted a decent dual NIC for this server anyway, so no biggie), with nothing really improving, I started to really focus on Unraid.

     

    It turns out that what has been happening is that mover, still running, is causing an unhandled NULL pointer dereference (somehow); the CPU then stalls, the network goes down as a result, and the server never recovers.

     

    I don't know for sure that this issue was introduced specifically in 6.8.0, but I'm *fairly* sure I didn't have it before - it would have driven me bananas until I'd resolved it, as it did in this case. I never tested any pre-release versions, so I also don't know at what point it appeared.

     

    Basically the workaround (aka fix) for me was to set the number of cache disks to none (it was still set at 1) and disable cache in settings (I believe it was enabled in two places, weirdly), and the server has been running happily for over 24 hours since.

     

    Some sort of check to see whether there actually is a cache disk before executing mover would probably help, or maybe a warning when users have an incomplete cache setup? I don't know for sure, but for me at least it was definitely a thing.

    console log.txt





    Recommended Comments

    Since this seems to be a case of shooting yourself in the foot, I expect this report will be downgraded to "Minor" if not moved to General Support instead, but I will leave it for now. I'm pretty sure there is already a check for cache in mover, but maybe it doesn't work in your scenario.

     

    But your scenario is not entirely clear: you didn't post any diagnostics, that snippet of syslog is incomplete at best, and you didn't give any clear directions for how to reproduce the problem. For future reference, here are the guidelines on posting a bug report:

     

    https://forums.unraid.net/bug-reports/stable-releases/report-guidelines-r68/

     

     

    Quote

    In my Unraid box I used to have an SSD cache disk. Some months back I removed the cache disk from cache duty and let it be mounted as just a disk directly on the filesystem. Evidently at this point I left cache enabled, just with no disks

    This seems to be the most important part of your post, but it is also the part that is most unclear. I assume you mean you added the SSD to the parity array. SSDs aren't recommended in the parity array, but I will leave that for now. (The word "filesystem" usually means something else.)

     

    There isn't any specific place where cache is enabled, so I don't know what you mean by that part. If you mean there were user shares set to use cache, I don't think that would matter. I know cache-yes and cache-prefer would just overflow to the array; I'm not entirely sure what cache-only would do without cache, though. Is there something else you had in mind when you said cache was enabled?

     

    Could you give a more complete, step-by-step description of exactly what you did - the steps you only summarized in the part I quoted above?

     

    Link to comment

    The diagnostics aren't super relevant given the issue no longer exists for me; if I got them now, they wouldn't reflect the state the server was in when it was crashing. I could try to reproduce it, but I don't fancy intentionally making my live server kernel crash.

     

    I thought the reproduction steps were reasonably clear, but, y'know, sorry:

     

    1. Enable cache, assign a disk to cache, and start the array.
    2. Unassign the disk, but leave the cache disk count at 1 and cache enabled, then start the array again.

    For me that left mover scheduled, and it caused a crash. Setting the cache disk count to 0 and then disabling it all fixed the issue.

     

    The kernel crash may well be specific to my hardware, I get that, but if mover hadn't run it wouldn't have caused it. It was definitely doing *something*.

     

    As I said before, it's fixed for me. Even if this becomes a wontfix, hopefully it helps somebody who has a similar setup and whose server is seemingly randomly disappearing off their network.

    Edited by streaky81
    Link to comment

    I can't reproduce this; if I unassign all cache devices, leaving the slots as they were, I get this in the log:

    root: mover: cache not present, or only cache present

     

    mover is not executed

    Link to comment

    It might only happen under specific circumstances, but we'd need to know how to reproduce it, so I'm going to change the status for now until the OP or another user adds more info.

     

     

    Link to comment
    20 hours ago, johnnie.black said:

    I can't reproduce this; if I unassign all cache devices, leaving the slots as they were, I get this in the log:

    
    root: mover: cache not present, or only cache present

     

    mover is not executed

    Try this.

    After you unassign the physical cache devices, try creating a /mnt/cache folder, like what would happen if a container were misconfigured to use the disk path instead of /mnt/user.

     

    I suspect the OP was filling up RAM with some misconfiguration, causing the crash.
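     

    To illustrate (a rough sketch - the /mnt/cache folder is the one from my suggestion above, the rest is a generic example, since Unraid's root filesystem lives in RAM):

     

    # With no cache device mounted, /mnt/cache is just a folder on the
    # RAM-backed root filesystem, so anything a container writes "to cache"
    # actually consumes RAM.
    mkdir -p /mnt/cache
    mountpoint -q /mnt/cache || echo "not a mount point"   # a plain folder fails the test
    df -h /mnt/cache    # reports the RAM-backed rootfs, not a cache device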

    Link to comment
    1 hour ago, jonathanm said:

    After you unassign the physical cache devices, try creating a /mnt/cache folder, like what would happen if a container were misconfigured to use the disk path instead of /mnt/user.

    Good idea, but mover still doesn't run, with the same error. Looking at the mover script, it checks for the existence of the user0 mount point:

    if ! mountpoint -q /mnt/user0 ; then
        echo "mover: cache not present, or only cache present"
        exit 3
    fi

    So even if a /mnt/user0 folder were created manually, the mover script still wouldn't run, since a plain folder isn't a mount point. Also, looking more carefully at the OP's log snippet, you can see that mover exited because of the same check:

     

    Jan  5 18:00:01 unraid crond[1826]: exit status 3 from user root /usr/local/sbin/mover &> /dev/null

    Exit status 3 is because the /mnt/user0 mount point doesn't exist; the difference in how it was logged for me is just mover logging being enabled vs. disabled. So the mover script wasn't actually running for the OP either, and I can't see how it could have caused the errors, though it's a bit suspicious that the errors start 30 seconds after the mover script is called. Coding isn't really in my wheelhouse, so I'm not sure if it's related or not.
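     

    If anyone wants to verify that guard for themselves, here's a minimal sketch (the /tmp path and script name are just stand-ins; only the check itself comes from the mover script):

     

    #!/bin/bash
    # Save as e.g. /tmp/guard-test.sh and run: bash /tmp/guard-test.sh ; echo $?
    # It prints the message and returns 3, matching the cron log line above.
    mkdir -p /tmp/fake/user0            # a plain directory, not a mount point
    if ! mountpoint -q /tmp/fake/user0 ; then
        echo "mover: cache not present, or only cache present"
        exit 3
    fi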

     

    Link to comment
    1 hour ago, johnnie.black said:

    Looking at the mover script, it checks for the existence of the user0 mount point

    I thought user0 was deprecated and no longer used by Mover.

    Link to comment
    2 minutes ago, trurl said:

    I thought user0 was deprecated and no longer used by Mover.

    It's not used for the move operation as it was before with rsync, but it's apparently still used for that sanity check.

    Link to comment

