Server crashes when adding a second disk to the cache pool


February 1, 2022

I have installed Unraid 6.9.2 on a new server. I'm using 4 disks, of which 1 is parity. I added a Samsung 980 PRO 2TB disk as cache. I've installed a couple of Docker containers and I'm not using any VMs. Everything ran smoothly until then. Since I don't want to lose my data when the cache disk breaks down, I added a second cache disk (same model/size as the other one). Since then my server crashes constantly, sometimes right after starting the array and the Docker containers, but usually within a few minutes. I've set up a syslog server so I could save the logs before the crash, but nothing is written to the logs. The server becomes totally unresponsive and I can't connect to it anymore. When I removed the second disk from the cache pool everything worked again, and when I put it back in it crashed again. The disk I added is brand new, and neither SMART nor the badblocks test shows any errors. What I sometimes notice is that the CPU locks up right before the crash.

When I have top open I see the CPU spending a lot of time in wait states.
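The wait states that top reports can also be sampled directly from `/proc/stat`, which makes it possible to log them with a timestamp until the moment the server hangs. The following is only a minimal sketch, assuming a Linux host; the function names `parse_cpu_line`, `iowait_percent` and `log_iowait` are made up for this illustration and are not part of Unraid.

```python
import time


def parse_cpu_line(line):
    """Return (iowait_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    ticks = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so only
    # the first eight fields are summed for the total.
    return ticks[4], sum(ticks[:8])


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' line samples."""
    iow0, tot0 = parse_cpu_line(before)
    iow1, tot1 = parse_cpu_line(after)
    delta = tot1 - tot0
    return 100.0 * (iow1 - iow0) / delta if delta > 0 else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


def log_iowait(samples=12, interval=5.0):
    """Print a timestamped iowait percentage every `interval` seconds (hypothetical helper)."""
    previous = read_cpu_line()
    for _ in range(samples):
        time.sleep(interval)
        current = read_cpu_line()
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} iowait {iowait_percent(previous, current):.1f}%", flush=True)
        previous = current
```

Calling `log_iowait()` over SSH, or redirecting its output to a file, would leave the last measured value visible after a hang. The sampling interval and the number of samples are arbitrary choices.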

Does anyone have any idea what could be wrong or how I can debug the issue?

brain-diagnostics-20220201-1623.zip


February 5, 2022

Author

I've been debugging for the last couple of days. What I have tried, without success:

- upgrade to 6.10-rc2

- switch network from macvlan to ipvlan

Then I managed to capture logs just before the crash; there seem to be NVMe errors.

The strange thing is that it also happens on the other drive.

It seems one of the NVMe drives produces a read error, and after that the server hangs in an I/O wait state. I googled but couldn't find any solution; some suggest it has to do with power management settings. I changed the disks to never spin down (they are SSDs, so there is no need anyway), without any success.

Any help would be appreciated, I'm a bit lost now.


February 5, 2022

Hi,

I'm new to Unraid but have browsed hardware quite a bit. Some motherboards make one onboard SATA port unavailable when you use specific M.2 slots on the motherboard. I don't know if this could be your case?


2 weeks later...

February 15, 2022

Author

Hi Felixen,

Thank you for your suggestion. I checked the mainboard and this is not the issue. SATA 5 and 6 are shared with M.2, but I am only using SATA 1-4.

I did some more research and tried out a lot of different things, all without success. The things I tried are:

Prevent the M.2 drive from going to sleep with the kernel parameter:

nvme_core.default_ps_max_latency_us=0

Apply some more M.2 tweaks:

pcie_aspm.policy=performance

pcie_aspm=off

pcie_port_pm=off

nvme_core.default_ps_max_latency_us=0

nvme_core.io_timeout=255

nvme_core.max_retries=10

nvme_core.shutdown_timeout=10

Disable IOMMU:

iommu=off
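When testing kernel parameters like the ones listed above, it can help to confirm after a reboot that they really reached the running kernel, since a parameter that never made it onto the boot line would look the same as one that had no effect. The sketch below compares an expected set against the contents of `/proc/cmdline`. The helper names `parse_cmdline` and `missing_params` are invented for this example, and the simple whitespace split assumes no quoted values containing spaces.

```python
def parse_cmdline(text):
    """Split a kernel command line into a dict of name -> value (None for bare flags)."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline_text, expected):
    """Return the expected 'name=value' pairs that are absent or set to a different value."""
    active = parse_cmdline(cmdline_text)
    return [f"{k}={v}" for k, v in expected.items() if active.get(k) != v]


# The parameters tried in this post, as name -> value pairs.
EXPECTED = {
    "pcie_aspm.policy": "performance",
    "pcie_aspm": "off",
    "pcie_port_pm": "off",
    "nvme_core.default_ps_max_latency_us": "0",
    "nvme_core.io_timeout": "255",
    "nvme_core.max_retries": "10",
    "nvme_core.shutdown_timeout": "10",
    "iommu": "off",
}


def check_running_kernel(path="/proc/cmdline"):
    """Read the running kernel's command line and list parameters that did not apply."""
    with open(path) as f:
        return missing_params(f.read(), EXPECTED)
```

An empty list from `check_running_kernel()` would mean every parameter is present on the boot line. Whether the NVMe driver actually honours a value is a separate question; for module parameters, the files under `/sys/module/nvme_core/parameters/` should show the values in effect.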

It seems to happen more often when there is heavy I/O. When the system is mostly idle it will stay online for hours, but under heavy I/O it crashes within minutes. I still don't have any clue what is causing this.


11 months later...

January 27, 2023

Have you found a solution yet?

I'm facing the same problem right now.


3 weeks later...

February 13, 2023

Author

I've upgraded to a newer version and the issue magically went away but I still don't know why.
