Server Becoming Unresponsive Requiring Hard Reboot

SergeantCC4 · November 19, 2022

Recently, after updating to 6.11.1 (to correct the wireguard display glitch), my server started to intermittently become unresponsive and I would be forced to perform a hard reboot. This typically occurred while I was connected to my server via wireguard (it runs Pi-hole, so I can get the benefits remotely).

I had previously gotten a docker image corrupt error, but when I was unable to stop/delete the image I simply rebooted the server and that seemed to fix the problem. I posted on another topic that I thought I was having the same issue, but I was mistaken. I tried to run the memtest built into the server, but upon selection the server just rebooted and came back to the unraid boot screen. I read I was supposed to enable CSM, but for some reason when I did that my server would no longer POST. I disconnected my PCIe JBOD cards and my boot flash and was able to POST, and I am now running memtest86 v10. The first run just finished and hasn't found any errors, but I'm going to let it run through a few more cycles.

My question, however, is this: I have a dual NVMe protected cache pool set up and I'm unsure if I have the right settings, and/or whether I need to do a balance or scrub. It worked fine during 6.10, and I upgraded to 6.11 to take advantage of the iGPU in the 12500 I just upgraded to. I tried to find some documentation, but it seems to be a little fuzzy (or maybe I'm the fuzzy one) about what to do when, where, and why. Anyone have any recommendations?

Thanks in advance!

syslog.txt citadel-diagnostics-20221118-2026.zip

JorgeB · November 20, 2022

There's filesystem corruption on the pool; you should back up and reformat.

SergeantCC4 · November 20, 2022

With those being relatively new devices (<2 months) should I be worried that something is wrong with them? Was there something I could've done to prevent this?

JorgeB · November 20, 2022

Start by running memtest.

SergeantCC4 · November 20, 2022

I saw your post elsewhere to do that so I ran 6 passes total yesterday and it returned zero errors.

JorgeB · November 21, 2022

Then just reformat and see here for better pool monitoring, so you are warned if there are more issues.
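For readers following along, here is a minimal sketch of what a scheduled pool check could look like, assuming the monitoring referred to is based on the per-device error counters that `btrfs device stats` prints. The mount point `/mnt/cache` and the function names are assumptions made for illustration, not taken from the linked post.

```python
import re
import subprocess

# Matches lines such as "[/dev/nvme0n1p1].write_io_errs    0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_text):
    """Return {(device, counter): value} for every nonzero error counter."""
    found = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match.group("value")) > 0:
            key = (match.group("dev"), match.group("counter"))
            found[key] = int(match.group("value"))
    return found


def check_pool(mount_point="/mnt/cache"):
    """Run `btrfs device stats` on the pool and return its nonzero counters.

    The default mount point is an assumption; adjust it to the actual pool.
    """
    result = subprocess.run(
        ["btrfs", "device", "stats", mount_point],
        capture_output=True,
        text=True,
        check=True,
    )
    return nonzero_counters(result.stdout)
```

Calling `check_pool()` from an hourly job and sending a notification whenever the returned dictionary is not empty would give an early warning if a device starts logging errors again.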

SergeantCC4 · November 22, 2022

Thanks @JorgeB I was able to backup my files to the array, reformat the cache pool, and migrate my data back. Some strange networking bugs with wireguard happened but they seemed to fix themselves when I upgraded to 6.11.5. I read the post you linked and set up an hourly check using User Scripts.

I also read a few other forums about the btrfs issues and set up a weekly balance as it seemed that there is really no harm in doing this?

I've had a busy day and didn't get a chance to really look in depth but is there anywhere I can start for a basic understanding of Scrub and Balance and how often if at all they should be run? I'm not sure I understand the purpose of these features.

JorgeB · November 22, 2022

A monthly scrub should be enough. A regular balance might not be needed, depending on how the pool is used, but if you want to schedule one, a monthly balance using the default block usage should be good.
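As a rough illustration of the two operations: a scrub reads everything on the pool and verifies it against the stored checksums, while a balance rewrites allocated chunks so that partially filled ones are compacted. The sketch below only builds and runs the corresponding `btrfs` commands. The mount point and the helper names are assumptions, and the usage threshold is left as a parameter because the thread does not state what the default block usage value is.

```python
import subprocess


def scrub_command(mount_point):
    # -B keeps the scrub in the foreground so the caller sees its exit status
    return ["btrfs", "scrub", "start", "-B", mount_point]


def balance_command(mount_point, usage_percent):
    # -dusage=N limits the balance to data chunks that are at most N percent full
    return ["btrfs", "balance", "start", f"-dusage={usage_percent}", mount_point]


def run_monthly_maintenance(mount_point="/mnt/cache", usage_percent=None):
    """Scrub the pool, then optionally balance it.

    The default mount point is an assumption. The balance step is skipped
    unless a usage threshold is supplied.
    """
    subprocess.run(scrub_command(mount_point), check=True)
    if usage_percent is not None:
        subprocess.run(balance_command(mount_point, usage_percent), check=True)
```

A monthly scheduled job could call `run_monthly_maintenance()` with no threshold to run the scrub only, or pass a threshold to include the balance.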
