February 24, 2023 Hello Unraid Family - I have had a stable system for a few years now. Recently I upgraded my NVMe cache drives from 512GB to 2TB units. They tested perfectly, but now the system kernel panics every 24-48 hours. I cannot seem to see what the root cause actually is; my gut tells me it is not actually the new drives. Attaching diagnostics in the hope that the community can point me in the right direction. unraid-diagnostics-20230224-0725.zip
February 24, 2023 Enable the syslog server and post the log after a crash. Also, it's probably not the problem if this only started now, but check this.
February 24, 2023 Author Thank you, I enabled the syslog server and have it saving to another Unraid box. I don't think C-State is actually an issue since the system has been super stable, but I will check the BIOS settings. MB is an MSI-X570-A-PRO running current firmware. I'll post the system logs if/when it crashes.
February 25, 2023 Author First crash log. Normally it kernel panics and does not reboot; this time around it rebooted and loaded up on its own. Did not see anything super obvious in the logs. syslog-192.168.1.15.log
February 25, 2023 Author Fresh diagnostics pull: syslog-192.168.1.15.log unraid-diagnostics-20230224-1957.zip
February 25, 2023 6 hours ago, JJJTech said: Did not see anything super obvious in the logs. Yep, nothing relevant logged. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.
February 25, 2023 Author Finally captured the error today. The system will run fine as a NAS for 3+ days, so it seems to be an issue related to Docker.
- I have already deleted the vDisk and rebuilt the dockers.
- One docker running = 24-48 hours of stability
- Eight plus = about 5 hours of stability
Super strange, this was a rock-solid system that ran for months, only rebooting for NVIDIA driver or OS updates. syslog-192.168.1.15.log
February 26, 2023 Lots of crashing but no clues for me on what's causing them. You can try stopping all Docker containers, then re-enabling them one by one to see if it's a specific container.
February 26, 2023 Author Yeah, that is where I am as well: enabling one docker at a time and seeing what happens. Thanks for taking a look at it.
March 10, 2023 Author Finally captured something useful. I think one of my new cache drives is faulty: BTRFS info (device nvme0n1p1: state EA): forced readonly. LOGS UNRAID 03092023.txt
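For anyone landing here with a similar problem: a minimal sketch, not from the original posts, of how a saved syslog could be scanned for lines like the one quoted above. The pattern list and the function names `find_suspect_lines` and `scan_file` are assumptions made for illustration; only the "forced readonly" string comes from this thread, so adjust the patterns to whatever your own logs show.

```python
import re
from typing import Iterable, List, Tuple

# Assumed patterns: "forced readonly" is the message quoted in this thread;
# the others are common kernel log prefixes and may not cover every case.
PATTERNS = [
    re.compile(r"forced readonly", re.IGNORECASE),
    re.compile(r"BTRFS (error|critical|warning)", re.IGNORECASE),
    re.compile(r"kernel panic", re.IGNORECASE),
]


def find_suspect_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path: str) -> List[Tuple[int, str]]:
    """Scan a syslog file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling `scan_file` with the path to a log saved by the remote syslog server returns the matching lines with their line numbers, which makes it easier to find the first filesystem error before a crash.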
March 11, 2023 Author Yep, ran a full NVMe test on both drives and all came back clean. Backed up, formatted, and restored the pool. 24 hours up and running, so far so good.
March 17, 2023 Author So, 7 days after the reformat of my cache drives, all is stable and right in the world! Thanks for the assistance.