JJJTech Posted February 24, 2023

Hello Unraid Family - I have had a stable system for a few years now. Recently I upgraded my NVMe cache drives from 512GB to 2TB units. They tested perfectly, but now the system kernel panics every 24-48 hours. I cannot actually see what the root cause is; my gut tells me it is not the new drives. Attaching diagnostics in hopes the community can point me in the right direction.

unraid-diagnostics-20230224-0725.zip
JorgeB Posted February 24, 2023

Enable the syslog server and post the log after a crash. Also, it's probably not the problem if it only started now, but check this.
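For anyone following along: once the remote syslog server is configured under Settings, delivery can be sanity-checked from the Unraid shell with `logger`. This is a minimal sketch assuming a util-linux `logger` and UDP transport; the host and port in the example are placeholders and must match your own syslog server settings.

```shell
#!/bin/sh
# Send a test message to a remote syslog server so you can confirm
# it lands in the log file before waiting for a real crash.
send_test_message() {
    host="${1:?usage: send_test_message HOST [PORT]}"
    port="${2:-514}"
    # --udp is fire-and-forget: exit 0 only means the packet left
    # this box, so verify receipt on the syslog server itself.
    logger --server "$host" --port "$port" --udp \
        --tag crashtest "syslog test from $(hostname)"
}

# Example (hypothetical address of the receiving Unraid box):
#   send_test_message 192.168.1.15 514
```

If the test line never shows up in the remote log file, fix that before the next crash, or the crash will go unrecorded too.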
JJJTech Posted February 24, 2023 (Author)

Thank you, I enabled the syslog server and have it saving to another Unraid box. I don't think C-State is actually an issue since the system has been super stable, but I will check the BIOS settings. The motherboard is an MSI X570-A PRO running current firmware. I'll post the system logs if/when it crashes.
JJJTech Posted February 25, 2023 (Author)

First crash log. Normally it kernel panics and does not reboot; this time it rebooted and loaded up on its own. Did not see anything super obvious in the logs.

syslog-192.168.1.15.log
JJJTech Posted February 25, 2023 (Author)

Fresh diagnostics pull.

syslog-192.168.1.15.log
unraid-diagnostics-20230224-1957.zip
JorgeB Posted February 25, 2023

6 hours ago, JJJTech said:
Did not see anything super obvious in the logs.

Yep, nothing relevant logged. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes it's likely a hardware problem; if it doesn't, start turning on the other services one by one.
JJJTech Posted February 25, 2023 (Author)

Finally captured the error today. The system will run fine as a NAS for 3+ days; it seems to be an issue related to Docker.

- I have already deleted the vDisk and rebuilt the dockers.
- One docker running = 24-48 hours of stability
- Eight plus = about 5 hours of stability

Super strange, this was a rock-solid system that ran for months, only rebooting for NVIDIA driver or OS updates.

syslog-192.168.1.15.log
JorgeB Posted February 26, 2023

Lots of crashing, but no clues for me on what's causing it. You can try stopping all Docker containers, then re-enable them one by one to see if it's a specific container.
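A rough sketch of that bisection, assuming the standard Docker CLI is available from the Unraid shell. The container names and the soak time below are placeholders; `DRY_RUN=1` just prints the plan instead of touching anything.

```shell
#!/bin/sh
# Stop all containers, then start them back one at a time with a
# soak period after each, so a recurring crash can be tied to the
# most recently started container.
# DRY_RUN=1 prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-0}"
SOAK_SECONDS="${SOAK_SECONDS:-86400}"   # ~24h per container; arbitrary

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

bisect_containers() {
    # Stop everything so the baseline is "no containers running".
    for name in "$@"; do
        run docker stop "$name"
    done
    # Bring them back one by one.
    for name in "$@"; do
        run docker start "$name"
        echo "started $name; watch for a crash before starting the next one"
        [ "$DRY_RUN" = "1" ] || sleep "$SOAK_SECONDS"
    done
}

# Container names below are hypothetical; list yours, e.g. from
# `docker ps -a --format '{{.Names}}'`.
DRY_RUN=1 bisect_containers plex sonarr radarr
```

With many containers, re-enabling them in halves (a binary search) converges faster than strictly one at a time, at the cost of less certainty per step.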
JJJTech Posted February 26, 2023 (Author)

Yeah, that is where I am as well: one docker at a time and seeing what happens. Thanks for taking a look at it.
JJJTech Posted March 10, 2023 (Author)

Finally captured something useful. I think one of my new cache drives is faulty:

BTRFS info (device nvme0n1p1: state EA): forced readonly

LOGS UNRAID 03092023.txt
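For reference, when btrfs forces a filesystem read-only, the per-device error counters usually show which pool member is at fault. A small sketch of filtering those counters; the `/mnt/cache` mount point in the usage comment is an assumption (Unraid's default cache pool path).

```shell
#!/bin/sh
# Filter `btrfs device stats` output down to non-zero error
# counters; exits 1 if any counter is non-zero. Input lines look
# like:  [/dev/nvme0n1p1].write_io_errs   0
check_btrfs_errors() {
    awk '$2 != 0 { print "NON-ZERO:", $1, $2; bad = 1 }
         END { exit bad ? 1 : 0 }'
}

# Typical usage (needs root and a mounted btrfs pool; the mount
# point is an assumption):
#   btrfs device stats /mnt/cache | check_btrfs_errors
```

Any non-zero `write_io_errs`, `read_io_errs`, or `corruption_errs` line points at the device to investigate first.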
JorgeB Posted March 10, 2023 (Solution)

Suggest backing up and re-formatting that pool.
JJJTech Posted March 11, 2023 (Author)

Yep, ran a full NVMe test on both drives; all came back clean. Backed up, formatted, and restored the pool. 24 hours up and running, so far so good.
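For completeness, the drive health checks can be scripted from the shell as well. This assumes the `nvme-cli` package is present and that the pool members are `/dev/nvme0` and `/dev/nvme1`; both device paths are assumptions to adjust.

```shell
#!/bin/sh
# Print the NVMe SMART fields most relevant to a failing drive.
# Device paths are assumptions; requires nvme-cli and root.
nvme_health() {
    for dev in "$@"; do
        if [ -e "$dev" ]; then
            echo "=== $dev ==="
            nvme smart-log "$dev" | \
                grep -Ei 'critical_warning|media_errors|num_err_log_entries|percentage_used'
        else
            echo "skip: $dev not present"
        fi
    done
}

nvme_health /dev/nvme0 /dev/nvme1

# A longer self-test (extended, self-test code 2) can be started with:
#   nvme device-self-test /dev/nvme0 -s 2
#   smartctl -a /dev/nvme0   # SMART view via smartmontools
```

Note that a clean SMART log doesn't rule out a bad drive or cabling issue, which is why reformatting the pool was still worth doing here.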
JJJTech Posted March 17, 2023 (Author)

Seven days after the reformat of my cache drives, all is stable and right in the world! Thanks for the assistance.