MyNameWasTaken Posted January 24, 2023
This has happened several times in the past few weeks, and I can't find a common factor in the shutdowns, so I've come here for help. My server randomly becomes unresponsive. When it does, the PC doesn't shut down, but I can no longer reach it on my network. Somehow it also takes down my network, and I can't get anything back online until I unplug the Ethernet cable, either at the tower or at the switch. I can plug it back in after that and everything else on my network is fine, but I still can't reach the tower, and the only way to get it back is a hard reboot.

I tried connecting a monitor over HDMI while it's in this state, but I haven't been able to get a picture out of the tower past the splash screen ever since I set it up, so I didn't think much of it. I tried changing some configs, and I installed the Fix Common Problems plugin and implemented all of its suggestions. It's a very recent build with all new components that I tested thoroughly before assembling.

I'm at a loss, so I enabled copying the log to flash, and when it crashed most recently I grabbed the logs. I could use some help deciphering what useful info there may be. I don't see a way to attach files to this post, so if someone can let me know how to do that, or whether I should just copy it to a pastebin, I'd appreciate it.
trurl Posted January 24, 2023
See if you can attach Diagnostics to your NEXT post in this thread.
MyNameWasTaken Posted January 24, 2023 (Author)
Yes, I can attach files now. Thanks. Here are the most recent diags as well as the syslog leading up to the most recent event.
Attachments: tower-diagnostics-20230124-0924.zip, syslog.txt
JorgeB Posted January 24, 2023
Btrfs is detecting data corruption on both pool devices. This is usually caused by RAM issues. Also note that Ryzen with overclocked RAM, like you have, is known to corrupt data, see here.
MyNameWasTaken Posted January 24, 2023 (Author)
I'm not aware of any overclocked RAM. I made no changes to overclock it out of the box. It's just DDR5-5600 RAM with a Ryzen 7950X, so the 5600 figure might make it look overclocked. Does it say somewhere in the diags that it's overclocked?
2 hours ago, JorgeB said: "Btrfs is detecting data corruption on both pool devices"
Where are you seeing that?
JorgeB Posted January 25, 2023
11 hours ago, MyNameWasTaken said: "I'm not aware of any overclocked RAM"
I forgot that the 7000 series is DDR5. According to the diags, the RAM is running at 3600 MT/s, so well within spec. I don't recall whether the maximum officially supported speed is 5200 or 5600 for these.
11 hours ago, MyNameWasTaken said: "Where are you seeing that?"
Jan 22 17:24:38 tower kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 47, gen 0
Jan 22 17:24:38 tower kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 29, gen 0
During mount, the log shows any corruption previously detected for this filesystem.
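To make those two syslog lines easier to spot in a long log, here is a minimal Python sketch that pulls the per-device counters out of lines shaped like the ones quoted above. The function name `parse_btrfs_errs` and the regular expression are illustrative assumptions based only on those two lines, not an official parser; other kernel versions may word the message differently.

```python
import re

# Pattern modeled on the two kernel lines quoted in this thread (an assumption:
# other kernel versions may format the message differently).
ERRS_LINE = re.compile(
    r"BTRFS info \(device (?P<fs>[^)]+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_btrfs_errs(lines):
    """Return {block device: {counter name: value}} for matching lines."""
    found = {}
    for line in lines:
        match = ERRS_LINE.search(line)
        if match:
            found[match.group("bdev")] = {
                name: int(match.group(name)) for name in COUNTERS
            }
    return found


# The two lines quoted above, copied verbatim from the post.
SAMPLE = [
    "Jan 22 17:24:38 tower kernel: BTRFS info (device nvme0n1p1): "
    "bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 47, gen 0",
    "Jan 22 17:24:38 tower kernel: BTRFS info (device nvme0n1p1): "
    "bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 29, gen 0",
]

if __name__ == "__main__":
    for bdev, counters in parse_btrfs_errs(SAMPLE).items():
        nonzero = {k: v for k, v in counters.items() if v}
        print(bdev, nonzero or "no errors recorded")
```

Feeding it the lines of a saved syslog.txt instead of `SAMPLE` should list every pool device with a nonzero counter, which is the thing to look for here.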
MyNameWasTaken Posted January 25, 2023 (Author)
2 hours ago, JorgeB said:
Jan 22 17:24:38 tower kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 47, gen 0
Jan 22 17:24:38 tower kernel: BTRFS info (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 0, rd 0, flush 0, corrupt 29, gen 0
During mount you see any corruption previously detected for this filesystem.

Okay, thank you for pointing that out, so I can maybe recognize it in the future. What do you suggest? Run memtest again to verify the RAM and, if that passes, flip that BIOS setting or adjust C-states like in your linked comment? I ran a scrub and it didn't find any errors on the cache. Are those drives maybe the issue, and do they need to be replaced? I just got them, so they'd definitely be under warranty.
JorgeB Posted January 25, 2023
That's usually not a drive problem. Run memtest again; if nothing is found, reset the stats and keep monitoring. If new errors appear, there's still a problem.
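The "reset the stats and keep monitoring" step can be sketched as follows, under some assumptions. The counters are normally read with `btrfs device stats <mountpoint>` and zeroed with the `-z` option; the pool mount point (for example /mnt/cache) is an assumption about this system. The Python below only parses saved output from that command and reports any device with a nonzero counter. The function names are hypothetical, and the sample text is made up for illustration, not taken from the poster's machine.

```python
import re

# One counter per line, e.g. "[/dev/nvme0n1p1].corruption_errs  0".
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Parse saved `btrfs device stats` output into {device: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            device = stats.setdefault(match.group("dev"), {})
            device[match.group("name")] = int(match.group("value"))
    return stats


def devices_with_errors(stats):
    """Keep only devices that have at least one nonzero counter."""
    flagged = {}
    for dev, counters in stats.items():
        nonzero = {name: value for name, value in counters.items() if value > 0}
        if nonzero:
            flagged[dev] = nonzero
    return flagged


# Hypothetical output captured some time AFTER a reset (values are invented).
SAMPLE_OUTPUT = """
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  0
[/dev/nvme0n1p1].generation_errs  0
[/dev/nvme1n1p1].write_io_errs    0
[/dev/nvme1n1p1].read_io_errs     0
[/dev/nvme1n1p1].flush_io_errs    0
[/dev/nvme1n1p1].corruption_errs  2
[/dev/nvme1n1p1].generation_errs  0
"""

if __name__ == "__main__":
    flagged = devices_with_errors(parse_device_stats(SAMPLE_OUTPUT))
    print(flagged if flagged else "All counters are still zero.")
```

Because the counters start from zero after a reset, any device this reports has picked up new errors since then, which is the "there's still a problem" case.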
MyNameWasTaken Posted January 25, 2023 (Author)
Thanks a lot for all the replies. It's been very helpful and educational. I've got a memtest going now.