August 25, 2022
Diagnostics and syslog attached. I mirrored the syslog to flash so that I would have more data if it happened again.

The lockup (as I'll call it) occurred between 18:14:30 and 18:16:50. I know because I have a monitor checking a Docker app, which goes down like everything else when it happens. Once it happens I cannot even ping the server, yet it was still logging for the 15 minutes or so until I hard rebooted it. It has happened about 5 times over 2 weeks. It had been a few days since the last one, so I thought it might have been resolved. I reseated the memory and ran MemTest 9 until it reported PASS after its 4 passes.

Originally (unless it's unrelated), the problem started with something happening overnight: I couldn't access anything, and after a reboot I couldn't see the files on the cache drive. After an XFS repair I could, and everything was okay. Then 2 days or so later the regular lockups began, with maybe 2 days, 24 hours, or 5 hours in between. Very random. I usually wasn't using the server directly when it happened, except once.

Nothing stood out to me in the syslog around the lockup time, but it isn't a log I'm used to reading. I think I did once see the memory log show 100% full on the dashboard while the server still kind of worked. I tried to reboot, it wouldn't do it, and I had to hard reboot. It didn't seem to fill up quickly when I monitored it. At that time CompreFace was pegging the CPU, but even with it kept off, the server still locked up once.

I'd appreciate any help, please. This is driving me mad. I rely on this server so much. Thanks. Just over 2.5 years old. No issues up until this.
Edited September 4, 2022 by Iceman24
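For anyone wanting to narrow a mirrored syslog down to the minutes around a lockup like the one described above, here is a minimal Python sketch. It assumes the usual syslog timestamp prefix (month, day, time, no year), so the year has to be supplied by hand. The function name `lines_in_window` and the two sample lines are made up for illustration and are not taken from the attached logs.

```python
from datetime import datetime

def lines_in_window(lines, start, end, year=2022):
    """Yield syslog lines whose timestamp falls within [start, end].

    Assumes each line starts with a 15-character stamp such as
    "Aug 26 12:18:49". Lines that do not parse are skipped.
    """
    for line in lines:
        stamp = line[:15]
        try:
            # Syslog stamps carry no year, so prepend the assumed one.
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue
        if start <= when <= end:
            yield line

# Hypothetical sample lines; a real run would read the mirrored syslog file.
sample = [
    "Aug 25 18:05:00 BlackIce kernel: placeholder message outside the window",
    "Aug 25 18:15:02 BlackIce kernel: placeholder message inside the window",
]
window_start = datetime(2022, 8, 25, 18, 10, 0)
window_end = datetime(2022, 8, 25, 18, 20, 0)
for hit in lines_in_window(sample, window_start, window_end):
    print(hit)
```

Padding the window by a few minutes on each side, as done here, helps because the events leading up to a lockup are often more telling than the ones during it.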
August 26, 2022 Community Expert
The log shows many NMI events before the crash, and before those events there are some strange WSD errors. WSD is known to sometimes cause high CPU usage, so I would start by disabling it: Settings -> SMB Settings
August 26, 2022 Author
Thanks, JorgeB. I will try that now. I did notice the CompreFace container giving me issues again, this time refusing to stop. It was holding up almost everything else, and I had to stop all the other containers individually. I couldn't even kill the process, so I had to hard reboot again. Ugh. I've had this container for months with no recent update, and like I said before, the problem also happens without it running.

Edit: I have had the option "-i br0" on WSD this whole time. I don't recall why; I read about it years ago.
Edited August 26, 2022 by Iceman24
August 26, 2022 Author
Another thing: every time now I have to turn off the Docker service, go into the network settings, and modify the DNS fields just enough that, once I put the settings back, I can apply. I don't actually change anything. Then I restart the Docker service and have proper DNS connectivity the way I have things set up. The issues vary, but sometimes it goes as far as delays pinging anything on the Internet at all. It's all good once I do what I just described.
August 29, 2022 Author
It happened again just now. More diagnostics and syslog attached. The lockup was at 20:11:30-20:13:15. I did notice the "rcu_sched self-detected stall on CPU" error and found some other info on it, but I will leave it to someone more knowledgeable to advise me on how to move forward with my particular server. Thanks. I could update from 6.9.2, but I was hesitant due to some issues people had.
Edited September 4, 2022 by Iceman24
August 29, 2022 Community Expert Solution
Aug 26 12:18:49 BlackIce kernel: macvlan_broadcast+0x10e/0x13c [macvlan]
Aug 26 12:18:49 BlackIce kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]

Macvlan call traces are usually the result of having Docker containers with a custom IP address, and they will end up crashing the server. Upgrading to v6.10 and switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).
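As a rough way to check a syslog for the signatures mentioned in this thread (NMI events, the "rcu_sched self-detected stall on CPU" message, and macvlan call trace lines), something like the following Python sketch could be used. The patterns are plain substring-style matches chosen for illustration, not an official list of kernel messages, and the names `SIGNATURES` and `count_signatures` are hypothetical.

```python
import re
from collections import Counter

# Illustrative patterns only, based on the strings quoted in this thread.
SIGNATURES = {
    "nmi": re.compile(r"\bNMI\b"),
    "rcu_stall": re.compile(r"rcu_sched self-detected stall on CPU"),
    "macvlan": re.compile(r"\[macvlan\]"),
}

def count_signatures(lines):
    """Count how many lines match each signature pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts

# The two trace lines quoted above; a real run would iterate over the syslog file.
sample = [
    "Aug 26 12:18:49 BlackIce kernel: macvlan_broadcast+0x10e/0x13c [macvlan]",
    "Aug 26 12:18:49 BlackIce kernel: macvlan_process_broadcast+0xf8/0x143 [macvlan]",
]
print(count_signatures(sample))
```

A non-zero macvlan count after switching to ipvlan would suggest the setting did not take effect, though that is only a heuristic.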
September 4, 2022 Author
It has been good since, and the issue appears resolved. Thanks again. Glad to be over this.