flyize Posted September 24, 2023 (edited September 25, 2023)
When this happened yesterday, I was eventually able to get Pi-hole to load, which showed load at around 200 and memory at 100%. The server responds to ping and my Home Assistant VM continues to control devices. I set up a syslog mirror, so that log (final three hours) and the diagnostics are attached.
flyize Posted September 24, 2023
It just happened again. I can still ping the server, but can't get to the web UI or ssh in. However, my Home Assistant VM is still up and running just fine.
flyize Posted September 24, 2023
And now Home Assistant is down.
Mainfrezzer Posted September 24, 2023
Hmm, I can't find a reason right now, but you probably want to set the following in the Samba config on some devices so that they stop fighting:
    domain master = no
    preferred master = no
flyize Posted September 24, 2023
I saw that too. Never noticed those errors before. I'll get that fixed. Probably unrelated though, correct?
flyize Posted September 25, 2023
So I think I resolved the master browser issue by letting Home Assistant handle it (since it irritatingly has no way to control it). It's been almost 24 hours and I'm crossing my fingers. Seems really unlikely that fixed it.
flyize Posted September 25, 2023
Server is down again. Please can anyone help?
JorgeB Posted September 25, 2023
Nothing obvious in the partial log posted. I would recommend posting the complete syslog, since some issues are known to leave call traces days before crashing.
flyize Posted September 25, 2023
2 minutes ago, JorgeB said: Nothing obvious in the partial log posted. I would recommend posting the complete syslog, since some issues are known to leave call traces days before crashing.
Is there any PII I should remove?
flyize Posted September 25, 2023 (edited September 25, 2023)
2 hours ago, JorgeB said: Nothing obvious in the partial log posted. I would recommend posting the complete syslog, since some issues are known to leave call traces days before crashing.
Went ahead and removed email addresses. Here it is. Thank you so much for any help.
JorgeB Posted September 25, 2023
Unfortunately there's still nothing relevant logged, which usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
flyize Posted September 25, 2023 (edited September 25, 2023)
5 minutes ago, JorgeB said: Unfortunately there's still nothing relevant logged, which usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
My friends and family will kill me. Wait, I just realized that I added a second NVMe for appdata a couple of weeks ago. It's been totally stable, but maybe that's it. I'll remove it, then try to run memtest. Is there anything else I can 'actively' run to try to figure out the issue more quickly?
edit: Wait, how do I remove the mirrored NVMe?
JorgeB Posted September 25, 2023
3 minutes ago, flyize said: Wait, I just realized that I added a second NVMe for appdata a couple of weeks ago
It's unlikely this would make the server crash. Start with memtest, and also try another PSU if available.
flyize Posted September 25, 2023
Actually, now that I think about it, the server is *not* crashing, as it responds to pings. Also, the one time that I was able to log in to the Pi-hole docker, it showed CPU and RAM maxed. So this doesn't seem like it could be hardware. Unfortunately, I don't have any way to run top or anything else to see what's using all those resources.
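One way to capture what is using the memory before the box becomes unreachable is to log it continuously while things are still healthy. The following is only a minimal sketch, not something from this thread: it assumes a Python 3 interpreter is available on the server (a stock install may not include one), and the function names and the log path are made up for illustration. It reads the standard Linux `/proc/<pid>/status` files and records the processes with the largest resident memory.

```python
"""Log the processes using the most resident memory (illustrative sketch, Linux only)."""
import os
import time


def read_rss_kb(status_text):
    """Return (name, rss_kb) parsed from the text of /proc/<pid>/status."""
    name, rss_kb = "?", 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # Line looks like "VmRSS:    123456 kB"; kernel threads have no VmRSS line.
            rss_kb = int(line.split()[1])
    return name, rss_kb


def top_memory(proc_root="/proc", limit=5):
    """Return the `limit` largest processes as (rss_kb, pid, name) tuples."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directory names are process IDs
        try:
            path = os.path.join(proc_root, entry, "status")
            with open(path, errors="replace") as fh:
                name, rss_kb = read_rss_kb(fh.read())
        except OSError:
            continue  # the process exited while we were reading it
        rows.append((rss_kb, int(entry), name))
    return sorted(rows, reverse=True)[:limit]


def log_forever(log_path, interval_s=60):
    """Append a timestamped snapshot to log_path every interval_s seconds."""
    while True:
        with open(log_path, "a") as fh:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            for rss_kb, pid, name in top_memory():
                fh.write(f"{stamp} pid={pid} name={name} rss_kb={rss_kb}\n")
        time.sleep(interval_s)
```

Calling `log_forever("/some/persistent/path/memlog.txt")` (a hypothetical path) would need to write somewhere that survives a hard reboot, otherwise the last snapshots before the hang are lost along with everything else held in RAM.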
flyize Posted September 26, 2023
Just following up on this. At this point, I can't see how it's hardware. If something is eating up all the RAM, couldn't the OOM killer be doing this? Wouldn't that cause things to stop working and CPU to run high, while still leaving network connectivity?
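The OOM theory can be checked against the mirrored syslog, because the Linux kernel logs a message when the OOM killer runs. Below is a rough sketch under stated assumptions: the two patterns match the kernel's usual "invoked oom-killer" and "Out of memory: Killed process" lines, but exact wording varies between kernel versions, and the function names and the sample file name are invented for this example.

```python
"""Scan syslog lines for signs of the kernel OOM killer (illustrative sketch)."""
import re

# Typical kernel messages; the wording differs slightly between kernel versions.
INVOKED = re.compile(r"invoked oom-killer", re.IGNORECASE)
KILLED = re.compile(
    r"out of memory: kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)",
    re.IGNORECASE,
)


def find_oom_events(lines):
    """Return (line_number, text) for every line that mentions the OOM killer."""
    events = []
    for number, line in enumerate(lines, start=1):
        if INVOKED.search(line) or KILLED.search(line):
            events.append((number, line.rstrip("\n")))
    return events


def killed_processes(lines):
    """Return the names of processes the kernel reports having killed."""
    names = []
    for line in lines:
        match = KILLED.search(line)
        if match:
            names.append(match.group("name"))
    return names
```

A possible use is `with open("syslog.txt", errors="replace") as fh: print(find_oom_events(fh))`, where `syslog.txt` stands in for wherever the mirrored log was saved. An empty result would not rule out memory exhaustion, since a machine that locks up hard may never get the message written.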
JorgeB Posted September 26, 2023
21 hours ago, JorgeB said: One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
Did you try this?
flyize Posted September 26, 2023
I let it run overnight with everything on except the Plex container, and it stayed up. So I just pulled a Plex release from two weeks ago. It seems unlikely that Plex would have a memory leak, but maybe? Does OOM killing things make sense to you?
flyize Posted September 26, 2023
Actually, maybe it is Plex: https://forums.plex.tv/t/pms-1-32-6-hw-transcoding-issues-and-corrections/853757
They pulled the 1.32.6 release.
JorgeB Posted September 26, 2023
54 minutes ago, flyize said: Does OOM killing things make sense to you?
It could be.
flyize Posted September 26, 2023
I'm probably going to curse myself by saying this, but I think it was Plex that was crashing everything. If it's still up tomorrow, I'll be confident that it's fixed.
flyize Posted September 26, 2023
I sure did curse myself. The server crashed again but is still responding to pings. This is driving me crazy!
shaunvis Posted September 27, 2023
I'm assuming you're on 6.12, correct? If so, try 6.11.5. Lots of people, myself included, can't go a day on 6.12 without it doing this exact sort of thing. I have to do a hard reboot, then it works for a little while again. I've tried each version of 6.12 and always end up back on 6.11.5, where I have no issues.
flyize Posted September 27, 2023
52 minutes ago, shaunvis said: I'm assuming you're on 6.12, correct? If so, try 6.11.5. Lots of people, myself included, can't go a day on 6.12 without it doing this exact sort of thing. I have to do a hard reboot, then it works for a little while again. I've tried each version of 6.12 and always end up back on 6.11.5, where I have no issues.
I'm kinda out of ideas. It had been running fine for weeks until this.
shaunvis Posted September 27, 2023
23 minutes ago, flyize said: I'm kinda out of ideas. It had been running fine for weeks until this.
Try downgrading to 6.11.5. It's the last version that runs fine for many people.
flyize Posted September 27, 2023
Yep, I'm running it now. Hopefully I can report back tomorrow that it's still up and running. I still think it has to be a memory leak somewhere causing the OOM killer to kill everything. That would explain the one time I was able to get into Pi-hole and see CPU and memory maxed, and why the Home Assistant VM was sometimes still available. And *every* time, I could ping it.