PowerEdge R730xd

Patmanduu, posted February 11:

Sometime during the night, my otherwise rock-solid server lost network connectivity. I can currently only access it through the physical terminal. I can't access the Web GUI remotely, the network shares, or SSH.

Things I've tried:
- Power cycled several times.
- Booted in Safe Mode (no GUI, no plugins). Still couldn't access the Web GUI remotely. GUI mode has never worked for me on this box (black screen).
- From the server, I can ping Google but not the gateway or other devices on the network (shows "destination host unreachable ping: sendmsg: destination address required"). I cannot ping the server from other devices on the network either.
- The router has the server's IP address statically mapped, and I see no signs of an address conflict.
- ifconfig does not show an IP address assigned to eth0; my static address is assigned to br0. I never looked in here before, so I can't say whether this is unusual.
- The array appears to be running normally, as I can browse files from the CLI.
- Tried stopping the array (and thus the Docker service) from the CLI and restarting the network service. No luck there.
- Dell's integrated remote access controller (iDRAC) shows no issues.
- MyServers shows the server is offline.

I have it dumping syslog to another device running rsyslog (a Pi). Around the time it stopped transmitting, the server was auto-updating Docker containers and plugins. No errors were reported. The only network-related updates I noticed are swag and dynamix.my.servers. The NerdTools plugin also updated (this is just to control the Dell fans).

Basically, everything is fine except that I cannot access the server. I am at a loss. Any guidance is appreciated. Happy to provide a more recent syslog, but I forget how to dump it to the flash drive.
JorgeB, posted February 11:

You can post the diagnostics to see if there's something visible there.
Patmanduu (Author), posted February 11:

@JorgeB Thanks. spring-diagnostics-20240211-1602.zip
Patmanduu (Author), posted February 12:

I feel like eth0 should have an IP here? The server's static IP should be 192.168.1.201.
JorgeB, posted February 12:

Type use_ssl no and then see if you can access the server using the IP: http://192.168.1.201
Patmanduu (Author), posted February 12:

Tried, no luck.

I also tried unassigning the static IP mapping on my router, just to definitively rule out a conflict. DHCP gave the server a new address, but the result was the same (no connection), so I reverted that change.

I still think it's odd that I can't ping the router/gateway from the server.

Thanks for the help, BTW!
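The symptom described above (the internet is reachable, the gateway and LAN hosts are not) is consistent with LAN traffic being steered to a different interface by a more specific route. The thread never confirms that this is what happened, so the following is only a sketch of how longest-prefix route selection works. The routing table, the interface name wg0, and the function name pick_route are made up for illustration and are not taken from the poster's diagnostics.

```python
import ipaddress


def pick_route(dest, routes):
    """Return the (network, interface) pair that wins longest-prefix match.

    Simplified model: real Linux routing also considers metrics,
    policy rules, and multiple routing tables.
    """
    addr = ipaddress.ip_address(dest)
    best = None
    for cidr, dev in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best


# Hypothetical routing table (an assumption, not the poster's real table).
ROUTES = [
    ("0.0.0.0/0", "br0"),       # default route via the LAN bridge
    ("192.168.1.0/24", "br0"),  # LAN subnet
    ("192.168.1.0/25", "wg0"),  # assumed overlapping tunnel route
]

for target in ("192.168.1.1", "192.168.1.201", "8.8.8.8"):
    net, dev = pick_route(target, ROUTES)
    print(f"{target:>15} -> {dev} (matched {net})")
```

In this made-up table, 192.168.1.1 is handed to wg0 because the /25 is more specific than the /24, while 8.8.8.8 still matches only the default route on br0. On a real system, the actual table would need to be inspected before drawing any conclusion.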
JorgeB, posted February 12:

Try booting with a new flash drive using a stock Unraid install (no key needed). That will confirm whether the problem is config related.
Patmanduu (Author), posted February 12:

Okay, did that. The new flash drive boots fine into 6.12.6, and I can access the Web GUI at the usual address (192.168.1.201); I got the prompt to set a new root password. I suppose that rules out any hardware or network issues, but now what?
JorgeB, posted February 12 (marked as Solution):

You can back up the current flash drive, then redo it and restore just the bare minimum: the key, super.dat, and the pools folder for the assignments. Also copy the docker user templates folder. If all works, you can then either reconfigure the server or try restoring a few config files at a time from the backup to see if you can find the culprit.
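Restoring "a few config files at a time" can be organized as a bisection, which keeps the number of reboots low. The sketch below assumes that exactly one entry is responsible and that entries do not interact; the function name find_culprit and the entry names in the demo are illustrative. In practice, the breaks check would be a manual step: restore the subset, reboot, and see whether the Web GUI is reachable.

```python
def find_culprit(items, breaks):
    """Narrow a list of config entries down to the one that breaks the server.

    `breaks(subset)` must return True if restoring `subset` reproduces the
    fault. Assumes exactly one entry is at fault and entries are independent.
    """
    candidates = list(items)
    if not candidates:
        raise ValueError("need at least one candidate entry")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half contains the culprit.
        candidates = half if breaks(half) else candidates[len(half):]
    return candidates[0]


# Illustrative entry names; a real list would come from the flash backup.
ENTRIES = ["network.cfg", "ident.cfg", "share.cfg", "docker.cfg",
           "wireguard", "plugins", "shares"]


def simulated_breaks(subset):
    # Stand-in for "restore, reboot, and check the Web GUI".
    return "wireguard" in subset


print(find_culprit(ENTRIES, simulated_breaks))
```

Each round halves the candidate list, so the number of restore-and-reboot cycles grows roughly with the logarithm of the number of entries rather than linearly.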
Patmanduu (Author), posted February 12 (edited February 12):

I did the following:
- Downloaded 6.12.4 (same as the original version).
- Backed up the flash disk (copy and paste).
- Re-flashed using the manual install method (Rufus; the UNRAID USB Creator has never worked for me with SanDisk USB drives).
- Booted as a test. Got a kernel panic.
- Repeated with a 2nd USB drive, so there could be something funky with 6.12.4.
- Repeated all of the above using 6.12.6; this time it booted successfully.
- Removed the USB drive and copied over:
  config/pro.key
  config/pools
  config/super.dat
  config/plugins/dockerMan

This boots, and I can access the Web GUI.

What now? Should I just restore files and folders from the backup at random and re-test? Is there a streamlined way of doing this? Should I be worried about copying config files from 6.12.4 into 6.12.6?
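The minimal restore described above can be scripted so that it is repeatable. This is a sketch only: the four relative paths are the ones listed in the post, while the function name restore and the backup and flash locations passed to it are assumptions.

```python
import shutil
from pathlib import Path

# The four entries the poster copied back, relative to the flash root.
MINIMAL = [
    "config/pro.key",
    "config/pools",
    "config/super.dat",
    "config/plugins/dockerMan",
]


def restore(backup_root, flash_root, relative_paths):
    """Copy selected files and directories from a backup onto a fresh flash.

    Returns two lists: the entries that were restored and the entries
    that were not found in the backup.
    """
    restored, missing = [], []
    for rel in relative_paths:
        src = Path(backup_root) / rel
        dst = Path(flash_root) / rel
        if not src.exists():
            missing.append(rel)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            # dirs_exist_ok requires Python 3.8 or newer.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
        restored.append(rel)
    return restored, missing
```

A call such as restore("/path/to/backup", "/path/to/flash", MINIMAL), with both paths replaced by the real mount points, returns the restored and missing entries so that nothing is skipped silently.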
trurl, posted February 12:

Patmanduu said: "Should I be worried about copying config files from 6.12.4 into 6.12.6."

That shouldn't be a problem. Most things except plugins should probably be fine. All the rest is just .cfg files, which hold settings from the webUI. They are all text, so you can examine them. If any look corrupt, don't copy them. If you had anything in the extra folder on the flash drive, leave that out too.
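Since the .cfg files are plain text, a rough first pass for corruption can be automated before reading them by eye. The check below is a heuristic and an assumption on my part (it flags files that contain NUL bytes or are not valid UTF-8); a file that passes can still hold bad settings. The function name suspicious_cfg_files is illustrative.

```python
from pathlib import Path


def suspicious_cfg_files(config_dir):
    """Return .cfg files under config_dir that do not look like plain text.

    Heuristic only: a file is flagged if it contains a NUL byte or
    cannot be decoded as UTF-8.
    """
    flagged = []
    for path in sorted(Path(config_dir).rglob("*.cfg")):
        data = path.read_bytes()
        try:
            data.decode("utf-8")
            bad = b"\x00" in data
        except UnicodeDecodeError:
            bad = True
        if bad:
            flagged.append(path)
    return flagged
```

Any file this flags is worth opening in a text editor before deciding whether to copy it back.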
Patmanduu (Author), posted February 12 (edited February 12):

WireGuard was the culprit!

After a lot of semi-systematic trial and error, I concluded that restoring the original /config/wireguard directory breaks the UNRAID Web GUI. To confirm, I restored all the original /config directories and files except /config/wireguard (left that blank). The system now boots fine and I can get into the Web GUI. I can also now ping the gateway and other clients on the LAN.

Thanks @trurl for the note. Since everything is working, I think I will leave it like this. The system claims to be on 6.12.6; I expect that is because I did not replace anything in the root of the flash drive, only the config directory (minus wireguard). I can do without WireGuard.

As to the root cause, I am not sure. I had been having problems connecting remotely with WireGuard for the last year. The issues were limited to connecting iOS devices. I fixed the issue for a time (there is a thread on this somewhere) by updating a setting in CloudFlare (the DNS+CA my VPN uses), but it broke again a month or so later. The most recent config change I made to fix the iOS issue was to change the Access Type on a number of the peers. That was a few weeks ago, and I probably had not rebooted since. I have no idea what caused an issue now.

I don't know whether WireGuard updates automatically along with the other plugins now that it's integrated. I will have to check. I will probably move to a monthly manual update model for the whole server going forward, and take backups of the flash drive before and after.

Thanks for all the support, @JorgeB. You guys are awesome as always!
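The confirming step above (restore everything under /config except /config/wireguard) could be scripted roughly as follows. This is a sketch under assumptions: the function name restore_config_except is made up, the source and destination paths are whatever the backup and flash mount points happen to be, and only top-level entries of the config directory are excluded, so a nested file or folder that happens to share the name is still copied.

```python
import os
import shutil


def restore_config_except(backup_config, flash_config, excluded=("wireguard",)):
    """Copy a backed-up config directory, skipping named top-level entries."""
    root = os.path.abspath(backup_config)

    def ignore(directory, names):
        # Only filter entries directly under the config root.
        if os.path.abspath(directory) == root:
            return [name for name in names if name in excluded]
        return []

    # dirs_exist_ok requires Python 3.8 or newer.
    shutil.copytree(backup_config, flash_config, ignore=ignore,
                    dirs_exist_ok=True)
```

Note that the excluded directory is not created at all on the destination, which differs slightly from leaving an empty wireguard folder in place as the post describes.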