JorgeB Posted March 14

That still doesn't look like an Unraid problem to me, but reboot in safe mode to rule out any plugin issues.
MAM59 Posted March 14 (edited)

These messages come from a clock that has drifted far from the real time. You need to stop the ntpd daemon (it won't sync if the current clock is more than a certain limit away from the real time), then issue "ntpdate <server>" (you can use any known NTP server, such as time.windows.com). ntpdate will sync to the received time without any limit. Afterwards you can restart ntpd.

(If this happens again soon after, check the battery on your motherboard.)
Astatine (Author) Posted March 14

4 hours ago, MAM59 said:
> These messages come from a clock that has drifted far from the real time. You need to stop the ntpd daemon (it won't sync if the current clock is more than a certain limit away from the real time), then issue "ntpdate <server>" (you can use any known NTP server, such as time.windows.com). ntpdate will sync to the received time without any limit. Afterwards you can restart ntpd. (If this happens again soon after, check the battery on your motherboard.)

I tried this. I believe the ntp errors appear because the ntp daemon isn't able to connect to the remote "clock" due to a DNS issue.
MAM59 Posted March 14 (edited)

5 minutes ago, Astatine said:
> I believe the ntp errors appear because the ntp daemon isn't able to connect to the remote "clock" due to a DNS issue.

Yeah, but those "a message from the past..." entries mean that the clock is far off. Of course, without LAN you cannot sync, but the error happened before that. There is still no clue in your logs about what is going on.

I meant you should sync the clock with ntpdate BEFORE the LAN is gone. Reboot, stop ntpd, use ntpdate, restart ntpd and wait. We want to be sure that the box has the correct time when the error occurs.
Astatine (Author) Posted March 14

8 minutes ago, MAM59 said:
> Yeah, but those "a message from the past..." entries mean that the clock is far off. Of course, without LAN you cannot sync, but the error happened before that. There is still no clue in your logs about what is going on. I meant you should sync the clock with ntpdate BEFORE the LAN is gone. Reboot, stop ntpd, use ntpdate, restart ntpd and wait. We want to be sure that the box has the correct time when the error occurs.

Okay. I'm checking what JorgeB suggested. Booted in safe mode. I'll observe the server for a while in safe mode and after ntpdate. Just bear with my stupid questions. I just want to understand as much as I can in case something like this happens in the future.
MAM59 Posted March 14

1 minute ago, Astatine said:
> as I can in case something like this happens in the future.

So far nobody has an idea what is going on and why...
Astatine (Author) Posted March 14 (edited)

@JorgeB Okay, so I waited about an hour after booting into safe mode. The server had lost connection. I guess we can rule out a bad plugin as the cause of the issue.

@MAM59 I rebooted and used the following commands to resync ntp:

/etc/rc.d/rc.ntpd stop
ntpdate time.cloudflare.com
/etc/rc.d/rc.ntpd restart

Right after restarting the service, I checked the logs and found this:

Mar 14 16:33:36 astatine ntpd[1153]: ntpd exiting on signal 1 (Hangup)
Mar 14 16:33:36 astatine ntpd[1153]: 127.127.1.0 local addr 127.0.0.1 -> <null>
Mar 14 16:33:36 astatine ntpd[1153]: 216.239.35.0 local addr 192.168.1.4 -> <null>
Mar 14 16:33:36 astatine ntpd[1153]: 216.239.35.4 local addr 192.168.1.4 -> <null>
Mar 14 16:33:36 astatine ntpd[1153]: 216.239.35.8 local addr 192.168.1.4 -> <null>
Mar 14 16:33:36 astatine ntpd[1153]: 216.239.35.12 local addr 192.168.1.4 -> <null>
Mar 14 16:35:02 astatine ntpd[18956]: ntpd 4.2.8p15@1.3728-o Fri Jun 3 04:17:10 UTC 2022 (1): Starting
Mar 14 16:35:02 astatine ntpd[18956]: Command line: /usr/sbin/ntpd -g -u ntp:ntp
Mar 14 16:35:02 astatine ntpd[18956]: ----------------------------------------------------
Mar 14 16:35:02 astatine ntpd[18956]: ntp-4 is maintained by Network Time Foundation,
Mar 14 16:35:02 astatine ntpd[18956]: Inc. (NTF), a non-profit 501(c)(3) public-benefit
Mar 14 16:35:02 astatine ntpd[18956]: corporation. Support and training for ntp-4 are
Mar 14 16:35:02 astatine ntpd[18956]: available at https://www.nwtime.org/support
Mar 14 16:35:02 astatine ntpd[18956]: ----------------------------------------------------
Mar 14 16:35:02 astatine ntpd[18960]: proto: precision = 0.034 usec (-25)
Mar 14 16:35:02 astatine ntpd[18960]: basedate set to 2022-05-22
Mar 14 16:35:02 astatine ntpd[18960]: gps base set to 2022-05-22 (week 2211)
Mar 14 16:35:02 astatine ntpd[18960]: Listen normally on 0 lo 127.0.0.1:123
Mar 14 16:35:02 astatine ntpd[18960]: Listen normally on 1 br0 192.168.1.4:123
Mar 14 16:35:02 astatine ntpd[18960]: Listen normally on 2 lo [::1]:123
Mar 14 16:35:02 astatine ntpd[18960]: Listening on routing socket on fd #19 for interface updates
Mar 14 16:35:02 astatine ntpd[18960]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Mar 14 16:35:02 astatine ntpd[18960]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
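
The three commands in the post above can be wrapped in a small helper so the sequence is repeatable after each reboot. This is only a minimal sketch, not an official Unraid tool: the function names `run` and `resync_clock` and the variables `DRY_RUN` and `NTP_SERVER` are made up for illustration, and only the three commands themselves come from the thread. By default it just prints what it would run.

```shell
#!/bin/sh
# Sketch of the manual clock resync sequence quoted in the thread.
# Assumption: rc.ntpd lives at /etc/rc.d/rc.ntpd, as in the post above.
# DRY_RUN=1 (the default here) only prints the commands.
NTP_SERVER="${NTP_SERVER:-time.cloudflare.com}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Stop ntpd, force the clock with ntpdate, then bring ntpd back.
resync_clock() {
  run /etc/rc.d/rc.ntpd stop
  run ntpdate "$NTP_SERVER"
  run /etc/rc.d/rc.ntpd restart
}
```

Calling `resync_clock` prints the three steps; setting `DRY_RUN=0` before calling it would execute them for real, which needs root on the server.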
Astatine (Author) Posted March 15 (edited)

So, I was looking online and found that sometimes the 'Clock Unsync' issue is caused by an incorrect BIOS time. So, I checked that and, lo and behold, the time is correct. As expected, the time is in UTC and it is correct. Which means we can rule out a faulty CMOS battery, right @MAM59?

Anyway, the good thing is that after my recent reboot I am no longer seeing any clock out of sync errors. However, internet connection remains
trott Posted March 15

You can boot with an Ubuntu live USB and see if everything is OK. If not, then you might start to look at the whole network setup and at hardware issues.
Astatine (Author) Posted March 15

1 minute ago, trott said:
> You can boot with an Ubuntu live USB and see if everything is OK. If not, then you might start to look at the whole network setup and at hardware issues.

Great idea. Gonna do this ASAP. Thanks!!!
MAM59 Posted March 15 (edited)

6 hours ago, Astatine said:
> rule out a faulty CMOS battery, right @MAM59?

Yeah, but I think the manual time correction was necessary, because ntp alone would never have reached the real time. It only adjusts in minimal steps and calculates a correction factor for a (believed) unreliable CMOS clock, so it can take ages until it works. ntpdate (once should be enough if the battery is OK) forces an update and stores the received time in the CMOS clock. On the next boot, ntp should find an "almost correct" clock and keep it in sync with its mini-adjustments.

(But don't expect too much. The time problem is not what you are searching for; it is just an additional problem that came up because of your lack of internet connection. Anyway, fix it before it gets in your way too much.)
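
The difference between small adjustments and a forced update can be illustrated with a rough decision function. This is a sketch under stated assumptions, not ntpd's actual code: it assumes the default thresholds documented for ntpd (offsets up to 128 ms are slewed gradually, larger ones are stepped, and offsets beyond 1000 s make ntpd give up unless it was started with `-g`, which lifts that limit for the first adjustment). The function name `ntpd_action` is hypothetical. Note that the command line in the log earlier in the thread does include `-g`.

```shell
# ntpd_action OFFSET_SECONDS [G_FLAG]
# Prints what ntpd would be expected to do for a given clock offset.
# Assumed defaults: step threshold 0.128 s, panic threshold 1000 s.
# G_FLAG=1 models "ntpd -g" (panic limit ignored for the first correction).
ntpd_action() {
  awk -v off="$1" -v g="${2:-0}" 'BEGIN {
    if (off < 0) off = -off            # only the magnitude matters here
    if (off <= 0.128)                print "slew"   # gradual adjustment
    else if (off <= 1000 || g == 1)  print "step"   # set the clock at once
    else                             print "panic"  # ntpd exits, manual fix needed
  }'
}
```

For example, `ntpd_action 0.05` falls in the slew range, while `ntpd_action 5000` without the second argument is the case where a manual `ntpdate` run is the way out.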
Astatine (Author) Posted March 15

Yeah, totally agreed. The time sync issue seems to have occurred due to the loss of connection and not the other way around. However, is there something else that comes to mind for diagnosing the real issue? I'm going to try booting into a live Ubuntu install later in the evening and observe whether something similar happens there too.

In the meantime, I was also wondering if there is a way to do a fresh install of unRAID while keeping my config and settings intact. That way, if there is a corrupt file somewhere, it would be fixed. Thoughts?
itimpi Posted March 15

31 minutes ago, Astatine said:
> I was also wondering if there is a way to do a fresh install of unRAID while keeping my config and settings intact. That way, if there is a corrupt file somewhere, it would be fixed.

The core Unraid files are unpacked into RAM from the archives on the flash drive every time Unraid boots, so in that sense they should not be corrupt. It is possible that something in the config folder, which holds all your settings and customisations, is corrupt, but if you do not keep that folder you have to redo your settings. You can simply stop plugins from being loaded by booting in Safe Mode, but any other change would require you to do something to your settings.
MAM59 Posted March 15

49 minutes ago, Astatine said:
> is there something else that comes to mind for diagnosing the real issue?

Reading your diagnostics gives absolutely no clue about what is going on. Even the "loss of LAN" is not recognized, so I would go with JorgeB ("the error is outside of Unraid"). If the link to the switch were lost, it would be noted, and if you say "the others" can still communicate, it's also not the link to the router.

The only faint (VERY FAINT!) idea that comes to my mind is a blocking port. Try to disable "flow control" in the network settings (you need to use the "tips & tricks" plugin for this).

Another test you can do (and report the result here) is to wait until communication is gone. Then pull the ethernet plug, wait a few seconds, and put it back in again. I'm curious to see if a brutal line cut will reset the error state...
Astatine (Author) Posted March 20

On 3/15/2023 at 3:38 PM, itimpi said:
> The core Unraid files are unpacked into RAM from the archives on the flash drive every time Unraid boots, so in that sense they should not be corrupt. It is possible that something in the config folder, which holds all your settings and customisations, is corrupt, but if you do not keep that folder you have to redo your settings. You can simply stop plugins from being loaded by booting in Safe Mode, but any other change would require you to do something to your settings.

I am open to setting up everything from scratch. How can I have a fresh start without using the limited USB key resets? I believe all the parity data, appdata, etc. will remain the same in the new install so long as I assign each drive back to the same purpose. As for the Docker containers, I am willing to install them again and point the new containers to the old appdata to bring the server back to my latest state. Is there a way to accomplish the above?
Astatine (Author) Posted March 20

On 3/15/2023 at 4:00 PM, MAM59 said:
> Reading your diagnostics gives absolutely no clue about what is going on. Even the "loss of LAN" is not recognized, so I would go with JorgeB ("the error is outside of Unraid"). If the link to the switch were lost, it would be noted, and if you say "the others" can still communicate, it's also not the link to the router. The only faint (VERY FAINT!) idea that comes to my mind is a blocking port. Try to disable "flow control" in the network settings (you need to use the "tips & tricks" plugin for this). Another test you can do (and report the result here) is to wait until communication is gone. Then pull the ethernet plug, wait a few seconds, and put it back in again. I'm curious to see if a brutal line cut will reset the error state...

Well, with all due respect, I find it hard to believe that it is not an unRAID issue. I booted the server from a live Ubuntu USB and left it pinging google.com, cloudflare.com and unraid.net for more than a day. The packet loss is as follows: 0% for google.com, 0.00383547% for unraid.net and 0.00153467% for cloudflare.com. So, I believe this rules out the possibility of a hardware issue.

- Not a router issue, as all my other devices are working just fine connecting to the outside world
- Not a switch issue, as all the other devices connected to the switch are working fine, and the unRAID server also gets a connection for 10-15 minutes after reboot and then loses DNS
- Not a hardware issue, as booting into a live Ubuntu USB doesn't show the same DNS resolution issue as the unRAID server

With this new information in hand, I can confidently say that it is an unRAID issue. Whether it originates from the OS or a corrupted config file is a different story.

Attachments: ping_cloudflare.txt, ping_google.txt, ping_unraid.txt
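
Loss figures like the ones above follow directly from the transmitted and received counts in ping's summary line. The following is a minimal sketch of that arithmetic; the function names `loss_percent` and `loss_from_summary` are hypothetical, the summary format is assumed to be the one printed by Linux iputils ping, and the sample line in the comment is invented rather than taken from the attached files.

```shell
# loss_percent TRANSMITTED RECEIVED
# Prints the share of lost packets as a percentage with six decimals.
loss_percent() {
  awk -v tx="$1" -v rx="$2" 'BEGIN {
    if (tx <= 0) { print "n/a"; exit }
    printf "%.6f\n", (tx - rx) * 100 / tx
  }'
}

# loss_from_summary "SUMMARY LINE"
# Extracts both counts from a ping summary line such as (invented example):
#   "1000 packets transmitted, 999 received, 0.1% packet loss, time 999ms"
# Returns 1 if the line does not match the assumed format.
loss_from_summary() {
  set -- $(printf '%s\n' "$1" |
    sed -n 's/^\([0-9][0-9]*\) packets transmitted, \([0-9][0-9]*\) received.*/\1 \2/p')
  [ "$#" -eq 2 ] || return 1
  loss_percent "$1" "$2"
}
```

Running the same calculation on the summaries from the live Ubuntu session and from an Unraid session would give directly comparable numbers.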
JorgeB Posted March 20

Set up a new stock Unraid flash drive (no key needed), boot the server with it and test. If there are no issues, it suggests a problem with the current /config. You can then try to copy just the bare minimum from the old flash drive and reconfigure the rest of the server, or copy a few config files at a time and re-test to see if you can find the culprit.
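
The "copy a few config files at a time" approach can be sketched as a small helper that copies one named batch and reports what it did, so each batch can be followed by a test period. This is an illustration only: the function name `copy_batch` is made up, it handles plain files only, and the paths and file names in the usage note are assumptions that need to be adapted to the actual flash drives.

```shell
# copy_batch SRC_DIR DST_DIR FILE...
# Copies the listed files (paths relative to SRC_DIR) into DST_DIR and
# prints one line per file, so it is clear what went into each batch.
copy_batch() {
  src="$1"; dst="$2"; shift 2
  mkdir -p "$dst" || return 1
  for f in "$@"; do
    if [ -f "$src/$f" ]; then
      # Recreate the subdirectory, if any, before copying the file.
      mkdir -p "$dst/$(dirname "$f")" && cp "$src/$f" "$dst/$f" && echo "copied $f"
    else
      echo "skipped $f (not a file in $src)"
    fi
  done
}
```

A call might look like `copy_batch /mnt/old_flash/config /boot/config network.cfg ident.cfg`, where the mount point of the old drive and the choice of files are hypothetical. If the problem returns after a batch, the culprit is among the files in that batch.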
Astatine (Author) Posted March 20

1 minute ago, JorgeB said:
> Set up a new stock Unraid flash drive (no key needed), boot the server with it and test. If there are no issues, it suggests a problem with the current /config. You can then try to copy just the bare minimum from the old flash drive and reconfigure the rest of the server, or copy a few config files at a time and re-test to see if you can find the culprit.

Thanks. I'll get back to you once I've tried it.
autumnwalker Posted March 21

I appear to be having the same symptoms. I'm not getting clock sync errors or anything. I can hit anything on my local LAN using an IP or a DNS entry (using the pfSense DNS resolver), but for anything that my local DNS cannot resolve (e.g. google.com) I get "Destination Host Unreachable".

What's odd is that I tried the same ping tests from inside a VM running on Unraid while Unraid was having these issues, and the VM had no network problems at all. The VM is configured to use the same pfSense DNS resolver.

I wasn't having this issue until after I upgraded to 6.11.5. I was previously on 6.10.3.
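
When symptoms like these are described as "losing DNS", it can help to test reachability of a literal IP separately from name resolution. The sketch below does that in two steps; the function names `classify_connectivity` and `probe` are hypothetical, and the `ping` options assume the Linux iputils version. The classification is kept separate from the probes so the logic can be checked without a network.

```shell
# classify_connectivity IP_STATUS DNS_STATUS
# Each argument is the exit status of a probe (0 means success).
classify_connectivity() {
  if [ "$1" -ne 0 ]; then
    echo "no route: even a literal IP is unreachable"
  elif [ "$2" -ne 0 ]; then
    echo "dns only: IP traffic works, name lookup fails"
  else
    echo "ok"
  fi
}

# probe IP HOSTNAME
# Pings the IP once (2 s timeout), resolves the hostname via the system
# resolver, and passes both results to classify_connectivity.
probe() {
  ping -c 1 -W 2 "$1" >/dev/null 2>&1; ip_status=$?
  getent hosts "$2" >/dev/null 2>&1; dns_status=$?
  classify_connectivity "$ip_status" "$dns_status"
}
```

For instance, `probe 1.1.1.1 google.com` run on the Unraid host and again inside the VM would show whether the host has lost its route to the outside or only its name resolution.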
MAM59 Posted March 22 (edited)

6 hours ago, autumnwalker said:
> (using the pfSense DNS resolver)

You should check your firewall rules. It is likely that you forgot to allow access from the VM net, so that everybody but the VM host can use it.
autumnwalker Posted March 22

4 hours ago, MAM59 said:
> You should check your firewall rules. It is likely that you forgot to allow access from the VM net, so that everybody but the VM host can use it.

Thanks! Same network. Also, the network topology has not changed. This was not an issue until upgrading to 6.11.5.
autumnwalker Posted March 22 (edited)

@Astatine are you using ipvlan + "Host access to custom networks"? I wonder if we're bumping into this.

More info: https://forums.unraid.net/bug-reports/stable-releases/6100-6102-network-lost-after-some-days-working-r1971/page/2/?tab=comments#comment-19909
Astatine (Author) Posted March 24 (edited)

@autumnwalker Yes, I am using ipvlan + "Host access to custom networks" because I want the prowlarr and jackett containers on the host IP to communicate with some containers that are on a different IP, set up using the br0 interface. I'll turn off "Host access to custom networks" and see if the issue gets fixed like it did for the folks you mentioned.
Astatine (Author) Posted March 24

On 3/20/2023 at 1:15 PM, JorgeB said:
> Set up a new stock Unraid flash drive (no key needed), boot the server with it and test. If there are no issues, it suggests a problem with the current /config. You can then try to copy just the bare minimum from the old flash drive and reconfigure the rest of the server, or copy a few config files at a time and re-test to see if you can find the culprit.

@JorgeB So, I did what you asked, and it turns out the test unRAID flash drive has no issues at all. After going through what @autumnwalker mentioned above, I feel like it's a known issue with how Docker handles networking that hasn't been fixed yet. I'll test this theory before anything else.
autumnwalker Posted March 24 (Solution)

To add to this - I kept "host access to custom networks" on and went back to macvlan, and all my problems went away. I do not have crashing issues with macvlan (luckily).