Posts posted by Astatine

  1. Quote

    If you want to continue using macvlan then you need to disable bridging on eth0 as mentioned in the Release Notes.

    Understood. I'll go through the release notes to understand the new way to do bridging. 

     

    Anyway, what should I do with the potentially corrupt config? My server renamed itself to 'Tower'; the ident.cfg file lists the name as 'Tower'. I don't know what else might have been reset back to defaults. Do I need to start with 'New Config'?

  2. Quote

    Macvlan call traces will usually end up crashing the server; switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), then reboot.

    I previously had issues with docker containers (thread linked), which were solved by switching to macvlan.

     

    If I go back to that, can you suggest how to avoid the DNS issue again?


    I'll try switching back to ipvlan this evening and see if it improves things.

  3. So, I was moving houses and couldn't pay attention to my UNRAID server for quite some time. I have since replaced a corrupt USB drive and transferred my key to a new USB.
    It was working properly for some time, but soon after that I started noticing that the server would "freeze": the docker services would be running, but I couldn't access the server using the IP.

    I also noticed that during boot there is a list of numbers (my guess: a memory block list); see the attached screenshot.

    I don't understand what this is. Then the situation worsened. Now docker containers would just randomly stop working, and when I tried to restart them, I would receive a popup saying either "403" or "Execution error". I then had to restart either the docker service or the whole server to get them working again. My guess was that the docker file is corrupted, but I still couldn't act on it due to life.

    (screenshot attached)

     

    Then the whole server started freezing. I couldn't access it over the IP, same for the docker containers, until I restarted the machine, which resulted in a ton of unclean shutdowns.

     

    And as of yesterday, after a restart, the docker containers only work for 5-10 minutes, and my unraid machine name went back to "Tower", which according to this could be a config issue: Server name changed from "N2" to "Tower" - General Support - Unraid

    Also, I am no longer able to access the server over https://[IP]. http://[IP] shows the login screen, but after I enter the credentials it doesn't take me in; it just keeps redirecting back to the login screen.

     

    I'm convinced that something is seriously off in /config on the USB.

     

    Can someone help me track down the issue and fix it? Or is it better to start from scratch?

    If so, when I do 'New Config' and assign the devices to their current purposes, will my data be retained, both array and cache? I understand that New Config will definitely rebuild the parity.

     

    I'll try to get the diagnostics report from the server directly and post it here soon.

    Here are the diagnostics: tower-diagnostics-20240119-1839.zip


    Also, when I clicked 'Download' on the diagnostics page, I saw the following error on the server CLI screen:

    (photo of the error on the CLI screen)

     

    Please help. It is getting very frustrating to deal with!

  4. 1 minute ago, trurl said:

    Are you sure you don't have the array started in Maintenance mode?

    Ah, I see. I just restarted in Safe Mode without Maintenance and can see the disks. I guess I turned on Maintenance Mode earlier. I couldn't find anything on Google, hence the post. Sorry for the inconvenience.

  5. Hi everyone, I am running unRAID 6.12.6 and after a reboot just now, none of the drives are showing up in /mnt.

    I can see the drives in the 'Main' tab but cannot see the "drive explorer" button for them.


    (screenshot of Main tab)

     

    But I cannot see any shares or disk mounts in the /mnt folder.

    (screenshot of the /mnt folder)

     

    Please help me figure out why I cannot see any of the disks, cache or otherwise. The SMART reports look good to me, but maybe I am wrong.

     

    The diagnostics report is attached to the post: astatine-diagnostics-20231216-2241.zip

    Thank you in advance!

     

  6. Just had a weird thing happen to me. I pulled up the UNRAID web UI, everything working as usual. I went to the Apps tab and the page was upside down. The banner/header was still in the right orientation. I switched between tabs, no effect. I restarted the browser, went back to the Apps tab, and saw a popup asking me 'Yes' or 'No' about continuing the International Bat Appreciation Day celebration. The popup dialog had a link to this webpage: https://nationaltoday.com/international-bat-appreciation-day/

     

    Has anyone else seen this?

  7. 21 hours ago, autumnwalker said:

    To add to this - I kept "host access to custom networks" on and went back to macvlan and all my problems went away. I do not have crashing issues with macvlan (luckily).

    Well, look at that. Changing 'ipvlan' to 'macvlan' fixed the issue. I kept all the other settings as is. Now I am more curious as to why ipvlan had that issue, and why it didn't occur for the first few minutes after reboot.

  8. On 3/20/2023 at 1:15 PM, JorgeB said:

    Set up a new stock Unraid flash drive (no key needed), boot the server with it and test. If there are no issues, it suggests a problem with the current /config; you can then try to copy just the bare minimum from the old flash drive and reconfigure the rest of the server, or copy a few config files at a time and re-test to see if you can find the culprit.

    @JorgeB So, I did what you asked, and it turns out the test unRAID flash drive has zero issues. And after going through what @autumnwalker mentioned above, I feel like it's a known issue with how docker handles networking that hasn't been fixed yet. I'll test this theory before anything else.

  9. @autumnwalker Yes, I am using ipvlan + host access to custom networks because I want the prowlarr and jackett containers on the host IP to communicate with some containers that are on a different IP set up using the br0 interface. I'll turn off "host access to custom networks" and see if the issue gets fixed like it did for the folks you mentioned.

     

  10. 1 minute ago, JorgeB said:

    Set up a new stock Unraid flash drive (no key needed), boot the server with it and test. If there are no issues, it suggests a problem with the current /config; you can then try to copy just the bare minimum from the old flash drive and reconfigure the rest of the server, or copy a few config files at a time and re-test to see if you can find the culprit.

    Thanks. I'll get back to you once I've tried it.

  11. On 3/15/2023 at 4:00 PM, MAM59 said:

    Reading your diagnostics gives absolutely no clue about what is going on. Even the "loss of LAN" is not recognized, so I would go with JorgeB ("the error is outside of unraid").

    If the link to the switch were lost, it would be noted; and since you say "the others" can still communicate, it's also not the link to the router.

    The only faint (VERY FAINT!) idea that comes to my mind is a blocking port. Try disabling "flow control" in the network settings (you need to use the "tips & tricks" plugin for this).

     

    But another test you can do (and report the result here) is to wait, until communication is gone. Then pull the ethernet plug, wait a few seconds, then put it back in again.

    I'm curious to see if a brutal line cut will reset the error state...

     

    Well, with all due respect, I find it hard to believe that it is not an unRAID issue. I booted the server into a live Ubuntu USB and left it pinging google.com, cloudflare.com & unraid.net for more than a day, and the packet loss is as follows: 0% for google.com, 0.00383547% for unraid.net & 0.00153467% for cloudflare.com. So, I believe this rules out the possibility of a hardware issue.

     

    - Not a router issue, as all my other devices connect to the outside world just fine

    - Not a switch issue, as all the other devices connected to the switch are working fine; plus, the unRAID server also gets a connection for 10-15 minutes after reboot and then loses DNS

    - Not a hardware issue, as booting into a live Ubuntu USB doesn't show the same DNS resolution issue as the unRAID server

     

    With this new information in hand, I can confidently say that it is an unRAID issue; whether it originates from the OS or a corrupted config file, that's a different story.

     

    ping_cloudflare.txt ping_google.txt ping_unraid.txt
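    As a sanity check on those figures: ping's packet-loss percentage is just (transmitted - received) / transmitted × 100. A minimal shell sketch of that arithmetic, using made-up counters rather than the actual run's totals:

```shell
# Compute a ping-style packet-loss percentage from transmit/receive counters.
# These counts are hypothetical stand-ins for the day-long run's real totals.
sent=26064
received=26063
loss=$(awk -v s="$sent" -v r="$received" 'BEGIN { printf "%.5f", (s - r) * 100 / s }')
echo "packet loss: ${loss}%"
```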

  12. On 3/15/2023 at 3:38 PM, itimpi said:

    The core Unraid files are unpacked into RAM from the archives on the flash drive every time Unraid boots, so in that sense they should not be corrupt. It is possible that something in the config folder, which holds all your settings/customisations, is corrupt, but if you do not keep that folder you have to redo your settings. You can simply stop plugins being loaded by booting in Safe Mode, but any other change would require you to do something to your settings.

    I am open to setting everything up from scratch. How can I have a fresh start without using up the limited USB key resets? I believe all the parity data, appdata, etc. will remain the same in the new install as long as I assign the correct drives back to the same purposes. As for the docker container installs, I am willing to do them again and point the new containers at the old appdata to bring the server back to my latest state.

     

    Is there a way to accomplish the above?
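    A sketch of the minimal-copy step itself, in case it helps: super.dat (disk assignments) and ident.cfg (server identity) are real file names under Unraid's /boot/config, but the mount points here are hypothetical and the copy is demonstrated on throwaway directories rather than a live flash drive:

```shell
# Copy only the essential config files from the old flash to the new one.
# On a real server, OLD would be the old flash drive's mounted config folder
# and NEW would be /boot/config on the freshly prepared flash drive.
OLD=$(mktemp -d)
NEW=$(mktemp -d)
touch "$OLD/super.dat" "$OLD/ident.cfg"   # stand-ins for the real files

cp "$OLD/super.dat" "$NEW/"   # disk assignments: keeps the array as-is
cp "$OLD/ident.cfg" "$NEW/"   # server name and identity settings
ls "$NEW"
```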

  13. Yeah, totally agreed. The time sync issue seems to have occurred due to the loss of connection and not the other way around. However, is there something else that comes to mind on how to diagnose the real issue? I'm going to try booting into a live Ubuntu install later in the evening and observe whether something similar happens there too. In the meantime, I was also wondering if there is a way to do a fresh install of unRAID while keeping my config and settings intact. That way, if there is a corrupt file somewhere, it would be fixed. Thoughts?

  14. So, I was looking online and found that sometimes the 'Clock Unsync' issue is caused by an incorrect BIOS time. So I checked, and lo and behold, the time is correct; as expected, it is in UTC. Which means we can rule out a faulty CMOS battery, right @MAM59?

     

    Anyway, the good thing is that after my recent reboot I am no longer seeing any clock out of sync errors. However, the internet connection issue remains.

  15. @JorgeB Okay, so I waited about an hour after booting into safe mode. The server had lost connection. I guess we can rule out a bad plugin as the cause of the issue.

    @MAM59 I rebooted and used the following commands to resync NTP:

    /etc/rc.d/rc.ntpd stop
    ntpdate time.cloudflare.com
    /etc/rc.d/rc.ntpd restart

    Right after restarting the service, I checked the logs and found this:

    Mar 14 16:33:36 astatine  ntpd[1153]: ntpd exiting on signal 1 (Hangup)
    Mar 14 16:33:36 astatine  ntpd[1153]: 127.127.1.0 local addr 127.0.0.1 -> <null>
    Mar 14 16:33:36 astatine  ntpd[1153]: 216.239.35.0 local addr 192.168.1.4 -> <null>
    Mar 14 16:33:36 astatine  ntpd[1153]: 216.239.35.4 local addr 192.168.1.4 -> <null>
    Mar 14 16:33:36 astatine  ntpd[1153]: 216.239.35.8 local addr 192.168.1.4 -> <null>
    Mar 14 16:33:36 astatine  ntpd[1153]: 216.239.35.12 local addr 192.168.1.4 -> <null>
    Mar 14 16:35:02 astatine  ntpd[18956]: ntpd 4.2.8p15@1.3728-o Fri Jun  3 04:17:10 UTC 2022 (1): Starting
    Mar 14 16:35:02 astatine  ntpd[18956]: Command line: /usr/sbin/ntpd -g -u ntp:ntp
    Mar 14 16:35:02 astatine  ntpd[18956]: ----------------------------------------------------
    Mar 14 16:35:02 astatine  ntpd[18956]: ntp-4 is maintained by Network Time Foundation,
    Mar 14 16:35:02 astatine  ntpd[18956]: Inc. (NTF), a non-profit 501(c)(3) public-benefit
    Mar 14 16:35:02 astatine  ntpd[18956]: corporation.  Support and training for ntp-4 are
    Mar 14 16:35:02 astatine  ntpd[18956]: available at https://www.nwtime.org/support
    Mar 14 16:35:02 astatine  ntpd[18956]: ----------------------------------------------------
    Mar 14 16:35:02 astatine  ntpd[18960]: proto: precision = 0.034 usec (-25)
    Mar 14 16:35:02 astatine  ntpd[18960]: basedate set to 2022-05-22
    Mar 14 16:35:02 astatine  ntpd[18960]: gps base set to 2022-05-22 (week 2211)
    Mar 14 16:35:02 astatine  ntpd[18960]: Listen normally on 0 lo 127.0.0.1:123
    Mar 14 16:35:02 astatine  ntpd[18960]: Listen normally on 1 br0 192.168.1.4:123
    Mar 14 16:35:02 astatine  ntpd[18960]: Listen normally on 2 lo [::1]:123
    Mar 14 16:35:02 astatine  ntpd[18960]: Listening on routing socket on fd #19 for interface updates
    Mar 14 16:35:02 astatine  ntpd[18960]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
    Mar 14 16:35:02 astatine  ntpd[18960]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

    kernel reports TIME_ERROR: 0x41: Clock Unsynchronized 
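    For what it's worth, that 0x41 status word decodes against the standard Linux adjtimex(2) STA_* bits (0x0001 = STA_PLL, 0x0040 = STA_UNSYNC), so it literally says "PLL enabled, clock unsynchronized". A quick sketch of the decoding:

```shell
# Decode the kernel TIME_ERROR status word using the adjtimex(2) STA_* bit values.
status=$((0x41))
(( status & 0x0001 )) && echo "STA_PLL: phase-locked loop updates enabled"
(( status & 0x0040 )) && echo "STA_UNSYNC: clock is unsynchronized"
```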

  16. 8 minutes ago, MAM59 said:

    Yeah, but those "a message from the past..." errors mean that the clock is far off. Of course, without LAN you cannot sync, but the error happened before that. Still no clue in your logs about what is going on.

     

    I meant you should sync the clock with ntpdate BEFORE the LAN is gone. Reboot, stop ntpd, use ntpdate, restart ntpd and wait; we want to be sure that the box has the correct time when the error occurs.

    Okay. I'm checking what JorgeB suggested. Booted into safe mode. I'll observe the server for a while in safe mode and after ntpdate.

     

    Just bear with my stupid questions. I just want to understand as much as I can in case something like this happens in the future.

  17. 4 hours ago, MAM59 said:

    these messages come from a "far away running" clock.

     

    You need to stop the ntpd daemon (it won't sync if the current clock is more than a certain limit away from the real time), then issue "ntpdate <server>" (you can use any known NTP server, like time.windows.com).

    ntpdate will sync to the received time without any limit.

     

    Afterwards you can restart ntpd.

    (if this happens again soon after, check the battery on your motherboard)

     

    I tried this. I believe the NTP errors are occurring because the ntp daemon isn't able to connect to the remote "clock" due to the DNS issue.
    (screenshot of the errors)

  18.  

    @JorgeB @itimpi So, I gave up on the server last night and left it on its own. It turns out that a couple of hours after the network reset yesterday, the server's behavior changed. Now the server is connecting on and off, and I am getting new errors in the logs. I have attached the latest diagnostics to this post. Here's a snippet of the new errors I am seeing.

    Mar 13 11:52:22 astatine  avahi-daemon[23598]: Joining mDNS multicast group on interface vethf5a568d.IPv6 with address fe80::c86c:29ff:fead:e306.
    Mar 13 11:52:22 astatine  avahi-daemon[23598]: New relevant interface vethf5a568d.IPv6 for mDNS.
    Mar 13 11:52:22 astatine  avahi-daemon[23598]: Registering new address record for fe80::c86c:29ff:fead:e306 on vethf5a568d.*.
    Mar 13 11:52:22 astatine  avahi-daemon[23598]: Joining mDNS multicast group on interface veth25e0c9f.IPv6 with address fe80::44e7:81ff:fe1d:5a72.
    Mar 13 11:52:22 astatine  avahi-daemon[23598]: New relevant interface veth25e0c9f.IPv6 for mDNS.
    Mar 13 11:52:22 astatine  avahi-daemon[23598]: Registering new address record for fe80::44e7:81ff:fe1d:5a72 on veth25e0c9f.*.
    Mar 13 11:52:23 astatine  avahi-daemon[23598]: Joining mDNS multicast group on interface veth21f1006.IPv6 with address fe80::307a:a5ff:fe6a:307e.
    Mar 13 11:52:23 astatine  avahi-daemon[23598]: New relevant interface veth21f1006.IPv6 for mDNS.
    Mar 13 11:52:23 astatine  avahi-daemon[23598]: Registering new address record for fe80::307a:a5ff:fe6a:307e on veth21f1006.*.
    Mar 13 11:52:23 astatine  avahi-daemon[23598]: Joining mDNS multicast group on interface veth3e1bea2.IPv6 with address fe80::409b:19ff:febd:db3e.
    Mar 13 11:52:23 astatine  avahi-daemon[23598]: New relevant interface veth3e1bea2.IPv6 for mDNS.
    Mar 13 11:52:23 astatine  avahi-daemon[23598]: Registering new address record for fe80::409b:19ff:febd:db3e on veth3e1bea2.*.
    Mar 13 17:25:47 astatine nginx: 2023/03/13 17:25:47 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 13 17:25:47 astatine nginx: 2023/03/13 17:25:47 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 13 17:25:47 astatine nginx: 2023/03/13 17:25:47 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 13 17:42:32 astatine  ntpd[7970]: no peer for too long, server running free now
    Mar 13 18:17:31 astatine  ntpd[7970]: no peer for too long, server running free now
    Mar 13 18:24:29 astatine webGUI: Successful login user root from <tailscale ip of my phone>

    Check out the log at timestamp Mar 13 20:07:29:

    Mar 13 18:27:07 astatine  avahi-daemon[23598]: Joining mDNS multicast group on interface veth641d21e.IPv6 with address fe80::78a8:91ff:fee9:b4be.
    Mar 13 18:27:07 astatine  avahi-daemon[23598]: New relevant interface veth641d21e.IPv6 for mDNS.
    Mar 13 18:27:07 astatine  avahi-daemon[23598]: Registering new address record for fe80::78a8:91ff:fee9:b4be on veth641d21e.*.
    Mar 13 20:07:29 astatine  ntpd[7970]: receive: Unexpected origin timestamp 0xe7ba3941.9078c3b0 does not match aorg 0000000000.00000000 from server@216.239.35.0 xmt 0xe7ba3940.e57b3745
    Mar 14 00:59:48 astatine nginx: 2023/03/14 00:59:48 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 00:59:48 astatine nginx: 2023/03/14 00:59:48 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 00:59:48 astatine nginx: 2023/03/14 00:59:48 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 02:00:01 astatine root: mover: started
    Mar 14 02:00:03 astatine root: mover: finished
    Mar 14 02:22:58 astatine nginx: 2023/03/14 02:22:58 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 03:53:35 astatine nginx: 2023/03/14 03:53:35 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 03:53:35 astatine nginx: 2023/03/14 03:53:35 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 03:53:35 astatine nginx: 2023/03/14 03:53:35 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 04:23:08 astatine  ntpd[7970]: no peer for too long, server running free now
    Mar 14 05:29:33 astatine nginx: 2023/03/14 05:29:33 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 05:29:33 astatine nginx: 2023/03/14 05:29:33 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 05:29:33 astatine nginx: 2023/03/14 05:29:33 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 06:18:35 astatine nginx: 2023/03/14 06:18:35 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 07:59:39 astatine  ntpd[7970]: no peer for too long, server running free now
    Mar 14 09:02:32 astatine  ntpd[7970]: no peer for too long, server running free now
    Mar 14 09:31:20 astatine  ntpd[7970]: no peer for too long, server running free now
    Mar 14 10:23:03 astatine nginx: 2023/03/14 10:23:03 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 10:23:03 astatine nginx: 2023/03/14 10:23:03 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 10:23:03 astatine nginx: 2023/03/14 10:23:03 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 10:23:03 astatine nginx: 2023/03/14 10:23:03 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 10:23:03 astatine nginx: 2023/03/14 10:23:03 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 10:23:03 astatine nginx: 2023/03/14 10:23:03 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 10:23:04 astatine nginx: 2023/03/14 10:23:04 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 10:23:04 astatine nginx: 2023/03/14 10:23:04 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 10:23:04 astatine nginx: 2023/03/14 10:23:04 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 10:23:05 astatine nginx: 2023/03/14 10:23:05 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 10:23:06 astatine nginx: 2023/03/14 10:23:06 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.
    Mar 14 10:23:07 astatine nginx: 2023/03/14 10:23:07 [error] 3949#3949: nchan: A message from the past has just been published. Unless the system time has been adjusted, this should never happen.

     

  19. 3 hours ago, JorgeB said:

    Everything looks normal in the diags; it looks more like a LAN problem. Try changing your DNS servers to 208.67.222.222 and 208.67.220.220 instead of using your router.

     

     

    Already done that. No luck. It works fine for 10-15 minutes after saving the changes and then it's back to square one. No connection, pings failing.

  20. 17 minutes ago, MAM59 said:

    Your logs are not very helpful. You should go to your server, reboot, log on to the console (not the GUI), run "ping -t somebodyYouKnow" and wait for the unreachable.

    Then cancel the ping and directly run "diagnostics" to capture this state.

    This should show only the relevant data and not contain tons and tons of failed tries.

     

    I'll perform this in the evening. However, I just noticed a strange thing. As of now, I am unable to see the CA tab, no docker container is able to connect to the internet, and pings are failing. However, the network panel on the dashboard is showing significant network activity.

    (animation of the dashboard network panel)

     

    I saw the outbound traffic going up to about 10 Mbps as well.

     

    What can explain this?
