  • [6.10.0-6.10.2] - Network lost after some days working


    vvancreij
    • Solved Urgent

    Hi guys,

     

    After upgrading from Unraid v6.9.2 to 6.10.2 (6.9.2 -> 6.10.0 -> 6.10.1 -> 6.10.2) I have noticed network issues once the server has been up and running for a few days. At some point I lose connection to my Unraid server: no SSH, not even ping (from the local network, of course). This behaviour didn't appear in v6.9.2, and it's quite annoying because I need to do a manual 'dirty' reboot to fix it, which also triggers an out-of-schedule parity check. After some days the issue comes back.

     

    I should say that I don't think I am affected by the tg3 driver issue, as my NIC is an I211 Gigabit Network Connection using driver=igb.

     

    I would be grateful for some help at this point. Attached you can find the Diagnostics.zip file for more details. I chose urgent priority because of the potential risk from 'dirty' reboots.

     

    Thank you in advance. Looking forward to hearing from you.

    vkhpsrv01-diagnostics-20220601-1857.zip







    Thank you for your prompt answer, I'll do it and I'll post the next occurrence.

    Edited by vvancreij
    Typo error
    Link to comment

    Got the same problem. Two days in a row now.

    Only solution is to hard reset the server.

    Same upgrade path v6.9.2 to 6.10.2 (6.9.2 -> 6.10.0 -> 6.10.1 -> 6.10.2)

    Server was online for weeks on 6.9.2 with no problem.
    Problems came with the upgrade to 6.10.0, first after a few days, now on 6.10.2 every day.

    I'll enable the syslog server as well and will post the logs here, the problem will happen soon enough again.

    edit. As an added note, I'm using IPMI to check on the server, and there is no remote response: no CLI, no GUI, nothing. So it's probably not a network thing. The system seems to freeze with no interaction possible.

    edit2. I attached a previous log with the problem happening at the 23:07:00 mark (lost connectivity at 23:08:44).

    syslog-unraid-02june2022-23-08-44.txt

    Edited by Sonophis
    Link to comment
    9 hours ago, Sonophis said:

    edit2. I attached a previous log with the problem happening at the 23:07:00 mark (lost connectivity at 23:08:44).

    That looks related to the macvlan issue, see if this helps:

     

    Switch from macvlan to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right))
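
    Before or after making the switch, it can help to confirm which driver each Docker network actually uses. A minimal Python sketch, assuming the output of `docker network ls --format '{{json .}}'` (one JSON object per line, with `Name` and `Driver` keys) has been captured as text. The helper name `networks_using_driver` and the sample network names are made up for illustration:

```python
import json


def networks_using_driver(ls_output: str, driver: str = "macvlan") -> list[str]:
    """Return the names of Docker networks whose driver equals `driver`.

    `ls_output` is assumed to be the text printed by
    `docker network ls --format '{{json .}}'` (one JSON object per line).
    """
    names = []
    for line in ls_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("Driver") == driver:
            names.append(entry.get("Name", ""))
    return names


# Hypothetical sample output; these network names are illustrative only.
sample = "\n".join([
    '{"Name": "bridge", "Driver": "bridge", "Scope": "local"}',
    '{"Name": "br0", "Driver": "macvlan", "Scope": "local"}',
    '{"Name": "host", "Driver": "host", "Scope": "local"}',
])
print(networks_using_driver(sample))
```

    If the list is empty after the change, no custom network is still on macvlan.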

     

     

    Link to comment
    10 hours ago, Sonophis said:

    edit2. I attached a previous log with the problem happening at the 23:07:00 mark (lost connectivity at 23:08:44).

    syslog-unraid-02june2022-23-08-44.txt

     

    Reviewing your log and comparing it to mine, I have found very similar behaviour between what you get at 23:07 and what I get at 01:43. However, in my case, I have not lost my connection yet after getting that error. I am still logging, and I attach my partial syslog with similar warnings.

     

    Regarding the potential solution provided by @JorgeB (many thanks for your prompt answers):

     

    1 hour ago, JorgeB said:

    Switch from macvlan to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right))

     

    @Sonophis, as you are getting that error every day, could you try that solution and let us know?

     

    As soon as I can reproduce the issue, I'll share my syslog. I hope we can fix this as soon as possible. Thank you too!

    partial-syslog-unraid-02june2022-01-43-00.txt

    Edited by vvancreij
    Link to comment
    5 hours ago, JorgeB said:

    That looks related to the macvlan issue, see if this helps:

     

    Switch from macvlan to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right))

     

     


    Followed your suggestion and changed from macvlan to ipvlan; I'll share a syslog if the problem comes back.

    I'm looking into macvlan vs ipvlan as well, learned something new today.

    Thx again.

    Link to comment
    On 6/3/2022 at 2:25 PM, Sonophis said:


    Followed your suggestion and changed from macvlan to ipvlan, I'll share a syslog in case of problem.

    Thx again.

     

    It's been a week now and no problem whatsoever.

    Seems the change from macvlan to ipvlan did the trick.

    Had to change some network settings for some of my containers but other than that, Unraid 6.10.2 is running smoothly.

    Good job and thx again.

    Edited by Sonophis
    • Like 1
    Link to comment

    Hi guys,

     

    I have reproduced the incident with the Docker network still configured as macvlan. Attached you can find the syslog. Looking through it I have found several errors, but as I am not an expert I can't tell exactly what is happening. As mentioned in previous posts, comparing my logs with @Sonophis's, I think I have found similar errors.

     

    So, after @Sonophis's feedback on @JorgeB's suggestion

     

    On 6/3/2022 at 8:27 AM, JorgeB said:

    Switch from macvlan to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right))

     

    I will follow it and let you know about any updates regarding this issue.


    Thank you all!

     

    syslog

    Edited by vvancreij
    Typo error
    Link to comment
    43 minutes ago, vvancreij said:

    I have reproduced the incident with docker network still configured as macvlan type.

    It's also the macvlan issue:

    Jun  3 15:47:20 vkhpsrv01 kernel: macvlan_broadcast+0x116/0x144 [macvlan]
    Jun  3 15:47:20 vkhpsrv01 kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]
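
    Frames like the two above can be picked out of a saved syslog without reading it line by line. A rough Python sketch, assuming the call trace lines follow the same `function+0xNN/0xNN [macvlan]` shape as this excerpt; `find_macvlan_frames` is a hypothetical helper name, not part of any Unraid tooling:

```python
import re

# Matches kernel stack-frame lines that end in "[macvlan]", for example:
#   "... kernel: macvlan_broadcast+0x116/0x144 [macvlan]"
# An optional "? " before the function name is tolerated.
MACVLAN_FRAME = re.compile(
    r"kernel:\s+(?:\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[macvlan\]"
)


def find_macvlan_frames(lines):
    """Return (line_number, function_name) for each macvlan stack frame."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = MACVLAN_FRAME.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# The two lines quoted in the post above.
log = [
    "Jun  3 15:47:20 vkhpsrv01 kernel: macvlan_broadcast+0x116/0x144 [macvlan]",
    "Jun  3 15:47:20 vkhpsrv01 kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]",
]
for number, function in find_macvlan_frames(log):
    print(number, function)
```

    To scan a whole file, the same function can be given the lines of the saved syslog, e.g. `find_macvlan_frames(open(path))` with `path` pointing at wherever the log was stored.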

     

    Besides that there's also filesystem corruption detected on disk1, so check filesystem there. Btrfs is also detecting data corruption in the pool; see here how to handle that, though it's a good idea to run memtest first. Also, you have both DIMMs in the same channel; for better performance and stability install one in each channel.

    Link to comment

    Hi again,

     

    12 hours ago, JorgeB said:

    It's also the macvlan issue:

    Jun 3 15:47:20 vkhpsrv01 kernel: macvlan_broadcast+0x116/0x144 [macvlan]

    Jun 3 15:47:20 vkhpsrv01 kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]

     

    Great to know that. I have already changed the Docker network type to ipvlan.

     

    Regarding

    12 hours ago, JorgeB said:

    filesystem corruption detected on disk1

     

    I have checked and run xfs_repair. Now everything seems to be OK. Thank you for the guidance (Check filesystem).

     

    Related to:

    12 hours ago, JorgeB said:

    btrfs is detecting data corruption in the pool,

     

    I followed instructions described in

    Now, no errors have been detected on the pool. Thank you again for the guidance.

     

    Regarding the recommendation

     

    12 hours ago, JorgeB said:

    Also you have both DIMMs in the same channel, for better performance and stability install one in each channel.

     

    As soon as I have the opportunity, I will install them as suggested. Just out of curiosity, how did you get that information? Your troubleshooting level is incredible!

     

    Finally, as the network issue has been resolved after @JorgeB's troubleshooting and @Sonophis's feedback, I am going to mark it as solved.

     

    Many, many thanks to all of you for your support and feedback, and thanks again to @JorgeB for your prompt answers regarding this issue.

     

    Edited by vvancreij
    Typo error
    • Like 1
    Link to comment
    4 hours ago, vvancreij said:

    Just out of curiosity, how did you get that information?

    It's available in the diags, system/meminfo.txt

     

    It shows both DIMMs on channel A, which means they were installed side by side. To run in dual channel mode you need to leave one slot empty between the DIMMs; some boards use different colors on the sockets to indicate this, but not all.
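
    The same check can be scripted. A small Python sketch, assuming the slot labels of the populated DIMMs look like `ChannelA-DIMM0`. Label formats vary by board, so the pattern, the helper names and the sample labels are assumptions, not the actual contents of this server's meminfo.txt:

```python
import re


def channels_in_use(locators):
    """Return the set of channel letters found in populated-slot labels."""
    channels = set()
    for label in locators:
        # Assumed label shape: "ChannelA-DIMM0", "Channel B DIMM 1", etc.
        match = re.search(r"Channel\s*([A-Z])", label, flags=re.IGNORECASE)
        if match:
            channels.add(match.group(1).upper())
    return channels


def is_dual_channel(locators):
    """True when the populated DIMMs are spread over at least two channels."""
    return len(channels_in_use(locators)) >= 2


# Hypothetical labels: two DIMMs side by side vs. one DIMM per channel.
side_by_side = ["ChannelA-DIMM0", "ChannelA-DIMM1"]
one_per_channel = ["ChannelA-DIMM0", "ChannelB-DIMM0"]
print(is_dual_channel(side_by_side), is_dual_channel(one_per_channel))
```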

    • Like 1
    Link to comment

    Thank you very much for your explanation.

     

    So, if you have 4 DIMM modules of different sizes but in matched pairs, what would be the recommended installation?

     

    Imagine you have two 8 GB DIMMs and two 16 GB DIMMs. Following your explanation, would the proper way to install them be Channel A0 8 GB, Channel A1 16 GB, Channel B0 8 GB and Channel B1 16 GB?

     

    Thank you in advance for your explanations and sharing your knowledge!

    Link to comment
    9 hours ago, vvancreij said:

    the proper way to install them could be Channel A0 8 GB, Channel A1 16 GB, Channel B0 8 GB and Channel B1 16 GB?

    Yes, I usually install the larger pair on A0+B0 then the smaller one on A1+B1. I remember reading somewhere that the larger capacity DIMMs should be installed first, but to be honest there's probably not much difference either way; the important part is that each pair uses both channels.

    • Like 1
    Link to comment

    On the one hand, thank you for your explanation. I will update my DIMM installation as soon as I get some time.

     

    On the other hand, unfortunately, something has happened to my network again, as I can't connect to my Unraid server. I am not at home right now, so I can't hard reset my server or attach logs. This evening when I arrive I will do all of that and attach the logs.

     

    I am going to reopen the incident...

     

    Edited by vvancreij
    Link to comment

    Sorry guys. I have some scheduled processes that connect to my Unraid server, and they finished OK while I could not get access remotely. I have noticed that all my Docker containers on the custom network exposed to the internet were working fine as well. So it seems that local connectivity was fine and the problems were with remote access: I could not connect to my Unraid server through My Servers remote access or through the WireGuard VPN.

     

    What is true is that something happened, as I was not able to log on remotely. Now, at home, I can log on without any problem, and after restarting the Unraid API I have remote connection through both My Servers and WireGuard.

     

    While I was writing this post, I lost connectivity from my Unraid server to the Internet (I noticed as I was trying to apply a Docker update). From the Internet to the Unraid server was OK. I have rebooted my server and everything is OK right now. I attach the log in case you can see something.

     

    Anyway, I am going to close this incident related to the network connection. If my network connection from the Unraid server to the Internet fails again, I will open another.

     

    Thank you in advance.

    syslog-192.168.10.10.txt

    Edited by vvancreij
    Link to comment

    Hi guys, 

     

    After thinking about the best way to post this incident, in the end I decided to reopen this report, as I consider it a network issue too.

     

    Now, after switching the Docker network to ipvlan, I never lose intranet connection to my server. However, from time to time, my Unraid server loses its connection to the internet. I have read a post in the community saying that with ipvlan there are issues related to pinging outside.

     

    Now it is a complete nightmare for me... With macvlan, after some time my server becomes inaccessible and needs a hard reboot. With ipvlan, I lose the internet connection from my server, which causes problems applying updates etc.

     

    In version 6.9.2 I was using macvlan without any trouble and never lost internet access.

     

    Please, I need some help here.

     

    Thank you in advance.

    Edited by vvancreij
    Link to comment
    1 hour ago, vvancreij said:

    With ipvlan I lose internet connection from my server with problems applying updates etc.

    I don't remember any similar reports; get the diagnostics after the problem occurs to see if there's anything relevant logged.

    Link to comment

    Thank you @JorgeB for your prompt answer. At the moment I cannot connect remotely to my server through My Servers, so I suspect that the problem is happening right now. As soon as I am at home, I'll get the diagnostics and post them along with the log.

     

    What is very curious is that I can access my apps exposed to the Internet through my swag Docker container. I mean that I have a swag container on a custom network, and in my router I forward all connections on ports 443 and 80 to that container. With this I want to clarify that the problem seems to be from the Unraid server to the Internet.
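
    One way to confirm this from the server side is to test whether outbound TCP connections can be opened at all while the problem is occurring. A minimal sketch, assuming Python is available on the machine running the check. `can_reach` is a hypothetical helper, and the injectable `connect` parameter exists only so the function can be exercised without a network:

```python
import socket


def can_reach(host, port, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to (host, port) can be opened.

    DNS failures, refused connections and timeouts all raise OSError
    (or a subclass of it), so they are reported as "not reachable".
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        return False
    conn.close()
    return True
```

    For example, `can_reach("1.1.1.1", 443)` would return False if no outbound connection can be made; the address is just an arbitrary public endpoint chosen for illustration. Testing an IP address and a hostname separately helps distinguish a routing problem from a DNS problem.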

     

    The post I was referring to about ipvlan problems with the internet connection is as follows. It seems that it is still happening with stable version 6.10.2.

     

     

    Another question I have is why I am suffering from that macvlan issue now. I mean, in v6.9.2 I had the same configuration and never had any issue with it... I had macvlan configured and never lost communication with the server...

     

    I hope we are able to fix this as soon as possible.

     

    Thank you in advance

     

     

    Link to comment
    6 minutes ago, vvancreij said:

    Another question I have is why I am suffering now that macvlan issue?

    That's not uncommon. The issue first started with v6.5, but for some users it started happening after updating from v6.8 to v6.9, and for others after updating from v6.9 to v6.10.

    Link to comment
    4 minutes ago, JorgeB said:

    That's not uncommon, that issue first started with v6.5, but for some users it started happening after updating from v6.8 to v6.9, to others after updating from v6.9 to v6.10.

     Thanks for your answer!

    Link to comment


