Report Comments posted by DieFalse

  1. 4 minutes ago, ThatDude said:

    I'm not sure if this is a bug, a quirk, or a coincidence. After upgrading from RC1 (which ran fine), one of my drives has been marked 'unmountable: unsupported partition layout'. But it's still in its slot (Disk 1 - see attached) and is not being emulated, which is what I would have expected. Also, the array isn't showing as degraded - is that normal?

    Screenshot 2021-11-02 at 23.38.45.jpg

    Definitely start a support thread and post your diagnostics in it.

    • Like 1
  2. 1 hour ago, limetech said:

    Could be.  Nothing in 'stock' Unraid OS requires libgd so we would not have noticed if an updated package removed it.

    We can add it - what about 'vnstat' package?  Is this useful to add to Unraid OS?

     

    vnstat is needed for Network Statistics to function correctly, and it currently will not start on my servers: "vnstat service must be running STARTED to view network stats." Please keep it.

     

    "root@Arcanine:~# vnstat
    Error: Database "/var/lib/vnstat//vnstat.db" contains 0 bytes and isn't a valid database, exiting."

     

    root@Arcanine:~# vnstat
    Error: Failed to open database "/var/lib/vnstat//vnstat.db" in read-only mode.
    The vnStat daemon should have created the database when started.
    Check that it is configured and running. See also "man vnstatd".

     

    root@Arcanine:~# vnstatd -d
    Error: Not enough free diskspace available in "/var/lib/vnstat/", exiting.
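
    For anyone hitting the same errors, a rough recovery sketch, assuming the zero-byte database and the free-space check are the only problems (paths as in the errors above):

      df -h /var/lib/vnstat          # confirm the filesystem actually has free space
      rm /var/lib/vnstat/vnstat.db   # remove the empty, invalid database
      vnstatd -d                     # restart the daemon so it recreates the database
      vnstat                         # verify stats are now being collected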

  3. 18 hours ago, danioj said:

     

    I have to openly admit that I do not have the technical insights as to how it works. What I can share is what I experienced.

     

    The server was stable ever since I issued the command initially. I had to shut off the server as I had an electrician in to install some smart light switches and we had to cut the power.

     

    When I turned back on - well, 5 hours after I turned back on - the server crashed.

     

    I hard reset, went back in, and issued the same command as above, and it's been stable again ever since. I concluded from that (and it will be interesting to see if I get a crash again over the coming week, noting they previously came daily) that the command has to keep being issued.

     

    Thank you for detailing this.  I am looking for extra information on how this solves the issue to begin with and how it behaves for others.  Hopefully someone more knowledgeable than I am can chime in and expand.

  4. 2 hours ago, danioj said:


    I have just tested this. The command does not survive reboots. 

     

    Without re-issuing the command, after a reboot, have you had the call trace?  The reason I ask is that the number will change, as designed, so verifying it with cat /proc/sys/net/netfilter/nf_conntrack_max or similar is not valid.
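
    For reference, a quick way to watch the live connection count against the configured ceiling (both are standard netfilter proc files):

      watch -n 5 'cat /proc/sys/net/netfilter/nf_conntrack_count /proc/sys/net/netfilter/nf_conntrack_max'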

  5. 14 hours ago, danioj said:

     

    Update since I applied this "Fix" - no call traces in the log or hard crashes yet.

     

    If this works I will probably need to add the command to a user script to execute on array start.

     

    You should not need to. It is a once-and-done command UNLESS something in the system sets the conntrack value too high, which shouldn't happen and hasn't since kernel 5.12.2: https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.12.2

     

    Since the netfilter conntrack fix is working as of now, the NIC limitation was a false flag. I don't know what NICs you are using, but some consumer-level ones do not like multiple MAC addresses and fail with as few as two.  Creating a virtual VLAN, a br0 IP, etc. creates new virtual MAC addresses, and the card can only handle so many. Example of an enterprise card: "Many Mellanox adapters are capable of exposing up to 127 virtual instances." Consumer card: "Realtek 1Gb NICs are often limited to 6-12 virtual instances."

     

    Looking at your config, you have, I believe, 7 instances on one card and 1 on the other, which is controlled by an integrated Intel® i210AT dual-port 1Gb controller limited to 5 vectors (instances) per port, so it is technically over the limit. HOWEVER, you're not assigning IPs to the VLANs, and I believe this stops them from being true virtual instances, so theoretically it should be fine.

     

    Give it some more time, but please advise if you experience any call traces, and if so, post diagnostics along with the syslog.  I don't expect you to have any, though.
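
    Should the setting turn out not to survive a reboot (as reported above), a minimal sketch of reapplying it at boot, assuming the stock Unraid /boot/config/go file:

      # appended to /boot/config/go -- runs once at boot, before the webgui starts
      sysctl net/netfilter/nf_conntrack_max=131072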

  6. Ok - awaiting your reply on testing.  I am 100% seeing netfilter as the call trace cause.  Your experience with IPVLAN is expected, and you can overcome this by properly building your Docker network with custom networks and some other config (rough sketch below).  Having "host access" enabled will cause issues and is generally not advised; it would take you some time to correct your config so it doesn't need it. But that's a different thread.
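
    A rough sketch of such a custom network; the subnet, gateway, parent interface, and container are placeholders for your own values:

      docker network create -d macvlan \
        --subnet=192.168.1.0/24 --gateway=192.168.1.1 \
        -o parent=br0 customnet
      docker run -d --network=customnet --ip=192.168.1.50 nginx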

  7. On 8/28/2021 at 9:15 PM, danioj said:

    Update. I had another hard lock up over night. Same issues in the log.

     

    Tried the fix linked to me by @ljm42 above:

     

    sysctl net/netfilter/nf_conntrack_max=131072

     

    Let's see how it goes.

    I reviewed your logs, and you are experiencing call traces due to your networking adapter; it appears you are somehow reaching the limit of your NIC.   If the fix above doesn't work, splitting the load between a couple of NICs may, or even upgrading your existing NIC or its firmware.   I didn't review the logs enough to find the system's hardware as it's late, so I will take another look tomorrow and let you know if I can see anything deeper.
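
    As a side note, one way to check which driver and firmware the NIC is currently running before considering an upgrade (eth0 is a placeholder interface name):

      ethtool -i eth0   # prints driver, version and firmware-version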

  8. On Aug 20th, I posted the above because debugging seemed to point to nVidia; however, after troubleshooting in depth, it was determined that netfilter was causing the call traces.

     

    I was asked to try "ipvlan" instead of "macvlan" - this made no change, so I reverted back to macvlan.

     

    :: Placeholder for details, outlining the issue, original values, etc. :: :: at work so limited on what I can pull, will edit to add later ::

     

    I have since, after reviewing other similar call traces, found references to setting the conntrack max in an effort to resolve them.  Just over 36 hours ago I made the following change in the terminal: "sysctl net/netfilter/nf_conntrack_max=131072", and verified it with "cat /proc/sys/net/netfilter/nf_conntrack_max", which showed the new value of 131072.  I have not had a single call trace since.

     

    TLDR: setting this 

    sysctl net/netfilter/nf_conntrack_max=131072

    stopped my call traces.

     

    If anyone knows how to help me gather what's needed to see why this stopped the call traces and prevent them from happening to others - please assist.
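
    For anyone wanting to watch for a recurrence, a simple check against the syslog (path is the Unraid default):

      grep -i "call trace" /var/log/syslog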

    • Like 1
  9. Tracking this down with Discord assistance, it was determined that the server was blocking/routing incorrectly due to jumbo frames on the NICs.  When the MTU on the NICs is higher than 1500, no web access across the network is possible - yes, jumbo frames are correctly configured on the switches and router, and jumbo frames worked on <6.9.2.  When the MTU is set to 1500, the WebUI loads correctly.
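
    For reference, checking and temporarily resetting the MTU from the console (eth0 is a placeholder; the permanent value lives in Unraid's network settings):

      ip link show eth0            # the current MTU is shown on the first line
      ip link set eth0 mtu 1500    # temporary, lasts until the next reboot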

    • Like 2
  10. I continue to have problems with GSA and Arcanine.  Arcanine became completely unresponsive last night.  GSA is SSH'able but nothing will run correctly.

     

    Neither is usable in its current state, so I will be forced to revert to 6.9.2 soon to bring them back online.  I would like to give all the information I can to help resolve this. Please let me know what to provide. I can even provide remote access if you want to PM me.

  11. 17 hours ago, fmp4m said:

    I will restore via the instructions this evening.   The btrfs issues were resolved in another thread, which led to me trying 6.10 because of the call traces on previous versions - leading me to here.   I'll advise when both are 100% stock and then upgrade to 6.10rc1 again and see if any difference is noticed.

     

    Ok, Arcanine is on 6.10rc1 with no issues as of this morning.

     

    GSA I could not get into in any way other than SSH, as previously mentioned, so it still has the issues originally reported - except it now returns error 500 on loading the WebUI instead of 302.

     

    I used "use_ssl no" and got into the WebUI; call traces are still happening heavily.

     

    The kernel files should all be stock now.

     

    Since I had previously downgraded, I manually copied 6.9.2 to the root of the flash, rebooted, then upgraded to 6.10rc1 through the UI and verified the bz files matched the downloaded zip for 6.10rc1.
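
    One rough way to do that comparison, assuming the release zip has been extracted to /tmp/unraid (paths are placeholders):

      md5sum /boot/bz* /tmp/unraid/bz*   # the hashes should match pairwise for each bz file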

    gsa-diagnostics-20210822-1057.zip

  12. 15 minutes ago, ljm42 said:

    gsa has the same plugin installed, so I'd remove that and restore it to stock as well.

     

    gsa has a bunch of btrfs mentions in the log, not sure if that is an issue?

    Aug 18 21:45:17 GSA kernel: BTRFS info (device sdb1): relocating block group 43333096308736 flags data
    Aug 18 21:45:22 GSA kernel: BTRFS info (device sdb1): found 9 extents, stage: move data extents
    Aug 18 21:45:22 GSA kernel: BTRFS info (device sdb1): found 9 extents, stage: update data pointers
    Aug 18 21:45:23 GSA kernel: BTRFS info (device sdb1): relocating block group 43332022566912 flags data
    Aug 18 21:45:28 GSA kernel: BTRFS info (device sdb1): found 8 extents, stage: move data extents
    Aug 18 21:45:28 GSA kernel: BTRFS info (device sdb1): found 8 extents, stage: update data pointers
    Aug 18 21:45:29 GSA kernel: BTRFS info (device sdb1): relocating block group 43330948825088 flags data
    Aug 18 21:45:33 GSA kernel: BTRFS info (device sdb1): found 9 extents, stage: move data extents
    Aug 18 21:45:34 GSA kernel: BTRFS info (device sdb1): found 9 extents, stage: update data pointers
    Aug 18 21:45:35 GSA kernel: BTRFS info (device sdb1): relocating block group 43329875083264 flags data
    Aug 18 21:45:40 GSA kernel: BTRFS info (device sdb1): found 8 extents, stage: move data extents
    Aug 18 21:45:41 GSA kernel: BTRFS info (device sdb1): found 8 extents, stage: update data pointers

     

    I will restore via the instructions this evening.   The btrfs issues were resolved in another thread, which led to me trying 6.10 because of the call traces on previous versions - leading me to here.   I'll advise when both are 100% stock and then upgrade to 6.10rc1 again and see if any difference is noticed.
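
    For what it's worth, "relocating block group" messages like those above are normally emitted while a btrfs balance is running; one way to check whether one is in progress (mount point is a placeholder):

      btrfs balance status /mnt/cache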

    • Like 1
  13. 1 minute ago, ljm42 said:

     

    arcanine has the Unraid-Kernel-Helper.plg plugin installed, and the bzfirmware file is 20MB whereas the stock 6.10.0-rc1 file is 10MB.

     

    Does that plugin have an option to return to stock? If so run that and then uninstall the plugin.

     

    Unfortunately, nothing else is really standing out to me. Hopefully somebody else will see something.

     

    That plugin has been removed; the behaviour exists on both GSA and Arcanine.  There was no "revert" option, so I assume that on Arcanine, if it's the only one with a bad bz file, I will need to do something to revert?

     

    I thought when you run an upgrade it installs the latest bz files, so that makes me think something would have to modify them post-upgrade?
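
    A minimal sketch of a manual restore to stock, assuming the 6.10.0-rc1 release zip has been downloaded and extracted to /tmp/unraid (paths are placeholders; the flash is mounted at /boot):

      cp /tmp/unraid/bz* /boot/   # overwrite the modified bz files with stock copies
      reboot                      # boot into the stock kernel and firmware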

  14. 34 minutes ago, ljm42 said:

    > curl localhost results in 302 error, so I am assuming this has to do with the dns redirect that unraid uses to the hash url?  (is there anyway to disable that since I use my own dns and hostnames with valid certs anyway and it would be easy to configure).

     

    ssh into the server and type:

      use_ssl no

    This is the equivalent of going to Settings -> Management Access and setting Use SSL to No.  You will then be able to access the webgui using:

      http://ipaddress  (note: http not https)

     

    Thank you - I will try that to see if I can get into the WebUI, but the hangs / inability to do anything even in SSH are worrisome.  Anything useful in the diagnostics?

  15. Hi JonP, sorry for the delay.   I checked and no, I have over 2TB available.  The cache was balanced and scrubbed.   I have no idea why it can't be recreated.   There may be something wrong with my setup - I've posted a couple of issues that got worked through and a couple that went untouched in General; however, I have had time-outs occur when updating Dockers, and Dockers disappearing because of it, as well as really sluggish WebGUI responses in general.   Can't seem to nail it down at this point, and generic posts tend to yield no results.   Maybe it's related to my system only?