spdelope

Posts posted by spdelope

  1. 9 hours ago, JorgeB said:

    Both times it was a HBA problem:

     

    Apr 10 02:10:01 UnRaid-Server kernel: mpt2sas_cm0: SAS host is non-operational !!!!

     

    Make sure the HBA is well seated; you can also try a different PCIe slot. Also see the post below, to see if it helps with the PCIe errors being logged:

    https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009

     

     

    Interesting that turning ASPM on seemed to make everything happy and work together. I also reseated the card for good measure. Thanks for the reply!
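
    For anyone checking their own logs for the same fault, here is a minimal Python sketch that scans syslog lines for the HBA message quoted above. The helper name find_hba_faults is hypothetical, and matching mpt3sas as well as mpt2sas is an assumption on my part, since only mpt2sas_cm0 appears in my log. The sample lines are copied from the logs in these posts.

```python
import re

# Pattern for the HBA fault line quoted above. Only "mpt2sas_cm0" appears in
# the log; also accepting "mpt3sas" is an assumption for newer cards.
HBA_FAULT = re.compile(r"(mpt[23]sas_cm\d+): SAS host is non-operational")


def find_hba_faults(lines):
    """Return (timestamp, controller) pairs for each HBA fault line."""
    faults = []
    for line in lines:
        match = HBA_FAULT.search(line)
        if match:
            # Syslog lines start with a 15-character "Mon DD HH:MM:SS" stamp.
            faults.append((line[:15], match.group(1)))
    return faults


# Sample lines taken from the logs quoted in these posts.
sample = [
    "Apr 10 02:10:01 UnRaid-Server kernel: mpt2sas_cm0: SAS host is non-operational !!!!",
    "Sep  9 11:53:21 UnRaid-Server kernel: device veth367ca58 left promiscuous mode",
]
print(find_hba_faults(sample))
```

    To run it against a real log, pass an open file instead of the sample list, for example find_hba_faults(open(path)) with path pointing at wherever your syslog is stored.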

  2. A couple of times in the past week or so, my array has thrown major errors where all disks attached to the SAS expander card became unavailable. I woke up to an array that wasn't available and a full log file.

     

    This more or less started when I replaced a 10GBase-T card with an SFP+ one. Not sure if that is related. A day or so later, Unraid reported that one of my drives had failed, so I replaced it and rebuilt the array. Then the server became unresponsive. After some reboots, a BIOS update, and checking that the BIOS settings were correct, I finally got it booted and working once I pulled the GPU. So far so good.

     

    So I'm not sure whether it's related to the GPU, or maybe it's some sort of PCIe lane issue with the mobo or PCIe power management. Or it could be the SAS card failing. Any help is appreciated.

     

    I've attached logs from the two times it happened. At the end of both logs are the read errors, which repeat almost endlessly.

    logs relevent #2.txt logs relevent.txt

    unraid-server-diagnostics-20240411-1209.zip

  3. 22 hours ago, JorgeB said:
    Jul 26 17:14:29 UnRaid-Server kernel: macvlan_broadcast+0x116/0x144 [macvlan]
    Jul 26 17:14:29 UnRaid-Server kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]

     

    Macvlan call traces are usually the result of having dockers with a custom IP address, and they will end up crashing the server. Upgrading to v6.10 and switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).

    I went ahead and turned off the pihole docker (it was a backup to my Pi anyway). We'll see how long I can go without a crash!

  4. 10 hours ago, JorgeB said:
    Jul 26 17:14:29 UnRaid-Server kernel: macvlan_broadcast+0x116/0x144 [macvlan]
    Jul 26 17:14:29 UnRaid-Server kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]

     

    Macvlan call traces are usually the result of having dockers with a custom IP address, and they will end up crashing the server. Upgrading to v6.10 and switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).

     

    The only docker I have running on br0 with its own IP address is pihole. I had countless file permission issues with v6.10. Do you think just stopping pihole will resolve my issue?

     

    Thank you
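
    To check whether the macvlan traces come back after stopping pihole, something like the following Python sketch could pull the macvlan frames out of a syslog. The helper name macvlan_frames is hypothetical, and the regex is only an assumption based on the two trace lines quoted above (real traces may contain frame formats it does not cover).

```python
import re

# Call-trace frames in the quoted log look like:
#   kernel: macvlan_broadcast+0x116/0x144 [macvlan]
# The optional "? " prefix is an assumption about other kernel trace lines.
FRAME = re.compile(
    r"kernel:\s+(?:\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]"
)


def macvlan_frames(lines):
    """Return the function names of trace frames that belong to macvlan."""
    names = []
    for line in lines:
        match = FRAME.search(line)
        if match and match.group(2) == "macvlan":
            names.append(match.group(1))
    return names


# The two trace lines quoted by JorgeB above.
sample = [
    "Jul 26 17:14:29 UnRaid-Server kernel: macvlan_broadcast+0x116/0x144 [macvlan]",
    "Jul 26 17:14:29 UnRaid-Server kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]",
]
print(macvlan_frames(sample))
```

    An empty result over a few days of logs would suggest the traces have stopped, though it would not prove the crashes are gone.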

  5. 2 hours ago, trurl said:

    Perhaps unrelated except it might indicate something wrong with your docker configuration:

    Why do you have 120G docker.img? Have you had problems filling it? 20G is often more than enough.

     

    Something that might be related to your problem and could also be caused by docker configuration:

    What do you get from the command line with this?

    df -h /

     

    This is the result from df -h /

    Filesystem      Size  Used Avail Use% Mounted on
    rootfs           16G  1.7G   14G  11% /

    I set the larger docker image size just to avoid it filling up. I'm at 16 GB now, and I think I went to 120 because I have plenty of space on my cache and didn't want to deal with it again.
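
    For checking this periodically rather than by hand, here is a rough Python equivalent of the df -h / check using the standard library. The function name usage_percent and the 90% warning threshold are arbitrary choices for illustration, and the figure can differ slightly from df's Use% column because df rounds and accounts for reserved space differently.

```python
import shutil


def usage_percent(path="/"):
    """Return the used share of the filesystem holding `path`, in percent."""
    total, used, _free = shutil.disk_usage(path)
    # Guard against a zero-sized filesystem to avoid dividing by zero.
    return 100.0 * used / total if total else 0.0


pct = usage_percent("/")
print(f"/ is {pct:.0f}% used")
if pct > 90:  # 90 is an arbitrary threshold chosen for this sketch
    print("warning: root filesystem is nearly full")
```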

  6. I have had this issue for a while and can't quite nail it down. My server randomly hangs and becomes unresponsive, requiring me to reboot it at the machine. I cannot access it via the webGUI, SSH, or a monitor plugged in directly. There is never any real info in the logs as far as I can tell. Here are the logs from the last time this happened. The server locked up a bit later, around 12:01pm or so, but the logs stop at this point and don't resume until I rebooted. The last line is concerning.

     

    Sep  9 11:53:12 UnRaid-Server kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth367ca58: link becomes ready
    Sep  9 11:53:12 UnRaid-Server kernel: docker0: port 18(veth367ca58) entered blocking state
    Sep  9 11:53:12 UnRaid-Server kernel: docker0: port 18(veth367ca58) entered forwarding state
    Sep  9 11:53:21 UnRaid-Server kernel: docker0: port 18(veth367ca58) entered disabled state
    Sep  9 11:53:21 UnRaid-Server kernel: vethd0ae93e: renamed from eth0
    Sep  9 11:53:21 UnRaid-Server kernel: docker0: port 18(veth367ca58) entered disabled state
    Sep  9 11:53:21 UnRaid-Server kernel: device veth367ca58 left promiscuous mode
    Sep  9 11:53:21 UnRaid-Server kernel: docker0: port 18(veth367ca58) entered disabled state
    Sep  9 11:54:21 UnRaid-Server kernel: docker0: port 18(vethee550ee) entered blocking state
    Sep  9 11:54:21 UnRaid-Server kernel: docker0: port 18(vethee550ee) entered disabled state
    Sep  9 11:54:21 UnRaid-Server kernel: device vethee550ee entered promiscuous mode
    Sep  9 11:54:21 UnRaid-Server kernel: docker0: port 18(vethee550ee) entered blocking state
    Sep  9 11:54:21 UnRaid-Server kernel: docker0: port 18(vethee550ee) entered forwarding state
    Sep  9 11:54:21 UnRaid-Server kernel: docker0: port 18(vethee550ee) entered disabled state
    Sep  9 11:54:21 UnRaid-Server kernel: eth0: renamed from vethd3a5f8d
    Sep  9 11:54:21 UnRaid-Server kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethee550ee: link becomes ready
    Sep  9 11:54:21 UnRaid-Server kernel: docker0: port 18(vethee550ee) entered blocking state
    Sep  9 11:54:21 UnRaid-Server kernel: docker0: port 18(vethee550ee) entered forwarding state
    Sep  9 11:54:26 UnRaid-Server kernel: docker0: port 18(vethee550ee) entered disabled state
    Sep  9 11:54:26 UnRaid-Server kernel: vethd3a5f8d: renamed from eth0
    Sep  9 11:54:26 UnRaid-Server kernel: docker0: port 18(vethee550ee) entered disabled state
    Sep  9 11:54:26 UnRaid-Server kernel: device vethee550ee left promiscuous mode
    Sep  9 11:54:26 UnRaid-Server kernel: docker0: port 18(vethee550ee) entered disabled state
    Sep  9 11:54:27 UnRaid-Server kernel: general protection fault, probably for non-canonical address 0x49e5ef1403d70: 0000 [#3] SMP NOPTI
    Sep  9 11:54:27 UnRaid-Server kernel: CPU: 0 PID: 8290 Comm: Disk Tainted: P      D W  O      5.10.28-Unraid #1
    Sep  9 11:54:27 UnRaid-Server kernel: Hardware name: ASUS System Product Name/PRIME Z590-A, BIOS 1202 10/27/2021
    Sep  9 11:54:27 UnRaid-Server kernel: RIP: 0010:page_vma_mapped_walk+0x209/0x4dc