
bobo89

Members
  • Posts: 100

Posts posted by bobo89

  1. So hear me out: I realize it's not the best option, but there are a few reasons I can't do this another way. I also have internet faster than 1 Gb, so I want that full speed to be available to the network.

    1) No more free slots for NICs in the server.


    That's about it. I could use a second box for this, but I'd like to do it this way.

    I have created a separate VLAN on my managed switch and exposed that VLAN only to Unraid and the internet modem. When I plug my laptop into the switch on that VLAN, I get a public IP and everything works as intended.

     

    In Unraid I have added that same VLAN but have not assigned Unraid an IP on it (no services exposed). I pass that VLAN through to OPNsense, but nothing I do gets me an IP on the OPNsense WAN. I've confirmed it's the right interface being trunked to my modem, but no dice.

    I ran a pcap on the WAN interface while it was trying to get an address. I can see DHCP requests flying around, but nothing sticks. When I put the WAN in promiscuous mode, I see captured packets originating from the internet, which implies the forwarding and VLAN tagging are working as intended.

    I use the same method to trunk the VLAN for the internal network to OPNsense, and that works. Could Unraid be blocking public IPs from being passed through?
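    A minimal sketch for narrowing this down, assuming you can run tcpdump on the OPNsense console (or on the Unraid host against the VLAN bridge). The interface name vtnet1 used below is only a hypothetical WAN name; substitute your own. Filtering on UDP ports 67/68 shows whether a DHCP OFFER/ACK ever comes back, not just whether DISCOVER/REQUEST packets go out, and -e prints the MAC addresses so you can check whether replies are addressed to the VM's virtual NIC.

```shell
#!/bin/sh
# Build a tcpdump command that shows only DHCP traffic (UDP ports 67/68)
# on one interface. -n: no name resolution, -e: print link-level headers
# (source/destination MACs), -vv: decode DHCP options such as the message type.
dhcp_capture_cmd() {
    iface="$1"
    printf "tcpdump -n -e -vv -i %s 'udp port 67 or udp port 68'\n" "$iface"
}

# Print the command by default; pass --run as the second argument to execute it.
dhcp_capture() {
    cmd=$(dhcp_capture_cmd "$1")
    if [ "$2" = "--run" ]; then
        eval "$cmd"
    else
        echo "$cmd"
    fi
}
```

    To start the capture, run: dhcp_capture vtnet1 --run. If only outgoing requests from the VM's MAC appear, the problem is more likely upstream of the VM than Unraid dropping replies; for example, some cable modems only hand a public address to the first MAC they learned until they are power-cycled, and the laptop got there first. That is a hypothesis to test, not something the capture alone proves.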

  2. 13 hours ago, itimpi said:

    That call trace looks like it could be macvlan related.   If so you need to either switch docker to using ipvlan, or alternatively disable bridging on eth0 to continue using macvlan.

    Switched to ipvlan. Not only does that seem to have fixed the issue (no more call traces in the logs for about 9 hours), but Docker networking responsiveness also seems to have improved.
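    To confirm the fix holds over time, a small sketch that counts the macvlan-related warnings in the log. It assumes the log lives at /var/log/syslog and matches the "Workqueue: events_unbound macvlan_process_broadcast" line, which appears once per trace in the macvlan call trace posted in this thread.

```shell
#!/bin/sh
# Count kernel warnings raised from the macvlan broadcast worker.
# The "Workqueue:" header line appears once per trace, so counting it
# counts traces rather than every stack frame that mentions macvlan.
count_macvlan_traces() {
    log="${1:-/var/log/syslog}"
    # grep -c prints 0 but exits 1 when nothing matches; keep the 0.
    grep -c 'Workqueue: events_unbound macvlan_process_broadcast' "$log" || true
}
```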

  3. 10 hours ago, JorgeB said:

    Enable the syslog server and post that after a crash.

     

    The server locks up so hard I can't get onto it.

     

    I have enabled remote syslogging to another server, as well as mirroring the syslog to the USB key. I'll see if I can use that to catch another crash; it usually takes about 24 hours.

     

    Edit: Caught one within a couple of minutes of starting the array. The diagnostics are attached, and here is the syslog:

     

    root@temp:/var/log# tail -f syslog
    Dec 26 16:56:19 Tower nmbd[31928]:
    Dec 26 16:56:19 Tower nmbd[31928]:   Samba name server TOWER is now a local master browser for workgroup WORKGROUP on subnet 192.168.2.118
    Dec 26 16:56:19 Tower nmbd[31928]:
    Dec 26 16:56:19 Tower nmbd[31928]:   *****
    Dec 26 16:56:22 Tower kernel: Bluetooth: Core ver 2.22
    Dec 26 16:56:22 Tower kernel: NET: Registered PF_BLUETOOTH protocol family
    Dec 26 16:56:22 Tower kernel: Bluetooth: HCI device and connection manager initialized
    Dec 26 16:56:22 Tower kernel: Bluetooth: HCI socket layer initialized
    Dec 26 16:56:22 Tower kernel: Bluetooth: L2CAP socket layer initialized
    Dec 26 16:56:22 Tower kernel: Bluetooth: SCO socket layer initialized
    Dec 26 16:56:33 Tower kernel: docker0: port 1(veth04efed0) entered disabled state
    Dec 26 16:56:33 Tower kernel: veth898bf67: renamed from eth0
    Dec 26 16:56:33 Tower kernel: docker0: port 1(veth04efed0) entered disabled state
    Dec 26 16:56:33 Tower kernel: device veth04efed0 left promiscuous mode
    Dec 26 16:56:33 Tower kernel: docker0: port 1(veth04efed0) entered disabled state
    Dec 26 16:56:36 Tower kernel: NET: Registered PF_PACKET protocol family
    Dec 26 16:56:38 Tower mergerfs[17304]: running basic garbage collection
    Dec 26 16:56:38 Tower mergerfs[17304]: threadpool (fuse.read): spawning 24 threads w/ max queue depth 24
    Dec 26 16:56:38 Tower mergerfs[17304]: read-thread-count=24; process-thread-count=-1; process-thread-queue-depth=-1; pin-threads=false;
    Dec 26 21:56:44 temp systemd[1]: systemd-timedated.service: Deactivated successfully.
    Dec 26 16:58:41 Tower kernel: ------------[ cut here ]------------
    Dec 26 16:58:41 Tower kernel: WARNING: CPU: 7 PID: 6258 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Dec 26 16:58:41 Tower kernel: Modules linked in: af_packet bluetooth ecdh_generic ecc nvidia_uvm(PO) xt_connmark xt_mark xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha veth xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables macvtap macvlan tap bridge 8021q garp mrp stp llc mlx4_en mlx4_core igb i2c_algo_bit nvidia_drm(PO) nvidia_modeset(PO) edac_mce_amd edac_core intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd nvidia(PO) kvm video drm_kms_helper
    Dec 26 16:58:41 Tower kernel: crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd wmi_bmof mxm_wmi drm mpt3sas rapl backlight k10temp i2c_piix4 nvme syscopyarea raid_class sysfillrect ccp input_leds scsi_transport_sas ahci i2c_core sysimgblt joydev led_class fb_sys_fops nvme_core libahci wmi button acpi_cpufreq unix [last unloaded: mlx4_core]
    Dec 26 16:58:41 Tower kernel: CPU: 7 PID: 6258 Comm: kworker/u64:7 Tainted: P           O       6.1.49-Unraid #1
    Dec 26 16:58:41 Tower kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B78/X470 GAMING PRO CARBON (MS-7B78), BIOS 2.E0 06/10/2020
    Dec 26 16:58:41 Tower kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
    Dec 26 16:58:41 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Dec 26 16:58:41 Tower kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
    Dec 26 16:58:41 Tower kernel: RSP: 0018:ffffc900003b0d98 EFLAGS: 00010202
    Dec 26 16:58:41 Tower kernel: RAX: 0000000000000001 RBX: ffff8881e2698900 RCX: 5703bb9def20d4f0
    Dec 26 16:58:41 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881e2698900
    Dec 26 16:58:41 Tower kernel: RBP: 0000000000000001 R08: 5da7c85202080faa R09: d0edd303cd3e3a8a
    Dec 26 16:58:41 Tower kernel: R10: ca240b8a0ce8c507 R11: ffffc900003b0d60 R12: ffffffff82a11d00
    Dec 26 16:58:41 Tower kernel: R13: 000000000000ba78 R14: ffff88953c5e8800 R15: 0000000000000000
    Dec 26 16:58:41 Tower kernel: FS:  0000000000000000(0000) GS:ffff889f9e9c0000(0000) knlGS:0000000000000000
    Dec 26 16:58:41 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Dec 26 16:58:41 Tower kernel: CR2: 000014ad5540b020 CR3: 0000000108a1c000 CR4: 0000000000350ee0
    Dec 26 16:58:41 Tower kernel: Call Trace:
    Dec 26 16:58:41 Tower kernel: <IRQ>
    Dec 26 16:58:41 Tower kernel: ? __warn+0xab/0x122
    Dec 26 16:58:41 Tower kernel: ? report_bug+0x109/0x17e
    Dec 26 16:58:41 Tower kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Dec 26 16:58:41 Tower kernel: ? handle_bug+0x41/0x6f
    Dec 26 16:58:41 Tower kernel: ? exc_invalid_op+0x13/0x60
    Dec 26 16:58:41 Tower kernel: ? asm_exc_invalid_op+0x16/0x20
    Dec 26 16:58:41 Tower kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
    Dec 26 16:58:41 Tower kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
    Dec 26 16:58:41 Tower kernel: ? nf_nat_inet_fn+0xc0/0x1a8 [nf_nat]
    Dec 26 16:58:41 Tower kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
    Dec 26 16:58:41 Tower kernel: nf_hook_slow+0x3d/0x96
    Dec 26 16:58:41 Tower kernel: ? ip_protocol_deliver_rcu+0x164/0x164
    Dec 26 16:58:41 Tower kernel: NF_HOOK.constprop.0+0x79/0xd9
    Dec 26 16:58:41 Tower kernel: ? ip_protocol_deliver_rcu+0x164/0x164
    Dec 26 16:58:41 Tower kernel: __netif_receive_skb_one_core+0x77/0x9c
    Dec 26 16:58:41 Tower kernel: process_backlog+0x8c/0x116
    Dec 26 16:58:41 Tower kernel: __napi_poll.constprop.0+0x2b/0x124
    Dec 26 16:58:41 Tower kernel: net_rx_action+0x159/0x24f
    Dec 26 16:58:41 Tower kernel: __do_softirq+0x129/0x288
    Dec 26 16:58:41 Tower kernel: do_softirq+0x7f/0xab
    Dec 26 16:58:41 Tower kernel: </IRQ>
    Dec 26 16:58:41 Tower kernel: <TASK>
    Dec 26 16:58:41 Tower kernel: __local_bh_enable_ip+0x4c/0x6b
    Dec 26 16:58:41 Tower kernel: netif_rx+0x52/0x5a
    Dec 26 16:58:41 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
    Dec 26 16:58:41 Tower kernel: ? _raw_spin_unlock+0x14/0x29
    Dec 26 16:58:41 Tower kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
    Dec 26 16:58:41 Tower kernel: process_one_work+0x1ab/0x295
    Dec 26 16:58:41 Tower kernel: worker_thread+0x18b/0x244
    Dec 26 16:58:41 Tower kernel: ? rescuer_thread+0x281/0x281
    Dec 26 16:58:41 Tower kernel: kthread+0xe7/0xef
    Dec 26 16:58:41 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
    Dec 26 16:58:41 Tower kernel: ret_from_fork+0x22/0x30
    Dec 26 16:58:41 Tower kernel: </TASK>
    Dec 26 16:58:41 Tower kernel: ---[ end trace 0000000000000000 ]---
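    With the syslog mirrored elsewhere, a sketch like this pulls out just the kernel warning blocks so they are easier to post. It assumes the standard "[ cut here ]" and "end trace" markers seen in the trace above and a default path of /var/log/syslog; point it at the remote server's copy if that is where the log lives.

```shell
#!/bin/sh
# Print every kernel warning block from a syslog file: everything from a
# "[ cut here ]" marker line through the next "end trace" line, inclusive.
extract_traces() {
    sed -n '/\[ cut here ]/,/end trace/p' "${1:-/var/log/syslog}"
}
```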

     

     

    In case this picture helps, this was displayed on the screen when I got to it (during a separate hard crash, not related to the log above).

     

    [Screenshot: console output displayed during the hard crash]

     

    tower-diagnostics-20231226-1702.zip

  4. Chasing two things here. The server started locking up about once a day after I added a new VLAN on the NIC and added that VLAN to Docker (I'm not sure that's the cause, but it's the last major change I made). All I can capture is what's on the screen; the server is locked up hard and can't be interacted with. Any thoughts on what's causing this? macvlan?

    [Screenshot: console output captured during the lockup]

     

     

    Second thing: after a couple of hard reboots, drive 6 started reading "Unmountable: Unsupported or no file system".

    Following the instructions here, I started the array in maintenance mode:

     

    https://docs.unraid.net/unraid-os/manual/storage-management/#drive-shows-as-unmountable

     

    
    
    root@Tower:/mnt# xfs_repair -n -L -v /dev/md6p1
    Phase 1 - find and verify superblock...
            - block cache size set to 6118448 entries
    Phase 2 - using internal log
            - zero log...
    zero_log: head block 3003760 tail block 3002304
    ALERT: The filesystem has valuable metadata changes in a log which is being
    ignored because the -n option was used.  Expect spurious inconsistencies
    which may be resolved by first mounting the filesystem to replay the log.
            - scan filesystem freespace and inode maps...
    sb_fdblocks 98313526, counted 111938173
            - found root inode chunk
    Phase 3 - for each AG...
            - scan (but don't clear) agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
    inode 11297473053 - bad extent starting block number 4503567551246457, offset 0
    correcting nextents for inode 11297473053
    bad data fork in inode 11297473053
    would have cleared inode 11297473053
            - agno = 6
            - agno = 7
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - check for inodes claiming duplicate blocks...
            - agno = 0
            - agno = 6
            - agno = 1
            - agno = 2
            - agno = 4
            - agno = 5
            - agno = 7
            - agno = 3
    entry "00141-capture.jpg" at block 1 offset 608 in directory inode 11297472846 references free inode 11297473053
            would clear inode number in entry at offset 608...
    inode 11297473053 - bad extent starting block number 4503567551246457, offset 0
    correcting nextents for inode 11297473053
    bad data fork in inode 11297473053
    would have cleared inode 11297473053
    No modify flag set, skipping phase 5
    Phase 6 - check inode connectivity...
            - traversing filesystem ...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
    entry "00141-capture.jpg" in directory inode 11297472846 points to free inode 11297473053, would junk entry
    bad hash table for directory inode 11297472846 (no data entry): would rebuild
    would rebuild directory inode 11297472846
            - agno = 6
            - agno = 7
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify link counts...
    No modify flag set, skipping filesystem flush and exiting.
    
            XFS_REPAIR Summary    Mon Dec 25 23:12:52 2023
    
    Phase           Start           End             Duration
    Phase 1:        12/25 23:10:44  12/25 23:10:44
    Phase 2:        12/25 23:10:44  12/25 23:10:45  1 second
    Phase 3:        12/25 23:10:45  12/25 23:11:53  1 minute, 8 seconds
    Phase 4:        12/25 23:11:53  12/25 23:11:54  1 second
    Phase 5:        Skipped
    Phase 6:        12/25 23:11:54  12/25 23:12:52  58 seconds
    Phase 7:        12/25 23:12:52  12/25 23:12:52
    
    Total run time: 2 minutes, 8 seconds
    
    

     

     

     

    root@Tower:/mnt# xfs_repair -v /dev/md6p1
    Phase 1 - find and verify superblock...
            - block cache size set to 6118448 entries
    Phase 2 - using internal log
            - zero log...
    zero_log: head block 3003760 tail block 3002304
    ERROR: The filesystem has valuable metadata changes in a log which needs to
    be replayed.  Mount the filesystem to replay the log, and unmount it before
    re-running xfs_repair.  If you are unable to mount the filesystem, then use
    the -L option to destroy the log and attempt a repair.
    Note that destroying the log may cause corruption -- please attempt a mount
    of the filesystem before doing this.

     

     

     

    Basically, should I run it with -L now or not?

     

     

    tower-diagnostics-20231225-2303.zip
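    For reference, the order that xfs_repair's own error message recommends can be sketched as follows: try a mount first so the kernel replays the journal, unmount, then run a normal repair, and only fall back to -L if the mount fails. This is a sketch, not official Unraid guidance; the temporary mount point is hypothetical, and -L discards the unreplayed log, so the most recent metadata changes may be lost.

```shell
#!/bin/sh
# Replay-then-repair, following the advice printed by xfs_repair above.
# The disk must not already be mounted (e.g. array in maintenance mode).
replay_then_repair() {
    dev="$1"                         # e.g. /dev/md6p1, as in the post
    mnt="${2:-/tmp/xfs-replay}"      # hypothetical temporary mount point
    mkdir -p "$mnt" || return 1
    if mount -t xfs "$dev" "$mnt"; then
        # Mounting replays the journal; unmount before repairing.
        echo "log replayed by mount"
        umount "$mnt" || return 1
        xfs_repair -v "$dev"
    else
        # Last resort: zero the log, which may lose recent metadata changes.
        echo "mount failed; zeroing log with -L"
        xfs_repair -L -v "$dev"
    fi
}
```

    Example, with the array started in maintenance mode as above: replay_then_repair /dev/md6p1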

  5. So I rebooted multiple times and tried again. I tried Firefox, Opera and Edge, all from the same machine, and every time I got the same behaviour: I could see my click being registered in the browser (in the developer tools), but no response came back from the server.

     

    I then tried from a different machine, Firefox on Linux and worked exactly as intended. Having trouble explaining that one...

     

    Nothing is assigned as disk 1; I guess I started from disk 2 when I built the array.

     

    One quirk I found during this process: I thought the issue was that I can't add two disks at once, so I removed one of the new disks from the array. When I clicked Format (which finally worked from the new machine), it formatted both the new disk in the array and the new disk that was NOT in the array.

     

    After I then added the second new disk to the array, Unraid is now "rebuilding data" onto it. Seems like a bug to me.

  6. tower-diagnostics-20231021-1529.zip

     

    I bought two new Seagate SMR 8 TB drives and an LSI 9207-8i (in IT mode), and connected the two new drives to the controller. No issues with that process.

     

    I ran a preclear on them (the three-pass one: read, write, then read). After the write stage, I confirmed that the correct headers were written to the drives (the preclear header was confirmed as written in the logs).

     

    However I got tired of waiting for the final read pass, so I cancelled it.

     

    I then stopped the array, added both drives as Disk 7 and Disk 8, and started the array.

     

    Both drives showed up as Unmountable.

     

    I checked "Format will create a file system in all Unmountable disks." and clicked Format, and nothing happened.

     

    I have tried removing one of the disks in order to only have 1 added to the array at a time, but clicking the "Format" button doesn't do anything.

     

    I'm noticing that the raw read error rate on the two new drives is huge, and growing. Could that be the problem? If so, it's probably not the LSI 9207, but could it be the cables I bought?
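    One thing worth checking before blaming the hardware: on many Seagate drives, the Raw_Read_Error_Rate raw value is commonly reported to be a packed number, with the upper 16 bits counting errors and the lower 32 bits counting read operations, so a huge and growing raw value can be normal as long as the error part stays at zero. Seagate does not officially document this, so treat the decode below as an assumption. Cable problems usually show up in attribute 199 (UDMA_CRC_Error_Count) instead.

```shell
#!/bin/sh
# Split a Seagate Raw_Read_Error_Rate raw value into its two packed fields.
# Assumption: the commonly reported (undocumented) Seagate encoding, where
# the upper 16 bits of the 48-bit raw value are an error count and the
# lower 32 bits are an operation count.
decode_seagate_rrer() {
    raw="$1"
    errors=$(( raw >> 32 ))
    ops=$(( raw & 0xFFFFFFFF ))
    echo "errors=$errors ops=$ops"
}
```

    Example, assuming smartctl prints the raw value as a plain decimal number (/dev/sdX is a placeholder):

    decode_seagate_rrer "$(smartctl -A /dev/sdX | awk '$2 == "Raw_Read_Error_Rate" { print $10 }')"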

  7. Motherboard: MSI X470 Gaming Pro Carbon (a few versions behind on BIOS)

     

    Old NIC that is flaky - Emulex Corporation OneConnect OCe10100/OCe10102 Series 10 GbE

     

    Another NIC in the box that works - Mellanox Technologies MT27500 Family [ConnectX-3]

     

     

    5 hours ago, Tomo82 said:

    What motherboard do you have? Is there a BIOS Update for it?
     

    What was the old NIC that worked? Any other NICs working?

     

     

  8. Unfortunately, the only game I care about is FS 2020, which is very CPU- and GPU-intensive (I have a 3090, but I'm currently CPU-bound).

    I'm not keen on upgrading the platform and I'm happy with the AM4 platform I'm on. The 3900X in a VM is not cutting it.

    My question: if I were to get the 5800X3D, given that it's the best CPU for FS 2020, would all the features I could potentially gain (the 3D cache and higher boost clocks) be accessible under Unraid?

    That is, does the current Linux kernel support boosting to the 5800X3D's boost clocks, and would the 3D cache be as effective given that there's a VM on top of everything? Is QEMU aware of this extra cache, and would the performance improvement be passed through to the VM?
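    One way to see what the VM is actually told about the CPU is to read the standard sysfs cache and cpufreq entries on the host and again inside the guest, then compare. A sketch, assuming a Linux guest; the root path is a parameter only so the function can be tested. Whether the guest sees the full L3 depends on the VM's CPU configuration in QEMU/libvirt, and a guest normally has no cpufreq driver of its own, so "unavailable" there is expected, since boosting is handled on the host.

```shell
#!/bin/sh
# Report the L3 cache size and maximum CPU frequency the running kernel sees,
# using standard sysfs paths under /sys/devices/system/cpu.
cpu_report() {
    root="${1:-/sys/devices/system/cpu}"
    # Each cache level has an indexN directory with "level" and "size" files.
    for idx in "$root"/cpu0/cache/index*; do
        [ -r "$idx/level" ] || continue
        if [ "$(cat "$idx/level")" = "3" ]; then
            echo "L3: $(cat "$idx/size")"
        fi
    done
    f="$root/cpu0/cpufreq/cpuinfo_max_freq"   # value in kHz
    if [ -r "$f" ]; then
        echo "max_freq_khz: $(cat "$f")"
    else
        echo "max_freq_khz: unavailable"
    fi
}
```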

    Have you resolved this problem? I'm facing the same thing: an already-running container with 127.0.0.11 does work, but creating the same instance with another IP gets 127.0.0.11 again yet is unable to resolve DNS. Pinging 8.8.8.8 works, whereas google.com doesn't.

    Any help or clarification would be really appreciated!

    Kind regards
    Hank
    From memory, this only affected some containers. In the Extra Parameters field I added --dns <DNS server IP>.

    Sent from my SM-G988W using Tapatalk

  10. 1 hour ago, Fastcompjason said:

    I am running Unraid 3.9.2 and I have a Ryzen 3900x with X570 chipset on the MSI Prestige Creation Mobo. I have never been able to get the CPU temps to show up in my Unraid since I built this server over a year ago. I realize that there have been improvements to the BIOS (my mobo is on the latest BIOS as of August 2021). How can I get my temps to show up? I want to make sure that my system is running cool enough. Is it something to do with BIOS options for CPU control like the other user mentioned above? Are there any documented settings that are known to work / not work that I can check? I thought that Unraid 6.9 was supposed to fix the issues with the Ryzen temps not showing?

    Do you have Dynamix System Temperature installed? It's a plugin.
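    To check whether the kernel itself exposes a CPU temperature, independent of any plugin, here is a sketch that looks for the k10temp hwmon device (the in-kernel AMD CPU temperature driver) and prints its first reading. hwmon reports millidegrees Celsius; the root path is a parameter only so the function can be tested.

```shell
#!/bin/sh
# Find the k10temp hwmon device and print its temp1 reading in whole degrees C.
ryzen_temp() {
    root="${1:-/sys/class/hwmon}"
    for hw in "$root"/hwmon*; do
        [ -r "$hw/name" ] || continue
        if [ "$(cat "$hw/name")" = "k10temp" ] && [ -r "$hw/temp1_input" ]; then
            # temp1_input is in millidegrees Celsius.
            echo "$(( $(cat "$hw/temp1_input") / 1000 ))"
            return 0
        fi
    done
    echo "k10temp not found" >&2
    return 1
}
```

    If it reports "k10temp not found", trying modprobe k10temp is a reasonable next step before looking at plugin settings.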

  11. 9 hours ago, Iker said:

    Hi, that is an issue with the X470 board BIOS. In my older setup I had the exact same processor and board, an X470 Gaming Pro Carbon with the 2.E0 BIOS. You have to enable "AMD Cool & Quiet"; "PBO" and "Global C-States" must be enabled; and make sure that "Power supply idle mode" is set to "Typical Idle Current". Then the ACPI driver should work properly and let you choose the profile you prefer.

     

    Even though I haven't checked your RAM against the QVL for the MSI board, as a safe measure, disable DOCP and leave the RAM at 2133 MHz; then, once everything is stable, work your way up in RAM speed. As advice, you should never go beyond 3200; 3600 MHz is way too high for the processor and Unraid in general (I could be wrong, but there is not much performance left on the table beyond 3200 with a 3900X).

     

    Let me know if everything works ok.

    Bingo, that was it. I'm only seeing it boost to 4.15 GHz, though. Is that configurable in the BIOS?
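    To see how far the cores actually boost, a sketch that prints the highest "cpu MHz" value across all cores from /proc/cpuinfo. It is only a snapshot, so sample it repeatedly while the CPU is under load, for example with: while sleep 1; do max_cpu_mhz; done

```shell
#!/bin/sh
# Print the highest "cpu MHz" value reported across all cores, rounded to MHz.
max_cpu_mhz() {
    awk -F: '/^cpu MHz/ { v = $2 + 0; if (v > max) max = v }
             END { printf "%.0f\n", max }' "${1:-/proc/cpuinfo}"
}
```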

  12. On 6/8/2021 at 10:14 PM, bobo89 said:

    Running 6.9.2, 3900x on a x470 board.

     

    In Tips and Tweaks it says "no drivers" and the Turbo Boost option is blank.

    Governors are selectable.

    Do I need to load my own driver?

     

    root@Tower:~# cpufreq-info
    cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
    Report errors and bugs to [email protected], please.
    analyzing CPU 0:
      no or unknown cpufreq driver is active on this CPU
      maximum transition latency: 4294.55 ms.
    analyzing CPU 1:

     

     

    Anyone?

  13. I have a Docker-only VLAN with no IPv4 assignment in the networking tab.

     

    On 6.8.3, all containers were happily using the right DNS server, which is a Pi-hole container. However, after upgrading to 6.9.2, all containers have /etc/resolv.conf set to 127.0.0.11. What changed that is preventing the containers from reaching the DNS server as before?
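    For context, 127.0.0.11 is Docker's embedded DNS resolver, which containers on user-defined networks use; it forwards external lookups to the upstream servers Docker was configured with (for example via --dns). A sketch for comparing what a working and a broken container see; the container name in the usage line is hypothetical.

```shell
#!/bin/sh
# List the nameserver entries from a resolv.conf-style file ("-" reads stdin).
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

    docker exec my-container cat /etc/resolv.conf | nameservers -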

  14. Running 6.9.2, 3900x on a x470 board.

     

    In Tips and Tweaks it says "no drivers" and the Turbo Boost option is blank.

    Governors are selectable.

    Do I need to load my own driver?

     

    root@Tower:~# cpufreq-info
    cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
    Report errors and bugs to [email protected], please.
    analyzing CPU 0:
      no or unknown cpufreq driver is active on this CPU
      maximum transition latency: 4294.55 ms.
    analyzing CPU 1:
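    A sketch for checking the same thing directly in sysfs: which scaling driver is bound to CPU 0 and whether the global boost switch is on. On a Zen 2 CPU with the kernel shipped in 6.9.2, the expected driver is typically acpi-cpufreq, which needs the BIOS to expose ACPI P-states; "driver: none" corresponds to the "no or unknown cpufreq driver" output above. The root path is a parameter only so the function can be tested.

```shell
#!/bin/sh
# Show the cpufreq scaling driver for CPU 0 and the global boost setting.
cpufreq_status() {
    root="${1:-/sys/devices/system/cpu}"
    drv="$root/cpu0/cpufreq/scaling_driver"
    boost="$root/cpufreq/boost"
    if [ -r "$drv" ]; then echo "driver: $(cat "$drv")"; else echo "driver: none"; fi
    if [ -r "$boost" ]; then echo "boost: $(cat "$boost")"; else echo "boost: unknown"; fi
}
```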

     

     

  15. My issue is that on server reboot, some of my Docker containers seem to start before the rclone mount comes up. For example, Emby keeps giving me "can't find media stream" errors, and from within the container console no files show up. Restarting the containers picks up the mount properly.

    Does anyone else have this problem? Is there any way to delay Docker startup on array start?

    Sent from my SM-N960W using Tapatalk
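    One possible approach is to start the dependent containers only once the mount is actually up. A sketch, assuming util-linux's mountpoint command is available, that autostart is disabled for those containers, and that this runs from whatever script starts your rclone mount; the mount path and container name in the usage line are hypothetical.

```shell
#!/bin/sh
# Wait until a path is an active mount point, polling every 5 seconds.
# Returns 0 once mounted, 1 after the timeout (in seconds) expires.
wait_for_mount() {
    path="$1"
    timeout="${2:-300}"
    waited=0
    until mountpoint -q "$path"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 5
        waited=$(( waited + 5 ))
    done
    return 0
}
```

    wait_for_mount /mnt/disks/rclone 300 && docker start emby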

  16. Run the memtest option for at least several hours.
    Ryzen 3900X with 128 GB of RAM at 3600 MHz. Although that is technically an overclock, 22 hours of memtest ran fine.

    I realized what my issue was: instead of zipping up the files from the USB backup, I had zipped the whole USB folder, so the layout of the USB wasn't right.

    Last question: I've booted in now, but the drive assignments are all blank. The config folder contains a disk_assignments.txt file, but Unraid isn't picking it up. Should I manually reassign the drives, or is this indicative of another problem?
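    To avoid the nested-folder problem described above, a sketch that checks whether a directory (an extracted flash backup, or the stick itself) has the boot files at its top level rather than one folder down. The file list (bzimage, bzroot and config/) is an assumption based on the usual Unraid flash layout.

```shell
#!/bin/sh
# Report any expected top-level flash entries that are missing.
# Returns 0 if all are present, 1 otherwise.
check_flash_layout() {
    dir="$1"
    missing=0
    for f in bzimage bzroot config; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

    check_flash_layout /path/to/extracted-backup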


  17. 15 minutes ago, jonathanm said:

    Try creating a memtest USB on another stick and see if that runs.

    I was still on 6.8.3, so I just created a 6.9.2 USB (on the same USB key) and tried to copy over just the config files. That booted fine into Unraid, but the config was all messed up, with drives unassigned and services wonky.

     

    It seems the USB key is good, but the backups are not. Is there any way for me to gracefully upgrade from 6.8.3 to 6.9.2 and preserve the config without being able to boot into my last known good config?
