Comments posted by vakilando

  1. Ok, it seems to be fixed for me.

    I rebooted several times, updated Unraid to 6.10.1, and everything (shim network) works as expected.

     

    Note:

    I realized that Unraid does not create the shim network after recovering (rebooting) from an Unraid crash.

    I still don't know exactly why it crashes... but my Raspi (with piVCCU Homematic CCU3) crashes at the same time. My suspicion is that the Raspi crashes first and Unraid crashes because of it. They are on the same power strip... Investigating...

     

    So this bug report can be closed again!

     

     

  2. Ok, I have better information now. I know what happens but still don't know the cause...

     

    I am on 6.9.2 and also randomly encounter the problem of losing the connection from the host to some docker containers, mostly after a reboot of Unraid.
    Sometimes this issue also comes out of the blue.
    I don't know exactly when it appears on my running Unraid server (out of the blue) because I may only notice it some days after it appeared... But I can imagine that it sometimes happens after an automatic backup of appdata with the plugin "CA appdata backup/restore V2", because this plugin stops and restarts the running docker containers.

     

    Last time it happened: yesterday.
      Probably at 1:00 AM. My server just rebooted out of the blue because of another problem (I'm investigating...).
      After this: no shim networks. Resolved today at ~8:00 AM.
      (see attached log)

     

    My relevant configuration:

    I have

    • Network: two NICs and four VLANs.
    • Docker: "Allow access to host networks" checked/active.
    • Dockers and VMs in those VLANs (br.01, br0.5, br0.6, br1.5, br1.16)
    • A Home Assistant docker (host network) that loses the connection to some other docker containers on different VLANs (e.g. ispyagentdvr on custom br0.6 network, motioneye on custom br0.5 network, frigate on custom br1.15 network).

     

    The issue appears:

    • After a reboot of Unraid: sometimes
    • While Unraid is running: sometimes (because of the plugin "CA appdata backup/restore V2"?)

     

    This workaround solves the issue temporarily:

    • Always works: stop the docker service, de-/reactivate "Allow access to host networks", restart the docker service
    • Sometimes works: a reboot of Unraid

     

    I didn't try manually re-adding the shim networks, but according to the post "shim-br0-networs-in-unraid" it seems to be a possible workaround:
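
    I haven't tried it myself, but based on that post and the usual macvlan "host access" recipe it would look roughly like the sketch below. All names and addresses in it (shim-br0.6, the .250 host address, the .192/27 container range) are placeholders, not my real configuration:

      # Untested sketch of the generic macvlan shim workaround (placeholder values!):
      ip link add shim-br0.6 link br0.6 type macvlan mode bridge   # macvlan sibling of the containers
      ip addr add 192.168.6.250/32 dev shim-br0.6                  # an otherwise unused address for the host side
      ip link set shim-br0.6 up
      ip route add 192.168.6.192/27 dev shim-br0.6                 # route the container IP range via the shim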

     

     

    So the problem is the shim networks!?

    • They sometimes aren't set at boot. (Why?)
    • They sometimes get lost. (Why?)

     

    What are shim networks?
    Shim networks should be created when the Docker setting "Host access to custom networks" is enabled.
    This allows Unraid to talk directly to docker containers that use macvlan (custom) networks.
    But those shim networks are not always created after a reboot!

     

    So it's still an UNSOLVED bug:

     

    What worries me is that this bug seems to persist in Unraid 6.10-rc3:

     

    Perhaps a user script could detect missing shim networks and re-add them? Any ideas or hints?
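
    As a first idea for such a user script, here is a rough (untested!) sketch. It only checks whether the expected shim interfaces exist and restarts the docker service if one is missing, since stopping/starting docker always fixes it for me. The interface names and the /etc/rc.d/rc.docker path are assumptions on my side and would have to be verified:

      #!/bin/bash
      # Untested sketch: detect missing shim interfaces and restart docker to recreate them.
      # Assumptions: shims are named "shim-<bridge>" and /etc/rc.d/rc.docker controls the service.
      BRIDGES="br0.5 br0.6 br1.15"          # the custom networks my containers use (adjust!)
      missing=0
      for br in $BRIDGES; do
          if ! ip link show "shim-$br" &>/dev/null; then   # interface does not exist
              echo "$(date): shim-$br is missing" >> /var/log/shim-check.log
              missing=1
          fi
      done
      if [ "$missing" -eq 1 ]; then
          # Restarting docker has the same effect as toggling "Host access to custom networks":
          /etc/rc.d/rc.docker restart
      fi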

     

    Please see the pictures and the log I attached.

     

    Before stopping the docker service:

    [Screenshot 2022-05-18 08:07:30: routing table, dockers cannot reach each other]

     

    After de-/reactivating "Allow access to host networks" and restarting the docker service:

    [Screenshot 2022-05-18 08:13:45: routing table OK, dockers can reach each other]

     

    See the (commented) log file:

    syslog_2022-05-18_crash-at-01-AM-no-shim-networks-after-reboot_fix-at-08-AM.log

  3. I'm also still randomly encountering this problem. This issue doesn't seem to be fully solved...

    I have "Allow access to host networks" checked/active.

    My Home Assistant docker (host network) sometimes loses the connection to some other docker containers on different VLANs (e.g. ispyagentdvr on custom br0.6 network, motioneye on custom br0.5 network, frigate on custom br1.15 network).

    Stopping and starting the docker service always solves this issue. A reboot of Unraid sometimes solves it, but sometimes it causes it in the first place. I have two NICs and four VLANs.

  4. On 8/11/2020 at 9:57 PM, testdasi said:

    Yep, 6.9.0 should bring improvement to your situation. But as I said, you need to wipe the drive in 6.9.0 to reformat it back to 1MiB alignment and needless to say it would make the drive incompatible with Unraid before 6.9.0.

    Essentially back up, stop array, unassign, blkdiscard, assign back, start and format, restore backup. Beside backing up and restoring from backup, the middle process took 5 minutes.

     

    I expect LT to provide more detailed guidance regarding this perhaps when 6.9.0 enters RC or at least when 6.9.0 becomes stable.

    Not that 6.9.0-beta isn't stable. I did see some bugs report but I personally have only seen the virtio / virtio-net thingie which was fixed by using Q35-5.0 machine type (instead of 4.2). No need to use virtio-net which negatively affects network performance.

     

     

     

    PS: been running iotop for 3 hours and still average about 345MB / hr. We'll see if my daily house-keeping affects it tonight.

    Thanks!
    The procedure "back up, stop array, unassign, blkdiscard, assign back, start and format, restore backup" is no problem and not new for me (except for blkdiscard), since I already had to do it when my cache disks died because of those ugly unnecessary writes on the btrfs cache pool...
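
    For anyone else who hasn't used blkdiscard before, the middle step boils down to a single command against the (unassigned) cache device. The device name below is a placeholder; double-check it first, because the command discards everything on the disk:

      lsblk -o NAME,SIZE,MODEL    # identify which /dev/sdX really is the cache SSD
      blkdiscard /dev/sdX         # placeholder device; wipes all data, run only after the backup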

    As said before, I tend towards changing my cache to XFS with a single disk and waiting for the stable 6.9.x release.

    Meanwhile I'll think about a new concept for managing my disks.

    This is my configuration at the moment:

    • Array of two data disks with one parity disk (4+4+4 TB WD Red)
    • 1 btrfs cache pool (RAID 1) for cache, docker appdata, docker, and folder redirection for my VMs (2× MX500 1 TB)
    • 1 UD for my VMs (1 SanDisk Plus 480 GB)
    • 1 UD for backup data (6 TB WD Red)
    • 1 UD for NVR/cams (old 2 TB WD Green)

    I still have two 1 TB SSDs and one 480 GB SSD lying around here... I have to think about how I could use them with the new disk pools in 6.9.

  5. Damn! My server seems to be affected as well...
    I had an unencrypted BTRFS RAID 1 with two SanDisk Plus 480 GB SSDs.
    Both died in quick succession (more or less 2 weeks apart) after 2 years of use!

    So I bought two 1 TB Crucial MX500.
    As I didn't know about the problem yet, I again set up an unencrypted BTRFS RAID 1 (01 July 2020).
    Since I found it strange that the old disks died in such quick succession, I did some research and found all those threads about massive writes on BTRFS cache disks.
    I ran some tests and here are the results.

     

    ### Test 1:

     

    running "iotop -ao" for 60 min: 2.54 GB [loop2] (see pic1)

    [pic1.png]
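
    (For anyone repeating the test, this is the exact call; the flags are what make the [loop2] figure an accumulated total:)

      # -a accumulates I/O totals since iotop was started (instead of showing current bandwidth),
      # -o only lists processes/threads that actually performed I/O.
      iotop -ao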

     

    Docker containers running:

    The docker containers running during this test are the ones most important to me.
    I stopped Pydio and mariadb even though they are also important to me (see the other tests for the reason).

      - ts-dnsserver
      - letsencrypt
      - BitwardenRS
      - Deconz
      - MQTT
      - MotionEye
      - Homeassistant
      - Duplicacy

     

    shfs writes:

      - Look at pic1; are the shfs writes OK? I don't know...

     

    VMs running (all on Unassigned disk):
      - Linux Mint (my primary Client)
      - Win10
      - Debian with SOGo Mail Server

     

    /usr/sbin/smartctl -A /dev/sdg | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 10.9
    /usr/sbin/smartctl -A /dev/sdh | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 10.9
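
    (What the one-liner does: it filters the SMART attribute output for the line containing "LBAs" (Total_LBAs_Written on these drives) and converts the raw value to terabytes written, assuming 512-byte LBAs:)

      # $10 is the raw Total_LBAs_Written value; multiply by 512 bytes per LBA
      # and divide by 1024^4 to get the total TB written to the SSD.
      /usr/sbin/smartctl -A /dev/sdg | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }'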



    ### Test 2:


    running "iotop -ao" for 60 min: 3.29 GB [loop2] (see pic2)

    [pic2.png]

     

    Docker containers running (almost all of my dockers):
      - ts-dnsserver
      - letsencrypt
      - BitwardenRS
      - Deconz
      - MQTT
      - MotionEye
      - Homeassistant
      - Duplicacy
      ----------------
      - mariadb
      - Appdeamon
      - Xeoma
      - NodeRed-OfficialDocker
      - hacc
      - binhex-emby
      - embystat
      - pydio
      - picapport
      - portainer

     

    shfs writes:

      - Look at pic2, there are massive shfs writes too!

     

    VMs running (all on Unassigned disk)
      - Linux Mint (my primary Client)
      - Win10
      - Debian with SOGo Mail Server

     

    /usr/sbin/smartctl -A /dev/sdg | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 11 
    /usr/sbin/smartctl -A /dev/sdh | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 11 

     

     

    ### Test 3:


    running "iotop -ao" for 60 min: 3.04 GB [loop2] (see pic3)

    [pic3.png]

     

    Docker containers running (almost all my dockers, except mariadb/pydio!):
      - ts-dnsserver
      - letsencrypt
      - BitwardenRS
      - Deconz
      - MQTT
      - MotionEye
      - Homeassistant
      - Duplicacy
      ----------------
      - Appdeamon
      - Xeoma
      - NodeRed-OfficialDocker
      - hacc
      - binhex-emby
      - embystat
      - picapport
      - portainer

     

    shfs writes:

      - Look at pic3, the shfs writes are clearly lower without mariadb!
        (I also stopped pydio as it needs mariadb...)

     

    VMs running (all on Unassigned disk)
      - Linux Mint (my primary Client)
      - Win10
      - Debian with SOGo Mail Server

     

    /usr/sbin/smartctl -A /dev/sdg | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 11
    /usr/sbin/smartctl -A /dev/sdh | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }' => TBW 11


     

    ### Test 4:


    running "iotop -ao" for 60 min: 6.23 MB [loop2] (see pic4)

    [pic4.png]

     

    Docker containers running:

      - none, but the docker service is started

     

    shfs writes:

      - none

     

    VMs running (all on Unassigned disk)
      - Linux Mint (my primary Client)
      - Win10
      - Debian with SOGo Mail Server

     

    /usr/sbin/smartctl -A /dev/sdg | awk '$0~/LBAs/{ printf "TBW %.1f\n", $10 * 512 / 1024^4 }'

    PLEASE resolve this problem in the next stable release!

    Next weekend I will remove the BTRFS RAID 1 cache and go with a single XFS cache disk.

    If I can do more analysis and research, please let me know. I'll do my best!

  6. I can confirm that setting

    "Settings => Global Share Settings => Tunable (support Hard Links)" to NO

    resolves the problem.

    The strange thing is that I never had a problem with NFS shares before.

    The problems started after I upgraded my Unraid server (mobo, CPU, ...) and installed a Linux Mint VM as my primary client.
    I "migrated" the NFS settings (fstab) from my old Kubuntu client (real hardware, no VM) to the new Linux Mint VM, and that's when the problems began. The old Kubuntu client does not seem to have those problems...
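
    For context, the kind of fstab entry I migrated looks roughly like this (server name, paths and options here are placeholders, not my exact settings):

      # Hypothetical /etc/fstab line on the Linux Mint VM for an Unraid user share:
      tower:/mnt/user/data   /mnt/data   nfs   rw,hard,noatime,_netdev   0   0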

    Perhaps it's also a client problem? Kubuntu vs. Mint, Nemo vs. Dolphin?

     

    I do not agree that NFS is an outdated, archaic protocol; it works far better than SMB if you have Linux clients!
