Report Comments posted by JesterEE

  1. On 10/19/2022 at 9:36 PM, JesterEE said:

    I'm going to restart, remove my deluge docker (but keep the appdata, of course!), and reinstall Unassigned Devices. If this doesn't work, I think I'll head back to the stable 6.10.3 until LimeTech squares this away. If it does, I'm not sure what my next step will be (suggestions?).

     

    Seven days of uptime after removing my deluge docker and doing normal tasks with the server (VMs, docker, Plex streaming, databases, file storage, parity/integrity checks, etc.). I'm going to install it again and see how long it stays stable. I'd bet less than 3 days. 🤔

     

     

    If this fails, it would be nice to be able to replicate it without docker in the loop. Is there a known way to generate high IO natively in the Unraid environment (or via a script)? I was thinking stress-ng might be the right tool (see the sketch after this list), but:

    1. I've never used it for IO testing, so I don't know if it's doing the right thing(s).
    2. There's no Slackware package for it, so it would need to be compiled from source and packaged for use on Unraid.
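
    A minimal sketch of what I have in mind, assuming stress-ng could be built and dropped onto the box; the worker counts, sizes, and target path are all placeholders:

    # run from the filesystem under suspicion; stress-ng writes its temp files to the CWD
    cd /mnt/cache
    # 4 disk workers writing/reading 1 GiB each, plus 2 sync()-heavy IO workers, for 10 minutes
    stress-ng --hdd 4 --hdd-bytes 1G --io 2 --timeout 10m --metrics-brief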

     

    -JesterEE

     

  2. 10 hours ago, JesterEE said:

    After my restart, I'll be uninstalling Unassigned Devices and reverting to docker macvlan.

     

     

    Crashed again after 8 hours. Diagnostics attached (they show the same thing, though).

     

    This time, before restarting dirty, I wanted to try to get back to my webUI. I still couldn't, but since I could SSH in, I decided to try to stop all my dockers with the command:

     

    docker stop $(docker ps -q)

     

    This hung for a moment but worked. The only container it couldn't stop was my torrent container (deluge). But after I stopped all the others, I was able to get back to my local WebUI and proceed with a normal shutdown. This is kinda weird to me, because I don't think that should have made a difference to the Unraid web backend, but I'm not going to read too much into it; it's all weird right now.
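
    In hindsight, force-killing just the stuck container might have been worth a try; a sketch, assuming the container is actually named deluge (the name filter is illustrative):

    # send SIGKILL to the hung container only; --filter matches containers by name
    docker kill $(docker ps -q --filter "name=deluge")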

     

    When I stopped my array (yes, I usually manually stop my array before a shutdown to see if I can catch bad behavior like this), it hung unmounting the cache drive (where my docker appdata resides). So while I was eventually able to issue a shutdown command, I'm pretty sure it wasn't as graceful as it should have been (it hit the timeout period and triggered a hard poweroff). I believe something in docker was still holding onto that mount because the deluge container was hung.
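
    Next time it hangs there, it might be worth checking what's actually pinning the mount before forcing the shutdown; a sketch, assuming the cache pool is at the usual /mnt/cache:

    # list the processes holding the mounted filesystem (and how they're using it)
    fuser -vm /mnt/cache
    # or list the open files on it
    lsof /mnt/cache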

     

    Also notable: I was in the middle of 'yet another' parity check. I could see it had gotten to 22.2% and was still chugging along like nothing went wrong.

     

    This leads me to believe it's some Unraid/docker edge case and not a plugin interaction at all. Docker was updated between 6.10.3 and 6.11.1, so maybe something isn't quite working right in that release:

    • 6.10.3: docker version 20.10.17 (CVE-2022-29526, CVE-2022-30634, CVE-2022-30629, CVE-2022-30580, CVE-2022-29804, CVE-2022-29162, CVE-2022-31030)
    • 6.11.1: docker version 20.10.18 (CVE-2022-27664, CVE-2022-32190, CVE-2022-36109)

     

    I'm going to restart, remove my deluge docker (but keep the appdata, of course!), and reinstall Unassigned Devices. If this doesn't work, I think I'll head back to the stable 6.10.3 until LimeTech squares this away. If it does, I'm not sure what my next step will be (suggestions?).

     

    -JesterEE

     

    cogsworth-diagnostics-20221019-2143.zip

  3. 12 hours ago, JesterEE said:

    I'm going to let it sit for another 15 hours or so and then switch back to docker macvlan to, unfortunately, really point the finger at UD.

     

    Welp ... glad I waited that extra 15 hours, because it happened again just now.

     

    Attached diagnostics for those interested.

     

    After my restart, I'll be uninstalling Unassigned Devices and reverting to docker macvlan.

     

    -JesterEE

     

     

    cogsworth-diagnostics-20221019-1156.zip

  4. 11 hours ago, binhex said:

     

    no crashes for me since doing the above, uptime 4 days 20 hours and counting, so MAYBE (a big maybe) it is related to Unassigned Devices (uninstalled after first (and only) crash as mentioned above), luckily for me i don't rely on UD, i only had it installed as i was previously playing with pre-clear on a USB connected drive.

     

    I have to agree. I only did 2 things on this cycle, which currently has an uptime of 2 days 9 hours:

    1. Switched docker macvlan -> ipvlan (as noted above in another comment, this doesn't seem to be it).
    2. Moved all my drives off of Unassigned Devices and made them Pool Devices.

     

    All my plugins remain installed and fully updated as of this post's timestamp (i.e. all the ones @JorgeB listed in the OP, and more).

     

    I'm thinking it was #2, and that something goes wrong when UD is hit with a lot of IO traffic (like torrenting) on the 6.11 series. I was previously using one of my UD drives for torrent seeding with LinuxServer.io's deluge container and the Gluetun VPN client container. I still use the same containers, but I moved my UD xfs-formatted drive to a pool device (which, for those of you who haven't done this yet, will NOT destroy your data if you don't change the filesystem) and have not had another crash since. I was also using UD for my VM drive, Plex database, and scratch drive, so UD was previously doing a lot of heavy lifting. But I really think it was the torrent traffic that did it in, because I was often able to access the other features of my server (VMs, Plex, and docker containers) that were utilizing UD AFTER the effects from the OP were noted.

     

    I'm going to let it sit for another 15 hours or so and then switch back to docker macvlan to, unfortunately, really point the finger at UD.

     

    -JesterEE

  5. On 10/13/2022 at 5:33 AM, JorgeB said:

    Anyone with this issue using ipvlan? If using macvlan try switching to ipvlan

     

    My Safe Mode test completed 2 1/2 days of uptime, so I'm calling that a pass. That leaves the VM Manager, docker, or plugins as the possible culprits.

     

    I started my next test with just the docker macvlan switched to ipvlan, and we'll see how it goes. I'm looking for another 2-3 days of uptime without errors, and then I'll reevaluate. (A quick way to double-check the switch took effect is sketched below.)
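
    For anyone following along, a way to confirm which driver a custom docker network actually ended up on; br0 is just the typical Unraid custom network name, so substitute your own:

    # list docker networks and their drivers (look for ipvlan vs. macvlan)
    docker network ls
    docker network inspect br0 --format '{{.Driver}}'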

  6. Upgraded from 6.8.3 without issue!  Using the Nvidia drivers as well ... tested working as expected with a number of dockers and Windows Q35 VMs.

     

    Quote

    Linux Kernel

    This release includes Linux kernel 5.8.18.  We realize the 5.8 kernel has reached EOL and we are currently busy upgrading to 5.9.

    Looking forward to this 5.9 kernel release! 🙏 There is a patch to hwmon I've been waiting to get my hands on!

  7. On 6/28/2020 at 9:37 AM, _rogue said:

    If I could have this with full ZFS support on the array that would be perfect! 

    ...

    Plus ZFS is looking to become more like Unraid with vdev expansion.

    I mostly agree ... a ZFS RAIDZ array would be almost perfect.  I like everything ZFS has to offer and the tooling that supports it.  Bundle that with an Unraid-style interface for common array tasks like file versioning, scrubbing, and resilvering ... 🔥!

     

    The #1 place I think ZFS still needs some more time in the oven is, as @_rogue pointed out, vdev expansion.  All indicators point to that being a priority for the project devs, so maybe a ZFS implementation for an Unraid 7.0 release target?  Soon™

     

    One issue I see with incorporating ZFS as the "main Unraid array" is how it handles parity in a RAIDZ1 implementation; it's just different from how Unraid does it today.  While an Unraid array stores parity information on dedicated parity disk(s), a ZFS RAIDZ stripes parity throughout the array.  Also, the way ZFS caches reads and writes is different and can require a LOT of RAM for big arrays.  I'm obviously oversimplifying here, but the fact remains, the way it works is a fundamental shift from the current Unraid state.  Is this better ... or worse?  I think that's subjective.  However, given the ZFS baked-in features such as snapshots, block checksums to protect from bitrot, and native copy-on-write ... I think I'd deal with the few downsides.
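
    For anyone who hasn't played with it yet, a taste of those baked-in features from the command line; the pool name and device names here are purely illustrative:

    # create a single-parity RAIDZ pool from three example disks
    zpool create tank raidz1 sdb sdc sdd
    # snapshots and bitrot-catching scrubs are one-liners
    zfs snapshot tank@before-upgrade
    zpool scrub tank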

     

    -JesterEE

  8. Quote

    In a future release we will include the NVIDIA and AMD GPU drivers natively into Unraid OS.  The primary use case is to facilitate accelerated transcoding in docker containers.  For this we require Linux to detect and auto-install the appropriate driver.  However, in order to reliably pass through an NVIDIA or AMD GPU to a VM, it's necessary to prevent Linux from auto-installing a GPU driver for those devices upon boot, which can be easily done now through System Devices page.  Users passing GPU's to VM's are encouraged to set this up now.

    This is fantastic and will make many users very happy!

     

    One question, though.  Currently, using the Linuxserver.io Unraid Nvidia Plugin, we can pass through a GPU even with the Linux GPU driver installed.  It can get a little dicey when you boot a VM that tries to use a GPU that's actively being used in a docker container (Linuxserver.io even calls this out in the support thread), as it locks up the Unraid OS and forces a dirty restart.  Been there ... a few times 😝.  But I'd gladly keep the opportunity to shoot myself in the foot rather than require separate dedicated GPUs for dockers (i.e. available to Linux) and VMs (i.e. unavailable to Linux).
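
    For reference, the "prevent Linux from auto-installing a driver" step the quote describes boils down to binding the GPU to vfio-pci at boot.  A sketch of the classic kernel-parameter approach; the vendor:device IDs below are placeholders for your own card (find them with lspci -nn):

    # in syslinux.cfg, add the GPU's (and its HDMI audio function's) IDs to the append line
    append vfio-pci.ids=10de:1b81,10de:10f0 initrd=/bzroot
    # after a reboot, verify vfio-pci claimed the device
    lspci -nnk | grep -A 3 -i nvidia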

     

    Is there an intention to retain this ability in the official release of the baked-in GPU drivers?

     

    Thanks for your continued efforts @limetech!

    -JesterEE
