Report Comments posted by JesterEE

  1. On 10/19/2022 at 9:36 PM, JesterEE said:

    I'm going to restart, remove my deluge docker (but keep the appdata, of course!), and reinstall Unassigned Devices. If this doesn't work, I think I'll head back to the stable 6.10.3 until LimeTech squares this away. If it does, I'm not sure what my next step will be (suggestions?).

     

    Seven days of uptime after removing my deluge docker and doing normal tasks with the server (VMs, docker, Plex streaming, databases, file storage, parity/integrity checks, etc.). I'm going to install it again and see how long it stays stable. I'd bet less than 3 days. 🤔

     

     

    If this fails, it would be nice to be able to replicate it without docker in the loop. Is there a known way to generate high IO natively in the Unraid environment (or via a script)? I was thinking stress-ng might be the right tool (see the sketch after this list), but:

    1. I've never used it for IO testing, so I don't know if it's doing the right thing(s).
    2. There's no Slackware package for it, so it would need to be compiled from source and packaged for use on Unraid.
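
    A minimal sketch of what I have in mind, assuming stress-ng could be built and dropped onto the box; the worker counts, sizes, and target path are all placeholders:

    # run from the filesystem under suspicion; stress-ng writes its temp files to the CWD
    cd /mnt/cache
    # 4 disk workers writing/reading 1 GiB each, plus 2 sync()-heavy IO workers, for 10 minutes
    stress-ng --hdd 4 --hdd-bytes 1G --io 2 --timeout 10m --metrics-brief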

     

    -JesterEE

     

  2. 10 hours ago, JesterEE said:

    After my restart, I'll be uninstalling Unassigned Devices and reverting to docker macvlan.

     

     

    Crashed again after 8 hours. Diagnostics attached (they show the same thing, though).

     

    This time, before restarting dirty, I wanted to try to get back to my webUI. I still couldn't, but since I could SSH in, I decided to try to stop all my dockers with the command:

     

    docker stop $(docker ps -q)

     

    This hung for a moment but worked. The only container it couldn't stop was my torrent container (deluge). But after I stopped all the others, I was able to get back to my local WebUI and proceed with a normal shutdown. This is kinda weird to me, because I don't think that should have made a difference to the Unraid web backend, but I'm not going to read too much into it; it's all weird right now.
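
    In hindsight, force-killing just the stuck container might have been worth a try; a sketch, assuming the container is actually named deluge (the name filter is illustrative):

    # send SIGKILL to the hung container only; --filter matches containers by name
    docker kill $(docker ps -q --filter "name=deluge")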

     

    When I stopped my array (yes, I usually manually stop my array before a shutdown to see if I can catch bad behavior like this), it hung unmounting the cache drive (where my docker appdata resides). So while I was eventually able to issue a shutdown command, I'm pretty sure it wasn't as graceful as it should have been (it hit the timeout period and triggered a hard poweroff). I believe something in docker was still holding onto that mount because the deluge container was hung.
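
    Next time it hangs there, it might be worth checking what's actually pinning the mount before forcing the shutdown; a sketch, assuming the cache pool is at the usual /mnt/cache:

    # list the processes holding the mounted filesystem (and how they're using it)
    fuser -vm /mnt/cache
    # or list the open files on it
    lsof /mnt/cache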

     

    Also notable: I was in the middle of 'yet another' parity check. I could see it had gotten to 22.2% and was still chugging along like nothing went wrong.

     

    This leads me to believe it's some Unraid/docker edge case and not a plugin interaction at all. Docker was updated between 6.10.3 and 6.11.1, so maybe something isn't quite working right in that release:

    • 6.10.3: docker version 20.10.17 (CVE-2022-29526, CVE-2022-30634, CVE-2022-30629, CVE-2022-30580, CVE-2022-29804, CVE-2022-29162, CVE-2022-31030)
    • 6.11.1: docker version 20.10.18 (CVE-2022-27664, CVE-2022-32190, CVE-2022-36109)

     

    I'm going to restart, remove my deluge docker (but keep the appdata, of course!), and reinstall Unassigned Devices. If this doesn't work, I think I'll head back to the stable 6.10.3 until LimeTech squares this away. If it does, I'm not sure what my next step will be (suggestions?).

     

    -JesterEE

     

    cogsworth-diagnostics-20221019-2143.zip

  3. 12 hours ago, JesterEE said:

    I'm going to let it sit for another 15 hours or so and then switch back to docker macvlan to, unfortunately, really point the finger at UD.

     

    Welp ... glad I waited that extra 15 hours, because it happened again just now.

     

    Attached diagnostics for those interested.

     

    After my restart, I'll be uninstalling Unassigned Devices and reverting to docker macvlan.

     

    -JesterEE

     

     

    cogsworth-diagnostics-20221019-1156.zip

  4. 11 hours ago, binhex said:

     

    no crashes for me since doing the above, uptime 4 days 20 hours and counting, so MAYBE (a big maybe) it is related to Unassigned Devices (uninstalled after first (and only) crash as mentioned above), luckily for me i don't rely on UD, i only had it installed as i was previously playing with pre-clear on a USB connected drive.

     

    I have to agree. I only did 2 things on this cycle, which currently has an uptime of 2 days 9 hours:

    1. Switched docker macvlan -> ipvlan (as noted above in another comment, this doesn't seem to be it).
    2. Moved all my drives off of Unassigned Devices and made them Pool Devices.

     

    All my plugins remain installed and fully updated as of this post's timestamp (i.e. all the ones @JorgeB listed in the OP, and more).

     

    I'm thinking it was #2, and that something goes wrong when UD is hit with a lot of IO traffic (like torrenting) on the 6.11 series. I was previously using one of my UD drives for torrent seeding with LinuxServer.io's deluge container and the Gluetun VPN client container. I still use the same containers, but I moved my UD xfs-formatted drive to a pool device (which, for those of you who haven't done this yet, will NOT destroy your data if you don't change the filesystem) and have not had another crash since. I was also using UD for my VM drive, Plex database, and scratch drive, so UD was previously doing a lot of heavy lifting. But I really think it was the torrent traffic that did it in, because I was often able to access the other features of my server (VMs, Plex, and docker containers) that were utilizing UD AFTER the effects from the OP were noted.

     

    I'm going to let it sit for another 15 hours or so and then switch back to docker macvlan to, unfortunately, really point the finger at UD.

     

    -JesterEE

  5. On 10/13/2022 at 5:33 AM, JorgeB said:

    Anyone with this issue using ipvlan? If using macvlan try switching to ipvlan

     

    My Safe Mode test completed 2 1/2 days of uptime, so I'm calling that a pass. That leaves the VM Manager, docker, or plugins as the possible culprits.

     

    I started my next test with just the docker macvlan switched to ipvlan, and we'll see how it goes. I'm looking for another 2-3 days of uptime without errors, and then I'll reevaluate. (A quick way to double-check the switch took effect is sketched below.)
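
    For anyone following along, a way to confirm which driver a custom docker network actually ended up on; br0 is just the typical Unraid custom network name, so substitute your own:

    # list docker networks and their drivers (look for ipvlan vs. macvlan)
    docker network ls
    docker network inspect br0 --format '{{.Driver}}'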

  6. Upgraded from 6.8.3 without issue!  Using the Nvidia drivers as well ... tested working as expected with a number of dockers and Windows Q35 VMs.

     

    Quote

    Linux Kernel

    This release includes Linux kernel 5.8.18.  We realize the 5.8 kernel has reached EOL and we are currently busy upgrading to 5.9.

    Looking forward to this 5.9 kernel release! 🙏 There is a patch to hwmon I've been waiting to get my hands on!

  7. On 6/28/2020 at 9:37 AM, _rogue said:

    If I could have this with full ZFS support on the array that would be perfect! 

    ...

    Plus ZFS is looking to become more like Unraid with vdev expansion.

    I mostly agree ... a ZFS RAIDZ array would be almost perfect.  I like everything ZFS has to offer and the tooling that supports it.  Bundle that with an Unraid-style interface for common array tasks like file versioning, scrubbing, and resilvering ... 🔥!

     

    The #1 place I think ZFS still needs some more time in the oven is, as @_rogue pointed out, vdev expansion.  All indicators point to that being a priority for the project devs, so maybe a ZFS implementation for an Unraid 7.0 release target?  Soon™

     

    One issue I see with incorporating ZFS as the "main Unraid array" is how it handles parity in a RAIDZ1 implementation; it's just different from how Unraid does it today.  While an Unraid array stores parity information on dedicated parity disk(s), a ZFS RAIDZ stripes parity throughout the array.  Also, the way ZFS caches reads and writes is different and can require a LOT of RAM for big arrays.  I'm obviously oversimplifying here, but the fact remains, the way it works is a fundamental shift from the current Unraid state.  Is this better ... or worse?  I think that's subjective.  However, given the ZFS baked-in features such as snapshots, block checksums to protect from bitrot, and native copy-on-write ... I think I'd deal with the few downsides.
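
    For anyone who hasn't played with it yet, a taste of those baked-in features from the command line; the pool name and device names here are purely illustrative:

    # create a single-parity RAIDZ pool from three example disks
    zpool create tank raidz1 sdb sdc sdd
    # snapshots and bitrot-catching scrubs are one-liners
    zfs snapshot tank@before-upgrade
    zpool scrub tank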

     

    -JesterEE

  8. Quote

    In a future release we will include the NVIDIA and AMD GPU drivers natively into Unraid OS.  The primary use case is to facilitate accelerated transcoding in docker containers.  For this we require Linux to detect and auto-install the appropriate driver.  However, in order to reliably pass through an NVIDIA or AMD GPU to a VM, it's necessary to prevent Linux from auto-installing a GPU driver for those devices upon boot, which can be easily done now through System Devices page.  Users passing GPU's to VM's are encouraged to set this up now.

    This is fantastic and will make many users very happy!

     

    One question, though.  Currently, using the Linuxserver.io Unraid Nvidia Plugin, we can pass through a GPU even with the Linux GPU driver installed.  It can get a little dicey when you boot a VM that tries to use a GPU that's actively being used in a docker container (Linuxserver.io even calls this out in the support thread), as it locks up the Unraid OS and forces a dirty restart.  Been there ... a few times 😝.  But I'd gladly keep the opportunity to shoot myself in the foot rather than require separate dedicated GPUs for dockers (i.e. available to Linux) and VMs (i.e. unavailable to Linux).
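
    For reference, the "prevent Linux from auto-installing a driver" step the quote describes boils down to binding the GPU to vfio-pci at boot.  A sketch of the classic kernel-parameter approach; the vendor:device IDs below are placeholders for your own card (find them with lspci -nn):

    # in syslinux.cfg, add the GPU's (and its HDMI audio function's) IDs to the append line
    append vfio-pci.ids=10de:1b81,10de:10f0 initrd=/bzroot
    # after a reboot, verify vfio-pci claimed the device
    lspci -nnk | grep -A 3 -i nvidia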

     

    Is there an intention to retain this ability in the official release of the baked-in GPU drivers?

     

    Thanks for your continued efforts @limetech!

    -JesterEE
