Everything posted by JesterEE

  1. Still running 6.11.2, but I'll run this test for a couple days and see what happens w/o my torrent app on. Since @ShadyDeth reported the crash still happens in 6.11.3 and nothing concerning this issue has changed, we'll see what we see.
  2. Some of us over in this thread would appreciate some of this 6.11 series bug squashing love: -JesterEE
  3. Unencrypted XFS cache (separate from the array cache). I was using UD at first (earlier in this thread) because I never got around to moving my static non-array storage to a cache pool when multiple caches were enabled in Unraid a few releases back. To debug and rule out UD as the potential culprit, I removed all of my static drives' dependence on UD, then ran tests with UD both installed and uninstalled and got the same error either way. I don't think this is a UD problem. -JesterEE
  4. No joy with 6.11.2. Almost 2 days of uptime before 💣. Back to 6.10.3 when I get some time. cogsworth-diagnostics-20221107-2355.zip
  5. Hadn't gotten around to putting 6.10.3 back on, so I guess I'll give this a shot first.
  6. Well, it was actually only 26 hours and poof! This is pretty conclusive to me. Something is messed up, and it's not something I did 😝. I'm not 100% sure if it's high IO, the network interface, or docker, but any way you slice it, it's an OS issue and needs to be addressed by the devs in a future release. If someone at @limetech posts here (or contacts me privately) by 10/29/22 asking to help debug, I will gladly do so. If not, back to 6.10.3 and a stable server for me. -JesterEE
  7. Do you have a way of using that utility in Unraid to see if it comes up without docker? I couldn't find a Slackware package for it.
  8. Short answer... Yes! I've never used that utility, but I just googled it and if there's value, I could run some tests.
  9. Why would this be any harder to do in docker than on the host? If you have a preferred method of doing this at a shell prompt, it's easy enough to run that command in any docker image you want; see the sketch below.
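     A minimal sketch of what I mean (the image, mount path, and stress-ng flags are just examples, not a tested recipe):

        # throwaway Alpine container with a scratch path bind-mounted from the host;
        # stress-ng comes from Alpine's community package repo
        docker run --rm -v /mnt/cache/scratch:/scratch alpine:latest \
          sh -c "apk add --no-cache stress-ng && stress-ng --hdd 2 --hdd-bytes 1G --temp-path /scratch --timeout 60s"

     The --rm means the container cleans itself up when it exits, so there's nothing to uninstall afterward.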
  10. Sure, it could be a lot of things, since torrent clients do a lot, from networking to file IO. Plus, in Unraid, this includes the docker abstraction layer, so that's yet another whole set of potential interactions. This is why I asked above if there is any testing we can do natively in Unraid. I was hoping a dev would chime in with a unit test or something along those lines so we can start to get some resolution on this. -JesterEE
  11. 7 days of uptime after removing my deluge docker and doing normal tasks with the server (VM, docker, plex streaming, databasing, file storage, parity/integrity checks, etc.). I'm going to install it again and see how long it stays stable. I'd bet less than 3 days. 🤔 If this fails, it would be nice to be able to replicate the crash without docker in the loop. Is there a known way to issue high IO natively in the Unraid environment (or via a script)? I was thinking stress-ng might be the right thing, but: (1) I've never used it for IO testing, so I don't know if it's doing the right thing(s), and (2) there's no Slackware package for it, so it would need to be compiled from source and packaged for use in Unraid. I've sketched what I have in mind below. -JesterEE
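     If stress-ng does turn out to be the right tool, I'd expect the invocation to look something like this (untested on my end; flags are straight from the stress-ng docs, and the path is just an example):

        stress-ng --hdd 4 --hdd-bytes 2G --io 2 --temp-path /mnt/cache/scratch --timeout 10m --metrics-brief

     And as a stopgap that needs nothing beyond the stock tools, a few parallel direct-IO writers would at least hammer the disk path (again, a hypothetical script with example paths):

        # spawn 4 parallel 4 GiB direct-IO writers, wait for them, then clean up
        mkdir -p /mnt/cache/scratch
        for i in 1 2 3 4; do
          dd if=/dev/zero of=/mnt/cache/scratch/stress$i bs=1M count=4096 oflag=direct &
        done
        wait
        rm /mnt/cache/scratch/stress*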
  12. Oooof, I've been there! Terrifying when you see your drives not there all at once.
  13. Crashed again after 8 hours. Diagnostics attached (it's the same as before, though). This time, before restarting dirty, I wanted to try to get back to my WebUI. I still couldn't, but since I could SSH in, I decided to try to stop all my dockers with: docker stop $(docker ps -q). This hung for a moment but worked. The only container it couldn't stop was my torrent container (deluge), but after I stopped all the others, I was able to get back to my local WebUI and proceed with a normal shutdown. This is kinda weird to me, because I don't think that should have made a difference to the Unraid web backend, but I'm not going to read too much into it; it's all weird right now.

     When I stopped my array (yes, I usually manually stop my array before a shutdown to see if I can catch bad behavior like this), it hung unmounting the cache drive (where my docker appdata resides). So while I was eventually able to issue a shutdown command, I'm pretty sure it wasn't as graceful as it should have been (hitting the timeout period and triggering a hard poweroff). I believe something in docker was holding onto that mount from the hung deluge container; see the note below for how I'd check next time.

     Also notable: I was in the middle of 'yet another' parity check. I was able to see it got to 22.2% and was still chugging along like nothing went wrong. This leads me to believe it's some Unraid/docker edge case and not a plugin interaction at all. Docker was updated between 6.10.3 and 6.11.1, so maybe something isn't quite working right in that release:
     6.10.3 - docker: version 20.10.17 (CVE-2022-29526 CVE-2022-30634 CVE-2022-30629 CVE-2022-30580 CVE-2022-29804 CVE-2022-29162 CVE-2022-31030)
     6.11.1 - docker: version 20.10.18 (CVE-2022-27664 CVE-2022-32190 CVE-2022-36109)

     I'm going to restart, remove my deluge docker (but keep the appdata, of course!), and reinstall Unassigned Devices. If this doesn't work, I think I'm going to head back to a stable 6.10.3 till LimeTech squares this off. If it does work, I'm not sure what my next step will be (suggestions?). -JesterEE cogsworth-diagnostics-20221019-2143.zip
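     Purely as a guess at a next debugging step: if the cache hangs on unmount again, something like this should name whatever is still holding the mount (the path is from my setup; fuser is part of psmisc, which I believe ships with Unraid):

        fuser -vm /mnt/cache     # list every process holding the mount
        fuser -km /mnt/cache     # and, only if they're safe to kill, kill them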
  14. Glad you got something working, but I'd hardly consider this a solution given that the OS version just one release back showed no sign of a hardware issue. So unless the update somehow damaged the physical power supply electronics (extremely unlikely), I'm skeptical this is it. There are a lot more of us with this issue, and I doubt everyone has faulty hardware. You can track the problem thread here:
  15. Welp ... glad I waited that extra 15 hours because it happened again just now. Attached diagnostics for those interested. After my restart, I'll be uninstalling Unassigned Devices and reverting to docker macvlan. -JesterEE cogsworth-diagnostics-20221019-1156.zip
  16. I have to agree. I only did 2 things on this cycle, which currently has an uptime of 2 days 9 hours: (1) switched docker macvlan -> ipvlan (as noted above in another comment, this doesn't seem to be it), and (2) moved all my drives off of Unassigned Devices and made them Pool Devices. All my plugins remain installed and fully updated as of this post's timestamp (i.e. all the ones @JorgeB listed in the OP and more). I'm thinking it was #2, and something is going on when interfacing UD with a lot of IO traffic (like torrenting) on the 6.11 series. I was previously using one of my UD drives for torrent seeding via LinuxServer.io's deluge container and the Gluetun VPN client container. I still use the same containers, but I moved my UD xfs-formatted drive to a pool device (which, for those of you who haven't done this yet, will NOT destroy your data if you don't change the filesystem) and have not had another crash since. I was also using UD for my VM drive, Plex database, and scratch drive, so UD was previously doing a lot of heavy lifting. But I really think it was the torrent traffic that did it in, because I was often able to access the other features of my server (VMs, plex, and docker containers) that were utilizing UD AFTER the effects from the OP were noted. I'm going to let it sit for another 15 hours or so and then switch back to docker macvlan to, unfortunately, really point the finger at UD. -JesterEE
  17. My Safe Mode test completed 2 1/2 days of uptime, so I'm calling that a pass. That leaves issues caused by the VM Manager, Docker, or plugins. I started my next test with just the docker network switched from macvlan to ipvlan, and we'll see how it goes. I'm looking for another 2-3 days of uptime without errors, and then I'll reevaluate.
  18. I'll have to check my configs after my current Safe Mode test is done tomorrow, but I do run a docker network, which I believe is macvlan but I'd have to verify that.
  19. @JorgeB Thanks for raising this issue up. I will mark this thread as solved while it is tracked on the bug report you created. As an aside, @trurl pointed a couple of us to his Unraid 6 FAQ in another thread concerning the Ryzen processor family and stability issues. I ran a Ryzen 7 3800X for 3 years without issue before upgrading to a Ryzen 9 5950X. Still, to be complete and check the boxes, I underclocked my memory (2133 MHz, the Auto setting my board picks) and changed the default Auto C-State setting, which had worked fine on my 3800X, to "Typical Current Idle". The same bug showed up, so I believe this is unrelated. I am currently letting my server do a 4th parity check (due to all the dirty shutdowns) in Safe Mode, with the RAM set to 2133 MHz and the C-State setting back on Auto. I'll let it sit like that for a couple more days (I'm thinking 60-72 hrs of uptime) to see if the bug shows in Safe Mode. If not, I will start working through the plugins @JorgeB itemized in the bug report.
  20. I just started getting this error too; very frustrating. To be transparent, I upgraded my hardware 2 weeks before my upgrade to 6.11.1. So while I believed the hardware was stable after running without issue on 6.10.3 for that time before upgrading, it is possible (though unlikely) that this is not an Unraid issue. What's interesting is that all 3 of us (so far) use an ASUS motherboard. Maybe it's a coincidence, maybe it's not. I am cross-posting for traction on the other thread I started here: -JesterEE
  21. Recently, after installing Unraid 6.11.0 (and subsequently 6.11.1 in hopes of fixing this issue), my server started reporting kernel NULL pointer dereferences after about 1-2 days of uptime. This causes:
     - The WebUI to become unresponsive (loading indefinitely after a "successful login")
     - Some (but not all) of my dockers to become unresponsive. Dockers like plex and the *arrs seem to stay alive; others become unresponsive via their web interfaces but are reported as up via a 'docker container ls' check.
     - Non-functional powerdown and poweroff commands (in an attempt to reboot without a parity check)

     It does not seem to impact:
     - Operation of a VM I have running before, during, and after the error is reported (I'm typing on that VM right now, and I had this issue pop up earlier this AM)
     - Connection to the server via SSH

     For full transparency, I did upgrade my server's hardware (CPU and added RAM) 2 weeks before the 6.11.0 upgrade, while operating on 6.10.3. I ran memtest86 on all the RAM for 3 passes after installation, so I'm pretty sure that's OK. And 2 weeks of stable Unraid 6.10.3 operation leads me to believe this is unrelated to my hardware upgrade, but I cannot be 100% sure. Here is the relevant section of the syslog. Diagnostics attached. I have other diagnostics showing the same issue on both 6.11.0 and 6.11.1. Any help from the gurus appreciated! 🙏
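     In case anyone wants to pull the same section off their own live box, this is roughly how I grab the trace (assuming the stock syslog location; adjust the context line count to taste):

        # print the NULL pointer line plus the 20 lines of call trace that follow it
        grep -iA 20 "null pointer dereference" /var/log/syslog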
  22. I have the same issue on my reverse-proxied Unraid dashboard, but only on one particular Windows 10 device. The browser loading the content doesn't seem to matter (I tried Google Chrome and Microsoft Edge), so I'm thinking it's some group policy or something; maybe even an IT security safeguard (i.e. a blocker). I looked at the network traffic in the Chrome DevTools and see all the WebSocket traffic coming in, none of which is blocked outright by the browser. But I am getting errors in the console that look like this: So, something is not letting the connections through. I'd like to know how to fix this too, but I have no idea what would be causing this issue outside the browser. My best guess ... an IT-deployed "safeguard". See the sketch below for one way to narrow it down. @FlyingTexan, have you tried on a different device?
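     One way to take the browser out of the equation entirely (a sketch; the URL is a placeholder for whichever endpoint the console errors point at) is to drive the WebSocket handshake by hand with curl from the affected machine:

        # a successful handshake returns "HTTP/1.1 101 Switching Protocols"
        curl -i -N --http1.1 \
          -H "Connection: Upgrade" \
          -H "Upgrade: websocket" \
          -H "Sec-WebSocket-Version: 13" \
          -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
          https://unraid.example.com/

     If that returns 101 outside the browser but the page still fails, the block is in the browser environment (group policy / endpoint security); if curl never completes either, it's the network path or the proxy.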
  23. For anyone looking for a currently available FL1100-based USB controller that works with Unraid, Linux, and Windows 10 VMs (I haven't tested a macOS VM), check out this one from YEELIYA. I haven't run a real-world throughput test to really push the PCIe x1 bus, but I have no issues with a keyboard, mouse, webcam, and 2 USB sticks working simultaneously.