Rhynri

Members · Posts: 68
Everything posted by Rhynri

  1. Hello! I'd like to submit a feature request for a setting that prevents the array from starting if there is an issue with the cache drive/array. I recently noticed that my motherboard was missing a molex power plug, so I shut down the system and popped the plug in. Somewhere along the way, I bumped a plug on my U.2-mounted NVMe drive, loosening it just enough to take it offline. Upon starting Unraid, the array started as normal, but obviously the cache array was offline. My cache array is a BTRFS software RAID:

        Data, RAID0: total=1.11TiB, used=1.11TiB
        System, RAID1: total=32.00MiB, used=96.00KiB
        Metadata, RAID1: total=2.00GiB, used=1.21GiB

     Because the array started without the cache drive intact, upon reboot I was greeted with the attached screenshot in my dashboard. That drive activity is a BTRFS disk delete... on the array that the running VM I took this screenshot on is working off of, because everything automatically started as normal and then proceeded to do this. That is not only unwanted but wasteful, and it shortens the lifespan of the drives. So I'd like to respectfully submit a request for a setting that prevents the array from starting if there is any disk error at all. If this is already a thing and I couldn't find it, I apologize for wasting your time. Now I'm off to re-add that drive to my cache.
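     A minimal sketch of the kind of pre-start guard I'm asking for, assuming the cache pool mounts at /mnt/cache (the stock location) and relying on btrfs filesystem show flagging missing pool members; this is illustrative, not an existing Unraid setting:

        #!/bin/bash
        # Hypothetical pre-start check: refuse to bring the array up if the
        # BTRFS cache pool reports a missing device.
        if btrfs filesystem show /mnt/cache 2>/dev/null | grep -qi "missing"; then
            echo "Cache pool degraded (device missing); not starting array." >&2
            exit 1
        fi
        echo "Cache pool looks healthy; safe to start the array."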
  2. Unraid has been an absolute lifesaver when it comes to managing my home tech infrastructure. I’ve consolidated so much into one system it’s not even funny. And the support you guys give to your users is unreal.
  3. @binhex - Found a solution for that scanning issue that I had to manually lock Plex back to 1.14 for. (It was a while back.) Edit: The problem showed up in the logs only as:

        Jun 03, 2019 14:55:55.967 [0x151e50971740] WARN - Scanning the location /media/[Library Name] did not complete
        Jun 03, 2019 14:55:55.967 [0x151e50971740] DEBUG - Since it was an incomplete scan, we are not going to whack missing media.

     One of my scanners (Hama) was silently failing on certain files. I had to put a bunch of debugs into the .py files to sort it out, but once I did, I realized that the latest version from GitHub solved it. So if you encounter someone with scanning issues, have them refresh all their scanners/plugins. I'll try to keep an eye out myself.
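     If anyone wants to check for the same symptom, a quick grep of the server log will find it; the path below assumes the container's appdata is mapped to /config (adjust for your install):

        # Hypothetical log check; the /config mapping is the usual container
        # convention and may differ on your setup.
        grep "WARN - Scanning the location" \
          "/config/Plex Media Server/Logs/Plex Media Server.log"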
  4. If you’ve manually specified a version other than latest.
  5. Ouch. Yeah, I'm very happy with your docker image and knowing we can roll back easily is just icing on the cake.
  6. First, thank you for your response. Second, I appreciate the education; I wasn't aware Plex Pass was a form of beta, but that makes sense now that you've said it. I'll see what I can do through official channels, but thank you again for your time and for providing this great container in the first place.
  7. The latest Plex container would not scan for, nor detect changes to, my library items. Manual scans would immediately terminate, and manual scanning was not required prior to the latest un-tagged versions. Rolling back to 1.14.1.5488-1-01 immediately rectified the problem and found the library items I've added since updating to the 'latest' version. I'm available for debugging purposes if you are interested @binhex. Judging by responses on the official Plex forums this may be an issue in the official release, but most of the threads I'm finding reference the Mac OSX version. Edit: Your container has been an excellent one for many moons for me, though. Just wanted to give my praise as well.
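     For anyone else needing to roll back, pinning a version is just a matter of using a version tag instead of latest; a sketch, assuming the image tags follow the version string above (the repository name and tag are illustrative; check Docker Hub for the exact strings):

        # Hypothetical rollback: pull a specific version tag instead of 'latest'.
        docker pull binhex/arch-plex:1.14.1.5488-1-01

     On Unraid, the same thing is done by appending the tag to the Repository field in the container template.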
  8. Awesome video. I'd like to note that through "independent research" I found that hwloc/lstopo is included with the GUI boot in Unraid 6.6.1. So that's another option, requiring about the same number of reboots as the script method, i.e. reboot into GUI, take a snapshot, reboot back to CLI. Of course, if you run GUI all the time, this is just a bonus for you. Also, here is a labeled version of the Asus X399 ZE board in NUMA mode. Enjoy, and thanks @SpaceInvaderOne! (Note: this is with all M.2 slots and the U.2 4x/PCIe 4x split enabled with installed media. Slot numbers count full-length slots in order of physical closeness to the CPU socket... so top down for most installs.)
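     For reference, taking that snapshot from a GUI-boot terminal is a one-liner; lstopo infers the output format from the file extension, and writing to the flash drive keeps the image across the reboot back to CLI (the path is just an example):

        # Save a topology snapshot where it survives the reboot; the .png
        # extension selects graphical output (available in the GUI boot).
        lstopo /boot/topology.png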
  9. Thank you very much for this. I completely understand if it's only available in GUI-boot. Just gives me an excuse to go see the GUI! Hopefully other people find it useful as well.
  10. I wrote a rather in-depth reply, then accidentally deleted it, and there is no undelete. Suffice to say, moving the VM to the other NUMA node reduced the incidence of the problem and improved the rendering performance of the VM in question. It's still not gone, but I think a lot of the remaining NUMA misses are related to Unraid caching things, which is hardly a priority operation:

        numastat
                                   node0           node1
        numa_hit              2773556844      1684914320
        numa_miss                6233397       193845232
        numa_foreign           193845232         6233397
        interleave_hit             84430           84643
        local_node            2773481539      1684881326
        other_node               6308702       193878226

      Starting from a clean boot and looking at numastat when booting the two important VMs yields very few numa_miss(es) relative to the previous configuration. This is after 8 days of uptime. @limetech - If you could please include lstopo in a future release I'd greatly appreciate it. I linked a Slackware build for hwloc in a previous post in this thread if that helps. There are a few BIOS settings relating to IOMMU allocation in relation to the CCXs on Threadripper, and I'd like to do some A/B testing with lstopo to see what difference, if any, they make. As I mentioned in that reply, it would also potentially be a useful addition to the System Devices page. Please and thank you for your time and effort in making Unraid OS awesome.
  11. It looks like it's trying to work. It will slow down the startup significantly and cause the NUMA misses to skyrocket. I've since discovered that only one of my VMs behaves this way. I'm wondering if I can move that one to the other node it keeps trying to allocate memory on and see if that fixes the issue. Does anyone know if it matters which cores are isolated? Say, if I want to move my isolated cores to the beginning (0-11 physical) instead of at the end (4-15 physical), does Unraid care at all?
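      For anyone experimenting with where the isolated cores sit, one way to move them by hand is the kernel command line in syslinux.cfg on the flash drive; a sketch (core numbers taken from the question, with SMT siblings assumed to be offset by 16 on a 1950X):

        # /boot/syslinux/syslinux.cfg -- append line inside the boot label.
        # Isolates physical cores 0-11 plus their hyperthread siblings.
        append initrd=/bzroot isolcpus=0-11,16-27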
  12. I've been looking into this, and I think it may have something to do with which NUMA node the GPU is on. I was able to force correct NUMA allocations by changing the memory size of my node0 VM to neatly fill the available memory on that node, then booting the remaining two, but that results in a super lopsided memory allocation (28,16,8), and it's a very manual process. I'm going to be asking around the VFIO community to see if there is anything I've been overlooking. I've been trying to install hwloc (slackbuild link) into Unraid so I can have access to the very useful lstopo, which would let me know which node(s) my PCIe devices are on. I keep running into compilation issues, however, so I'm going to keep working on that. Regardless, the lstopo output as a standalone would be something very useful to have on the tools page, as it gives you a very good idea of which devices are nested for pass-through... it's arguably as useful as anything on the [Tools]>[System Devices] page in terms of pass-through usage. I've also attached an image of what the lstopo GUI output looks like. Example (not my system):

        # lstopo
        Machine (256GB)
          NUMANode L#0 (P#0 128GB)
            Socket L#0 + L3 L#0 (20MB)
              L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
              L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#2)
              L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#4)
              L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#6)
              L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#8)
              L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#10)
              L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#12)
              L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#14)
            HostBridge L#0
              PCIBridge
                PCI 1000:005d
                  Block L#0 "sda"
              PCIBridge
                PCI 14e4:16a1
                  Net L#1 "eth0"
                PCI 14e4:16a1
                  Net L#2 "eth1"
                PCI 14e4:16a1
                  Net L#3 "eth2"
                PCI 14e4:16a1
                  Net L#4 "eth3"
              PCI 8086:8d62
              PCIBridge
                PCIBridge
                  PCIBridge
                    PCIBridge
                      PCI 102b:0534
              PCI 8086:8d02
                Block L#5 "sr0"
          NUMANode L#1 (P#1 128GB)
            Socket L#1 + L3 L#1 (20MB)
              L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#1)
              L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#3)
              L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#5)
              L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#7)
              L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#9)
              L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#11)
              L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#13)
              L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
            HostBridge L#7
              PCIBridge
                PCI 15b3:1003
                  Net L#6 "eth4"
                  Net L#7 "eth5"
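      Until lstopo is available, the NUMA node of a PCIe device can also be read straight out of sysfs; a quick sketch (prints -1 when the platform doesn't report a node):

        # List the NUMA node for every PCI device on the system.
        for dev in /sys/bus/pci/devices/*; do
            echo "$(basename "$dev"): node $(cat "$dev"/numa_node)"
        done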
  13. NUMA daemon source
      As for the webterminal: once it has enough text to build up a decent scrollback, the scrolling gets choppy and the typing lags a little. I do use a fairly old MacBook Air and Chrome to access Unraid, but it's not something I noticed last build. It's possible it's just that machine being goofy, too. I haven't had time to research the issue fully, but I'll look into it tomorrow and let you know if I find any suggestions.
  14. So, after getting into RC2, I was trying to optimize my pinning using the new interface. I looked up my NUMA boundaries in the process:

        numactl --hardware
        available: 2 nodes (0-1)
        node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
        node 0 size: 32040 MB
        node 0 free: 256 MB   # <<< Make note of this value
        node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
        node 1 size: 32243 MB
        node 1 free: 19974 MB
        node distances:
        node   0   1
          0:  10  16
          1:  16  10

      I'm currently running two VMs as of this command.
      VM 1: 16GB RAM, CPUs 4-7, 20-23 (so NUMA node 0, in CPU pairs)
      VM 2: 16GB RAM, CPUs 8-11, 24-27 (NUMA node 1)
      But as you'll note, all the RAM is being allocated to node 0. Uh oh. Let's check:

        numastat qemu

        Per-node process memory usage (in MBs)
        PID                          Node 0          Node 1           Total
        -----------------------  --------------- --------------- ---------------
        13479 (qemu-system-x86)        16473.43            0.25        16473.68
        27148 (qemu-system-x86)        13259.18         3204.48        16463.66
        -----------------------  --------------- --------------- ---------------
        Total                          29732.60         3204.74        32937.34

      Well, crap. That's no good. I then tried to force it using the <numatune> tags. This works fine for VM 1, which is completely in its own node, but for VM 2 this makes it take forever to start up, because it tries to force the second qemu instance onto node 1 (where it should be) and you get a bunch of NUMA misses when the memory is allocated to node 0 anyway. This can also cause some NVRAM corruption in combination with other NUMA optimizations and XML configuration settings, though I'm not able to remember exactly which one borked up the VM so badly I had to restore the .img file, nvram, and XML to get the Nvidia drivers working again. I imagine this will be extra important for 2990 users, as two of the dies have significantly better memory access than the others and you'd want to keep VMs nicely in line with these boundaries for optimum performance. Obviously we don't want this boundary crossing to happen with other processors (like my 1950) for performance reasons as well. Bonus bug: WebTerminal is really slow this release once you have some text in the window compared to last release. Bonus question: Any chance of getting 'numad' baked in so we can use "auto" in numatune?
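      For reference, the forcing I tried looks roughly like this in the VM's XML; a minimal sketch for VM 2, assuming it should live entirely on node 1 ('strict' is what produces the slow start when the node can't satisfy the allocation):

        <!-- Hypothetical <numatune> fragment; 'strict' forces allocation from
             node 1 and stalls startup if that node lacks free memory. -->
        <numatune>
          <memory mode='strict' nodeset='1'/>
        </numatune>

      Swapping 'strict' for 'preferred' lets qemu fall back to the other node instead of stalling, at the cost of possible misses; the "auto" in the bonus question is placement='auto', which is exactly what needs numad running.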
  15. Hello! The new RC2 now boots correctly. Thanks! I've run into some other weirdness but it's not the same issue so I've closed this one.
  16. I have a 1950. I'm using all the back USB ports for the Unraid OS itself, and then I pass individual controllers off a Sonnet card to the VMs. I have the wireless and Bluetooth disabled because it cleans up the pass-through for the rest, although I'd love for Unraid to be able to use the wireless for additional network redundancy. In the BIOS I have NUMA set up, and then additional tweaks to the PCIe setup because I'm splitting the bottom slot between the Sonnet card and a U.2.
  17. The ASMedia USB header requires a specific plug that I don't have available, so it is enabled but not currently used. The rest of the USBs (minus my Sonnet USB card, which is for VM use only and isolated at boot) I believe stem from the AMD chipset. All available USBs seem to exhibit the problem. In my previous reply, I mention that the front panel USB works for M/KB with the new BIOS revision. While this isn't terribly obvious (sorry about that), front panel connections are always via MB header. While I haven't attempted to boot from my front panel USB 2.0 ports, I'd argue that even if that does solve the problem, it's not a solution: it won't work if you don't have USB 2.0 header ports available, and it's probably a symptom of a bigger problem. Also, what TR do you have, and what board revision? Use:

        dmidecode --type 2

      And you should get back something like mine:

        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: ROG ZENITH EXTREME
        Version: Rev 1.xx
        Serial Number: 170706217200585
  18. I was able to get a little farther this time, in that the front panel USB worked, meaning I had a functional keyboard. I was able to get a USB key to mount, and copied off the contents of the /var/log directory, figuring that'd be the most useful. I then tried booting with the UNRAID drive in this port, but unfortunately that failed in the exact same manner. I've PM'd the full zip to eschultz and am posting the dmesg output here for all eyes. dmesg.txt
  19. Yessir, am working on that now. Will follow up after that's done. Actually thought I had this update but apparently downloaded it and didn't apply it. Thank you for your concern, but I'm well aware of this. There is no way to fit them inside the chassis with the other hardware in there. I actually was going to mention this in the original post, but figured it didn't move the conversation forward or have anything to do with the bug. My apologies for the omission.
  20. Diagnostics [from previous version 6.5.3] attached. tower-diagnostics-20180904-1836.zip
  21. My apologies upfront - no logs available, as the xHCI controller dies and there is no PS/2 support on the board for CLI login. Additionally, since the Unraid drive is unreachable during a critical point in boot, no networking is available. As you can see from the attached screen, the xHCI controller is listed as dead by the kernel during a critical juncture in the boot-up sequence, preventing successful boot. Just before this, you can see my array drives being listed, and they are all attached via USB, so clearly the controller is up just prior to this message. The first time this happened, the machine hung so severely it required a full hard power-switch cycle to properly reboot and be able to enumerate the USB-attached drives again. From what I can see, the boot proceeds according to previous boot logs until this point. The same thing happens on safe boot. Rolling back to the previous version was successful and Unraid booted without incident. Please let me know how I may assist troubleshooting. Diagnostic file in separate post below. Motherboard is an Asus Zenith Extreme.
  22. Hello Gridrunner, for me the value in /sys/kernel/mm/ksm/run was zero, but you can echo a 1 there to enable KSM. I haven't seen much benefit, as it seems as though it still might not be working in the VMs. I'll keep working at it.
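      Concretely, the toggle and a quick way to tell whether KSM is actually merging anything (pages_sharing stays at 0 if it never kicks in):

        # Enable KSM and check whether it is deduplicating pages.
        echo 1 > /sys/kernel/mm/ksm/run
        cat /sys/kernel/mm/ksm/pages_sharing   # > 0 means merging is happening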
  23. Thanks for taking a look. I really appreciate the response. My VMs indeed had the XML lines you posted, which I am guessing is why I've never seen any KSM activity. I'll test and let you know how it goes. I'd rather just add 32GB of RAM to the machine (or more) so each one can have an actual 16GB, but as you are probably well aware, the price of RAM is insane at the moment.
  24. While I agree for production server use, he seems to be looking into running multiple gaming VMs, and for that TR works fine; I use it to host my daily driver with zero issues since the PCI reset bug was fixed. If it crashes once or twice a year, he probably doesn't care too much.