thenonsense

Everything posted by thenonsense

  1. No issues with speed after more thorough testing, closing this with the above as the solution.
  2. For anyone else with a similar issue, I think I found it. It was brought to light by my install of the Active Streams plugin. As soon as I opened it I noticed a TON of activity despite the client supposedly being asleep, the client being one of my Windows VMs. It turns out Firefox was constantly writing session data back home, and File History was then writing that data to my backup share, basically clogging my SMB pipe with a ton of garbage. I'm not sure how File History could be so busy, since it's only set to run about every 15 minutes (still an aggressive setting), but I do have File History backing up my entire user folder just in case I lose a wayward game save.
     Two solutions could be employed here. One is to decrease the frequency at which Firefox writes session data; not a bad solution, but not a perfect fit on its own. The second is to stop File History from backing up Firefox's data entirely. I employed both, since I just don't like Firefox pounding any disk, including my OS drive, and I haven't had another problem in about half a day of testing. Given that my Samba config is vanilla, as is my hardware, I believe this was ultimately the problem. Someone out there likely also backs up their appdata folder or even their whole user folder and might run into the same annoyance; I hope you find my words here.
     Shoutout to JorgeB for noting the corruption and motivating me to upgrade my rig. I cannot believe I never noticed those messages in the syslog, and of the two problems discussed in this thread, corruption is 100% worse for this use case than speed.
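     For reference, the Firefox half of the fix boils down to one preference, browser.sessionstore.interval (in milliseconds; Firefox defaults to 15000, i.e. a session write every 15 seconds). The profile folder below is a placeholder and the 5-minute value is just what I picked, so treat this as a sketch:

        # Run inside the Windows VM (e.g. from Git Bash); substitute your real profile folder.
        PROFILE="$APPDATA/Mozilla/Firefox/Profiles/xxxxxxxx.default-release"
        # Raise the session-save interval from 15 seconds to 5 minutes.
        echo 'user_pref("browser.sessionstore.interval", 300000);' >> "$PROFILE/user.js"

     The File History half is GUI-only as far as I know: exclude the Firefox profile folder from the backup, or back up specific folders instead of the whole user folder.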
  3. I wanted to provide an update after testing, including an upgrade to Ryzen 7000. The CPU is a 7950X on an ASRock Taichi, cores split exactly as before, but since these are 8-core CCXs instead of 4-core CCXs, Infinity Fabric doesn't factor into VM performance nearly as much, and on a separate note the gaming is fast. I'm sitting on 64GB of DDR5, tested both at stock settings and at EXPO 6000. It's one of the few boards with enough SATA ports, but they don't all sit on adjacent controllers, as lstopo has revealed. Fortunately the corruption appears to have vanished, and I'm glad to have that behind me. However, the speed issue is not addressed. I'm not sure if minute reads/writes from the VMs are eating up bus traffic, or whether 16GB (what's left after each VM takes 24GB) is enough for Unraid to operate as a hypervisor and for SMB to serve files for gaming, or whether two hyperthreaded cores are simply not enough. I'm not sure what else to test here; if anyone has ideas, I'd love to hear them. I'm going to try Dynamix Active Streams to see if that gives me more insight into what SMB is doing and what's taking so long.
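     In case it helps anyone map their own board, this is roughly how I checked which drives hang off which controller (lstopo comes from the hwloc package; sdb is just an example device):

        # Text view of the CPU and PCIe topology.
        lstopo-no-graphics
        # Tree view of the PCIe hierarchy, to spot controllers sharing an upstream link.
        lspci -tv
        # Resolve the PCI path a given disk actually sits behind.
        readlink -f /sys/block/sdb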
  4. After the first wipe, I honestly bumped the voltages and tried to keep the clocks for now, since it's a gaming PC. After the second wipe (still found corruption) I tried reverting to 2133MHz, and upon writing back to the restored pool we had a crash on the mover. No corruption on this second try, though; I'll need a day or two to verify. It looks like the RAM or first-gen Infinity Fabric was indeed the issue. I've also been seeing some weird activity (isolated cores at 100% with no VM running to use them), and I'm wondering if the bus being connected to only one CCX/die is also at play here. Sadly, I feel like I know a lot about first- and second-gen Threadripper quirks, yet I still get shown something new at least once every few weeks. Part of my goal in jumping to the 7000 series is to hop onto that unified bus and see if that (plus a more mature process) helps address these issues while maintaining the DDR5 sweet spot of 6000MHz. I'm still new to the Zen 4 architectural changes, and I haven't yet come across a thorough enough article to get a hold of the CCX organization, Infinity Fabric, UMA, and everything else.
  5. @JorgeB thanks for your help earlier, and sorry to call you out: do you think it'd be a good idea to keep the cache on btrfs, or should I move to ZFS or something else? This is the first time I've run a protected cache on btrfs since the rig's conception in 2017.
  6. Pulled the files off the drives, reset the pool, and BOOM, file corruption on new files. The pool was verified clean before the new files were written, and immediately we have the same issue with lapses in IO. I'm not sure if the IO crippling is caused by the corruption, but neither is really acceptable. This specific pool is about two months old, and the version before it was single-device, so ironically the best thing for stability might just be a lack of RAID, lol. Rather than that, I've decided to throw the baby out with the bathwater, sell the farm, and transplant a new spine into the patient: a 7950X, DDR5, and an ASRock Taichi are in the mail. We'll see if we can resolve these errors the David Martinez way: speed.
  7. Wow, that's good info. I'm dropping the array now and running some BTRFS recovery commands, then I'll post back.
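     For anyone following along, these are the kinds of read-only checks I mean (the pool mount point matches my setup and sdX1 is a placeholder; btrfs check is run against an unmounted member):

        # Per-device error counters for the pool.
        btrfs device stats /mnt/cache
        # Scrub verifies data and metadata checksums; -B keeps it in the foreground.
        btrfs scrub start -B /mnt/cache
        # Offline check of a member device (read-only by default).
        btrfs check /dev/sdX1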
  8. UPDATE: Rebooting a VM while IO appears to be crippled (which incurs hefty reads/writes on the disks) completes with no issue. Therefore, IO to/from the disks themselves isn't the problem. The problem does seem to be Samba, but I don't know Samba well enough to figure out where...
  9. Hi all, I usually try to get these resolved myself, but I'm at the end of my rope here and curious if anyone has seen this and has some helpful tips. Essentially my rig has a pool of 6 SATA3 SSDs in RAID10, on which sit the gaming libraries (shares) for two players and the vdisks for two gaming VMs. The VM vdisk IO seems fine, but regularly the gaming libraries will go from acceptable transfer speeds to completely stalling out, crashing games or worse. I experimented with disk shares to take FUSE out of the equation, but transfers would still stall out. I checked from both VMs and from other devices on the network, and the behavior is the same.
     I've also tried to verify that the VMs' vdisk IO is never interrupted, and that seemed to be the case until one of the VMs recently crashed outright on reboot, leading me to believe its disk failed to be written on shutdown. The most recent wrinkle is that this cache pool isn't the only storage to stall, or else I'd suspect a disk failure: my spinning disks will also stall out completely. So I'm not sure if the issue is Samba, one disk's IO tanking all of Unraid, or something else. This issue has persisted since 6.10, so it's not limited exclusively to 6.11.x for me.
     I've attached two diagnostics. The earlier one has Samba logging enabled, but that was spitting out the smbd synthetic pathref error over and over, which according to other posts is mostly a non-issue. The second diagnostics had Samba logging set to 0 to better show the rest of the syslog. Please let me know if you have any ideas. I appreciate your help! undoot-diagnostics-20221104-1202.zip undoot-diagnostics-20221106-1247.zip
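     If it helps, my rough plan for the next stall is to capture the following while it's happening (iostat assumes the sysstat package, e.g. from NerdPack; smbstatus ships with Samba):

        # Per-device utilization and latency, refreshed every 2 seconds.
        iostat -x 2
        # Open SMB sessions and the files they currently hold locks on.
        smbstatus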
  10. Yo friends, I'm not sure about you, but I've been struggling to find a board this generation that supports a 2-gamers-1-PC build, especially with the number of drives we currently run. Coming from a 1950x, we had more PCIe lanes than we could deal with; now, in this current market, I'm stumped on what a viable upgrade path looks like. I hopped on pcbuilder.net and pcpartpicker.com to filter motherboards, but several of those tools only consider physical PCIe slot size, not the number of lanes driven in different configurations. I'm reaching out to the forum to see if anyone's found a better board than I have for the following:
      2x PCIe gen 4/5 x16 slots (driving x8/x8, since we have 2 GPUs)
      2-3x PCIe gen 3/4 x1 slots, or fewer if the board supports:
      8x SATA ports (can be replaced by a PCIe card above)
      3x NVMe slots (can be replaced by a PCIe card too)
      A ****ton of USB headers/ports (capable of being split off from the Unraid boot drive)
      I've been seeing too many boards that might have enough of the PCIe x1 or x4 slots but drop the second GPU slot to x4 as well; otherwise, I see proper x8/x8 support and not much else. PCIe generation only matters for the GPUs; everything else in my rig today is pretty much gen 3. So even with only 24 PCIe gen 5 lanes there should be enough bandwidth to cover what my 1950x is driving now (64 gen 3 lanes on the 1950x vs. the roughly 96 gen-3 lanes' worth of bandwidth that 24 gen 5 lanes provide; rough math below). Has anyone come across a board that meets these parameters?
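     Rough per-lane math behind that bandwidth claim (approximate usable throughput):

        PCIe 3.0: ~1 GB/s per lane  ->  64 lanes on the 1950x ~ 64 GB/s
        PCIe 5.0: ~4 GB/s per lane  ->  24 lanes ~ 96 GB/s, i.e. ~96 gen-3 lanes' worth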
  11. After a slew of changes (without validating them individually, of course) it APPEARS to be stable, for now. Notable events:
      Pointed our VM GPUs at different BIOS files. I point secondary GPUs at BIOS files too, but somehow both were pointing at the same one; same GPU model, so no compatibility issue, but maybe a file handle issue.
      Disabled Hypervisor (re-enabling it afterwards has not re-induced the issue from what I can tell).
      Fast boot was enabled in one VM. Disabled that crap. Not sure if that did it, but the words "fast startup" and "power state" go hand in hand in my brain, and you should have it disabled by default in these VMs anyway.
      Did not try the nvidia-persistenced tag yet, and apparently didn't need it.
      If anyone else has any thoughts, please share them.
  12. Hi everyone, fair warning: this is my crosspost attempt from Reddit. Why I started there, I'm not sure. I'm the owner of a 2-streamers-1-CPU build that holds 2 GTX 1080s and a Threadripper 1950x, and I've been stable for a long time, 6 months to a year since my last issue. A couple of months ago I upgraded to 6.9.0rc2, and after about a month of uptime my GF and I had a double black screen while playing the same game, Monster Hunter World. Logs from the VMs are as follows; see the bottom 4 lines for the major issue:
      -boot strict=on \
      -device nec-usb-xhci,p2=15,p3=15,id=usb,bus=pci.0,addr=0x7 \
      -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 \
      -blockdev '{"driver":"host_device","filename":"/dev/disk/by-id/ata-Samsung_SSD_850_EVO_1TB_S3PJNB0J806112N","node-name":"libvirt-4-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
      -blockdev '{"node-name":"libvirt-4-format","read-only":false,"cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-4-storage"}' \
      -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=libvirt-4-format,id=virtio-disk2,bootindex=1,write-cache=on \
      -blockdev '{"driver":"file","filename":"/mnt/disks/GameCache/Blizzard/FearTurkey/vdisk2.img","node-name":"libvirt-3-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
      -blockdev '{"node-name":"libvirt-3-format","read-only":false,"cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-3-storage"}' \
      -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=libvirt-3-format,id=virtio-disk3,write-cache=on \
      -blockdev '{"driver":"file","filename":"/mnt/user/isos/Win10_1803_English_x64.iso","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
      -blockdev '{"node-name":"libvirt-2-format","read-only":true,"driver":"raw","file":"libvirt-2-storage"}' \
      -device ide-cd,bus=ide.0,unit=0,drive=libvirt-2-format,id=ide0-0-0,bootindex=2 \
      -blockdev '{"driver":"file","filename":"/mnt/user/isos/virtio-win-0.1.173-2.iso","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
      -blockdev '{"node-name":"libvirt-1-format","read-only":true,"driver":"raw","file":"libvirt-1-storage"}' \
      -device ide-cd,bus=ide.0,unit=1,drive=libvirt-1-format,id=ide0-0-1 \
      -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 \
      -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:45:b5:84,bus=pci.0,addr=0x2 \
      -chardev pty,id=charserial0 \
      -device isa-serial,chardev=charserial0,id=serial0 \
      -chardev socket,id=charchannel0,fd=35,server,nowait \
      -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
      -device usb-tablet,id=input0,bus=usb.0,port=2 \
      -device 'vfio-pci,host=0000:44:00.0,id=hostdev0,bus=pci.0,addr=0x6,romfile=/mnt/user/nas/Build Info (DO NOT TOUCH)/gtx1080.dump' \
      -device vfio-pci,host=0000:44:00.1,id=hostdev1,bus=pci.0,addr=0x8 \
      -device vfio-pci,host=0000:09:00.3,id=hostdev2,bus=pci.0,addr=0x9 \
      -device vfio-pci,host=0000:42:00.0,id=hostdev3,bus=pci.0,addr=0xa \
      -device usb-host,hostbus=1,hostaddr=2,id=hostdev4,bus=usb.0,port=1 \
      -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
      -msg timestamp=on
      2021-03-15 20:58:17.231+0000: Domain id=1 is tainted: high-privileges
      2021-03-15 20:58:17.231+0000: Domain id=1 is tainted: host-cpu
      char device redirected to /dev/pts/0 (label charserial0)
      2021-03-16T05:19:05.856739Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
      2021-03-16T05:19:05.861722Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
      2021-03-16T05:19:07.054770Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
      2021-03-16T05:19:07.054895Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
      "Ok," I think to myself, "maybe sitting for multiple months on an RC branch wasn't a good idea; upgrade to 6.9.1." I make the upgrade and decide to try again. My GF and I play some more games for a couple of hours, then swap back to Monster Hunter World. After roughly an hour of MHW, crash again, both VMs, same message. Now, I'm aware of the dreaded AMD reset bug and I've seen Code 43s, but this is neither AMD nor a Code 43, not to mention the system has been as stable as our relationship through Covid, so I'm at a loss. Google-fu told me about trying to disable the hypervisor, and I'm also thinking of vfio-binding one or both GPUs at boot, but I'd be concerned about booting, since one of the 1080s is the primary card. Does anyone have a better idea of what could be wrong? Logfiles can be found here. Forgive the VM names, I'm a TFS fan at heart.
  13. You've got a few different options: either swap your motherboard for one with 5 slots (I'm not sure one exists) or use PCIe bifurcation to split a slot, which I believe is your best bet. Now, you may run into bottlenecks depending on how PCIe 4.0 handles 8 lanes for a GPU (assuming you split an x16 slot) versus your video encoding throughput; if the loss in bandwidth corresponds to a loss in encoding speed, you can tie that to dollars lost per unit of time. I would research PCIe 4.0 bifurcation. Then, of course, research which PCIe lanes go to which CCXs on the Zen 2 die (I believe Zen 2 uses uniform access for both memory and PCIe via the IO die on the package, so this isn't actually a problem). Split everything so you manage CCXs properly and avoid cross-die or cross-CCX CPU communication if possible, and you're gold.
  14. I figured out my own hacky solution. Essentially I have a single drive (or in this case I just RAIDed two NVMe drives for speed) set up with btrfs as a single share for both users. I dragged my own shared games over, then cp --reflink'd everything for the other user. For Steam, it's best to grab the original user's appmanifest files and delete the executables for the other user so that Steam can remake those from scratch for each user, apparently as part of the DRM protections. Bottom line, every game file except mods, .exe files, and appmanifest files was reflinked. Then every time I push a game update through on both libraries, I just rebuild the reflinks using jdupes --dedupe. That last step was the hardest: how do you hit convergence after CoW drift? jdupes doesn't do a perfect job; it still compares large files even after they've already been reflinked, and I'm not sure if btrfs has a way to cleanly expose that for more efficient diffs. However, that's the hackiest part of this job, so I'm happy with the result. Glad to raise a dead thread; hope this helps someone else.
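     For the curious, the workflow boils down to roughly this (the paths are placeholders for my layout, and jdupes has to be built with dedupe support for --dedupe to work):

        # Initial clone of player 1's library for player 2: reflinks, so no extra space is used.
        cp -a --reflink=always /mnt/cache/games/player1/steamapps /mnt/cache/games/player2/steamapps
        # (the per-user handling of mods, .exe files, and appmanifests described above is done by hand)
        # After updates make the two copies drift, re-share the identical extents:
        jdupes -r --dedupe /mnt/cache/games/player1 /mnt/cache/games/player2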
  15. Sorry dmacias, not sure how I missed the notification for your update: jdupes 1.14 is not compiled with btrfs support. Running "jdupes --dedupe" shows the message. I'll add a PR when I get a chance.
  16. Hi dmacias, can you please add the newest (2.10) version of rmlint? We're using it to get CoW convergence on our caches. Edit: It looks like the other option for this, jdupes, was built without btrfs support. For those who need it, can you please enable that flag in the makefile and republish? I'm not sure if configuring where Nerdpack points is in the works, but if it is, I'd be willing to open up my own packages to lend a hand; I imagine others feel similarly.
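     For context, this is roughly how we use rmlint for that, if I'm remembering the formatter syntax right (the clone handler is the btrfs reflink one; paths are examples):

        # Find duplicates across both libraries and emit a script that reflinks them together.
        rmlint -c sh:handler=clone /mnt/cache/games/player1 /mnt/cache/games/player2
        # Review the generated script, then run it to actually dedupe.
        ./rmlint.sh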
  17. Hi everyone, I'm a couple-year user of Unraid but I've never found my most favorable way to share games between users. My usecase is: Players 1 and 2 both have the same general batch of games they want to play, at the same time. I used to eat the storage redundancy with separate drives holding the same content, to avoid issues for both players having access to the same files. My question is: is there a way to get both users to share the EXACT same gaming library, same files, reasonably? Last I tried pointing both users' steam clients at the same game files (SATA SSD share running BTRFS), I didn't run into updating collisions, but when both users accessed the same files the performance tanked for each instance of the game. I don't think accelerated storage is the answer, I think it might be file handles. I could copy the files to different locations on the same disk/share, and take advantage of CoW, but I don't know if updates that trigger CoW will remove the redundant files when they reach convergence (after both file sets update). I could also just swap up to NVMe, and see if increased r/w performance will help with this issue. There are a few different directions I can go. What does the public recommend?
  18. I figured it out. I had expected CoW to be a lot more intensive than it actually is. It's not. New copies of data are only made when a file is copied and one of those copies is then modified. I had expected some copy-in-place transactional craziness that versioned files as they were modified, regardless of whether a copy had been made beforehand. I'm confident I'm not explaining my original point of view well, but that's because it was pretty stupid to begin with.
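     A quick demonstration of the behavior that finally made it click for me (any btrfs mount will do; the filenames are throwaway):

        # Reflink copy: both names share the same extents, no data is duplicated yet.
        echo "hello" > a.txt
        cp --reflink=always a.txt b.txt
        # Writing to one copy triggers copy-on-write for just the changed blocks;
        # a.txt is untouched and only b.txt's new extents consume extra space.
        echo "world" >> b.txt
        cat a.txt    # still prints just "hello"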
  19. Hi all, I'm trying to understand something about btrfs and how it tracks multiple copies of files. I have a few scenarios and I'm trying to understand how CoW resolves them:
      A file is put onto the file system by User A, and User B updates it. What differentiates User A from User B? How is the data (original vs. updated) chosen to present to User A or B? What about User C, who has seen neither the original nor the updated variant?
      A library, video game, or even a git repo is put onto the file system, then receives an update two weeks later. Large amounts of edits may be made, so does this rapidly consume storage?
      A file is updated/deleted by User A. How does that change propagate to the original, so that User B either sees the new version or no file at all?
      I'd appreciate it if someone could explain some of these items or point me in a direction that helps. I've been over Wikipedia a couple of times and have been devouring Stack Overflow, but I'm still not seeing clear answers to my questions, likely because I'm not understanding the answers.
  20. Not sure if this was ever resolved, what cores are you typically passing through?
  21. It sounds like you've got gigabit internet (or at least on that scale). Can you do some file transfers between your NAS and a local VM? NAS and another device on the network? Post your local speed results. I'll do the same this weekend and post my findings, but I'm living nicely on just 100Mb speeds to the rest of the world. I'm also getting those speeds on all my unraid resources. I'm sitting on 6.7.0 stable as well.
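     (If you want to rule the disks out entirely, something like iperf3 measures the raw network path between two machines; the server address below is a placeholder, and iperf3 isn't on Unraid by default, so grab it via NerdPack or a container.)

        # On the server side (e.g. the NAS):
        iperf3 -s
        # On the client (a VM or another machine on the LAN):
        iperf3 -c 192.168.1.10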
  22. I got the notification about linked content and I'm really glad people are using this; it's become quite a good resource. Due to the architecture of Ryzen, there are PLENTY of tidbits that need to be in a row to reduce inter-die communication and therefore latency: making sure you're accessing memory modules directly linked to your die, using only cores on your die (or even just on the same CCX), knowing which PCIe lanes go where, etc. It looks like a lot of ground has been covered since your initial post, with cross-die core allocations and nebulous GPU performance, and it seems like you've corrected quite a few problems. To add to your solutions, here are some basic architecture decisions that worked for me in the past (on my 1950x, so grain of salt); a few helper commands are sketched below:
      Try to reduce VMs to all cores on one CCX, and then expand from there.
      Try to get the VMs on different dies.
      Try to leave 10-20% of your total memory to Unraid, then scale up from there.
      NUMA-ize your memory: get 2 nodes and assign node 0 to die 0's work (like VM 1) and node 1 to die 1 for VM 2.
      Reallocate disks for your VMs; try passing them through.
      Most of all, experiment. Foot my plane ticket and I'll look alongside you.
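     A few commands that make the above less guesswork (numactl and hwloc may need installing, e.g. via NerdPack; "Win10" is a placeholder VM name and the core numbers are examples):

        # See which cores and how much memory belong to each NUMA node/die.
        numactl --hardware
        lstopo-no-graphics
        # Pin guest vCPU 0 to host core 8 (repeat per vCPU, or set it in the VM XML instead).
        virsh vcpupin Win10 0 8
        # Keep the guest's memory allocated on node 0.
        virsh numatune Win10 --nodeset 0 --live --config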
  23. It isn't. There are guides like this one here: https://forums.unraid.net/topic/35112-guide-passthrough-entire-pci-usb-controller/ However, it isn't very up to date. Here's the typical process:
      Go to /Tools/SysDevs and look for your USB controllers; you're looking for all of them. An example would be [1022:145c] 09:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller. You can also buy a PCIe-to-USB expansion card to get an easily identifiable controller.
      Next, follow the linked guide and work out which USB ports map to which buses. This matters for your motherboard's controllers, but less so if you bought your own USB card.
      Pick a controller to pass (your expansion card OR a controller on your motherboard that is NOT on the same bus as your Unraid flash drive) and map out the ports it controls.
      In your flash drive config, add vfio-pci.ids=1022:145c between "append" and any other parameters (like isolcpus), replacing the 8-digit value with the corresponding PCI identifier you pulled in the first step (see the sketch below).
      Reboot your rig, and note in your VM config that the USB controller should now be in the list of passable devices at the bottom.
      Sorry for errors in the guide, a few drinks in. YMMV
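     As an illustration of that append step, the relevant stanza of /boot/syslinux/syslinux.cfg ends up looking roughly like this (the ID is the example controller from above; keep whatever else is already on your append line):

        label Unraid OS
          menu default
          kernel /bzimage
          append vfio-pci.ids=1022:145c initrd=/bzroot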
  24. There are a few views on this but no replies. I will say that I personally have seen disconnect issues, but I don't use individual device passthrough. I pass through the controller, and sometimes see a disconnect/reconnect when transferring large files (that breaks the transfer) but it always reattaches the device. Have you tried passing through the controller? It should at least easily facilitate the reconnect when it drops. No need to buy another controller, you can pass through one of the controllers on your mobo.
  25. You do, the one built into your 8700K: Intel UHD 630. Set the integrated GPU as default in the BIOS and boot Unraid from it, then just pass your GTX 1080 through to the VM. Alternatively, you don't strictly need a second GPU: you can copy the BIOS of your GPU using SpaceInvaderOne's tutorial on passthrough with a single GPU. However, that isn't necessarily the easiest route for first-timers.
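     If you go the vBIOS route, one common way to dump it from Unraid itself looks roughly like this (the PCI address is just an example; the card must not be in use at the time, and depending on how the dump comes out you may still need to trim a vendor header per the tutorial):

        cd /sys/bus/pci/devices/0000:44:00.0
        # Unlock the ROM, copy it out, then lock it again.
        echo 1 > rom
        cat rom > /mnt/user/isos/gtx1080.dump
        echo 0 > rom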