thenonsense

Members
  • Posts: 126
  • Joined
  • Last visited

Recent Profile Visitors: 2248 profile views

thenonsense's Achievements: Apprentice (3/14)

Reputation: 7

Community Answers

  1. No issues with speed after more thorough testing, closing this with the above as the solution.
  2. For anyone else that has a similar issue, I think I found it. It was surfaced by my install of the "active streams" plugin. As soon as I opened it I saw a TON of activity even though the client, one of my Windows VMs, was supposedly asleep. It turns out Firefox was constantly writing session data back home, and File History was apparently writing that data to my backup share, basically clogging my SMB pipe with a ton of garbage. I'm not sure how File History could be so busy since it's only set to run about every 15 minutes (still an aggressive setting), but I do have File History set to back up my entire user folder in case I lose a wayward game data save.
     Two solutions could be employed here. One is to decrease the frequency with which Firefox writes session data. Not a bad solution, but not a perfect fit. The second is to prevent File History from backing up Firefox entirely. I employed both, since I just don't like Firefox pounding any disk, including my OS drive. I've yet to have another problem in about half a day of testing.
     Given that my Samba config is vanilla, as was my hardware, I believe this was ultimately the problem. Someone out there likely also backs up their appdata folder or even their whole user folder, and might run into the same annoyance. I hope you find my words here. Shoutout to JorgeB for noting the corruption and motivating me to upgrade my rig. I cannot believe I never noticed those messages in the syslog, and of the two problems discussed in this thread, corruption is 100% worse for this use case than speed.
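     If you'd rather confirm this from the Unraid console than from the plugin, something like the following should expose which files are getting hammered over SMB (a rough sketch; the share path is an example, and inotifywait comes from inotify-tools, which may need to be installed separately):

        # list open SMB files and locks right now
        smbstatus -L
        # or watch writes land on the backup share in real time
        inotifywait -m -r -e modify,create /mnt/user/backups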
  3. I wanted to provide an update after testing, including an upgrade to Ryzen 7000. The CPU is a 7950X on an ASRock Taichi, cores split exactly as before, but since these are 8-core CCXs rather than 4-core CCXs, Infinity Fabric doesn't factor into VM performance nearly as much, and on a separate note the gaming is fast. I'm sitting on 64GB of DDR5, tested at stock settings and at EXPO 6000. It's one of the few boards with enough SATA ports, but they aren't all sitting nicely on adjacent controllers, as lstopo revealed.
     Fortunately the corruption appears to have vanished; I'm glad to have that behind me. However, the speed issue is not resolved. I'm not sure if minute reads/writes from the VMs are eating up bus traffic, or whether the 16GB left over (after each VM takes 24GB) is enough for Unraid to operate as a hypervisor and for SMB to serve files for gaming at the same time. Or maybe two hyperthreaded cores are not enough. I'm not sure what else to test here; if anyone has ideas, I'd love to hear them. I'm going to try Dynamix Active Streams to see if that gives me any more insight into what SMB is doing and what's taking so long.
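     For anyone wanting to check their own controller layout the same way, lstopo comes from the hwloc package (availability on Unraid is an assumption; a plain lspci tree tells a similar story):

        # dump the CPU/PCI topology as text and look for where the SATA controllers hang
        lstopo-no-graphics > /boot/topology.txt
        # the PCI device tree alone
        lspci -tv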
  4. After the first wipe, I honestly bumped the voltages and tried to keep the clocks for now, since it's a gaming PC. After the second wipe (still found corruption) I tried reverting to 2133MHz, and upon writing back to the restored pool we had a crash on the mover. No corruption on this second try, though, but I'll need a day or two to verify. Looks like RAM or first-gen Infinity Fabric was indeed the issue. I've been seeing some weird activity (isolated cores at 100% with no VM running to use them), and I'm wondering if the bus only being connected to one CCX/die is also at play here. Sadly, I feel like I know a lot about the first- and second-gen Threadripper quirks, yet I get shown something new at least once every few weeks. Part of my goal in jumping to the 7000 series is to hop on that unified bus and see if that (plus a more mature process) helps address these issues while maintaining the sweet spot of 6000 MT/s on the DDR5. I'm still new to the Zen 4 architectural changes, and I haven't yet come across a thorough enough article to get a handle on the CCX organization, Infinity Fabric, UMA, and everything else.
  5. @JorgeB thanks for your help earlier, sorry to call you out: do you think it'd be a good idea to keep the cache on btrfs, or should I bump to ZFS or something else? This is the first time I've run a protected cache on btrfs since the rig's conception in 2017.
  6. Pulled files off the drives, reset the pool, and BOOM, file corruption on new files. The pool was verified clean before the new files were written, and immediately we have the same issue with lapses in IO. I'm not sure if the IO crippling is caused by the corruption, but neither is really acceptable. This specific pool is about two months old, and the version before it was single-device, so ironically the best thing for stability might just be a lack of RAID lol. Rather than that, I've decided to throw the baby out with the bathwater, sell the farm, and transplant a new spine into the patient. A 7950X, DDR5, and an ASRock Taichi are in the mail. We'll see if we can resolve these errors the David Martinez way: speed.
  7. Wow, that's good info. I'm dropping the array now and running some BTRFS recovery commands, then I'll post back.
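     For anyone following along, the read-only checks I'd reach for first look something like this (illustrative only, not necessarily the exact commands I ended up running; /mnt/cache is the usual Unraid pool mount):

        # per-device error counters accumulated by btrfs
        btrfs dev stats /mnt/cache
        # verify every checksum on the pool; -B waits for completion and prints a summary
        btrfs scrub start -B /mnt/cache
        # deeper structural check, only with the array stopped / filesystem unmounted
        btrfs check --readonly /dev/sdX1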
  8. UPDATE: Rebooting a VM while IO seems to be crippled (which incurs hefty reads/writes on the disks) completes with no issue. Therefore, IO to/from the disks themselves isn't the problem. The problem does seem to be Samba, but I don't know Samba well enough to figure out where...
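     For reference, this is roughly how I'd separate the two layers during a stall from the Unraid console (a sketch, not a definitive procedure):

        # are the disks actually busy, or is smbd just sitting there?
        iostat -x 1 5               # per-disk utilization and wait times
        smbstatus                   # connected clients, open files, locks
        top -b -n 1 | grep smbd     # check whether an smbd process is pegged or stuck in D state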
  9. Hi all, I usually try to get these resolved myself, but I'm at the end of my rope here, and I'm curious if anyone has seen this and has some helpful tips. Essentially my rig has a pool of six SATA3 SSDs in RAID10 holding the gaming libraries (shares) for two players and the vdisks for two gaming VMs. The VM vdisk I/O seems fine, but the gaming libraries will regularly go from acceptable transfer speeds to completely stalling out, crashing games or worse. I experimented with disk shares to take FUSE out of the equation, but transfers would still stall out. I checked from both VMs and from other devices on the network, and the behavior is the same.
     I've also tried to validate that the VMs do not have their vdisk IO interrupted, and that seemed to be the case until recently, when one of the VMs completely crashed on reboot, leading me to believe its disk failed to be written to on shutdown. The most recent wrinkle is that this cache pool isn't the only storage that stalls, or else I'd suspect a disk failure: my spinning disks will also stall out completely. So I'm not sure whether the issue is Samba, one disk's IO tanking all of Unraid, or something else. This issue has persisted since 6.10, so it's not limited exclusively to 6.11.x for me.
     I've got two diagnostics here. The earlier one has Samba logs enabled, but that was spitting out the smbd synthetic pathref error over and over, which according to other posts is mostly a non-issue. The second has Samba logging set to 0 to better show the rest of the syslog. Please let me know if you have any ideas. I appreciate your help! undoot-diagnostics-20221104-1202.zip undoot-diagnostics-20221106-1247.zip
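     One way to toggle the Samba log level on Unraid is via the Samba extra configuration; the file path and restart script below are assumptions about Unraid's layout, so adjust as needed:

        # crank Samba logging (set back to 0 when done), then restart SMB
        echo -e '[global]\n   log level = 3' >> /boot/config/smb-extra.conf
        /etc/rc.d/rc.samba restart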
  10. Yo friends, I'm not sure about you, but I've been struggling to find a board this generation that supports a 2-gamers-1-PC build, especially with the number of drives we currently run. Coming from a 1950X, we had more PCIe lanes than we could deal with. Now, in this current market, I'm stumped on what a viable upgrade path might be. I hopped on pcbuilder.net and pcpartpicker.com to filter motherboards, but several of those tools only account for physical PCIe slot size, not the number of lanes driven in different configurations. I'm reaching out to the forum to see if anyone's seen a better board than I have for the following:
     • 2x PCIe gen 4/5 x16 slots (driving x8/x8, since we have 2 GPUs)
     • 2-3x PCIe gen 3/4 x1 slots, or fewer if the board supports:
     • 8x SATA ports (can be replaced by a PCIe card above)
     • 3x NVMe slots (can be replaced by a PCIe card too)
     • A ****ton of USB headers/ports (capable of being split off from the Unraid boot drive)
     I've been seeing too many boards that might have enough PCIe x1 or x4 slots but bump the second GPU down to x4 as well. Otherwise, I see proper x8/x8 support and not much else. The GPUs are the only place where PCIe generation really matters; everything else in my rig today is pretty much gen 3. So it seems like even 24 PCIe gen 5 lanes should provide enough bandwidth to cover what my 1950X is driving now (64 gen 3 lanes on the 1950X versus the roughly 96 gen-3-equivalent lanes that 24 gen 5 lanes work out to). Has someone come across a board that meets these parameters?
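     The lane math I'm leaning on, roughly (the per-lane figures are the usual approximations after encoding overhead, so treat them as ballpark):

        # approx. one-direction throughput per lane: gen3 ~0.985 GB/s, gen4 ~1.97 GB/s, gen5 ~3.94 GB/s
        echo "24 gen5 lanes ~ $(echo '24*3.94' | bc) GB/s"    # ~94.6 GB/s
        echo "64 gen3 lanes ~ $(echo '64*0.985' | bc) GB/s"   # ~63.0 GB/s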
  11. After a slew of changes (without validating them individually, of course) it APPEARS to be stable, for now. Notable changes:
     • Pointed our VM GPUs at different BIOS files. I point secondary GPUs at BIOS files too, but somehow they were pointing at the same one; it's the same GPU model, so no compatibility issue, but maybe a file-handle issue.
     • Disabled the Hypervisor setting. Re-enabling it afterwards has not re-induced the issue from what I can tell.
     • Fast boot was enabled in one VM. Disabled that crap. Not sure if that did it, but the words "fast startup" and "power state" go hand in hand in my brain. You should have that disabled by default in these VMs anyway.
     • Did not try the nvidia-persistenced tag yet, and apparently didn't need it.
     If anyone else has any thoughts, please share them.
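     For anyone wanting to double-check the first point on their own rig, the ROM path each VM uses shows up in its libvirt XML, so a quick grep makes a shared dump obvious (the libvirt path below is the stock location; adjust if yours differs):

        # each VM's <rom file='...'/> entry should point at its own dump
        grep -H "rom file" /etc/libvirt/qemu/*.xml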
  12. Hi everyone, fair warning: this is my crosspost attempt from Reddit (why I started there, I'm not sure). I'm an owner of a 2-streamers-1-CPU build that holds two GTX 1080s and a Threadripper 1950X, and I've been stable for a long time, six months to a year since my last issue. A couple of months ago I upgraded to 6.9.0rc2, and after about a month of uptime my GF and I had a double black screen while playing the same game, Monster Hunter World. Logs from the VMs are as follows; see the bottom four lines for the major issue:
        -boot strict=on \
        -device nec-usb-xhci,p2=15,p3=15,id=usb,bus=pci.0,addr=0x7 \
        -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 \
        -blockdev '{"driver":"host_device","filename":"/dev/disk/by-id/ata-Samsung_SSD_850_EVO_1TB_S3PJNB0J806112N","node-name":"libvirt-4-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
        -blockdev '{"node-name":"libvirt-4-format","read-only":false,"cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-4-storage"}' \
        -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=libvirt-4-format,id=virtio-disk2,bootindex=1,write-cache=on \
        -blockdev '{"driver":"file","filename":"/mnt/disks/GameCache/Blizzard/FearTurkey/vdisk2.img","node-name":"libvirt-3-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
        -blockdev '{"node-name":"libvirt-3-format","read-only":false,"cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-3-storage"}' \
        -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=libvirt-3-format,id=virtio-disk3,write-cache=on \
        -blockdev '{"driver":"file","filename":"/mnt/user/isos/Win10_1803_English_x64.iso","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
        -blockdev '{"node-name":"libvirt-2-format","read-only":true,"driver":"raw","file":"libvirt-2-storage"}' \
        -device ide-cd,bus=ide.0,unit=0,drive=libvirt-2-format,id=ide0-0-0,bootindex=2 \
        -blockdev '{"driver":"file","filename":"/mnt/user/isos/virtio-win-0.1.173-2.iso","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
        -blockdev '{"node-name":"libvirt-1-format","read-only":true,"driver":"raw","file":"libvirt-1-storage"}' \
        -device ide-cd,bus=ide.0,unit=1,drive=libvirt-1-format,id=ide0-0-1 \
        -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 \
        -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:45:b5:84,bus=pci.0,addr=0x2 \
        -chardev pty,id=charserial0 \
        -device isa-serial,chardev=charserial0,id=serial0 \
        -chardev socket,id=charchannel0,fd=35,server,nowait \
        -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
        -device usb-tablet,id=input0,bus=usb.0,port=2 \
        -device 'vfio-pci,host=0000:44:00.0,id=hostdev0,bus=pci.0,addr=0x6,romfile=/mnt/user/nas/Build Info (DO NOT TOUCH)/gtx1080.dump' \
        -device vfio-pci,host=0000:44:00.1,id=hostdev1,bus=pci.0,addr=0x8 \
        -device vfio-pci,host=0000:09:00.3,id=hostdev2,bus=pci.0,addr=0x9 \
        -device vfio-pci,host=0000:42:00.0,id=hostdev3,bus=pci.0,addr=0xa \
        -device usb-host,hostbus=1,hostaddr=2,id=hostdev4,bus=usb.0,port=1 \
        -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
        -msg timestamp=on
        2021-03-15 20:58:17.231+0000: Domain id=1 is tainted: high-privileges
        2021-03-15 20:58:17.231+0000: Domain id=1 is tainted: host-cpu
        char device redirected to /dev/pts/0 (label charserial0)
        2021-03-16T05:19:05.856739Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
        2021-03-16T05:19:05.861722Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
        2021-03-16T05:19:07.054770Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
        2021-03-16T05:19:07.054895Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
     "Ok," I think to myself, "maybe sitting on an RC branch for multiple months wasn't a good idea; upgrade to 6.9.1." I make the upgrade and decide to try again. My GF and I play some more games for a couple of hours, then swap back to Monster Hunter World. After roughly an hour of MHW, crash again, both VMs, same message. Now, I'm aware of the dreaded AMD reset bug and I've seen Code 43s, but this is neither AMD nor a Code 43, not to mention the system has been as stable as our relationship through Covid, so I'm at a loss. Google-fu told me about trying to disable the hypervisor, and I'm also thinking of vfio-binding one or both GPUs at boot, but I'd be concerned about booting, since one of the 1080s is the primary card. Does anyone have a better idea of what could be wrong? Logfiles can be found here. Forgive the VM names, I'm a TFS fan at heart.
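     For reference, the boot-time binding I'm weighing usually amounts to adding the GPU's vendor:device IDs to the kernel append line in Unraid's syslinux config (the IDs below are examples; check your own with lspci -nn, and note that stubbing the primary card means losing console video):

        # find the GPU and its audio function IDs
        lspci -nn | grep -i nvidia
        # then in /boot/syslinux/syslinux.cfg, extend the default entry's append line, e.g.:
        #   append vfio-pci.ids=10de:1b80,10de:10f0 initrd=/bzroot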
  13. You've got several options: either swap your motherboard for one with 5 slots (I'm not sure one exists) or use PCIe bifurcation to split a slot, which I believe is your best bet. You'll possibly run into bottlenecks depending on how the GPU fares on 8 lanes of PCIe 4.0 (assuming you split an x16 slot) relative to your video-encoding throughput; if the loss in bandwidth corresponds to a loss in encoding speed, you can tie that to dollars lost per unit time. I would research PCIe 4.0 bifurcation first. Then, of course, research which PCIe lanes go to which CCXs on the Zen 2 die (I believe Zen 2 routes both memory and PCIe uniformly through the IO die on the package, so this isn't actually a problem). Split everything so you manage the CCXs properly and avoid cross-die or cross-CCX CPU communication where possible, and you're gold.
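     If you do go the bifurcation route, it's easy to verify from Linux what link width and speed each card actually negotiated (the PCI address is just an example; find yours with lspci):

        # LnkCap = what the device/slot can do, LnkSta = what was actually negotiated
        lspci -vv -s 44:00.0 | grep -E 'LnkCap|LnkSta'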
  14. I figured out my own hacky solution. Essentially I have a single drive (or in this case I just RAIDed two NVMe drives for speed) set up with btrfs as a single share for both users. I dragged my own shared games over, then cp --reflink'd everything for the other user. For Steam, it's best to keep the original user's appmanifest files and delete the executables for the other user so that Steam can remake them from scratch for each user, apparently part of the DRM protections. Bottom line: every game file except mods, .exe files, and appmanifest files was reflinked. Then every time I push a game update through on both libraries, I just rebuild the reflinks using jdupes --dedupe, as sketched below.
     The last step was the hardest: how to hit convergence after CoW drift? jdupes doesn't do a perfect job; it still compares large files even after they've already been reflinked, and I'm not sure btrfs has a way to cleanly expose shared extents for more efficient diffs. However, that's the hackiest part of this job, so I'm happy with the result. Glad to raise a dead thread; hope this helps someone else.
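     Roughly, the workflow looks like this (paths and library names are examples, not my exact layout):

        # 1) clone user1's library for user2 with shared extents (near-instant, no extra space used)
        cp -a --reflink=always /mnt/cache/games/user1/steamapps /mnt/cache/games/user2/steamapps
        # 2) in the clone, keep the appmanifest_*.acf files but delete the .exe files
        #    so Steam regenerates them per user
        # 3) after an update lands in both libraries, re-share the identical blocks
        jdupes -r --dedupe /mnt/cache/games/user1/steamapps /mnt/cache/games/user2/steamapps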
  15. Sorry dmacias, not sure how I missed the notification for your update. jdupes 1.14 is not compiled with btrfs support; running "jdupes --dedupe" shows that message. I'll add a PR when I get a chance.
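     For anyone rebuilding jdupes themselves in the meantime, dedupe support is a compile-time option; the exact flag name has changed between releases, so treat the following as an assumption and check the Makefile of the version you're building:

        # older releases gated btrfs dedupe behind ENABLE_BTRFS=1, newer ones behind ENABLE_DEDUPE=1
        make ENABLE_DEDUPE=1 && make install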