thenonsense

  1. After a slew of changes (without validating them individually, of course) it APPEARS to be stable, for now. Notable events:
     - Pointed our VM GPUs at different BIOS files. I point secondary GPUs at BIOS files too, but somehow they were pointing at the same one. Same GPU model, so no compatibility issue, but maybe a file-handle issue.
     - Disabled the hypervisor flag. Re-enabling it afterwards has not re-induced the issue, from what I can tell.
     - Fast boot was enabled on a VM. Disabled that crap. Not sure if that did it, but the words "fast startup" and "power state" go hand in hand in my brain. You should have that disabled by default in these VMs anyway.
     - Did not try the nvidia-persistenced tag yet, and apparently didn't need it.
     If anyone else has any thoughts, please share them.
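     For anyone wanting to replicate the BIOS-file change: each passed-through GPU gets its own ROM via the <rom> element on its hostdev entry in the VM's libvirt XML. A minimal sketch (the PCI address and dump path below are illustrative placeholders, not my actual setup):

```
<!-- Sketch only: give each VM's GPU hostdev its own ROM dump.
     The bus/slot address and file path are placeholders. -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x44' slot='0x00' function='0x0'/>
  </source>
  <rom file='/mnt/user/isos/vbios/gpu1.rom'/>
</hostdev>
```

     Each VM should point at its own dump file, even for identical cards.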
  2. Hi everyone, Fair warning: this is my crosspost attempt from Reddit. Why I started there, I'm not sure. I'm an owner of a 2-streamers-1-CPU build that holds two GTX 1080s and a Threadripper 1950X, and I've been stable for a long period of time, six months to a year since my last issue. A couple of months ago I upgraded to 6.9.0-rc2, and after about a month of uptime my GF and I had a double black screen while playing the same game, Monster Hunter World. Logs from the VMs are as follows; see the bottom four lines for the major issue:
     -boot strict=on \
     -device nec-usb-xhci,p2=15,p3=15,id=usb,bus=pci.0,addr=0x7 \
     -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 \
     -blockdev '{"driver":"host_device","filename":"/dev/disk/by-id/ata-Samsung_SSD_850_EVO_1TB_S3PJNB0J806112N","node-name":"libvirt-4-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
     -blockdev '{"node-name":"libvirt-4-format","read-only":false,"cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-4-storage"}' \
     -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=libvirt-4-format,id=virtio-disk2,bootindex=1,write-cache=on \
     -blockdev '{"driver":"file","filename":"/mnt/disks/GameCache/Blizzard/FearTurkey/vdisk2.img","node-name":"libvirt-3-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
     -blockdev '{"node-name":"libvirt-3-format","read-only":false,"cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-3-storage"}' \
     -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=libvirt-3-format,id=virtio-disk3,write-cache=on \
     -blockdev '{"driver":"file","filename":"/mnt/user/isos/Win10_1803_English_x64.iso","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
     -blockdev '{"node-name":"libvirt-2-format","read-only":true,"driver":"raw","file":"libvirt-2-storage"}' \
     -device ide-cd,bus=ide.0,unit=0,drive=libvirt-2-format,id=ide0-0-0,bootindex=2 \
     -blockdev '{"driver":"file","filename":"/mnt/user/isos/virtio-win-0.1.173-2.iso","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
     -blockdev '{"node-name":"libvirt-1-format","read-only":true,"driver":"raw","file":"libvirt-1-storage"}' \
     -device ide-cd,bus=ide.0,unit=1,drive=libvirt-1-format,id=ide0-0-1 \
     -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 \
     -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:45:b5:84,bus=pci.0,addr=0x2 \
     -chardev pty,id=charserial0 \
     -device isa-serial,chardev=charserial0,id=serial0 \
     -chardev socket,id=charchannel0,fd=35,server,nowait \
     -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
     -device usb-tablet,id=input0,bus=usb.0,port=2 \
     -device 'vfio-pci,host=0000:44:00.0,id=hostdev0,bus=pci.0,addr=0x6,romfile=/mnt/user/nas/Build Info (DO NOT TOUCH)/gtx1080.dump' \
     -device vfio-pci,host=0000:44:00.1,id=hostdev1,bus=pci.0,addr=0x8 \
     -device vfio-pci,host=0000:09:00.3,id=hostdev2,bus=pci.0,addr=0x9 \
     -device vfio-pci,host=0000:42:00.0,id=hostdev3,bus=pci.0,addr=0xa \
     -device usb-host,hostbus=1,hostaddr=2,id=hostdev4,bus=usb.0,port=1 \
     -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
     -msg timestamp=on
     2021-03-15 20:58:17.231+0000: Domain id=1 is tainted: high-privileges
     2021-03-15 20:58:17.231+0000: Domain id=1 is tainted: host-cpu
     char device redirected to /dev/pts/0 (label charserial0)
     2021-03-16T05:19:05.856739Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
     2021-03-16T05:19:05.861722Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
     2021-03-16T05:19:07.054770Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
     2021-03-16T05:19:07.054895Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
     "Ok," I think to myself, "maybe sitting for multiple months on an RC branch wasn't a good idea; upgrade to 6.9.1." I make the upgrade and decide to try again.
     My GF and I play some more games for a couple hours, then swap back to Monster Hunter World. After roughly an hour of MHW, crash again, both VMs. Same message. Now, I'm aware of the dreaded AMD reset bug and I've seen Code 43s, but this is neither AMD nor a Code 43, not to mention the system has been as stable as our relationship through Covid, so I'm at a loss. Google-fu told me about trying to disable the hypervisor, and I'm also thinking of vfio-binding one or both GPUs at boot, but I'd be concerned about booting, since one of the 1080s is the primary card. Does anyone have a better idea of what could be wrong? Logfiles can be found here. Forgive the VM names, I'm a TFS fan at heart.
  3. You've got several different options: either swap your motherboard for one with 5 slots (I'm not sure one exists) or use PCIe bifurcation to split a slot. I believe the latter is your best bet. Now, you'll possibly run into bottlenecks depending on how PCIe 4.0 handles 8 lanes for a GPU (assuming you split a x16 slot) versus your video encoding throughput. If the loss in bandwidth corresponds to a loss in encoding speed, you can tie that to dollars lost per unit time, so I would research PCIe 4.0 bifurcation first. Then of course research which PCIe lanes go to which CCXs on the Zen 2 die (I believe Zen 2 uses uniform access for both memory and PCIe via the IO die on the package, so this isn't actually a problem). Split everything so you manage CCXs properly and avoid cross-die or cross-CCX CPU communication if possible, and you're gold.
  4. I figured out my own hacky solution. Essentially I have a single drive (or in this case I just RAIDed two NVMes for speed) set up with btrfs as a single share for both users. I dragged my own shared games over, then cp --reflink'd everything for the other user. For Steam, it's best to keep the original user's app manifest files, and delete the executables for the other users so that Steam can remake them from scratch for each user (apparently part of the DRM protections). Bottom line: every game file except mods, .exe files, and app manifests was reflinked. Then every time I push a game update through both libraries, I just rebuild the reflinks using jdupes --dedupe. The last step was the hardest: how to hit convergence after CoW drift? jdupes doesn't do a perfect job; it still compares large files even after they've already been reflinked, and I'm not sure if btrfs has a way to cleanly expose extent sharing for more efficient diffs. However, that's the hackiest part of this job, so I'm happy with the result. Glad to raise a dead thread. Hope this helps someone else.
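     As a sketch of the reflink step (directory names are made up; on btrfs, cp --reflink shares extents copy-on-write, and --reflink=auto falls back to a plain copy on filesystems without CoW):

```shell
#!/bin/sh
# Sketch: clone one user's game library for a second user via reflinks.
# Paths are illustrative, not from my actual share layout.
set -eu

LIB=$(mktemp -d)
mkdir -p "$LIB/user_a" "$LIB/user_b"
echo "game data" > "$LIB/user_a/game.bin"

# Reflink-copy user A's library for user B. On btrfs this duplicates
# metadata only; file data blocks are shared until one copy is written.
cp -a --reflink=auto "$LIB/user_a/." "$LIB/user_b/"

# The copies start out byte-identical. After updates drift them apart,
# something like `jdupes --recurse --dedupe user_a user_b` (a jdupes
# build with btrfs dedupe support) can re-share the identical extents.
cmp "$LIB/user_a/game.bin" "$LIB/user_b/game.bin" && echo "libraries match"
```

     Note that the jdupes step is only suggested in a comment here, since it needs a btrfs-enabled build.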
  5. Sorry dmacias, not sure how I missed the notification for your update. jdupes 1.14 is not compiled with btrfs support; running "jdupes --dedupe" shows a message saying as much. I'll add a PR when I get a chance.
  6. Hi dmacias, can you please add the newest (2.10) version of rmlint? We're using it to get CoW convergence on our caches. Edit: It looks like the other option for this, jdupes, was built without btrfs support. For those who need it, can you please enable that flag in the makefile and republish? I'm not sure if configuring where Nerdpack is pointed is in the works, but if it is, I'd be willing to open my own packages to lend a hand. I imagine most others feel similarly.
  7. Hi everyone, I've been an Unraid user for a couple of years, but I've never found a way I really like to share games between users. My use case: players 1 and 2 both have the same general batch of games they want to play, at the same time. I used to eat the storage redundancy of separate drives holding the same content, to avoid issues with both players accessing the same files. My question is: is there a reasonable way to get both users to share the EXACT same gaming library, same files? Last time I tried pointing both users' Steam clients at the same game files (a SATA SSD share running btrfs), I didn't run into update collisions, but when both users accessed the same files, performance tanked for each instance of the game. I don't think faster storage is the answer; I think it might be file handles. I could copy the files to different locations on the same disk/share and take advantage of CoW, but I don't know if updates that trigger CoW will remove the redundant copies when they reach convergence (after both file sets update). I could also just move up to NVMe and see if increased r/w performance helps with this issue. There are a few different directions I can go. What does the public recommend?
  8. I figured it out. I had expected CoW to be a lot more intensive than it actually is. It's not. Copies of file data are made implicitly only when a file is copied and one of those copies is then modified. I expected some copy-in-place transactional craziness that versioned files as they were modified, whether or not a copy had been made previously. I'm confident I'm not explaining my original point of view well, but that's because it was pretty stupid to begin with.
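     A quick way to see this behavior for yourself (file names are arbitrary; with --reflink=auto the commands still run on non-CoW filesystems, just without extent sharing):

```shell
#!/bin/sh
# Demonstrates that a reflinked copy only diverges when it is written to:
# edits to the copy never touch the original file.
set -eu

DIR=$(mktemp -d)
printf 'v1\n' > "$DIR/original"

# Reflink copy: on a CoW filesystem, no file data is duplicated yet.
cp --reflink=auto "$DIR/original" "$DIR/copy"

# Writing to the copy triggers copy-on-write for the changed blocks only.
printf 'v2\n' > "$DIR/copy"

cat "$DIR/original"   # still v1
cat "$DIR/copy"       # now v2
```

     The original stays at v1 no matter what happens to the copy, which is exactly the "copies are made only on modification" behavior described above.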
  9. Hi all, I'm trying to understand something about btrfs and how it tracks its multiple copies of files. I have a couple of scenarios and I'm trying to understand how CoW resolves them:
     - A file is put onto the file system by User A, and User B updates it. What differentiates User A from User B? How is the data (original vs. updated) chosen to present to User A or B? What about User C, who has seen neither the original nor the updated variant?
     - A library, a video game, or even a git repo is put onto the file system, and receives an update two weeks later. Large numbers of edits may be made, so does this rapidly consume storage?
     - A file is updated/deleted by User A. How does that change propagate to the original, so that User B either sees the new version or no file at all?
     I'd appreciate it if someone could explain some of these items or point me in a direction that helps. I've been over Wikipedia a couple of times and have been devouring Stack Overflow, but I'm still not seeing clear answers to my questions, likely out of not understanding the answers.
  10. Not sure if this was ever resolved. What cores are you typically passing through?
  11. It sounds like you've got gigabit internet (or at least on that scale). Can you do some file transfers between your NAS and a local VM? Between the NAS and another device on the network? Post your local speed results. I'll do the same this weekend and post my findings, but I'm living nicely on just 100Mb speeds to the rest of the world, and I'm getting those speeds on all my Unraid resources. I'm sitting on 6.7.0 stable as well.
  12. I got the notification about linked content, and I'm real glad that people are using this; it's become quite a good resource. Due to the architecture of Ryzen, there are PLENTY of tidbits that need to be in a row to reduce inter-die communication, and therefore latency: making sure you're accessing memory modules directly linked to your die, using only cores on your die (or even just on the same CCX), knowing which PCIe lanes go where, etc. It looks like a lot of ground has been covered since your initial post, with cross-die core allocations and nebulous GPU performance, and it seems like you've corrected quite a few problems. To add to your solutions, here are some basic architecture decisions that worked for me in the past (on my 1950X, so grain of salt):
     - Try to reduce VMs to all cores on one CCX, and then expand from there.
     - Try to get VMs on different dies.
     - Try to leave 10-20% of your total memory to Unraid, then scale up from there.
     - NUMA-ize your memory: try to get two nodes and assign node 0 to die 0's operations (like VM 1) and node 1 to die 1's (VM 2).
     - Reallocate disks for your VMs; try passing them through.
     Most of all, experiment. Foot my plane ticket, and I'll look alongside you
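     The pinning and NUMA suggestions above map onto the <cputune> and <numatune> sections of the VM's libvirt XML. A minimal sketch, assuming VM 1 lives on die 0's cores and memory (the core and node numbers here are illustrative, not 1950X-specific):

```
<!-- Sketch: pin vCPUs to cores on one CCX/die and bind guest memory
     to the matching NUMA node. Core/node numbers are placeholders. -->
<vcpu placement='static'>4</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='0'/>
  <vcpupin vcpu='1' cpuset='1'/>
  <vcpupin vcpu='2' cpuset='2'/>
  <vcpupin vcpu='3' cpuset='3'/>
</cputune>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>
```

     Check your actual core-to-node layout first (e.g. with numactl --hardware) before copying numbers.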
  13. It isn't. There are guides like this one here: https://forums.unraid.net/topic/35112-guide-passthrough-entire-pci-usb-controller/ However, it isn't very up to date. Here's the typical process:
     - Go to /Tools/SysDevs and look for your USB controllers. You're looking for all USB controllers. An example would be: [1022:145c] 09:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller. You can also buy a PCI-to-USB expansion card to more easily get a dedicated USB controller.
     - Follow the linked guide to find out which USB ports map to which buses. This is important for your motherboard's controllers, but less so if you bought your own USB card.
     - Pick a controller to map to the ports it controls: your expansion card, OR a controller on your motherboard that is NOT on the same bus as your Unraid flash drive.
     - In your flash drive config, add the line vfio-pci.ids=1022:145c between "append" and any other configs (like isolcpus), but replace the 8-digit value with the corresponding 8-digit PCI identifier you pulled in the first step.
     - Reboot your rig, and note in your VM config that the USB controller should now be on the list of passable devices at the bottom.
     Sorry for errors in the guide, a few drinks in. YMMV
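     For reference, the flash drive config in question is /boot/syslinux/syslinux.cfg, and the edited boot entry would look something like this (the isolcpus value is just a placeholder for whatever flags you already have; the vfio-pci.ids value is the example controller from above):

```
label Unraid OS
  menu default
  kernel /bzimage
  append vfio-pci.ids=1022:145c isolcpus=8-15 initrd=/bzroot
```

     The key detail is that vfio-pci.ids sits between "append" and the rest of the line, so vfio-pci claims the controller before any other driver binds it at boot.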
  14. There are a few views on this but no replies. I will say that I personally have seen disconnect issues, but I don't use individual device passthrough. I pass through the controller, and sometimes see a disconnect/reconnect when transferring large files (that breaks the transfer) but it always reattaches the device. Have you tried passing through the controller? It should at least easily facilitate the reconnect when it drops. No need to buy another controller, you can pass through one of the controllers on your mobo.
  15. You do, the one built into your 8700K: the Intel UHD 630. Set the integrated GPU as default in the BIOS, and boot into Unraid. From there, just pass your GTX 1080 through to the VM. Alternatively, you don't need the iGPU at all: you can dump the BIOS of your GPU using SpaceInvaderOne's tutorial on passthrough with a single GPU. However, that route isn't necessarily the easiest for first-timers.