dnoyeb

Members
Posts: 136
Everything posted by dnoyeb

  1. Just wanted to chime in after running this release since it came out. I was crashing every few days and getting concerned I had issues with RAM or CPU... then this release came out and voilà! No more crashes (knock on wood). I do remember seeing some macvlan errors from time to time, and now they're all gone. Thanks so much for this release and the fixes.
  2. By the way, there are a few examples of people getting it claimed back in this other thread. I, for one, had issues since my laptop and the server were on different subnets; I was able to get it claimed via the web login for the server itself after getting on the same subnet and removing the variables from the XML file in the data folder, as outlined in that other thread.
  3. Ya, I think the complexity for my setup was the different subnets and the firewall rules between them. I was unable to run the curl command successfully. Anyways, looks like we have a few successes now in this thread, with ideas for others to try!
  4. Ok, so just in case anyone has a similar issue to me, let me explain what I did. I followed the directions above and removed the 4 parameters/variables from the XML file in the appdata folder, started up the Docker container, and then was able to reclaim via the general settings. While you would THINK this would be easy, my laptop was on a different subnet than the server and I NEVER got the claim option in general settings. Once I thought about it and realized the laptop was not on the IoT network, I swapped over and now have the claim option. So, just in case anyone else is running multiple networks at their house, make sure you jump on the same subnet. Not sure why that made a difference, but it did. Perhaps I didn't need to edit the XML file at all, so first try joining the same network, just in case you run an IoT network like I do.
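     In case it helps, here's roughly what that edit looked like on my end. This is a sketch from memory, not exact: the container name, path, and attribute names below are assumptions from my setup and that other thread, so double-check them against your own Preferences.xml.
        docker stop plex
        cd "/mnt/user/appdata/plex/Library/Application Support/Plex Media Server/"
        cp Preferences.xml Preferences.xml.bak
        # edit Preferences.xml and delete the claim-related attributes; I believe they were
        # PlexOnlineToken, PlexOnlineUsername, PlexOnlineMail, and PlexOnlineHome
        nano Preferences.xml
        docker start plex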
  5. I don't have an option to do that.
  6. In the same boat... argh... anyone successfully get the claim token and stuff to work?
  7. lmao, ok, is that something that should / would have changed when upgrading? Thanks
  8. Upgraded my test machine to the latest rc1 to see how it would do; it's a very basic system, only like 2 plugins. Just noticed Docker failed to start, and when looking around I see a cache mount point and a disk1, but no actual "user" mount anymore... which I have a feeling is keeping Docker from running, if I had to guess, since it doesn't have a spot for its files now. Anyways, diag attached. tower-diagnostics-20210813-0950.zip
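     For anyone wondering how it shows up, this is all I did to spot it (nothing fancy; the paths below are just where my Docker image and appdata normally live):
        ls /mnt/        # cache and disk1 are there, but no 'user'
        ls /mnt/user    # missing, which is where my docker.img and appdata paths point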
  9. Thanks, man. Ok, so then really there is only one drive that has any "real" errors, which are reallocated sectors. I'll keep an eye on things going forward and see if anything increases; I pulled a diag from 2019 and that same drive had the same 8 reallocated sectors... so it's probably not what's leading to the issue. Thanks again so much for your help. Time will tell.
  10. Ah, actually disk 7 has reallocated sectors. Sounds like a candidate to replace... Now to figure out how to upgrade parity to 12TB (from 8TB) without losing my parity, then use the 8TB parity drive to become disk 7... Also strange is that just about every newer drive in my system shows signs of "Raw read error rate" and "Hardware ECC recovered"... not really sure what to make of that (I haven't really looked at those attributes before to know if they've recently increased or not). Only that one disk 7 has reallocated sectors, however.
  11. Guess I'll start with doing the file system check on that device... Other than that, I'm at a bit of a loss, since the pool's disks don't show any errors or SMART flags. This thing has been such a rock for so many years!
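     For reference, the check I have in mind is the read-only filesystem check from maintenance mode. The command below assumes the disk is XFS, and /dev/md1 is purely a placeholder for whichever disk number it actually is:
        # with the array started in maintenance mode
        xfs_repair -n /dev/md1    # -n = no modify, just report problems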
  12. What's strange is that that day the system was just completely locked up, the first time in the 10 years I've had the system that it has happened... That, and it did a parity check on startup and didn't find anything.
  13. Needing a little help; I got an email today about 3645 errors. I ran it again since I realized my monthly check wasn't set up to auto-correct; same number, and it shows that it fixed them. This system has been running for almost 10 years at this point; I migrated to a new mobo/CPU years ago, but there are drives in there from 2012. Anyways, not sure what to make of it, since the GUI doesn't show any drive-specific errors... I did see in my log that the cache drive has some sort of metadata store error and it recommends repairing, but I don't see how/why that would result in parity errors. Can someone take a peek at the diag file and advise on how I can tell if there is a drive needing to be replaced? I've got 2 extra 12TB drives sitting in there just waiting to be swapped into the array (precleared already)... Thanks! unraid-diagnostics-20210517-1039.zip
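     In case it matters, I think that cache message is a btrfs one (assuming my cache pool is btrfs; correct me if the diag says otherwise), so the checks I'm planning to run look something like:
        btrfs dev stats /mnt/cache         # per-device error counters
        btrfs scrub start -B /mnt/cache    # -B runs in the foreground and prints a summary when done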
  14. Retested on beta 29; the issue is still there. Based on further testing and seeing Radek's comment, it occurs in 6.8.3 as well. I tried both cards that were in the box, same issue on each address. Here's the current error that pops up (the diag is attached):
        Execution error
        internal error: qemu unexpectedly closed the monitor: 2020-09-29T17:39:26.418978Z qemu-system-x86_64: -device vfio-pci,host=0000:03:00.0,id=hostdev0,bus=pci.4,addr=0x0: vfio 0000:03:00.0: Failed to set up TRIGGER eventfd signaling for interrupt INTX-0: VFIO_DEVICE_SET_IRQS failure: Device or resource busy
     One side note I just wanted to bring up: based on Mellanox's website, they're up to driver version 5.x, whereas the diagnostic files I'm looking through seem to show this build is using the 4.0 driver version. Any reason to think that could contribute? Or when these things are passed through, are they completely transparent to unraid? tower-diagnostics-20200929-1351.zip
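     For anyone wanting to check the driver version on their own build, this is roughly what I was looking at (assuming the ConnectX-3 is using the in-kernel mlx4 driver; the interface name is just a placeholder):
        modinfo mlx4_core | grep -i version
        ethtool -i eth1    # driver name/version for a specific NIC, if it isn't already bound to vfio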
  15. Oh, I see one I'm going to test tomorrow! "webgui: better handling of multiple nics with vfio-pci" Do you guys prefer that we update the testing in our bug report thread for record-keeping purposes?
  16. Anonymized version attached; if you need the other version, let me know and I'll DM it. Other testing done last night: I tried disabling the USB 2.0 ports and had the same issue, and also tried disabling the USB 3.0 ports and moving the key over to the 2.0 ports just to make sure it wasn't something funny like that. Neither helped. Thanks for any guidance. tower-diagnostics-20200804-0854.zip
  17. A bit more info: I see these lines in the logs:
        Aug 3 19:04:09 Tower kernel: vfio-pci 0000:03:00.0: enabling device (0100 -> 0102)
        Aug 3 19:04:10 Tower kernel: vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x18c
        Aug 3 19:04:10 Tower kernel: genirq: Flags mismatch irq 16. 00000000 (vfio-intx(0000:03:00.0)) vs. 00000080 (ehci_hcd:usb1)
     Hopefully this helps in the troubleshooting.
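     That flags-mismatch line reads to me like the card's legacy INTx line (IRQ 16) is shared with the onboard EHCI USB controller. A quick way to see who is sitting on that IRQ (just a sanity check, nothing unraid-specific):
        grep -E '^ *16:' /proc/interrupts            # lists every driver registered on IRQ 16
        lspci -vv -s 03:00.0 | grep -i interrupt     # shows which INTx pin/IRQ the card uses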
  18. Trying to get a Mellanox ConnectX-3 card passed through to a VM and having some troubles. To set the stage: I have two of these cards in my unraid server, and I am using one of them for the OS. I used the Tools / System Devices / Bind Selected to VFIO at boot method and have verified that the card is added:
        cat vfio-pci.cfg
        BIND=0000:03:00.0|15b3:1003
     Here is the log showing it was successfully bound at boot of unraid:
        Loading config from /boot/config/vfio-pci.cfg
        BIND=0000:03:00.0|15b3:1003
        ---
        Processing 0000:03:00.0 15b3:1003
        Vendor:Device 15b3:1003 found at 0000:03:00.0
        IOMMU group members (sans bridges):
        /sys/bus/pci/devices/0000:03:00.0/iommu_group/devices/0000:03:00.0
        Binding...
        Successfully bound the device 15b3:1003 at 0000:03:00.0 to vfio-pci
        ---
        vfio-pci binding complete
        Devices listed in /sys/bus/pci/drivers/vfio-pci:
        lrwxrwxrwx 1 root root 0 Aug 3 18:56 0000:03:00.0 -> ../../../../devices/pci0000:00/0000:00:1c.4/0000:03:00.0
        ls -l /dev/vfio/
        total 0
        crw------- 1 root root 249, 0 Aug 3 18:56 12
        crw-rw-rw- 1 root root 10, 196 Aug 3 18:56 vfio
     The card shows up when setting up the VM under Other PCI Devices: Mellanox Technologies MT27500 Family [ConnectX-3] | Ethernet controller (03:00.0). When that box is checked and the VM is started, this error shows up in the log:
        2020-08-04T00:24:59.369033Z qemu-system-x86_64: -device vfio-pci,host=0000:03:00.0,id=hostdev0,bus=pci.0,addr=0x8: vfio 0000:03:00.0: Failed to set up TRIGGER eventfd signaling for interrupt INTX-0: VFIO_DEVICE_SET_IRQS failure: Device or resource busy
        2020-08-04 00:25:00.476+0000: shutting down, reason=failed
     I'm confused as to how/why it is in use, since it is allocated to VFIO at boot. I have tried enabling "PCIe ACS override" and, for the fun of it, "VFIO allow unsafe interrupts". Neither helped (didn't think they would, but tried anyways). Does anyone have any thoughts on other things to try? I am working to set up a RockNSM VM and need to pass through this NIC so the VM can capture the 10G mirror of my uplink to my router. Thanks in advance for any assistance.
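     One extra check I did while writing this up, in case it helps anyone confirm the binding on their end: lspci can show which kernel driver currently owns the device, so you can verify the vfio-pci bind actually stuck:
        lspci -nnk -s 03:00.0
        # the output should include a line like:  Kernel driver in use: vfio-pci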
  19. Wow, I'm literally in the exact same boat... I have been debating between upgrading to some 2696 v2s in the same motherboard or moving to an AMD build, since they seem to be extremely strong. The things I'm concerned with are the PCIe x16 slots, as I have 3 M1015 cards now (I have about 20 drives) yet still need space for a 10Gb NIC and onboard VGA, plus at least one more slot for my Nvidia 1660 for the gaming VM / Plex transcoding. Have you made any headway on the motherboard you'd pick?
  20. Holy smokes, the new alpha build's transcoding via the Nvidia card is freaking unreal... my CPU levels went to practically zero. Huge improvement having decode and encode working by default.
  21. Ok, big thanks to JasonM! Got the VM side working while using the Nvidia build without pinning anything. A few things were needed:
     Step 1 (the initial instructions JasonM shared, which probably would have been enough if I was already running OVMF): this didn't seem to get it working on its own; it turned out my Windows 10 VM was running on SeaBIOS, so I used the directions from alturismo to prepare my SeaBIOS-backed Windows 10 install for OVMF.
     Step 2: prepare the vdisk for OVMF.
     Step 3: edit the newly created VM template, pin the CPUs, manually add the vdisk, check the boxes for the keyboard/mouse I was attaching, and save without starting.
     Step 4: edit the VM again, this time in XML mode; take the hostdev code you built in Step 1 and paste it under the last hostdev entry (a rough example of what that block looks like is below).
     Step 5: save and edit one last time in GUI mode. Add the Nvidia GPU and sound. Save.
     Step 6: verify you don't have any transcodes going on, boot up the sucker, and go play some Doom.
     Just figured I'd share in case anyone came across my issue and wondered how it got solved.
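     For anyone following along, the hostdev block I mean in Step 4 looks roughly like this. The PCI address here is just a placeholder; use your own card's bus/slot/function from System Devices:
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <source>
            <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
          </source>
        </hostdev>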
  22. Replying to my own issue... I can get the plugin to see the card if I remove the vfio-pci.ids= entry (the IDs of the Nvidia GPU, Nvidia sound, Nvidia USB, and Nvidia serial controllers), but this breaks my ability to connect to a VM. Tried doing pcie_acs_override=downstream but still no go... guess the next try is to add the multifunction option and see if that does anything. Added the multifunction option, still a no go. So does anyone with these newer cards (1660 Ti and above) have the ability to use the card with the Nvidia plugin and KVM? KVM doesn't seem to like my IOMMU group due to those dang USB/serial controllers on the Nvidia card... hence I had to stub them to allow me to launch the VM. Really hope I can do both; otherwise I may just have to return the card for another model (a 1060 or the like). Side note, does anyone else have a 1660 Ti? Do they all have this stupid Nvidia USB/serial on them? It seems to be what's screwing with me.
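     For context, the boot line I keep editing lives in /boot/syslinux/syslinux.cfg and looks something like this (the IDs below are made-up placeholders, not my actual card's; yours come from Tools / System Devices):
        append vfio-pci.ids=10de:xxxx,10de:yyyy,10de:zzzz,10de:wwww pcie_acs_override=downstream,multifunction initrd=/bzroot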