blacklight

  1. It seems I was not well enough informed: KVM is in fact considered a Type 1 hypervisor. https://ubuntu.com/blog/kvm-hyphervisor Correct me if I'm wrong ... I have seen it classified both ways in different sources. Still, the question remains: is Xen still available for Unraid? No idea ...
  2. Hey there, I hope this is not a silly question. I am currently troubleshooting virtualization problems with TrueNAS, and I got a reply on the TrueNAS forum saying that the OS actually prefers Type 1 virtualization with passed-through PCIe devices. https://forums.truenas.com/t/truenas-core-cant-execute-smart-check-not-capable-of-smart-self-check-resulting-in-bug/681/5

     Because of that, I wanted to ask whether it is still possible to use the Xen-based (Type 1) virtualization in Unraid that Limetech offered years ago, or is that gone for good? I saw there was a version about 8 years ago that allowed both. How did that work back then? Did you just switch the machine type, or was there a different template (like Windows 11, FreeBSD, etc.)?

     Also, does anyone have experience with virtualizing both TrueNAS Core and Scale and can compare the two? Does it make a difference for KVM compatibility whether the guest OS differs from the host (Core with its FreeBSD base) or shares the same base (Scale with its Linux base), i.e. does one perform better or cause fewer problems, especially with PCIe passthrough? Thanks
  3. The bug/error persists after a BIOS update (which resets the BIOS entries) and an Unraid update (6.12.10). I also swapped the HBA to one of the BIFUR x8 slots to have full bandwidth. Unfortunately that didn't solve the problem. I noticed two things:

     1. The qemu log stopped on one day, but the VM kept running for another 3 days, so I also couldn't see any shutdown command in the qemu log. Is that because the log overflows with the VFIO DMA MAP errors? Can I clear the log somehow (see the sketch below)? I also had another error inside the TrueNAS VM: it couldn't SMART-check two drives, and I think it stopped at some point. Could that be the same point at which the DMA MAP errors stopped? I will give it a try and turn off the SMART check inside TrueNAS, maybe this fixes it. Could failed drives lead to such an error, even if the HBA is stubbed? EDIT: the qemu log seems to stop shortly after starting the VM. Even under load or while running tasks inside TrueNAS I cannot see any new DMA MAP errors.

     2. The unresponsive Unraid GUI behavior actually started after days of the VM running, and not immediately after stopping it (which was the case before). I was able to restart it a few times without the GUI or the VM acting up.

     Any idea where I can find out what the error value -22 even means? I have been researching this particular error for months now and I still have no clue what the error (code) actually means, and yes, I looked up the qemu documentation ... no luck from my side there.
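     For what it's worth, -22 is the kernel's -EINVAL, i.e. "Invalid argument", which is exactly the text qemu prints next to the VFIO_MAP_DMA failure. A minimal sketch of how I would check and clear the per-VM qemu log, assuming the standard libvirt location under /var/log/libvirt/qemu/ ("TrueNAS" is only a placeholder for the real VM name):

        #!/bin/bash
        # Check how large the per-VM qemu log has grown
        ls -lh "/var/log/libvirt/qemu/TrueNAS.log"

        # Clear it in place without deleting the file (qemu keeps the handle open)
        truncate -s 0 "/var/log/libvirt/qemu/TrueNAS.log"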
  4. And the crash is reproducible: every time I move files over SMB while 3 VMs are active. EDIT: It now happens for all transfers, no matter which Windows VM I use. I didn't even change the TrueNAS XML and it was working for days ... man, Unraid makes me go insane :( I did, however, manage to get a diagnostics dump; it randomly works ... Find it attached. The only thing I noticed is that the timestamps are not synchronized between the logs from libvirt, the TrueNAS VM and the Windows VM. Another idea: is there a chance to change the libvirt/qemu version (again), maybe to an older or a newer one? I already did that with the edk2 as explained here: Does that include libvirt/qemu? I have no idea ... Glad about any answer. icarus-diagnostics-20240322-0553.zip
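     In case it helps others: a minimal sketch of how to at least check which libvirt and qemu build the system is currently running before attempting any version swap (plain virsh/qemu commands, nothing Unraid-specific assumed):

        #!/bin/bash
        # Report the libvirt library/daemon and hypervisor versions as seen by virsh
        virsh version

        # Report the qemu emulator version directly
        qemu-system-x86_64 --version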
  5. If you consider trying another mainboard (I know that's not ideal): I am close to using all PCIe lanes on the Asus W680 ACE with the i9. It works like a charm and the IOMMU layout is perfect for my use case. You can easily riser the M.2 slots into additional x4 PCIe slots, and I will even try a SlimSAS-to-PCIe adapter soon; that would give you up to 8 PCIe slots (2 x8 and 6 x4) on a workstation mainboard. I can't speak for that HBA and mainboard combination in particular, I never used it, sorry ...
  6. I can just tell you from my experience that Unraid is a pain in the *** for virtualization with EXACTLY this card (see my other posts), but that was never the fault of the NAS component of Unraid itself: it always detected all drives attached to this HBA as long as I didn't bind it to VFIO, and it also detected all other drives attached to the mainboard. Did you figure this out? Otherwise, go through the BIOS and check every option you find regarding the LSI HBA and/or the attached HDDs. There should be a separate BIOS entry for the HBA; what information is in there? Also check the following (see the sketch below):
     - Did you flash both controllers? This HBA has two 8-port SAS controllers!
     - Did you flash them yourself, and how?
     - Did you validate IT mode? (I only managed that once, successfully, with a booted EFI version of sas3flash.)
     - Did you check that the card is not bound to VFIO?
     - Did you check the PCIe slot with another device?
     - Is the slot maybe turned off in the BIOS or by a bifurcation (BIFUR) setting?
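     A minimal sketch of how I would check the driver binding and the firmware from a running shell; vendor ID 1000 is Broadcom/LSI, and sas3flash is assumed to be present (I normally run it from a booted EFI shell instead):

        #!/bin/bash
        # List all Broadcom/LSI devices with the kernel driver currently in use;
        # each of the two SAS controllers should show either mpt3sas or vfio-pci.
        lspci -nnk -d 1000:

        # If sas3flash is available, list the controllers and their firmware;
        # an IT-mode firmware shows up as "IT" in the listing.
        sas3flash -listall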
  7. So I gave it a try and implemented "<maxphysaddr mode='passthrough'/>", and it WORKED for a few days for the TrueNAS VM!! The VM can be started/stopped/restarted both from within the VM and from Unraid, it performs under load, and it looks like it won't crash on its own after a long time. Thank you very much for the help there! BUT (and that's a big but, unfortunately) the log is still spammed with VFIO DMA MAP -22 errors, AND the error seems to have progressed to another VM with the same symptom. I had a Windows 10 VM running in parallel with a passed-through GPU, and the same lock-up happened:
     - The Windows 10 VM (with GPU) failed, RDP froze (see the attached syslog__.txt).
     - A second Windows 10 VM continued running without problems, but a file transfer inside it stopped, which made me curious.
     - More fatal: the TrueNAS VM failed but continued running. More precisely, the pool attached to the HBA failed while the other pool, which holds the two VMs, was still fine.
     - So my guess is: something inside Unraid's virtualization mechanism failed and dropped all VFIO maps or attached devices.
     - The Unraid host continued to work, but the VM, Docker and Settings -> VM/Docker tabs froze again!
     - I could download the syslog from the GUI, but I couldn't create a diagnostics package; after clicking on VM the GUI froze again.

     I had already implemented "<maxphysaddr mode='passthrough'/>" for the Windows VM with the GPU because I wasn't able to restart it: I always had to force-shut it down, and sometimes it froze randomly and the VM paused (I attached a log of that event). When I tried to use the Unraid shutdown/restart, one core went to 100%, the others stayed at 0%, and the VM got stuck in this crashed state. The maxphysaddr setting didn't help here ...

     Is there any solution to avoid all VFIO devices failing at once? Do I have to use "VFIO Allow Unsafe Interrupts"? I wanted to avoid that, because then the log is completely empty and I can't trace the errors. Thanks again @SimonF, that helped me a lot, because the main part, the NAS, is working; but the VFIO trouble remains. I posted it here because I didn't want to start a new thread, since the symptoms of a locked-up UI and failed VFIO devices are the same. I also added the two XMLs of the VMs; the SOUND CARD is passed through, so that's not the problem :P I will research more, because I found far more input on Windows/GPU/gaming VMs than on HBA or TrueNAS problems, but if some Unraid expert has any clue which part of that VFIO construct is faulty, I would be glad about a (technical) answer and a solution. Thanks. syslog__.txt paused vm log.txt xml windows gpu.txt xml truenas.txt
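     For anyone who wants to copy this: a minimal sketch of where the element sits and how to confirm it is active in a defined domain. As far as I understand it, libvirt places maxphysaddr inside the <cpu> element; "TrueNAS" is only a placeholder for the real VM name:

        #!/bin/bash
        # In the domain XML the element sits inside <cpu>, roughly like this:
        #   <cpu mode='host-passthrough' check='none'>
        #     <maxphysaddr mode='passthrough'/>
        #   </cpu>

        # Confirm it is present in the active definition:
        virsh dumpxml "TrueNAS" | grep -A3 "<cpu "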
  8. Found the solution myself. After some research and trying a few variants with the User Scripts plugin, I settled on this approach:
     1. Create two custom scripts, one executed after the start of the machine and one executed before stopping the machine.
     2. Customize the scripts with mkdir and mount, e.g.:

     Start script:

        #!/bin/bash
        sleep 150
        mkdir -p /mnt/remotes/fastVMs_ext
        mkdir -p /mnt/remotes/nextcloud
        mount -t nfs 192.168.0.116:/mnt/fast_data/fastNAS_Data/fastVMs /mnt/remotes/fastVMs_ext
        mount -t nfs 192.168.0.116:/mnt/main_data/PC_Data/nextcloud_main /mnt/remotes/nextcloud

     Stop script:

        #!/bin/bash
        umount --lazy /mnt/remotes/fastVMs_ext
        umount --lazy /mnt/remotes/nextcloud
        rmdir /mnt/remotes/fastVMs_ext
        rmdir /mnt/remotes/nextcloud

     The scripts live on the flash drive (I mounted it on my Mac after enabling the share on the Unraid host):
     -> /Volumes/flash/config/plugins/user.scripts/scripts/delayed mount of Icarus TrueNAS share
     -> /Volumes/flash/config/plugins/user.scripts/scripts/delayed mount of Icarus TrueNAS share - SHUTDOWN

     The modified file is always the script file. No restart of the machine is needed; the script works right away. For now, the start script just waits and assumes the VM has had enough time to boot. I will look into a conditional startup (check whether the VM is up before connecting to its shares; see the sketch below) and a proper custom shutdown script later (first shut down the dependent VMs, then unmount the main NAS share, in my case TrueNAS, then shut down that VM, then shut down the machine, to avoid data loss or an abrupt share unmount). I will update my results here ;)
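     A minimal sketch of the conditional variant I have in mind, replacing the fixed sleep; it assumes the domain is called "TrueNAS Icarus" (placeholder, adjust to the real name) and simply waits until the VM is running and its NFS server answers before mounting:

        #!/bin/bash
        VM_NAME="TrueNAS Icarus"   # placeholder, adjust to the real domain name
        NFS_HOST="192.168.0.116"

        # Wait until libvirt reports the domain as running
        until virsh domstate "$VM_NAME" 2>/dev/null | grep -q running; do
            sleep 5
        done

        # Wait until the NFS port (2049) accepts connections
        until timeout 2 bash -c "echo > /dev/tcp/$NFS_HOST/2049" 2>/dev/null; do
            sleep 5
        done

        mkdir -p /mnt/remotes/fastVMs_ext
        mount -t nfs "$NFS_HOST":/mnt/fast_data/fastNAS_Data/fastVMs /mnt/remotes/fastVMs_ext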
  9. ACS Override = Multifunction and VFIO Allow Unsafe Interrupts = Yes just get rid of the log entries; the error persists. This is the UI stuck on a loading screen after trying to shut down. Syslinux conf used:
  10. Unfortunately, that doesn't seem to solve it. I configured my syslinux conf and restarted the machine. Also attached again: the newest diagnostics and the syslog of the TrueNAS VM. It still freezes on a normal VM shutdown, still with the VFIO_DMA_MAP -22 error. A forced shutdown works, though. An Unraid shutdown from the IPMI looks like this (see the two "waiting 200 secs ..."); I tried it twice, and the shutdown is not executed even when forced by Unraid, so I have to cut the power via an IPMI command.

     But I solved PART of the problem: heavy SMB load to the TrueNAS VM now works perfectly (even with advanced features like dedup the speeds are pretty good for what I see -> up to 2 Gb/s without bigger hiccups). Thanks to the QEMU dev/professional user "jarthur" on the QEMU Matrix channel, who responded immediately and sent me this: which solved the problem under heavy load. My guess is that this is a downgrade to a lower QEMU version that didn't have this particular problem, which had rendered my TrueNAS VM with its passed-through HBA useless. Thank you very much, again! This is the first success after weeks of troubleshooting.

     The number of DMA map errors (-22) also seems lower in my opinion, but they are still there. The syslinux config (above) didn't change that; I would be glad about any new input for the syslinux file! I turned ACS Override to Multifunction but left VFIO Allow Unsafe Interrupts off (No). I will try adding the unsafe interrupts and test again - for now no luck!

     I feel like this is definitely a problem caused by Unraid, because the UI freezes after accessing the VM tab and not before that, even if I have already tried to shut down the NAS VM. From the dashboard I can still control Unraid, but one click on VM, Docker or Settings -> VM/Docker and the machine becomes unreachable ... That must be a BUG, right? How can a failed VM influence the stability of the whole system after a weird sequence of switching tabs back and forth? Still fighting to find a fix :( icarus-diagnostics-20240309-1550.zip icarus-syslog-20240309-2149.zip
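     Purely as a generic illustration (NOT my exact conf): this is roughly how the two options discussed here end up on the kernel command line; the path assumes the Unraid default of the flash drive being mounted at /boot:

        #!/bin/bash
        # Show the current boot configuration on the flash drive
        cat /boot/syslinux/syslinux.cfg

        # With both options enabled, the append line would look roughly like:
        #   append pcie_acs_override=downstream,multifunction vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot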
  11. New error log after restarting: over 6000 lines full of kernel errors. No clue what is going on here :( Also attached a screenshot of my IOMMU groups. Anyone any idea? I am happy about any input ... syslog.txt Icarus_SysDevs.pdf
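     For completeness, a minimal sketch of how the same IOMMU grouping can be dumped as plain text instead of a screenshot (standard sysfs layout, nothing Unraid-specific):

        #!/bin/bash
        # Print every PCI device together with its IOMMU group number
        for dev in /sys/kernel/iommu_groups/*/devices/*; do
            group=$(basename "$(dirname "$(dirname "$dev")")")
            printf "IOMMU group %s: %s\n" "$group" "$(lspci -nns "${dev##*/}")"
        done | sort -V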
  12. Also attached are the results of the "cat /proc/iomem" command, with the distinct block for the HBA:

        bc100000-bc5fffff : PCI Bus 0000:0b
          bc100000-bc4fffff : PCI Bus 0000:0c
            bc100000-bc2fffff : PCI Bus 0000:0f
              bc100000-bc1fffff : 0000:0f:00.0
              bc200000-bc23ffff : 0000:0f:00.0
                bc200000-bc23ffff : vfio-pci
              bc240000-bc24ffff : 0000:0f:00.0
                bc240000-bc24ffff : vfio-pci
            bc300000-bc4fffff : PCI Bus 0000:0d
              bc300000-bc3fffff : 0000:0d:00.0
              bc400000-bc43ffff : 0000:0d:00.0
                bc400000-bc43ffff : mpt3sas
              bc440000-bc44ffff : 0000:0d:00.0
                bc440000-bc44ffff : mpt3sas
          bc500000-bc53ffff : 0000:0b:00.0

     Both controllers seem to be bound correctly: one to VFIO and one to the SAS (mpt3sas) module. iomem
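     As a cross-check, a minimal sketch of how the same binding can be read directly from sysfs (the two PCI addresses are taken from the iomem dump above):

        #!/bin/bash
        # Show which kernel driver each SAS controller function is bound to;
        # expected: 0000:0f:00.0 -> vfio-pci, 0000:0d:00.0 -> mpt3sas
        for addr in 0000:0f:00.0 0000:0d:00.0; do
            drv=$(readlink -f "/sys/bus/pci/devices/$addr/driver")
            echo "$addr -> ${drv##*/}"
        done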
  13. OK, I found something new. For whatever reason the same hiccup with the system happened again. The point where it happened was the start of an SMB transfer from a Windows 10 VM (image located on an Unraid drive, NOT a TrueNAS drive) to my TrueNAS SSD share. I can clearly identify the error:

        qemu-system-x86_64: vfio_dma_map(0x14a74b57da00, 0x380000060000, 0x2000, 0x14af51e47000) = -22 (Invalid argument)
        2024-02-22T06:10:33.177272Z qemu-system-x86_64: VFIO_MAP_DMA failed: Invalid argument

     There is no GPU passed through to the TrueNAS VM, BUT a Broadcom HBA is (Broadcom 9300-16i -> capacity for 16 internal drives on two 8-port SAS controllers -> one controller bound to VFIO = 8 HDDs for TrueNAS, and 3 drives for Unraid on the other controller, which is not bound to VFIO). The weird thing is that the SMB transfer was targeted at a RAID10 of 6 Samsung SSDs, all of which are connected to the motherboard's SATA connectors. I had problems with the HBA in the past, before upgrading to TrueNAS 13, so I am still afraid the HBA could be the failure point, but at that moment the system was accessing the other controller with 2x 500 GB SSDs for Unraid directly (the virtual image is on the Unraid drives, not the TrueNAS drives!), and the TrueNAS VM should have written the data to the SSDs attached to the mobo, not to the HBA.

     How can I handle this error? Any ideas? Why is the Unraid UI affected (freezing tabs)? Unraid runs from RAM, it shouldn't be affected by failing connections to any drives? Only the VM, Docker and certain Settings tabs result in a frozen UI ... it doesn't make sense and still drives me crazy :( Attached are two different diagnostics that I was able to create while the UI was acting up, plus the extracted VM log where the VFIO error can be found. Thanks for any answer ... icarus-diagnostics-20240226-2326.zip icarus-diagnostics-20240228-2358.zip TrueNAS Icarus.txt
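     To see whether the errors line up with the transfers, a minimal sketch of how I pull just the VFIO failures with their timestamps out of the VM log (standard libvirt log path assumed; the file name is a placeholder for the real domain name):

        #!/bin/bash
        LOG="/var/log/libvirt/qemu/TrueNAS Icarus.log"   # placeholder, adjust

        # Count the failures and show the first and last occurrence
        grep -c "VFIO_MAP_DMA failed" "$LOG"
        grep "VFIO_MAP_DMA failed" "$LOG" | head -n 1
        grep "VFIO_MAP_DMA failed" "$LOG" | tail -n 1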
  14. I was trying to get a syslog (Graylog) server up and running on my second machine, but as always it's not that easy to get it running properly. So I randomly looked into my current log, and a lot of errors have been popping up over the day. The "limiting requests" errors actually point to the Mac I am accessing the server with (via VPN). Is this a big deal? Can I ignore these error messages? Also, my syslog server reports 0 messages. Can I send a test message to the syslog server to validate it? Something like "diagnostics *target ip of log server*"? I will have a second look tomorrow to see whether any messages get reported, but for now it seems like the syslog server option in Unraid is not working for me (attached you can find the syslog configuration). icarus-syslog-20240223-0705.zip
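     A minimal sketch of how a test message can be pushed to a remote syslog server from the Unraid shell with the stock util-linux logger tool; the IP and port are placeholders, use whatever is configured in the syslog settings (514/UDP is the common default):

        #!/bin/bash
        # Send one UDP test message to the remote syslog server
        logger --udp --server 192.168.0.120 --port 514 "unraid syslog test $(date)"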
  15. I have a second server running for future backups; can I also stream the diagnostic log to a share on that machine? It doesn't have much storage though (just a few GB left and only one HDD); could that become a problem, e.g. a bottleneck? I want to avoid using the same machine or a VM on the same machine, because if the VM fails first, I obviously have no log of the failure point to begin with. Can you suggest a syslog server for TrueNAS? If not, I will just try to set one up with the info I find and post the log here as soon as I have one.