Siren

Members
  • Posts: 9
  • Joined
  • Last visited



Reputation: 5

  1. So it seems like it could be one of two things:
     - The drivers are messed up somewhere and you might need to reinstall them
     - A firmware issue on the card (happened to mine, which were also brand new at the time)
     On your Windows machine, if you have the cards set to InfiniBand, can you run mst status and post a screenshot? I want to try to map out your issue. If you can't, you might need to use WinMFT and re-flash the firmware (rough steps are sketched after this post).
     Here's a link to the firmware: http://www.mellanox.com/downloads/firmware/fw-ConnectX3-rel-2_42_5000-MCX354A-FCB_A2-A5-FlexBoot-3.4.752.bin.zip
     Link to WinMFT: https://www.mellanox.com/downloads/MFT/WinMFT_x64_4_26_1_3.exe
     Steps on how to burn the firmware: https://network.nvidia.com/support/firmware/nic/
     Your device IDs should be the same as mine, since I have the same card but with one physical port instead of two. Since then, I've gotten the card to work on UnRAID on my Dell server, but I'm planning on redoing my main server in the future and had to pull the card out to fit multiple GPUs.
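     For reference, a rough sketch of the re-flash from the Windows side, assuming the MST device name is mt4099_pci_cr0 like mine (take the real name from the mst status output) and that the firmware zip above has been extracted into the WinMFT folder:

        ## Elevated command prompt, default WinMFT install location
        cd "C:\Program Files\Mellanox\WinMFT"

        ## List MST devices; note the *_pci_cr0 name for your card
        mst status

        ## Burn the downloaded firmware image onto the card
        flint -d mt4099_pci_cr0 -i fw-ConnectX3-rel-2_42_5000-MCX354A-FCB_A2-A5-FlexBoot-3.4.752.bin burn

        ## Confirm the new firmware version, then reboot
        flint -d mt4099_pci_cr0 query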
  2. Hi All, I was one of the lucky ones and got an RTX 3080 Ti from the Newegg shuffle on release day. Got the card in the mail the next day, installed it into my UnRAID server, and it detected just fine. Was even able to use @SpaceInvaderOne's vBIOS Dump Script and extracted the vBIOS of the card out of the box. Mapped the path of the vBIOS to the card as well in the VM settings. However, I'm running into an issue where the card isn't being recognized in the VM at all on a fresh Windows 10 VM. In device manager, it is only showing up as the Microsoft Basic Display Adapter with error code 31. I've already installed the latest drivers from GeForce experience and can confirm that my dummy card is working just fine (GeForce GT 710). Below is a screenshot with the device manager + error code, NVIDIA control panel info, and GeForce experience: Below is also my full XML for the VM: <?xml version='1.0' encoding='UTF-8'?> <domain type='kvm' id='8'> <name>Windows 10-2</name> <uuid>f8006310-5f0e-0dc0-6413-7ef7619197df</uuid> <metadata> <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/> </metadata> <memory unit='KiB'>16777216</memory> <currentMemory unit='KiB'>16777216</currentMemory> <memoryBacking> <nosharepages/> </memoryBacking> <vcpu placement='static'>16</vcpu> <cputune> <vcpupin vcpu='0' cpuset='16'/> <vcpupin vcpu='1' cpuset='40'/> <vcpupin vcpu='2' cpuset='17'/> <vcpupin vcpu='3' cpuset='41'/> <vcpupin vcpu='4' cpuset='18'/> <vcpupin vcpu='5' cpuset='42'/> <vcpupin vcpu='6' cpuset='19'/> <vcpupin vcpu='7' cpuset='43'/> <vcpupin vcpu='8' cpuset='20'/> <vcpupin vcpu='9' cpuset='44'/> <vcpupin vcpu='10' cpuset='21'/> <vcpupin vcpu='11' cpuset='45'/> <vcpupin vcpu='12' cpuset='22'/> <vcpupin vcpu='13' cpuset='46'/> <vcpupin vcpu='14' cpuset='23'/> <vcpupin vcpu='15' cpuset='47'/> </cputune> <resource> <partition>/machine</partition> </resource> <os> <type arch='x86_64' machine='pc-i440fx-5.1'>hvm</type> <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader> <nvram>/etc/libvirt/qemu/nvram/f8006310-5f0e-0dc0-6413-7ef7619197df_VARS-pure-efi.fd</nvram> </os> <features> <acpi/> <apic/> <hyperv> <relaxed state='on'/> <vapic state='on'/> <spinlocks state='on' retries='8191'/> <vendor_id state='on' value='none'/> </hyperv> </features> <cpu mode='host-passthrough' check='none' migratable='on'> <topology sockets='1' dies='1' cores='8' threads='2'/> <cache mode='passthrough'/> <feature policy='require' name='topoext'/> </cpu> <clock offset='localtime'> <timer name='hypervclock' present='yes'/> <timer name='hpet' present='no'/> </clock> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices> <emulator>/usr/local/sbin/qemu</emulator> <disk type='file' device='disk'> <driver name='qemu' type='raw' cache='writeback'/> <source file='/mnt/disk1/domains/Windows 10/vdisk1.img' index='1'/> <backingStore/> <target dev='hdc' bus='virtio'/> <boot order='1'/> <alias name='virtio-disk2'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </disk> <controller type='pci' index='0' model='pci-root'> <alias name='pci.0'/> </controller> <controller type='virtio-serial' index='0'> <alias name='virtio-serial0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </controller> <controller type='usb' index='0' model='ich9-ehci1'> <alias name='usb'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/> </controller> <controller type='usb' 
index='0' model='ich9-uhci1'> <alias name='usb'/> <master startport='0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/> </controller> <controller type='usb' index='0' model='ich9-uhci2'> <alias name='usb'/> <master startport='2'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/> </controller> <controller type='usb' index='0' model='ich9-uhci3'> <alias name='usb'/> <master startport='4'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/> </controller> <interface type='bridge'> <mac address='52:54:00:87:3e:ae'/> <source bridge='br0'/> <target dev='vnet0'/> <model type='virtio-net'/> <alias name='net0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </interface> <serial type='pty'> <source path='/dev/pts/0'/> <target type='isa-serial' port='0'> <model name='isa-serial'/> </target> <alias name='serial0'/> </serial> <console type='pty' tty='/dev/pts/0'> <source path='/dev/pts/0'/> <target type='serial' port='0'/> <alias name='serial0'/> </console> <channel type='unix'> <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-8-Windows 10-2/org.qemu.guest_agent.0'/> <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/> <alias name='channel0'/> <address type='virtio-serial' controller='0' bus='0' port='1'/> </channel> <input type='mouse' bus='ps2'> <alias name='input0'/> </input> <input type='keyboard' bus='ps2'> <alias name='input1'/> </input> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> </source> <alias name='hostdev0'/> <rom file='/mnt/disk1/isos/vbios/GeForce GT 710.rom'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x23' slot='0x00' function='0x0'/> </source> <alias name='hostdev1'/> <rom file='/mnt/disk1/isos/vbios/rtx3080tibios.rom'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/> </source> <alias name='hostdev2'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> </hostdev> <hostdev mode='subsystem' type='pci' managed='yes'> <driver name='vfio'/> <source> <address domain='0x0000' bus='0x23' slot='0x00' function='0x1'/> </source> <alias name='hostdev3'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x046d'/> <product id='0xc52b'/> <address bus='1' device='3'/> </source> <alias name='hostdev4'/> <address type='usb' bus='0' port='1'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x046d'/> <product id='0xc534'/> <address bus='1' device='2'/> </source> <alias name='hostdev5'/> <address type='usb' bus='0' port='2'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x1b1c'/> <product id='0x1b2d'/> <address bus='5' device='3'/> </source> <alias name='hostdev6'/> <address type='usb' bus='0' port='3'/> </hostdev> <hostdev mode='subsystem' type='usb' managed='no'> <source> <vendor id='0x258a'/> <product id='0x0033'/> <address bus='9' device='7'/> </source> <alias name='hostdev7'/> <address 
type='usb' bus='0' port='4'/> </hostdev> <memballoon model='none'/> </devices> <seclabel type='dynamic' model='dac' relabel='yes'> <label>+0:+100</label> <imagelabel>+0:+100</imagelabel> </seclabel> </domain>
     I also have an RTX 2080 Super that I used on this system before, and it was working just fine until I moved it back to another system. For the BIOS settings:
     - Above 4G Decoding: enabled
     - CSM: enabled
     - Resizable BAR: disabled
     Messing with these didn't change anything at all for the VM. Lastly, in a regular Windows 10 installation on the server (it dual-boots between Windows and UnRAID), the card is detected just fine, so I have a feeling something is causing the incompatibility only in the VM. Server specs (will update my signature later):
     - AMD Threadripper 3960X
     - ASUS ROG Zenith II Extreme Alpha
     - EVGA RTX 3080 Ti XC3 Ultra
     - GeForce GT 710 2GB
     Any help/advice is appreciated in advance. Cheers!
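     One quick thing worth checking on the UnRAID side before digging further (a sketch only, using the 23:00.0/23:00.1 addresses from the XML above): confirm that both functions of the 3080 Ti are bound to vfio-pci and see what else shares their IOMMU group, since everything in that group generally needs to be passed through or stubbed for passthrough to behave.

        ## Both functions of the card, with the kernel driver bound to each
        ## ("Kernel driver in use: vfio-pci" is what you want to see)
        lspci -nnk -s 23:00

        ## Everything sharing the GPU's IOMMU group
        ls /sys/bus/pci/devices/0000:23:00.0/iommu_group/devices/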
  3. Been a while since I updated this thread, but some slightly good news: I got some improvements by changing only a few pieces of hardware.
     TL;DR: I'm now getting around 2.0GB/s on a raw transfer of a 120GB VM to RAID 10 SSDs after swapping to a really good RAID controller. Halfway there! 😄 🍺
     What I realized is that on these Dell servers, the onboard RAID controller (H710p Mini) is PCIe 2.0 x8, which is UTTERLY slow, even for RAID 10. So I took a risk and bought a single H740p for the server. Normally this is NOT supported by Dell, and they even claim it might not be fully functional. That wasn't the case. All I did was insert the controller into the server and it showed up after some initialization (Dell servers perform an initialization after ANY change in internal components like CPUs and PCIe devices). I cleared the controller's old cache and re-imported the config after swapping the SAS cables over. Lastly, I updated the H740p's firmware to the latest (some of the original firmware revisions gate the cache to 4GB rather than exposing the full 8GB) and it was done. I didn't have to re-import or redo any configuration, as it was imported automatically from the old controller, which was perfect because I didn't want to reinstall my copy of Windows.
     Why did I buy this RAID controller? Four simple reasons:
     - 8GB of cache, which is an INSANE amount
     - The ability to toggle between RAID and HBA mode, along with NVMe support (used on the newer PowerEdge x40 models)
     - The price was reasonable
     - Potential compatibility with Dell servers and online management
     In terms of speeds, I went from about 1.75GB/s sequential read and 1GB/s sequential write in RAID 10 on the old H710p to 7GB/s sequential read and 6.75GB/s sequential write in RAID 10 on the new H740p, measured on the disks with CrystalDiskMark. I then did a raw one-way 120GB VM transfer via SMB using the 40GbE cards and got 2.0GB/s, peaking at around 2.2GB/s. All on 4x Samsung 850 EVO 1TB disks. Yes, I am still using RAID on the card, but at some point I'll test the disks in HBA mode, as I know unRAID does NOT like/prefer the use of RAID (I guess... hence the name 🤣). Still, these are some of the better results I've seen on my server.
     Lastly, when I updated to the newest version of UnRAID (6.8.3 at the time of writing), I ran into SMB issues. I figured out that you need to enable anonymous login AND enable SMB 1.0 support in Windows and the Group Policy editor to get it working (the Windows-side changes are sketched after this post). There was a thread I found that worked for me, but I can't link it because I can't find it again; if I do, I'll repost it here. Cheers!
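     For anyone hitting the same SMB issue, this is roughly what the Windows-side changes look like from an elevated command prompt (a sketch of the standard SMB1/insecure-guest settings, not the exact steps from the thread I lost; the UnRAID side also needs anonymous/guest access enabled on the share):

        ## Enable the SMB 1.0/CIFS client feature (a reboot may be required)
        DISM /Online /Enable-Feature /FeatureName:SMB1Protocol-Client /All

        ## Allow insecure guest logons (the same switch the
        ## "Enable insecure guest logons" group policy flips)
        reg add HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters /v AllowInsecureGuestAuth /t REG_DWORD /d 1 /f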
  4. No problem. Glad that you got your cards working. Again, I've still been busy and am still trying to figure out where my bottleneck is, but I haven't had time yet. Not intentionally trying to resurrect this thread, but it's been a crazy ride with other OSes. I added my servers to my signature for anyone who wants to see exactly what I have now. Now that you have a 40G link, go ahead and test the raw data transfer and see what you get (assuming you can get RoCE working); a quick way to check the link itself is sketched after this post. If you get close to 40Gbps (or 5GB/s), let me know what you have so that I can evaluate my setup here and eventually make a guide for 40Gb speeds. Thanks in advance.
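     A simple first-pass check of the raw link (plain TCP with iperf3, so it says nothing about RoCE itself; the address below is a placeholder for the other card's IP):

        ## On the receiving machine
        iperf3 -s

        ## On the sending machine: 8 parallel streams for 30 seconds
        iperf3 -c 10.10.10.2 -P 8 -t 30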
  5. Utilizing the Ethernet protocol for the Mellanox cards with RDMA (in this case, RoCE) makes it much easier for most OSes to bring up a fully working link, since it doesn't require a subnet manager (SM) the way IB does, and also because of the way the IB protocols work (I don't want to explain that here). Unfortunately, at the time of writing you would have to do a lot of kernel modding to get IB drivers to work properly in UnRAID, as a card in IB mode will not be seen on the network page of the web GUI. Obviously, using lspci in the shell will show that it's there, as I learned the hard way. Also, only base rdma support is included in UnRAID, and I haven't gotten it to work at all; trying to use the rdma command gives me an error (I don't have it on me, but a sketch of what I'm checking is after this post). If anyone has a way to get the actual Mellanox drivers/RDMA to work in UnRAID along with the InfiniBand requirements (i.e. the "Infiniband Support" package group, as in CentOS 7), I'd be more than willing to test it out, as I'm not that good at kernel modding in Linux. If base RDMA won't work (without RoCE/iWARP), you probably won't have any luck with NVMe-oF either. However, you are correct that once you get an SM up and running, with IPs allocated to each card, it would work out. The devs just need to add the required pieces. I've already requested this on the feature request page, but if more people start asking for it, they may add it in the next major release.
     Out of curiosity, have you tested the ConnectX-4 cards with standard SSDs (i.e. 860 EVOs)? I ask because the only OS where I've gotten close to full speed is Windows, where I see around 32Gb/s out of the 40 in an ATTO benchmark (not complaining about that speed at all, just that Linux isn't treating me as well; I'm getting around 10-15Gb/s there). I tried it in VMware as well by attaching the card to the VM, but I'm getting poor read speeds there. I'm pretty sure I'd get there with NVMe drives, but 1TB M.2 drives are relatively pricey and I would need a LOT of them for my host.
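     What I mean by checking base RDMA support from the UnRAID shell (a sketch; the rdma tool comes from iproute2 and the sysfs path is the standard one, so what actually shows up depends on which modules the UnRAID kernel ships):

        ## RDMA-capable links the kernel knows about (empty output = none registered)
        rdma link show

        ## Same information via sysfs; the directory is absent if no IB/RoCE driver is loaded
        ls /sys/class/infiniband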
  6. Been a while since I got back to this thread. I was really busy and trying out some other ways to get 40G to work elsewhere. My goal isn't to turn this thread into a Mellanox tutorial, but I'll help out. It seems like you're just trying to run the exe via double-click. The exe doesn't work like that; it is ONLY run from the command prompt/PowerShell (best to use the command prompt). The steps below show how to find the necessary card info and get the card into Ethernet mode:
     A: I'm assuming you've already downloaded and installed BOTH WinOF and WinMFT for your card, and that the card is already installed in your system. If not, head over to the Mellanox site and download + install them. WinOF should automatically update the firmware of the card.
     B: In this example I'm using my card, which I've already listed above. Again, YMMV if you have a different model.
     1. Run the command prompt as Administrator and navigate to the default WinMFT install location:
        cd C:\Program Files\Mellanox\WinMFT
     2. Run the following command and save the info for later:
        mst status
        Your output should look something like this:
        MST devices:
        ------------
        <device identifier>_pci_cr0
        <device identifier>_pciconf<port number>
        ##In my case:
        MST devices:
        ------------
        mt4099_pci_cr0
        mt4099_pciconf0
        Note that any additional ports will also be shown here.
     3. Query the card and port to check which mode it is using:
        mlxconfig -d <device identifier>_pciconf<port number> query
        ##In my case:
        mlxconfig -d mt4099_pciconf0 query
        The LINK_TYPE values in the output are the port types for the card. Note that just because two ports are listed doesn't mean the card has two physical ports; as I mentioned above in the thread, mine is a single-port card. The port types are as follows:
        (1) = Infiniband
        (2) = Ethernet
        (3) = Auto sensing
     4. If your card is already in Ethernet mode, you're good. If not, use the following command to change it:
        mlxconfig -d <device identifier>_pciconf<port number> set LINK_TYPE_P1=2 LINK_TYPE_P2=2
        ##In my case:
        mlxconfig -d mt4099_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2
        Select Y and hit Enter and the port type will change (the change shows up under the New column). It will then ask you to reboot the system for the change to take effect. Do so, then repeat step 3 to verify the change; you should get nearly the same output as mine above.
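     Once the card is back in the UnRAID box in Ethernet mode, a quick sanity check from the shell looks roughly like this (eth1 is only a placeholder for whatever interface name the card comes up as):

        ## Confirm the interface shows up and is up
        ip link show

        ## Check the negotiated link speed on the Mellanox port
        ethtool eth1 | grep -i speed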
  7. I definitely have some things I can post on it. I've just been way too busy since my last post, but I've worked on it a good amount and it works really well. My notes below assume you have the same card; YMMV if it's a different model:
     Configuring notes:
     Additional config notes:
     Performance notes:
     Warnings:
     Issues I faced:
     I'm not necessarily making a guide for noobs or anything, just sharing my own experiences. If you know what you're doing, I would go for the 40Gb speeds, since it's entirely possible, data rates are much faster, and it may even be more affordable than 10Gb.
  8. Thanks for the reply, jonnie.black. I'm aware that Unraid doesn't support InfiniBand as of 6.6.7. I figured out that there was an issue with my card not saving its config: after a reboot it kept switching from Ethernet back to auto sensing, even though I had set it to Ethernet before. Eventually I re-flashed the firmware manually to the same version, and it seems to be working just fine after 10 reboots; Unraid now recognizes it in the network settings. I'm going to be out of the country for a week, so I can't test it out until I come back. Out of curiosity: have there been any posts/reports of 40GbE connections working with Unraid? If not, I guess I might be the first 😁 Thanks.
  9. Hi all, I am new to the Unraid community and would like some advice/opinions/guidance on how to get ConnectX-3 cards working with Unraid. I recently acquired two MCX353A cards: one for my Unraid machine (a Dell R610 server with 2x X5650 and 128GB of RAM that I bought for a good price off a client who was decommissioning it), and another for my "hybrid" file and backup server running Windows 10 Pro (a custom-built ITX system: 8700K, 16GB of RAM, with more to be added in the future). I understand that it isn't ideal to use a consumer desktop OS with InfiniBand and might just be outright stupid, but I have seen it done and have gotten it to work on some other clients' systems. Windows recognizes the cards just fine, and I was able to update both of them to the newest firmware. I can also see the card in Unraid using "lspci | grep Mellanox", and under Tools -> System Devices in Unraid it matches up as well. My assumption is that there aren't any recognizable drivers for it (which doesn't surprise me, since not many people request/use a 40Gb InfiniBand card), as it also doesn't show up in the network settings section. Mellanox's site does have documentation/guides AND the actual firmware for the cards. Provided link (MCX353A-FCBT): http://www.mellanox.com/page/firmware_table_ConnectX3IB If anyone has any suggestions/solutions/guidance, I'd greatly appreciate it (a quick driver check from the shell is sketched after this post). Thanks in advance.
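     A minimal sketch of checking for the ConnectX-3 driver from the Unraid shell (mlx4_core/mlx4_en are the standard in-kernel module names for this card; whether they appear depends on what the Unraid kernel ships):

        ## Is the card on the PCI bus, and is any kernel driver bound to it?
        lspci -nnk | grep -A 3 -i mellanox

        ## Are the ConnectX-3 modules available/loaded?
        lsmod | grep mlx4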