offroadguy56

Posts posted by offroadguy56

  1. I've got the RAM back to the recommended speed of 3200. I found it was at 4000, and I honestly don't remember setting it that high; I could have sworn I left it at 3600 to match the speed of the Infinity Fabric, which I've read helps performance/stability.

     

    Anyway, I ran a scrub on the cache, and I don't see any indication of the server doing anything. Be aware that the cache went into its error state within a minute, and the scrub was run while in that error state. I see this in the log:

    May 23 00:03:51 waffle ool www[23548]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' ''

     

    I don't see anything else and it's been several minutes. Here is what the scrub status block shows. image.png.3dab44875bb7b52c721a3c2c716141d7.png
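
    For reference, I understand the scrub progress can also be checked from the terminal rather than the GUI block; a minimal sketch, assuming the pool is mounted at /mnt/cache:

    # Show progress/results of a running or finished scrub on the cache pool
    btrfs scrub status /mnt/cache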

  2. @JorgeB I've not had any issues with the cache until I added that 2nd NVMe. I also switched from XFS to btrfs for the cache drives. As far as I know I have not overclocked the RAM; I think the RAM was rated for 4000 MHz, but I've not been able to run it past whatever it is at right now (3666 MHz? 3800?). I understand the supported frequencies for the board and CPU, so I'll adjust the RAM to match them.

     

    How should I scrub and reset pool stats?

     

    Any downside to having the 2 NVMe drives in the array along with the HDDs?
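
    On the scrub/stats question above, these are the commands I believe apply (a sketch only, assuming the pool is mounted at /mnt/cache):

    btrfs scrub start /mnt/cache        # start a scrub of the pool
    btrfs scrub status /mnt/cache       # check its progress/results
    btrfs device stats -z /mnt/cache    # print the per-device error counters and reset them to zero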

  3. Having more issues with my cache drives. They're a 970 Evo and a 970 Evo Plus in RAID 0. In the past the Evo Plus was getting zero-log corruption, which was a simple command to fix along with an array restart. This all started when I issued a shutdown and changed the power strip the server was connected to. As far as I could tell, the server was fully shut down before I removed power.

     

    Right now I have appdata, system, and domains directories stored on the cache. I have a partial old backup of a VM on the array, and full backups of appdata on the array. No backup of system.

     

    I originally thought I would benefit from the cache drives, but really I only use them for the VM and quick Docker container startups.

    My questions: first, will my SSDs run at full speed if they are adopted into the array? The array has a 4 TB HDD parity drive.

    Second: how much do you believe is corrupted on the cache? Could I safely back up what's left of the VM? (See the sketch at the end of this post.) I plan to move the entire VM image to the array, as I am also having issues with the VM manager.

    Third: is there any way for me to fix the cache and continue with the current setup (though I still believe my best setup now is having the SSDs in the array)?

     

     

    The problems the bad cache drives are causing me:

     

    When I launch the VM I have, it tells me it can't launch because the file system is read-only. Right now the VM manager is disabled while I troubleshoot the Docker containers. I haven't yet tried launching the VM with the cache in a good operating state.

     

    Docker runs initially when Unraid first boots; containers will run fine for anywhere from 5 minutes up to about an hour. Once I get cache errors, some of the containers stop working while others continue operating fine. While in the error state none of the containers can be restarted, even the good ones; I get a generic server error or a 403 code.

     

     

    Things of note:

    I'm aware of the low disk space. I'm currently removing old files to make room for the VM image, and I'm saving for more drives. I hear it's possible to increase the parity drive size with some effort.

     

    Also, the parity drive has some SMART errors. I've run multiple deep scans and the errors have stopped. Still planning to replace this drive.

     

    The array is XFS

    waffle-diagnostics-20240521-1737.zip
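
    Regarding backing up what's left (second question above): as far as I understand, a degraded btrfs pool can sometimes still be mounted read-only to copy data off. A rough sketch of what I have in mind, assuming the first pool device is /dev/nvme0n1p1 and using /x as a temporary mount point (both just examples I'd confirm first):

    mkdir -p /x
    mount -o ro,rescue=all /dev/nvme0n1p1 /x       # read-only rescue mount (newer kernels)
    rsync -a /x/domains/ /mnt/user/domains/        # copy the vdisk(s) to an array share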

  4. I do have access to the ISO, and I'm currently finishing backing up important files before I attempt fixes on the OS. I had UrBackup running for some files but not all.

     

    My Windows 10 Pro VM is stored on my cache drive. The other day the zero-log corrupted again; I was able to rescue it, as it was not the first time. However, when starting the VM, VNC would show "Guest has not initialized display (yet)". Again, I've seen this issue before and fixed it in the past by creating a new VM template and manually referencing the old ISO. However, this time when Windows boots I'm met with a BSOD that reads "Bad System Config Info". Windows automatic repairs fail. Windows install-disk repairs fail.

     

    There is one guide I followed, which uses DISM and SFC to repair. I can't get the DISM commands to execute: Dism /Online /Cleanup-Image /RestoreHealth results in error 50. To fix that error I try dism /image:C: /cleanup-image /revertpendingactions, which results in error 3 saying the image cannot be accessed. I've had to change the drive letter in the command from C: to X:, as I guess that's a recovery-mode thing? As a result I'm greeted with error 2 instead. The rest of the search results for fixing these errors involve software or tools not available in recovery mode.

    I give up on DISM and try SFC. SFC /scannow scans the disk and reports "Windows Resource Protection could not perform the requested operation". I try chkdsk /r next; this says "Windows cannot run disk checking on this volume because it is write protected". Search results don't lead me to any solutions for this.

     

    Diskpart says there are no disks listed, and no partitions. The only volumes are the 2 ISO drives from the VM, ESD-ISO and virtio.

     

    I'm ready to write off the install and start fresh. I've backed up what I care about, but I hate giving up without exhausting all my options. If you'd like to suggest some things for me to try, that would be neat.
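
    One suspicion: since diskpart can't see the vdisk at all, the recovery environment probably doesn't have the virtio storage driver loaded, which would also explain why DISM and chkdsk can't reach the Windows volume. A hedged sketch of what I'd try from the recovery command prompt, assuming the virtio ISO shows up as drive D: (the real letter comes from diskpart's "list volume") and the vdisk uses the viostor (virtio-blk) driver:

    rem Load the virtio storage driver into the recovery environment (path as laid out on virtio-win ISOs)
    drvload D:\viostor\w10\amd64\viostor.inf
    rem If the Windows volume now appears (say as C:), retry the repairs against it
    chkdsk C: /f
    sfc /scannow /offbootdir=C:\ /offwindir=C:\Windows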

  5. Well, the cache went unmountable again. The zero-log rescue command did bring it back again, but since the first time it's only been working for less than a day at a stretch. Any ideas to fix this? Or should I back up my data, wipe the cache pool, and start fresh?

    It shouldn't be because of a full drive; it has 800 GB free. I do have Disk4, which holds a duplicate of my VM vdisk and only has 80 GB free; I wonder if that is screwing up the cache pool.

  6. 19 minutes ago, JonathanM said:

    While definitely possible to backup the vdisks, it requires shutting down the VM fully to get an accurate backup, which is not ideal. Much better to use a backup utility inside the VM to backup to a location on the array, just like you would a standard hardware based PC. I use UrBackup, it's been a lifesaver.

     

    Appdata has its own backup application in the app store, it works well with some attention and tuning on first deployment.

     

    Docker image should NOT need to be backed up, the whole point is that the appdata folders contain all the customization and content that isn't written to the array shares. The only exception currently is custom networks, so as long as you keep notes on any custom networks created and redo them before restoring your applications from previous apps in the app store, the docker image rebuilds itself in a matter of minutes. If you accidentally have a container writing settings and data INSIDE the docker image, you need to fix that, as it will create other issues besides restoring data in the event you need to recreate the docker image file.

     

    The system folder contains the VM definitions as well, but the appdata backup app has provisions to save those.

    I'll put this information to use. Hopefully I can prevent future screw ups on my part. 

     

    Thanks again for the assistance. Y'all are great!

  7. 18 hours ago, itimpi said:

    Depends what is on the cache pool that needs backing up?   There are plugins for backing up appdata and VM vdisks at specified intervals.

    That's basically what is on it. VM vdisks, appdata, docker image, and system folder.

     

    Right now I'd like to just backup appdata and the VM vdisks. And if possible the docker image and system folder.

     

    Free space available on the cache is about 200-300 GB, so it performs its cache duties for the most part. But I needed the RAID 0 setup, as I wanted the fastest and biggest (affordable) storage I could offer the software in my VM, which I am already considering increasing once again depending on how my data hoarding goes.

  8. 12 hours ago, JorgeB said:

    If the log tree is the only problem this may help:

     

    btrfs rescue zero-log /dev/nvme0n1p1

     

    Then re-start the array.

    Holy smokes, looks like that worked. So glad I didn't screw up the file system trying to fix it myself like I did some months ago, when I took my cache drive out and accidentally set it back up as btrfs instead of its original XFS.

     

    16 hours ago, itimpi said:

    This would not result in mover doing anything as the "No" setting makes mover ignore the share.  The setting to move from pool to array is "Yes".  If in doubt use the Help built into the GUI to see what would be the correct setting for what you want to do.

     

    The other point is that mover will never overwrite duplicates as in normal operation they should not occur.   It is up to you to decide which copy to keep and manually delete the other.

     

     

     

    And you are absolutely correct. My mistake when typing the original post; I did set the shares to Yes. I remember anxiously watching the GBs tick by as the pool emptied. I will clean up the duplicates.

     

    On a side note, do either of you have recommendations for automatically backing up the cache pool, since it is not part of the array?

     

    Thanks very much for the assistance both of you!
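
    In the meantime, a rough manual version of what I'm after, assuming the pool is mounted at /mnt/cache and a backup share on the array named "backups" (the share name is just an example), with the VM and Docker stopped first:

    rsync -a /mnt/cache/appdata/ /mnt/user/backups/appdata/
    rsync -a /mnt/cache/domains/ /mnt/user/backups/domains/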

  9. I wasn't able to access the services on my server. I logged in to find that most of my Docker containers had stopped, except for a few still running. I attempted to start one of them but was given a 403 error code. A quick search suggested it was related to a full cache pool, but my cache pool still showed plenty of free space. I then restarted the server and was met with my cache drives being unmountable.

     

    The process I performed which may have led to the cache's demise was this: I had an M.2 drive I was on a time crunch to back up. I had an M.2-to-USB adapter on order but was afraid it would not arrive in time, in which case I would need to use the M.2 slots in the server. To reduce the risk of data corruption if I took them out, I began transferring data off the cache pool to the array by changing "Use cache pool: prefer" to "no" and invoking the mover. The USB adapter did arrive in time and I backed up my M.2 to the array. I left the mover doing its thing until it finished moving my various shares' files to the array. I then set my shares back to "prefer" and noticed that some files had a duplicate stored on both the array and the cache, specifically 2 appdata folders, the Docker container image, and my VM image. According to Unraid they were stored both on Disk4 (or Disk6) and Cache. I invoked the mover again and the duplicates didn't disappear. I restarted, then invoked the mover once more, and the duplicates still remained. Some time later I hit the 403 error, and after a restart the cache pool is now unmountable.

     

    Looking for assistance in troubleshooting the issue. I have had cache problems before due to my own incompetence: I had removed the cache pool, put it back in, and gave it the wrong file system.

     

    I have 2 cache drives, 1 TB each, set up in a RAID 0 equivalent. Most of their data should be duplicated on the array, if the Unraid Shares tab is correct. Appdata is nowhere on the array, but I have an older backup on my personal computer.

    cache 1.PNG

    cache 2.PNG

    waffle-diagnostics-20230812-1700.zip

  10. Thanks JorgeB. Looks like the SATA passthrough to the VM was the root of the problem. I'm not entirely sure how it managed to get that way. All I remember from the start a week ago was plugging my GPU accelerator back in after changing its cooler, while also plugging in a 2nd M.2 NVMe drive. The computer attempted to boot Windows off that 2nd NVMe, as I had not wiped it; it tried several times before I caught on. After getting into Unraid I noticed Disk1 was disabled, so I restarted Unraid multiple times and tried changing cables/SATA ports. When I stopped the array to fix Disk1 (just a simple stop array -> start array) I also simultaneously added a 2nd slot to my cache pool, which changed it from XFS to btrfs and disabled my working cache drive (the 1st NVMe). I don't believe losing the cache pool was the cause or a symptom, as Disk1 was disabled before I touched the cache pool. But I could be remembering wrong, because libvirt.img was in a share that was stored solely on the cache drive. So the SATA passthrough issue could have happened when I added that 2nd slot and drive to the cache pool, which caused the file system to change and the cache pool to become unreadable.

    Thanks again, the community here is great.

  11. Disk1 is down again. My docker.img was corrupted at some point, so I went to fix that. I also had 2 images on the array, docker.img and docker-xfs.img; I deleted both, then started the Docker service with these settings. Do we know if this caused Disk1 to go offline? This time it says it's enabled but unmountable: wrong or no file system.

    EDIT: I found a previous post referencing xfs_repair. I was able to execute the command (sketched at the end of this post) and Disk1 appears to be back and operational.

    image.png

    waffle-diagnostics-20230307-1507 - removed corupted docker.img_then made new docker img as btrfs.zip
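
    For reference, the kind of command I understand was involved, as a sketch only: it assumes Disk1 maps to /dev/md1 and the array is started in maintenance mode (the device name should be confirmed before running anything):

    xfs_repair -n /dev/md1    # check only, no modifications
    xfs_repair /dev/md1       # actual repair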

  12. 10 hours ago, JorgeB said:

    Yeah, that won't do it, should be OK for now or the disk would become disabled almost immediately.

    Looks like the array is back online. SMB shares are working again. Docker and VM are currently disabled. I can work on my own to get those back.

     

    Can I remove these historical disks without further issue?

    image.png.385fc0692e56886e11d60e8861f7d530.png

  13. 23 hours ago, JorgeB said:

    Everything looks good so far, try re-enabling parity now.

    I thought the read check that it offered me would fix the disk being disabled. It did not.

    So I performed the "start the array without the disk, then add it back" trick. The array is performing a parity sync now, and the parity drive is enabled. I'll see how it goes; last time it did its parity check but then the drive was disabled again, though that shouldn't happen now with the VM issue removed. I'll post back here with results sometime tomorrow after some sleep.

    image.thumb.png.29a2b64ad1cd5e9d38de981a6d1bdb5e.png

  14. 5 hours ago, JorgeB said:

    Yes.

     

    Yep, but since possibly there's a "phantom" VM there probably best to still delete it before or after.

     

    Post new diags after the VM issue is resolved.

    Array started, libvirt deleted (I believe). Here are the most recent diagnostics.

     

    Here is the most recent screenshot of the Main tab, for when I attempt to fix the array.

     

    And my normal SMB shares have shown themselves on Disk1 again. No more Linux file system.

     

    waffle-diagnostics-20230305-1047 - After libvirt deletion.zip

    image.png

    image.thumb.png.b922d8981214a6190e083b87e210d02c.png

  15. 6 hours ago, JorgeB said:

    The service must be stopped for the delete option to appear, edit /boot/config/domain.cfg and change SERVICE="enable" to "disable" then reboot.

    Looks like the file was properly modified by the webUI, but Unraid failed to properly shut down the VM service. After a restart I can now modify the libvirt storage location path. Do I need the array running to see the option to delete libvirt.img?

     

    If the SATA controller passthrough was the culprit, then with the VM manager disabled I should in theory be able to start the array and repair it without issue, correct? Even without removing libvirt.img?

     

    There is also one more thing I want to point out. Currently, if I look at the contents of Disk1, I do not see my usual SMB shares; instead I see a Linux file system. If I navigate to /mnt/ I can see Disk2, Disk3, Disk4, etc., but no Disk1. I just want to put this info out there before any more rebuilds or parity checks are performed.

    image.thumb.png.1d486a9ab14c089543f11f212c340d13.png
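
    For reference, my understanding is the edit from the quote can also be made from the Unraid terminal; a minimal sketch, assuming the stock domain.cfg layout:

    # Disable the VM service on the flash drive, then reboot for it to take effect
    sed -i 's/^SERVICE="enable"/SERVICE="disable"/' /boot/config/domain.cfg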

  16. 10 minutes ago, JorgeB said:

    That's strange, there's a Windows 10 VM on the diags, but yeah, disable the VM manager, you will then have the option to delete libvirt.img, delete it and you can then re-enable the VM manager.

    I have tried to disable the VM manager. It still says "running" in the top right, but the VMs tab is gone and "Enable VMs" is set to 'No'. Unraid did hang on the loading icon for a few minutes; I refreshed the page to regain control of the webUI.

    How should I go about removing the libvirt.img file? I assume I would see a button next to the path location in the settings page.

     

    This is the most recent line in the log:
     

    Quote

    Mar  4 05:18:46 waffle  ool www[2694]: /usr/local/emhttp/plugins/dynamix/scripts/emcmd 'cmdStatus=Apply'

    image.thumb.png.11f7e84ad7b5b6f6ab9e55c472d9099d.png

  17. 1 hour ago, JorgeB said:

    This SATA controller is being passed-through to the VM, so when it starts Unraid will lose all connected disks, just edit the VM and correct that.

     

    08:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
        Subsystem: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901]
        Kernel driver in use: ahci
        Kernel modules: ahci

    My VMs list is currently empty; I should have a Windows 10 VM there. Any suggestions on how to accomplish what you recommended above?

    I feel like I've done enough potential damage. I'd like to play it slow and safe and see what the community suggests first.

    I assume I could fix this problem by disabling the VM manager?

  18. Unraid 6.11.5

     

    I have something weird happening with my array. I will try to describe the series of events as best I can.

    TL;DR: I tried to bring my Disk1 online. It and the parity disk took turns being offline. After 3 or 4 parity syncs and disk rebuilds, the parity disk will not come online after a parity sync. Unraid has multiple notices saying Disk1 and Disk2 can't be written to, Disk1 has read errors, and the parity disk is disabled.

     

    Before I began my upgrade I had 7 disks total, with 1 being parity, plus 1 NVMe cache drive.

     

    I had planned to install a 2nd cache drive and assign it to the same pool.

    I install the 2nd cache drive. On boot-up it asks me to assign it btrfs. I click yes, and that brings my 1st cache drive offline because it was formatted XFS. At the same time my Disk1 showed it was offline. I restarted Unraid multiple times and tried different cables and SATA ports. I thought the disk being offline meant that it was not recognized by the OS/BIOS; I learned that was not the case and that the disk had to be rebuilt from the parity data. I take the array offline, remove the disk, start the array, stop the array, add the disk back, and the rebuild begins. After the rebuild Unraid says all disks + parity are online. I remove the 2nd cache slot, set the cache pool back to XFS, and assign my cache drive back to the pool. I move the data off the cache drive by invoking the mover and setting all of my "prefer cache" shares to "yes cache". All the data moves successfully. As a precaution I also copy the appdata folder contents to my main PC via SMB.

     

    I notice that my VM list and Docker list are empty, so I restart Unraid. I should mention that at this point, and during the earlier restarts, Unraid was not able to shut down properly: it would either hang trying to stop the array and do absolutely nothing for 30 minutes or more, or it would spit out IO errors on the local console. I didn't think anything of it. Three times now on boot-up my computer would not recognize any bootable devices, including the Unraid USB, except for the 2nd cache drive, which had a Windows install. After a restart Unraid would boot.

     

    After actually booting into Unraid, if I had rebuilt Disk1 the parity disk would be offline, and I would take the array down and back up to get it to parity-sync. If I had done a parity sync before a restart, Disk1 would be offline and I would take the array offline and back online to rebuild it. Now, after each restart and parity sync, the parity disk remains offline, even before the restart and the "successful" parity sync. Also, as of now there is no longer a parity-sync button; it has been replaced with a read-check button.

     

    I'm sure some info has been left out and that certain things aren't very clear. Please ask me questions and point me in the correct direction to recover my array. My only hypothesis is that I somehow swapped my parity and disk1 positions.

     

    I have attached some diagnostic dumps, and here are screenshots of my current webUI. I also have a flash backup of Unraid from 02-15-2023; I believe it is version 6.9.5 in that backup?

     

    Thanks in advance for all that y'all do here.

    image.thumb.png.d0ccf09d877d1bfc32e49fbda4c0cef1.png

     

    image.thumb.png.e2856e8b48b502359a7f68b07cb33822.png

     

    image.thumb.png.e80af4d7e4472f6271cd96417c40855b.png

     

    waffle-diagnostics-20230303-1230.zip waffle-diagnostics-20230303-1932 after parity sync.zip waffle-diagnostics-20230303-2004 after restart to normal OS mode.zip

  19. 2 hours ago, SimonF said:

    I think the issue is the size of the memory on the card.

    The card is 24 GB. I didn't think address size would be an issue; I still don't really know how all that works. When I was searching for Code 12 errors I didn't see any direct correlation to address size, so I pushed those results aside. It didn't help that most results weren't for my use case anyway.

     

    Again, thanks a bunch.

  20. 7 hours ago, SimonF said:

    Not sure if this is related. It's the QEMU options at the bottom, to increase to 64 from the default 32.

     

     

    Holy smokes, it worked. I'll begin testing the actual functions of the card, but for now Windows is no longer complaining and GPU-Z/Afterburner have recognized it. Even the Nvidia Control Panel is working.

     

    How did you end up finding that post? I didn't think a card from 2016 would have Resizable BAR, but my GPU-Z says it does.

     

    Thanks so much!

  21. I recently bought a used Tesla P40 for AI work with Stable Diffusion. I planned to run it in my Windows 10 Pro VM. However, after installing the Tesla P40 data center drivers from Nvidia, Windows recognizes the card but displays a Code 12 error saying it cannot find enough free resources.

     

    I am completely stumped at this point. The GPU works perfectly fine when running bare metal, but as soon as it's passed through to a VM it doesn't function. I tried Linux Mint at one point; the drivers installed successfully (as far as I could tell), but Mint would report that it couldn't detect an appropriate GPU.

     

    Any help would be appreciated.

    If you have questions let me know.

     

    image.png.81468ce85339d02a616c98218d373e57.png

    image.png.e3630fb6bbb306d51cb3af3fc84c97ff.png

     

    Here is my VM setup and the XML

    Spoiler

    <?xml version='1.0' encoding='UTF-8'?>
    <domain type='kvm'>
      <name>Windows 10</name>
      <uuid>b9a3338e-5c66-c478-c4d0-bd9ea543a7c9</uuid>
      <metadata>
        <vmtemplate xmlns="unraid" name="Windows 10" icon="windows.png" os="windows10"/>
      </metadata>
      <memory unit='KiB'>33554432</memory>
      <currentMemory unit='KiB'>33554432</currentMemory>
      <memoryBacking>
        <nosharepages/>
      </memoryBacking>
      <vcpu placement='static'>30</vcpu>
      <cputune>
        <vcpupin vcpu='0' cpuset='1'/>
        <vcpupin vcpu='1' cpuset='17'/>
        <vcpupin vcpu='2' cpuset='2'/>
        <vcpupin vcpu='3' cpuset='18'/>
        <vcpupin vcpu='4' cpuset='3'/>
        <vcpupin vcpu='5' cpuset='19'/>
        <vcpupin vcpu='6' cpuset='4'/>
        <vcpupin vcpu='7' cpuset='20'/>
        <vcpupin vcpu='8' cpuset='5'/>
        <vcpupin vcpu='9' cpuset='21'/>
        <vcpupin vcpu='10' cpuset='6'/>
        <vcpupin vcpu='11' cpuset='22'/>
        <vcpupin vcpu='12' cpuset='7'/>
        <vcpupin vcpu='13' cpuset='23'/>
        <vcpupin vcpu='14' cpuset='8'/>
        <vcpupin vcpu='15' cpuset='24'/>
        <vcpupin vcpu='16' cpuset='9'/>
        <vcpupin vcpu='17' cpuset='25'/>
        <vcpupin vcpu='18' cpuset='10'/>
        <vcpupin vcpu='19' cpuset='26'/>
        <vcpupin vcpu='20' cpuset='11'/>
        <vcpupin vcpu='21' cpuset='27'/>
        <vcpupin vcpu='22' cpuset='12'/>
        <vcpupin vcpu='23' cpuset='28'/>
        <vcpupin vcpu='24' cpuset='13'/>
        <vcpupin vcpu='25' cpuset='29'/>
        <vcpupin vcpu='26' cpuset='14'/>
        <vcpupin vcpu='27' cpuset='30'/>
        <vcpupin vcpu='28' cpuset='15'/>
        <vcpupin vcpu='29' cpuset='31'/>
      </cputune>
      <os>
        <type arch='x86_64' machine='pc-q35-5.1'>hvm</type>
        <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
        <nvram>/etc/libvirt/qemu/nvram/b9a3338e-5c66-c478-c4d0-bd9ea543a7c9_VARS-pure-efi.fd</nvram>
      </os>
      <features>
        <acpi/>
        <apic/>
        <hyperv>
          <relaxed state='on'/>
          <vapic state='on'/>
          <spinlocks state='on' retries='8191'/>
          <vendor_id state='on' value='none'/>
        </hyperv>
      </features>
      <cpu mode='host-passthrough' check='none' migratable='on'>
        <topology sockets='1' dies='1' cores='15' threads='2'/>
        <cache mode='passthrough'/>
        <feature policy='require' name='topoext'/>
      </cpu>
      <clock offset='localtime'>
        <timer name='hypervclock' present='yes'/>
        <timer name='hpet' present='no'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>restart</on_crash>
      <devices>
        <emulator>/usr/local/sbin/qemu</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='raw' cache='writeback'/>
          <source file='/mnt/user/domains/Windows 10/vdisk1.img'/>
          <target dev='hdc' bus='virtio'/>
          <boot order='1'/>
          <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
        </disk>
        <disk type='file' device='cdrom'>
          <driver name='qemu' type='raw'/>
          <source file='/mnt/user/isos/Windows 10/Windows 10 64bit.iso'/>
          <target dev='hda' bus='sata'/>
          <readonly/>
          <boot order='2'/>
          <address type='drive' controller='0' bus='0' target='0' unit='0'/>
        </disk>
        <disk type='file' device='cdrom'>
          <driver name='qemu' type='raw'/>
          <source file='/mnt/user/isos/virtio-win-0.1.229.iso'/>
          <target dev='hdb' bus='sata'/>
          <readonly/>
          <address type='drive' controller='0' bus='0' target='0' unit='1'/>
        </disk>
        <controller type='usb' index='0' model='ich9-ehci1'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci1'>
          <master startport='0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci2'>
          <master startport='2'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
        </controller>
        <controller type='usb' index='0' model='ich9-uhci3'>
          <master startport='4'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
        </controller>
        <controller type='pci' index='0' model='pcie-root'/>
        <controller type='pci' index='1' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='1' port='0x10'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
        </controller>
        <controller type='pci' index='2' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='2' port='0x11'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
        </controller>
        <controller type='pci' index='3' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='3' port='0x12'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
        </controller>
        <controller type='pci' index='4' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='4' port='0x13'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
        </controller>
        <controller type='pci' index='5' model='pcie-root-port'>
          <model name='pcie-root-port'/>
          <target chassis='5' port='0x14'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
        </controller>
        <controller type='virtio-serial' index='0'>
          <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
        </controller>
        <controller type='sata' index='0'>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
        </controller>
        <interface type='bridge'>
          <mac address='52:54:00:0c:70:42'/>
          <source bridge='br0'/>
          <model type='virtio-net'/>
          <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
        </interface>
        <serial type='pty'>
          <target type='isa-serial' port='0'>
            <model name='isa-serial'/>
          </target>
        </serial>
        <console type='pty'>
          <target type='serial' port='0'/>
        </console>
        <channel type='unix'>
          <target type='virtio' name='org.qemu.guest_agent.0'/>
          <address type='virtio-serial' controller='0' bus='0' port='1'/>
        </channel>
        <input type='tablet' bus='usb'>
          <address type='usb' bus='0' port='2'/>
        </input>
        <input type='mouse' bus='ps2'/>
        <input type='keyboard' bus='ps2'/>
        <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' keymap='en-us'>
          <listen type='address' address='0.0.0.0'/>
        </graphics>
        <video>
          <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
        </video>
        <hostdev mode='subsystem' type='pci' managed='yes'>
          <driver name='vfio'/>
          <source>
            <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
          </source>
          <rom file='/mnt/user/isos/GPU ROM/NVIDIA.TeslaP40.24576.161020.rom'/>
          <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
        </hostdev>
        <hostdev mode='subsystem' type='usb' managed='no'>
          <source>
            <vendor id='0x1b1c'/>
            <product id='0x0c2a'/>
          </source>
          <address type='usb' bus='0' port='1'/>
        </hostdev>
        <memballoon model='none'/>
      </devices>
    </domain>


    image.thumb.png.42d8696bc17be6a8ba784edb30711827.png

     

  22. unRAID version 6.9.2

     

    I am looking at getting a datacenter GPU for AI image training on the cheap, much cheaper than what is normally available as a regular graphics card. It has no fan or fan headers on the board, so I plan to 3D print a duct, attach my own fan, and connect it to my motherboard's case-fan header. I have the Dynamix AutoFan and temperature plugins, but there doesn't seem to be any option to reference a specific temperature; I just see mainboard or CPU, and AutoFan only lists options for connected drives.

     

    Is there a plugin or some software I could run at unRAID's level to control a case fan determined by the temperature of a specific probe, like the GPU?

     

    The only other solutions I can think of are: a) an external fan controller with an automatic fan curve and a temp probe stuffed into the GPU, or b) a USB fan controller, like a Corsair Commander, with the USB device passed to a Windows VM.

    The external fan controllers I can find on Amazon don't perform great; users complain of not being able to adjust the fan curve, or that the curve is wrong for their use case. And for my Windows VM, I am currently not able to pass through fan speed and temperature info for the software without that USB controller.

     

    And of course the last resort is to just power the fan at 100%. But this server is in a public space in the house (I have no server room near a LAN jack), so I've only been able to keep it around because of how little noise it makes.
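
    The closest thing I can picture at unRAID's level is a small script run on a schedule (e.g. via the User Scripts plugin). A rough sketch, assuming the card is visible to the host through nvidia-smi (so not passed through to a VM at the time) and that the case fan's PWM control is exposed under /sys/class/hwmon; the hwmon path below is hypothetical and would have to be confirmed for my board first:

    #!/bin/bash
    # Crude GPU-temperature fan curve for a case fan ducted onto the GPU.
    PWM=/sys/class/hwmon/hwmon2/pwm3   # hypothetical path; find the real one with: ls /sys/class/hwmon/*/pwm*

    # Read the GPU core temperature in degrees C
    TEMP=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits)

    # Map temperature to a PWM duty cycle (0-255)
    if [ "$TEMP" -ge 80 ]; then
      DUTY=255      # full speed
    elif [ "$TEMP" -ge 65 ]; then
      DUTY=180
    elif [ "$TEMP" -ge 50 ]; then
      DUTY=120
    else
      DUTY=70       # quiet idle
    fi

    echo 1 > "${PWM}_enable"   # switch the fan header to manual PWM control
    echo "$DUTY" > "$PWM"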
