
Wingede


Posts posted by Wingede

  1. 4 minutes ago, ich777 said:

    Then it seems that something is wrong with the BIOS if it doesn't boot with Above 4G Decoding turned on, because that's exactly what 4G Decoding is for...

     

    Admittedly I'd never had that setting enabled until you mentioned it.  I did perform a BIOS update from F2 to F5 as part of this troubleshooting.  Enabling the setting exposed a Re-Size BAR option, which I left alone, but the system didn't boot.  I had no video output from the iGPU, and I've left my display dongles at work (which I can't get to due to lockdown) in case the video output somehow switched to the cards.  I left the system powered on to see if Unraid would come up, but after 5 minutes there was still no ping response.

     

    I will update my ticket with Gigabyte to see if they can share anything.  6.9.2 seems fine at the moment.

     

     

  2. 13 hours ago, ich777 said:

    Can you maybe try to disable Above 4G Decoding and see if it works?

     

    I can only think of a memory allocation issue with the newer kernel.

     

    If you got time please give me the output from:

    dmesg | grep "root bus resource"

    Once with 4G Decoding enabled and once with it disabled, please?

     

    My system doesn't boot when 4G Decoding is enabled, and I have to reset the BIOS to get things working again.

     

    I did the update from 6.9.2 to 6.10-rc2 and the primary K1200 disappears.  Diagnostics and dmesg output attached.
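
    For comparison, a minimal way to capture both runs (assuming console or SSH access; the output file names are just illustrative):

    # with Above 4G Decoding disabled
    dmesg | grep "root bus resource" > /boot/busres-4g-off.txt
    # after enabling it in the BIOS and rebooting (if the system comes up)
    dmesg | grep "root bus resource" > /boot/busres-4g-on.txt

    With 4G Decoding enabled you would expect an extra 64-bit memory window above the 4 GB boundary in that list; with it disabled, all windows should sit below 4 GB.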

     

     

    busres-4gdisable.png

    av4-diagnostics-20211110-0917.zip

  3. 11 minutes ago, ich777 said:

    May I ask which containers you are using the second card for? Only for transcoding, or for something else too?

     

    On which card does the unRAID console get displayed, or is it displayed on the iGPU?

     

    The unRAID console uses the iGPU; Docker specifically uses the first K1200 for transcodes for Emby or Plex.  The second K1200 doesn't do a heck of a lot, but it is passed through to a Windows VM which I use for different things, including some CAD work when remote or being lazy ;).  I could probably just use the iGPU for Emby transcodes, but I have the cards.


    11 minutes ago, ich777 said:

     

    You can also remove this line from your go file on 6.10.0:

    modprobe i915

     

    I would also recommend that you use the Intel GPU TOP plugin (not the container) to enable the iGPU; this will also allow you to see the usage on the unRAID Dashboard in combination with the GPU Statistics plugin.

     

    Thanks for the advice, will do.
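
    For anyone else following along, a sketch of what that go-file change looks like (the file lives at /boot/config/go on a stock install; the emhttp line is the stock default, the rest is illustrative):

    #!/bin/bash
    # /boot/config/go
    # modprobe i915   <- can be removed on 6.10.0; the Intel GPU TOP plugin loads the driver
    /usr/local/sbin/emhttp &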


    11 minutes ago, ich777 said:

     

    Somebody else also has the exact same issue with two cards and 6.10.0.

     

    I'm really curious to find out what the root of this issue is.

     

    Interesting! I'll make it a priority to update back to rc2 tomorrow and see if it breaks.  It's starting to get a bit late here and I need to wind down, but I will do that tomorrow and post feedback, since you have spent a fair amount of time helping me.

     

     

  4. 1 minute ago, ich777 said:

    Oh, I thought you were using both for Docker containers.

    Have you tried binding the second card to VFIO, or even stubbing it, and passing it through to a VM on 6.10.0?

    What error code do you get in Windows, or wasn't it picked up by the VM at all?

     

    I'm also interested in that; please report back with your findings.

     

    My configuration has always been card 0 for Docker and card 1 for a Windows-based VM.  This ran on 6.9.x for roughly 6-8 weeks before upgrading to 6.10-rc1, then recently rc2.  Same hardware throughout.

     

    Where I mentioned Windows seeing the card but hitting the error 43 issue was when I booted the entire system into a native Windows installation with just a single card in the primary slot.  That was an attempt to work out whether it was Linux/Unraid related, or to narrow down the hardware.  I haven't booted native Windows since.

     

    I will look to update back to rc2 and see what happens.  Perhaps there is some compatibility issue between my motherboard and kernel 5.14.  What I don't understand is that it was running rc2 for at least a couple of days before it all went pear-shaped.  I will report back later in the week, as I need the system running and I'm curious to see whether it is stable for a few days.
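
    On the VFIO question, this is roughly what I understand the 6.9+ binding to look like (the PCI address and vendor:device ID below are illustrative, not taken from my system):

    # find the second card's address and vendor:device ID
    lspci -nn | grep -i nvidia
    # Unraid normally writes this file via Tools > System Devices; an entry has the form:
    # BIND=0000:02:00.0|10de:13bc
    cat /boot/config/vfio-pci.cfg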


  5. Just now, ich777 said:

    Just as a note, it would be easier if you only replace the bz* files on the USB boot device, so you don't have to create a whole new stick and copy everything back.

     

    Can you please share your diagnostics again with everything working?

     

    Thanks for the hint about replacing the bz* files - good to know.
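
    In case it helps anyone else, the in-place update amounts to something like this (assuming the release zip is extracted to /tmp/unraid and the flash drive is mounted at /boot):

    # copy only the kernel/rootfs images onto the flash drive, then reboot
    cp /tmp/unraid/bz* /boot/
    sync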

     

    Diagnostics attached with everything working.  I just fired up one of the VMs using the second card to get some load across both, and they seem fine.  The primary card is used for Docker-based stuff.

     

    Somewhat wondering if I should push the update back to 6.10-rc1/2 and see if it breaks again?

    av4-diagnostics-20211109-1907.zip

  6. 10 hours ago, ich777 said:

    This is completely subjective, I have to say, but I have never had problems with ASUS or MSI.

     

    Maybe it's also some BIOS or hardware compatibility issue...

     

    Something interesting: I had started the RMA process, but decided to go back to Unraid 6.9.2.  I shut the system down, reflashed the boot drive, copied the config across and fired the system up (it had been running all day today with both cards in, but only one properly recognised).  Both cards are now recognised.

     

    My first crash was a couple of weeks ago, but I'm sure I had been running 6.10-rc1 for quite some time before that.

     

     

    bothcards.png

  7. 52 minutes ago, ich777 said:

    I only have a few suggestions that you can try:

    • Enable Above 4G Decoding in the BIOS
    • Change from UEFI boot to Legacy, or vice versa, depending on what mode you are booting with now
    • Manually set the PCIe generation for slot one to Gen3 or Gen2 (this would not make much of a difference in performance for most cards anyway)

     

    I also saw that you are using a Gigabyte motherboard, which I am not a huge fan of, because they have given me trouble in the past too, and I have recently read a few posts about faulty boards, HPET timers that don't work or don't work correctly, various BIOS issues... :/

     

    Tried the above changes, and no difference.  I will go down the RMA path with the supplier again ;(.

     

    What motherboard manufacturer do you recommend (for future reference)?

  8. 50 minutes ago, ich777 said:

     

    My suggestion would be to put the card in another computer and check whether it works there, but keep in mind that you have to install the Nvidia drivers on that other machine: basic display output works without a problem most of the time, and only after installing the driver and putting a 3D load on the card will you see whether it is fully working.

     

    Thanks for going through my logs.  

     

    Whether it's Windows or Unraid, with a card in that primary PCIe slot the OS can always see it, but it isn't functional: Windows shows error 43, and you've seen the picture from Unraid.  I have 3 of these cards and thought it might be a faulty card (as they are getting old).  Moving that initial card to another slot, everything worked fine, but reusing the primary slot I have the issue regardless of which card is in it.

     

    Going down to a single K1200 in the system hasn't made any difference, even with a BIOS update and a reset to defaults.  With that particular card in any other slot, everything works as expected (I tested some transcoding operations).

     

    I think this points to a motherboard issue (this is the second board since purchase; bad luck, as the first one had a faulty LAN controller - well, not faulty, dead is more appropriate).  I just hope I haven't pushed the motherboard past its limits by running two K1200s (I'd think not).

     

    Regards and thanks again for parsing my logs.


    Another update: Windows recognises the card but with error 43; I updated drivers and reflashed the card, and it's still not working.  Putting it in the other slot works, but that doesn't help long term.

     

     

    Update: I noticed it was recognising the card in the second slot... after removing that card and putting it in the primary slot, the Nvidia driver isn't recognising it.  I did a full BIOS reset to defaults; lspci shows the card but the Nvidia driver still doesn't work... I'm thinking it is something to do with the slot itself, since the card is not faulty?

     

    After a random crash today on my Unraid server, I powered back up to find one of my graphics cards not being recognised.  I've had this setup running for several months without any issues whatsoever.

     

    The Nvidia plugin shows only a single card, but lspci and Unraid's System Devices show both; nvidia-smi also shows only one card.  I swapped in a spare card, as I originally thought the card had failed, but that's not the case.  A BIOS update to the latest version as of today brought no success either.

     

    Some hardware background:

    • Motherboard: Gigabyte Z590 UD AC (F2 original BIOS, updated to F5)
    • 2 x Nvidia Quadro K1200s
    • Unraid version: 6.10.0-rc2 (had been working without any issues for several weeks on this version, and on rc1 before that)
    • Nvidia driver version: was running the latest (v495.44; tried downgrading to v470.82.00)

     

    Nvidia plugin output (attached)

    System Tools > System Devices (cards 1 and 2 attached)

     

    I'm not sure what caused the crash; it has happened twice in the last couple of weeks, but this setup has been fine for months.

     

    /proc/(etc) only shows a single card, which matches the Nvidia plugin.
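
    For anyone comparing the two views, this is the sanity check I mean (stock lspci plus the driver's own tool):

    # what the PCI bus sees - both cards should be listed, each with a "Kernel driver in use" line
    lspci -nnk | grep -iA3 nvidia
    # what the Nvidia driver actually initialised - only one card shows up here
    nvidia-smi -L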

     

    Any suggestions would be greatly appreciated.


    nvidia-plugin.png

    card1.png

    card2.png

    Thanks for creating this!

     

    I attempted to migrate from the previous Docker container, expecting my custom nginx configuration to work (it's not overly custom).  One part of it is geoip2, but I am getting an error in the logs saying the geoip directive is not recognised.  I thought they were for the most part using the same source?
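
    A quick way to compare the two images is to list each build's compile-time modules (the container name here is illustrative):

    # print nginx's configure arguments and pull out anything geoip-related
    docker exec <container-name> nginx -V 2>&1 | grep -o 'geoip[^ ]*'

    The legacy geoip directive needs ngx_http_geoip_module, while geoip2 directives need the third-party ngx_http_geoip2_module; if the new image wasn't built with the latter, the directive won't be recognised.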

     

    UPDATE: I decided to look at the Dockerfiles and can see they use different repositories and Docker build files.

  11. Hi team,

     

    Wondering if anyone has successfully compiled beta 35 with the Nvidia and TBS OS drivers.  I have tried several times and can only get beta 29 to compile; beta 30 and beta 35 both have issues with the TBS OS drivers.  I also noticed there are newer drivers on the TBS website.  I tried downloading them and adding them to a custom build, even using the same filenames as the older ones, but it simply doesn't compile for me.
