
Snubbers

Members
  • Content Count: 26
  • Joined
  • Last visited

Community Reputation

3 Neutral

About Snubbers

  • Rank: Member


  1. Another happy user: I just picked up the RM850i second hand and negotiated a bit off the price on the assumption that the 'i' part would be useless on unRAID. Imagine my surprise when I came across this in the app store, plugged in the USB cable internally, and it just worked! The only stat I was slightly intrigued by was the overall power draw; with 5 HDDs, 2 NVMe drives, 3 SSDs and a GTX 1660 Super, this is about what I'd expect when not doing too much.
  2. Nice! OT: my 1660 Super keeps dropping off the bus within a few hours of rebooting. I've written a user script, scheduled every 10 minutes, that just runs nvidia-smi and writes a log entry saying whether the GPU is present, so I can see exactly when it drops off the bus. Currently you only find out if you have GPU Stats installed: when you log in to the web UI it fires up nvidia-smi, and that's when you realise. Since I log in fairly infrequently, I don't know if there is a pattern to the GPU disappearing or if it's truly random.
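A minimal sketch of that kind of watchdog, assuming the stock nvidia-smi binary; the log path and message format are placeholders of my own choosing:

```shell
#!/bin/bash
# GPU presence watchdog -- schedule every 10 minutes via the
# User Scripts plugin. Log path is an arbitrary choice.
LOG=/tmp/gpu-watchdog.log

# Format one timestamped log line; split out as a function so the
# format stays consistent between the two branches.
log_state() {
    printf '%s GPU %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1"
}

if nvidia-smi >/dev/null 2>&1; then
    log_state "present" >> "$LOG"
else
    log_state "MISSING (fell off the bus?)" >> "$LOG"
fi
```

Grepping the log for the first "MISSING" entry after each reboot then shows whether the drop-offs cluster at a particular uptime or are random.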
  3. Or maybe: "Hi Gee1, this thread is for specific issues with the unRAID Nvidia plugin. To request changes to a Docker, see the support thread for that Docker (it should be in the following sub-forum: https://forums.unraid.net/forum/47-docker-containers/ ; if not, the app store entry for the Docker should link to its support thread)."
  4. It looks correct to me. You will notice there are 2 bar graphs per line, representing two cores: CPUx means it thinks that's a full core, HTx means it thinks that core is a hyper-threaded (or equivalent, partial) core linked to the first one, i.e. CPU0 - HT1 [ ]0% [ ]0%. So with 2 'cores' represented per line, that is 64 cores shown overall, which gives the correct total. Why it thinks half the cores are hyper-threaded might be a foible, or it might be that generically the Opterons have some architecture that does share resources between cores, so it's probably best classified as hyper-threaded. [edit] Ah, OK, the Opteron 6380 is based on Piledriver cores, an improved version of the Bulldozer cores, which do share some resources between two threads; in that case the "HT" naming of alternate CPUs is a bit of a generalisation, but warranted.
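You can check how the kernel itself pairs the threads; a quick sketch reading the standard sysfs topology files (the sysfs paths are stock Linux; the classifying function is my own):

```shell
#!/bin/bash
# Show which logical CPUs the kernel considers siblings of the same
# physical core -- shared-resource pairs show up in the same list.
siblings_of() {
    cat "/sys/devices/system/cpu/cpu$1/topology/thread_siblings_list"
}

# A siblings list like "0-1" or "0,32" means the core is shared
# between two threads; a bare "5" means a full, unshared core.
is_shared() {
    case "$1" in
        *[-,]*) echo "shared" ;;
        *)      echo "full core" ;;
    esac
}

if [ -r /sys/devices/system/cpu/cpu0/topology/thread_siblings_list ]; then
    is_shared "$(siblings_of 0)"
fi
```

On a Piledriver box each pair of module-sharing cores should show up as siblings, which is why the dashboard's "HT" labelling, while a generalisation, matches the kernel's view.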
  5. Just to update my own 'issue' and say I think I've tentatively found a solution after a bit more problem solving.
     Issue 1: Multiple log entries stating "kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]" and "kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs". This correlated with nvidia-smi being called (either manually or by the GPU Stats plugin).
     Solution: Moving the card to a different PCIe slot (I have two x16 slots, although not x16 if you use both) has stopped this error from appearing.
     Issue 2: The GPU dropping off the bus randomly and then being held in reset (fans at full speed), with nvidia-smi reporting a lost GPU.
     Solution, after trying many things:
     - Moving slots: this fixed issue 1 but had no effect on this issue.
     - Passing through to a Win 10 VM: could not get it to work (error code 43) despite the correct VBIOS and stubbing the additional 2 devices (the 1660 Super appears as 4 devices). However, I believe this also had issues with the GPU getting 'lost', as the VM could randomly fail to start and I'd need to reboot.
     - Native booting to Win 10 worked fine, no issues (I had to remove the HBA card to ensure no changes to my array could occur).
     - New BIOS revision: made no difference.
     - Changing power supplies: made no difference.
     - Finally (and why I didn't try this earlier) I tried to run memtest from the unRAID boot menu, which would just reset the PC and never load memtest. I found out that you can't run memtest if booting via UEFI, so I disabled that in the BIOS. Memory passed 24 hours of testing. I then remembered reports that VM passthrough could be problematic with UEFI enabled, so I kept it disabled and booted unRAID in legacy mode, and it's been 3+ days now without the GPU falling over.
     I am a bit tentative; however, UEFI being flaky with GPUs for VMs and also affecting the Linux drivers is plausible, so I'll update again if it makes it to a week. I'm also running GPU Stats again (still no multiple-BAR reports).
     TL;DR: I think/hope I've fixed my problem and just want to share in case it helps anyone else; disabling UEFI seems to have got me a nice stable system. Only the Plex issue of not managing the power modes correctly remains, which is definitely not an issue with this plugin!
  6. Just a quick update on my 1660 Super getting 'lost'. I've now (I think) ruled out the HW:
     - Booting directly into a native Windows 10 install (I removed the HBA controller and the unRAID USB stick and used a spare 120GB SSD with a fresh Win 10 install), I ran GPU tools for 3 days solid with no issues seen.
     - I then booted unRAID (non-Nvidia) and passed the card through as the primary GPU to a Win 10 VM, and that ran diagnostics for 2.5 days with no issues.
     All I can think is that the 440.59 Linux drivers don't sit nicely with my Asus 1660 Super OC Phoenix GPU / Ryzen 3600 / Asus B450F motherboard. I guess in the spirit of this plugin all I can do now is not use the GPU until the next unRAID build is released, and hopefully the plugin can then be updated to the latest drivers. I appreciate LinuxServer are busy, so I can't expect anything more than they've stated. I have popped a +1 post in the feature request for native unRAID support for Nvidia drivers. I'll focus on fixing the one or two niggles I've not yet resolved and sit patiently with fingers crossed!
  7. Just to add another hand in the air from someone who would love not just baked-in Nvidia GPU drivers, but also the ability to update said drivers if required. I absolutely love unRAID now, I'm a bit of a convert, and love the support. I have no expectation that people should support plugins etc., so I will not be complaining that limetech haven't 'fixed' my issues yet or anything; I just wanted to say that this feature (presuming some ability to upgrade Nvidia drivers) would be very much appreciated, especially as it might well solve one of the last niggles I have (my 1660 Super falling off the PCI bus randomly every day or so). I like the talk in the thread about making it optional; we all use our servers for our own reasons that are most important to us. I also think (as someone who has been developing all manner of software and dealing with OS issues for 30 years) that opening up things like driver support for GPUs is a bit of a minefield (however, my experience of unRAID and HBA controllers has shown it's pretty good at this!). Anyway, this is just a +1 to the OP's request.
  8. I have quite a collection of USB memory sticks, so when choosing which one to use for unRAID I tried a few, across USB 2.0, USB 3.0 and USB 3.1. By far the most concerning was the SanDisk Ultra Fit 3.1: just sitting idle in a USB slot it gets inordinately hot, and I worried about its longevity, so I tried all my options and found the SanDisk Ultra USB 3.0 16GB/32GB sticks to be performant with no heat issues. As mentioned, the only theoretical advantage of a faster USB drive is when unRAID boots: it loads everything into RAM, so you might see a marginal improvement in boot speed with a faster stick, but once booted the USB stick is unused, and it's down to your CPU and RAM how fast unRAID runs. Dockers, VMs etc. will also depend on the speed of the cache drive (if using one) and how much they access the main array, but again, nothing there depends on the unRAID USB stick's speed.
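If you want to compare sticks before committing, a rough sequential-read check is enough, since boot time is the only thing the stick's speed affects. A sketch (the device path is a placeholder; it falls back to a dummy file so it's safe to dry-run without real hardware):

```shell
#!/bin/bash
# Rough sequential-read benchmark of a candidate boot stick.
# Pass your actual flash device (e.g. /dev/sdX) as the first
# argument; the dummy-file fallback is only for dry-running.
DEV="${1:-/tmp/stick-test.img}"

if [ ! -e "$DEV" ]; then
    # No device given: create a 32MB dummy file to read instead.
    dd if=/dev/zero of="$DEV" bs=1M count=32 2>/dev/null
fi

# dd's final status line reports throughput, e.g. "... copied, 0.21 s, 160 MB/s"
dd if="$DEV" of=/dev/null bs=1M 2>&1 | tail -n 1
```

Run it against each candidate stick; whichever reads fastest boots fastest, and beyond that the numbers don't matter for day-to-day use.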
  9. I run a 3900X / X570 setup for my main gaming PC and a 3600 / B450 for unRAID. PSU shouldn't hugely matter:
     - the 3900X (stock) + motherboard should be < 200W (much less when idling)
     - 8 HDDs (2A each at spin-up) would be ~200W (much less when idling)
     Assuming SSDs and other items, you'd probably be OK with a 650W. However, if it were me, I'd go for an 850W, so that adding a GPU, overclocking and more HDDs wouldn't need a PSU upgrade. I can say that my 3900X on an EVGA Bronze 650 does not overclock that well; it's not the wattage, I think it's the PSU's ability to handle sudden changes in power demand, as my Seasonic 850W Focus Platinum allows higher overclocks. I was fortunate to also pick up a cheap Corsair HX1200i (a mining-rig spare someone was offloading), and while that is obviously overkill, like the Seasonic I get the same high overclocks for my 3900X. RAM: aside from ECC/non-ECC, the only advice I would give is that Ryzen 3rd gen supports higher memory speeds and they can make a performance difference. I have 32GB of DDR4 3600 C18 in my unRAID box (Corsair, as it is good VFM). I am not running ECC (I have on previous servers) but it's a consideration, as your motherboard does support unbuffered ECC.
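The back-of-envelope sizing above works out like this; the figures are the rough estimates from the post, and the SSD/fan allowance and headroom percentage are my own assumptions:

```shell
#!/bin/bash
# Rough PSU sizing from the post's estimates.
CPU_MB_W=200        # 3900X stock + motherboard, worst case
HDD_SPINUP_W=200    # 8 HDDs at ~2A (~24W) each during spin-up
MISC_W=50           # assumed allowance: SSDs, fans, USB devices

PEAK_W=$((CPU_MB_W + HDD_SPINUP_W + MISC_W))
echo "Estimated peak draw: ${PEAK_W}W"

# ~30% headroom covers transient spikes and keeps the PSU in its
# efficient band -- which is why 650W is "probably OK" here.
echo "Suggested minimum PSU: $((PEAK_W * 130 / 100))W"
```

That lands well under 650W even at worst-case spin-up, with the 850W recommendation being about future headroom rather than current need.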
  10. Thanks for the help. I uninstalled the GPU Stats plugin and rebooted. The issue happened again within 10 minutes (checking with nvidia-smi I get the same GPU Lost message). I rebooted, and it maybe lasted 4 hours or so before happening again. I've used the nvidia-bug-report.sh that is mentioned when nvidia-smi loses the GPU and also carefully checked the syslog:
     1. Despite my GTX 1660 Super being on the 440.59 supported list (checked on nvidia.com), the nvidia-bug-report.log file states "WARNING: You do not appear to have an NVIDIA GPU supported by the 440.59 NVIDIA Linux graphics driver installed in this system".
     2. Trawling through various logs, I found the error code XID 79 just before the GPU went missing on one occasion. On the Nvidia developer site this can unfortunately be attributable to pretty much anything: HW error, driver error, temperature etc.
     3. I've been checking the temperatures / HW state of the card. After boot it's in P0 (12W out of 125W) @ 33C; it then occasionally bumps up to P0 (26W/125W) @ 44C. So even when Plex uses the card, 44C is barely ticking over; I'm pretty sure it's not temperature.
     4. I think (looking at logs) there could possibly be some correlation between drives spinning down and the GPU crashing (or it may well be coincidence). I would like to try bulk spinning the drives down/up to see if power spikes might be upsetting the GPU, as I know HDDs draw the most power when spinning up. [edit] I found example user scripts to spin down/up all disks and tried those several times, both while the GPU was idle and while transcoding a 4K HDR; no issues found.
     5. I did at some point (more of a quick trial) have some user scripts to 'tweak the driver for obvious reasons' and also to bump the card back to its lowest power setting. I haven't had these enabled for some time, so I've deleted the scripts entirely and re-installed unRAID Nvidia 6.8.3 from the plugin just to 'clear' things out.
     6. With 100% repeatability, I can trigger the "caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs" message and the associated memory-spanning message just by running nvidia-smi to check the GPU is still there; I did this every 5-10 minutes over lunch and every time I got an associated entry in syslog.
     So nothing conclusive yet; some observations, some clutching at straws, but I sense some experimentation and discussion might prompt something of note. One test I 'may' do is to go back to the normal unRAID build, pass the GPU through to my Windows 10 VM (it's only spun up once in a blue moon) and run something GPU-intensive on it to see if it ever loses the GPU. Whilst this changes a few too many variables at once, it would at least indicate the HW itself is OK (power/temperature concerns etc.).
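For tracking the power state and temperature between crashes, something like this could be scheduled alongside the checks above. A sketch: the query fields are standard nvidia-smi ones, the log path is my own choice, and it degrades gracefully if the driver isn't loaded:

```shell
#!/bin/bash
# One CSV row of GPU telemetry per run -- schedule every few
# minutes and correlate the timestamps with the syslog errors.
# Log path is an arbitrary choice.
LOG=/tmp/gpu-telemetry.csv

if command -v nvidia-smi >/dev/null 2>&1; then
    # pstate, power draw and temperature are standard query fields
    nvidia-smi --query-gpu=timestamp,pstate,power.draw,temperature.gpu \
               --format=csv,noheader >> "$LOG" 2>&1 \
        || echo "$(date '+%F %T') GPU query FAILED (lost?)" >> "$LOG"
else
    echo "$(date '+%F %T') nvidia-smi not available" >> "$LOG"
fi
tail -n 1 "$LOG"
```

The "FAILED" rows then give an exact timestamp for each loss event, which makes the spin-down-correlation theory easy to check against the disk logs.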
  11. That's a very good observation, I'll uninstall that plugin, reboot to ensure it's 'clean' and then report back!
  12. Any ideas how to start debugging an issue I'm having where the GPU just disappears?
     Basic scenario:
     - LinuxServer.io Plex using GPU HW encoding
     - unRAID Nvidia plugin (for 6.8.3)
     - Brand new Asus 1660 Super (power LED is white, indicating all is well)
     - GPU Statistics plugin
     Initially I had the GPU set up and encoding in Plex within minutes, having followed all the nice guides on here. The issue is that every day or so the GPU will just disappear (the web UI Dashboard GPU Stats has no numbers, just '/' against each stat). Running nvidia-smi in a terminal gives me: "Unable to determine the device handle for GPU 0000:09:00.0: Unknown Error". The GPU itself has the fans at max as if it's crashed, and I have to reboot the system, after which it works again for a day or so. I was checking remotely this morning, and GPU Stats was showing sensible numbers until around 10:30, but as can be seen in the syslog, it gets spammed before/after this time with:
     Apr 27 10:50:10 DIG-NAS-UR001 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
     Apr 27 10:50:10 DIG-NAS-UR001 kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs
     (hundreds of entries), and once the card has disappeared, I get a small number of entries like:
     Apr 27 14:21:17 DIG-NAS-UR001 kernel: NVRM: GPU 0000:09:00.0: request_irq() failed (-22)
     I just don't know where to start. I have grabbed the diagnostics in case they're useful (will upload), but I'd just like advice on where to start / whether anyone can help. The GPU with Plex is working fantastically (when it works); I can transcode my recently ripped UHD movies and HW encode to 1080p/720p with no issues, so I'd love to get this working 'full time'. Thanks for any help!
     edit - just to confirm, a remote restart of unRAID gets it going again.
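A quick way to get a handle on those entries is to count the BAR-mapping spam and pull out the first hard NVRM failure. A sketch; the path is the conventional syslog location, so adjust it to wherever your unRAID box keeps the log:

```shell
#!/bin/bash
# Syslog triage: how much BAR-mapping spam is there, and when did
# the first hard NVRM error appear?
SYSLOG="${SYSLOG:-/var/log/syslog}"

if [ -f "$SYSLOG" ]; then
    echo "BAR spam entries: $(grep -c 'mapping multiple BARs' "$SYSLOG")"
    grep -m 1 'NVRM: GPU' "$SYSLOG" || echo "no NVRM errors logged yet"
fi
```

Comparing the timestamp of the first NVRM line with the surrounding entries is what surfaces clues like the XID codes mentioned later in the thread.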
  13. Thanks for the reply! In this day and age of security I'd say it's becoming essential if you do expose services, by:
     - adding a layer of anonymity: anyone snooping around won't know which service you are proxying to; all they will know, if they fail the access-list authentication, is that you are running nginx.
     - not relying solely on each service's own defences: by directly exposing a service I am relying on the robustness of its individual authentication, and this ties in with the previous point of hiding the service as much as possible.
     My setup (in case it helps in any way!):
     It's set up using br0 (so its own IP address) with the default 8080/4443 ports. My DNS record is a subdomain CNAME pointing to a Dynamic DNS address that points to my WAN IP. My proxy host in NPM is set as follows (private info removed):
     - Domain Name: subdomain.mydomain.com
     - Scheme: http
     - Forward Hostname/IP: NginxProxyManager (I'm using the container name, but tried the IP as well with the same issue)
     - Forward Port: 8181
     - Cache Assets: Off
     - Block Common Exploits: On
     - Websockets Support: Off
     - Access List: "Home" (a list called Home with a single user, 'admin')
     - Custom Locations: None
     - SSL: Custom (1and1 wildcard cert for my domain)
     - Force SSL: On
     - HSTS Enabled: Off
     - HTTP/2 Support: Off
     - HSTS Subdomains: N/A
     - Advanced: Empty
     It may well be an issue with NPM itself?
  14. This may well be the stupidest idea ever, and feel free to laugh... I have added a Proxy Host to effectively reverse proxy to NPM's (Nginx Proxy Manager's) own web UI. I wondered if it would blow up, but that part works well: I can access the proxy manager externally (using a sub-domain) with SSL. What doesn't work is when I add an 'Access List' to the Proxy Host config. I do this for my other proxy hosts to my other Dockers; it gives a first layer of authentication independent of the target Docker, which makes me sleep better! When I say it doesn't work, I mean: when you first access the URI externally you get the authentication dialog from the access list, but entering the correct credentials just pops up the same authentication dialog again, and I can't get to the NPM login page. Not sure if I'm being stupid here; it feels wrong proxying to itself, but the web UI is on port 8081, and the proxying is over 8080/4443 (the defaults).
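One way to see which layer keeps rejecting the credentials is to send the Basic auth header by hand and inspect the response headers. A sketch, with the host and credentials as placeholders (the curl call is commented out since it needs your real endpoint):

```shell
#!/bin/bash
# Hypothetical host and credentials -- substitute your own.
HOST="subdomain.mydomain.com"
USER="admin"
PASS="secret"

# Build the Authorization header the access list's HTTP Basic
# challenge expects: "Basic base64(user:pass)".
basic_auth() {
    printf 'Authorization: Basic %s' "$(printf '%s:%s' "$1" "$2" | base64)"
}

# Whether a 401 still comes back, and the realm in WWW-Authenticate,
# tell you whether the proxy layer or the app behind it is answering:
# curl -skI -H "$(basic_auth "$USER" "$PASS")" "https://$HOST/" \
#     | grep -iE '^(HTTP/|WWW-Authenticate)'
basic_auth "$USER" "$PASS"
```

If a 401 with an nginx realm still comes back after sending correct credentials, the access-list layer itself is re-challenging rather than passing the request through to NPM's login page.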