Snubbers

Everything posted by Snubbers

  1. I've had this happen twice now: effectively only the Unraid web UI becomes unresponsive. I might get the login screen, but loading the dashboard just hangs (all other dockers seem to be running fine and their web UIs are full speed). I've just had to get someone to use the power button to shut the server down, then power it back on. That works, but I assume the issue will come back. I've checked the logs and found they obviously reset at boot, so I've probably lost all pertinent information due to the power cycle. So I'm just waiting for it to happen again, but when it does, what would be the correct way to gather diagnostics if the web UI is unavailable? (I'm just trying to set myself up to start diagnosing the issue if it returns by getting good information to help.)
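     In the meantime, my assumption (untested until it happens again) is that SSH may still respond even when the web UI hangs, so something like this from SSH or the local console should capture the evidence before any power cycle; 'tower' is just a placeholder for the server name:

         # SSH in (or use the local console / IPMI if SSH is also dead)
         ssh root@tower

         # Generate the standard diagnostics zip from the CLI; it should land in /boot/logs on the flash drive
         diagnostics

         # Keep a copy of the live syslog somewhere that survives a reboot too
         cp /var/log/syslog /boot/logs/syslog-$(date +%Y%m%d-%H%M%S).txt

         # And a cleaner stop than the power button, if the box still responds
         powerdown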
  2. I've just installed a trial version of unRAID (alongside my main unRAID server). I've got a USB stick (SanDisk 32GB) for the main array, an NVMe for cache, and an SSD as a second pool. Diskspeed detects all the controllers and disks, showing the USB drives (the main array stick and the boot flash) connected to the USB controller, and there are drive images (close enough for the SanDisk USB sticks). Clicking on the NVMe or SSD brings up the info and benchmark buttons on the right; clicking on a USB drive brings the image of the drive up on the right, but no info or buttons. Is it possible to benchmark USB sticks? My intent is to use this very low power USFF PC as an app server, so I don't need an array, and I followed advice on the forums to drop in a USB stick to get around the array requirement, which works well. But I would love to benchmark the USB sticks as I may use them for backups and other purposes, and wanted to check their performance. Apologies if I'm missing something; I've searched the thread but can't see anything to confirm/deny whether USB drives can be benchmarked.
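     In the meantime I've been assuming a crude read test from the terminal would at least give ballpark numbers for the sticks; the /dev/sdX names below are placeholders, so check lsblk first:

         # Identify which sdX devices are the USB sticks
         lsblk -o NAME,TRAN,SIZE,MODEL

         # Rough buffered sequential read test
         hdparm -t /dev/sdX

         # Or read 256MB straight off the device, bypassing the page cache
         dd if=/dev/sdX of=/dev/null bs=1M count=256 iflag=direct status=progress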
  3. Thanks, I will certainly do that this time! I've just been reading up on reconfiguring the array, but honestly, jumping through the hoop of letting it rebuild disk 3 onto the old parity drive and then upgrading it afterwards isn't a big deal in the grand scheme!
  4. Hi, I had a drive get disabled due to reallocated sector counts suddenly increasing (8TB Ironwolf NAS edition). The plan was to buy 2 x 14TB HDDs and:
     1. Swap the current 8TB parity drive with one of the new 14TB drives.
     2. Once swapped, pop in the other new 14TB drive as a replacement data drive.
     (My plan for the old 8TB parity drive is to replace the other remaining 8TB Ironwolf NAS drive in the array at a later date. For reference, Disk 3 is the disabled drive.)
     So I started following the wiki for parity swapping:
     A. I pre-cleared both 14TB HDDs, then stopped the array.
     B. I assigned one of the new 14TB HDDs to the parity slot.
     C. I assigned the old 8TB parity HDD to the missing data slot (Disk 3).
     D. Both had blue icons next to them and a tick box / copy button appeared, which I ticked and clicked 'Copy'.
     E. I waited 30 hours until copying completed!
     All good so far! So it completed copying and I think it probably did have the 'Start' button ready for me, but this is where I went off piste and messed up. Since I didn't want to rebuild Disk 3 onto the old 8TB parity drive but onto the second new pre-cleared 14TB drive, I simply assigned Disk 3 to the second new 14TB drive and erroneously assumed it would keep the start array button and rebuild onto whatever was assigned to Disk 3. Nope: doing this just put two 'x's against the parity and Disk 3 slots and 'too many errors' prevented the array starting. So I switched Disk 3 back to the old 8TB parity drive, which showed blue icons against Parity/Disk 3 and reset the parity copy process back to the beginning. I have kicked off the parity copy again, but once it completes tomorrow, how should I proceed given I actually want to replace the data drive with a new 14TB one? I think the long way would be to let it rebuild Disk 3 onto the old parity drive, then once complete, stop the array, reassign Disk 3 to the other 14TB drive, and let it rebuild that. Or is there a slightly better shortcut?
  5. Thanks for the help! I've fired up a fresh Win 10 VM and am instantly getting 250-320Mbps; as the Win 10 VM is updating whilst running, that's about right! Speedtest Tracker is still at 55-60Mbps. Additional checks:
     1. I've switched Speedtest Tracker to use br0 (same as the Windows 10 machine) and there's no change.
     2. I've checked the 'Interface' stats on the unRAID dashboard and that shows the VM at 300+Mbps and Speedtest Tracker at ~60Mbps.
     It does look isolated to the speedtest dockers for some reason?
  6. I've been having this issue for a while (with the speedtest app and the speedtest-tracker app). Effectively my ISP provides 350Mbit down / 35Mbit up. I can max out the connection in SABnzbd and other download dockers, but the two speedtest ones seem to cap at 60Mbit/s down. I've tried Bridge/Host network modes to match the other dockers that can max the connection out, but nothing changes; it still caps. I've checked the speedtest endpoint, which is the one closest to me, and running a speedtest from any browser on any other device in the house maxes out the connection to the same server absolutely fine. It's almost like Ookla's CLI is limiting things? I get good network speeds in the house (I can saturate the gigabit connection when transferring files etc.). Any ideas?
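     One check I'm planning, to separate the container's networking from Ookla's CLI itself, is a plain download from inside the container; a rough sketch, assuming the container is called speedtest-tracker and has curl in the image (wget would do the same job):

         # Pull a large file from inside the container and watch the average rate
         docker exec -it speedtest-tracker \
             curl -o /dev/null -w 'average: %{speed_download} bytes/s\n' \
             http://speedtest.tele2.net/1GB.zip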
  7. Another happy user: I just picked up an RM850i second hand and negotiated based on the fact I presumed the 'i' part would be useless on unRAID, so got a bit knocked off. Imagine my surprise when I came across this in the app store, plugged in the USB cable internally, and it just worked! The only stat I was slightly intrigued by was the overall power draw; as I have 5 HDDs, 2 NVMe drives and 3 SSDs plus a GTX 1660 Super, it's about what I'd expect when the server isn't doing too much.
  8. Nice! OT, my 1660 Super keeps dropping off the bus again within a few hours of rebooting. I've written a user script, scheduled every 10 minutes, that just runs nvidia-smi and writes a log entry saying whether the GPU is present, so I can see exactly when it drops off the bus. Currently you only find out if you have GPU Stats installed: when you log in to the web UI it fires up nvidia-smi, and that's when you realise. Since I log in fairly infrequently, I don't know if there is a pattern to the GPU disappearing or if it's truly random.
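     For reference, the script is nothing clever; roughly this, run every 10 minutes from the User Scripts plugin:

         #!/bin/bash
         # Log whether the driver can still see the GPU, so the syslog shows when it drops off the bus
         if nvidia-smi > /dev/null 2>&1; then
             logger -t gpu-watchdog "GPU present: $(nvidia-smi --query-gpu=name,temperature.gpu --format=csv,noheader)"
         else
             logger -t gpu-watchdog "GPU NOT responding to nvidia-smi (possibly fallen off the bus)"
         fi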
  9. Or maybe: "Hi Gee1, this thread is for specific issues with the Unraid Nvidia plugin. To request changes to dockers, see the support thread for that docker (it should be in the following sub-forum: https://forums.unraid.net/forum/47-docker-containers/ ; if not, the app store entry for the docker should link to its support thread)."
  10. It looks correct to me. You will notice there are 2 bar graphs per line, representing two cores: CPUx means it thinks that's a full core, HTx means it thinks that core is hyper-threaded (or equivalent, a partial core) linked to the first one, i.e. CPU0 - HT1 [ ]0% [ ]0%. So with 2 'cores' represented per line, that is 64 cores shown overall, which gives the correct total. Why it thinks half the cores are hyper-threaded might be a foible, or it might be that generically the Opterons have some architecture with shared resources between cores, so it's probably best classified as hyper-threaded. [edit] Ahh, OK, the Opteron 6380 is based on Piledriver cores, an improved version of the Bulldozer cores, which do share some resources between two threads; in which case the "HT" naming of alternate CPUs is a bit of a generalisation, but warranted.
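     If anyone wants to confirm how the cores are actually paired, the kernel's own topology view should settle it; for example:

         # Logical CPUs that share a CORE value share execution resources (the dashboard's "HT" pairs)
         lscpu --extended=CPU,CORE,SOCKET

         # Or per-CPU from sysfs: the siblings list shows which logical CPUs share cpu0's core
         cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list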
  11. Just to update my own 'issue' and say I think I've tentatively found a solution after a bit more problem solving.
     Issues:
     1. Multiple log entries stating "kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]" and "kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs". This correlated with nvidia-smi being called (either manually or by the GPU Stats plugin). Solution: it seems moving the card to a different PCIe slot (I have two x16 slots, although not x16 if you use both) has stopped this error from appearing.
     2. The GPU dropping off the bus randomly and then being held in reset (fans at full speed) with nvidia-smi reporting a lost GPU. Solution, after trying many things:
     - Moving slots (this fixed issue 1, but had no effect on this issue).
     - Passing through to a Win 10 VM: could not get it to work (error code 43) despite the correct vBIOS and stubbing the additional 2 devices (the 1660 Super appears as 4 devices); however, I believe this also had issues with the GPU getting 'lost', as the VM randomly could not start and I'd need to reboot.
     - Native booting to Win 10 worked fine, no issues (I had to remove the HBA card to ensure no changes to my array could occur).
     - New BIOS revision: made no difference.
     - Changing power supplies: made no difference.
     - Finally (and why I didn't try this earlier) I tried to run memtest from the unRAID boot menu, which would just reset the PC and never load memtest. I found out that you can't run memtest when booting via UEFI, so I disabled that in the BIOS. The memory passed 24 hours of testing, but I then remembered reports that VM passthrough can be problematic with UEFI enabled, so I kept it disabled and booted unRAID in legacy mode, and it's now been 3+ days without the GPU falling over.
     I am a bit tentative; however, UEFI being flaky with GPUs for VMs and it affecting the Linux drivers is plausible, so I'll update again if it makes it to a week. I'm also running GPU Stats again (still no multiple BAR reports).
     TLDR: I think/hope I've fixed my problem and just want to share in case it helps anyone else; disabling UEFI seems to have got me a nice stable system. That just leaves the Plex issue of not managing the power modes correctly, which is definitely not an issue with this plugin!
  12. Just a quick update on my 1660 Super getting 'lost'. I've now ruled out the HW (I think):
     - Booting directly to a native Windows 10 install (I just removed the HBA controller and unRAID USB stick and used a spare 120GB SSD with a fresh Win 10 install), I ran GPU tools for 3 days solid with no issues seen.
     - I've then booted to unRAID (non-Nvidia) and passed the card through as the primary GPU to a Win 10 VM, and that ran diagnostics for 2.5 days with no issues.
     All I can think is that the 440.59 Linux drivers don't sit nicely with my Asus 1660 Super OC Phoenix GPU / Ryzen 3600 / Asus B450F motherboard. I guess in the spirit of this plugin all I can do now is not use the GPU until the next unRAID build is released and hopefully the plugin can be updated to the latest drivers. I appreciate LinuxServer are busy, so I can't expect anything more than they've stated. I have popped a +1 post in the feature request for native unRAID support for Nvidia drivers. I'll focus on fixing the one or two niggles I've not yet resolved and sit patiently with fingers crossed!
  13. Just to add another hand in the air for someone who would love not just baked-in Nvidia GPU drivers, but also the ability to update said drivers if required. I absolutely love unRAID now, I'm a bit of a convert, and love the support. I have no expectation that people should support plugins etc., so I will not be complaining that Limetech haven't 'fixed' my issues yet or anything; I just wanted to say that this feature (presuming some ability to upgrade Nvidia drivers) would be very much appreciated, especially as it might well solve one of the last niggles I have (my 1660 Super falling off the PCI bus randomly every day or so). I like the talk in the thread about having it optional; we all use our servers for our own reasons that are most important to us. I also think (as someone who has been developing all manner of software and dealing with OS issues for 30 years) that opening up things like driver support for GPUs is a bit of a minefield (however, my experience of unRAID and HBA controllers has shown it's pretty good at this!). Anyway, this is just a +1 to the OP's request.
  14. I have quite a collection of USB memory sticks, so when choosing which one to use for unRAID I tried a few, from USB 2.0, USB 3.0 and USB 3.1. By far the most concerning one was the SanDisk Ultra Fit 3.1: just sat doing nothing in a USB slot it gets inordinately hot, and I worried about its longevity, so I tried all my options and found the SanDisk Ultra USB 3.0 16GB/32GB sticks to be performant with no heat issues. As mentioned, the only theoretical advantage of a faster USB drive is at boot: unRAID loads everything into RAM, so you might see a marginal improvement in boot speed with a faster stick, but once booted the USB stick is barely touched and it's all down to your CPU and RAM as to how fast unRAID runs. Dockers / VMs etc. will also depend on the speed of the cache drive (if using one) and how much they access the main array, but again, nothing dependent on the unRAID USB stick speed.
  15. I run a 3900X / X570 setup for my main gaming PC and a 3600 / B450 for unRAID. The PSU shouldn't hugely matter: the stock 3900X + motherboard should be < 200W (much less when idling), and 8 HDDs (2A each at power-up) would be ~200W (much less when idling). Allowing for SSDs and other items, you'd probably be OK with a 650W. However, if it were me, I'd go for an 850W so a GPU, overclocking and more HDDs wouldn't need a PSU upgrade. I can say that my 3900X on an EVGA Bronze 650 does not overclock that well; it's not the wattage, I think it's the PSU's ability to handle sudden changes in power demand, as my Seasonic 850W Focus Platinum allows higher overclocks. I was fortunate to also pick up a cheap Corsair HX1200i (a mining rig spare someone was offloading) and that is obviously overkill, but like the Seasonic I get the same high overclocks for my 3900X. RAM: aside from ECC/non-ECC, the only advice I would give is that Ryzen 3rd gen supports higher memory speeds and they can make a performance difference. I have 32GB of DDR4 3600 C18 in my unRAID box (Corsair 3800 C18 as it is good VFM). I am not running ECC (I have on previous servers) but it's a consideration as your motherboard does support unbuffered ECC.
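     The ~200W drive figure is just back-of-envelope maths; something like this, assuming ~2A per 3.5" HDD on the 12V rail at spin-up (check the drive datasheet for the real number):

         # Peak spin-up load for the drives alone
         drives=8; amps=2; volts=12
         echo "$(( drives * amps * volts )) W"   # prints 192 W, i.e. roughly 200W of headroom needed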
  16. Thanks for the help. I uninstalled the GPU Stats plugin and rebooted. The issue happened again within 10 minutes (checking with nvidia-smi I get the same GPU Lost message). I rebooted, and it maybe lasted 4 hours or so before happening again. I've used the nvidia-bug-report.sh that is mentioned when nvidia-smi loses the GPU and also carefully checked the syslog:
     1. Despite my GTX 1660 Super being on the 440.59 supported list (checked on nvidia.com), the nvidia-bug-report.log file states "WARNING: You do not appear to have an NVIDIA GPU supported by the 440.59 NVIDIA Linux graphics driver installed in this system".
     2. Trawling through various logs, I found the error code XID 79 just before the GPU went missing on one occasion; on the Nvidia developer site this can unfortunately be attributable to pretty much anything: HW error, driver error, temperature etc.
     3. I've been checking the temperatures / HW state of the card. After boot it's in P0 (12W out of 125W) @ 33C; it then occasionally bumps up to P0 (26W/125W) @ 44C. So even when Plex uses the card, 44C is barely ticking over, so I'm pretty sure it's not temperature.
     4. I think (looking at logs) there could possibly be some correlation between drives spinning down and the GPU crashing (or it may well be coincidence). I would like to try bulk spinning down/up the drives to see if power spikes might be upsetting the GPU, as I know HDDs draw the most power when they are spinning up. [edit] I found example user scripts to spin down/up all disks (rough sketch at the end of this post) and tried those several times, whilst the GPU was idle and whilst transcoding a 4K HDR file; no issues found.
     5. I did at some point (more of a quick trial) have some user scripts to 'tweak the driver for obvious reasons' and also to bump the card back to its lowest power setting. I haven't had these enabled for some time, so I've deleted the scripts entirely and re-installed unRAID-Nvidia 6.8.3 from the plugin just to 'clear' things out.
     6. With 100% repeatability, I can trigger the "caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs" message and the associated memory-spanning message just by running nvidia-smi to check the GPU is still there; I did this every 5-10 minutes over lunch and every time I get an associated message in syslog.
     So nothing conclusive yet: some observations, some clutching at straws, but I sense some experimentation and discussion might prompt something of note. One test I 'may' do is to go back to the normal unRAID build, pass the GPU through to my Windows 10 VM (it's only spun up once in a blue moon), run something GPU intensive on that and see if it ever loses the GPU. Whilst this changes a few too many variables at once, it would at least indicate the HW itself is OK (power/temperature concerns etc.).
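     (The bulk spin-down/up test mentioned in point 4 was essentially the forum example scripts; roughly this, assuming the usual mdcmd slot numbering where slot 0 is parity and the loop upper bound covers all populated slots:)

         #!/bin/bash
         # Spin every array slot down, wait, then spin them all back up while watching the GPU
         for slot in $(seq 0 28); do
             /usr/local/sbin/mdcmd spindown "$slot"
         done
         sleep 120
         for slot in $(seq 0 28); do
             /usr/local/sbin/mdcmd spinup "$slot"
         done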
  17. That's a very good observation, I'll uninstall that plugin, reboot to ensure it's 'clean' and then report back!
  18. Any ideas how to start debugging an issue I'm having where the GPU just disappears? Basic scenario:
     - LinuxServer.io Plex using GPU HW encoding
     - Unraid Nvidia plugin (for 6.8.3)
     - Brand new Asus 1660 Super (power LED is white, indicating all is well)
     - GPU Statistics plugin
     Initially I had the GPU set up and encoding in Plex within minutes, having followed all the nice guides on here. The issue is that every day or so the GPU will just disappear (the web UI dashboard GPU stats show no numbers, just '/' against each stat). Running nvidia-smi in a terminal gives me: "Unable to determine the device handle for GPU 0000:09:00.0: Unknown Error". The GPU itself has its fans at max as if it's crashed, and I have to reboot the system, after which it works again for a day or so. I was checking remotely this morning, and GPU Stats was showing sensible numbers until around 10:30, but as can be seen in the syslog, it gets spammed before/after this time with:
     Apr 27 10:50:10 DIG-NAS-UR001 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
     Apr 27 10:50:10 DIG-NAS-UR001 kernel: caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs
     (hundreds of entries) and once the card has disappeared, I get a small number of entries:
     Apr 27 14:21:17 DIG-NAS-UR001 kernel: NVRM: GPU 0000:09:00.0: request_irq() failed (-22)
     I just don't know where to start. I have grabbed the diagnostics just in case it's useful (will upload) but just want advice on where to start / whether anyone can help. The GPU with Plex is working fantastically (when it works); I can transcode my recently ripped UHD movies and HW encode to 1080p/720p with no issues, so I would love to get this working 'full time'. Thanks for any help! edit - just to confirm, a remote 'restart' of unRAID gets it going again.
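     (For what it's worth, the quickest way I've found to see when it happens is just to trawl the syslog for the driver messages quoted above, e.g.:)

         # Count the BAR-mapping spam and pull out the most recent NVRM / Xid errors with timestamps
         grep -c "mapping multiple BARs" /var/log/syslog
         grep -E "NVRM|Xid" /var/log/syslog | tail -n 20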
  19. Thanks for the reply! In this day and age of security I'd say it's becoming essential if you do expose services:
     - It adds a layer of anonymity: anyone snooping around won't know which service you are proxying to, and all they will know, if they fail the access list authentication, is that you are running nginx.
     - By directly exposing the service, I would be relying on the robustness of its individual authentication method, which ties in with the previous point of hiding the service as much as possible.
     My setup (in case it helps in any way!): it's using br0 (so its own IP address) with the default 8080/4443 ports. My DNS record is a subdomain CNAME pointing to a Dynamic DNS address that points to my WAN IP. My proxy host in NPM is set as follows (private info removed):
     Domain Name: subdomain.mydomain.com
     Scheme: http
     Forward Hostname/IP: NginxProxyManager (I'm using the container name, but tried the IP as well with the same issue)
     Forward Port: 8181
     Cache Assets: Off
     Block Common Exploits: On
     Websockets Support: Off
     Access List: "Home" (a list called Home with a single user, 'admin')
     Custom Locations: None
     SSL: Custom (1and1 wildcard cert for my domain)
     Force SSL: On
     HSTS Enabled: Off
     HTTP/2 Support: Off
     HSTS Subdomains: N/A
     Advanced: Empty
     It may well be an issue with NPM itself?
  20. This may well be the stupidest idea ever, so feel free to laugh. I have added a proxy host to effectively reverse proxy to NPM's (Nginx Proxy Manager's) own web UI. I wondered if it would blow up, but that part works well: I can access the proxy manager externally (using a sub-domain) with SSL. What doesn't work is when I add an 'Access List' to the proxy host config. I do this for my other proxy hosts to my other dockers; it gives a first layer of authentication independent of the target docker, which makes me sleep better! When I say it doesn't work, I mean: when you first access the URI externally you get the authentication dialog from the access list, but entering the correct credentials just pops up the same authentication dialog again, and I can't get to the NPM login page. Not sure if I'm being stupid here; it feels wrong proxying to itself, but the web UI is on port 8081 and the proxying is over 8080/4443 (the defaults).
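     (To take the browser out of the equation I've been poking it with curl from outside; the hostname and credentials below are placeholders:)

         # First request should get the 401 basic-auth challenge from the access list...
         curl -I https://subdomain.mydomain.com/

         # ...and this should then return the NPM login page, but in my case it just comes back 401 again
         curl -I -u admin:password https://subdomain.mydomain.com/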
  21. My extended test has been stuck for 8 hours. I initially excluded all but one share, but it started checking appdata as others have reported, then after 3 hours moved to the next folder; now it's stuck on the share I have all my backups on. Is it safe to just terminate the process? I'm in the middle of an iteratively scheduled parity check which will take a couple more days to complete.
  22. I have been over there and see it affects a few people, so I've added to a thread or two! Thanks! The other issue, which I've also just had, is the DNS rebind issue that only affects EAC3, so that's two silent ways it won't work:
     1. The EasyAudioEncoder executable flag being set incorrectly.
     2. EAE uses URIs (*.plex.direct) that my router, for one, sees as a DNS rebind attempt and blocks. I was sat there scratching my head when the same EAC3 audio tracks wouldn't play and finally stumbled across this on the Plex forums; just adding *.plex.direct as an exception on the router and all is well again!
     This seems something so easy for Plex to fix but they just seem to sit on it; sometimes you feel like just doing it for them!
  23. I've just realised I have/had the EAC3 'issue', i.e. any video file with EAC3 audio requiring an audio transcode down to 2 channels (most of my client apps) won't play; I get the following log entry: "ERROR - [Transcoder] [eac3_eae @ 0x7e9840] EAE timeout! EAE not running, or wrong folder? Could not read '/tmp/pms-198c89ec-c5fa-4ceb-99dc-409b57434d00/EasyAudioEncoder/Convert to WAV (to 8ch or less)/C02939D8-5F8B-432B-9FD9-6E7F76C40456_522-0-21.wav'". I found a solution in this thread: just delete the appdata\plex\..\Codecs folder and restart so it recreates it, and everything seems fine now! Instead of deleting, I renamed the folder to "Codecs_OLD" so I could see what the difference was. There are only two differences:
     1. The licence file has different contents.
     2. (Probably the most crucial!) The "EasyAudioEncoder" file (2.5MB, no extension) does not have the executable flag set in the old non-working version!
     I think this happened after the update a day or so ago, or that's when I noticed it! Obviously it's fixed for now, but I'm just wondering if anyone has any idea how it might have happened, in case it comes back at a later date.
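     (If it comes back, rather than wiping the whole Codecs folder I assume just restoring the executable bit would do; the appdata path below depends on your own Plex mapping:)

         # Check the current permissions on the EAE binary, then make it executable again
         find /mnt/user/appdata/plex -type f -name EasyAudioEncoder -exec ls -l {} \;
         find /mnt/user/appdata/plex -type f -name EasyAudioEncoder -exec chmod +x {} \;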
  24. Terrible timing, I only just set it up yesterday after trying many, many options (all VPNs seem to suck for torrents), and I'm also getting the curl response code mismatch error. Thanks Binhex, I thought it might be PIA's end!