CorneliousJD

Members
  • Posts: 689
  • Joined
  • Last visited
  • Days Won: 1

CorneliousJD last won the day on October 28 2020

CorneliousJD had the most liked content!

3 Followers

CorneliousJD's Achievements

Enthusiast (6/14)

117 Reputation

  1. Ok, thanks - that's where my confusion was: the Latest and Open Source options showed the same version number, and I assumed it was auto-selecting the open source one. I understand now that I've looked at it a bit deeper (today is the first day I've seen this plugin page option). Sorry for the confusion, but thanks for the explanation!
  2. Wonderful, thanks for the quick reply. I'll be using it just for HW transcoding and maybe some LLM tinkering in the future. I'm just not clear whether I need the nvidia.conf file in modprobe.d, since "latest" right now seems to be the open source driver - it mentions workstation cards and GeForce cards, but the Tesla is technically neither; it's a "datacenter card" (a sketch of that file follows after this list of posts). I think I'm going to just bite the bullet and order the card now.
  3. I've tried looking through this thread, but there are nearly 160 pages now. Can someone tell me, if I install a Tesla P4 card (datacenter card) in my server, which driver I should be choosing? I'm not confident I understand the differences between the options here - what are the pros/cons of the production / new feature / open source drivers? Will any of the three work with the Tesla P4?
  4. Hi there, I am going to be ordering a Tesla P4 to do hardware transcoding with, but I'm only finding very old info about how to get Plex to see it and use it. I'd very much prefer to stick with the official container, but I'm unsure which new variables I need to add and which specific devices, if any, I'd need to pass through (a hedged example of the usual container settings follows after this list). Old SpaceInvader One videos just had device UUIDs being added via variables but no actual devices being passed through. I'm also not sure which driver version I should be picking/using with a Tesla GPU on the latest nvidia driver plugin. Is the official production one needed (or preferred?) for Tesla GPUs, or should I stick with the open source driver? Any help is appreciated!
  5. I wanted to make sure to post here for anyone else who finds this later, but I think this may be resolved. There was definitely an issue somewhere; my MariaDB got corrupted and that may have been the culprit in the end? I re-created the docker folder as docker.img and re-added my containers, and they all added fine and the server was stable, but upon rebooting with docker autostarts on, it went to 100% again. I was patient this time and let it sit, and it stabilized after 10-15 minutes or so. I know I have a lot of containers, so this is likely to be expected. Whatever was initially going on, though, the server wasn't stabilizing and was sitting at 100% CPU for hours, so perhaps a combination of fixing my MariaDB (restored from backup) and/or re-creating the docker folder as docker.img ended up resolving everything. I'm at about 24 hours of stability without hitting high CPU now. Thanks JorgeB for the help. (Again!)
  6. Nearly every single container is telling me there's a shared mapping - is this normal? It doesn't seem right. Lidarr's appdata, for example, isn't mapped anywhere else.
  7. I've completed everything above. While recreating my docker.img file I loaded everything back from previous apps in CA and they all ran fine, sitting at around 30% CPU usage. I set up autostart on the docker containers I wanted auto-starting and rebooted, and I'm back at 100% CPU again... I'm at a total and utter loss here. Is there anything I can look into? The server becomes unusable after it sits like this for a while.
  8. Interesting, I'll move back to a docker.img file - I figured that without that extra layer of the img file I might see better performance with just a folder instead, plus never hitting size limits (as you can see I have a LOT of containers running). I'll let the current btrfs balance finish first, then kill off the docker folder, re-create a docker.img at /mnt/cache/system/docker/docker.img, fire the containers back up, and see what happens from there.
  9. I left docker running and turned the containers off one by one while watching my CPU, but no change - I ended up with every single container in a stopped state and CPU still at 100%, with dockerd still being the culprit. Very strange to me. I'm running a btrfs balance right now and CPU is back to 100% usage, but it was shooting up before I even started that (some commands for narrowing this down are sketched after this list). I'm currently working, so it's difficult to dedicate a ton of time to this during the work day, but I'm starting to regret my decision to go NVMe because everything was absolutely fine before this. I'm wondering if it's somehow related?
  10. I left the server on overnight and woke up to much lower CPU usage, but I'm still wondering what could have caused this - the server became so sluggish it took an extremely long time to do anything. Also, btw, the top command showed 0.0 wa (IOWait) during all of this; I forgot to include that screenshot.
  11. I'm at my wits' end on this one, totally at a loss. I've been at this for hours tonight and I think I need to admit I need some help. I replaced my cache drives with new ones, but found out they didn't support deterministic trim, so I swapped them out for an Asus Hyper M.2 v2 PCIe 3.0 card that holds 4x M.2 NVMe SSDs, and set PCIe bifurcation to 4x/4x/4x/4x, which worked great. I swapped the NVMe drives into the cache pool one at a time; it took about 12 hours for each drive to rebuild. I did notice my CPU pushing 100% usage pretty much the entire time, but I didn't think much of it because it was rebuilding the cache drives. At the same time I also switched everything from using /mnt/user/appdata/ to /mnt/cache/appdata/ to eke out some extra speed. I changed every container touching appdata to this, and also pointed my docker settings to /mnt/cache/system/docker (using a folder, not docker.img) and /mnt/cache/appdata/. I rebooted the server for good measure (glad I did, because I found it was trying to boot from NVMe and not my flash anymore, whoops!) and then realized my MariaDB was corrupted and Nextcloud had stopped working. Weird... okay, moving on. I saw my CPU was stuck at 100% usage whenever Docker was running. I stopped ALL containers one by one and CPU was still at 100%, which didn't make much sense - with every container in a stopped state the problem remained. At a loss, I deleted my docker directory (I'm using a directory, not an img). Everything started coming back up fine after re-adding the containers from CA, with the exception of course of Nextcloud/MariaDB. I thought I was in the clear and it had been a corrupt docker folder, but while troubleshooting the MariaDB/Nextcloud issue I saw my CPU shoot back up to 100% again. Very weird. And that's where I'm stuck now... I ended up removing MariaDB entirely for now and deleting its img, but still no luck. As you can see I'm stuck at 100%, but overall usage per container is very low. Diagnostics and screenshots attached below. server-diagnostics-20240307-2308.zip
  12. Excellent, thank you. And thanks for your help on my SSD trim question too. This is a direct result of your help there. Much appreciated. I'll be getting this card and some NVMe drives for cache then.
  13. Hi everyone, I have the following setup. Motherboard: Supermicro X9DRi-LN4F+ https://www.supermicro.com/manuals/motherboard/C606_602/MNL-1258.pdf CPU: 2x Intel Xeon E5-2650 v2 https://www.intel.com/content/www/us/en/products/sku/75269/intel-xeon-processor-e52650-v2-20m-cache-2-60-ghz/specifications.html SAS Card: LSI 9211-8i HBA https://docs.broadcom.com/doc/12353333 I am wondering, if I buy this ASUS Hyper M.2 x16 V2 NVMe-to-PCIe adapter, whether I'll have enough lanes to run it in 4x/4x/4x/4x bifurcation without taking any lanes away from the LSI SAS card: https://www.asus.com/us/motherboards-components/motherboards/accessories/hyper-m-2-x16-card-v2/ I have already updated the BIOS to v3.4, so bifurcation is enabled/supported on my motherboard. I see that each CPU has 40 lanes available, and I have two CPUs, so I should have plenty of lanes to spare; however, I'm not super familiar with the layout of the PCIe lanes and which slot links to which CPU, etc. Can anyone help shed some light on this for me? How many total lanes do I get between my CPUs and my motherboard? I couldn't really find info on how many lanes are available on the mobo side. I really just want to make sure I'm not sacrificing performance or ripping lanes away from the LSI card by putting in the M.2 card (rough lane math is sketched after this list). Thanks in advance!
  14. They are indeed. Is not having trim an issue? I'm admittedly not very familiar with its purpose. If so, in your opinion, should I get a PCIe card that holds 2x M.2 drives, return the 2.5" drives, and get M.2s instead so they're internal and on a different controller?
  15. fstrim: /mnt/cache: FITRIM ioctl failed: Remote I/O error I had some older 2.5" Samsung EVO SSDs that had been starting to give me problems, so I replaced them with Inland (Micro Center) enterprise SSDs just the other day. Ever since swapping them both out, I have been getting the above error emailed to me on a daily basis. The new drives are connected to the same ports on the backplane and everything, and I never saw these errors with the old drives. Any help is appreciated! (Commands for checking a drive's TRIM support are sketched after this list.)
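
For the open source driver question in item 2: a minimal sketch of the kind of nvidia.conf sometimes placed in modprobe.d when the open kernel module is used with a GPU outside its officially supported list. The file path and the module option below are assumptions for illustration, and whether a Tesla P4 needs (or even supports) this is not confirmed here.

  # /boot/config/modprobe.d/nvidia.conf -- path assumed for Unraid's persistent modprobe.d
  # NVreg_OpenRmEnableUnsupportedGpus=1 asks the open NVIDIA kernel module to bind
  # GPUs it does not officially list; treat this as a sketch, not a recommendation.
  options nvidia NVreg_OpenRmEnableUnsupportedGpus=1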
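
For the Plex container question in item 4: a hedged sketch of the settings commonly used to expose an NVIDIA GPU to the official Plex container on Unraid once the Nvidia Driver plugin is installed. The container name and appdata path are placeholders, not values from this setup.

  # Unraid template: add --runtime=nvidia to Extra Parameters and the two NVIDIA_* variables.
  # NVIDIA_VISIBLE_DEVICES can be "all" or a specific GPU UUID from `nvidia-smi -L`.
  # Rough docker run equivalent (appdata path is a placeholder):
  docker run -d --name=plex \
    --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -v /mnt/cache/appdata/plex:/config \
    plexinc/pms-docker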
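
For the dockerd CPU issue in item 9: a short sketch of stock commands for narrowing down what is pinning the CPU, assuming the cache pool is mounted at /mnt/cache.

  # One-shot snapshot of the busiest processes
  top -b -n 1 | head -n 20
  # Per-container CPU/memory usage (while containers are running)
  docker stats --no-stream
  # Check whether a btrfs balance is still running on the cache pool
  btrfs balance status /mnt/cache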
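
Rough lane math for the bifurcation question in item 13, assuming the Hyper M.2 card goes in an x16 slot and the LSI 9211-8i stays in an x8 slot (the exact slot-to-CPU mapping on the X9DRi-LN4F+ is not confirmed here):

  2 CPUs x 40 PCIe 3.0 lanes = 80 lanes total
  Hyper M.2 x16 (4x4x4x4)    = 16 lanes
  LSI 9211-8i (x8)           =  8 lanes
  16 + 8 = 24 lanes in use, leaving plenty of headroom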
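
For the fstrim error in item 15: a short sketch of commands generally used to check whether a drive (and the controller path in front of it) reports TRIM support; device names are placeholders.

  # Non-zero DISC-GRAN / DISC-MAX values mean the block device advertises discard support
  lsblk --discard
  # For a SATA SSD, list the drive's own TRIM capabilities (replace sdX)
  hdparm -I /dev/sdX | grep -i trim
  # Verbose manual trim of the cache mount
  fstrim -v /mnt/cache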