March 12Mar 12 This is the support thread for the NVIDIA NIM Single Unraid template. Made this for my personal use so I could use NVIDIA-NIM with Unraid and my AnythingLLM container.GitHub RepoTemplateReadmeThis guide explains how to run NVIDIA NIM containers on Unraid using a consumer NVIDIA GPU.NIM provides optimized inference servers with an OpenAI-compatible API, making it easy to connect tools like AnythingLLM or Open WebUI.Tested environment:RTX 3060 12 GBUnraid 6.12+NIM 1.10.1PrerequisitesYou will need the following:Unraid 6.12 or laterNVIDIA GPU (Turing architecture or newer)Examples: RTX 20xx, RTX 30xx, RTX 40xxNVIDIA drivers installed in UnraidCommunity Applications → NerdTools or GPU Statistics pluginFree NVIDIA NGC accounthttps://build.nvidia.comNGC API key generated from your NGC dashboardModel SelectionNIM uses pre-optimized engine profiles, which are primarily designed for data center GPUs.Consumer GPUs require smaller models and reduced context windows.Example modelsModelVRAM RequiredFits 12 GB GPUmeta/llama-3.2-3b-instruct~6 GB✅ Recommendedmicrosoft/phi-3-mini-4k-instruct~8 GB✅ Yesnvidia/Llama-3.1-Nemotron-Nano-4B-v1.1~10 GB✅ Yesmistralai/mistral-7b-instruct-v0.3~14 GB fp16❌ OOMmeta/llama-3.1-8b-instruct~22 GB bf16❌ OOMmeta/llama-3.1-70b-instruct~80 GB❌ Multi-GPUIf you want to run 7B+ models on a 12GB GPU, consider Ollama, which supports quantized weights.NGC Registry Login⚠️ This must be done before Unraid can pull NIM images.NIM images are hosted on NVIDIA's private registry (nvcr.io), not Docker Hub.Run this one time in the Unraid terminal:docker login nvcr.ioLogin credentials:Username: $oauthtoken Password: YOUR_NGC_API_KEY⚠️ Important:docker login → allows pulling the container imageNGC_API_KEY → allows downloading model weights at runtimeBoth are required.Cache Directory PermissionsBefore starting the container, create the cache directory with the correct permissions.NIM runs inside the container as UID/GID 1000.If the cache directory is owned by root, the container will fail to start.Run:chown -R 1000:1000 /mnt/user/appdata/nvidia-nim/cache chmod 775 /mnt/user/appdata/nvidia-nim/cacheEnvironment VariablesVariableValueNotesNGC_API_KEYyour_ngc_api_keyRequired. Used to download model weightsNIM_MODEL_NAMEmeta/llama-3.2-3b-instructMust match the image tagNIM_MAX_MODEL_LEN16384Required for consumer GPUsNIM_CACHE_PATH/opt/nim/.cacheCache directoryCUDA_VISIBLE_DEVICES0Use 0 for single GPUPYTORCH_CUDA_ALLOC_CONFexpandable_segments:TrueReduces memory fragmentationFirst RunOn the first startup, NIM downloads the model weights to the cache directory.Example size:~6 GB for llama-3.2-3bThis can take several minutes depending on your internet connection.You can verify the container is running with:curl http://localhost:8000/v1/modelsConnecting ClientsNIM exposes an OpenAI-compatible API, so most AI clients work out of the box.Connection settingsSettingValueDocshttp://[unraid-ip]:8000/docsBase URLhttp://[unraid-ip]:8000/v1API KeyAny non-empty string (e.g. nim)Modelmeta/llama-3.2-3b-instructCompatible clientsAnythingLLMOpen WebUILangChainLlamaIndexCursor (custom OpenAI base URL)Any application with configurable OpenAI endpointsSwitching ModelsCurrently the template uses model-specific container images.To switch models:Stop the containerChange the Repository fieldExample:nvcr.io/nim/microsoft/phi-3-mini-4k-instruct:latestUpdate the model variable:NIM_MODEL_NAME=microsoft/phi-3-mini-4k-instructStart the containerTo run multiple models, create additional containers on different ports:8000 8001 8002They can share the same cache directory — weights will not be duplicated.TroubleshootingCache Permission ErrorIf the container fails with:PermissionError: [Errno 13] Permission denied: '/opt/nim/.cache/local_cache'Run:chown -R 1000:1000 /mnt/user/appdata/nvidia-nim/cache chmod 775 /mnt/user/appdata/nvidia-nim/cacheKV Cache Size ErrorExample error:ValueError: The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cacheFix:NIM_MAX_MODEL_LEN=16384If needed, try:8192Consumer GPUs cannot handle the full context window used by data center profiles.Common ErrorsErrorCauseFix401 UnauthorizedNot logged into nvcr.ioRun docker login nvcr.ioValueError: invalid literal 'all'CUDA_VISIBLE_DEVICES=allChange to 0PermissionError on .cacheWrong permissionsFix cache directory permissionsmax seq len > KV cacheContext window too largeSet NIM_MAX_MODEL_LEN=16384CUDA out of memoryModel too largeUse a smaller modelNo compatible profilesGPU too oldRequires RTX 20xx or newernvfp4 unsupported warningConsumer GPU limitationSafe to ignoreXML TemplateThis repository includes an Unraid Community Applications-compatible template:nvidia-nim-single.xmlTo install manually:/boot/config/plugins/dockerMan/templates-user/After copying the file there, it will appear in the Unraid Docker template list.Uploading Attachment...Uploading Attachment... Edited March 13Mar 13 by PikkonMG
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.