
[SUPPORT] Nexus Orchestrator – Self-Hosted LLM Router


This is the support thread for the Nexus Orchestrator.

🔗 GitHub Repo | 🔗 Docker Hub | 🔗 Unraid Template


What is Nexus Orchestrator?

Nexus Orchestrator is a self-hosted orchestration layer that routes each LLM request to the best local or cloud model automatically.

This started as something I personally wanted for my own setup. I used AI tools along the way to help design, iterate on, and refine parts of it, but the goal and the overall system came from solving my own use case. It's not perfect, but it works.

At its core, a lightweight router model classifies each prompt's intent (GENERAL, CODING, REASONING, CREATIVE, VISION, DOCUMENT, FAST, or SECURITY) and dispatches it to whichever model you've configured for that category. You can freely mix local Ollama models with cloud providers in any combination.
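To make the classify-then-dispatch idea concrete, here is a minimal TypeScript sketch. It is not the project's code: the names parseIntent and pickModel, the rule of taking the first category name that appears in the router's reply (defaulting to GENERAL), and falling back to the GENERAL model for unassigned categories are all assumptions for illustration.

```typescript
// Hypothetical sketch of classify-then-dispatch; not Nexus Orchestrator's real code.
type Category =
  | "GENERAL" | "CODING" | "REASONING" | "CREATIVE"
  | "VISION" | "DOCUMENT" | "FAST" | "SECURITY";

const CATEGORIES: Category[] = [
  "GENERAL", "CODING", "REASONING", "CREATIVE",
  "VISION", "DOCUMENT", "FAST", "SECURITY",
];

// Assumption: the router model replies with free text containing a category
// name (e.g. "CODING" or "Category: coding."). Take the earliest known
// category word as a whole word; default to GENERAL if none appears.
function parseIntent(routerReply: string): Category {
  const upper = routerReply.toUpperCase();
  let best: Category = "GENERAL";
  let bestIndex = Infinity;
  for (const cat of CATEGORIES) {
    const i = upper.search(new RegExp(`\\b${cat}\\b`));
    if (i !== -1 && i < bestIndex) {
      best = cat;
      bestIndex = i;
    }
  }
  return best;
}

// Hypothetical mapping: category -> model name assigned in the Models tab.
function pickModel(category: Category, mapping: Partial<Record<Category, string>>): string {
  // Unassigned categories fall back to the GENERAL model (an assumption).
  const model = mapping[category] ?? mapping.GENERAL;
  if (!model) throw new Error(`no model configured for ${category} or GENERAL`);
  return model;
}
```

For example, with a hypothetical mapping of { GENERAL: "llama3.1:8b", CODING: "qwen2.5-coder:7b" }, a router reply of "Category: coding." resolves to qwen2.5-coder:7b, while a SECURITY prompt with no assignment lands on llama3.1:8b.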


Quick Install (Unraid)

  1. Copy the template to your flash drive: /boot/config/plugins/dockerMan/templates-user/nexus-orchestrator.xml

  2. Go to your Unraid Docker tab → Add Container

  3. Search for Nexus Orchestrator or select the template manually

  4. Set your Admin API Key (required)

  5. Set your Local Provider URL, e.g. http://YOUR_SERVER_IP:11434 for Ollama. Do not use localhost; it resolves inside the container.

  6. Optionally set a cloud provider URL and API key for hybrid routing

  7. Set a Router Model. Something small works fine: gemma3:4b, qwen2.5:3b, or nemotron-mini:4b
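Step 5 is where most setups go wrong. A quick pre-flight check like the one below can catch it before you start the container. It is a hypothetical helper (checkProviderUrl is not part of Nexus); it only inspects the URL string and does not test connectivity.

```typescript
// Hypothetical pre-flight check for the Local Provider URL; not part of Nexus.
// Inside a Docker container these hosts point back at the container itself,
// not at the Unraid server running Ollama.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]", "0.0.0.0"]);

// Returns null if the URL looks usable, otherwise a human-readable problem.
function checkProviderUrl(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return "not a valid URL (include the scheme, e.g. http://)";
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return `unsupported scheme ${url.protocol}`;
  }
  // URL keeps brackets on IPv6 hostnames, so "[::1]" matches as written.
  if (LOOPBACK_HOSTS.has(url.hostname) || url.hostname.startsWith("127.")) {
    return `${url.hostname} resolves inside the container; use the server's LAN IP instead`;
  }
  return null;
}
```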


100% Local Setup

  1. Set LOCAL_URL to your Ollama instance's IP address (not localhost)

  2. Set ROUTER_MODEL to a small local model

  3. In the UI under the Models tab, assign local models to each category

Nothing will leave your network.


Environment Variables

| Variable | Required | Description |
|---|---|---|
| ADMIN_API_KEY | Yes | Password for the web UI and API |
| ENCRYPTION_SECRET | No | Encrypts stored data; derived from the admin key if blank |
| LOCAL_URL | No | Local provider base URL (default: http://localhost:11434) |
| LOCAL_KEY | No | API key for the local provider, if required |
| CLOUD_URL | No | OpenAI-compatible base URL for the cloud provider |
| CLOUD_API_KEY | No | API key for the cloud provider |
| ROUTER_MODEL | No | Model used for intent classification |
| ROUTER_URL | No | Custom router URL; defaults to LOCAL_URL if blank |
| ROUTER_KEY | No | API key for the router, if different from the local key |
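The defaults in this table chain together: a blank ENCRYPTION_SECRET falls back to the admin key, and a blank ROUTER_URL falls back to LOCAL_URL. The sketch below shows that resolution order. Assumptions: the environment is passed in as a plain object rather than read from process.env, blank strings count as unset, a blank ROUTER_KEY reuses LOCAL_KEY (the table only says ROUTER_KEY is for when the router key differs), and the actual key derivation from the admin key is not shown because it isn't documented here.

```typescript
// Hypothetical sketch of how the documented defaults fit together.
// Not Nexus Orchestrator's actual config loader.
type Env = Record<string, string | undefined>;

interface ResolvedConfig {
  adminApiKey: string;
  encryptionSecretInput: string; // material the encryption key would be derived from
  localUrl: string;
  localKey?: string;
  cloudUrl?: string;
  cloudApiKey?: string;
  routerModel?: string;
  routerUrl: string;
  routerKey?: string;
}

// Treat unset and blank ("" or whitespace-only) values the same way.
function envValue(env: Env, name: string): string | undefined {
  const v = env[name]?.trim();
  return v ? v : undefined;
}

function resolveConfig(env: Env): ResolvedConfig {
  const adminApiKey = envValue(env, "ADMIN_API_KEY");
  if (!adminApiKey) throw new Error("ADMIN_API_KEY is required");

  const localUrl = envValue(env, "LOCAL_URL") ?? "http://localhost:11434"; // documented default
  const localKey = envValue(env, "LOCAL_KEY");

  return {
    adminApiKey,
    // Blank ENCRYPTION_SECRET: fall back to the admin key as derivation input.
    encryptionSecretInput: envValue(env, "ENCRYPTION_SECRET") ?? adminApiKey,
    localUrl,
    localKey,
    cloudUrl: envValue(env, "CLOUD_URL"),
    cloudApiKey: envValue(env, "CLOUD_API_KEY"),
    routerModel: envValue(env, "ROUTER_MODEL"),
    // Blank ROUTER_URL: reuse the local provider.
    routerUrl: envValue(env, "ROUTER_URL") ?? localUrl,
    // Assumption: a blank ROUTER_KEY reuses LOCAL_KEY.
    routerKey: envValue(env, "ROUTER_KEY") ?? localKey,
  };
}
```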


Completed: 4/6/2026

✅ CORS configuration
✅ Cookie-based auth
✅ SSRF protection
✅ LaTeX/KaTeX rendering
✅ Stop generation button
✅ SQLite migration
✅ Category Mappings cloud filter
✅ Model fallback
✅ Chat input UX improvements
✅ Input validation (Zod schemas)
✅ Rate limiting
✅ Tests (Vitest)
✅ Conversation pagination
✅ Router result caching (opt-in toggle)
✅ FAST category
✅ SECURITY category
✅ Error boundaries
✅ Projects (organize chats into folders)
✅ Multi-user support (per-user accounts, isolated config, conversations & projects)
✅ Request queuing (per-user FIFO queue, up to 5 pending per user)
✅ Web search via tool calling (SearXNG integration, LLM decides when to search)
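Request queuing is described as a per-user FIFO queue with up to 5 pending requests per user. Here is a minimal in-memory sketch of that policy. Assumptions: each user runs one request at a time, the running request does not count toward the 5, and a sixth pending request is rejected instead of waiting. PerUserQueue and its methods are hypothetical names, not Nexus's API.

```typescript
// Hypothetical in-memory sketch of a per-user FIFO queue with a pending cap.
// Not Nexus Orchestrator's implementation.
type Job<T> = () => Promise<T>;

class PerUserQueue {
  private readonly maxPending: number;
  private pending = new Map<string, Array<() => void>>();
  private running = new Set<string>();

  constructor(maxPending = 5) {
    this.maxPending = maxPending;
  }

  // Runs `job` after the user's earlier jobs finish; rejects immediately
  // if the user already has `maxPending` jobs waiting.
  enqueue<T>(userId: string, job: Job<T>): Promise<T> {
    const queue = this.pending.get(userId) ?? [];
    if (queue.length >= this.maxPending) {
      return Promise.reject(new Error(`queue full for ${userId}`));
    }
    return new Promise<T>((resolve, reject) => {
      const start = () => {
        this.running.add(userId);
        job()
          .then(resolve, reject)
          .finally(() => {
            this.running.delete(userId);
            this.next(userId);
          });
      };
      if (!this.running.has(userId) && queue.length === 0) {
        start(); // nothing ahead of this job: run it right away
      } else {
        queue.push(start);
        this.pending.set(userId, queue);
      }
    });
  }

  pendingCount(userId: string): number {
    return this.pending.get(userId)?.length ?? 0;
  }

  // Start the oldest waiting job for this user, if any (FIFO order).
  private next(userId: string): void {
    const queue = this.pending.get(userId);
    const start = queue?.shift();
    if (queue && queue.length === 0) this.pending.delete(userId);
    start?.();
  }
}
```

Queuing per user, rather than globally, keeps one busy user from starving everyone else on a single GPU.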

Planned:
Multiple local providers (Ollama + llama-swap + llama.cpp simultaneously)
URL fetch/browse tool (companion to web search for reading a specific page)
Ollama backend abort


Configuration Guide

This guide covers the basic configuration needed to get Nexus Orchestrator working with local and optional cloud models.

LOCAL MODEL / API PROVIDER

This should point to your local LLM backend, usually Ollama.

Set the Provider URL to your machine's IP address
Example: http://192.168.1.100:11434

Do not use localhost, as that refers to the container itself

Leave the API Key blank for standard Ollama setups

Nexus will automatically detect Ollama and handle the connection
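One plausible way to do that detection (a sketch, not Nexus's actual logic) is to probe Ollama's native GET /api/tags endpoint, which returns a JSON object with a models array, and treat anything else as OpenAI-compatible. The function names are hypothetical, and detectProvider relies on the global fetch in Node 18 or later.

```typescript
// Hypothetical provider detection sketch; not Nexus Orchestrator's code.
type ProviderKind = "ollama" | "openai-compatible";

// Ollama's GET /api/tags returns {"models": [{"name": "...", ...}, ...]}.
function looksLikeOllamaTags(body: unknown): boolean {
  return (
    typeof body === "object" &&
    body !== null &&
    Array.isArray((body as { models?: unknown }).models)
  );
}

async function detectProvider(baseUrl: string): Promise<ProviderKind> {
  // Strip trailing slashes so the request isn't sent to "//api/tags".
  const root = baseUrl.replace(/\/+$/, "");
  try {
    const res = await fetch(`${root}/api/tags`, { signal: AbortSignal.timeout(5000) });
    if (res.ok && looksLikeOllamaTags(await res.json())) return "ollama";
  } catch {
    // Network error, timeout, or non-JSON body: not a reachable Ollama endpoint.
  }
  return "openai-compatible";
}
```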

CLOUD MODEL / API PROVIDER (Optional)

This is for OpenAI-compatible providers such as OpenAI or OpenRouter.

Set the Provider URL
Enter your API key

If left unconfigured, any categories set to Cloud will show a warning
Local-only setups will still function normally

INTENT ROUTER

The router determines which category handles each request.

A small model is sufficient
Recommended: gemma3:4b, qwen2.5:3b

Leave the Router URL blank to reuse your local provider

Only set a custom URL if you want a separate routing endpoint

DISCOVERED MODELS

Nexus will list all models available from the Local Provider.

If models do not appear:
Verify the Provider URL is correct
Ensure the status shows Online

Models can be selected and assigned to categories from this list

CATEGORY MAPPINGS

Categories define how requests are routed.

Each category includes:
A provider (Local or Cloud)
A pool of models
A fallback order (first model is primary, others are used if it fails)
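The fallback order behaves like a simple loop: try the primary model, and if the call fails, move to the next one in the pool. A minimal sketch follows, where CategoryMapping, CallModel, and runWithFallback are hypothetical names and callModel stands in for whatever actually sends the request.

```typescript
// Hypothetical category mapping and fallback loop; not Nexus's actual code.
interface CategoryMapping {
  provider: "local" | "cloud";
  models: string[]; // fallback order: models[0] is the primary
}

type CallModel = (provider: "local" | "cloud", model: string, prompt: string) => Promise<string>;

async function runWithFallback(
  mapping: CategoryMapping,
  prompt: string,
  callModel: CallModel,
): Promise<{ model: string; reply: string }> {
  const errors: string[] = [];
  for (const model of mapping.models) {
    try {
      const reply = await callModel(mapping.provider, model, prompt);
      return { model, reply }; // first success wins
    } catch (err) {
      // Record the failure and move on to the next model in the pool.
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all models failed: ${errors.join("; ")}`);
}
```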

Default categories:

GENERAL: general conversation
CODING: programming and debugging
REASONING: math, logic, and analysis
CREATIVE: writing and brainstorming
VISION: triggered by image input
DOCUMENT: triggered by file input
FAST: simple, low-latency responses
SECURITY: security research and testing

Custom categories can be added from the Models tab

See the changelog for patch notes: https://github.com/FaqFirebase/Nexus-Orchestrator/blob/master/CHANGELOG.md

Bugs are expected.

GitHub: https://github.com/FaqFirebase/Nexus-Orchestrator
Docker Hub: https://hub.docker.com/r/pikkonmg/nexus-orchestrator
Template repo: https://github.com/PikkonMG/unraid-docker-templates


Edited by PikkonMG

Reply from the thread author:

Release Notes: Nexus v1.1.9

This update covers all major changes since v1.1.3. The biggest highlights are live reasoning display, multiple local provider support, a full security hardening pass, and a much cleaner UI.

Thinking / Reasoning Display (v1.1.9)

Nexus now shows live reasoning for models that support it.

For Ollama models, the server sends think: true through the native API and streams reasoning token-by-token as it generates. Models like DeepSeek R1 and QwQ that natively emit <think> tags are also supported. Reasoning appears in a collapsible purple section above the response and stays visible until you manually close it.
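For models that put their reasoning inline, the text has to be split into reasoning and answer. The sketch below does that for a complete or partial reply string. It is illustrative only, not Nexus's parser (the live display works token-by-token, which this does not attempt): it assumes a single <think> block at the start of the output, and it also tolerates output where the opening tag is missing and only </think> appears.

```typescript
// Hypothetical splitter for inline <think>...</think> reasoning; not Nexus's parser.
interface SplitReply {
  thinking: string;
  answer: string;
}

const OPEN = "<think>";
const CLOSE = "</think>";

function splitThinking(raw: string): SplitReply {
  const open = raw.indexOf(OPEN);
  const close = raw.indexOf(CLOSE);

  if (close === -1) {
    if (open !== -1) {
      // Unclosed <think>: the model is still reasoning (e.g. mid-stream).
      return { thinking: raw.slice(open + OPEN.length).trim(), answer: raw.slice(0, open).trim() };
    }
    return { thinking: "", answer: raw.trim() }; // no reasoning at all
  }

  // Reasoning runs from after <think> (or from the start, if the opening tag
  // was omitted) up to </think>; everything after the closing tag is the answer.
  const start = open !== -1 && open < close ? open + OPEN.length : 0;
  return {
    thinking: raw.slice(start, close).trim(),
    answer: raw.slice(close + CLOSE.length).trim(),
  };
}
```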

There are two levels of control:

  • Global default: System → Settings → Show Model Thinking (enabled by default)

  • Per-chat override: Brain icon in the chat input bar

Models that do not support thinking fall back silently to a normal response. The FAST category always skips thinking.

Multiple Local Providers (v1.1.5)

Nexus can now connect to multiple local backends at the same time, including Ollama, llama-swap, llama.cpp, LM Studio, Open WebUI, and other OpenAI-compatible endpoints.

Model discovery aggregates across all configured providers. Category assignments now store the provider URL alongside the model name so routing always hits the correct backend. Fallback chains also work across providers.

Existing single-provider setups migrate automatically with no manual action required.

Provider Compatibility Improvements (v1.1.6)

Provider health checks and model discovery now correctly handle endpoints whose base URL ends with /v1, such as llama-swap and LM Studio.
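The /v1 issue comes from naive URL joining: appending /v1/models to a base URL that already ends in /v1 produces /v1/v1/models. A sketch of the normalization, assuming OpenAI-compatible servers list models at /v1/models and Ollama at /api/tags (modelsUrl is a hypothetical helper, not Nexus's code):

```typescript
// Hypothetical helper: build the model-discovery URL for a provider base URL.
// Assumes OpenAI-compatible servers list models at <root>/v1/models and
// Ollama lists them at <root>/api/tags.
function modelsUrl(baseUrl: string, kind: "ollama" | "openai-compatible"): string {
  // Drop trailing slashes, then a trailing "/v1", so "http://host:8080",
  // "http://host:8080/v1" and "http://host:8080/v1/" all share one root.
  const root = baseUrl.replace(/\/+$/, "").replace(/\/v1$/, "");
  return kind === "ollama" ? `${root}/api/tags` : `${root}/v1/models`;
}
```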

Other compatibility improvements:

  • llama-swap display names now appear correctly in the UI

  • the proper routing key is still used in API requests

  • per-attempt chat timeout increased from 60s to 300s

  • timeout remains configurable via CHAT_TIMEOUT_MS

  • model-loading retries increased to 5 attempts with 30-second intervals to better support slow model swaps
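The model-loading retry is a fixed-interval retry. Here is a generic sketch with the documented values as defaults; retryFixed is a hypothetical name, and retrying on any error (rather than only on "model is loading" responses) is a simplification.

```typescript
// Hypothetical fixed-interval retry helper; not Nexus's implementation.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryFixed<T>(
  fn: (attempt: number) => Promise<T>,
  attempts = 5,        // total tries, matching "5 attempts"
  intervalMs = 30_000, // wait between tries, matching "30-second intervals"
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Don't sleep after the final failed attempt.
      if (attempt < attempts) await sleep(intervalMs);
    }
  }
  throw lastError;
}
```

With those defaults, a model that never comes up spends 4 waits × 30 s = 2 minutes sleeping between the 5 attempts, on top of the time the attempts themselves take.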

Web Search Sources (v1.1.4)

After a web search completes, Nexus now shows a collapsible Sources section below the response.

For each SearXNG result that was used, this includes its title, URL, and snippet.

UI Improvements (v1.1.7 and v1.1.8)

Several quality-of-life improvements landed across the interface:

  • Collapsible settings sections: All Models tab sections now collapse and expand

  • Persistent section state: Collapse state survives page refresh

  • Persistent active tab: The selected tab (Chat, Models, or System) is remembered across refreshes

  • Discovered Models redesign: The old dense card grid was replaced with a provider-grouped collapsible list

  • Better model readability: Models are grouped by source, size pills are color-coded by parameter tier, and the active router model is highlighted

  • Copy code button: Every code block now gets a hover copy button with 2-second success feedback

Security Hardening (v1.1.8)

A full server-side security audit was performed. Major improvements include:

  • CORS: Origin is now echoed explicitly, and credentials are only allowed when a matching origin is present

  • SSRF protection: Cloud metadata endpoints such as AWS IMDS, GCP metadata, and the Kubernetes API are blocked

  • LAN access preserved: Private LAN IPs remain allowed by design so local providers still work

  • Security headers: Added CSP, X-Frame-Options, X-Content-Type-Options, HSTS, Referrer-Policy, and Permissions-Policy

  • Session cleanup: Expired sessions are swept hourly

  • Session caps: Maximum of 10 concurrent sessions per user; the oldest session is evicted on overflow

  • Password complexity: New passwords now require at least one uppercase letter, one lowercase letter, and one digit

  • Body limits: Global body limit reduced to 1 MB; chat and conversation routes keep 20 MB for vision/image use

  • API key decoupling: Changing the admin login password no longer breaks API clients using x-admin-key

  • Rate limiting: Password-change endpoint is now rate-limited alongside login protections
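The SSRF rule can be pictured as a hostname check that blocks metadata targets while letting private LAN addresses through. This is a sketch under assumptions, not the audited Nexus code: the blocklist covers the whole 169.254.0.0/16 link-local range (which contains 169.254.169.254, used by AWS IMDS and GCP metadata), metadata.google.internal, AWS's IPv6 IMDS address, and the in-cluster Kubernetes API names. It only checks the parsed hostname; a production guard should also resolve DNS and re-check the resulting IP, since a hostname that resolves to 169.254.169.254 would otherwise get through.

```typescript
// Hypothetical SSRF hostname guard; not Nexus Orchestrator's audited implementation.
// Checks the literal (parsed) hostname only; it does not resolve DNS.
const BLOCKED_HOSTS = new Set([
  "metadata.google.internal", // GCP metadata
  "kubernetes.default",
  "kubernetes.default.svc",
  "kubernetes.default.svc.cluster.local",
  "[fd00:ec2::254]",          // AWS IMDS over IPv6 (URL keeps the brackets)
]);

function isLinkLocalV4(host: string): boolean {
  // 169.254.0.0/16 includes 169.254.169.254 (AWS IMDS, GCP metadata).
  return /^169\.254\.\d{1,3}\.\d{1,3}$/.test(host);
}

function isBlockedTarget(rawUrl: string): boolean {
  const url = new URL(rawUrl);
  const host = url.hostname.toLowerCase().replace(/\.$/, ""); // ignore a trailing dot
  if (BLOCKED_HOSTS.has(host) || isLinkLocalV4(host)) return true;
  // Everything else, including private LAN ranges such as 192.168.x.x and
  // 10.x.x.x, is allowed so local providers keep working.
  return false;
}
```

Parsing through URL first also canonicalizes alternate IPv4 spellings, so http://2852039166/ (169.254.169.254 written as a single decimal number) is caught as well.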

Docker Image Size Reduction (v1.1.9)

Production images are now smaller because build-time dependencies are excluded from the runtime image.

Image size dropped from about 127 MB to about 86 MB.

Bug Fixes Since v1.1.3

  • Fixed Ollama being misidentified as a generic OpenAI-compatible provider, which prevented reasoning/thinking from being sent

  • Fixed the "localThinkingEnabled is not defined" scope error introduced during the thinking-toggle work

  • Fixed mixed content warnings on HTTPS deployments caused by a hardcoded http://localhost:11434 in the frontend bundle

  • Fixed FAST category routing so it no longer grabs prompts that actually need real answers

  • Fixed session state not being fully cleared on logout in some edge cases

