Unraid

Spitz12

Hi everyone,

I’ve been running Ollama in a Docker container on Unraid (version 6.12.15) using my AMD 7900 XT GPU, but I’m encountering a frustrating issue. Initially, the GPU works fine and is recognized by Ollama. However, after some time (or after processing a certain amount of data), the GPU experiences a hard crash and Ollama defaults back to CPU usage. Additionally, after running an AI model, the GPU usage remains at 100% indefinitely, even when no processes are running.

My Setup:
  • Unraid Version: 6.12.15
  • Motherboard: MSI PRO Z790-P WIFI DDR4
  • CPU: Intel Core i7-12700K
  • Primary GPU: AMD 7900 XT (also tested with 7800 XT, same issue)
  • Docker Container: Ollama (ollama/ollama:rocm)
  • AMD Vendor Reset Plugin installed
  • Docker Extra Parameters: --device=/dev/kfd

What’s Happening:
  1. When I first start the container, the GPU works fine and is recognized properly. Models load and process on the GPU as expected.
  2. After some time (or after processing a heavy model), the GPU crashes, and the container switches to CPU processing. The syslog shows errors like:

       [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
       [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
       amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
       amdgpu 0000:03:00.0: amdgpu: VRAM is lost due to GPU reset!

  3. After the crash, the GPU usage stays at 100% indefinitely, even when the container is stopped.
  4. I have the AMD Vendor Reset Plugin installed, but it doesn’t seem to resolve the high usage issue or prevent the crashes.
  5. Restarting the container does not fix the issue. I have to either reboot Unraid or physically remove power from the system to reset the GPU.

What I’ve Tried:
  • Passed the GPU to the container using: --device=/dev/kfd
  • Added the --runtime=rocm flag to ensure ROCm support.
  • Tried switching to an AMD 7800 XT to test if it was a hardware issue: same results.
  • Attempted to use the AMD Vendor Reset Plugin, but the GPU still locks at 100% usage until I physically power off the machine.
  • Manually stopped the container after heavy processing, but the GPU remains locked at 100% utilization.

Additional Notes:
  • This behavior happens regardless of the AI model being used in Ollama.
  • Even after stopping the container, the GPU remains locked at 100% usage until a full system reboot.
  • I’m wondering if this is a ROCm driver issue or something related to Unraid’s power management.

My Questions:
  1. Has anyone else experienced GPU resets with AMD cards while using Ollama or other AI workloads in Docker?
  2. Is there a way to force the GPU to reset using the AMD Vendor Reset Plugin without rebooting Unraid?
  3. Is there a known fix for the GPU staying at 100% usage indefinitely after use?
  4. Should I consider switching to a different container or driver to avoid these constant crashes?

I’m hoping someone here has a solution since my goal was to use the 7900 XT for heavy AI workloads, but it’s practically unusable in its current state. Any help would be greatly appreciated!

Attachment: syslog.txt
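
For anyone trying to correlate the crashes with workload, here is a minimal Python sketch that counts the four amdgpu messages quoted above in a batch of log lines. It is only an illustration, not a fix: the function name `scan_syslog` and the pattern labels are made up for this example, and the patterns cover only the messages from this post, so other amdgpu faults would go uncounted.

```python
import re

# Message fragments copied from the syslog excerpt in the post.
# The dictionary keys are hypothetical labels chosen for this sketch.
GPU_FAULT_PATTERNS = {
    "mes_timeout": re.compile(r"MES failed to response"),
    "unmap_failed": re.compile(r"failed to unmap legacy queue"),
    "mode1_reset": re.compile(r"GPU mode1 reset"),
    "vram_lost": re.compile(r"VRAM is lost due to GPU reset"),
}


def scan_syslog(lines):
    """Count how many lines contain each known amdgpu fault message."""
    counts = {name: 0 for name in GPU_FAULT_PATTERNS}
    for line in lines:
        for name, pattern in GPU_FAULT_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Demo on two of the lines quoted in the post.
sample = [
    "amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset",
    "amdgpu 0000:03:00.0: amdgpu: VRAM is lost due to GPU reset!",
]
for name, count in scan_syslog(sample).items():
    print(f"{name}: {count}")
```

To run it against a real log, pass an open file handle instead of the sample list, since a file object yields its lines one at a time. The path to use is an assumption on my part (on Unraid the live log is normally /var/log/syslog; the attached syslog.txt would work too).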
