Spitz12

Members

Joined
September 2, 2024
Last visited
October 26, 2025

AMD 7900 XT GPU Stops Working in Ollama + Stuck at 100% Usage After Use

Spitz12 posted a topic in General Support

Hi everyone,

I’ve been running Ollama in a Docker container on Unraid (version 6.12.15) using my AMD 7900 XT GPU, but I’m encountering a frustrating issue. Initially, the GPU works fine and is recognized by Ollama. However, after some time (or after processing a certain amount of data), the GPU experiences a hard crash and Ollama defaults back to CPU usage. Additionally, after running an AI model, the GPU usage remains at 100% indefinitely, even when no processes are running.

My Setup:
- Unraid Version: 6.12.15
- Motherboard: MSI PRO Z790-P WIFI DDR4
- CPU: Intel Core i7-12700K
- Primary GPU: AMD 7900 XT (also tested with 7800 XT, same issue)
- Docker Container: Ollama (ollama/ollama:rocm)
- AMD Vendor Reset Plugin Installed
- Docker Extra Parameters: --device=/dev/kfd

What’s Happening:
- When I first start the container, the GPU works fine, and it’s recognized properly. Models load and process on the GPU as expected.
- After some time (or after processing a heavy model), the GPU crashes, and the container switches to CPU processing.
- The syslog shows errors like:
    [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
    [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
    amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
    amdgpu 0000:03:00.0: amdgpu: VRAM is lost due to GPU reset!
- After the crash, the GPU usage stays at 100% indefinitely, even when the container is stopped.
- I have the AMD Vendor Reset Plugin installed, but it doesn’t seem to resolve the high usage issue or prevent the crashes.
- Restarting the container does not fix the issue; I have to either reboot Unraid or physically remove power from the system to reset the GPU.

What I’ve Tried:
- Passed the GPU to the container using: --device=/dev/kfd
- Added the --runtime=rocm flag to ensure ROCm support.
- Tried switching to an AMD 7800 XT to test if it was a hardware issue; same results.
- Attempted to use the AMD Vendor Reset Plugin, but the GPU still locks at 100% usage until I physically power off the machine.
- Manually stopped the container after heavy processing, but the GPU remains locked at 100% utilization.

Additional Notes:
- This behavior happens regardless of the AI model being used in Ollama.
- Even after stopping the container, the GPU remains locked at 100% usage until a full system reboot.
- I’m wondering if this is a ROCm driver issue or something related to Unraid’s power management.

My Questions:
- Has anyone else experienced GPU resets with AMD cards while using Ollama or other AI workloads in Docker?
- Is there a way to force the GPU to reset using the AMD Vendor Reset Plugin without rebooting Unraid?
- Is there a known fix for the GPU staying at 100% usage indefinitely after use?
- Should I consider switching to a different container or driver to avoid these constant crashes?

I’m hoping someone here has a solution since my goal was to use the 7900 XT for heavy AI workloads, but it’s practically unusable in its current state. Any help would be greatly appreciated!

syslog.txt
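For anyone trying to pin down when the resets happen, here is a rough Python sketch that counts the reset-related messages quoted in the post. The regular expressions are copied from those four syslog lines only; the event names (`mes_timeout`, `unmap_failed`, `mode1_reset`, `vram_lost`) and the helper functions are made up for illustration and are not part of any amdgpu or Unraid tooling. Other kernel versions may word these messages differently, so treat the patterns as assumptions.

```python
import re

# Patterns copied from the syslog excerpt in the post.
# The event names on the left are hypothetical labels, not driver terminology.
RESET_PATTERNS = {
    "mes_timeout": re.compile(r"MES failed to response msg=(\d+)"),
    "unmap_failed": re.compile(r"failed to unmap legacy queue"),
    "mode1_reset": re.compile(r"amdgpu: GPU mode1 reset"),
    "vram_lost": re.compile(r"VRAM is lost due to GPU reset"),
}


def classify_line(line):
    """Return the label of the first pattern that matches, or None."""
    for name, pattern in RESET_PATTERNS.items():
        if pattern.search(line):
            return name
    return None


def summarize(lines):
    """Count how often each reset-related event appears in the given lines."""
    counts = {name: 0 for name in RESET_PATTERNS}
    for line in lines:
        event = classify_line(line)
        if event is not None:
            counts[event] += 1
    return counts


# The four lines quoted in the post, used here as sample input.
SAMPLE = [
    "[drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] "
    "*ERROR* MES failed to response msg=3",
    "[drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] "
    "*ERROR* failed to unmap legacy queue",
    "amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset",
    "amdgpu 0000:03:00.0: amdgpu: VRAM is lost due to GPU reset!",
]

if __name__ == "__main__":
    for event, count in summarize(SAMPLE).items():
        print(f"{event}: {count}")
```

The same `summarize` function could be fed the lines of a saved syslog file instead of `SAMPLE`, which might help show whether the MES errors always precede the mode1 reset or whether the reset sometimes appears on its own.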
- March 9, 2025
- 4 replies
