I have removed the nvidia.conf file. I was planning to use the open source drivers in the future, and I didn't think leaving it in there would hurt.
I just changed it to CSM boot and it's working well. I am on the latest BIOS version, and I have already enabled 4G decoding and Resizable BAR.
When the crashes happened, TDARR was not transcoding any videos and was not using the GPU at all. I just had another crash: the TDARR docker was running but not doing anything, and GPU utilization was 0. Should I still try stopping the TDARR docker and testing?
It would be difficult, but I could try in a while from now. I was hoping I could pass the GPU through to a Windows 10 VM and stress test it that way. The load on the GPU has been relatively low: the most I've ever seen is 20% utilization and maybe 80 W. Dumb question, but even if this was caused by the GPU and the GPU went haywire, wouldn't at most the dockers using the GPU crash, like TDARR? Why would the kernel crash, when it doesn't (or shouldn't) rely on the GPU while running in headless mode?
During this latest crash, I stopped all docker containers and attempted a soft shutdown using the `powerdown` command and the `poweroff` command, but nothing had happened after 15 minutes of waiting. I'm not good at Linux, so I had to do a hard reboot.
I have attached a new diagnostics log, taken right after the crash.
dragon-diagnostics-20230302-0307.zip