LimesKey

Members
  • Posts

    7
  • Joined

  • Last visited

LimesKey's Achievements

Noob

Noob (1/14)

0

Reputation

  1. Any fix? I'm getting a very similar call trace
  2. I have syslog mirror to Unraid not flash, will that work? Actually the last time it crashed I put the GPU on another 500W PSU, so I have 2 x 500W PSUs, 1 is for the CPU and 2 HDDs, the other is for the GPU + another 2 HDDs. In a bit I will get a real server PSU with 1000w+ that would be a lot more efficient. It's a little janky but the GPU barely uses more than 70W while transcoding and overall it should not exceed the 500W of my PSU even if everything was on 1 PSU. Both PSUs are pretty-good quality corsair I believe and they are on a UPS.
  3. Had another crash again, this time it was when I know for sure there was lots of transcoding happening with TDARR. Perhaps then it does only crash when the GPU is being utilized? Yes I changed it to MACVLAN and noticed no difference and the crash still occurred. No reason, was just interested in it. Is there anything else I can try, can you link me to some general troubleshooting tips? Sorry for late reply, I've been busy.
  4. I have removed the nvidia.conf file, I was planning to use the open source drivers in the future. I didn't think it would hurt leaving it in there. I just changed it to CSM boot and it's working good, I am on the latest BIOS version, I have already enabled 4G decoding and Resizeable bar. When the crashes happened, TDARR was not transcoding any videos and was not using the GPU at all. I just had another crash now, the TDARR docker was launched but not doing anything. The GPU utilization was 0. Should I still try to stop the TDARR docker and test? It would be difficult, but I could try in a while from now. I was hoping I can passthrough the GPU into a Windows 10 VM and stress test it that way. The load on the GPU has been relatively low at 20% utilization max I've ever seen, and maybe maximum 80W I've ever seen too. Dumb question, but even if this was caused by the GPU and the GPU went haywire, wouldn't at most the dockers using the GPU crash, like TDARR? Why would the kernel crash as the kernel does/shouldn't rely on the GPU as it's running in headless mode? During the crash that happened again, I stopped all docker containers and attempted to do a soft shutdown using the `powerdown` command and the `poweroff` command but nothing happened after waiting 15 minutes. I'm not good at Linux so I had to do a hard reboot. I have a new diagnostics log for you, taken right after the crash. dragon-diagnostics-20230302-0307.zip
  5. I'm receiving some crashes from presumably the NVIDIA driver plugin in Unraid. I've run a short memtest already for about 24 hours, I can run one for longer if required. Pastebin for the entire crash is here. I did some research already and wasn't able to find much about this. My RAM is not ECC which I know is problematic but I'm hoping it's not a RAM issue. Feb 19 21:53:17 Dragon kernel: BUG: unable to handle page fault for address: ffffc90011110200 Feb 19 21:53:17 Dragon kernel: #PF: supervisor write access in kernel mode Feb 19 21:53:17 Dragon kernel: #PF: error_code(0x0002) - not-present page Feb 19 21:53:17 Dragon kernel: PGD 100000067 P4D 100000067 PUD 100180067 PMD 0 Feb 19 21:53:17 Dragon kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI Feb 19 21:53:17 Dragon kernel: CPU: 5 PID: 9577 Comm: nvidia-smi Tainted: P O 5.19.17-Unraid #2 Feb 19 21:53:17 Dragon kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING II, BIOS 5003 02/03/2023 I am on the latest version of the plugin, and have NVIDIA driver 525.89.02 installed on my 2070 Super GPU. I rarely use the GPU and only use it for TDARR. The diagnostic below are from right now and the crash happened a week ago so I'm not sure how reliable they will be. dragon-diagnostics-20230227-2250.zip
  6. I cannot seem to get it working, I think it's trying to connect to a IPV6 Node. I don't have IPV6 supported on my network, how do I turn off IPV6? connect() to [2002:c0a8:90:1:4d34:458a:178c:eb95]:22556 failed: Cannot assign requested address (99)