Alz7777


  1. I'm not convinced this is closed. I've been having these hangs with an i9 9900 on version 6.11.5.

     Jan 28 21:20:00 Skippy2 kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for CS error
     Jan 28 21:20:00 Skippy2 kernel: i915 0000:00:02.0: [drm] frigate.detecto[2023] context reset due to GPU hang
     Jan 28 21:20:00 Skippy2 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:8ed1fff2, in frigate.detecto [2023]

     This is causing frigate to stop working and restart. Happens every hour or so.
  2. So I've got an odd one. I use the iGPU for Plex, and it's set up via a direct path to /dev/dri/renderD128 in Preferences.xml. My machine also has 2 x 3060 Ti, which I use to play games and, once in a while, to transcode my library from h264 to h265.

     The problem is the following. I need nvidia-driver to be installed in order to transcode with tdarr, but whenever the driver is installed, Plex (docker) ignores my iGPU and tries to transcode using nvidia. That obviously fails, since no nvidia cards are passed to the container, and it falls back to software transcoding. If I uninstall the nvidia driver and reboot, Plex works as intended using the iGPU.

     I am NOT binding the nvidia GPUs to VFIO at boot in order for me to play games. The way I use the server is as follows:

     1. unraid sees both nvidia GPUs and I use them in tdarr docker containers
     2. when I start a VM with one of them passed through, the VM takes over and unraid no longer sees that GPU
     3. when I stop the VM, unraid sees the GPU again and it can be used normally
     4. the iGPU is only used for Plex

     I tried to force Plex to use the iGPU, but everything is ignored. It tries to use cuda hwaccel every time, no matter how I modify Preferences.xml or pass /dev/dri to it. Some reddit posts suggested deleting the codecs, which I did, but still no joy.

     Any ideas appreciated. Thanks
  3. @philbar715 what's your Plex version, if you use it? I've tried this strategy with a monitor plugged in, but the issue still happens for me. From what I've observed, 2 days without a hang might be a false positive; you need to hit it hard, e.g. try a few transcodes at a time, and it will happen faster. Thanks
  4. One thing that fixed this for me was disabling C-states in the BIOS. I'm using a dummy plug for the iGPU. Not great, but at least no crashes.
  5. It will solve the hangs, but it won't solve the hardware transcoding problem for 12th gen.
  6. So from my findings, I think the problem is not 100% kernel related. I found this very helpful thread on the media-driver repo: https://github.com/intel/media-driver/issues/1342 which explains (if you're patient enough to read it) that the system hang is solved by kernel 5.17+, but the actual hardware transcoding crash is related to ffmpeg (https://github.com/intel/media-driver/issues/1342#issuecomment-1106171903). Plex doesn't push anything to syslog for me, but I tried emby and managed to get a log identical to the one explained in that thread.

     Jun 1 15:02:41 Skippy kernel: i915 0000:00:02.0: [drm] Resetting vcs0 for preemption time out
     Jun 1 15:02:41 Skippy kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28fffffd, in ffmpeg [16514]
     Jun 1 15:02:49 Skippy kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28fffffd, in ffmpeg [16514]
     Jun 1 15:02:49 Skippy kernel: i915 0000:00:02.0: [drm] Resetting vcs0 for stopped heartbeat on vcs0
     Jun 1 15:02:49 Skippy kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on vcs0
     Jun 1 15:02:49 Skippy elogind-uaccess-command[17064]: Failed to reset ACL on /dev/dri/card0: Operation not supported
     Jun 1 15:02:49 Skippy elogind-uaccess-command[17065]: Failed to reset ACL on /dev/dri/card0: Operation not supported
     Jun 1 15:02:49 Skippy kernel: [drm:intel_gt_reset [i915]] *ERROR* Failed to reset GuC, ret = -110
     Jun 1 15:02:49 Skippy kernel: i915 0000:00:02.0: [drm] *ERROR* Failed to reset chip
     Jun 1 15:02:49 Skippy kernel: i915 0000:00:02.0: [drm:intel_gt_reset [i915]] CI tainted:0x9 by intel_gt_handle_error+0x343/0x530 [i915]
     Jun 1 15:02:49 Skippy kernel: [drm:__intel_gt_set_wedged [i915]] *ERROR* Failed to reset GuC, ret = -110
     Jun 1 15:02:49 Skippy kernel: i915 0000:00:02.0: [drm] ffmpeg[16514] context reset due to GPU hang
     Jun 1 15:02:52 Skippy ntpd[1407]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
     Jun 1 15:02:54 Skippy kernel: Fence expiration time out i915-0000:00:02.0:ffmpeg[16514]:3aa2!
     Jun 1 15:04:27 Skippy kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
     Jun 1 15:04:27 Skippy kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!
     Jun 1 15:04:27 Skippy kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
     Jun 1 15:04:27 Skippy kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!
     Jun 1 15:04:27 Skippy kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
     Jun 1 15:04:27 Skippy kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!
     Jun 1 15:04:27 Skippy kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
     Jun 1 15:04:27 Skippy kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!
     Jun 1 15:04:27 Skippy kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
     Jun 1 15:04:27 Skippy kernel: i915 0000:00:02.0: [drm] *ERROR* bcs0 TLB invalidation did not complete in 4ms!

     Although Plex has proprietary transcoding, it's still based on ffmpeg, so my thinking is that the same behaviour is happening. Luckily, upgrading the unraid kernel to 5.18 solved the freeze problem, but hardware transcoding still stops working with the above error, and I think we need to wait until Plex applies the ffmpeg fix to their own server.

     I'm using an i7 12700k on unraid 6.10.1 with the unofficial 5.18 kernel.
  7. I've got a 12700k with an Asus mobo. Shutting down a win10 VM with a 3060ti passed through freezes unraid with no error after some time, usually 1-2 hours, not immediately. Upgrading to 5.18 didn't change this for me (the same happens with old kernels; I tried 5.10 with unraid 6.9.2 and 5.15 with unraid 6.10). Somehow it freezes faster now than before: with older kernels it took ~5-6 hours until the freeze.
  8. Alz7777

    Hardware Error

    Same problem: my server freezes every 12-24h and I have to long-press the power button to stop and restart it. Initially I thought the record identifier section was unique, but after seeing this post I can confirm it's the same UUID value for me too. I've got a 12700k and an Asus Prime Z690M with 32GB of Ballistix RAM. I'm a bit worried that every restart will mess up my disks little by little.
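
For anyone comparing notes on the i915 hangs quoted in the posts above, here is a rough sketch of how the "GPU HANG" lines could be pulled out of a syslog and counted per process. It only assumes the line layout visible in those excerpts; the names `HANG_RE`, `find_gpu_hangs`, `hangs_per_process` and `SAMPLE` are made up for this example, and other kernel versions may format the message differently.

```python
import re
from collections import Counter

# Assumed layout, taken from the log excerpts quoted in the posts:
#   "<Mon> <day> <HH:MM:SS> <host> kernel: ... GPU HANG: ecode <code>, in <proc> [<pid>]"
HANG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:.*"
    r"GPU HANG: ecode (?P<ecode>[0-9a-f:]+), in (?P<proc>.+?)\s*\[(?P<pid>\d+)\]"
)


def find_gpu_hangs(lines):
    """Return one dict (ts, host, ecode, proc, pid) per 'GPU HANG' line."""
    hangs = []
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hangs.append(match.groupdict())
    return hangs


def hangs_per_process(hangs):
    """Count how many hangs were attributed to each process name."""
    return Counter(h["proc"] for h in hangs)


# Sample lines copied from the posts above (not a full log).
SAMPLE = [
    "Jan 28 21:20:00 Skippy2 kernel: i915 0000:00:02.0: [drm] Resetting rcs0 for CS error",
    "Jan 28 21:20:00 Skippy2 kernel: i915 0000:00:02.0: [drm] frigate.detecto[2023] context reset due to GPU hang",
    "Jan 28 21:20:00 Skippy2 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 9:1:8ed1fff2, in frigate.detecto [2023]",
    "Jun 1 15:02:41 Skippy kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28fffffd, in ffmpeg [16514]",
    "Jun 1 15:02:49 Skippy kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28fffffd, in ffmpeg [16514]",
]

if __name__ == "__main__":
    for hang in find_gpu_hangs(SAMPLE):
        print(hang["ts"], hang["host"], hang["proc"], hang["ecode"])
    print(hangs_per_process(find_gpu_hangs(SAMPLE)))
```

To run it against a real log, pass an open file instead of `SAMPLE`, e.g. `find_gpu_hangs(open(path))`, where `path` is wherever your syslog lives. Counting per process at least shows whether the hangs come from one container (frigate, ffmpeg) or from several.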