March 27, 2023
Mandatory info: Version 6.12.0-rc2.
I dumped all plugins that were causing display issues. I tried to set up Nvidia in Plex again, since it seemed to stop working after the update (most likely I messed it up somehow). I then tried dropping back to the Intel iGPU instead (that doesn't work either), and now the Nvidia card is no longer seen. One core (c12 & ht13) is always pegged at 100% no matter what. There is also a litany of repeated errors, and even after much Google-Fu I am unsure how to approach fixing them. Attached are my diags. Thanks in advance, all.
tower-diagnostics-20230327-1625.zip
March 28, 2023 (Author)
After another reboot, it sees the Nvidia card again. Sadly, the rest is unchanged. So, progress? lol
March 28, 2023
Mar 27 01:09:02 Tower kernel: pcieport 0000:00:01.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:01.0
Mar 27 01:09:02 Tower kernel: pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Mar 27 01:09:02 Tower kernel: pcieport 0000:00:01.0: device [8086:460d] error status/mask=00100000/00010000
Mar 27 01:09:02 Tower kernel: pcieport 0000:00:01.0: [20] UnsupReq (First)
Mar 27 01:09:02 Tower kernel: pcieport 0000:00:01.0: AER: TLP Header: 34000000 01000010 00000000 00000000
Mar 27 01:09:02 Tower kernel: nvidia 0000:01:00.0: AER: can't recover (no error_detected callback)
Mar 27 01:09:02 Tower kernel: pci 0000:01:00.1: AER: can't recover (no error_detected callback)
Mar 27 01:09:02 Tower kernel: pcieport 0000:00:01.0: AER: device recovery failed
Mar 27 01:09:03 Tower kernel: NVRM: GPU at PCI:0000:01:00: GPU-95ce5503-9b80-13d9-e9ce-953408e02ec4
Mar 27 01:09:03 Tower kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Mar 27 01:09:03 Tower kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Mar 27 01:09:03 Tower kernel: NVRM: A GPU crash dump has been created. If possible, please run
Mar 27 01:09:03 Tower kernel: NVRM: nvidia-bug-report.sh as root to collect this data before
Mar 27 01:09:03 Tower kernel: NVRM: the NVIDIA kernel module is unloaded.
What is the current problem? The only issue I see in the previous logs is the Nvidia driver crashing.
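The lines that matter in a syslog like this are the PCIe AER reports from the root port and the NVRM Xid codes (the Xid 79 line above is the "GPU has fallen off the bus" event). As a minimal sketch, assuming a plain-text syslog like the ones attached in this thread, the hypothetical helper `summarize_kernel_log` below counts those two kinds of lines so that repeated errors stand out. The regexes only cover the message formats quoted above; they are not an exhaustive list of kernel messages.

```python
import re
import sys
from collections import Counter

# Matches AER summary lines such as
# "AER: Uncorrected (Non-Fatal) error received" or "AER: Corrected error received".
AER_RE = re.compile(r"AER: (Corrected|Uncorrected(?: \((?:Non-Fatal|Fatal)\))?) error received")

# Matches NVIDIA driver Xid lines such as "NVRM: Xid (PCI:0000:01:00): 79, ...".
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+)")


def summarize_kernel_log(lines):
    """Count AER error reports by severity and NVIDIA Xid events by code and device."""
    counts = Counter()
    for line in lines:
        aer = AER_RE.search(line)
        if aer:
            counts[f"AER {aer.group(1)}"] += 1
        xid = XID_RE.search(line)
        if xid:
            counts[f"Xid {xid.group(2)} on {xid.group(1)}"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python3 summarize_log.py syslog.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            for key, n in summarize_kernel_log(fh).most_common():
                print(f"{n:6d}  {key}")
    else:
        print("usage: summarize_log.py <syslog file>")
```

If the AER count keeps growing while the Xid 79 count stays at one, that would point at the PCIe link rather than the driver, but that reading is a heuristic, not a diagnosis.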
March 28, 2023 (Author)
I will check again. Thank you, Jorge.
tower-syslog-20230328-2032.zip
Edited March 28, 2023 by LordShad0w: syslog
March 29, 2023 (Author)
Any ideas? Should I be worried that a device in my server is failing? Thanks.
tower-syslog-20230329-1054.zip
March 29, 2023
You can try disabling ASPM (PCIe Active State Power Management) to see if it helps with the PCIe errors: https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/?do=findComment&comment=1165009
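On Linux, ASPM is usually disabled with the kernel boot parameter `pcie_aspm=off`; on Unraid that parameter would typically go on the `append` line of the flash drive's syslinux configuration. That this is exactly what the linked post recommends is an assumption here. After rebooting, a quick way to confirm the parameter actually took effect is to check the running kernel's command line. The sketch below does that; `aspm_disabled` is a hypothetical helper name.

```python
from pathlib import Path


def aspm_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains the pcie_aspm=off parameter."""
    # Split on whitespace so e.g. "pcie_aspm=off2" or "xpcie_aspm=off" do not match.
    return "pcie_aspm=off" in cmdline.split()


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")  # standard Linux location of the boot command line
    if cmdline_path.exists():
        print("pcie_aspm=off active:", aspm_disabled(cmdline_path.read_text()))
    else:
        print("/proc/cmdline not found (not running on Linux?)")
```

If it prints `False` after a reboot, the edited boot entry was probably not the one that was used.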