KingHawk Posted March 1 (edited)

I've had several problems this week, so I'm looking for advice on what to do next. Short summary of what happened:

- Wednesday evening: sudden crash of Unraid (no logs).
- Thursday morning: the RAID-1 SSD pool showed corruption errors and the GPU fell off the bus for the first time (I rebooted as a quick fix).
- Thursday afternoon: SSD pool corruption errors and the GPU falling off the bus again.
- Thursday evening: updated Unraid from .6 to .8 with no immediate errors. Afterwards I ran a scrub with error correction, but it didn't find any errors.
- Friday morning: the GPU fell off the bus, no corruption errors.

I've just rebooted hoping the GPU would show up again, but it didn't. I got a lot of errors like these (are they related?):

Mar 1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
Mar 1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.1: BAR 6: no space for [mem size 0x00080000 pref]
Mar 1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.1: BAR 6: failed to assign [mem size 0x00080000 pref]
Mar 1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 7: no space for [mem size 0x00100000 64bit]
Mar 1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 7: failed to assign [mem size 0x00100000 64bit]

Everything was running fine after I switched to the new GPU a while ago, so I wonder what changed to make the GPU start falling off the bus and then stop showing up completely.

Looking forward to any help in fixing these issues.

jj-silverstone-diagnostics-20240301-1507.zip
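For anyone triaging a similar syslog, here is a minimal Python sketch (not part of the original post) that groups the "failed to assign" lines by PCI device, so it is easier to see which devices and BARs are affected. The regular expression is an assumption based only on the five lines quoted above; other kernels may word these messages differently.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   pci 0000:06:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
# The pattern is inferred from the excerpt above, not from a kernel spec.
BAR_FAIL = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): failed to assign \[mem size (?P<size>0x[0-9a-f]+)"
)

def failed_bars(lines):
    """Return {device: sorted [(bar_index, size_in_bytes), ...]} for failed BARs."""
    found = defaultdict(set)
    for line in lines:
        m = BAR_FAIL.search(line)
        if m:
            found[m["dev"]].add((int(m["bar"]), int(m["size"], 16)))
    return {dev: sorted(bars) for dev, bars in found.items()}

# The five lines quoted in the post.
SAMPLE = [
    "Mar 1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]",
    "Mar 1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.1: BAR 6: no space for [mem size 0x00080000 pref]",
    "Mar 1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.1: BAR 6: failed to assign [mem size 0x00080000 pref]",
    "Mar 1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 7: no space for [mem size 0x00100000 64bit]",
    "Mar 1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 7: failed to assign [mem size 0x00100000 64bit]",
]

if __name__ == "__main__":
    for dev, bars in failed_bars(SAMPLE).items():
        for bar, size in bars:
            print(f"{dev}: BAR {bar} ({size // 1024} KiB) could not be assigned")
```

On a live system the same function could be fed the lines of the syslog file instead of `SAMPLE`.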
KingHawk Posted March 5 (edited)

I've managed to fix the "BAR: failed to assign" warnings with the following BIOS settings:

1) Above 4G Decoding - ENABLED
2) Re-Size BAR Support - ENABLED
3) SR-IOV Support - ENABLED
4) Hot-Plug Support - DISABLED

I've also added 'pci=realloc=off' to '/boot/syslinux/syslinux.cfg' as described here:

So now I'll just wait and see if everything stays stable with this setup.
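The post does not show the edited file, so the following is only a rough Python sketch of how a kernel parameter such as 'pci=realloc=off' could be added to the `append` lines of a syslinux-style config. The sample config text and the helper name `add_kernel_param` are hypothetical, and the sketch touches every `append` line, which may be broader than what the poster did by hand.

```python
def add_kernel_param(cfg_text, param):
    """Add `param` to each 'append' line that does not already contain it.

    Assumption: boot entries pass kernel arguments on lines that start
    with 'append ', as in a typical syslinux configuration.
    """
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("append ") and param not in stripped.split():
            line = f"{line.rstrip()} {param}"
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical config excerpt, for illustration only.
SAMPLE_CFG = (
    "label Unraid OS\n"
    "  menu default\n"
    "  kernel /bzimage\n"
    "  append initrd=/bzroot\n"
)

if __name__ == "__main__":
    print(add_kernel_param(SAMPLE_CFG, "pci=realloc=off"))
```

Running the helper a second time on its own output changes nothing, so it is safe to re-apply. Back up the real file before modifying it.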
KingHawk Posted March 6

Sadly, I'm still having trouble with the GPU falling off the bus, this time after it had worked normally for 22 hours.

Mar 6 22:09:42 JJ-SILVERSTONE kernel: NVRM: GPU at PCI:0000:01:00: GPU-33d616df-a0e8-4c9a-3c11-0cad75613c6e
Mar 6 22:09:42 JJ-SILVERSTONE kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Mar 6 22:09:42 JJ-SILVERSTONE kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Mar 6 22:09:42 JJ-SILVERSTONE kernel: NVRM: A GPU crash dump has been created. If possible, please run
Mar 6 22:09:42 JJ-SILVERSTONE kernel: NVRM: nvidia-bug-report.sh as root to collect this data before
Mar 6 22:09:42 JJ-SILVERSTONE kernel: NVRM: the NVIDIA kernel module is unloaded.
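Since the failure only shows up after many hours, it can help to pull every Xid event out of the syslog and compare timestamps across reboots. A minimal Python sketch under that assumption follows (not from the original post). The pattern is modelled on the single Xid line quoted above, and the timestamp is taken to be the first three whitespace-separated fields of the line.

```python
import re

# Matches lines such as:
#   NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, ...
# Inferred from the excerpt above; other driver versions may differ.
XID = re.compile(r"NVRM: Xid \(PCI:(?P<dev>[0-9a-fA-F:.]+)\): (?P<code>\d+),")

def xid_events(lines):
    """Return a list of (timestamp, pci_device, xid_code) tuples."""
    events = []
    for line in lines:
        m = XID.search(line)
        if m:
            timestamp = " ".join(line.split()[:3])  # e.g. "Mar 6 22:09:42"
            events.append((timestamp, m["dev"], int(m["code"])))
    return events

# Two of the lines quoted in the post.
SAMPLE_LOG = [
    "Mar 6 22:09:42 JJ-SILVERSTONE kernel: NVRM: GPU at PCI:0000:01:00: GPU-33d616df-a0e8-4c9a-3c11-0cad75613c6e",
    "Mar 6 22:09:42 JJ-SILVERSTONE kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.",
]

if __name__ == "__main__":
    for when, dev, code in xid_events(SAMPLE_LOG):
        print(f"{when}: Xid {code} on {dev}")
```

In the quoted log, Xid 79 is the code attached to the "GPU has fallen off the bus" message.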
ich777 Posted April 3 (Solution)

@KingHawk this is now solved in this post, correct:
KingHawk Posted April 3

4 hours ago, ich777 said: @KingHawk this is now solved in this post, correct:

Yes, it is, but I didn't mark it or add additional info to this post because I thought I had deleted it 😮
ich777 Posted April 3

4 hours ago, KingHawk said: Yes it is but I didn't mark it or add additional info to this post because I thought I deleted it 😮

Just mark it as solved.