
GPU fails to show up after reboot


Solved by ich777


I've had several problems this week, so I'm trying to work out what to do next.
Short summary of what happened:

- Wednesday evening: sudden crash of Unraid (no logs);

- Thursday morning: the RAID-1 SSD pool showed corruption errors and the GPU fell off the bus for the first time (rebooted as a quick fix);

- Thursday afternoon: SSD pool corruption errors and the GPU falling off the bus again;

- Thursday evening: updated Unraid from .6 to .8 with no immediate errors; afterwards I ran a scrub with error fixing enabled, but it found no errors;

- Friday morning: GPU fell off the bus, no corruption errors.

I've just rebooted hoping the GPU would show up again but it didn't.
I got a lot of errors like these (are they related?):
 

Mar  1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
Mar  1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.1: BAR 6: no space for [mem size 0x00080000 pref]
Mar  1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.1: BAR 6: failed to assign [mem size 0x00080000 pref]
Mar  1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 7: no space for [mem size 0x00100000 64bit]
Mar  1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 7: failed to assign [mem size 0x00100000 64bit]
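
To see at a glance which devices and BARs these messages refer to, the lines can be grouped with a small script. This is only a rough sketch: the regular expression is an assumption fitted to the five lines quoted above, and `summarize_bar_failures` is a hypothetical helper name, not part of any Unraid tool.

```python
import re
from collections import defaultdict

# Fitted to lines like:
#   pci 0000:06:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
# (assumption: other kernel versions may word these messages differently)
BAR_FAILURE = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): (?:no space for|failed to assign) "
    r"\[mem size (?P<size>0x[0-9a-f]+)"
)


def summarize_bar_failures(lines):
    """Return {pci_address: {bar_index: size_in_bytes}} for failed BARs."""
    failures = defaultdict(dict)
    for line in lines:
        match = BAR_FAILURE.search(line)
        if match:
            failures[match["dev"]][int(match["bar"])] = int(match["size"], 16)
    return {dev: dict(bars) for dev, bars in failures.items()}


# The syslog lines quoted in the post above.
sample = [
    "Mar  1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]",
    "Mar  1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.1: BAR 6: no space for [mem size 0x00080000 pref]",
    "Mar  1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.1: BAR 6: failed to assign [mem size 0x00080000 pref]",
    "Mar  1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 7: no space for [mem size 0x00100000 64bit]",
    "Mar  1 14:57:30 JJ-SILVERSTONE kernel: pci 0000:06:00.0: BAR 7: failed to assign [mem size 0x00100000 64bit]",
]

summary = summarize_bar_failures(sample)
for dev, bars in sorted(summary.items()):
    for bar, size in sorted(bars.items()):
        print(f"{dev} BAR {bar}: {size // 1024} KiB not assigned")
```

Comparing the reported addresses with `lspci` output would then show which physical device each failure belongs to.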


Everything ran fine after I switched to the new GPU a while ago, so I wonder what changed to make the GPU start falling off the bus and now stop showing up completely.
Looking forward to any help in fixing these issues.

jj-silverstone-diagnostics-20240301-1507.zip

Edited by KingHawk

I've managed to get rid of the "BAR ... failed to assign" warnings with the following BIOS settings:
1) Above 4G Decoding - ENABLED
2) Re-Size BAR Support - ENABLED
3) SR-IOV Support - ENABLED
4) Hot-Plug Support - DISABLED
 

I've also added 'pci=realloc=off' to '/boot/syslinux/syslinux.cfg' as described here:


So now I'll just wait and see if everything stays stable with this setup.
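
For reference, the syslinux change amounts to appending the parameter to the kernel `append` line. A minimal sketch of that edit, assuming a typical boot entry layout: the `sample_cfg` excerpt is illustrative rather than copied from this server, and `add_kernel_param` is a hypothetical helper.

```python
def add_kernel_param(cfg_text, param):
    """Add `param` to every 'append' line that does not already contain it.

    Note: this touches all boot entries with an 'append' line; restrict it
    to one label if only the default entry should change.
    """
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("append ") and param not in stripped.split():
            line = f"{line.rstrip()} {param}"
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative excerpt only (assumption); the real file lives at
# /boot/syslinux/syslinux.cfg and usually has several labels.
sample_cfg = """label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
"""

patched = add_kernel_param(sample_cfg, "pci=realloc=off")
print(patched)
```

Running the function a second time leaves the text unchanged, so the parameter is never duplicated. A reboot is needed before the kernel picks up the new parameter.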

Edited by KingHawk

Sadly, I'm still having trouble with the GPU falling off the bus, this time after it worked normally for 22 hours.
 

Mar  6 22:09:42 JJ-SILVERSTONE kernel: NVRM: GPU at PCI:0000:01:00: GPU-33d616df-a0e8-4c9a-3c11-0cad75613c6e
Mar  6 22:09:42 JJ-SILVERSTONE kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Mar  6 22:09:42 JJ-SILVERSTONE kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Mar  6 22:09:42 JJ-SILVERSTONE kernel: NVRM: A GPU crash dump has been created. If possible, please run
Mar  6 22:09:42 JJ-SILVERSTONE kernel: NVRM: nvidia-bug-report.sh as root to collect this data before
Mar  6 22:09:42 JJ-SILVERSTONE kernel: NVRM: the NVIDIA kernel module is unloaded.
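
To track how often this happens across reboots, the Xid lines can be pulled out of a saved syslog. A rough sketch, assuming the message format matches the excerpt above; `find_xid_events` is a hypothetical helper and the pattern is fitted only to these lines.

```python
import re

# Fitted to lines like:
#   Mar  6 22:09:42 HOST kernel: NVRM: Xid (PCI:0000:01:00): 79, pid=...
# (assumption: the syslog timestamp prefix is "Mon DD HH:MM:SS")
XID_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8}) .*"
    r"NVRM: Xid \(PCI:(?P<dev>[^)]+)\): (?P<code>\d+),"
)


def find_xid_events(lines):
    """Return a list of (timestamp, pci_address, xid_code) tuples."""
    events = []
    for line in lines:
        match = XID_LINE.match(line)
        if match:
            events.append((match["ts"], match["dev"], int(match["code"])))
    return events


# Two of the syslog lines quoted in the post above.
sample = [
    "Mar  6 22:09:42 JJ-SILVERSTONE kernel: NVRM: GPU at PCI:0000:01:00: GPU-33d616df-a0e8-4c9a-3c11-0cad75613c6e",
    "Mar  6 22:09:42 JJ-SILVERSTONE kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.",
]

events = find_xid_events(sample)
for ts, dev, code in events:
    print(f"{ts}: Xid {code} on {dev}")
```

In this excerpt the only event is Xid 79, which the driver itself describes as "GPU has fallen off the bus".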

 

  • 4 weeks later...
