
6.12.0 System hangs on boot after update


Go to solution Solved by JorgeB,


This morning, I updated to 6.12.0. After rebooting the server, the boot terminal gets to the following line and then hangs (the cursor stops blinking):

NET: Registered PF_UNIX/PF_LOCAL protocol family

 

I have tried booting into safe mode/no GUI, but it always hangs on the same line.

 

My processor is a 9th Gen Intel (not 11th Gen, which I understand has issues), and the BIOS config hasn't been changed since I performed the update.

 

Does anyone have any advice on how to proceed? Let me know if some other info/diagnostics would help to diagnose.

 

Thanks

Link to comment
17 minutes ago, JorgeB said:

Create a new flash drive with a stock v6.12.0 install, no key needed, and see if it boots, to check if the issue is config related.

 

Thanks, I created a fresh install on a different USB stick and that worked fine. How would you suggest I proceed? Is there a way of figuring out which config setting is causing the issue?

 

Link to comment
  • Solution

Back up the current flash drive, then re-create a new install and restore only the bare minimum: the key, super.dat and the pools folder for the assignments, plus the user-templates docker folder. Then re-test. If it still boots, and it should, you can then reconfigure the rest of the server or try restoring a few config files at a time to see if you can find the culprit.
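
For anyone who prefers to script the "bare minimum" restore from another machine, here is a minimal Python sketch. The folder layout is an assumption, not something stated in this thread: it expects the key file as `*.key`, `super.dat`, `pools/` and `plugins/dockerMan/templates-user/` directly inside the flash drive's `config` folder. `restore_minimum` and `MINIMAL_ITEMS` are hypothetical names used only for illustration, so check the paths against your own backup first.

```python
import shutil
from pathlib import Path

# ASSUMPTION: these relative paths reflect the usual layout of the flash
# drive's config folder; verify them against your own backup.
MINIMAL_ITEMS = ["super.dat", "pools", "plugins/dockerMan/templates-user"]


def restore_minimum(backup_config: Path, new_config: Path) -> list:
    """Copy only the bare-minimum items from a backed-up config folder.

    Returns the relative paths that were actually restored.
    """
    restored = []
    # The licence key is any *.key file at the top level of the config folder.
    candidates = [p.relative_to(backup_config) for p in backup_config.glob("*.key")]
    candidates += [Path(item) for item in MINIMAL_ITEMS]
    for rel in candidates:
        src = backup_config / rel
        if not src.exists():
            continue  # this item is not present in the backup, skip it
        dst = new_config / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
        restored.append(rel.as_posix())
    return restored
```

Everything else in the backup, including vfio-pci.cfg, is left out, so further files can be copied back a few at a time while re-testing the boot after each batch.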

Link to comment
1 hour ago, JorgeB said:

Back up the current flash drive, then re-create a new install and restore only the bare minimum: the key, super.dat and the pools folder for the assignments, plus the user-templates docker folder. Then re-test. If it still boots, and it should, you can then reconfigure the rest of the server or try restoring a few config files at a time to see if you can find the culprit.

Success!

 

After a lot of trial and error, I narrowed it down to one file. The culprit was vfio-pci.cfg. I haven't yet looked into what this is or why this caused issues - any idea? I haven't had the server up and running long, but nothing yet seems to have broken without this file.

 

In case someone has the same problem, my solution was to back up my original USB stick, do a fresh install on that stick, and copy over the contents of the backed-up config folder WITHOUT vfio-pci.cfg. It's possible that simply deleting this file from the original USB setup would also solve the issue, so I'd probably try that first.
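
If you would rather keep the file around than delete it, renaming works too. Below is a small Python sketch of that step, meant to be run from another computer with the USB stick plugged in. The function name `disable_vfio_config` and the `.bak` suffix are my own choices for illustration; the only things taken from this thread are the `config` folder and the file name `vfio-pci.cfg`.

```python
from pathlib import Path
from typing import Optional


def disable_vfio_config(config_dir: Path, suffix: str = ".bak") -> Optional[Path]:
    """Rename vfio-pci.cfg inside the given config folder.

    Returns the new path, or None if there was no vfio-pci.cfg to rename.
    """
    cfg = config_dir / "vfio-pci.cfg"
    if not cfg.exists():
        return None
    backup = cfg.with_name(cfg.name + suffix)
    if backup.exists():
        # Refuse to overwrite an earlier backup of the file.
        raise FileExistsError(f"{backup} already exists")
    cfg.rename(backup)
    return backup
```

Pass it the path to the `config` folder on the mounted stick. The original contents stay available in `vfio-pci.cfg.bak` for reference when you set up passthrough again.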

 

Thanks for your help.

Link to comment
21 minutes ago, elephantintheroom said:

After a lot of trial and error, I narrowed it down to one file. The culprit was vfio-pci.cfg.

It is worth noting that hardware IDs can change after an update, so it is always possible that the contents of this file are no longer valid if you are passing hardware through to a VM.

Link to comment
  • 2 weeks later...

I wouldn't say "highly likely", but "possible" : )

 

The thing is, the system does try to prevent problems here.

 

For each piece of hardware that you want to bind to vfio for passthrough to a VM, the vfio-pci.cfg config file includes both the Vendor ID (strictly, the vendor:device ID pair, 8086:125c in the example below) and the PCI ID (the device's address on the PCI bus, 0000:04:00.0):

BIND=0000:04:00.0|8086:125c

 

It is the PCI ID that can change when you move hardware around (or possibly after an OS upgrade).

 

If the system can't find a piece of hardware with the specified Vendor ID at the specified PCI ID, then it simply skips that entry in the config file (so it is not bound to vfio)
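
The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the actual boot script. It assumes that multiple entries on the `BIND=` line are separated by spaces, and the second device in the usage example (`0000:05:00.0|10de:1c82`) is made up. `parse_bind_line` and `split_bound_and_skipped` are hypothetical names.

```python
def parse_bind_line(line):
    """Parse 'BIND=<pci-id>|<vendor-id> ...' into a list of (pci_id, vendor_id) tuples.

    ASSUMPTION: entries are separated by whitespace.
    """
    key, _, value = line.strip().partition("=")
    if key != "BIND":
        return []
    entries = []
    for item in value.split():
        pci_id, _, vendor_id = item.partition("|")
        entries.append((pci_id, vendor_id.lower()))
    return entries


def split_bound_and_skipped(entries, present):
    """Apply the rule: bind only if the expected Vendor ID is found at the PCI ID.

    `present` maps each PCI ID currently in the system to its Vendor ID.
    """
    bound, skipped = [], []
    for pci_id, vendor_id in entries:
        if present.get(pci_id) == vendor_id:
            bound.append(pci_id)
        else:
            skipped.append(pci_id)
    return bound, skipped


# Usage: the second card has moved from 0000:05:00.0 to 0000:06:00.0.
entries = parse_bind_line("BIND=0000:04:00.0|8086:125c 0000:05:00.0|10de:1c82")
present = {"0000:04:00.0": "8086:125c", "0000:06:00.0": "10de:1c82"}
bound, skipped = split_bound_and_skipped(entries, present)
```

Here the first entry is bound and the second is skipped, which is the "one item skipped and another isn't" situation discussed in this thread.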

 

In theory, skipping a piece of hardware that moved to a new PCI ID shouldn't cause any issues.  Deleting vfio-pci.cfg means skipping everything, and that works. But it seems that with certain hardware, if one item from the config is skipped and another isn't, that prevents the system from booting?

 

I'm not sure how we could improve on this, but am certainly open to suggestions.

Link to comment
5 hours ago, ljm42 said:

I wouldn't say "highly likely", but "possible" : )

 

I agree that "Highly likely" is probably an overstatement :)   Perhaps the best thing in the short term is simply to add a warning to the release notes that this can happen?

 

I noticed that the 6.12.2 release did not seem to have the checkbox to confirm you have read the release notes. Is this intentional? I thought it was a good idea.

Link to comment
2 hours ago, itimpi said:

I noticed that the 6.12.2 release did not seem to have the checkbox to confirm you have read the release notes. Is this intentional? I thought it was a good idea.

 

This is new in 6.12.2 and only becomes visible when upgrading to the next version, e.g. 6.12.3.

 

Link to comment
  • 3 weeks later...

I did a BIOS update on my ASUS motherboard and Unraid got stuck at the exact same network message. It turns out the culprit was vfio-pci.cfg here as well. I guess the IOMMU groups really got messed up after the BIOS flash. Found this post and fixed it in under 10 minutes. Thank you all.

Link to comment
10 minutes ago, achent said:

It turns out the culprit was vfio-pci.cfg here as well. I guess the IOMMU groups really got messed up after the BIOS flash.

Any time you make an OS update, BIOS update or hardware change it is a good idea to assume that the hardware IDs could change so that the contents of the vfio-pci.cfg file are no longer correct.
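
One way to act on this is to record the hardware IDs before the change and compare afterwards. The Python sketch below is a rough illustration under stated assumptions: it reads the standard Linux sysfs files `vendor` and `device` under `/sys/bus/pci/devices/<address>/`, and the function names `snapshot_pci_ids` and `changed_addresses` are hypothetical. It only reports differences and does not touch vfio-pci.cfg.

```python
from pathlib import Path


def snapshot_pci_ids(sysfs_root=Path("/sys/bus/pci/devices")):
    """Map each PCI address to its 'vendor:device' ID as reported by sysfs."""
    ids = {}
    for dev in sorted(Path(sysfs_root).iterdir()):
        # sysfs stores the values as hex text such as '0x8086'.
        vendor = int((dev / "vendor").read_text().strip(), 16)
        device = int((dev / "device").read_text().strip(), 16)
        ids[dev.name] = f"{vendor:04x}:{device:04x}"
    return ids


def changed_addresses(before, after):
    """Return addresses whose ID differs or that appear in only one snapshot."""
    return sorted(
        addr
        for addr in before.keys() | after.keys()
        if before.get(addr) != after.get(addr)
    )
```

Take one snapshot before the update (for example, save the dictionary as JSON on another machine) and one after. Any address listed by `changed_addresses` that also appears in vfio-pci.cfg is worth re-checking before you rely on the old file.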

Link to comment
  • 4 weeks later...
  • 1 month later...

Hello, I did a BIOS update on my server and installed a new graphics card. As soon as I try to activate the graphics card under Tools / System Devices to pass it through to a VM, the system asks for a reboot. During this reboot, the server always hangs at the same point described above.
How can I fix this?

 

Thanks 

Link to comment
  • 3 weeks later...
On 6/18/2023 at 5:05 PM, elephantintheroom said:

Success!

 

After a lot of trial and error, I narrowed it down to one file. The culprit was vfio-pci.cfg. I haven't yet looked into what this is or why this caused issues - any idea? I haven't had the server up and running long, but nothing yet seems to have broken without this file.

 

In case someone has the same problem, my solution was to back up my original USB stick, do a fresh install on that stick, and copy over the contents of the backed-up config folder WITHOUT vfio-pci.cfg. It's possible that simply deleting this file from the original USB setup would also solve the issue, so I'd probably try that first.

 

Thanks for your help.

Thank you kind stranger. I was already losing my mind. I only get like 1 hour to tinker while the kid sleeps. Thanks to this thread it was actually a 2 minute fix. Renamed the vfio-pci.cfg and my server booted again. All great. THANK YOU!

Link to comment
  • 2 weeks later...
On 10/8/2023 at 6:54 AM, wastlfux said:

Thank you kind stranger. I was already losing my mind. I only get like 1 hour to tinker while the kid sleeps. Thanks to this thread it was actually a 2 minute fix. Renamed the vfio-pci.cfg and my server booted again. All great. THANK YOU!

What did you rename it to? I just upgraded from 6.11.5 to 6.12.4 and it's been rebooting for 1250 seconds.

Link to comment