Hard crash after upgrading from 6.5.3


joelones


I just upgraded to 6.7.2 and I'm experiencing random hard crashes (the system becomes unresponsive, always within the first 5 minutes of boot). I tried tailing the syslog to catch the culprit, to no avail. I'm running the LibreELEC DVB build along with two VMs (passing a quad-port NIC to pfSense, and a USB card along with a GPU to Windows).
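When a box freezes hard, the last syslog lines often never leave RAM, so a plain tail shows nothing useful. One possible workaround is to copy new syslog bytes to persistent storage and force them to disk after every write. The following is only a minimal sketch of that idea: the helper name `persist_new_lines` and both paths in the usage comment are assumptions, not anything confirmed in this thread, and it assumes a Python interpreter is available on the server.

```python
import os


def persist_new_lines(src_path, dst_path, offset=0):
    """Append bytes added to src_path since `offset` to dst_path, synced to disk.

    Returns the offset to pass to the next call.
    """
    size = os.path.getsize(src_path)
    if size < offset:
        # The source was rotated or truncated: start again from the top.
        offset = 0
    with open(src_path, "rb") as src:
        src.seek(offset)
        chunk = src.read()
    if chunk:
        with open(dst_path, "ab") as dst:
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())  # push the data out of the page cache
    return offset + len(chunk)


# Hypothetical usage (both paths are assumptions):
#   import time
#   pos = 0
#   while True:
#       pos = persist_new_lines("/var/log/syslog", "/boot/logs/syslog-copy.txt", pos)
#       time.sleep(1)
```

Polling once per second is a trade-off: anything logged in the final second before a freeze can still be lost, and frequent writes add wear if the destination is a flash drive.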

 

I tried changing "PCIe ACS override" from "Downstream" to "Both"; the previous setting on 6.5.3 was "Yes". The system has been rock stable on 6.5.3.

 

I'm running an old AMD FX8320 cpu and ASRock 970 EXTREME4 motherboard. No further BIOS updates for this mobo since 2015.

 

This is my append: append vfio-pci.ids=1b73:1100 pcie_acs_override=downstream initrd=/bzroot

Both pci.ids correspond to the NIC and the USB card, and I've also got a WinTV HVR-1600 card which I'm not passing through.
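As a quick sanity check on an append line like the one above, it can be split into its individual options to see exactly which IDs `vfio-pci` is told to claim. This is a minimal sketch: `parse_append` and `vfio_ids` are hypothetical helper names, and the only input is the line as posted. `vfio-pci.ids` takes a comma-separated list of vendor:device pairs, so a second device would show up as a second entry.

```python
def parse_append(line):
    """Split a syslinux 'append' line into bare flags and key=value options."""
    tokens = line.split()
    if tokens and tokens[0] == "append":
        tokens = tokens[1:]
    options = {}
    for tok in tokens:
        key, sep, value = tok.partition("=")
        # Bare flags (no '=') are stored as True.
        options[key] = value if sep else True
    return options


def vfio_ids(options):
    """Return the vendor:device pairs handed to vfio-pci as a list of strings."""
    raw = options.get("vfio-pci.ids", "")
    if not isinstance(raw, str):
        return []
    return [entry for entry in raw.split(",") if entry]


posted = "append vfio-pci.ids=1b73:1100 pcie_acs_override=downstream initrd=/bzroot"
opts = parse_append(posted)
print(vfio_ids(opts))             # ['1b73:1100']
print(opts["pcie_acs_override"])  # downstream
```

For the line as posted, this yields a single ID, so it may be worth confirming that every device meant for passthrough is actually listed.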

 

Any help is much appreciated.

 

  • 1 month later...

Today I also tried 6.8rc5, same thing. I ran memtest and also swapped the RAM, so memory is not the issue.

 

I also disabled the ACS override, same deal. Perhaps it's a hardware conflict of some sort. The slots are pretty much all occupied (https://imgur.com/a/8PNE6Co) except for the two PCI Express x1 slots.

 

So the rundown of cards:

  1. PCI Express 2.0 x1 (unoccupied)
  2. PCI Express 2.0 x16 (Radeon HD 6450)
  3. PCI (Dual NIC Intel Corporation 82546EB Gigabit Ethernet Controller)
  4. PCI Express 2.0 x1 (unoccupied)
  5. PCI Express 2.0 x16 (LSI SAS2008)
  6. PCI slot (WinTV HVR-1600)
  7. PCI Express 2.0 x16 (Quad NIC Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller)

The attached diagnostics are from 6.8rc5.

 

To note: as for networking, I'm running a dual NIC and a quad NIC and also using the on-board NIC. I'm running two VMs and only passing the quad NIC to pfSense. There are multiple dockers, including tvheadend, which uses the WinTV HVR-1600 on bare metal.

 

Any insights would be greatly appreciated. At this point, 6.5.3 has been rock solid, but I can't upgrade to any newer version.

@limetech

 

 


When a problem is complex I try to break it down into more manageable pieces, so what I would be doing is looking for stability in just the basic NAS function of Unraid. In other words, I'd stop all VMs and docker containers and see if I still get random crashes. Then add features back in one at a time to try to find the one that causes the problem.

1 hour ago, John_M said:

When a problem is complex I try to break it down into more manageable pieces, so what I would be doing is looking for stability in just the basic NAS function of Unraid. In other words, I'd stop all VMs and docker containers and see if I still get random crashes. Then add features back in one at a time to try to find the one that causes the problem.

Right, thanks for the reply. That's exactly what I tried to do, short of removing hardware. I stopped both the Docker and VM managers, and I believe I only noticed the freezing when I brought the pfSense VM back up with the passed-through quad NIC.

7 minutes ago, John_M said:

Maybe you've located the problem. Is there a configuration problem with the quad NIC passthrough? How do you specify the four individual NICs - has something changed with the upgrade that makes your method invalid?

Not individual NICs, it's a quad NIC (a single card) passed through to the pfSense VM via the append boot argument (vfio-pci.ids=1b73:1100). The passthrough seems to work as it did prior to the upgrade; the VM boots with the quad NIC as before. Nothing has really changed. I don't think I'm going to get to the bottom of this unless I start swapping or removing hardware and observing whether it still crashes. It appears to be software/kernel related: something about this config bothers newer kernels. It's just frustrating that it freezes without the slightest indication in the log of what the cause could be.
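Since passthrough depends on how the kernel groups devices, and the ACS override changes that grouping, one thing that could be compared between 6.5.3 and a newer release is the IOMMU group layout. The sketch below reads the standard Linux sysfs tree at `/sys/kernel/iommu_groups`; the function names are hypothetical, and nothing in this thread confirms that grouping is the cause of the freeze.

```python
import os


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled or not exposed by this kernel
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def shared_groups(groups):
    """Return only the groups that hold more than one device."""
    return {num: devs for num, devs in groups.items() if len(devs) > 1}


if __name__ == "__main__":
    for num, devs in sorted(iommu_groups().items()):
        print(f"group {num}: {', '.join(devs)}")
```

Saving this output under both Unraid versions and diffing the two would show whether a passed-through card ends up sharing a group with a device the host still uses.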


Since you upgraded from 6.5.3 to 6.7.2 have you considered trying either 6.6.7 or 6.8.0-rc5 instead? There are known problems with 6.7.x that are fixed in 6.8 and when they first came to light a lot of people rolled back to 6.6 for stability. If you don't mind using an rc then I'd recommend the 6.8 series. If you want to stay with only stable releases then I'd recommend using 6.6.7 until 6.8.0-stable is released.

Just now, John_M said:

Since you upgraded from 6.5.3 to 6.7.2 have you considered trying either 6.6.7 or 6.8.0-rc5 instead? There are known problems with 6.7.x that are fixed in 6.8 and when they first came to light a lot of people rolled back to 6.6 for stability. If you don't mind using an rc then I'd recommend the 6.8 series. If you want to stay with only stable releases then I'd recommend using 6.6.7 until 6.8.0-stable is released.

Thanks, but that's exactly what I tried today, 6.8rc5 after experiencing the same with 6.6.7/6.7.x.

27 minutes ago, John_M said:

If you don't start the pfSense VM does the server remain stable?

I mean, it appeared to be the case. But honestly, I did not run it for very long without bringing up the pfSense VM, so I cannot be 100% certain that it doesn't freeze with pfSense off. In my experience, though, it tends to freeze within a short amount of time once pfSense starts.

 

I'm probably going to pull out the legacy PCI NIC, get another PCI-E quad NIC, and use that instead. Perhaps something will change. I'm going to have to pull out the dGPU to do that, though, and hope that this mobo boots with no GPU.

On 11/13/2019 at 2:30 PM, John_M said:

Well, it's up to you how you proceed from here. I can understand that not having your router will be a nuisance (that's the reason mine is a physical box) but on the other hand you want to get to the bottom of the problem. Your call.

Thanks for your help, and yeah, I do have a physical pfSense backup box, so that's not the problem. It's more that everything else runs on unRAID (dockers, etc.), and I fear that I'll hose something in the process. But I do back up the flash before proceeding, so in every case I tried, I was able to revert to 6.5.3.

 

EDIT: Solved this by replacing the older PCI dual-NIC card with a PCI-E quad NIC and swapping my PCI-E Radeon HD 6450 for a cheap PCI Rage GPU for basic console output.
