HARDWARE ERROR and machine check events after board switch.


BenP511


Thanks for the speedy response!

I tried modprobe -r btusb. It removes the module, but then it gets re-added immediately?

It's a Z590 Aorus Master, so there is no option to disable WiFi/Bluetooth in the BIOS either. Do you think removing the WiFi card will help? That's a last resort, as it means disassembling the server again!
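A module that reappears right after modprobe -r is usually being reloaded automatically because its device is still present. One common way to prevent that is a modprobe.d configuration file. The sketch below only builds the text of such a file. The function name is hypothetical, and whether blocking btusb helps on this board was not tested in this thread; it would only keep the driver from loading, not fix the device itself.

```python
def modprobe_blacklist(modules, hard=False):
    """Return modprobe.d text that blocks automatic loading of the given modules.

    'blacklist NAME' stops alias-based autoloading of the module.
    With hard=True an 'install NAME /bin/false' override is added as well,
    so that explicit load requests for the module fail too.
    """
    lines = []
    for name in modules:
        lines.append(f"blacklist {name}")
        if hard:
            lines.append(f"install {name} /bin/false")
    return "\n".join(lines) + "\n"
```

Calling modprobe_blacklist(["btusb"], hard=True) gives two lines that can be saved as a .conf file in a modprobe.d directory and take effect the next time the module would be loaded.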


Just as a follow-up, in case anyone else has the same issue.

In the end I had to disassemble the entire system and remove the WiFi card under the I/O shield. Most people with Gigabyte Z590 Aorus Master boards who hit this issue might have to do the same. Let me know if an update fixes this. :)

 

Thanks again.

  • 1 month later...

Hello @all,

I'm copying my post from the German sub-forum here, since you already have experience with my new board.

 

I have now wasted 3 days of my life trying to get the new system to work, unfortunately without success. I found a lot of answers by reading the forum, but on this problem I am stuck.

I have built a new server:

 

Gigabyte Z590 Aorus Master @ BIOS F3 (stock), then F7 and F8 (latest); currently back on F7 (Hyper-Threading and Intel VT-d are enabled in the BIOS)

Intel i9-10900K
Nvidia 1660 
Mellanox ConnectX-3 Dual SFP+
New USB stick with Unraid 6.10 rc2; this was needed because the old one booted neither in UEFI nor in legacy mode (black screen)

 

After the new stick was created, I copied the config folder from the old Unraid 6.9.2 stick and booted in UEFI mode. I then bought a new license for the new stick and started the array. All good: everything is there, and Docker etc. works.

 

My VMs, on the other hand, do not. At least not the ones I pass a device through to. Every time I get the message:

 

internal error: Unknown PCI header type '127' for device '0000:01:00.0' for the 1660 GPU in the Windows VM

and

internal error: Unknown PCI header type '127' for device '0000:02:00.0' for passing the Mellanox NIC to my Xpenology VM.
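For anyone decoding this message: the header type is a single byte at offset 0x0E of a device's PCI configuration space, with the top bit used as the multi-function flag. A device that does not answer configuration reads returns all ones (0xFF), and 0xFF with the top bit masked off is exactly 127. So one plausible reading of the error is that the card was not responding at the moment the VM started, rather than that it has a genuinely odd header. That interpretation is an assumption and was not confirmed in this thread. A minimal sketch for checking it from the host, with hypothetical function names:

```python
from pathlib import Path

HEADER_TYPE_OFFSET = 0x0E   # position of the Header Type byte in config space
HEADER_TYPE_MASK = 0x7F     # bit 7 is the multi-function flag, not part of the type
KNOWN_LAYOUTS = {0: "endpoint", 1: "PCI-to-PCI bridge", 2: "CardBus bridge"}


def decode_header_type(config: bytes):
    """Return (layout number, description) from raw PCI config space bytes."""
    if len(config) <= HEADER_TYPE_OFFSET:
        raise ValueError("config space dump is too short")
    layout = config[HEADER_TYPE_OFFSET] & HEADER_TYPE_MASK
    return layout, KNOWN_LAYOUTS.get(layout, "unknown")


def device_responds(config: bytes) -> bool:
    """A vendor ID of 0xFFFF means nothing answered the config read."""
    return config[0:2] != b"\xff\xff"


def read_config(address, root="/sys/bus/pci/devices"):
    """Read the config space that Linux exposes in sysfs for one device."""
    return (Path(root) / address / "config").read_bytes()
```

On the host, decode_header_type(read_config("0000:01:00.0")) would be expected to return layout 0 for a healthy GPU or NIC. A result of 127 together with device_responds(...) being False would point at the card or its slot having dropped off the bus.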

 

The Linux VM boots normally. Windows and Xpenology do too, but only without passthrough.

 

I have read several threads about this error, and they said to use a different BIOS version... I have now been through three (see above) without success.

Do you guys have any advice for me?

 

Also, I get the following message on the dashboard. Did I run mcelog correctly here?

image.png.89ad39c1010e51f524496d077c85dcca.png

image.png.6a728c0ecdbb07ae2f1669be31493047.png

image.png.1ea4a8acf9a4c55900b0b3a4b690a21e.png

 

Stupid question: or is it not OK to simply copy my config folder from the 6.9.2 stick to the new 6.10 rc2 one?

 

The devices are of course selected for VFIO and passed through, and are thus excluded from Unraid at boot:

image.png.9437a328967c6c2be79894ec6cffce44.png
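Besides the settings page, the binding can be cross-checked on the running host: in sysfs every PCI device that has a driver attached carries a driver symlink whose target is named after that driver. The sketch below reads that link. The function name is hypothetical, and the addresses used are the ones from the error messages above.

```python
import os


def bound_driver(address, root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(root, address, "driver")
    if not os.path.islink(link):
        # No symlink: no driver is currently attached to the device.
        return None
    return os.path.basename(os.readlink(link))
```

For a device that is reserved for passthrough, bound_driver("0000:01:00.0") should return "vfio-pci". Any other name means a host driver claimed the card first.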

 

In the corresponding VMs everything is also set as it should be. Here is the Windows VM:

image.png.a89439ed32bf668b238429d25ab2e949.png

 

and here is the Xpenology VM:

image.png.d174964266279d958a243451b62aac0a.png

 

What is strange is that only the two devices in PCI slots 1 and 2 cause problems. In the third slot is the Dell 310 HBA, which is passed through as an LSI 9811 in IT mode for the 8 HDDs without any problems... maybe this is a piece of information that you experts can do something with. In the BIOS it says that slots 1 and 2 run at x8 and slot 3 at x4. That is also what I planned.

 

mgutt asked me: Have you tried it with your own GPU ROM?

-> Not yet, and that's the only thing I haven't tried with the GPU. But it doesn't explain why the Mellanox card (slot 2) is not passed through. For testing I have already installed an old Intel dual gigabit NIC in place of the GPU (slot 1), and that is not passed through either. What I still want to try, although it is laborious, is to plug each device to be passed through into slot 3 (the last slot), one after the other, and see whether it gets passed through there. I have the feeling that the problem has to do with slots 1 and 2.

Or just redo the stick and try the whole thing with a fresh installation.

 

mgutt said to me: Turn off CSM in the BIOS and set the primary GPU to the iGPU. Then boot unRAID in GUI mode and check if you see unRAID on the onboard HDMI and if the Nvidia HDMI stays black.

-> I have had it set that way from the beginning. The iGPU is set as the primary device, and the picture only comes out there when I boot into GUI mode (both in UEFI and with CSM on). Currently CSM is off.

I tried a few things yesterday evening:

  1. PCI Slot 1 - GPU removed. PCI Slot 2 - Mellanox ConnectX-3 MCX312A-XCBT left in, and the LSI 9811-IT in slot 3.
    -> Problem still exists.
  2. PCI Slot 1 - GPU left in. PCI Slot 2 - Mellanox ConnectX-3 removed, and the LSI 9811-IT in slot 3.
    -> The Windows VM starts without an error message, but the VNC output shows a black screen (GPU1: VNC & GPU2: Nvidia 1660). It also does not finish booting, because no RDP connection can be established. But at least there is no error 127... I think the reason is that I boot without a GPU BIOS.
  3. PCI Slot 1 - GPU removed and the Mellanox ConnectX-3 card plugged in instead; LSI 9811-IT left in slot 3.
    -> The Xpenology VM no longer throws the 127 error, but it still does not boot and gives the following error:
    image.png.10e408c24645306e6198f1a2ce4afd5b.png
     
  4. PCI Slot 1 - GPU left in. PCI Slot 2 - an old dual-port Intel NIC (1 Gbit). Slot 3 - the LSI 9811-IT.
    -> The Xpenology VM boots once with the NIC passed through, but is not found on the network. After I restarted the VM, the same error as at the beginning came back:
    internal error: Unknown PCI header type '127' for device

I am slowly starting to suspect that I have a problem with the PCI lanes. I changed the bifurcation setting from Auto to 8x8x4 or 8x4x4, but it didn't help: whenever the Mellanox card is in slot 2, the error Unknown PCI header type '127' for device appears. Or do I have to flash the Mellanox card with a certain "IT" firmware, similar to the LSI 9811-IT?

I want to mention that I have completely used up the connections of the board, and maybe there is a bottleneck somewhere that I don't know about. According to the datasheet, however, I think my configuration is feasible; I checked everything meticulously beforehand. Unraid itself recognizes everything, even the Mellanox ConnectX-3 Dual SFP+ card. Only passthrough does not work.
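To take the guesswork out of the lane question, the link width that each card actually negotiated can be read from sysfs and compared with what the BIOS reports (x8 for slots 1 and 2, x4 for slot 3). This is only a sketch with hypothetical function names. It reads the current_link_width and max_link_width attributes that Linux exposes for PCIe devices.

```python
from pathlib import Path


def link_widths(address, root="/sys/bus/pci/devices"):
    """Return (current, maximum) PCIe link width in lanes; None where unavailable."""
    device = Path(root) / address

    def read_int(name):
        try:
            return int((device / name).read_text().strip())
        except (OSError, ValueError):
            return None

    return read_int("current_link_width"), read_int("max_link_width")


def runs_below_capability(current, maximum):
    """True when the link negotiated fewer lanes than the card supports."""
    return current is not None and maximum is not None and current < maximum
```

Note that max_link_width describes the card, not the slot, so an x16 GPU in a slot bifurcated to x8 is expected to show 8 and 16. What would be suspicious is a width below the planned slot width, or a device for which the values cannot be read at all.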

 

I have drawn the entire system structure for illustration:

image.png.6cc7d92a7b9476554edd49bca3d31d93.png

 

If you see something that is not compatible, I would be grateful for advice. What surprises me is that all these components previously ran on my old ASUS Z170 Pro Gaming board, and there I also had two dual-port Intel NICs on top. Those are the only parts that were dropped; they have been replaced by the Mellanox ConnectX-3. It can't be that the Z590 chipset, unlike the Z170, can't handle the hardware... can it?

 

Here is the block diagram of z170 pro gaming for comparison:

image.png.cde3737e6282a02fe5b7bfbc93cdd09b.png

 

Understandably, I'm a bit frustrated now. I put a lot of money on the table last week and currently have a system that isn't even capable of passing through two PCI cards...

 

z590 board = 300€
10900K = 300€
Mellanox-X3 = 100€
10 GB switch = 100€
Unraid license = 100€

 

I really hope it is only a user error on my part, as I would very much like to keep the board.

 

Still grateful for any tip!

 
