
10g network card somehow causing array failures?


Spitko


Ok this one's spooky.

I'm in the middle of a network upgrade, and one of the tasks was to pop a 10g network card into the unraid box as I slowly move towards multi-gig.

This went mostly fine; I pulled the slot 1 GPU out and popped the ROG Areion my board came with in its place. I don't really use unraid in GUI mode so this is (mostly) fine. (I'll deal with the GPU passthrough issues later).

About an hour after installation, one of my drives had a few read errors and went into emulated mode.

Well.. that sucks, but wasn't TOO surprising, given it was the oldest drive in the array, and a holdover from initializing the box. A bit early for an Ironwolf but still under warranty. Except... no errors? 

All SMART tests passed, no bad sectors... every scan I could throw at it passed. I couldn't find (read: forgot to order) a cold spare and it was going to take 2 weeks for the one I ordered to arrive, so I figured I might as well rebuild it and see what happens. Rebuild goes fine. No issues.

Huh.


The next day I do a reboot while chasing down the exciting new GPU passthrough issues and suddenly a *different* drive is disabled. But... the logs don't in any way indicate why. Even the alert just says "disabled", no errors logged. All tests green as before. Diagnostics from immediately after I got the alert are attached, but I don't even see the disable event in the log, so maybe it happened during shutdown and I didn't get the alert until reboot?

At this point I'm fairly convinced somehow this NIC is causing the errors, but I don't have any mental model for *how*. The NIC itself was working fine and the second drive is now rebuilding without any incident. There are no weird errors in the log.

I've pulled the card for now because it's clearly dangerous, but this leaves me with two core problems:
1) I don't know how to validate the system stability with the card in
2) I don't know how to actually test the card without constantly putting the array into rebuild mode.

How should/can I proceed from here? I'd like to eventually have the card in service, but the current behavior just isn't tenable.
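For the "validate stability" half of that, one low-risk thing to try is sweeping the syslog for disk and PCIe complaints after each test run with the card installed. Below is a rough sketch only: the patterns are my guess at the kernel messages worth flagging (AER / PCIe bus errors, `ataN` exceptions, block I/O errors), and the default path `/var/log/syslog` is an assumption, so adjust both for your system. An empty result doesn't prove the card is safe; it only means nothing matching these patterns was logged.

```python
import re
import sys

# Illustrative patterns only (an assumption, not an exhaustive list).
SUSPECT_PATTERNS = [
    re.compile(r"\bAER\b"),
    re.compile(r"PCIe Bus Error"),
    re.compile(r"\bata\d+(\.\d+)?: .*(error|exception|failed)", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
]


def scan_lines(lines):
    """Return (line_number, text) for every line matching a suspect pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Assumed default location; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    try:
        with open(path, errors="replace") as handle:
            for number, text in scan_lines(handle):
                print(f"{number}: {text}")
    except OSError as exc:
        print(f"could not read {path}: {exc}")
```

Running it before and after a stress session (a big transfer over the 10g link while a parity check is running, say) at least gives you something to diff, without waiting for a drive to get kicked.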

 

Edited by Spitko

I moved the 10g card to the x4 link on the chipset, and that appears to have solved the issue. Hopefully that helps someone else should they stumble across this thread. X399 has a single dedicated x4 outbound link, so odds are if your motherboard has exactly one x4 slot just hanging around, it's probably that one. In the case of the ROG Zenith Extreme, it's slot 3.
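If you want to confirm what link a card actually negotiated after moving it, Linux exposes that per device in sysfs. Here's a minimal sketch that reads the standard `current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width` attributes under `/sys/bus/pci/devices`. The function names are my own, and keep in mind the "max" values describe what the device itself supports, not what the slot is wired for. This only reports where the card landed; it doesn't explain why the original slot misbehaved.

```python
from pathlib import Path

LINK_ATTRS = ("current_link_speed", "current_link_width",
              "max_link_speed", "max_link_width")


def read_link_info(device_dir):
    """Read the PCIe link attributes for one device directory (None if absent)."""
    device_dir = Path(device_dir)
    info = {}
    for name in LINK_ATTRS:
        try:
            info[name] = (device_dir / name).read_text().strip()
        except OSError:
            info[name] = None
    return info


def _speed_gts(text):
    # Speeds look like "8.0 GT/s PCIe"; keep only the leading number.
    return float(text.split()[0])


def is_downgraded(info):
    """True if the link runs narrower or slower than the device supports.

    Returns None when the attributes are missing or unparseable.
    """
    if any(info[name] is None for name in LINK_ATTRS):
        return None
    try:
        narrower = int(info["current_link_width"]) < int(info["max_link_width"])
        slower = (_speed_gts(info["current_link_speed"])
                  < _speed_gts(info["max_link_speed"]))
    except (ValueError, IndexError):
        return None
    return narrower or slower


if __name__ == "__main__":
    root = Path("/sys/bus/pci/devices")
    if root.is_dir():
        for dev in sorted(root.iterdir()):
            info = read_link_info(dev)
            if info["current_link_width"] is None:
                continue  # not a PCIe device with link attributes
            flag = "downgraded" if is_downgraded(info) else ""
            print(dev.name, info["current_link_speed"],
                  "x" + info["current_link_width"], flag)
```

A card showing up as downgraded isn't necessarily a problem (an x8 card in an x4 slot will always report that), but it's a quick way to check which slot you really ended up on.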
