
Drives connected via HBA won't show up in Unraid



Hello everyone,

 

Quick hardware info:

  • CPU: AMD Ryzen Threadripper 1920X
  • Motherboard: ASRock X399 Taichi
  • RAM: 64 GB ECC
  • HBA: LSI SAS 9200-16e (the system reports it as an LSI SAS 9201-16i?)

 

I've had this issue for months now. I have a JBOD with 10 drives connected to my HBA via SFF-8088 cables. After a dirty shutdown (which sadly happened two days ago because of a power loss), Unraid no longer detects those drives when I restart it. A normal restart doesn't cause this issue. In the past, when this happened, I tried unplugging and replugging the cables, restarting the JBOD, or moving the HBA to a different PCIe slot. Normally, after a while, this works, and after a restart the drives appear correctly again. Yesterday and today, however, it didn't...

 

I checked whether the HBA itself might not be detected, but that doesn't seem to be the case. Running the "lspci -v" command shows it as follows (first two lines only):

        Serial Attached SCSI controller: Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)
        Subsystem: Broadcom / LSI SAS 9201-16i

It looks like my card is properly detected, is supported according to the Unraid Hardware Compatibility Guide, and sits in a PCIe x8 slot (I checked that).
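To double-check the slot, the negotiated link width can be read from the "LnkSta:" line of "lspci -vv" ("LnkCap:" is what the card supports, "LnkSta:" is what it actually negotiated). A minimal shell sketch, assuming the HBA address 08:00.0 used later in this thread and that the command runs as root, since lspci hides the capability lines from other users:

```shell
#!/usr/bin/env bash
# Sketch: extract the negotiated PCIe link width from `lspci -vv` output.
# Only the "LnkSta:" line is used; "LnkCap:" (supported width) and
# "LnkSta2:" do not match the pattern.
link_width() {
  # stdin: output of `lspci -vv -s <bus:dev.fn>`; prints e.g. "x8"
  sed -n '/LnkSta:/s/.*Width \(x[0-9]*\).*/\1/p' | head -n 1
}
```

Usage: "lspci -vv -s 08:00.0 | link_width". A result narrower than x8 on this card would suggest the slot or its lane-splitting setting is limiting it.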

 

I don't really know what else to try... Does anyone here have an idea what's causing this issue? Like I said, it only happens after dirty shutdowns, and it randomly starts working again after restarting a couple of times and replugging some cables... I hope the diagnostics file helps.

subaru-diagnostics-20230508-2140.zip


You have this line in the syslog:

May  8 21:36:51 Subaru kernel: mpt2sas_cm0: doorbell handshake int failed (line=6884)

 

Googling indicates that this could be a problem with the card itself or with its firmware. I would suggest that you google the error message. (Flaky hardware can be a bitch. The fact that this is an LSI SAS 9200-16e (it really does have external connectors???) means that your buying options will probably be limited. Remember that when buying older LSI cards, you are buying the vendor and not the hardware!)
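To check whether that error reappears after each boot, the driver's lines can be filtered out of the log. A minimal sketch; it assumes the log lines carry the "mpt2sas_cmN" (or "mpt3sas_cmN") tag seen in the line above, and that the live log is /var/log/syslog, as it normally is on Unraid:

```shell
#!/usr/bin/env bash
# Sketch: show failure messages from the LSI SAS driver in a kernel log.
# Keeps lines tagged with the controller name (mpt2sas_cm0 as above,
# mpt3sas_cmN on SAS3 cards) that mention a failure (case-insensitive).
hba_log_errors() {
  # stdin: syslog or dmesg text
  grep -E 'mpt[23]sas_cm[0-9]+' | grep -i 'fail'
}
```

Usage: "hba_log_errors < /var/log/syslog" or "dmesg | hba_log_errors". No output means no matching failure lines were logged.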

21 hours ago, Frank1940 said:

You have this line in the syslog:

May  8 21:36:51 Subaru kernel: mpt2sas_cm0: doorbell handshake int failed (line=6884)

 



I really don't want to sound stupid, but what exactly do you mean by "your buying options will probably be limited" and "it really does have external connectors???"?

I did some googling, and yeah, most people also write about hardware or firmware being the issue. So, from what I understand, my only real options are either buying a different HBA or trying to upgrade to the newest firmware?
I tried to follow this guide to update my firmware, but there is still an issue I haven't found an answer to...

I'm very sure that I got the correct firmware packages and ran "chmod +x sas2flash" to make the utility executable. However, when I run "./sas2flash -listall", I get this error message:

 

LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

    No LSI SAS adapters found! Limited Command Set Available!
    ERROR: Command Not allowed without an adapter!
    ERROR: Couldn't Create Command -listall
    Exiting Program.

 

For other people with this issue, the cause was the wrong flashing utility or the wrong version of it, but that can't be the case for me, can it? My card is mpt2sas; even the error message you showed me says so, so sas2flash has to be the right tool, no?
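When scripting the flash steps, it may be worth stopping as soon as sas2flash reports no adapter, before any write command is issued. A minimal sketch that only checks for the message quoted above; it assumes sas2flash sits in the current directory, as in the post:

```shell
#!/usr/bin/env bash
# Sketch: succeed only if sas2flash output does NOT contain the
# "No LSI SAS adapters found" message quoted above.
adapter_visible() {
  # stdin: output of `./sas2flash -listall`
  ! grep -q 'No LSI SAS adapters found'
}
```

Usage: "./sas2flash -listall | adapter_visible || echo 'sas2flash sees no HBA, stopping'". As the later reply in this thread explains, this message appears when the HBA fails to initialize, so a different utility version would not change the result.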

 

There are 2 other things I found weird...

  1. When I run "lspci -s 08:00.0 -vv | more" (08:00.0 being my HBA) and scroll to the very bottom, it says "Kernel modules: mpt3sas". In a video by The Art of Server, he ran the same command but got different output at the bottom: "Kernel driver in use: mpt2sas" and "Kernel modules: mpt2sas". Obviously I won't get the kernel driver line, since the driver didn't load my card, but why am I getting mpt3sas instead of mpt2sas?
     
  2. When I run "lspci -v", it reports my card as an LSI SAS 9201-16i, but as far as I know the "i" stands for internal, and I can clearly see that my SFF-8088 ports are external, so I must actually have an LSI SAS 9201-16e card, no? Could this cause some kind of issue, or is it pretty irrelevant?
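The difference in point 1 can be checked directly: lspci prints a "Kernel driver in use:" line only when a driver is bound to the device, while "Kernel modules:" lists the modules that could handle it. A minimal sketch that reports the bound driver, assuming the HBA address 08:00.0 from the post:

```shell
#!/usr/bin/env bash
# Sketch: print the kernel driver bound to a PCI device, or "none".
driver_in_use() {
  # stdin: output of `lspci -k -s <bus:dev.fn>` (or `lspci -vv -s ...`)
  local drv
  drv=$(sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p' | head -n 1)
  # No "Kernel driver in use:" line means no driver has claimed the device.
  echo "${drv:-none}"
}
```

Usage: "lspci -k -s 08:00.0 | driver_in_use". On the Unraid box as described this would print "none"; where the card initializes, it prints the module name.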

Thank you for giving me a lead to look for a solution. Do you maybe have any idea why I can't flash the firmware? Would flashing it even help and what other options are there for me to make it work again? Thank you and hope you have a nice day!


I have never flashed an LSI card, so I really can't give you much advice. Here is a thread with an extensive discussion on the topic.

 

        https://forums.unraid.net/topic/97870-how-to-upgrade-an-lsi-hba-firmware-using-unraid/

 

@JorgeB is a Guru in this area and I have just pinged him.  If he does not reply in this thread in the next day, post in the thread above with your problem. 

 

23 minutes ago, SherryKDA said:

When I use the "lspci -v" command, it tells me my card is a LSI SAS 9201-16i, but as far as I know the i stands for internal, and I can clearly see that my 8088 ports are external, so I must actually have a LSI SAS 9201-16e card, no?

 

Yes, you are correct. The card is the external version, but it has been flashed with the internal version's firmware. I am not sure if there is an 'external' version or whether it makes any difference, but it is possible that it makes all the difference. That is where @JorgeB can help you out. (I do see that the version you have is an old one. You should have version 20.00.07.00, and that does make a difference!)

12 hours ago, JorgeB said:

Because the HBA is failing to initialize it's not being found by the flash util, try doing it in a different PC if available.

I installed the card in a different system running Fedora 37, and it worked. I used the commands from your guide without any errors, and the firmware and BIOS are on the newest version now.

 

Sadly, it seems this didn't solve any of my issues...
The message "Subaru kernel: mpt2sas_cm0: doorbell handshake int failed" still appears, and I still can't see any of my drives. Also, the card is still displayed as an LSI SAS 9201-16i?

 

Do you have any idea what else I could try to fix my issue? I've attached a diagnostics file from today as well; maybe you can have a look and see whether something changed. Thanks for replying, though.

subaru-diagnostics-20230510-2122.zip


I also ran the "lspci -s 08:00.0 -vv | more" command I mentioned above on the Fedora system. On my Unraid server the end of the output showed only "Kernel modules: mpt3sas", but on the Fedora system it shows both "Kernel driver in use: mpt3sas" and "Kernel modules: mpt3sas", so I guess it did initialize there?

Also, since I was able to flash it there, I guess it must have initialized? I also found no error messages like the one I get in Unraid when running "dmesg | grep mpt".

 

One other thing, though: my card should be using mpt2sas, or am I wrong? If so, why are both Unraid and Fedora loading/using the mpt3sas module? From what I understand, mpt3sas should only be used for PCIe 3.0 cards.
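As far as I know, the separate mpt2sas driver was merged into mpt3sas in Linux 4.4; the combined module also handles SAS2 (PCIe 2.0) controllers such as the SAS2116 and still names them "mpt2sas_cmN" in its log messages, which would fit both observations. One way to check which module claims the card is to look for its PCI ID in the module's alias list. A minimal sketch; the ID 1000:0064 is an assumption for the SAS2116 and should be confirmed with "lspci -nn -s 08:00.0":

```shell
#!/usr/bin/env bash
# Sketch: test whether a module's PCI aliases cover a vendor:device ID.
module_claims() {
  # $1: vendor ID, $2: device ID (4 hex digits each, as in `lspci -nn`)
  # stdin: output of `modinfo -F alias <module>`, lines like
  #        pci:v00001000d00000064sv*sd*bc*sc*i*
  grep -qi "^pci:v0000${1}d0000${2}"
}
```

Usage: "modinfo -F alias mpt3sas | module_claims 1000 0064 && echo 'mpt3sas claims 1000:0064'".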


My motherboard has four PCIe slots; I don't know their versions right now, but they are all 3.0 or 2.0. In the UEFI I also set the x16 lanes to x8x8 so that there wouldn't be a negotiation issue.

 

Before I flashed the card, I tried all of them and none worked. In the past, plugging it into a different slot sometimes worked, but no slot was consistent. Currently none work, though I've only tried one slot since the flash, in case that makes a difference.

 

I have heard that Ryzen and Epyc systems have issues with legacy PCIe 2.0 cards, but the solutions I found online, like changing x16 to x8x8, didn't help either.


Sadly, the ASRock x399 Taichi MB just doesn't play well with LSI controller cards.  Here's a conversation on Reddit that goes into greater detail:

 

https://www.reddit.com/r/unRAID/comments/98kdyp/lsi_920116i_and_asrock_taichi_x399/

 

I've got the same setup, and I've never gotten it to work.  I ended up going with dual AOC-SAS2LP-MV8 (Marvell based) controller cards.  That was many years ago, so there may be newer alternatives, but I know they work with this MB and a Threadripper 1950X.

 

The two Dell PERC H310 (LSI-based) controllers that the ASRock board didn't like work just fine in an ASUS MB, so I know they're functionally okay.

6 hours ago, JonathanM said:

Unfortunately Marvell has a bad reputation with linux compatibility. They work for some people, but other systems randomly drop drives.

Hmmm, they’re no longer running as my primary system, but they seemed okay before. 
 

Any idea how to get the ASRock x399 Taichi to play nice with LSI controllers?  Or what’s a good option outside of LSI and Marvell?

15 hours ago, ufopinball said:

Sadly, the ASRock x399 Taichi MB just doesn't play well with LSI controller cards. 

Hmm.. yeah I heard that Ryzen/Epyc support for the old cards is... poor. Sadly only started learning about that when it was too late.

 

I was planning on migrating my storage system to a different server anyways, the one where I installed Fedora and it worked with my card (Xeon btw), and use the Ryzen as a Virtualization host. So I guess I can still use my card after I migrated over to that server. Wanted to do this in a couple months tho for money reasons and keep this system running until then, but I guess I'll have to do it earlier now...

 

I have one last question, though, so I don't fuck up my data. If I take my drives out of the JBOD and connect them directly to my motherboard via SATA (plus a SATA expander) until I have all the components for the migration, will Unraid still register them correctly, so I can start my array again without losing any data? I'm pretty sure connecting them directly to the motherboard will be fine, since I did the reverse when I put them in the JBOD and they were registered correctly. I don't have enough SATA ports, though, so I need an expander, either a crappy PCIe one or one I can put into an M.2 slot. Will Unraid recognize these drives correctly, or will it think drives plugged in via an expander are new/unknown drives?

5 hours ago, SherryKDA said:

Will Unraid recognize these drives correctly, or will it think drives plugged in via an expander are new/unknown drives?

Unraid identifies drives by serial number, so motherboard SATA ports should definitely be fine. Add-in SATA PCIe cards should also be fine, but I don't believe USB connections are supported. I'm not sure about the M.2 slot idea, as I haven't tried it, but since it's newer technology, hopefully they've thought to support things like this.

 

Just make sure all your drives show up and are properly accounted for before you start the array.
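Since the array is matched by serial number, one way to account for every drive before starting the array is to compare a list of expected serials against what the system currently sees. A minimal sketch; the expected-serials file is hypothetical (for example typed in from the drive labels or noted from the Main page beforehand), and "lsblk -dno SERIAL" is used to list the serials of attached disks:

```shell
#!/usr/bin/env bash
# Sketch: print expected drive serials that are not currently attached.
missing_serials() {
  # $1: hypothetical file with one expected serial per line
  # stdin: current serials, e.g. from `lsblk -dno SERIAL`
  local current
  current=$(mktemp) || return 1
  sort -u > "$current"
  # comm -23 keeps lines only in the first input (expected but not present)
  sort -u "$1" | comm -23 - "$current"
  rm -f "$current"
}
```

Usage: "lsblk -dno SERIAL | missing_serials expected-serials.txt". No output means every listed serial is present.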
