SherryKDA Posted May 8, 2023

Hello everyone,

Quick hardware info:
CPU: AMD Ryzen Threadripper 1920X
Motherboard: ASRock X399 Taichi
RAM: 64GB with ECC
HBA: LSI SAS 9200-16e (system says LSI SAS 9201-16i?)

I have had this issue for months now. I have a JBOD with 10 drives connected via 8088 cables to my HBA. After a dirty shutdown, which sadly happened because of a power loss 2 days ago, Unraid no longer detects those drives when I restart it. After a normal restart, this issue doesn't appear. In the past, when something like this happened, I tried unplugging and replugging the cables, restarting the JBOD, or moving the HBA to a different PCIe slot. Normally, after some time, this works and after a restart the drives appear correctly again. Yesterday/today this wasn't the case...

I checked whether the HBA itself was not being detected, but that doesn't seem to be the problem. I ran the "lspci -v" command and the card showed up as follows (only the first 2 lines):

Serial Attached SCSI controller: Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)
Subsystem: Broadcom / LSI SAS 9201-16i

It looks like my card is properly detected, is supported according to the Unraid Hardware Compatibility Guide, and is in a PCIe x8 slot (I checked that). I don't really know what else I could try... Does anyone here have an idea what's causing this issue? Like I said, this only happens after dirty shutdowns and randomly works again after restarting a couple of times and replugging some cables...

I hope the diagnostics file will be of some help.

subaru-diagnostics-20230508-2140.zip
Frank1940 Posted May 8, 2023

You have this line in the syslog:

May 8 21:36:51 Subaru kernel: mpt2sas_cm0: doorbell handshake int failed (line=6884)

Googling indicates that this could be a problem with the card itself or the firmware. I would suggest that you google the error message.

(Flaky hardware can be a bitch. The fact that this is an LSI SAS 9200-16e (it really does have external connectors???) means that your buying options will probably be limited. Remember that when buying the older LSI cards you are buying the vendor and not the hardware!)
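To check whether the doorbell handshake error is a one-off or shows up on every boot, the kernel log can be filtered for failure messages from the LSI driver. This is only a minimal sketch: the function name hba_log_errors and the keyword list (fail, fault, error) are assumptions made for illustration, not something taken from the thread or from the driver documentation, so treat any match as a starting point for searching rather than a diagnosis.

```shell
# Print kernel log lines in which the LSI SAS driver reports a problem.
# Log text is read from stdin, so this works with live dmesg output
# as well as with a saved syslog file.
# NOTE: the keyword list below is an illustrative assumption.
hba_log_errors() {
    grep -i -E 'mpt[23]sas_cm[0-9]+:.*(fail|fault|error)'
}
```

Typical use would be "dmesg | hba_log_errors", or feeding it the syslog file from a diagnostics zip. The pattern accepts both the mpt2sas_cm0 and mpt3sas_cm0 prefixes.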
SherryKDA Posted May 9, 2023

21 hours ago, Frank1940 said:
You have this line in the syslog: May 8 21:36:51 Subaru kernel: mpt2sas_cm0: doorbell handshake int failed (line=6884) Googling indicates that this could be a problem with the card itself or the firmware. I would suggest that you google the error message. (Flaky hardware can be a bitch. The fact that this is an LSI SAS 9200-16e (it really does have external connectors???) means that your buying options will probably be limited. Remember that when buying the older LSI cards you are buying the vendor and not the hardware!)

I really don't want to sound stupid, but what exactly do you mean by "your buying options will probably be limited" and "it really does have external connectors???"?

I did some googling and yeah, most people also write about hardware or firmware being the issue. So from what I understand, my only real options are either buying a different HBA or trying to upgrade to the newest firmware?

I did try to follow this guide to update my firmware, but there is still an issue I haven't found an answer to... I'm very sure that I got the correct firmware packages and ran the "chmod +x sas2flash" command to make the utility executable. However, when I try the "./sas2flash -listall" command I get this error message:

LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved
No LSI SAS adapters found! Limited Command Set Available!
ERROR: Command Not allowed without an adapter!
ERROR: Couldn't Create Command -listall
Exiting Program.

For other people with this issue, the solution was that the firmware flashing utility was wrong/the wrong version, but that can't be the case for me? Mine is mpt2sas, even the error message you showed me says that, so it has to be right, no?

There are 2 other things I found weird...

1. When I use the "lspci -s 08:00.0 -vv | more" command, 08:00.0 being my HBA, and scroll to the very bottom, it says "Kernel modules: mpt3sas". In a video from The Art of Server he used the same command but got a different output at the bottom: "Kernel driver in use: mpt2sas Kernel modules: mpt2sas". Obviously I won't get the "Kernel driver in use" message since it didn't load my card, but why am I getting mpt3sas rather than mpt2sas?

2. When I use the "lspci -v" command, it tells me my card is an LSI SAS 9201-16i, but as far as I know the i stands for internal, and I can clearly see that my 8088 ports are external, so I must actually have an LSI SAS 9201-16e card, no? Could this be some kind of issue or is this pretty irrelevant?

Thank you for giving me a lead to look for a solution. Do you maybe have any idea why I can't flash the firmware? Would flashing it even help, and what other options are there for me to make it work again?

Thank you and hope you have a nice day!
Frank1940 Posted May 9, 2023

I have never flashed an LSI card, so I really can't give you much advice. Here is a thread that has an extensive discussion on the topic:

https://forums.unraid.net/topic/97870-how-to-upgrade-an-lsi-hba-firmware-using-unraid/

@JorgeB is a guru in this area and I have just pinged him. If he does not reply in this thread in the next day, post in the thread above with your problem.

23 minutes ago, SherryKDA said:
When I use the "lspci -v" command, it tells me my card is a LSI SAS 9201-16i, but as far as I know the i stands for internal, and I can clearly see that my 8088 ports are external, so I must actually have a LSI SAS 9201-16e card, no?

Yes, you are correct. The card is the external card but it has been flashed with the internal version of the firmware. I am not sure if there is an 'external' version or if it makes any difference, but it is possible that it makes all the difference. That is where @JorgeB can help you out. (I do see that the version you have is an old one. You should have version 20.00.07.00 and that does make a difference!)
JorgeB Posted May 10, 2023

11 hours ago, SherryKDA said:
No LSI SAS adapters found! Limited Command Set Available!

Because the HBA is failing to initialize, it's not being found by the flash util. Try doing it in a different PC if available.
SherryKDA Posted May 10, 2023

12 hours ago, JorgeB said:
Because the HBA is failing to initialize, it's not being found by the flash util. Try doing it in a different PC if available.

I installed the card in a different system running Fedora 37 and it worked. I used the commands from your guide and there were no errors; firmware and BIOS are on the newest version now.

Sadly, it seems this didn't solve any of my issues... The message "Subaru kernel: mpt2sas_cm0: doorbell handshake int failed" still appears and I still can't see any of my drives. Also, the card is still displayed as an LSI SAS 9201-16i?

Do you have any idea what else I could try to fix my issue? I'll attach a diagnostics file from today as well; maybe you can have a look at it in case something changed. Thank you for replying tho.

subaru-diagnostics-20230510-2122.zip
Frank1940 Posted May 10, 2023

It may be late to ask this, but did it initialize on the Fedora system? (There is always the possibility that the LSI card itself is bad.)
SherryKDA Posted May 10, 2023

I ran the "lspci -s 08:00.0 -vv | more" command I mentioned above on the Fedora system too. On my server with UnRaid it showed only "Kernel modules: mpt3sas" at the end, but on the Fedora system it shows both "Kernel driver in use: mpt3sas" and "Kernel modules: mpt3sas", so I guess it did initialize? Also, since I was able to flash it there, I guess it had to initialize? I also found no error messages like the one I get in UnRaid when using "dmesg | grep mpt".

One other thing tho... my card should be using mpt2sas, or am I wrong? If so, why are both UnRaid and Fedora loading/using mpt3sas modules? mpt3sas should only be used for PCIe 3 cards from what I understand.
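The Unraid versus Fedora comparison above boils down to whether lspci prints a "Kernel driver in use" line (a driver actually claimed the card) or only a "Kernel modules" line (a module exists but did not bind). A small sketch of that check follows; the function name hba_driver_state and its output format are made up for illustration. As far as I know, the separate mpt2sas driver was folded into mpt3sas around kernel 4.4, which would explain why current kernels list mpt3sas for a SAS2 card while still logging under the mpt2sas_cm0 name, but please verify that against your own kernel.

```shell
# Summarise driver binding from `lspci -vv` (or `lspci -k`) output on stdin.
# Prints "bound:<driver>" if a driver has claimed the device,
# "unbound:<modules>" if only candidate modules are listed,
# and "unknown" if neither line is present.
# NOTE: function name and output format are illustrative assumptions.
hba_driver_state() {
    awk -F': ' '
        /Kernel driver in use:/ { driver = $2 }
        /Kernel modules:/       { modules = $2 }
        END {
            if (driver != "")       print "bound:" driver
            else if (modules != "") print "unbound:" modules
            else                    print "unknown"
        }'
}
```

It would be used as "lspci -s 08:00.0 -vv | hba_driver_state", with 08:00.0 being the HBA address mentioned in this thread. An "unbound" result matches what the Unraid server showed, and "bound" matches the Fedora system.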
Frank1940 Posted May 10, 2023

You may want to google about the LSI driver being loaded.

Here is the description from a vendor that I have purchased from about this model card:

Do you have an available PCI-E 3.0 x8 slot to use? Read the MB specs carefully. Read the caution about cooling.
JorgeB Posted May 11, 2023

12 hours ago, SherryKDA said:
I installed the card on a different system with Fedora 37 installed and it worked.

This suggests a compatibility issue with the server hardware. If you haven't yet, try a different PCIe slot if available.
SherryKDA Posted May 11, 2023

My motherboard has 4 different PCIe slots. I don't know the versions rn, but they are all 3.0 or 2.0. In my UEFI I also set the x16 lanes to x8x8 so that there wouldn't be a negotiation issue.

Before I flashed the card, I tried all of them and none worked. In the past, after plugging it into different ones, it sometimes worked, but no slot was consistent. Currently none work, but I have only tried one since the flash, if that would make a difference.

I have heard that issues exist between Ryzen and Epyc systems and legacy PCIe 2 cards. I tried the solutions I found online, like changing x16 to x8x8 for example, but still nothing.
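Since link negotiation came up: "lspci -vv" reports what a card is capable of on its LnkCap line and what was actually negotiated on its LnkSta line, so comparing the two per slot can show whether the x8x8 setting took effect. The sketch below pulls out just the widths. The function name pcie_link_widths is an assumption for illustration, and the exact wording of these lines can differ between lspci versions, so check the raw output if nothing is printed. The capability lines are normally only visible when lspci is run as root.

```shell
# Extract maximum (LnkCap) and negotiated (LnkSta) PCIe link width
# from `lspci -vv` output supplied on stdin.
# Prints one line per match, for example "LnkCap x8" and "LnkSta x8".
# NOTE: function name is an illustrative assumption.
pcie_link_widths() {
    sed -n -E 's/^[[:space:]]*(LnkCap|LnkSta):.*Width (x[0-9]+).*/\1 \2/p'
}
```

It would be used as "lspci -s 08:00.0 -vv | pcie_link_widths". A LnkSta width smaller than the LnkCap width would point at a slot or bifurcation problem rather than at the card's firmware.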
ufopinball Posted May 13, 2023

Sadly, the ASRock x399 Taichi MB just doesn't play well with LSI controller cards. Here's a conversation on Reddit that goes into greater detail:

https://www.reddit.com/r/unRAID/comments/98kdyp/lsi_920116i_and_asrock_taichi_x399/

I've got the same setup, and I've never gotten it to work. I ended up going with dual AOC-SAS2LP-MV8 (Marvell based) controller cards. That was many years ago, so there may be newer alternatives, but I know they work with this MB and a Threadripper 1950X.

The two Dell PERC H310 (LSI based) controllers that ASRock didn't like work just fine in an ASUS MB, so I know they're functionally okay.
JonathanM Posted May 13, 2023

11 minutes ago, ufopinball said:
(Marvell based) controller cards

Unfortunately Marvell has a bad reputation for Linux compatibility. They work for some people, but other systems randomly drop drives.
ufopinball Posted May 13, 2023

6 hours ago, JonathanM said:
Unfortunately Marvell has a bad reputation for Linux compatibility. They work for some people, but other systems randomly drop drives.

Hmmm, they're no longer running as my primary system, but they seemed okay before. Any idea how to get the ASRock x399 Taichi to play nice with LSI controllers? Or what's a good option outside of LSI and Marvell?
SherryKDA Posted May 13, 2023

15 hours ago, ufopinball said:
Sadly, the ASRock x399 Taichi MB just doesn't play well with LSI controller cards.

Hmm.. yeah, I heard that Ryzen/Epyc support for the old cards is... poor. Sadly I only started learning about that when it was too late. I was planning on migrating my storage system to a different server anyway, the one where I installed Fedora and it worked with my card (Xeon btw), and using the Ryzen as a virtualization host. So I guess I can still use my card after I migrate over to that server. I wanted to do this in a couple of months for money reasons and keep this system running until then, but I guess I'll have to do it earlier now...

I have one last question tho so I don't fuck my data up. If I take my drives out of the JBOD and connect them directly via SATA to my motherboard (and via a SATA expander to my mobo) until I have all the components for migrating, will my drives still be listed correctly/will UnRaid still be able to register them correctly so I can start my array again and not lose any data?

I'm pretty sure if I just connect them directly to my mobo it'll be fine, since I did the reverse when I put them in my JBOD and they were registered correctly. I don't have enough SATA ports tho, so I need an expander, either a crappy PCIe one or one I can put into an M.2 slot. Will UnRaid be able to recognize these drives correctly or will it think drives plugged in via an expander are new/unknown drives?
ufopinball Posted May 13, 2023

5 hours ago, SherryKDA said:
Hmm.. yeah, I heard that Ryzen/Epyc support for the old cards is... poor. Sadly I only started learning about that when it was too late. I was planning on migrating my storage system to a different server anyway, the one where I installed Fedora and it worked with my card (Xeon btw), and using the Ryzen as a virtualization host. So I guess I can still use my card after I migrate over to that server. I wanted to do this in a couple of months for money reasons and keep this system running until then, but I guess I'll have to do it earlier now... I have one last question tho so I don't fuck my data up. If I take my drives out of the JBOD and connect them directly via SATA to my motherboard (and via a SATA expander to my mobo) until I have all the components for migrating, will my drives still be listed correctly/will UnRaid still be able to register them correctly so I can start my array again and not lose any data? I'm pretty sure if I just connect them directly to my mobo it'll be fine, since I did the reverse when I put them in my JBOD and they were registered correctly. I don't have enough SATA ports tho, so I need an expander, either a crappy PCIe one or one I can put into an M.2 slot. Will UnRaid be able to recognize these drives correctly or will it think drives plugged in via an expander are new/unknown drives?

UnRaid identifies drives by serial number, so motherboard SATA ports should definitely be fine. Add-in SATA PCIe cards should also be fine, but I don't believe USB connections are supported.

Not sure about the M.2 slot idea as I haven't tried it. But since it's newer technology, hopefully they've thought to support things like this.

Just make sure all your drives show up and are properly accounted for before you start the array.
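One way to do the "all drives accounted for" check from a terminal is to note the serial numbers while the drives are still known, then compare that list against what the system reports after recabling. The sketch below assumes a plain text file with one expected serial per line; the function name missing_serials and the file name expected_serials.txt are hypothetical, and this is not how Unraid itself tracks drives internally, just a manual sanity check before starting the array.

```shell
# Print every serial listed in the file given as $1 (one per line) that is
# NOT among the serials supplied on stdin. No output means nothing is missing.
# NOTE: function name and file layout are illustrative assumptions.
missing_serials() {
    expected_file=$1
    present=$(cat)                      # serials currently visible (from stdin)
    while IFS= read -r serial; do
        [ -n "$serial" ] || continue    # skip blank lines in the list
        printf '%s\n' "$present" | grep -qxF -- "$serial" \
            || printf '%s\n' "$serial"
    done < "$expected_file"
}
```

A possible invocation is "lsblk -d -n -o SERIAL | missing_serials expected_serials.txt". Any serial it prints belongs to a drive the system cannot currently see, in which case the array should not be started yet.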
JonathanM Posted May 14, 2023

18 hours ago, ufopinball said:
I don't believe USB connections are supported.

Sometimes they work, many times they don't. The USB-to-SATA translation circuits vary wildly in quality and performance, so it's much better to use direct SATA or SAS.