Problems starting array after installing LSI SAS card



Greetings everyone,

 

I am very new to the Unraid party. I just made the move from Synology to Unraid when I snagged a deal on some hardware. Things have been going very well for the most part. I have several Docker containers and a gaming VM working just how I want. I have eight 10TB drives for my system: four that I used to seed the server and four that will be coming from my Synology. I haven't actually inserted the four from the Synology yet, as I am waiting to make sure I have all the kinks ironed out first in case I need to roll back.

 

Since I have (or will have) 8 disks in my array and my motherboard only has 8 SATA ports, I figured I had better get a SAS controller so that I can expand when I'm ready to. A friend gifted me an LSI 9207 card that he pulled from some box he didn't need anymore.

 

My issue arises when I install my LSI 9207 and try to start my array. The server boots up and Unraid seems to be happy until I actually start the array. Once the array is started, it seems that my configuration files go out the window. My dark Unraid theme changes back to the default, my temperature display changes back to Celsius, my shares all disappear, Docker and VMs are disabled... you get the idea. With the LSI card in place, I am not able to stop the array once it has been started. I am able to perform a clean shutdown though. If I remove the LSI card, I can start and stop my array all day long and my configuration stays in place.

 

I did manage to pull the syslog and diagnostics with the LSI card installed, and I think I can see the issue in the syslog. I will attach the full zips to this thread, but I'll paste the piece that stands out to me. It concerns the flash drive that hosts Unraid. From what I've seen in other Unraid threads, read errors on the Unraid flash drive usually mean you should replace it. However, I don't think this is a bad flash drive, since the problem appears when I install the LSI card and goes away when I remove it.

 

Aug 12 20:04:19 Tower root: cp: cannot create regular file '/boot/config': Input/output error
Aug 12 20:04:19 Tower kernel: FAT-fs (sda1): FAT read failed (blocknr 169)
Aug 12 20:04:21 Tower root: Starting Samba:  /usr/sbin/nmbd -D
Aug 12 20:04:21 Tower root:                  /usr/sbin/smbd -D
Aug 12 20:04:21 Tower root:                  /usr/sbin/winbindd -D
Aug 12 20:04:21 Tower kernel: fat__get_entry: 310 callbacks suppressed
Aug 12 20:04:21 Tower kernel: FAT-fs (sda1): Directory bread(block 29364) failed
Aug 12 20:04:21 Tower kernel: FAT-fs (sda1): Directory bread(block 29365) failed
Aug 12 20:04:21 Tower kernel: FAT-fs (sda1): Directory bread(block 29366) failed
Aug 12 20:04:21 Tower kernel: FAT-fs (sda1): Directory bread(block 29367) failed
Aug 12 20:04:21 Tower kernel: FAT-fs (sda1): Directory bread(block 29368) failed
Aug 12 20:04:21 Tower kernel: FAT-fs (sda1): Directory bread(block 29369) failed
Aug 12 20:04:21 Tower kernel: FAT-fs (sda1): Directory bread(block 29370) failed
Aug 12 20:04:21 Tower kernel: FAT-fs (sda1): Directory bread(block 29371) failed
Aug 12 20:04:21 Tower kernel: FAT-fs (sda1): Directory bread(block 29372) failed
Aug 12 20:04:21 Tower kernel: FAT-fs (sda1): Directory bread(block 29373) failed
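
For anyone who wants to check their own syslog for the same pattern, here is a minimal Python sketch that counts FAT-fs failure lines per device. It is not part of Unraid; the function name count_fat_errors is made up for illustration, and it assumes the kernel messages are formatted like the ones pasted above.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "Aug 12 20:04:19 Tower kernel: FAT-fs (sda1): FAT read failed (blocknr 169)"
FAT_ERROR = re.compile(r"kernel: FAT-fs \((?P<dev>[^)]+)\): (?P<msg>.+)$")


def count_fat_errors(syslog_text):
    """Return a Counter of FAT-fs 'failed' lines keyed by device name."""
    counts = Counter()
    for line in syslog_text.splitlines():
        match = FAT_ERROR.search(line)
        if match and "failed" in match.group("msg"):
            counts[match.group("dev")] += 1
    return counts


# A few lines copied from the excerpt above.
sample = """\
Aug 12 20:04:19 Tower root: cp: cannot create regular file '/boot/config': Input/output error
Aug 12 20:04:19 Tower kernel: FAT-fs (sda1): FAT read failed (blocknr 169)
Aug 12 20:04:21 Tower kernel: fat__get_entry: 310 callbacks suppressed
Aug 12 20:04:21 Tower kernel: FAT-fs (sda1): Directory bread(block 29364) failed
"""

if __name__ == "__main__":
    for device, total in count_fat_errors(sample).items():
        print(device, total)
```

To run it against a real log, read the file into a string and pass that in place of sample. A count that is zero without the card and non-zero with it would support the idea that the card, not the flash drive, is the trigger.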

 

On a side note, the LSI card is running firmware version 17 and is already in IT mode. I attached a drive to the card, and before I start the array I can see the drive in the unassigned devices section, as I would expect. For most of my testing, I don't even have the SAS cable attached to the card, so I don't think this is an issue with any of the drives. I know that I can update the firmware on the card to version 20, but I was hoping to avoid doing that unless there was a good reason.

 

Any input would be greatly appreciated.  Thank you in advance.  

 

-Ed

tower-diagnostics-20190813-0043.zip tower-syslog-20190813-0109.zip

3 minutes ago, itimpi said:

That looks like a problem with the flash drive.

The flash drive is new.  I know that doesn't prove anything.  It just doesn't seem like a flash drive problem if the server runs totally fine until I add a specific piece of hardware to it.  Removing the LSI card returns the server to normal status with no read errors on the flash drive.


Well, the log fragment you posted shows that you are getting failures reading from the flash drive. That is also consistent with the other symptoms, which suggest that Unraid is losing access to its configuration information.

 

If you can, you should use a USB2 port (and ideally also a USB2 flash drive), as USB2 has proved far more reliable than USB3.


Perhaps it would be better to say it is a problem accessing the flash drive, whether the cause is the flash drive itself or the port it is on. Perhaps something about the motherboard makes the port stop working correctly when you install the card, and it returns to working when you remove it.

 

After running checkdisk on the flash drive, I would try a different port, preferably USB2 as noted.


I have a couple more things to report on.  

 

I was able to get 2 more SAS9207-8i cards just like the original card.  I am able to recreate the issue with any of the 3 cards.  

 

I did manage to find a USB2 drive.  I need to read the instructions on how to migrate my config to the new drive.  I am still on the unraid trial with 2 days left.  Am I able to migrate my config without having a paid license or do I need to buy a license before I attempt this step?

 

I am going to update my BIOS this weekend.  A newer image just came out 4 days ago, so we'll see if that does anything for me.  Other BIOS settings are default for the most part.  I tweaked some basic UEFI settings and I have toggled the IOMMU on and off to see if it would make a difference.  I think all other settings are default.

1 hour ago, Edwardo said:

Am I able to migrate my config without having a paid license or do I need to buy a license before I attempt this step?

Trials can't be migrated seamlessly, for obvious reasons. However... I don't think there is anything stopping you from recreating your config manually.

 

I wouldn't try either step until after you update the BIOS and see what the results are.


The problem may be on the USB controller side.

 

Aug 12 20:03:15 Tower kernel: xhci_hcd 0000:0a:00.3: remove, state 4
Aug 12 20:03:15 Tower kernel: usb usb4: USB disconnect, device number 1
Aug 12 20:03:15 Tower kernel: xhci_hcd 0000:0a:00.3: USB bus 4 deregistered
Aug 12 20:03:15 Tower kernel: xhci_hcd 0000:0a:00.3: remove, state 1
Aug 12 20:03:15 Tower kernel: usb usb3: USB disconnect, device number 1
Aug 12 20:03:15 Tower kernel: usb 3-1: USB disconnect, device number 2
Aug 12 20:03:15 Tower rc.diskinfo[6194]: SIGHUP received, forcing refresh of disks info.
Aug 12 20:03:15 Tower kernel: usb 3-2: USB disconnect, device number 3
Aug 12 20:03:15 Tower kernel: xhci_hcd 0000:0a:00.3: Host halt failed, -110
Aug 12 20:03:15 Tower kernel: xhci_hcd 0000:0a:00.3: Host controller not halted, aborting reset.
Aug 12 20:03:15 Tower kernel: xhci_hcd 0000:0a:00.3: USB bus 3 deregistered

 

Could any passthrough be causing this?
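
To see at a glance which controller is dropping out and which devices go with it, a rough Python sketch like the following can group these lines. The name summarize_usb_events is made up for illustration, and the regular expressions assume the kernel message format shown in the excerpt above.

```python
import re

# "kernel: xhci_hcd 0000:0a:00.3: Host halt failed, -110"
XHCI = re.compile(
    r"kernel: xhci_hcd (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.+)$"
)
# "kernel: usb 3-1: USB disconnect, device number 2"
DISCONNECT = re.compile(r"kernel: usb (?P<dev>\S+): USB disconnect, device number \d+")


def summarize_usb_events(syslog_text):
    """Return (messages per xHCI PCI address, list of disconnected USB devices)."""
    controllers = {}
    disconnects = []
    for line in syslog_text.splitlines():
        match = XHCI.search(line)
        if match:
            controllers.setdefault(match.group("pci"), []).append(match.group("msg"))
            continue
        match = DISCONNECT.search(line)
        if match:
            disconnects.append(match.group("dev"))
    return controllers, disconnects


# Lines copied from the excerpt above.
sample = """\
Aug 12 20:03:15 Tower kernel: xhci_hcd 0000:0a:00.3: remove, state 4
Aug 12 20:03:15 Tower kernel: usb usb4: USB disconnect, device number 1
Aug 12 20:03:15 Tower kernel: usb 3-1: USB disconnect, device number 2
Aug 12 20:03:15 Tower kernel: xhci_hcd 0000:0a:00.3: Host halt failed, -110
"""

if __name__ == "__main__":
    by_controller, dropped = summarize_usb_events(sample)
    print(by_controller)
    print(dropped)
```

If the address that shows up here is also the controller the flash drive is plugged into, that would tie the flash read errors to the controller being removed rather than to the drive.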


I got one of the SAS cards updated to the latest firmware.  No change.

 

I am doing some USB passthrough for my gaming vm.  To make sure that wasn't the cause of my problem, I just disabled the auto start for that vm.  Is there something else I can do to confirm that this isn't a passthrough problem?

9 hours ago, Benson said:

I overlooked something: the problem happens when the LSI HBA is added.

 

BTW, how often does the problem occur, and does it help if you stop that VM?

The problem occurs every time I start the array with any of my 3 LSI cards installed.  Leaving the VM off has no effect.  The system will boot with the LSI card in place, but will fall apart once the array is started.

2 hours ago, Edwardo said:

The problem occurs every time I start the array with any of my 3 LSI cards installed.  Leaving the VM off has no effect.  The system will boot with the LSI card in place, but will fall apart once the array is started.

Noted. This seems to be some kind of hardware/firmware-related issue, which is not easy to solve.

 

I suggest trying an add-on USB card, but I can't confirm whether it will be bootable.


This problem can be annoying. Delete the boot BIOS from the LSI card.

After that, you can no longer boot from a drive attached to the LSI card.

But you can still boot from your USB ports and motherboard SATA ports.

If you place more than one LSI controller in your system, you have to do this.

I don't know why, but the BIOSes on the controllers conflict and disks go into error randomly.

You simply run the flash tool on the LSI card and tell it to delete the BIOS.

 

Go to the boot section, set the first boot entry to the UEFI shell, and restart.

From the Shell> prompt, run sas2flash -list and save the output by your preferred method.

From the Shell> prompt, run sas2flash -o -e 5 to erase the boot services area of the flash chip.

Finally, run sas2flash -list one last time and verify that the BIOS Version now reads N/A.
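
If the listing was saved to a text file, the final check can also be done with a small Python sketch like this one. It is only an illustration: the helper name bios_version is made up, and it assumes the saved output has a "BIOS Version : value" style line, so adjust it if the listing is laid out differently.

```python
def bios_version(listing_text):
    """Return the value on the 'BIOS Version' line of a saved listing, or None."""
    for line in listing_text.splitlines():
        key, sep, value = line.partition(":")
        # Assumed layout: a label, a colon, then the value.
        if sep and key.strip().lower() == "bios version":
            return value.strip()
    return None


# Hypothetical one-line sample in the assumed layout, not real tool output.
sample_after_erase = "BIOS Version : N/A\n"

if __name__ == "__main__":
    version = bios_version(sample_after_erase)
    print("boot BIOS erased" if version == "N/A" else f"BIOS Version is {version}")
```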

 

11 hours ago, Benson said:

Noted. This seems to be some kind of hardware/firmware-related issue, which is not easy to solve.

 

I suggest trying an add-on USB card, but I can't confirm whether it will be bootable.

You're suggesting adding an internal USB card and moving my Unraid boot flash drive to it?  I actually have a card I could use for this.

6 hours ago, Maticks said:

This problem can be annoying. Delete the boot BIOS from the LSI card.

After that, you can no longer boot from a drive attached to the LSI card.

But you can still boot from your USB ports and motherboard SATA ports.

If you place more than one LSI controller in your system, you have to do this.

I don't know why, but the BIOSes on the controllers conflict and disks go into error randomly.

You simply run the flash tool on the LSI card and tell it to delete the BIOS.

 

Go to the boot section, set the first boot entry to the UEFI shell, and restart.

From the Shell> prompt, run sas2flash -list and save the output by your preferred method.

From the Shell> prompt, run sas2flash -o -e 5 to erase the boot services area of the flash chip.

Finally, run sas2flash -list one last time and verify that the BIOS Version now reads N/A.

 

I only have (and only want) one LSI card installed at a time. Because of the issue I'm having after installing one, I borrowed a couple more from a friend just to make sure my issue wasn't due to a bad card. I am only ever booting with a single LSI card in place. Also, while updating that single card to the latest firmware, I accidentally removed the boot ROM from the chip. It didn't make a difference for my issue...

1 hour ago, Edwardo said:

You're suggesting adding an internal USB card and moving my boot/unraid usb device to it?  I actually have a card I could use for this.  

Apparently my USB card is not bootable :(

 

But this did get me thinking. I ended up moving my Unraid flash drive to my USB 3.1 port, on the hunch that the 3.1 port might be on a different internal controller than the 3.0 ports. It actually worked. I am posting this from my Windows VM within Unraid with my array running and the LSI card installed with 1 drive attached.

 

There is definitely something weird with the integrated USB controller on my system and the LSI card.  Now that the unraid drive is on a different port, I am having issues with other USB devices attached to the 3.0 ports.  For example, I keep getting notified that unraid has lost communication with my UPS.  Also my USB keyboard that is passed through to my VM keeps disconnecting.

 

Since I don't need to worry about booting from my other USB devices, I think I'll reinstall that USB card and move everything over to it.  
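
For anyone wanting to confirm which controller each USB port hangs off rather than guessing: on Linux, every entry under /sys/bus/usb/devices resolves to a path that contains the PCI address of its host controller. Below is a rough Python sketch, assuming that standard sysfs layout. The helper names and the example path in the comment are hypothetical; only the 0000:0a:00.3 address comes from the syslog excerpt earlier in the thread.

```python
import os
import re

# A PCI function address such as "0000:0a:00.3".
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def controller_for(sysfs_path):
    """Return the deepest PCI address found in a resolved sysfs path, or None.

    Hypothetical example path:
      /sys/devices/pci0000:00/0000:00:08.1/0000:0a:00.3/usb3/3-1
    """
    pci = None
    for part in sysfs_path.split("/"):
        if PCI_ADDR.match(part):
            pci = part  # later matches are closer to the USB device
    return pci


def list_usb_controllers(root="/sys/bus/usb/devices"):
    """Map each USB sysfs entry name to the PCI address of its host controller."""
    result = {}
    if not os.path.isdir(root):
        return result  # not Linux, or sysfs is not mounted
    for name in sorted(os.listdir(root)):
        real = os.path.realpath(os.path.join(root, name))
        result[name] = controller_for(real)
    return result


if __name__ == "__main__":
    for name, pci in list_usb_controllers().items():
        print(name, pci)
```

Devices that report the same PCI address share a controller, so this shows whether the flash drive, the UPS, and the keyboard are really on separate controllers after moving things around.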
