LSI 9207-8i Drive Errors Again


msteger

Hi Friends,

I am having trouble again, which is very odd because nothing has changed other than an upgrade, which I rolled back, and I still have the same issues. In summary, this is what I am seeing:

I can get the server into one of two states:

State 1: Two drives report errors and eventually drop off. These are fairly new drives that work fine before the array is started, and there are no SMART issues that I can see. In this state, parity rebuilds run at the expected speed. The drives that get knocked off are always connected to the LSI board, although other drives connected to it work OK.

State 2: Same drives, no read or write errors, but parity checks estimate 300 days. I let it run overnight and it only completed 36GB out of 12TB.
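To put State 2 in perspective, here is a quick throughput estimate. It is a minimal sketch: the post doesn't say exactly how long "overnight" was, so the 12-hour duration below is an assumption, and the helper names are hypothetical. Decimal units are used (1 TB = 1000 GB, 1 GB = 1000 MB).

```python
# Sketch: estimate parity-check throughput and time remaining from observed
# progress. ASSUMPTION: "overnight" is taken as 12 hours; the post does not
# give the exact duration.

def throughput_mb_s(done_gb: float, elapsed_hours: float) -> float:
    """Average throughput in MB/s (decimal units)."""
    return done_gb * 1000 / (elapsed_hours * 3600)


def remaining_days(done_gb: float, total_tb: float, elapsed_hours: float) -> float:
    """Days left if the average rate observed so far stays constant."""
    rate = throughput_mb_s(done_gb, elapsed_hours)    # MB/s
    remaining_mb = (total_tb * 1000 - done_gb) * 1000  # TB -> GB -> MB
    return remaining_mb / rate / 86400                 # seconds -> days


if __name__ == "__main__":
    rate = throughput_mb_s(36, 12)
    days = remaining_days(36, 12, 12)
    print(f"{rate:.2f} MB/s, about {days:.0f} days remaining")
```

Under that assumption the check averaged under 1 MB/s (about 0.83 MB/s), which works out to roughly half a year for the full 12TB. A shorter or longer overnight run changes the numbers proportionally.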

The states change seemingly at random as I connect different drives to different ports (motherboard vs. LSI). I can't pinpoint an exact cause and effect. I worked on this for over 12 hours yesterday, and this build has been running for years on the same hardware.

The motherboard has two SATA chips with a total of 10 SATA ports. I am currently in State 2, which I got into by disabling one of the motherboard SATA chips and connecting all drives to the first set of motherboard ports and the LSI board.

I am thinking this is some kind of IRQ or other hardware conflict. The odd thing is that after I upgraded the firmware on the LSI board, I had no issues for months. The firmware is version 20.0.0.9.

Note that I have tried new cables, and I have two LSI boards (both on firmware 20.0.0.9 in IT mode; the second one is an older 9220-8i) which exhibit exactly the same behavior. I even tried disabling all of the motherboard ports and using the two cards, which resulted in State 1 again. The last time this happened, the LSI firmware update seemed to solve the issue.

Also note that I had a VM with video passthrough working on this machine with no issues. The first thing I did when my first drive dropped was to disable VMs and remove the graphics card. I also disabled all of the serial ports, the Bluetooth, and the Wi-Fi on the motherboard, with no effect.

Thanks for any help. I ordered an Adaptec SAS card (which doesn't arrive until next week) to see whether this is an LSI compatibility issue with the motherboard.

My diagnostics are attached.

M

jabba-diagnostics-20190705-1223.zip

Please attach diagnostics directly to your post; don't upload them to a random file-hosting site.

LSI adapters expect constant cool airflow, as they are designed primarily for commercial servers with fan walls or other circulation aids. Home servers often don't have forced air over the card's heatsink, which can cause overheating and premature failure. Is your card directly in the path of forced air?

This is a bit off the wall, but there is an issue involving locking SATA connectors and a change in how the connector on the drive end is designed. (Many locking SATA connectors don't have the 'bump'...) You can read about it here:

https://support-en.wd.com/app/answers/detail/a_id/15954

You can easily check for this condition by gently pulling on any locking SATA cable at the drive end. There should be slight resistance until the connector comes out of the socket. If you can't feel that resistance, you have the problem. Of course, if you have the shroud, the metal locking clip will secure the electrical connection.

OK, I figured it out. Thanks, everyone, for making me look in a different direction. I started looking at the two drives that were constantly the issue: it didn't seem to matter which SATA port they were plugged into, and they sat next to each other. That made me wonder about power. They were both connected to the same power connector through a splitter cable. I threw that cable away (it was a high-quality one), used another splitter connected to a different power lead, and all is well again.

So I think the chain of events was as follows:

1. I just figured out that the downstairs AC is not working well; it has been 82 degrees in the computer area all morning. This may have caused the LSI card to overheat, making a single drive drop.

2. In my attempts to fix it, I must have broken or bent some part of that power cable, so no matter what I did or how I rearranged the SATA connections, nothing worked.

The lesson from all of this: just because a drive has power and works doesn't mean you can rule out the power connection as the problem.

Thanks All!

Good troubleshooting; the new 12TB disk had a hard time.

I build my own SATA power cables when the existing ones aren't long enough; I won't use a Y cable. As for 82 degrees, that's just a little bit high. Yesterday one of my HBAs may have died from heat (another one is fine), mainly due to bad ventilation; I will fix that later.

38 minutes ago, msteger said:

1.  I just figured out the downstairs AC is not working great, it has been 82 degrees in the computer area all morning.   This may have caused the LSI card to overheat making a single drive drop.   

82°F should not be a problem, but if the computer's case was open while you were troubleshooting, that could be an issue. Your fans should be set up so that air comes in over the drives, passes across the motherboard and cards, and exits out the rear. Removing the side or top panel virtually eliminates airflow across the motherboard and cards. That is why most dedicated servers from the major manufacturers sound like jet planes taking off: the forced air is what keeps them cool enough to function.

That 12TB disk was a gift from the eBay gods. I purchased a 10TB drive to use for parity and a 12TB one arrived. I thought about selling it but then decided: "gift horse" / "mouth", well, you know. Most of the other drives are shucked from external enclosures. Shucking is getting risky, though: I have two Western Digital drives where I had to disable pin 3 (the SATA power-disable pin), which works but is a royal pain.

As for the cooling, the problem drive dropped while I was away on vacation. It was 115 in AZ last week, so who knows how hot the computer room got. The computer has five huge fans pushing air front to back and top to bottom, but I am going to add more around the LSI card just to be safe; that bad boy does get warm to the touch. Thanks for the good advice. I have noticed some people even replace the heatsink on the LSI board with a good-quality mini cooler, which might be worth considering.
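Since the thread mixes temperature units, here is a quick conversion of the figures mentioned. It is a small sketch that assumes the 82 and 115 readings are Fahrenheit (the reply above treats 82 as °F, and the poster is in Arizona); the helper name `f_to_c` is hypothetical.

```python
def f_to_c(deg_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


if __name__ == "__main__":
    # Temperatures mentioned in the thread, ASSUMED to be Fahrenheit.
    for label, deg_f in [("computer area", 82), ("outdoor high in AZ", 115)]:
        print(f"{label}: {deg_f} F is about {f_to_c(deg_f):.1f} C")
```

That puts the computer area at roughly 27.8°C and a 115°F day at about 46.1°C.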

This is a YouTube video about the problem he is talking about. You suffered only the minor failure mode of these connectors; the video describes the worst-case scenario! The basic failure mechanism is the same as what the OP experienced. I have used Molex-to-SATA power cables with press-together SATA connectors when required and trust them. I would never use the SATA-to-SATA type.

https://www.youtube.com/watch?v=TataDaUNEFc

Archived

This topic is now archived and is closed to further replies.
