
Rebuild After Data Drive Disabled - Time Sensitive Issue



I have been having an issue with one of my data drives dropping out at random. It is connected to an onboard SATA port on my motherboard. Unfortunately, I rebooted my server before downloading diagnostics (I know), but I have attached them anyway. This is an example of the error messages I am getting now (the specific block/sector changes between messages):

 

print_req_error: I/O error, dev sdf, sector 31251758915

buffer I/O error on dev sdf1, logical block 31251758852, async page read

 

I had a different drive drop out on the same port, so I am pretty sure the port is the issue unless it is a power supply problem (I replaced the cable in the past, so it seems unlikely that's the issue, but it's always possible).

 

Aside from general advice about what to do in this situation, I am also concerned about the best way to re-introduce the drive into the array if I get it back online. It dropped out while I was copying data to the array (no idea to which drive), so there was definitely data added after the drive was disabled. In this situation, do I want to simply assign the drive back in and let it go through a parity check/rebuild, or is it better to assume the data on the drive is good and rebuild the parity drive somehow? Basically, I'm not sure what happens to the parity integrity when a drive drops out while the array is still being written to.

 

Thanks so much for the help. I am definitely new to this and this server is very important for my work so any advice is appreciated no matter how obvious!

diagnostics-20210521-1213.zip

Link to comment

The disk dropped again, so there's no SMART report. You're using a Marvell controller with port multipliers; each is a known problem on its own, and they're even worse together. There's also another controller with port multipliers. You should really get rid of those.

 

Connect the disk to a different port/controller and post new diags so we can see SMART.

Link to comment
On 5/21/2021 at 1:26 PM, JorgeB said:

The disk dropped again, so there's no SMART report. You're using a Marvell controller with port multipliers; each is a known problem on its own, and they're even worse together. There's also another controller with port multipliers. You should really get rid of those.

 

Connect the disk to a different port/controller and post new diags so we can see SMART.

Got you, thanks for the advice! I currently have all my drives attached to the onboard SATA ports (motherboard is the ASRock Rack X470D4U). It sounds like the only option is to buy an add-in controller card, correct? I was avoiding doing that because I'm using those slots for other things. My plan was to get an LSI card flashed to IT mode like this one: https://www.ebay.com/itm/254940196705

 

Assuming it's just a controller issue then, what is the best way to re-introduce the drive once I have it back up? (I will definitely get the SMART report as soon as possible) Thanks again!

Link to comment
11 minutes ago, GroundMoose said:

My plan was to get an LSI card flashed to IT mode like this one: https://www.ebay.com/itm/254940196705

 

New LSI cards of this vintage are all made by some Chinese manufacturer (or manufacturers) using the LSI chipset. (They reverse-engineered an original LSI board, often right down to the paper labels.) The problem is quality, as none of them ever seems to put their information on any of the boards. For that reason, they are often considered counterfeit! So when you buy one, you are really buying the vendor you are dealing with. Vet him carefully. Price, while important, should not be your only parameter when making the purchase decision. You should make sure that the firmware version is 20.00.07.00, as earlier versions have problems. (If it is not listed, an e-mail to the vendor may give an indication of how responsible he will be if you have an issue after you make your purchase.)
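
For anyone who wants to check a listing or a card against that firmware version, here is a minimal Python sketch of the comparison. It assumes you already have the version as a dotted string (from the vendor's listing or from whatever the card reports); how you obtain it is outside the sketch, and the helper names `parse_firmware` and `firmware_ok` are made up for illustration.

```python
# Hypothetical helpers: compare a dotted firmware string against the
# 20.00.07.00 version mentioned in the post above.
MIN_FIRMWARE = "20.00.07.00"


def parse_firmware(version: str) -> tuple:
    """Split a dotted firmware string such as '20.00.07.00' into integers."""
    return tuple(int(part) for part in version.strip().split("."))


def firmware_ok(reported: str, minimum: str = MIN_FIRMWARE) -> bool:
    """True if the reported version is at least the minimum version."""
    return parse_firmware(reported) >= parse_firmware(minimum)


if __name__ == "__main__":
    for candidate in ("20.00.07.00", "20.00.04.00", "19.00.00.00"):
        print(candidate, "ok" if firmware_ok(candidate) else "too old")
```

Comparing integer tuples rather than raw strings avoids surprises if a field ever has a different number of digits.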

 

At least this manufacturer did not put the LSI logo on the board, which is a positive...

Link to comment
2 hours ago, Frank1940 said:

At least this manufacturer did not put the LSI logo on the board, which is a positive...

 

Interesting...I was not aware counterfeit cards were so prevalent. Is there any way to check once I get my hands on the card itself? Alternatively, are there any reputable vendors of these cards?

 

Also, forgive my ignorance, but why would it be a positive that there is no LSI logo on this card?

Link to comment
34 minutes ago, GroundMoose said:

Also, forgive my ignorance, but why would it be a positive that there is no LSI logo on this card?

At least the manufacturer is somewhat honest and does not put another company's name on the product. Many do go that far, which makes it more difficult to identify real products.

Link to comment
40 minutes ago, GroundMoose said:

Is there any way to check once I get my hands on the card itself?

Send the serial number to LSI tech support.

 

However, any 92XX series card that is advertised as new IS counterfeit. 93XX is the only series currently being manufactured by LSI.

 

If you want a 92XX card, your best bet is a used server pull manufactured under a big brand like Dell, typically something like a Dell H310.

Link to comment

Look for the 8-port cards (about half-way down the first post) in this thread:

 

      https://forums.unraid.net/topic/102010-recommended-controllers-for-unraid/?tab=comments#comment-941151

 

It lists the OEM server boards made by LSI. As far as I know, these boards were all made for Dell and IBM by LSI. These used LSI boards came on the market after server farms were removed from service and the equipment was sold off for salvage.

Link to comment
10 hours ago, GroundMoose said:

I currently have all my drives attached to the onboard SATA ports (motherboard is the ASrock Rack X470D4U).

 

There are no Marvell controllers on that motherboard. Six of the SATA ports are provided by the X470 (ASMedia IP licensed by AMD) and two are provided by a discrete ASMedia chip, with no port multipliers. ASMedia controllers are generally reliable. I notice you've had two different disks dropped by the same port. Were they connected using the same SATA cable, by any chance? Have you tried replacing that cable?

Link to comment
6 hours ago, John_M said:

There are no Marvell controllers on that motherboard.

No, there are not. It looks like I was looking at different diags, sorry about that.

 

On 5/21/2021 at 6:26 PM, JorgeB said:

The disk dropped again, so there's no SMART report. You're using a Marvell controller with port multipliers; each is a known problem on its own, and they're even worse together. There's also another controller with port multipliers. You should really get rid of those.

 

Connect the disk to a different port/controller and post new diags so we can see SMART.

 

Ignore everything I posted before. The disk dropped offline and it looks to be the result of a bad SATA cable, though you still didn't post the SMART report.

 

 

Link to comment
On 5/24/2021 at 8:31 PM, John_M said:

I notice you've had two different disks dropped by the same port. Were they connected using the same SATA cable, by any chance? Have you tried replacing that cable?

 

On 5/25/2021 at 2:54 AM, JorgeB said:

disk dropped offline and looks to be the result of a bad SATA cable, though you still didn't post the SMART report.

 

I still haven't had a chance to reconnect the drive and get the SMART report off it but I'll do that soon.

 

I did try replacing the cable after the first dropout, although admittedly with another cheap SATA cable from the same manufacturer. It seems strange, though, that the same port would drop two different drives with two different cables, since I'm using the same cables on all my drives. I went ahead and got what seems like a reputable Dell LSI card though. Is there a chance it's still worth going that route?

Link to comment
5 minutes ago, GroundMoose said:

I went ahead and got what seems like a reputable Dell LSI card though. Is there a chance it's still worth going that route?

Not for now, unless you need the extra ports, though you might need it in the future, since these AMD chipsets sometimes have issues with the onboard SATA controllers, dropping multiple disks at the same time.

Link to comment
52 minutes ago, JorgeB said:

SMART looks good and the emulated disk is mounting so you can rebuild on top: stop array, unassign disk5, start array, stop array, re-assign disk5, start array to begin rebuild.

 

Ok amazing. The only thing I was worried about is that I definitely copied a bunch of stuff to the array before realizing that the drive was down. Won't this affect the parity in a weird way and then how the drive is rebuilt?

Link to comment
8 minutes ago, GroundMoose said:

Ok amazing. The only thing I was worried about is that I definitely copied a bunch of stuff to the array before realizing that the drive was down. Won't this affect the parity in a weird way and then how the drive is rebuilt?

 

No, any data that would go to the disabled disk is still written to the emulated disk, and parity always remains in sync.
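
To see why that works, here is a toy Python model. It assumes single XOR parity and tiny four-byte "disks"; it is only an illustration of the idea, not Unraid's code, and the function names are made up. A write aimed at the disabled disk changes parity so that parity plus the surviving disks reconstructs the new data, and a later write to a healthy disk updates parity without disturbing the emulated contents.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def emulated_read(survivors, parity):
    """Reconstruct the disabled disk from parity plus the surviving disks."""
    return xor_blocks(survivors + [parity])


def parity_after_emulated_write(survivors, new_data):
    """A write to the disabled disk lands in parity only."""
    return xor_blocks(survivors + [new_data])


# Toy array: three data disks plus parity. Disk index 1 has dropped out.
disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(disks)
stale = disks[1]                      # what is physically on the dropped disk
survivors = [disks[0], disks[2]]

# New data is written to the emulated disk while the real one is offline.
new_data = b"\xde\xad\xbe\xef"
parity = parity_after_emulated_write(survivors, new_data)

# A healthy disk is also written: parity ^= old ^ new (read-modify-write).
old0, new0 = survivors[0], b"\x0f\x0e\x0d\x0c"
parity = xor_blocks([parity, old0, new0])
survivors[0] = new0

# Rebuilding "on top" recovers the new data, not the stale contents.
rebuilt = emulated_read(survivors, parity)
print(rebuilt == new_data, rebuilt == stale)
```

In this model the rebuild overwrites whatever is physically on the old disk with the emulated contents, which is why the files copied after the dropout are not lost.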

Link to comment
17 minutes ago, JorgeB said:

 

No, any data that would go to the disabled disk is still written to the emulated disk, and parity always remains in sync.

 

Ok great to know. Thanks again for all the help!

Link to comment
1 hour ago, JorgeB said:

SMART looks good and the emulated disk is mounting so you can rebuild on top: stop array, unassign disk5, start array, stop array, re-assign disk5, start array to begin rebuild.

 

I did receive this warning:

 

udma crc error count is 1

 

Is this left over from the original bad cable or could this be a current problem?
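
One way to tell the two cases apart, assuming the counter behaves like SMART attribute 199 normally does (a cumulative count that does not reset), is to note the raw value now and compare it again after some use. Below is a rough Python sketch of that comparison. The sample lines are made-up text in the column layout that `smartctl -A` prints, and the function names are hypothetical.

```python
from typing import Optional


def crc_error_count(smart_attributes: str) -> Optional[int]:
    """Return the raw value of attribute 199 (UDMA_CRC_Error_Count), if present."""
    for line in smart_attributes.splitlines():
        fields = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw_value
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_is_rising(before: str, after: str) -> bool:
    """True only if both readings exist and the later count is higher."""
    first, second = crc_error_count(before), crc_error_count(after)
    return first is not None and second is not None and second > first


# Made-up sample readings for illustration only.
BEFORE = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1"
AFTER_SAME = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1"
AFTER_MORE = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 7"

print(crc_is_rising(BEFORE, AFTER_SAME))
print(crc_is_rising(BEFORE, AFTER_MORE))
```

If the count stays at 1, it is most likely left over from the earlier cable problem; if it keeps climbing, the connection is still suspect.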

Link to comment
