ndk Posted May 21, 2021

I have been having an issue with one of my data drives dropping out at random. It is connected to an onboard SATA port on my motherboard. Unfortunately, I rebooted my server before downloading diagnostics (I know), but I have attached them anyway. This is an example of the error messages I am getting now (the specific block/sector changes between error messages):

print_reg error: I/O error, dev sdf, sector 31251758915
buffer I/O error on dev sdf1, logical block 31251758852, async page read

I had a different drive drop out on the same port, so I am pretty sure the port is the issue, unless it is a power supply problem (I replaced the cable in the past, so it seems unlikely that's the issue, but it's always possible).

Aside from general advice about what to do in this situation, I am also concerned about the best way to re-introduce the drive into the array if I get it back online. It dropped out while I was copying data to the array (no idea to which drive), so there was definitely data added after the drive was disabled. In this situation, do I simply assign the drive back in and let it go through a parity check/rebuild, or is it better to assume the data on the drive is good and rebuild the parity drive somehow? Basically, I'm not sure what happens to the parity integrity when a drive drops out while the array is still being written to.

Thanks so much for the help. I am definitely new to this and this server is very important for my work, so any advice is appreciated, no matter how obvious!

diagnostics-20210521-1213.zip
JorgeB Posted May 21, 2021

Disk dropped again, so there's no SMART report. You're using a Marvell controller with port multipliers; each is a known problem on its own, and they are even worse together. There's also another controller with port multipliers. You should really get rid of those. Connect the disk to a different port/controller and post new diags so we can see SMART.
ChatNoir Posted May 21, 2021

You can find a list of known good controllers here: https://forums.unraid.net/topic/102010-recommended-controllers-for-unraid/
ndk Posted May 24, 2021 (Author)

On 5/21/2021 at 1:26 PM, JorgeB said: Disk dropped again, so there's no SMART report. You're using a Marvell controller with port multipliers; each is a known problem on its own, and they are even worse together. There's also another controller with port multipliers. You should really get rid of those. Connect the disk to a different port/controller and post new diags so we can see SMART.

Got you, thanks for the advice! I currently have all my drives attached to the onboard SATA ports (motherboard is the ASRock Rack X470D4U). It sounds like the only option is to buy an add-in controller card, correct? I was avoiding doing that because I'm using those slots for other things. My plan was to get an LSI card flashed to IT mode like this one: https://www.ebay.com/itm/254940196705

Assuming it's just a controller issue, then, what is the best way to re-introduce the drive once I have it back up? (I will definitely get the SMART report as soon as possible.) Thanks again!
Frank1940 Posted May 24, 2021

11 minutes ago, GroundMoose said: My plan was to get an LSI card flashed to IT mode like this one: https://www.ebay.com/itm/254940196705

New LSI cards of this vintage are all made by some Chinese manufacturer(s) using the LSI chipset. (They reverse-engineered an original LSI board, often right down to the paper labels.) The problem is quality, as the manufacturer(s) never seem to put their information on any of the boards. For that reason, the cards are often considered counterfeit! So when you buy one, you are really purchasing the vendor with whom you are dealing. Vet him carefully. Price, while important, should not be your only criterion when making the purchase decision.

You should make sure that the firmware version is 20.00.07.00, as earlier versions have problems. (If it is not listed, an e-mail to the vendor may provide an indication of how responsible he will be if you have an issue after you make your purchase.)

At least this manufacturer did not put the LSI logo on the board, which is a positive...
ndk Posted May 24, 2021 (Author)

2 hours ago, Frank1940 said: At least this manufacturer did not put the LSI logo on the board, which is a positive...

Interesting... I was not aware counterfeit cards were so prevalent. Is there any way to check once I get my hands on the card itself? Alternatively, are there any reputable vendors of these cards? Also, forgive my ignorance, but why would it be a positive that there is no LSI logo on this card?
ChatNoir Posted May 24, 2021

34 minutes ago, GroundMoose said: Also, forgive my ignorance, but why would it be a positive that there is no LSI logo on this card?

At least the manufacturer is somewhat honest and does not put another company's name on the product. Many do go that far, which makes it more difficult to identify real products.
JonathanM Posted May 24, 2021

40 minutes ago, GroundMoose said: Is there any way to check once I get my hands on the card itself?

Send the serial number to LSI tech support. However, any 92xx series card that is advertised as new IS counterfeit. 93xx is the only series currently being manufactured by LSI. If you want a 92xx card, your best bet is a used server pull manufactured under a big brand like Dell, typically something like a Dell H310.
Frank1940 Posted May 24, 2021

Look for the 8-port cards (about halfway down the first post) in this thread: https://forums.unraid.net/topic/102010-recommended-controllers-for-unraid/?tab=comments#comment-941151

It lists the OEM server LSI boards. As far as I know, these boards were all made for Dell and IBM by LSI. These used LSI boards came on the market after server farms were removed from service and the equipment was sold off for salvage.
John_M Posted May 25, 2021

10 hours ago, GroundMoose said: I currently have all my drives attached to the onboard SATA ports (motherboard is the ASRock Rack X470D4U).

There are no Marvell controllers on that motherboard. Six of the SATA ports are provided by the X470 (ASMedia IP licensed by AMD) and two are provided by a discrete ASMedia chip, with no port multipliers. ASMedia controllers are generally reliable.

I notice you've had two different disks dropped by the same port. Were they connected using the same SATA cable, by any chance? Have you tried replacing that cable?
JorgeB Posted May 25, 2021

6 hours ago, John_M said: There are no Marvell controllers on that motherboard.

No, there are not. It looks like I was looking at different diags, sorry about that.

On 5/21/2021 at 6:26 PM, JorgeB said: Disk dropped again, so there's no SMART report. You're using a Marvell controller with port multipliers; each is a known problem on its own, and they are even worse together. There's also another controller with port multipliers. You should really get rid of those. Connect the disk to a different port/controller and post new diags so we can see SMART.

Ignore all I posted before. The disk dropped offline, and it looks to be the result of a bad SATA cable, though you still didn't post the SMART report.
ndk Posted May 27, 2021 (Author)

On 5/24/2021 at 8:31 PM, John_M said: I notice you've had two different disks dropped by the same port. Were they connected using the same SATA cable, by any chance? Have you tried replacing that cable?

On 5/25/2021 at 2:54 AM, JorgeB said: disk dropped offline and looks to be the result of a bad SATA cable, though you still didn't post the SMART report.

I still haven't had a chance to reconnect the drive and get the SMART report off it, but I'll do that soon. I did try replacing the cable after the first dropout, although admittedly with another cheap SATA cable from the same manufacturer. It seems strange that the same port would drop two different drives with two different cables, though, since I'm using the same cables on all my drives.

I went ahead and got what seems like a reputable Dell LSI card, though. Is there a chance it's still worth going that route?
JorgeB Posted May 27, 2021

5 minutes ago, GroundMoose said: I went ahead and got what seems like a reputable Dell LSI card, though. Is there a chance it's still worth going that route?

Not for now, unless you need the extra ports, though you might need it in the future, since these AMD chipsets sometimes have issues with the onboard SATA controllers, dropping multiple disks at the same time.
ndk Posted May 28, 2021 (Author)

On 5/21/2021 at 1:26 PM, JorgeB said: Connect the disk to a different port/controller and post new diags so we can see SMART.

OK, I have the disk back up on a new controller. Here are the new diags with SMART reports. My question now is how best to put the disk back in the array.

diagnostics-20210528-1157.zip
JorgeB Posted May 28, 2021

SMART looks good and the emulated disk is mounting, so you can rebuild on top: stop array, unassign disk5, start array, stop array, re-assign disk5, start array to begin rebuild.
ndk Posted May 28, 2021 (Author)

52 minutes ago, JorgeB said: SMART looks good and the emulated disk is mounting so you can rebuild on top: stop array, unassign disk5, start array, stop array, re-assign disk5, start array to begin rebuild.

OK, amazing. The only thing I was worried about is that I definitely copied a bunch of stuff to the array before realizing that the drive was down. Won't this affect the parity in a weird way, and then how the drive is rebuilt?
JorgeB Posted May 28, 2021

8 minutes ago, GroundMoose said: OK, amazing. The only thing I was worried about is that I definitely copied a bunch of stuff to the array before realizing that the drive was down. Won't this affect the parity in a weird way, and then how the drive is rebuilt?

No, any data that would go to the disabled disk is still written to the emulated disk, and parity always remains in sync.
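For anyone wondering why parity stays in sync here, a toy Python sketch of single XOR parity may help. This is only an illustration under assumptions, not Unraid's actual code: the two-byte "disks", their values, and the helper names `xor_blocks`, `read_emulated`, and `write_emulated` are all made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_emulated(healthy, parity):
    """Reconstruct the missing disk's block from the healthy disks plus parity."""
    return xor_blocks(healthy + [parity])


def write_emulated(healthy, new_data):
    """Return the parity that encodes new_data for the missing disk."""
    return xor_blocks(healthy + [new_data])


# Toy array (hypothetical values): two healthy data disks and one disabled disk.
disk1, disk2 = b"\x11\x22", b"\x33\x44"
old_disk5 = b"\x55\x66"
parity = xor_blocks([disk1, disk2, old_disk5])

# A write aimed at the disabled disk cannot reach it, so only parity is updated.
new_disk5 = b"\xaa\xbb"
parity = write_emulated([disk1, disk2], new_disk5)

# A rebuild recomputes the disk from the healthy disks and the current parity,
# so it recovers the data written while the disk was disabled.
rebuilt = read_emulated([disk1, disk2], parity)
assert rebuilt == new_disk5
```

In this model the physical disk's stale contents never enter the calculation, which is why rebuilding on top of the old disk gives back the newer data.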
ndk Posted May 28, 2021 (Author)

17 minutes ago, JorgeB said: No, any data that would go to the disabled disk would still be written in the emulated disk, and parity remains always in sync.

OK, great to know. Thanks again for all the help!
ndk Posted May 28, 2021 (Author)

1 hour ago, JorgeB said: SMART looks good and the emulated disk is mounting so you can rebuild on top: stop array, unassign disk5, start array, stop array, re-assign disk5, start array to begin rebuild.

I did receive this warning:

udma crc error count is 1

Is this left over from the original bad cable, or could this be a current problem?
JorgeB Posted May 28, 2021

11 minutes ago, GroundMoose said: Is this left over from the original bad cable

This. The CRC error count attribute doesn't reset, but as long as it doesn't keep increasing, the issue is solved.
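One way to check that the counter has stopped climbing is to compare the raw value of SMART attribute 199 from two `smartctl -A` runs taken some time apart. The Python sketch below is only a rough illustration: the `SAMPLE` text is made-up output in the usual `smartctl -A` column layout, the function names are hypothetical, and it assumes the drive reports the CRC counter as attribute ID 199 with a plain integer raw value, which may not hold for every drive.

```python
def udma_crc_count(smartctl_output):
    """Return the raw value of SMART attribute 199 from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_errors_increased(before, after):
    """True if the counter grew between two readings."""
    return after > before


# Illustrative sample only, not taken from the diagnostics in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1234
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
"""

first_reading = udma_crc_count(SAMPLE)
# Later, parse a fresh `smartctl -A` capture the same way and compare.
second_reading = udma_crc_count(SAMPLE)
still_growing = crc_errors_increased(first_reading, second_reading)
```

If the second reading equals the first after the drive has seen some use, that matches the "doesn't keep increasing" case described above.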