November 17, 2020 Hey all, I'm trying to get to the root cause of several disk read errors that I've been experiencing.

Earlier this year, I got my first red X due to several read and write errors, so I RMA'd the drive, and it was smooth sailing after that. On October 19, another drive appeared with a red X, this time with only read errors. I swapped that out with a spare drive, began the rebuild, and also ran a preclear on the 'bad' disk. The array rebuilt correctly and the preclear was successful, so I had another spare drive to use later on.

On November 9, yet another (different) drive appeared with a red X for read errors. I swapped this drive out for the drive that I precleared in October, and parity was rebuilt/verified successfully. A week later (November 17) I have another red X, but this one is on the disk I had just swapped in (the disk that had errors in October, but precleared just fine).

I'm at a loss here, and I am currently down 2 drives (1 spare, 1 for the array, so I am without parity). I did swap out the SATA cable in October for the 1 drive, and I just purchased a 6 pack of SATA cables that I plan to replace all the existing cables with.

I've attached all 3 diag logs for each occasion, and the SMART check for the disk that has errored out twice now. I'm not entirely sure what I'm looking for. If anyone can assist or has any ideas on where I can start/continue to troubleshoot, any help would be much appreciated.

glados-diagnostics-20201117-1341.zip
glados-diagnostics-20201109-1007.zip
glados-smart-20201019-2052.zip
glados-diagnostics-20201019-1040.zip
November 17, 2020 Community Expert You should have asked much sooner. By far the most frequent cause of these problems is bad connections, and there is usually nothing wrong with the disk. SMART for that disk looks fine, and syslog looks like a bad connection on that disk.

You should update your plugins. Some of yours are so out-of-date that you are getting missing csrf_token errors. I don't know whether that is also the reason for all the nginx lines in syslog, but they make it hard to work with.
November 17, 2020 Author Thanks trurl, I agree I should have posted earlier, but I definitely didn't think it would recur as often as it has. I'll try to swap out the SATA cables with new ones tonight and see how that goes. Is there a possibility that it may be something other than the cables/connections (the mobo, possibly)? Thanks again for taking a look. I was hoping that log-wise it looked like a bad connection, but I wasn't sure. I'll post back when I've made some progress.
November 26, 2020 Author So I swapped out all the SATA cables, assigned one of the 'possibly questionable but precleared perfectly fine' disks to the disk 2 array slot, and rebuilt/validated the array. Within a few hours of the rebuild there were disk read warnings for Disk 2, but it did not disable the disk. 2 days later, Disk 2 is now disabled.

I've attached the diagnostics, but I'm not sure where to go from here. I can put the other 'possibly questionable but precleared perfectly fine' disk into the disk 2 slot and run the array rebuild/validation, but I'm really getting demoralized having to do this each week now. Something is obviously wrong. I'm not sure if it's the disks or the actual SATA port on the mobo, but I believe that when the 'disk 2' slot failed twice before, it was on 2 different SATA ports (old cables at the time, though).

Does anyone have any insight into what might be wrong? It looks to be the same read errors again.

tower-diagnostics-20201126-1530.zip
November 26, 2020 Community Expert How's the power supply? How many of those drives are connected to a single cable/splitter?
November 27, 2020 Author That's a great point. Let me take a look tonight and I'll report back. Thanks for the idea.
November 27, 2020 Author Looks like I'm running a Corsair SF450 PSU, which has 2 modular ports for SATA and MOLEX. At most, I will run 5 HDDs and 1 Cache, but typically it's 4 HDDs and 1 Cache. I do have splitters, so the setup currently looks like this:

PSU SATA Port: (no splitters in play)
  Cache (SSD)
  Disk 4 - WD Red 6TB - Disk not in array, usually my preclear or emergency SATA drive slot

PSU MOLEX Port:
  SPLITTER 1 - PARITY + DISK1 - MOLEX TO SATA
  SPLITTER 2 - DISK2 + DISK3 - MOLEX TO SATA
  SPLITTER 3 - 1x 140mm + 2x 92mm fans - MOLEX TO MOLEX

To my knowledge (and googling), SATA connectors have 12V, 5V and 3.3V wires, while MOLEX uses only the first 2 (12V and 5V). SATA HDDs never really used the 3.3V rail, just +12V and +5V. SSDs generally use just the +5V alone.

Power Specifications (SF450 specs):

Rail    Max amps   Max watts
3.3V    15         100 (combined with 5V)
5V      20         100 (combined with 3.3V)
12V     37.5       450
5VSB    2.5        12.5
-12V    0.3        3.6

It also looks like the EFRX 6TB drives use 1.75 amps at peak and 5.3 watts for read/write. On paper the PSU should be able to power everything correctly, and my typical power draw is ~40 watts idling.

Edited November 27, 2020 by smakdafrog
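The power-budget reasoning in the post above can be checked with a few lines of arithmetic. This is only a rough sketch using the numbers quoted there: it assumes the 1.75 A peak figure is drawn from the 12 V rail (the post does not say which rail), and the names `RAIL_12V_MAX_AMPS`, `HDD_PEAK_AMPS`, and `peak_12v_draw` are made up for illustration.

```python
# Figures quoted in the post above (SF450 spec table and EFRX peak current).
RAIL_12V_MAX_AMPS = 37.5
HDD_PEAK_AMPS = 1.75   # assumption: this peak is drawn from the 12 V rail
MAX_HDD_COUNT = 5      # "At most, I will run 5 HDDs"


def peak_12v_draw(hdd_count: int, amps_per_hdd: float = HDD_PEAK_AMPS) -> float:
    """Worst-case 12 V current if every drive hits its peak at the same time."""
    return hdd_count * amps_per_hdd


total = peak_12v_draw(MAX_HDD_COUNT)
headroom = RAIL_12V_MAX_AMPS - total

print(f"Peak 12 V draw for {MAX_HDD_COUNT} HDDs: {total:.2f} A")
print(f"Headroom on the 12 V rail: {headroom:.2f} A")
```

A total like this only says the PSU as a whole has capacity to spare. It says nothing about how the load is spread across individual cables and splitters, which is what the earlier question about drives per cable/splitter was getting at.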
November 27, 2020 Author I removed all the splitters that were there, and have changed the configuration to the following, with brand new splitters and PSU SATA/MOLEX wires, to hopefully even out the load:

PSU SATA Port: (no splitter in play)
  CACHE (SSD)
  DISK2
  DISK4

PSU MOLEX Port:
  SPLITTER 1 - PARITY + DISK1 + DISK3 - MOLEX TO SATA
  SPLITTER 2 - 1x 140mm + 2x 92mm fans - MOLEX TO MOLEX

I triggered a parity sync / data rebuild, so we'll see what happens.
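To see what "even out the load" means in numbers, here is a rough sketch comparing peak HDD current per PSU cable for the old and new layouts described in the posts. It assumes the 1.75 A per-drive peak quoted earlier is on the 12 V line, and it ignores the SSD and the fans because no current figures were given for them. The dictionary layout and the `peak_amps_per_cable` helper are illustrative only.

```python
HDD_PEAK_AMPS = 1.75  # assumption: per-drive peak on the 12 V line

# HDDs hanging off each PSU cable, taken from the two layouts in the posts.
# The SSD and fans are left out because their draw was not stated.
before = {
    "sata": ["Disk 4"],
    "molex": ["Parity", "Disk 1", "Disk 2", "Disk 3"],
}
after = {
    "sata": ["Disk 2", "Disk 4"],
    "molex": ["Parity", "Disk 1", "Disk 3"],
}


def peak_amps_per_cable(layout: dict) -> dict:
    """Worst-case HDD current per PSU cable if all its drives peak together."""
    return {cable: len(disks) * HDD_PEAK_AMPS for cable, disks in layout.items()}


for label, layout in (("before", before), ("after", after)):
    for cable, amps in peak_amps_per_cable(layout).items():
        print(f"{label:>6} {cable:<5} {amps:.2f} A")
```

This counts per PSU cable, not per splitter, so it does not capture how good or bad any individual splitter's connection is.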
November 28, 2020 Community Expert Yeah, your drives were likely dropping due to the power sagging. Don't use splitters if you can avoid it at all - and if you must, make sure they're the punch down type where you can see the pins attached to the wires and NOT the molded type. I can't stress enough how important it is to avoid the molded splitters.
December 5, 2020 Author On 11/27/2020 at 8:53 PM, Michael_P said:

"Yeah, your drives were likely dropping due to the power sagging. Don't use splitters if you can avoid it at all - and if you must, make sure they're the punch down type where you can see the pins attached to the wires and NOT the molded type. I can't stress enough how important it is to avoid the molded splitters."

Michael, I think your hunch may have been correct. The array has been running solid for a week now, so we'll see. I will have to double check the type of splitter I have, but I'm pretty sure it's the punch down type. I did a bit of research on that, so thank you for the heads up!
December 5, 2020 Community Expert I had a similar issue driving me nuts for over a year. Thanks for the follow-up.