johnsanc Posted July 31, 2015
I just noticed that I have thousands of read/write errors on my parity disk. Would anyone mind taking a look to see what might have caused this? My parity is an 8TB Seagate Archive drive. Syslog attached; the complete diagnostics zip is over the 192 KB upload limit. syslog-2015-07-31.txt.zip
CHBMB Posted July 31, 2015
Yep, definitely a lot of read/write errors on drive0. Why don't you post a SMART report for the disk as well, and see what that shows?
johnsanc Posted July 31, 2015
SMART report attached. Let me know if you need anything else. ST8000AS0002-1NA17Z_Z8402Y2D.txt
jphipps Posted July 31, 2015
These entries from the SMART report don't look too good for the drive:
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 120
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 120
You may want to replace that drive, and then run a preclear to test it out.
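For readers who want to check a saved SMART report for the same warning signs, here is a minimal sketch. It assumes the attribute table is whitespace-separated with the raw value in the tenth column, as in the two rows quoted above. The function name `flag_smart_rows` and the set of watched attributes are illustrative choices, not part of unRAID or smartmontools.

```python
# Attribute names treated as warning signs when their raw value is nonzero.
# This selection is an assumption for illustration, not an official list.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def flag_smart_rows(report_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Expected row layout (assumed):
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


# The two rows quoted in the post above.
sample = """\
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 120
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 120
"""

print(flag_smart_rows(sample))
```

A nonzero count here only says the drive reported sectors it could not read; as the rest of the thread shows, it does not by itself say whether the drive or the connection is at fault.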
dgaschk Posted August 1, 2015
See here: http://lime-technology.com/wiki/index.php/Troubleshooting#Resolving_a_Pending_Sector
johnsanc Posted August 1, 2015
Does that mean I need to preclear this drive again to get those pending sectors down to zero before it can be used for parity?
johnsanc Posted August 2, 2015
I decided to just continue with a Parity-Sync to see if it would work. Over the course of the parity sync, all 120 Pending Sectors and Offline Uncorrectable errors returned to zero. Reallocated sectors are still at zero as well. Does anyone know why these errors may have occurred in the first place, and why the problem now appears to be fixed?
CHBMB Posted August 2, 2015
Can't really answer your question, but I think you should now run a non-correcting parity check to give it all the once-over and confirm the parity is valid.
johnsanc Posted August 4, 2015
Parity sync completed successfully, but shortly thereafter the pending sector and offline uncorrectable counts started going up again, and I also see a bunch more command timeouts in the SMART report. Entries like the following have also appeared in the logs about 8 times since the parity sync completed. Does this look more like a cabling/power issue than an issue with the drive itself?
Aug 2 23:06:33 Tower kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen (Errors)
Aug 2 23:06:33 Tower kernel: ata5.00: irq_stat 0x08000000, interface fatal error (Errors)
Aug 2 23:06:33 Tower kernel: ata5: SError: { UnrecovData 10B8B BadCRC } (Errors)
Aug 2 23:06:33 Tower kernel: ata5.00: failed command: READ DMA EXT (Minor Issues)
Aug 2 23:06:33 Tower kernel: ata5.00: cmd 25/00:40:00:c4:d8/00:05:3f:01:00/e0 tag 10 dma 688128 in (Drive related)
Aug 2 23:06:33 Tower kernel: res 50/00:00:ff:c3:d8/00:00:3f:01:00/e0 Emask 0x10 (ATA bus error) (Errors)
Aug 2 23:06:33 Tower kernel: ata5.00: status: { DRDY } (Drive related)
Aug 2 23:06:33 Tower kernel: ata5: hard resetting link (Minor Issues)
Aug 2 23:06:34 Tower kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300) (Drive related)
Aug 2 23:06:34 Tower kernel: ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded (Drive related)
Aug 2 23:06:34 Tower kernel: ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out (Drive related)
Aug 2 23:06:34 Tower kernel: ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out (Drive related)
Aug 2 23:06:34 Tower kernel: ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded (Drive related)
Aug 2 23:06:34 Tower kernel: ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out (Drive related)
Aug 2 23:06:34 Tower kernel: ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out (Drive related)
Aug 2 23:06:34 Tower kernel: ata5.00: configured for UDMA/133 (Drive related)
Aug 2 23:06:34 Tower kernel: ata5: EH complete (Drive related)
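Since the poster mentions seeing this pattern about 8 times, counting the occurrences per port can help show whether the errors cluster on one link. The following is a rough sketch that scans syslog text for `SError` lines carrying the `BadCRC` or `10B8B` flags seen in the log above. The regular expression, the `LINK_FLAGS` set, and the function name are assumptions made for illustration; they are not an unRAID or kernel tool.

```python
import re

# Matches kernel lines such as "ata5: SError: { UnrecovData 10B8B BadCRC }".
SERROR_RE = re.compile(r"(ata\d+): SError: \{([^}]*)\}")

# Flags counted as link-level errors. This set is an assumption based on
# the flags that appear in the log quoted in this thread.
LINK_FLAGS = {"BadCRC", "10B8B"}


def link_error_counts(log_text):
    """Count SError lines per ATA port that carry at least one link flag."""
    counts = {}
    for port, flags in SERROR_RE.findall(log_text):
        if LINK_FLAGS & set(flags.split()):
            counts[port] = counts.get(port, 0) + 1
    return counts


# One line taken from the log excerpt above.
sample_log = (
    "Aug 2 23:06:33 Tower kernel: ata5: SError: "
    "{ UnrecovData 10B8B BadCRC } (Errors)\n"
    "Aug 2 23:06:33 Tower kernel: ata5: hard resetting link (Minor Issues)\n"
)

print(link_error_counts(sample_log))
```

If the counts all land on a single port, swapping the cable or port for that drive is a cheap next test before condemning the drive.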
Zonediver Posted August 4, 2015
It seems to be a Seagate HDD. Remove it as soon as possible; all 26 of my Seagate HDDs died this way. Hurry up.
johnsanc Posted August 6, 2015
The same issue happened again: after a couple of days I get thousands of read/write errors until the parity disk is marked as disabled. I suppose I will try a preclear and record SMART reports before and after. If it's an issue with the drive, it should be apparent with a preclear, shouldn't it?
dgaschk Posted August 6, 2015
10B8B BadCRC
Bad or loose SATA cable, or a dirty or bad SATA port.
johnsanc Posted August 6, 2015
Thanks for the tip. I also stumbled upon this, which is pretty helpful: https://lime-technology.com/wiki/index.php/The_Analysis_of_Drive_Issues
Because this issue is consistently appearing on the parity drive only, I was curious what the safest troubleshooting route would be. This drive is currently in a 5-bay drive cage that takes 2 power connectors. One other drive in the cage has about 30 Pending Sectors (not sure how long those have been there, but it's an old drive), and the other 3 appear to be fine. I've tried reseating the existing SATA cable, which is a decent one with clips on the connectors. I've also reseated the drive itself in the cage, but I haven't moved any drives around. Neither of these fixed the issue. My PSU is a Seasonic X650, and the eXtreme PSU calculator estimates I may use about 550 watts.
johnsanc Posted August 9, 2015
A quick update. Here's what I've tried to diagnose why my parity drive had all these read/write issues.
1. Reseated disk in cage and parity-synced with same SATA and power connection - worked fine, but failed a day later with the same errors
2. Reseated disk in cage and precleared the parity disk in the same slot - failed at around 85% of zeroing step
3. Removed parity disk and precleared an old 250GB HDD in the same slot - No issues in syslog, preclear fine
4. Precleared parity on a different SATA cable, port, and power connection (same server) - No issues in syslog, preclear report attached
This leads me to believe that the 8TB parity drive is fine and that the issue is with the power, SATA cable, or that slot in the drive cage. However, I am wondering why there were zero issues with the 250GB test drive in that slot. I am pretty sure I have adequate power, considering that the 250GB drive is more demanding than the 8TB drive. Also, the 5-bay drive cage is powered by 2 SATA power connectors, and the other 4 drives appear fine. Any thoughts on next steps?
preclear_rpt_Z8402Y2D_2015-08-09.txt preclear_start_Z8402Y2D_2015-08-09.txt preclear_finish_Z8402Y2D_2015-08-09.txt
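One way to read the four tests listed above is as a small elimination table. The sketch below tallies failures per drive, per connection path, and per drive-plus-path combination. The labels "original" and "alternate" are shorthand invented here for the cage slot, cable, and power connection used, and counting the first test as a failure (it failed a day later) is an interpretation, not something the post states in those terms.

```python
from collections import defaultdict

# (drive, path, passed) for the four tests described in the post above.
# "original" / "alternate" are hypothetical labels for slot + cable + power.
trials = [
    ("8TB", "original", False),   # parity sync, failed a day later
    ("8TB", "original", False),   # preclear failed at ~85% of zeroing
    ("250GB", "original", True),  # preclear fine
    ("8TB", "alternate", True),   # preclear fine
]


def failure_tally(trials):
    """Map each drive, path, and (drive, path) pair to [failures, runs]."""
    tally = defaultdict(lambda: [0, 0])
    for drive, path, passed in trials:
        for key in (drive, path, (drive, path)):
            tally[key][1] += 1
            if not passed:
                tally[key][0] += 1
    return dict(tally)


def always_failing(tally):
    """Return the keys that failed in every run they took part in."""
    return [key for key, (fails, runs) in tally.items() if fails == runs]


tally = failure_tally(trials)
print(always_failing(tally))
```

With these four data points, neither the 8TB drive alone nor the original path alone fails every time; only the combination does. That matches the puzzle raised in the post and fits a marginal connection that one drive tolerates and another does not, though four trials are too few to prove it.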
johnsanc Posted August 12, 2015
All is well again. It appears to have been a bad SATA cable.