
Thousands of read/write errors to parity


johnsanc


These entries from the SMART report don't look too good for that drive:

 

197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      120

198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      120

 

You may want to replace that drive, and then run a pre-clear to test it out.
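If it helps, those two attributes are easy to keep an eye on with smartctl from the console (just a sketch; /dev/sdX is a placeholder for the actual parity device):

# dump the SMART attribute table for the drive
smartctl -A /dev/sdX

# or watch just the two counters in question
smartctl -A /dev/sdX | grep -E 'Current_Pending_Sector|Offline_Uncorrectable'

Comparing that output before and after the pre-clear will show whether the pending sectors get reallocated or simply cleared.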


I decided to just continue with a parity sync to see if it would work. Over the course of the parity sync, all 120 pending sectors and offline uncorrectable errors returned to zero. Reallocated sectors are still zero as well.

 

Does anyone know why these errors may have occurred in the first place, and why they now appear to be fixed?



Can't really answer your question, but I think you should run a non-correcting parity check now, just to give it all the once-over and confirm the parity is valid.
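For reference, the non-correcting check can be started from the web GUI by unticking the option to write corrections before starting the check, or (a sketch from memory, so double-check the path on your version) from the console:

# start a read-only parity check that reports sync errors but does not write corrections
/usr/local/sbin/mdcmd check NOCORRECT

# check progress and sync error counters
/usr/local/sbin/mdcmd status | grep -i resync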


Parity sync completed successfully, but shortly thereafter the pending sector and offline uncorrectable counts started going up again, and I also see a bunch more command timeouts in the SMART report. Entries like the following have appeared in the logs about 8 times since the parity sync completed. Does this look more like a cabling/power issue rather than an issue with the drive itself?

 

 

Aug  2 23:06:33 Tower kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Aug  2 23:06:33 Tower kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Aug  2 23:06:33 Tower kernel: ata5: SError: { UnrecovData 10B8B BadCRC }
Aug  2 23:06:33 Tower kernel: ata5.00: failed command: READ DMA EXT
Aug  2 23:06:33 Tower kernel: ata5.00: cmd 25/00:40:00:c4:d8/00:05:3f:01:00/e0 tag 10 dma 688128 in
Aug  2 23:06:33 Tower kernel:         res 50/00:00:ff:c3:d8/00:00:3f:01:00/e0 Emask 0x10 (ATA bus error)
Aug  2 23:06:33 Tower kernel: ata5.00: status: { DRDY }
Aug  2 23:06:33 Tower kernel: ata5: hard resetting link
Aug  2 23:06:34 Tower kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Aug  2 23:06:34 Tower kernel: ata5.00: configured for UDMA/133
Aug  2 23:06:34 Tower kernel: ata5: EH complete
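For what it's worth, the BadCRC / link reset pattern can be cross-checked against the drive's own CRC counter and the syslog (a sketch; adjust the device name and log path for your setup):

# UDMA CRC errors are counted by the drive itself and usually point at the cable or backplane rather than the platters
smartctl -A /dev/sdX | grep -i UDMA_CRC_Error_Count

# count how many times this link has been hard reset since boot
grep -c 'ata5: hard resetting link' /var/log/syslog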

 

 


The same issue happened again: after a couple of days I get thousands of read/write errors until the parity drive is just marked as disabled. I suppose I will try a preclear and record SMART reports before and after. If it's an issue with the drive, it should be apparent from a preclear, shouldn't it?
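Rough plan for capturing the before/after comparison (a sketch; assumes Joe L.'s preclear_disk.sh is available on the flash drive and /dev/sdX is the suspect disk):

# snapshot SMART before the preclear
smartctl -a /dev/sdX > /boot/smart_before.txt

# one full preclear cycle (pre-read, zero, post-read)
preclear_disk.sh /dev/sdX

# snapshot SMART afterwards and compare
smartctl -a /dev/sdX > /boot/smart_after.txt
diff /boot/smart_before.txt /boot/smart_after.txt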


Thanks for the tip. I also stumbled upon this, which is pretty helpful: https://lime-technology.com/wiki/index.php/The_Analysis_of_Drive_Issues

 

Because this issue is consistently appearing on the parity drive only, I was curious what the safest troubleshooting route would be. This drive is currently in a 5-bay drive cage that takes 2 power connectors. One other drive in the cage has about 30 pending sectors (not sure how long those have been there, but it's an old drive), and the other 3 appear to be fine.

 

I've tried reseating the existing SATA cable, which is a decent one with clips on the connectors. I've also reseated the drive itself in the cage, but I haven't moved any drives around. Neither of these fixed the issue. My PSU is a Seasonic X650, and the eXtreme PSU calculator estimates I may use about 550 watts.
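To compare the drives sharing that cage, I can dump the relevant counters for each of them with something like this (a sketch; the device list is a placeholder for the actual drives):

# print the error-related SMART counters for each drive in the cage
for d in /dev/sd[b-f]; do
    echo "== $d =="
    smartctl -A "$d" | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count'
done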


A quick update. Here's what I've tried to diagnose why my parity drive had all these read/write issues.

 

1. Reseated the disk in the cage and parity-synced with the same SATA and power connection - worked fine, but failed a day later with the same errors.

2. Reseated the disk in the cage and precleared the parity disk in the same slot - failed at around 85% of the zeroing step.

3. Removed the parity disk and precleared an old 250GB HDD in the same slot - no issues in the syslog, preclear fine.

4. Precleared the parity disk on a different SATA cable, port, and power connection (same server) - no issues in the syslog, preclear report attached.

 

So this leads me to believe that the 8TB parity drive is fine, and that it's an issue with the power, the SATA cable, or that slot in the drive cage. However, I am wondering why there were zero issues with the 250GB test drive in that slot.

 

I am pretty sure I have adequate power: the 250GB drive is more demanding than the 8TB drive, the 5-bay drive cage is fed by 2 SATA power connectors, and the other 4 drives appear fine.

 

Any thoughts on next steps?
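In case it helps with the comparison, here is a rough sketch (device path and log location are placeholders) for logging the pending-sector count once an hour while the drive sits in the suspect slot, so any recurrence gets a timestamp:

# append a timestamped Current_Pending_Sector reading every hour
while true; do
    echo "$(date '+%F %T') $(smartctl -A /dev/sdX | awk '/Current_Pending_Sector/ {print $10}')" >> /boot/pending_sectors.log
    sleep 3600
done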

 

preclear_rpt_Z8402Y2D_2015-08-09.txt

preclear_start_Z8402Y2D_2015-08-09.txt

preclear_finish_Z8402Y2D_2015-08-09.txt


