
Thousands of read/write errors to parity


johnsanc


These entries from the SMART report don't look too good for that drive:

 

197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      120

198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      120

 

You may want to replace that drive, and then run a pre-clear to test it out.
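If it helps, those two attributes are easy to keep an eye on with smartctl from the console (just a sketch; /dev/sdX is a placeholder for the actual parity device):

# dump the SMART attribute table for the drive
smartctl -A /dev/sdX

# or watch just the two counters in question
smartctl -A /dev/sdX | grep -E 'Current_Pending_Sector|Offline_Uncorrectable'

Comparing that output before and after the pre-clear will show whether the pending sectors get reallocated or simply cleared.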


I decided to just continue with a parity sync to see if it would work. Over the course of the parity sync, all 120 pending sectors and offline uncorrectable errors returned to zero. Reallocated sectors are still zero as well.

 

Does anyone know why these errors may have occurred in the first place, and why they now appear to be fixed?



Can't really answer your question, but I think you should run a non-correcting parity check now, just to give it all the once-over and confirm the parity is valid.
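For reference, the non-correcting check can be started from the web GUI by unticking the option to write corrections before starting the check, or (a sketch from memory, so double-check the path on your version) from the console:

# start a read-only parity check that reports sync errors but does not write corrections
/usr/local/sbin/mdcmd check NOCORRECT

# check progress and sync error counters
/usr/local/sbin/mdcmd status | grep -i resync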


Parity sync completed successfully, but shortly thereafter the pending sector and offline uncorrectable counts started going up again, and I also see a bunch more command timeouts in the SMART report. Entries like the following have appeared in the logs about 8 times since the parity sync completed. Does this look more like a cabling/power issue rather than an issue with the drive itself?

 

 

Aug  2 23:06:33 Tower kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Aug  2 23:06:33 Tower kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Aug  2 23:06:33 Tower kernel: ata5: SError: { UnrecovData 10B8B BadCRC }
Aug  2 23:06:33 Tower kernel: ata5.00: failed command: READ DMA EXT
Aug  2 23:06:33 Tower kernel: ata5.00: cmd 25/00:40:00:c4:d8/00:05:3f:01:00/e0 tag 10 dma 688128 in
Aug  2 23:06:33 Tower kernel:         res 50/00:00:ff:c3:d8/00:00:3f:01:00/e0 Emask 0x10 (ATA bus error)
Aug  2 23:06:33 Tower kernel: ata5.00: status: { DRDY }
Aug  2 23:06:33 Tower kernel: ata5: hard resetting link
Aug  2 23:06:34 Tower kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Aug  2 23:06:34 Tower kernel: ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Aug  2 23:06:34 Tower kernel: ata5.00: configured for UDMA/133
Aug  2 23:06:34 Tower kernel: ata5: EH complete
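For what it's worth, the BadCRC / link reset pattern can be cross-checked against the drive's own CRC counter and the syslog (a sketch; adjust the device name and log path for your setup):

# UDMA CRC errors are counted by the drive itself and usually point at the cable or backplane rather than the platters
smartctl -A /dev/sdX | grep -i UDMA_CRC_Error_Count

# count how many times this link has been hard reset since boot
grep -c 'ata5: hard resetting link' /var/log/syslog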

 

 


The same issue happened again: after a couple of days I get thousands of read/write errors until the parity drive is just marked as disabled. I suppose I will try a preclear and record SMART reports before and after. If it's an issue with the drive, it should be apparent from a preclear, shouldn't it?
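Rough plan for capturing the before/after comparison (a sketch; assumes Joe L.'s preclear_disk.sh is available on the flash drive and /dev/sdX is the suspect disk):

# snapshot SMART before the preclear
smartctl -a /dev/sdX > /boot/smart_before.txt

# one full preclear cycle (pre-read, zero, post-read)
preclear_disk.sh /dev/sdX

# snapshot SMART afterwards and compare
smartctl -a /dev/sdX > /boot/smart_after.txt
diff /boot/smart_before.txt /boot/smart_after.txt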


Thanks for the tip. I also stumbled upon this, which is pretty helpful: https://lime-technology.com/wiki/index.php/The_Analysis_of_Drive_Issues

 

Because this issue is consistently appearing on the parity drive only, I was curious what the safest troubleshooting route would be. This drive is currently in a 5-bay drive cage that takes 2 power connectors. One other drive in the cage has about 30 pending sectors (not sure how long those have been there, but it's an old drive), and the other 3 appear to be fine.

 

I've tried reseating the existing SATA cable, which is a decent one with clips on the connectors. I've also reseated the drive itself in the cage, but I haven't moved any drives around. Neither of these fixed the issue. My PSU is a Seasonic X650, and the eXtreme PSU calculator estimates I may use about 550 watts.
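To compare the drives sharing that cage, I can dump the relevant counters for each of them with something like this (a sketch; the device list is a placeholder for the actual drives):

# print the error-related SMART counters for each drive in the cage
for d in /dev/sd[b-f]; do
    echo "== $d =="
    smartctl -A "$d" | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count'
done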


A quick update. Here's what I've tried to diagnose why my parity drive had all these read/write issues.

 

1. Reseated the disk in the cage and parity-synced with the same SATA and power connection - worked fine, but failed a day later with the same errors.

2. Reseated the disk in the cage and precleared the parity disk in the same slot - failed at around 85% of the zeroing step.

3. Removed the parity disk and precleared an old 250GB HDD in the same slot - no issues in the syslog, preclear fine.

4. Precleared the parity disk on a different SATA cable, port, and power connection (same server) - no issues in the syslog, preclear report attached.

 

So this leads me to believe that the 8TB parity drive is fine, and that it's an issue with the power, the SATA cable, or that slot in the drive cage. However, I am wondering why there were zero issues with the 250GB test drive in that slot.

 

I am pretty sure I have adequate power: the 250GB drive is more demanding than the 8TB drive, the 5-bay drive cage is fed by 2 SATA power connectors, and the other 4 drives appear fine.

 

Any thoughts on next steps?
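In case it helps with the comparison, here is a rough sketch (device path and log location are placeholders) for logging the pending-sector count once an hour while the drive sits in the suspect slot, so any recurrence gets a timestamp:

# append a timestamped Current_Pending_Sector reading every hour
while true; do
    echo "$(date '+%F %T') $(smartctl -A /dev/sdX | awk '/Current_Pending_Sector/ {print $10}')" >> /boot/pending_sectors.log
    sleep 3600
done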

 

preclear_rpt_Z8402Y2D_2015-08-09.txt

preclear_start_Z8402Y2D_2015-08-09.txt

preclear_finish_Z8402Y2D_2015-08-09.txt


