Unraid

Strange Parity errors - need help narrowing down issue.

I recently upgraded a data disk, and after it finished rebuilding, I ran two NOCORRECT parity checks. Neither reported any sync errors. When the monthly NOCORRECT parity check ran, it reported 32 sync errors. I re-ran two more NOCORRECT parity checks, and again no errors were reported. So I went ahead and ran a CORRECT parity check, but it found 32 sync errors again, this time in a different sector range.
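For anyone unsure what a parity check actually compares: with a single parity disk, Unraid stores the byte-wise XOR of all data disks, so a check recomputes that XOR and compares it with what is on the parity disk. The toy model below is a sketch under stated assumptions (in-memory "disks", 512-byte sectors, hypothetical function names), not Unraid's md driver. It shows that a NOCORRECT check only reports mismatching sectors, while a CORRECT check also rewrites the parity sector.

```python
from functools import reduce

SECTOR = 512  # bytes per sector in this toy model (assumption)


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (one block per data disk)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(data_disks, parity_disk, correct=False):
    """Toy single-parity check over in-memory 'disks' (hypothetical helper).

    Returns the sector numbers whose stored parity does not match the XOR
    of the data disks. With correct=False (like NOCORRECT) it only reports;
    with correct=True (like CORRECT) it also rewrites the parity sector.
    """
    mismatches = []
    for start in range(0, len(parity_disk), SECTOR):
        end = start + SECTOR
        expected = xor_blocks([disk[start:end] for disk in data_disks])
        if parity_disk[start:end] != expected:
            mismatches.append(start // SECTOR)
            if correct:
                parity_disk[start:end] = expected  # corrections go to parity only
    return mismatches


if __name__ == "__main__":
    d1 = bytearray(b"\x0f" * 2048)
    d2 = bytearray(b"\xf0" * 2048)
    parity = bytearray(b"\xff" * 2048)  # 0x0f XOR 0xf0 == 0xff, so parity starts valid
    parity[600] ^= 0x01  # flip one bit inside sector 1 of the parity disk
    print(parity_check([d1, d2], parity))                # [1]
    print(parity_check([d1, d2], parity, correct=True))  # [1], and parity is fixed
    print(parity_check([d1, d2], parity))                # []
```

Running a CORRECT pass and then a NOCORRECT pass shows the mismatch disappearing, which is what you would expect if the stored parity, rather than the data, was wrong.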

 

It seems as though the errors were on ata6.00, which corresponds to my parity disk. 
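One way to confirm which drive sits behind `ata6.00` is to pull the libata identify line out of the syslog and compare the model and firmware with the SMART report. A minimal sketch, assuming identify lines look like the one quoted below; the regex and `ports_to_models` are hypothetical helpers, not part of Unraid.

```python
import re

# Matches libata identify lines such as:
#   "ata6.00: ATA-8: WDC WD3001FAEX-00MJRA0, 01.01L01, max UDMA/133"
IDENT = re.compile(r"(ata\d+\.\d+): ATA-\d+: ([^,]+), ([^,]+),")


def ports_to_models(syslog_lines):
    """Return {ata device: (model, firmware)} from libata identify lines."""
    ports = {}
    for line in syslog_lines:
        m = IDENT.search(line)
        if m:
            ports[m.group(1)] = (m.group(2).strip(), m.group(3).strip())
    return ports


line = "May 31 20:06:17 Tower kernel: ata6.00: ATA-8: WDC WD3001FAEX-00MJRA0, 01.01L01, max UDMA/133"
print(ports_to_models([line]))  # {'ata6.00': ('WDC WD3001FAEX-00MJRA0', '01.01L01')}
```

If several drives share a model number, the serial number reported by `smartctl` is the safer tie-breaker.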

 

May 31 20:06:17 Tower kernel: ata6.00: ATA-8: WDC WD3001FAEX-00MJRA0, 01.01L01, max UDMA/133

 

After the original 32 sync errors, I ran a long SMART test, which didn't turn up anything unusual. I upgraded the parity disk a few weeks back, but before putting it in the system I ran three passes of badblocks v1.42 and one pass of preclear, with nothing of note. The only thing I can think of that may have caused the sync errors is the MOVER script running during these checks, but that shouldn't interfere, should it?

 

If anybody has some further insight, I'd appreciate it.  Thank you.

 

SMART report:

 

smartctl -a -d ata /dev/sdf
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD3001FAEX-00MJRA0
Serial Number:    WD-WCC130288340
Firmware Version: 01.01L01
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Jun  3 07:07:20 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
				was completed without error.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		 (34740) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x70b5)	SCT Status supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   167   167   021    Pre-fail  Always       -       10608
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       29
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       445
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       25
194 Temperature_Celsius     0x0022   120   111   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       408         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
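The attributes that usually matter when chasing errors like these are reallocated, pending and offline-uncorrectable sectors (IDs 5, 197, 198) and UDMA CRC errors (ID 199), which typically point at the cable or connection rather than the platters. A rough sketch for pulling raw values out of `smartctl -a` text; it assumes the column layout shown above (RAW_VALUE is the tenth column), and the watch-list and function names are illustrative choices, not anything smartctl defines.

```python
# Illustrative watch-list (assumption), keyed by SMART attribute ID.
WATCH = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}


def smart_raw_values(report_text):
    """Return {attribute id: first token of RAW_VALUE} from `smartctl -a` text."""
    values = {}
    in_table = False
    for line in report_text.splitlines():
        if line.lstrip().startswith("ID#"):
            in_table = True  # header row of the attribute table
            continue
        if in_table:
            fields = line.split()
            # A blank or non-attribute line ends the table.
            if len(fields) < 10 or not fields[0].isdigit():
                break
            values[int(fields[0])] = fields[9]
    return values


def flag_nonzero(values):
    """Watched attributes whose raw value is not zero."""
    return {WATCH[i]: v for i, v in values.items() if i in WATCH and v != "0"}
```

Fed the report above, `flag_nonzero(smart_raw_values(text))` should flag only `UDMA_CRC_Error_Count` with a raw value of 1.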

 

 

First 32 sync errors:

 

Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608040
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608048
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608056
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608064
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608072
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608080
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608088
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608096
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608104
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608112
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608120
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608128
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608136
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608144
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608152
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608160
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608168
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608176
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608184
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608192
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608200
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608208
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608216
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608224
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608232
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608240
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608248
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608256
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608264
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608272
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608280
Jun  1 06:25:12 Tower kernel: md: parity incorrect, sector=2654608288
Jun  1 06:25:12 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun  1 06:25:12 Tower kernel: ata6.00: irq_stat 0x40000001
Jun  1 06:25:12 Tower kernel: ata6.00: failed command: READ DMA EXT
Jun  1 06:25:12 Tower kernel: ata6.00: cmd 25/00:00:d8:1d:3a/00:04:9e:00:00/e0 tag 0 dma 524288 in
Jun  1 06:25:12 Tower kernel:          res 51/40:df:ec:1f:3a/00:01:9e:00:00/e0 Emask 0x9 (media error)
Jun  1 06:25:12 Tower kernel: ata6.00: status: { DRDY ERR }
Jun  1 06:25:12 Tower kernel: ata6.00: error: { UNC }
Jun  1 06:25:12 Tower kernel: ata6.00: configured for UDMA/133
Jun  1 06:25:12 Tower kernel: ata6: EH complete
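The `cmd` and `res` lines are libata taskfile dumps: command (or status) / feature (or error) : sector count : LBA low : LBA mid : LBA high / the same five high-order-byte registers / device. For a 48-bit command such as READ DMA EXT, the LBA and sector count are split across the low and high halves. The sketch below (hypothetical helper names) decodes them.

```python
def parse_taskfile(tf):
    """Split a libata taskfile dump such as '25/00:00:d8:1d:3a/00:04:9e:00:00/e0'
    into 12 integers:
    [command/status, feature/error, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device]
    """
    head, low, high, device = tf.split("/")
    fields = [head] + low.split(":") + high.split(":") + [device]
    return [int(f, 16) for f in fields]


def lba48(f):
    """48-bit LBA: high-order-byte registers sit above the low registers."""
    lbal, lbam, lbah = f[3], f[4], f[5]
    hob_lbal, hob_lbam, hob_lbah = f[8], f[9], f[10]
    return (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | (lbah << 16) | (lbam << 8) | lbal


def sector_count48(f):
    """48-bit sector count; a register value of 0 means 65536 sectors."""
    count = (f[7] << 8) | f[2]
    return count or 65536


def is_unc(res):
    """Bit 0x40 of the error register (second field of a 'res' dump) is UNC."""
    return bool(res[1] & 0x40)


cmd = parse_taskfile("25/00:00:d8:1d:3a/00:04:9e:00:00/e0")
res = parse_taskfile("51/40:df:ec:1f:3a/00:01:9e:00:00/e0")
print(lba48(cmd), sector_count48(cmd) * 512)  # 2654608856 524288
print(lba48(res), is_unc(res))                # 2654609388 True
```

Decoded this way, the failed command was a 1024-sector read (1024 × 512 = 524,288 bytes, matching `dma 524288`) starting at LBA 2654608856, and the drive reported the uncorrectable error at LBA 2654609388. These are whole-disk LBAs; the `md: ... sector=` numbers may be counted from the start of the array partition instead, so allow for that offset when comparing the two.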

 

Second 32 sync errors:

 

Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897096
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897104
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897112
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897120
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897128
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897136
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897144
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897152
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897160
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897168
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897176
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897184
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897192
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897200
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897208
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897216
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897224
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897232
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897240
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897248
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897256
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897264
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897272
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897280
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897288
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897296
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897304
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897312
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897320
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897328
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897336
Jun  3 01:45:06 Tower kernel: md: correcting parity, sector=1915897344
Jun  3 01:46:06 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun  3 01:46:06 Tower kernel: ata6.00: failed command: FLUSH CACHE EXT
Jun  3 01:46:06 Tower kernel: ata6.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun  3 01:46:06 Tower kernel:          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun  3 01:46:06 Tower kernel: ata6.00: status: { DRDY }
Jun  3 01:46:06 Tower kernel: ata6: hard resetting link
Jun  3 01:46:07 Tower kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun  3 01:46:07 Tower kernel: ata6.00: configured for UDMA/133
Jun  3 01:46:07 Tower kernel: ata6.00: retrying FLUSH 0xea Emask 0x4
Jun  3 01:46:07 Tower kernel: ata6: EH complete
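Both sets of errors have the same shape. A small sketch (hypothetical names) that groups the `md:` lines into runs of consecutive sectors, assuming the 8-sector stride seen in these logs:

```python
import re

# Matches both NOCORRECT ("parity incorrect") and CORRECT ("correcting parity") lines.
MD_LINE = re.compile(r"md: (?:parity incorrect|correcting parity), sector=(\d+)")


def sync_error_runs(lines, stride=8):
    """Group md sync-error sectors into runs of (first_sector, last_sector, count).

    A new run starts whenever the gap to the previous sector is not exactly
    `stride` sectors (8 x 512 bytes = 4 KiB in these logs).
    """
    sectors = [int(m.group(1)) for m in map(MD_LINE.search, lines) if m]
    runs = []
    for s in sectors:
        if runs and s - runs[-1][1] == stride:
            first, _, n = runs[-1]
            runs[-1] = (first, s, n + 1)
        else:
            runs.append((s, s, 1))
    return runs
```

On the two logs above it should return one run per check, each 32 entries long: 2654608040 to 2654608288 and 1915897096 to 1915897344. With 512-byte sectors, each run covers 32 × 8 × 512 = 131,072 bytes (128 KiB).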

syslog-2013-06-03.zip

It's difficult to say what causes these sync errors, but it's not uncommon to see a non-zero sync count now and then. I've had this perhaps 2-3 times in the 4 years I've been using UnRAID, and it has never represented a real problem (i.e. all the data was fine).

 

If you're concerned, compare all of your data against your backups. I've done that a couple of times when the sync count was non-zero, and the data has always matched perfectly, so I'm reasonably convinced that Tom's view is accurate: sync errors are always actually on the parity disk, which is why corrections are written to that disk. In fact, I never run a "non-correcting" check. Why would you NOT want to correct one of these errors?
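A minimal sketch of such a comparison, assuming both the array share and the backup are mounted as ordinary directories (the paths and function names are hypothetical). It hashes every file with SHA-256 and reports files whose contents differ or that exist on only one side.

```python
import hashlib
import os


def tree_hashes(root):
    """Map relative path -> SHA-256 hex digest for every file under root."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large media files don't fill RAM.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare_trees(array_root, backup_root):
    """Return (differing, missing_in_backup, missing_in_array) sorted path lists."""
    a, b = tree_hashes(array_root), tree_hashes(backup_root)
    differing = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return differing, sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())
```

It reads every byte on both sides, so on multi-terabyte shares expect it to run for hours.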

 

  • Author

garycase -

 

I just found it strange for two consecutive parity checks to complete without error and then have the third report errors. Then more weirdness: two more checks and a long SMART test came back clean, and then the third check reported exactly the same number of sync errors. I was under the impression that one should be wary of sync errors. Plus, since UNC errors were reported on the parity disk, I thought the parity disk might have to be replaced. But it was odd that SMART reported no reallocated or pending sectors. I thought putting the disk through badblocks and preclear would have turned up any problems.

 

dgaschk -

 

I attached the syslog to the original post, as requested.

 

Thank you all...

Try a new parity disk and/or SATA cable. Run pre-clear on the parity drive.
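Deciding between disk and cable is easier if you can see whether errors follow the drive or stay with the port. A rough sketch (hypothetical names and heuristic) that counts libata exceptions, failed commands and link resets per `ataN` port in a syslog. If a replacement drive on the same port starts producing the same messages, the port, cable or backplane becomes the more likely suspect. Port numbers depend on probe order, so compare logs taken with the same controller layout.

```python
import re
from collections import Counter

# Matches e.g. "kernel: ata6.00: exception ..." and "kernel: ata6: hard resetting link".
EVENT = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (exception|hard resetting link|failed command)")


def port_events(lines):
    """Count libata exception / failed-command / link-reset messages per ata port."""
    counts = Counter()
    for line in lines:
        m = EVENT.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts
```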

  • Author

I tried replacing the parity disk, but ran into a slew of sync errors, even with a brand-new disk that had been through three passes of badblocks v1.42 and a preclear. So I opened up the case and checked things out. The breakout cable still seemed firmly seated and was connected directly to the motherboard. Not knowing what else could be wrong, I went ahead and replaced the backplane with an unused one. I also took the opportunity to upgrade one of the controller cards from an SASLP to a SAS2LP.

 

Parity was rewritten without a problem, and the first parity check is underway and looks like it will also complete without issue. I'll start a second as soon as that one finishes. Unfortunately, it looks like some of the data on the other disks will be corrupted, since the parity disk was the first thing I upgraded, followed by some other 3TB data disks. No big deal, as there was nothing irreplaceable.

 

One benefit is that my parity speed has increased from ~70 MB/s to ~100 MB/s! 
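To put the speed-up in perspective, the time to read a whole 3 TB disk (3,000,592,982,016 bytes, the capacity in the SMART report above) at a constant average rate is simple arithmetic. This sketch assumes MB means 10^6 bytes and ignores the fact that real throughput drops toward the inner tracks, so treat the results as rough estimates.

```python
CAPACITY_BYTES = 3_000_592_982_016  # "User Capacity" from the SMART report


def check_hours(mb_per_s, capacity=CAPACITY_BYTES):
    """Hours to read the whole disk at a constant rate (assumes MB = 10**6 bytes)."""
    return capacity / (mb_per_s * 1_000_000) / 3600


for rate in (70, 100):
    print(f"{rate} MB/s -> {check_hours(rate):.1f} h")
# 70 MB/s -> 11.9 h
# 100 MB/s -> 8.3 h
```

That is roughly 3.6 hours saved on every full parity check.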

Archived

This topic is now archived and is closed to further replies.
