Unraid

Parity sync errors..


I'm trying to fix my HPA (Host Protected Area) drives, and while rebuilding a drive I've started getting a lot of these messages.  The error used to show up only a few times during a full parity check, but now every time I do a rebuild or parity check my syslog looks like someone got murdered.

 

The result of the rebuild:

Parity updated  2196  times to address sync errors. 

 

My syslog:

Apr 20 21:10:04 Tower kernel: ata1.00: failed command: READ DMA EXT (Minor Issues)
Apr 20 21:10:04 Tower kernel: ata1.00: cmd 25/00:00:7f:63:00/00:04:3a:00:00/e0 tag 0 dma 524288 in (Drive related)
Apr 20 21:10:04 Tower kernel:          res 51/40:cf:9f:65:00/40:01:3a:00:00/e0 Emask 0x9 (media error) (Errors)
Apr 20 21:10:04 Tower kernel: ata1.00: status: { DRDY ERR } (Drive related)
Apr 20 21:10:04 Tower kernel: ata1.00: error: { UNC } (Errors)
Apr 20 21:10:04 Tower kernel: ata1.00: configured for UDMA/133 (Drive related)
Apr 20 21:10:04 Tower kernel: ata1: EH complete (Drive related)
Apr 20 21:10:06 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Apr 20 21:10:06 Tower kernel: ata1.00: BMDMA stat 0x25 (Drive related)
Apr 20 21:10:06 Tower kernel: ata1.00: failed command: READ DMA EXT (Minor Issues)
Apr 20 21:10:06 Tower kernel: ata1.00: cmd 25/00:00:7f:63:00/00:04:3a:00:00/e0 tag 0 dma 524288 in (Drive related)
Apr 20 21:10:06 Tower kernel:          res 51/40:cf:9f:65:00/40:01:3a:00:00/e0 Emask 0x9 (media error) (Errors)
Apr 20 21:10:06 Tower kernel: ata1.00: status: { DRDY ERR } (Drive related)
Apr 20 21:10:06 Tower kernel: ata1.00: error: { UNC } (Errors)
Apr 20 21:10:06 Tower kernel: ata1.00: configured for UDMA/133 (Drive related)
Apr 20 21:10:06 Tower kernel: sd 1:0:0:0: [sda] Unhandled sense code (Drive related)
Apr 20 21:10:06 Tower kernel: sd 1:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08 (System)
Apr 20 21:10:06 Tower kernel: sd 1:0:0:0: [sda] Sense Key : 0x3 [current] [descriptor] (Drive related)
Apr 20 21:10:06 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Apr 20 21:10:06 Tower kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Apr 20 21:10:06 Tower kernel:         3a 00 65 9f 
Apr 20 21:10:06 Tower kernel: sd 1:0:0:0: [sda] ASC=0x11 ASCQ=0x4 (Drive related)
Apr 20 21:10:06 Tower kernel: sd 1:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 3a 00 63 7f 00 04 00 00 (Drive related)
Apr 20 21:10:06 Tower kernel: end_request: I/O error, dev sda, sector 973104543 (Errors)
Apr 20 21:10:06 Tower kernel: ata1: EH complete (Drive related)
Apr 20 21:10:06 Tower kernel: md: disk0 read error (Errors)
Apr 20 21:10:06 Tower kernel: handle_stripe read error: 973104480/0, count: 1 (Errors)
Apr 20 21:10:06 Tower kernel: md: disk0 read error (Errors)
Apr 20 21:10:06 Tower kernel: handle_stripe read error: 973104488/0, count: 1 (Errors)
Apr 20 21:10:06 Tower kernel: md: disk0 read error (Errors)
Apr 20 21:10:06 Tower kernel: handle_stripe read error: 973104496/0, count: 1 (Errors)

 

The handle_stripe read errors go on for hundreds of lines, and the first bunch of errors repeats several times.  I'm not sure whether my drive or the controller itself is causing the issue.  Does ata1 refer to the drive on the controller or on my motherboard?  sda is my parity drive, and it doesn't show any errors in the SMART test.
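For anyone trying to read the excerpt above, here is a minimal Python sketch that decodes a few of the numbers in it. It assumes the last four bytes of the sense dump are the low 32 bits of the failing LBA and that the CDB is a standard SCSI READ(10); the constant and function names are made up for illustration. The 63-sector gap it reports between the sda sector and the md sector would be consistent with a partition that starts at sector 63, but that is a guess about this array, not something the log states.

```python
import re

# Values copied from the syslog excerpt in this post.
SENSE_TAIL = "3a 00 65 9f"                    # last four sense bytes
CDB = "28 00 3a 00 63 7f 00 04 00 00"         # cdb[0]=0x28 is READ(10)
END_REQUEST = "end_request: I/O error, dev sda, sector 973104543"
HANDLE_STRIPE = "handle_stripe read error: 973104480/0, count: 1"


def be_int(hex_bytes: str) -> int:
    """Interpret space-separated hex bytes as one big-endian integer."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "big")


def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (start_lba, sector_count) from a READ(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] != 0x28:
        raise ValueError("not a READ(10) command")
    start_lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: LBA
    sector_count = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: length
    return start_lba, sector_count


failed_lba = be_int(SENSE_TAIL)
start_lba, sector_count = decode_read10(CDB)
logged_sector = int(re.search(r"sector (\d+)", END_REQUEST).group(1))
md_sector = int(re.search(r"read error: (\d+)/", HANDLE_STRIPE).group(1))

print("failing LBA from sense data:", failed_lba)
print("request start / length     :", start_lba, sector_count)
print("bytes requested (512 B/sec):", sector_count * 512)
print("offset of bad sector in req:", failed_lba - start_lba)
print("sda sector minus md sector :", logged_sector - md_sector)
```

The decoded LBA matches the sector in the end_request line, and 1024 sectors of 512 bytes matches the "dma 524288" in the cmd line, so every line in that burst describes the same failed read of one spot on sda.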

 

Here are the SMART attributes for the drive:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       35747
  3 Spin_Up_Time            0x0027   040   040   021    Pre-fail  Always       -       15000
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       551
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8191
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       35
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       25
193 Load_Cycle_Count        0x0032   190   190   000    Old_age   Always       -       32762
194 Temperature_Celsius     0x0022   099   081   000    Old_age   Always       -       53
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   196   196   000    Old_age   Always       -       1425
198 Offline_Uncorrectable   0x0030   197   194   000    Old_age   Offline      -       1111
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   165   001   000    Old_age   Offline      -       7022

 

Any help would be appreciated.

These two attributes in the SMART report show that there are many unreadable sectors on the parity disk.  Basically, it is failing, and pretty badly.

197 Current_Pending_Sector  0x0032  196  196  000    Old_age  Always      -      1425

198 Offline_Uncorrectable  0x0030  197  194  000    Old_age  Offline      -      1111

 

On modern drives there are usually about 2000 spare sectors.  You've got 1425 sectors pending reallocation, which will happen the next time they are written.  What you are seeing in the syslog is the error logged every time an attempt is made to read one of them.
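If you want to run the same check across every drive, here is a rough Python sketch. It assumes the report text has the column layout shown in this thread, with the raw value in the tenth column; the function names and the choice of watched attribute IDs (5, 197, 198) are illustrative, not an official rule. Some drives report raw values that are not plain integers, and this sketch simply skips those rows.

```python
# Attribute rows copied from the SMART report in this thread.
REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   099   081   000    Old_age   Always       -       53
197 Current_Pending_Sector  0x0032   196   196   000    Old_age   Always       -       1425
198 Offline_Uncorrectable   0x0030   197   194   000    Old_age   Offline      -       1111
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

# Attributes where any non-zero raw value deserves a closer look
# (an illustrative choice, not an official list).
WATCHED_IDS = {5, 197, 198}


def parse_smart_attributes(report: str) -> dict[int, tuple[str, int]]:
    """Map attribute ID -> (name, raw value) for rows with an integer raw value."""
    attributes = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            raw_value = int(fields[9])
        except ValueError:
            continue  # vendor-specific raw format; ignored in this sketch
        attributes[int(fields[0])] = (fields[1], raw_value)
    return attributes


def flag_bad_sectors(report: str) -> dict[int, int]:
    """Return watched attribute IDs whose raw value is above zero."""
    attributes = parse_smart_attributes(report)
    return {
        attr_id: raw
        for attr_id, (_name, raw) in attributes.items()
        if attr_id in WATCHED_IDS and raw > 0
    }


flagged = flag_bad_sectors(REPORT)
for attr_id, raw in sorted(flagged.items()):
    print(f"attribute {attr_id}: raw value {raw}")
```

For this drive it flags attributes 197 and 198, the same two lines quoted above. A healthy drive should come back with nothing flagged.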

 

The drive needs to be replaced.  Do not wait for it to say it has failed.  Start the RMA process now.  With so many read errors on the parity disk, you would not be able to recover if a data disk failed.

 

Joe L.

  • Author

Thanks Joe. I checked the rest of the drives and none of them is above 0 for the two attributes you posted, whew. I suspected it was my parity drive, but I wasn't sure what the deal was with the controller resetting.

And you are also potentially cooking this one at 53 degrees C:

 

194 Temperature_Celsius    0x0022  099  081  000    Old_age  Always      -      53

  • Author


This was right after the rebuild so 15 drives making heat...  It's at 29 C right now.


You probably need to take a serious look at your cooling if you're seeing temperature swings that large, from 29 C to 53 C.  If one of my drives were that hot I'd be pretty worried.  That may not be what caused this specific failure, but temps that high, even if only during parity calculations, certainly are not helping you get the most life out of your drives.

Archived

This topic is now archived and is closed to further replies.
