Parity drive errors - Dual Parity


I have a system with 8 data drives and 2 parity drives. One data drive has failed and I'm preparing to replace it, but one of the parity drives is also showing read errors during a parity check.

 

So what's the best way to proceed with replacing the failed data drive? Will there be any issues with the rebuild if one parity drive shows read errors and the other is fine?

 

The parity drive that has errors is from the same batch as the data drive I'm replacing, and it shows errors every time I run a parity check. Most probably it needs to be replaced ASAP too.

 

Below is what SMART is showing for the parity drive that's having issues.

 

ATA Error Count: 11 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11 occurred at disk power-on lifetime: 9338 hours (389 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 48 6d 1c e0  Error: UNC 8 sectors at LBA = 0x001c6d48 = 1862984

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 48 6d 1c e0 08   4d+14:26:03.950  READ DMA
  c8 00 08 40 6d 1c e0 08   4d+14:26:03.949  READ DMA
  c8 00 08 38 6d 1c e0 08   4d+14:26:03.949  READ DMA
  c8 00 08 30 6d 1c e0 08   4d+14:26:03.949  READ DMA
  c8 00 20 10 6d 1c e0 08   4d+14:26:03.949  READ DMA

Error 10 occurred at disk power-on lifetime: 9338 hours (389 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 58 49 1c e0  Error: UNC at LBA = 0x001c4958 = 1853784

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 f8 48 1c e0 08   4d+14:25:57.826  READ DMA
  c8 00 00 f8 47 1c e0 08   4d+14:25:56.953  READ DMA
  c8 00 08 a0 46 1c e0 08   4d+14:25:56.936  READ DMA
  c8 00 08 98 46 1c e0 08   4d+14:25:56.920  READ DMA

Error 9 occurred at disk power-on lifetime: 9338 hours (389 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 d8 e8 1b e0  Error: UNC at LBA = 0x001be8d8 = 1829080

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 f8 e7 1b e0 08   4d+14:25:40.675  READ DMA
  c8 00 00 f8 e6 1b e0 08   4d+14:25:40.646  READ DMA
  c8 00 00 f8 e5 1b e0 08   4d+14:25:40.636  READ DMA
  c8 00 00 f8 e4 1b e0 08   4d+14:25:40.632  READ DMA
  c8 00 00 f8 e3 1b e0 08   4d+14:25:40.628  READ DMA

Error 8 occurred at disk power-on lifetime: 9022 hours (375 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 88 28 18 1d e0  Error: UNC 136 sectors at LBA = 0x001d1828 = 1906728

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 88 c0 17 1d e0 08      20:23:18.312  READ DMA

Error 7 occurred at disk power-on lifetime: 9022 hours (375 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 40 d0 25 1d e0  Error: UNC 64 sectors at LBA = 0x001d25d0 = 1910224

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 c0 25 1d e0 08      20:13:14.801  READ DMA
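
For anyone reading the log above: each "Error: UNC ... at LBA" value can be cross-checked against the register dump on the same line. The sketch below assumes standard 28-bit LBA addressing (READ DMA, command c8, is a 28-bit command), where SN, CL and CH hold LBA bits 0 to 23 and the low nibble of DH holds bits 24 to 27. The function name is made up for illustration and is not part of smartctl.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    sn, cl, ch carry LBA bits 0-7, 8-15 and 16-23;
    only the low nibble of dh carries LBA bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values copied from "Error 11" in the log: SN CL CH DH = 48 6d 1c e0
lba = lba28_from_registers(0x48, 0x6D, 0x1C, 0xE0)
print(hex(lba), lba)  # 0x1c6d48 1862984, the LBA reported for Error 11
```

All five logged errors decode to LBAs between roughly 1.83 and 1.91 million, so the unreadable sectors sit close together near the start of the disk.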

 

1 hour ago, toohai said:

Will there be any issues with the rebuild if 1 parity drive shows read errors and the other is fine?

 

If it's parity1 that has the issues and there are read errors during the rebuild, it should use parity2 instead, but that's something I've never seen happen, so I'm not quite sure.

 

You could also disable the known bad parity disk and rebuild using only the good one, but I'd first like to see an attempt to rebuild with both parity disks. You can always disable it and rebuild again if needed. Don't forget to grab the diagnostics before rebooting if there are any errors, so we can see what happened.
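
To illustrate why a single missing data disk can in principle be rebuilt from either parity disk alone, here is a minimal sketch of a RAID-6 style P+Q scheme, working on one byte per disk. It assumes parity1 is a plain XOR (P) and parity2 is a Reed-Solomon style sum over GF(2^8) with polynomial 0x11d and generator 2 (Q). Whether Unraid's parity2 matches this exactly is an assumption, and all function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Multiplicative inverse: a^254 == a^-1 for any non-zero byte."""
    return gf_pow(a, 254)


def compute_parity(data):
    """Return (P, Q) for one byte from each data disk."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d                        # P: plain XOR of all data bytes
        q ^= gf_mul(gf_pow(2, i), d)  # Q: each byte weighted by 2^i
    return p, q


def rebuild_from_p(data, missing, p):
    """Recover data[missing] using only P and the surviving disks."""
    value = p
    for i, d in enumerate(data):
        if i != missing:
            value ^= d
    return value


def rebuild_from_q(data, missing, q):
    """Recover data[missing] using only Q and the surviving disks."""
    value = q
    for i, d in enumerate(data):
        if i != missing:
            value ^= gf_mul(gf_pow(2, i), d)
    # What remains is 2^missing * data[missing]; divide the weight out.
    return gf_mul(value, gf_inv(gf_pow(2, missing)))


# Eight data disks, one byte each (made-up values); disk 3 has "failed".
original = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]
p, q = compute_parity(original)
surviving = list(original)
surviving[3] = None  # never read by the rebuild functions

assert rebuild_from_p(surviving, 3, p) == original[3]
assert rebuild_from_q(surviving, 3, q) == original[3]
```

In this model, a sector that cannot be read from one parity disk does not have to corrupt the rebuilt disk, because the other parity disk plus the surviving data disks is enough to recover that byte. A second unreadable source in the same stripe would leave nothing to fall back on.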

10 minutes ago, johnnie.black said:

You could also disable the known bad parity disk and rebuild using only the good one, but I'd first like to see an attempt to rebuild with both parity disks. You can always disable it and rebuild again if needed. Don't forget to grab the diagnostics before rebooting if there are any errors, so we can see what happened.


 

I can do this, but I want to check whether Unraid can rebuild as it is without writing any corrupted data to my data disk. Removing the parity disk would also leave the array vulnerable if another disk shows errors during the rebuild.


In the end, I'm replacing the failed data disk and running the rebuild as it is. I'm assuming Unraid is built such that the read data is reconstructed before being supplied for the parity rebuild. Some read errors are showing up on the second parity drive, but the parity rebuild is going forward without issues. I would still like someone to confirm that this is how it works.

30 minutes ago, toohai said:

I'm assuming Unraid is built such that the read data is reconstructed before being supplied for the parity rebuild

 

Not quite sure what you mean here.

 

31 minutes ago, toohai said:

Some read errors are showing up on the second parity drive, but the parity rebuild is going forward without issues. I would still like someone to confirm that this is how it works.

 

If the read errors are only on the second parity disk, the rebuilt disk will be fine, but grab and post your diagnostics when it finishes.
