Parity drive errors

January 31, 2024

I don't check the dashboard every day, but I checked a couple of weeks ago and found that one of my parity drives had an error and was disabled. I started an extended self-test, and these are the results. I'm looking for advice on whether I should ignore the error or see if the drive is covered under warranty for replacement.

Thanks in advance:

ATA Error Count: 11733 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11733 occurred at disk power-on lifetime: 18079 hours (753 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 50 ff ff ff ef 00  19d+15:20:45.234  READ DMA EXT
  25 00 08 ff ff ff ef 00  19d+15:20:45.234  READ DMA EXT
  35 00 88 ff ff ff ef 00  19d+15:20:45.231  WRITE DMA EXT
  35 00 30 ff ff ff ef 00  19d+15:20:45.229  WRITE DMA EXT
  35 00 40 ff ff ff ef 00  19d+15:20:45.227  WRITE DMA EXT

Error 11732 occurred at disk power-on lifetime: 18079 hours (753 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 c8 ff ff ff ef 00  19d+15:18:55.847  READ DMA EXT
  35 00 18 ff ff ff ef 00  19d+15:18:55.841  WRITE DMA EXT
  35 00 40 ff ff ff ef 00  19d+15:18:55.837  WRITE DMA EXT
  35 00 40 ff ff ff ef 00  19d+15:18:55.833  WRITE DMA EXT
  35 00 40 ff ff ff ef 00  19d+15:18:55.829  WRITE DMA EXT

Error 11731 occurred at disk power-on lifetime: 18059 hours (752 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 f8 ff ff ff ef 00  18d+19:49:36.163  READ DMA EXT
  25 00 88 ff ff ff ef 00  18d+19:49:36.161  READ DMA EXT
  25 00 40 ff ff ff ef 00  18d+19:49:36.158  READ DMA EXT
  25 00 40 ff ff ff ef 00  18d+19:49:36.140  READ DMA EXT
  25 00 80 ff ff ff ef 00  18d+19:49:36.139  READ DMA EXT

Error 11730 occurred at disk power-on lifetime: 18059 hours (752 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 40 ff ff ff ef 00  18d+19:48:32.973  READ DMA EXT
  25 00 40 ff ff ff ef 00  18d+19:48:32.970  READ DMA EXT
  25 00 40 ff ff ff ef 00  18d+19:48:32.940  READ DMA EXT
  35 00 b0 ff ff ff ef 00  18d+19:48:32.939  WRITE DMA EXT
  ea 00 00 00 00 00 a0 00  18d+19:48:32.873  FLUSH CACHE EXT

Error 11729 occurred at disk power-on lifetime: 18059 hours (752 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 40 ff ff ff ef 00  18d+19:47:56.650  READ DMA EXT
  ea 00 00 00 00 00 a0 00  18d+19:47:56.636  FLUSH CACHE EXT
  35 00 08 ff ff ff ef 00  18d+19:47:56.636  WRITE DMA EXT
  25 00 40 ff ff ff ef 00  18d+19:47:56.615  READ DMA EXT
  25 00 08 ff ff ff ef 00  18d+19:47:56.614  READ DMA EXT
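
For anyone who wants to summarize a log like this instead of reading it by eye, here is a minimal sketch in Python. It assumes the text layout shown above; the names `summarize_errors` and `hours_to_days` are made up for this example and are not part of smartctl. It only pulls out the error number, the power-on hours, the error type, and the reported LBA.

```python
import re

# Two entries copied from the log above; a real report would be read from a file.
SAMPLE = """\
Error 11733 occurred at disk power-on lifetime: 18079 hours (753 days + 7 hours)
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
Error 11731 occurred at disk power-on lifetime: 18059 hours (752 days + 11 hours)
  40 53 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
"""

# Patterns are assumptions based on the text layout shown in this thread.
HEADER = re.compile(r"^Error (\d+) occurred at disk power-on lifetime: (\d+) hours")
DETAIL = re.compile(r"Error: (\w+) at LBA = (0x[0-9a-fA-F]+)")


def summarize_errors(text):
    """Return one dict per error entry found in the log text."""
    entries = []
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {"number": int(header.group(1)), "hours": int(header.group(2))}
            entries.append(current)
            continue
        detail = DETAIL.search(line)
        if detail and current is not None:
            current["kind"] = detail.group(1)
            current["lba"] = int(detail.group(2), 16)
    return entries


def hours_to_days(hours):
    """Split power-on hours into (whole days, remaining hours)."""
    return divmod(hours, 24)


errors = summarize_errors(SAMPLE)
for entry in errors:
    days, rest = hours_to_days(entry["hours"])
    print(entry["number"], entry["kind"], entry["lba"], f"{days} days + {rest} hours")
```

Every entry reports the same LBA, 0x0fffffff, which equals 2^28 - 1, the largest value a 28-bit field can hold. That may mean the field is a placeholder rather than a real sector address, but that reading is an assumption and not something the log itself states.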

January 31, 2024

Community Expert

1 minute ago, Nomad32 said:

I don't check the dashboard every day

You must set up Notifications to alert you immediately by email or other agent as soon as a problem is detected. Don't let one unnoticed problem become many and lead to data loss.

That is only a part of the SMART report, and doesn't include the results of the extended test.

Attach Diagnostics to your NEXT post in this thread. That will give us that information and a lot more, so we can get a more complete understanding of your situation.

January 31, 2024

Author

Sorry - attached is the SMART report and the diagnostics report.

tower-smart-20240131-1218.zip tower-diagnostics-20240131-1220.zip

January 31, 2024

Community Expert

4 minutes ago, Nomad32 said:

attached is the SMART report and the diagnostics report.

Diagnostics already includes SMART for all attached disks.

January 31, 2024

Community Expert
Solution

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   001   001   010    NOW  0 (0 6)

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: unknown failure    10%     18606         0

Both of these are a little unusual. The RAW_VALUE on reallocated isn't a simple count like we usually see, but the FAIL column says NOW, so I would say it has failed.
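
As a rough illustration of why that row reads as failed, here is a small Python sketch. It assumes the column layout shown above (ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE) and the usual convention that an attribute is failing when its normalized VALUE is at or below a nonzero THRESH. The function names are hypothetical and this is not how smartctl itself is implemented.

```python
# The attribute row quoted above, copied verbatim.
ROW = "  5 Reallocated_Sector_Ct   PO--CK   001   001   010    NOW  0 (0 6)"


def parse_attribute(line):
    """Split one attribute row into named fields (layout assumed from this thread)."""
    ident, name, flags, value, worst, thresh, fail, raw = line.strip().split(None, 7)
    return {
        "id": int(ident),
        "name": name,
        "flags": flags,
        "value": int(value),    # normalized value, higher is healthier
        "worst": int(worst),
        "thresh": int(thresh),
        "fail": fail,           # "NOW" in the row above
        "raw": raw,             # kept as text because it is not a simple count here
    }


def is_failing(attr):
    """Assumed rule: failing when the normalized value has dropped to the threshold."""
    return attr["thresh"] > 0 and attr["value"] <= attr["thresh"]


attr = parse_attribute(ROW)
print(attr["name"], "failing:", is_failing(attr), "raw:", attr["raw"])
```

The raw value is deliberately left as a string, since on this drive it is not the plain sector count seen on most disks.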

Extended self-test also says failed, but it doesn't know why. Usually we get something like read failure.

In any case, yes, it should be replaced.

You are also having problems with disk4, and it has a rather high Reallocated count. In fact, I would say that one needs replacing as well.

Do any of your other disks show SMART ( 👎 ) warnings on the Dashboard page?

What do you check on the Dashboard on those rare occasions?

Unrelated, your system share has files on the array. And you should update your plugins.

January 31, 2024

Author

OK - thank you. Disk 4 has shown errors for a while now - but, never an issue bad enough to crap out on me. I think I'll pick up a couple of larger drives to replace the parity drives, and then use the good parity drive to replace disk 4.

Oh well - thank you so much for letting me know.

February 15, 2024

Author

I ended up buying two new 14TB drives. I started by removing the dead 12TB parity drive and replacing it with one of the 14TB drives. Unraid automagically started to rebuild parity on the new parity disk.

Once that completed, I followed the steps on the parity swap procedure page: https://docs.unraid.net/legacy/FAQ/parity-swap-procedure/

For anyone curious - that process basically works by adding a new drive that is larger than your current parity drive. You remove the data drive you're replacing, reassign the current parity drive as the data drive, and then assign the new disk as parity. After rebuilding parity, the system will start up the array and rebuild the new data disk (the old parity drive converted to data).

February 15, 2024

Community Expert

6 minutes ago, Nomad32 said:

After rebuilding parity

Parity swap copies parity; it doesn't rebuild it. There aren't enough disks to rebuild it, since there is already another disk to be rebuilt. The array is offline during the parity copy so nothing can change that would make that copy out-of-sync.

February 15, 2024

Author

Understood. And it copied directly from my other parity drive. I apologize - when I typed that, I essentially believed that I was rebuilding parity by copying it to the new parity drive.

February 15, 2024

Community Expert

Rebuild means you get the data from the parity calculation by reading all other disks.
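
To make the copy versus rebuild distinction concrete, here is a toy Python sketch. It assumes a single simple XOR parity over tiny in-memory "disks"; it is only an illustration, not Unraid's actual implementation, and a second parity disk would use a different calculation. The name `xor_blocks` is made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three tiny pretend data disks (toy values, not real disk contents).
data_disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]

# Parity is computed by reading every data disk.
parity = xor_blocks(data_disks)

# Rebuild: pretend the disk at index 1 is gone, then recover its contents
# from the parity calculation by reading parity plus all other disks.
survivors = [disk for index, disk in enumerate(data_disks) if index != 1]
rebuilt = xor_blocks(survivors + [parity])

# Copy: the new parity disk is a straight duplicate of the existing one,
# so no data disk has to be read at all.
copied_parity = bytes(parity)

print("rebuilt matches lost disk:", rebuilt == data_disks[1])
print("copied parity matches:", copied_parity == parity)
```

The rebuild step needs every other disk to be readable, which is why a parity rebuild is not possible while a data disk is also missing, while the copy step only needs the old parity disk.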


Solved by trurl
