Parity drive disabled



OK, so this is odd. My parity drive is disabled. I tried running a SMART test and it said completed without error, but when I look in the log I see the following. What's going on here?

 

ATA Error Count: 165 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 165 occurred at disk power-on lifetime: 62237 hours (2593 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  16d+13:14:37.016  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.016  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.016  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.015  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.015  READ FPDMA QUEUED

Error 164 occurred at disk power-on lifetime: 61901 hours (2579 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED

Error 163 occurred at disk power-on lifetime: 61847 hours (2576 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00      06:41:38.399  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.396  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.393  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.389  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.387  READ FPDMA QUEUED

Error 162 occurred at disk power-on lifetime: 61844 hours (2576 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 08 ff ff ff 4f 00      03:58:29.699  WRITE FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED

Error 161 occurred at disk power-on lifetime: 61844 hours (2576 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00      03:58:18.632  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.628  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.625  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.622  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.619  READ FPDMA QUEUED

 

On 1/28/2023 at 9:45 PM, trurl said:

Diagnostics after reboot, can't see what happened before.

tower-smart-20230129-1932.zip

Check connections, power and SATA, both ends, including splitters. Run an extended SMART self-test on parity.
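
For reference, a minimal sketch of doing that from the console, assuming the parity drive is /dev/sdX (substitute the actual device shown on the Main page):

# start the extended (long) self-test; it runs in the drive's background
smartctl -t long /dev/sdX

# later, check progress and the self-test results
smartctl -l selftest /dev/sdX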

Connections checked; SMART report and full diagnostics attached. Again, it says passed with no errors. The extended test took a long time. I think I did it twice though! tower-diagnostics-20230129-2145.zip

On 1/29/2023 at 10:15 PM, trurl said:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     62559         -
# 2  Extended offline    Completed without error       00%     62545         -

You can rebuild to the same disk.

https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
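
In outline, the linked procedure is (from memory; treat the wiki page as authoritative):

1. Stop the array.
2. Unassign the parity disk.
3. Start the array so the disk is registered as missing.
4. Stop the array again.
5. Reassign the same disk.
6. Start the array; the rebuild begins automatically.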

OK, so parity is rebuilding; meanwhile:

  Unmountable: Unsupported partition layout

On *another* disk. Getting fed up with this :(

1. How do I know what I have lost, given I have no parity disk, and one of the disks (another one) won't mount?

2. What's going on? This chassis worked fine before Unraid (please don't flame me, I'm just telling it like it is)... since Unraid, my disks have been dropping like flies. NVMe, HDD, one by one they are all going!


You might need to take a closer look at how you are powering all these disks.

 

Better if you don't try to put more than 4 disks on a single power cable. If using splitters, MOLEX crimped (not molded) splitters are preferred.

 

Don't bundle data cables. Make sure each cable, power or SATA, has enough slack for the connector to sit squarely on the connection without any tension.

 

You might need to replace the SATA cable if it continues to give problems.

 

 

19 hours ago, trurl said:

You might need to take a closer look at how you are powering all these disks. [...]

tower-diagnostics-20230201-2012.zip

Diags attached.

 

Wouldn't parity let me restore the missing disk, though, if it isn't recoverable? I don't really know what was on that disk, although presumably a number of my files!

 

I'll check all the cables again and look at how they're routed etc., but I've lost (due to the same fault) 2 NVMe drives as well, and they are both on PCIe cards, so no cables are involved!

1 hour ago, trurl said:

What exactly do you mean "same fault"?

 

nvme0 is assigned as cache, and cache is mounted.

 

The only other one I see is nvme1, and it isn't assigned.

As in 'unmountable partition'. nvme0 and nvme1 were originally a BTRFS pool. When that went bad, I switched to XFS for cache instead, on one of them. That then became unmountable, so I've reformatted it (having lost my 'live' appdata etc.) to start again. Don't know how long it will last this time! And that's just the (PCIe-mounted) SSDs!

2 weeks later...
08:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller [1b4b:9172] (rev 11)

Marvell controllers are not recommended, but it doesn't look like you are using it. Looks like disks 3, 4 and all others are instead using

02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 02)
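
(Both of those lines come from the lspci section of the diagnostics; from a console, something like lspci -nn | grep -Ei 'sata|sas' reproduces them.)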

No SMART report for disk3, looks like it has disconnected. It was already unmountable when you booted Feb 1. It needs to be repaired.

No SMART report for disk4, looks like it has disconnected. Emulated disk4 is mounted and 80% full. It needs to be rebuilt.
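
For the disk3 filesystem repair, a rough sketch of the usual approach, assuming the disk is XFS and the array is started in maintenance mode (the mdX number matches the disk slot; on newer Unraid releases the device may be /dev/md3p1 instead):

# dry run: report problems without changing anything
xfs_repair -n /dev/md3

# then the actual repair
xfs_repair /dev/md3

Running against the md device rather than the raw disk keeps parity in sync with the repair.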

 

Log space is completely full. There are no syslogs in those diagnostics since Feb 1. The parity rebuild had not finished when you were having problems with those 2 data disks, so it's not clear the rebuilt parity would have been good. But it is emulating disk4, so that's a good sign. Might be a good idea to rebuild disk4 to a spare just in case.
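
A quick way to confirm the log situation from the console (standard Linux tools, nothing Unraid-specific):

# how full is the log filesystem?
df -h /var/log

# which files are eating the space?
du -sh /var/log/* | sort -h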

 

Not clear why webUI isn't working now.

 

Check connections on disks 3, 4, both ends, including power and splitters. Reboot and post new diagnostics.
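
(If the webUI still won't respond, diagnostics can also be generated from the console: as far as I recall, running the diagnostics command at the terminal writes the zip to /boot/logs on the flash drive.)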

 
