Parity drive disabled



OK, so this is odd. My parity drive is disabled. I tried running a SMART test and it said completed without error, but when I look in the log I see the following. What's going on here?

 

ATA Error Count: 165 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 165 occurred at disk power-on lifetime: 62237 hours (2593 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  16d+13:14:37.016  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.016  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.016  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.015  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  16d+13:14:37.015  READ FPDMA QUEUED

Error 164 occurred at disk power-on lifetime: 61901 hours (2579 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00   2d+13:31:25.905  READ FPDMA QUEUED

Error 163 occurred at disk power-on lifetime: 61847 hours (2576 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00      06:41:38.399  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.396  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.393  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.389  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      06:41:38.387  READ FPDMA QUEUED

Error 162 occurred at disk power-on lifetime: 61844 hours (2576 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: WP at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 08 ff ff ff 4f 00      03:58:29.699  WRITE FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      03:58:29.698  READ FPDMA QUEUED

Error 161 occurred at disk power-on lifetime: 61844 hours (2576 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00      03:58:18.632  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.628  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.625  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.622  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00      03:58:18.619  READ FPDMA QUEUED

 

On 1/28/2023 at 9:45 PM, trurl said:

Diagnostics after reboot, can't see what happened before.

tower-smart-20230129-1932.zip

Check connections, power and SATA, both ends, including splitters. Run an extended SMART self-test on parity.
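
For reference, a minimal sketch of doing that from the console, assuming the parity drive is /dev/sdX (substitute the actual device shown on the Main page):

# start the extended (long) self-test; it runs in the drive's background
smartctl -t long /dev/sdX

# later, check progress and the self-test results
smartctl -l selftest /dev/sdX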

Connections checked; SMART report and full diagnostics attached. Again, it says passed with no errors. The extended test took a long time. I think I did it twice though! tower-diagnostics-20230129-2145.zip

On 1/29/2023 at 10:15 PM, trurl said:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     62559         -
# 2  Extended offline    Completed without error       00%     62545         -

You can rebuild to the same disk.

https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
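
In outline, the linked procedure is (from memory; treat the wiki page as authoritative):

1. Stop the array.
2. Unassign the parity disk.
3. Start the array so the disk is registered as missing.
4. Stop the array again.
5. Reassign the same disk.
6. Start the array; the rebuild begins automatically.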

OK, so parity is rebuilding; meanwhile:

  Unmountable: Unsupported partition layout

On *another* disk. Getting fed up with this :(

1. How do I know what I have lost, given I have no parity disk, and one of the disks (another one) won't mount?

2. What's going on? This chassis worked fine before Unraid (please don't flame me, I'm just telling it like it is)... since Unraid, my disks have been dropping like flies. NVMe, HDD, one by one they are all going!


You might need to take a closer look at how you are powering all these disks.

 

Better if you don't try to put more than 4 disks on a single power cable. If using splitters, MOLEX crimped (not molded) splitters are preferred.

 

Don't bundle data cables. Make sure each cable, power or SATA, has enough slack for the connector to sit squarely on the connection without any tension.

 

You might need to replace the SATA cable if it continues to give problems.

 

 

19 hours ago, trurl said:

You might need to take a closer look at how you are powering all these disks. [...]

tower-diagnostics-20230201-2012.zip

Diags attached.

 

Wouldn't parity let me restore the missing disk, though, if it isn't recoverable? I don't really know what was on that disk, although presumably a number of my files!

 

I'll check all the cables again and look at how they're routed etc., but I've lost (due to the same fault) 2 NVMe drives as well, and they are both on PCIe cards, so no cables are involved!

1 hour ago, trurl said:

What exactly do you mean "same fault"?

 

nvme0 is assigned as cache, and cache is mounted.

 

The only other one I see is nvme1, and it isn't assigned.

As in 'unmountable partition'. nvme0 and nvme1 were originally a BTRFS pool. When that went bad, I switched to XFS for cache instead, on one of them. That then became unmountable, so I've reformatted it (having lost my 'live' appdata etc.) to start again. Don't know how long it will last this time! And that's just the (PCIe-mounted) SSDs!

2 weeks later...
08:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller [1b4b:9172] (rev 11)

Marvell controllers are not recommended, but it doesn't look like you are using it. Looks like disks 3, 4 and all others are instead using

02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 02)
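
(Both of those lines come from the lspci section of the diagnostics; from a console, something like lspci -nn | grep -Ei 'sata|sas' reproduces them.)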

No SMART report for disk3, looks like it has disconnected. It was already unmountable when you booted Feb 1. It needs to be repaired.

No SMART report for disk4, looks like it has disconnected. Emulated disk4 is mounted and 80% full. It needs to be rebuilt.
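
For the disk3 filesystem repair, a rough sketch of the usual approach, assuming the disk is XFS and the array is started in maintenance mode (the mdX number matches the disk slot; on newer Unraid releases the device may be /dev/md3p1 instead):

# dry run: report problems without changing anything
xfs_repair -n /dev/md3

# then the actual repair
xfs_repair /dev/md3

Running against the md device rather than the raw disk keeps parity in sync with the repair.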

 

Log space is completely full. There are no syslogs in those diagnostics since Feb 1. The parity rebuild had not finished when you were having problems with those 2 data disks, so it's not clear the rebuilt parity would have been good. But it is emulating disk4, so that's a good sign. Might be a good idea to rebuild disk4 to a spare just in case.
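
A quick way to confirm the log situation from the console (standard Linux tools, nothing Unraid-specific):

# how full is the log filesystem?
df -h /var/log

# which files are eating the space?
du -sh /var/log/* | sort -h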

 

Not clear why webUI isn't working now.

 

Check connections on disks 3, 4, both ends, including power and splitters. Reboot and post new diagnostics.
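
(If the webUI still won't respond, diagnostics can also be generated from the console: as far as I recall, running the diagnostics command at the terminal writes the zip to /boot/logs on the flash drive.)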

 
