
Parity drive issue turned into a bigger issue


Solved by itimpi


Hey all, I recently had one of my parity drives showing "Parity device disabled".

 

To try to resolve this issue, I did these steps:

  1. Ran a parity check with "Write corrections to parity" checked. This completed successfully, but the parity drive was still disabled.
  2. Stopped the array, selected "no device" on the failed parity drive slot, restarted the array, stopped the array again, selected the parity drive, and restarted the array.

 

After doing that, both parity drives were down and disk 7 showed "Device contents emulated" as well as "Unsupported or no file system".

 

Tried to debug this a bit with a helpful user on the forum, but unfortunately wasn't able to fix the issue.

 

The additional steps taken during debugging were:

  1. Tools > New Config > Select to preserve all arrays and pools > Apply
  2. Started the array and formatted disk 7
  3. Clicked the red X on the two parity drives under unassigned devices to clear the drives, then formatted both drives.
  4. Stopped the array, assigned the two parity drives, and restarted the array

 

I also reorganized drives and cables, to confirm that disk 7 is consistently failing and that it is not an issue with the cables or HBA. I connected a SATA cable from disk 7 directly to the motherboard instead of connecting it to the HBA.

 

Starting the array now seems very slow:

Oct 12 00:35:50 raphnas root: realtime =none                   extsz=4096   blocks=0, rtextents=0
Oct 12 00:35:50 raphnas emhttpd: mounting /mnt/disk9
Oct 12 00:35:50 raphnas emhttpd: shcmd (127): mkdir -p /mnt/disk9
Oct 12 00:35:50 raphnas emhttpd: shcmd (128): mount -t xfs -o noatime,nouuid /dev/md9p1 /mnt/disk9
Oct 12 00:35:50 raphnas kernel: XFS (md9p1): Mounting V5 Filesystem
Oct 12 00:35:51 raphnas kernel: ata6.00: exception Emask 0x50 SAct 0x200000 SErr 0x4890800 action 0xe frozen
Oct 12 00:35:51 raphnas kernel: ata6.00: irq_stat 0x0c400040, interface fatal error, connection status changed
Oct 12 00:35:51 raphnas kernel: ata6: SError: { HostInt PHYRdyChg 10B8B LinkSeq DevExch }
Oct 12 00:35:51 raphnas kernel: ata6.00: failed command: READ FPDMA QUEUED
Oct 12 00:35:51 raphnas kernel: ata6.00: cmd 60/00:a8:38:0d:01/01:00:80:04:00/40 tag 21 ncq dma 131072 in
Oct 12 00:35:51 raphnas kernel:         res 40/00:00:38:0d:01/00:00:80:04:00/40 Emask 0x50 (ATA bus error)
Oct 12 00:35:51 raphnas kernel: ata6.00: status: { DRDY }
Oct 12 00:35:51 raphnas kernel: ata6: hard resetting link
Oct 12 00:35:52 raphnas kernel: ata6: SATA link down (SStatus 0 SControl 380)
Oct 12 00:35:58 raphnas kernel: ata6: hard resetting link
Oct 12 00:35:58 raphnas kernel: ata6: SATA link down (SStatus 0 SControl 380)
Oct 12 00:35:58 raphnas kernel: ata6: limiting SATA link speed to <unknown>
Oct 12 00:35:59 raphnas kernel: ata6: hard resetting link
Oct 12 00:36:04 raphnas kernel: ata6: link is slow to respond, please be patient (ready=0)
Oct 12 00:36:09 raphnas kernel: ata6: COMRESET failed (errno=-16)
Oct 12 00:36:09 raphnas kernel: ata6: hard resetting link
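For anyone reading a syslog like the one above, it can help to count how often each ATA port reports trouble rather than scanning by eye. Below is a minimal Python sketch. It assumes the syslog has been saved as plain text, and the event labels (`hard_reset`, `link_down`, and so on) are names made up for this example, matched against phrases copied from the excerpt.

```python
import re
from collections import Counter

# Matches the libata prefix in kernel messages, e.g. "kernel: ata6.00: ..." or "kernel: ata6: ..."
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Phrases copied from the log excerpt; the labels on the right are illustrative names.
EVENTS = {
    "failed command": "failed_command",
    "hard resetting link": "hard_reset",
    "SATA link down": "link_down",
    "COMRESET failed": "comreset_failed",
}

def tally_ata_events(lines):
    """Return {port: Counter(label -> count)} for libata messages in syslog lines."""
    counts = {}
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue  # not a libata message (or a continuation line such as "res ...")
        port, message = match.groups()
        for needle, label in EVENTS.items():
            if needle in message:
                counts.setdefault(port, Counter())[label] += 1
    return counts

# A few lines taken from the excerpt above.
sample = [
    "Oct 12 00:35:51 raphnas kernel: ata6.00: failed command: READ FPDMA QUEUED",
    "Oct 12 00:35:51 raphnas kernel: ata6: hard resetting link",
    "Oct 12 00:35:52 raphnas kernel: ata6: SATA link down (SStatus 0 SControl 380)",
    "Oct 12 00:36:09 raphnas kernel: ata6: COMRESET failed (errno=-16)",
    "Oct 12 00:36:09 raphnas kernel: ata6: hard resetting link",
]

result = tally_ata_events(sample)
for port, events in sorted(result.items()):
    print(port, dict(events))
```

Repeated hard resets and "link down" messages on a single port usually point at that one cable, port, or drive. The same pattern across many ports at once is more consistent with a shared cause such as power.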


As it currently stands, when the array finally starts up, there are errors on disks 4 and 7.

 

A few more details about my setup in case it helps:

  • 12x 20TB WD Red drives: 10x storage, 2x parity (all brand new, bought two months ago)
  • 1x 1TB NVMe
  • 2x LSI SAS 9211-8i HBA (8x drives on one HBA, 4x drives on the other HBA)
  • Asus Prime H570-Plus mobo
  • Intel Celeron G5925 CPU
  • 32GB DDR4 RAM
  • Corsair CX550 PSU

 

Attaching diagnostics for further debugging.

 

Please let me know if there is any further information I can provide to help identify the issue.

raphnas-diagnostics-20231012-0057.zip

Edited by raphytaffy
5 hours ago, raphytaffy said:

To try to resolve this issue, I did these steps

Wish you had asked on the forum before doing anything at all. Everything you did was wrong.

 

Hope you didn't have anything important on disk7.

 

After you fix your hardware issue, please ask on the forum for further advice on how to proceed.

22 hours ago, JorgeB said:

There are errors with multiple disks, suggesting a power/connection problem. Check/replace cables or try with a different PSU if available.

 

I tried connecting the hard drives to a different PSU and now they have started with zero errors, but the array took a while to start up, whereas before it would start up at boot time rather quickly.

 

17 hours ago, trurl said:

Wish you had asked on the forum before doing anything at all. Everything you did was wrong.

 

Hope you didn't have anything important on disk7.

 

After you fix your hardware issue, please ask on the forum for further advice on how to proceed.

 

I do have a cloud backup, so this isn't the end of the world, just tedious as I try to resolve the issue.

 

Here is what my array looks like now:

image.thumb.png.f83afc4e91c77b0c3da2a2ff5f2a1d4f.png

 

I still see a lot of warnings in the logs:

Oct 13 00:35:49 raphnas kernel: ata6.00: failed command: READ FPDMA QUEUED
Oct 13 00:35:49 raphnas kernel: ata6.00: cmd 60/00:98:78:bd:00/04:00:00:00:00/40 tag 19 ncq dma 524288 in
Oct 13 00:35:49 raphnas kernel:         res 40/00:00:78:cb:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Oct 13 00:35:49 raphnas kernel: ata6.00: status: { DRDY }
Oct 13 00:35:49 raphnas kernel: ata6.00: failed command: READ FPDMA QUEUED
Oct 13 00:35:49 raphnas kernel: ata6.00: cmd 60/00:a8:78:cb:00/04:00:00:00:00/40 tag 21 ncq dma 524288 in
Oct 13 00:35:49 raphnas kernel:         res 40/00:00:78:cb:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Oct 13 00:35:49 raphnas kernel: ata6.00: status: { DRDY }
Oct 13 00:35:49 raphnas kernel: ata6: hard resetting link
Oct 13 00:35:50 raphnas kernel: ata6: SATA link down (SStatus 0 SControl 310)
Oct 13 00:35:55 raphnas kernel: ata6: hard resetting link
Oct 13 00:35:55 raphnas kernel: ata6: SATA link down (SStatus 0 SControl 310)
Oct 13 00:35:57 raphnas kernel: ata6: hard resetting link
Oct 13 00:36:03 raphnas kernel: ata6: link is slow to respond, please be patient (ready=0)
Oct 13 00:36:07 raphnas kernel: ata6: COMRESET failed (errno=-16)
Oct 13 00:36:07 raphnas kernel: ata6: hard resetting link
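The log names the failing port (ata6) but not which disk sits behind it. One way to find out on Linux is to resolve each entry under `/sys/block` and look for an `ataN` component in the resulting path. This is a rough Python sketch under that assumption. The example path is hypothetical, and drives attached through a SAS HBA generally do not appear under an `ataN` node, so they would be missing from the map.

```python
import os
import re

# An "ataN" path component appears in the resolved sysfs path of disks on AHCI/libata ports.
ATA_IN_PATH = re.compile(r"/(ata\d+)/")

def ata_port_from_sysfs_path(path):
    """Return 'ataN' if the resolved sysfs path contains one, else None."""
    match = ATA_IN_PATH.search(path)
    return match.group(1) if match else None

def ata_port_map(sys_block="/sys/block"):
    """Map ataN -> block device name; returns an empty dict if sysfs is unavailable."""
    mapping = {}
    if not os.path.isdir(sys_block):
        return mapping
    for name in sorted(os.listdir(sys_block)):
        real = os.path.realpath(os.path.join(sys_block, name))
        port = ata_port_from_sysfs_path(real)
        if port:
            mapping[port] = name
    return mapping

# Hypothetical resolved path, for illustration only (not taken from the diagnostics).
example = "/sys/devices/pci0000:00/0000:00:17.0/ata6/host5/target5:0:0/5:0:0:0/block/sdg"
print(ata_port_from_sysfs_path(example))

# On the server itself this prints the real mapping; elsewhere it may be empty.
print(ata_port_map())
```

Once the device name is known, it can be matched to a disk slot on the Unraid Main page, which shows the device identifier next to each assignment.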

 

Full logs attached.

 

Should I let the parity run at this point?

Edit #1: I spoke too soon. There are errors on disks 7 and 10 now. Disk 7 has its own SATA cable connected directly to the motherboard, while disk 10 is connected to an HBA.

 

Edit #2: Disks 4 and 6 have joined the party.

 

 

raphnas-diagnostics-20231013-0113.zip

Edited by raphytaffy
37 minutes ago, JorgeB said:

There are still issues. If it's not the PSU, it could be a controller or cable problem.

 

Thanks, I'm not great at debugging hardware issues, but I think I may have resolved it.

 

My previous PSU (Corsair CX550) only had one SATA power cable, so I had daisy-chained SATA power extenders together (don't laugh at me haha):

image.thumb.png.3cf0529ab9922623fa54d1fbe610bc8e.png

 

This time, the array started on boot without any issues and there are zero errors on the drives so far.

 

Parity-Sync is currently in progress for Parity 2, but Parity 1 is still showing "Parity device disabled". Should I let that continue?

 

Additional questions:

4 hours ago, itimpi said:
  • Stop the array
  • Unassign parity1 so that Unraid forgets the current assignment
  • Start the array to commit the change
  • Stop the array
  • Assign parity1
  • Start the array to sync both parity drives

 

Thank you! These were the steps I originally took to try to run the Parity-Sync, but I guess the power issue caused other issues to surface.

 

Parity-Sync is now running for both drives. Appreciate everyone's help! Fingers crossed it finishes without errors.

