Parity Drive Read Errors (Urgent)



I ran into a problem. I was encoding videos in the Handbrake container when I hit that highly obnoxious issue where the file permissions for certain folders reset and I could no longer move them, so I ran Safe New Permissions. This froze my Unraid server with all my CPU cores locked at 100%. It never recovered, so I had to shut down the server manually. When I restarted, I started getting these nasty errors: my parity drive wasn't working. I tried removing it from the array and re-adding it. That started a read check, but it just returned a mountain of errors, and now my Disk 4 isn't working either.

 

I can't even copy-paste the log because it keeps filling up endlessly with errors.

 

nCbnEmr.png

 

KX4gmBl.png

 

I'm scared all my data is starting to disappear now. Is there nothing that can be done?

Edited by Stubbs
12 minutes ago, JorgeB said:

Replace/reconnect cables (both power and SATA) on these disks:

 

WDC_WD30EFZX-68AWUN0_WD-WX22DB0KE762

WDC_WD30EFRX-68EUZN0_WD-WCC4N4LHZESE

 

And post new diags.

 

Ok, my Disk 4 is back online and working, but my Parity Drive is still broken.

 

Both of these drives are in my 3x HDD hotswap bay, which is powered by two power cables. The power cables are extremely difficult to replace or re-seat in my setup, so I couldn't do that. The middle HDD (Disk 3) has not failed yet, but the parity drive in there still has problems. I replaced both SATA cables for the top and bottom bays (parity and Disk 4).

 

1fDRSUf.png

tower-diagnostics-20210721-0105.zip

10 minutes ago, JorgeB said:

No ATA errors so far, try re-syncing parity:

 

https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself

 

 

Well, it appears to have started rebuilding without any issues so far. The problem is, I think I replaced the previous SATA cables with slower 3 Gb/s ones.

 

uf9TKIT.png

 

I'm thinking that maybe I should do a quick swap back to the old ones, assuming they're not the problem. I don't want to spend a full 24 hours stressing my array.
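As a rough sanity check on the 3 Gb/s worry, the usable throughput of a SATA link can be estimated from its line rate. This is only a back-of-the-envelope sketch: it assumes the standard 8b/10b line encoding (10 bits on the wire per data byte) and ignores protocol overhead, and the function name is made up for illustration.

```python
def sata_payload_mb_per_s(line_rate_gbps: float) -> float:
    """Approximate usable MB/s for a SATA link at the given line rate.

    Assumes 8b/10b encoding, i.e. 10 line bits carry one data byte.
    Protocol overhead is ignored, so real-world figures are a bit lower.
    """
    bits_per_second = line_rate_gbps * 1e9
    bytes_per_second = bits_per_second / 10  # 10 line bits per payload byte
    return bytes_per_second / 1e6


if __name__ == "__main__":
    for name, rate in (("1.5 Gb/s", 1.5), ("3 Gb/s", 3.0), ("6 Gb/s", 6.0)):
        print(f"{name}: about {sata_payload_mb_per_s(rate):.0f} MB/s")
```

Even the slowest link speed leaves far more headroom than the rebuild speeds seen in this thread, so the link generation alone would not explain a rebuild running at tens of MB/s or less.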

8 minutes ago, JorgeB said:

That can't be it, SATA1 can still do close to 150MB/s, but you can post new diags to see if there's something visible.

I just reconnected the previous (fast) cables, booted up, and the parity rebuild started. Sure enough, those old, nasty errors started appearing again, along with a lousy 6 MB/s rebuild speed (in contrast to the 50 MB/s I had with the "slower" cables). Here is the log.

 

My Array won't even turn off now, it's stuck on "Array Stopping•Retry unmounting disk share(s)...". I'm worried I will have to restart it again and pray no data will be lost.

tower-diagnostics-20210721-0142.zip

Edited by Stubbs

I replaced the cables again and started rebuilding the parity drive. The speed went down to 2 MB/s at the start before going back up to 25 MB/s. I'm worried I'll be an old man before it fully rebuilds.
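To put those speeds in perspective, here is a rough estimate of how long a full rebuild would take at a constant rate. It assumes a 3 TB parity drive (a guess based on the WD30 model numbers above, not something confirmed in this thread) and a steady speed, which a real rebuild will not have; the function name is hypothetical.

```python
def rebuild_hours(capacity_bytes: float, speed_mb_per_s: float) -> float:
    """Hours needed to write capacity_bytes at a constant speed in MB/s."""
    seconds = capacity_bytes / (speed_mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    ASSUMED_CAPACITY = 3e12  # 3 TB, an assumption; adjust to the real drive size
    for speed in (2, 6, 25, 50):
        hours = rebuild_hours(ASSUMED_CAPACITY, speed)
        print(f"{speed:>2} MB/s: about {hours:.1f} hours")
```

Under those assumptions, 25 MB/s works out to a bit under a day and a half, while a sustained 2 MB/s would take more than two weeks.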

 

I don't feel like touching it again because this has been an absolute nightmare for me all day.

 

Jul 21 02:40:19 Tower kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth7db6519: link becomes ready
Jul 21 02:40:19 Tower kernel: br-236e25b7e44c: port 3(veth7db6519) entered blocking state
Jul 21 02:40:19 Tower kernel: br-236e25b7e44c: port 3(veth7db6519) entered forwarding state
Jul 21 02:40:21 Tower avahi-daemon[4067]: Joining mDNS multicast group on interface veth7db6519.IPv6 with address fe80::ec19:f4ff:fe80:239c.
Jul 21 02:40:21 Tower avahi-daemon[4067]: New relevant interface veth7db6519.IPv6 for mDNS.
Jul 21 02:40:21 Tower avahi-daemon[4067]: Registering new address record for fe80::ec19:f4ff:fe80:239c on veth7db6519.*.
Jul 21 02:40:39 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x703000 SErr 0x0 action 0x0
Jul 21 02:40:39 Tower kernel: ata6.00: irq_stat 0x40000008
Jul 21 02:40:39 Tower kernel: ata6.00: failed command: WRITE FPDMA QUEUED
Jul 21 02:40:39 Tower kernel: ata6.00: cmd 61/40:a0:a8:a4:7f/05:00:00:00:00/40 tag 20 ncq dma 688128 out
Jul 21 02:40:39 Tower kernel: res 41/10:00:a8:a4:7f/00:00:00:00:00/40 Emask 0x481 (invalid argument) <F>
Jul 21 02:40:39 Tower kernel: ata6.00: status: { DRDY ERR }
Jul 21 02:40:39 Tower kernel: ata6.00: error: { IDNF }
Jul 21 02:40:39 Tower kernel: ata6.00: configured for UDMA/133
Jul 21 02:40:39 Tower kernel: ata6: EH complete
Jul 21 02:40:46 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x7000000 SErr 0x0 action 0x0
Jul 21 02:40:46 Tower kernel: ata6.00: irq_stat 0x40000008
Jul 21 02:40:46 Tower kernel: ata6.00: failed command: WRITE FPDMA QUEUED
Jul 21 02:40:46 Tower kernel: ata6.00: cmd 61/40:c0:a8:a4:7f/05:00:00:00:00/40 tag 24 ncq dma 688128 out
Jul 21 02:40:46 Tower kernel: res 41/10:00:a8:a4:7f/00:00:00:00:00/40 Emask 0x481 (invalid argument) <F>
Jul 21 02:40:46 Tower kernel: ata6.00: status: { DRDY ERR }
Jul 21 02:40:46 Tower kernel: ata6.00: error: { IDNF }
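Since the log fills up too fast to read by hand, a small script can tally the failed ATA commands per port. This is a minimal sketch that only matches the "failed command" line format seen in the excerpt above; the function name and the sample list are illustrative, and in practice the lines would come from the syslog file (its location on your system is an assumption you would need to check).

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: ata6.00: failed command: WRITE FPDMA QUEUED"
FAILED_CMD = re.compile(r"kernel: (ata\d+(?:\.\d+)?): failed command: (.+)$")


def count_failed_commands(lines):
    """Return a Counter keyed by (ata device, command name)."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line.rstrip())
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the log excerpt above.
    sample = [
        "Jul 21 02:40:39 Tower kernel: ata6.00: failed command: WRITE FPDMA QUEUED",
        "Jul 21 02:40:39 Tower kernel: ata6.00: error: { IDNF }",
        "Jul 21 02:40:46 Tower kernel: ata6.00: failed command: WRITE FPDMA QUEUED",
    ]
    for (device, command), n in count_failed_commands(sample).items():
        print(f"{device}: {n} x {command}")
```

If every failure lands on the same port, as with ata6 here, that points at one drive, cable, or bay slot rather than the whole controller.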

 

tower-diagnostics-20210721-0240.zip

16 minutes ago, JorgeB said:

If you're still getting those ATA errors, there's still a problem. As mentioned, it's likely cable/power related; if different cables don't help, try another PSU.

They are not showing up in the log anymore, but the rebuild speed varies from 29 MB/s down to 3 MB/s.

 

I've been changing the cables around multiple times, and re-inserted the drives into the hotswap bay. I think it's working fine now, and I'm starting to suspect there's a big problem with one of my previous SATA cables. I actually had a really difficult time unplugging one from the motherboard.
