
Two array disks disabled, what to do next?


razr
Solved by JorgeB


Unraid Version 6.9.2

8x 8TB Seagate IronWolf in Array (2 Parity disks)

2x 480GB Marvell based SanDisk SSDs as Cache

1x 500GB Samsung NVMe mainly for Docker

1x 6TB external HDD with some arbitrary data

Diagnostics zip attached!

 

TL;DR

Two of my array disks were disabled within a short time of each other during the last week. One of them had the same issue two weeks ago. Back then I removed it from the array, re-added it, and restarted the parity sync (it was a parity disk). Since it has now happened again, and with multiple disks, I'd like to figure out whether there is an underlying issue. How can I find out what the actual problem is (SATA cable, HBA, a genuinely failing disk, ...)?

 

The whole story:

Three or four weeks ago my OS flash drive failed. Before that, the server had been running for more than two years without any major issues. I replaced the drive with a new one, set up the OS again, re-created the array, and everything worked for about a week. Then all of a sudden one of my parity drives failed. After some reading in the wiki and forums I removed the disk from the array, re-added it, and restarted the parity sync, which succeeded after about 12 hours.

 

At the beginning of this week I tried to access some files and saw that two of my disks are now marked as _disabled_: one is the same parity disk that was disabled two weeks ago, and the other is a data disk.

 

Unfortunately I have no physical access to the machine until the beginning of next week. Even more unfortunately, my VPN app runs as a Docker container on the server, so powering it down or stopping the array would cut me off from my network (yes, I need to fix that). So I kept it running. To prevent further damage from data being written to the array, I stopped everything that writes to it. My containers run on a separate NVMe drive. Some of them read data from the array, but those are stopped now.

 

My question now is: how do I proceed? How can I find out what exactly the problem is? I had a look at the diagnostics files myself and saw a couple of read errors on disks 4, 5 and 6. If I remember correctly, those three and the second parity disk (the one that is disabled right now) are connected to my HBA (Dell Perc H200). I did not see any errors on the first parity disk or on disks 1, 2 or 3, which are all connected directly to the motherboard. Maybe that is a hint?
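To make that hunch easier to check, here is a minimal Python sketch that groups the affected disks by controller. The disk labels (`parity`, `parity2`, `disk1` ...) and the controller assignment are taken from my description above and are from memory, so they are assumptions rather than verified facts. The disabled data disk is not included because I have not said which one it is.

```python
from collections import defaultdict

# Controller layout as recalled in the post. This is an assumption until it
# is confirmed against the diagnostics or the actual cabling.
CONTROLLER_OF = {
    "parity": "motherboard",
    "disk1": "motherboard",
    "disk2": "motherboard",
    "disk3": "motherboard",
    "parity2": "hba",
    "disk4": "hba",
    "disk5": "hba",
    "disk6": "hba",
}

# Disks with read errors (disk4-6) plus the disabled second parity disk.
AFFECTED = {"parity2", "disk4", "disk5", "disk6"}


def affected_by_controller(controller_of, affected):
    """Group the affected disks by the controller they are attached to."""
    groups = defaultdict(list)
    for disk in sorted(affected):
        groups[controller_of[disk]].append(disk)
    return dict(groups)


def single_common_controller(controller_of, affected):
    """Return the controller shared by every affected disk, or None."""
    controllers = {controller_of[disk] for disk in affected}
    return controllers.pop() if len(controllers) == 1 else None


if __name__ == "__main__":
    print(affected_by_controller(CONTROLLER_OF, AFFECTED))
    print(single_common_controller(CONTROLLER_OF, AFFECTED))
```

If every affected disk sits on one controller while the disks on the other controller are clean, the shared path (HBA, its cables, or its power) looks more suspicious than four disks failing independently. That is only a hint, not a diagnosis.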

 

I tried to run a _SMART extended self-test_ on those disks, but it stopped at 10%. I don't see any errors. The log just says _spinning down /dev/sdX_ shortly afterwards.

 

Would be great if someone could help me out here.

 

Thanks in advance!

coruscant-diagnostics-20220218-2153.zip


Ah, great! Thank you so much!

 

So I assume there is nothing wrong with my data now. Is there a way to tell Unraid to re-add the disks to the array and just start up without doing any rebuilds? I would probably start a new parity sync afterwards, just to make sure the parity disks are up to date again.
