
Two drives fail at the same time. What could be the cause? How to repair?



Hi there,

I have an Unraid server with 13 data disks and 2 parity drives, ranging from 8 to 14 TB. A few days ago I couldn't open a few files, so I checked the web interface and, to my horror, Unraid reported two disks with errors: one data disk and the first parity drive. I shut down the server, and now I want to start troubleshooting and need your help.

 

[Attached screenshot: grafik.png]

 

Before I shut it down, I saw that both drives had an error count of about 2,000. I don't suspect an actual drive failure, since both drives failed at the same time. You can't see it in the screenshot, but these notification mails all arrived within the same minute:

 

[Attached screenshot: grafik.png]

 

My drives are connected either to the mainboard (Gigabyte X470 Aorus Ultra Gaming) or to one of two SAS cards: a Dell PERC H310 and an H200.

 

Here is some information about the failed drives.

 

 * Disk 7 is connected to the H310 on the A Port along with three other drives. The B port doesn't have any drives. A SMART short self test completed without errors. Here are the downloaded SMART results: Disk 7 SMART.txt

 * Parity 1 is connected to the H200 on the A Port along with two other drives. The B port also has three drives connected. A SMART short self test completed without errors. Here are the downloaded SMART results: Parity 1 SMART.txt

 

My first guess was that one of the 4x SATA SAS breakout cables may have come loose, or that one of the cards is faulty, but the two drives are on different cables on different cards.
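The check described above (do the failed drives share a card or a cable?) can be written down as a small sketch. The dictionary below only encodes what the post states about Disk 7 and Parity 1; the key names, the port labels, and the helper `shared_components` are made up for illustration and are not part of Unraid.

```python
# Topology of the two failed drives as described in the post.
# Labels such as "H310/A" are illustrative names for a card's port/cable.
failed = {
    "Disk 7": {"controller": "H310", "port": "H310/A"},
    "Parity 1": {"controller": "H200", "port": "H200/A"},
}


def shared_components(drives):
    """Return the (kind, value) pairs that every listed drive has in common."""
    sets = [set(d.items()) for d in drives.values()]
    return set.intersection(*sets) if sets else set()


common = shared_components(failed)
print(common or "no shared controller or cable")
```

For the two drives in question the intersection is empty, which matches the observation that a single loose cable or a single faulty card does not explain both failures on its own.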

 

What are my options now? What can I do to find the cause of the errors? If at all possible, I want to avoid buying replacement drives, since current prices are insane.

 

I am pretty lost so any help is appreciated 🙂

Cheers!

Edited by Henning
4 minutes ago, Henning said:

A few days ago I couldn't open a few files so I checked the web interface and to my horror unraid reported two disks with errors

This should never happen. You should have system notifications enabled so that you are notified immediately when there's a problem; otherwise it might be too late.

 

6 minutes ago, Henning said:

I shut down the server and now I want to start troubleshooting and need your help.

Next time, grab the diagnostics before doing this, or the syslog will be lost after powering back on.

 

Start the array, grab the diagnostics and post them here.
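As a rough illustration of the point about the syslog being lost across a power cycle, here is a minimal sketch that copies the live log to persistent storage before shutting down. It is not a replacement for the diagnostics requested above. The default paths are assumptions: `/var/log/syslog` as the in-memory log location and `/boot/logs` as a folder on the flash drive. The function name `save_syslog` is hypothetical.

```python
import shutil
import time
from pathlib import Path


def save_syslog(src="/var/log/syslog", dest_dir="/boot/logs"):
    """Copy the live syslog to persistent storage under a timestamped name.

    Both default paths are assumptions about the system layout; adjust as needed.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # A timestamp in the name keeps earlier copies from being overwritten.
    target = dest / f"syslog-{time.strftime('%Y%m%d-%H%M%S')}.txt"
    shutil.copy2(Path(src), target)
    return target
```

Calling `save_syslog()` right before a shutdown would leave a copy of the log that survives the reboot, provided the assumed paths match the system.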
