Multiple Disks Fail in Series - Longstanding Issue



Hey all,

 

I'm running into a recurring issue with failed disks in my array. Disks 1-4 fail one at a time. I've occasionally had this happen with other disks, but it is almost always Disks 1-4.

 

Typically this happens as follows:

  1. A disk (Disk 3 in today's case) will randomly fail
  2. I start rebuilding Disk 3
  3. Disk 2 will fail during rebuild
  4. Rinse & repeat with another 1-2 disk failures

 

Hardware details are in my signature, but my server has a Corsair 850 W PSU, Threadripper 2950X, EVGA 2080 Super (undervolted), 10 HDDs, 3 NVMe drives, and 1 SSD. I am also using an LSI 9207-8i with the recommended firmware for Unraid. Based on a power draw calculation, I don't think it's a power supply issue.
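For anyone wanting to repeat that kind of power draw calculation, here is a minimal sketch. Apart from the 850 W PSU rating and the CPU's rated TDP, every wattage below is an illustrative assumption, not a measurement from this server, and `power_headroom` is just a helper name made up for the example.

```python
# Rough power-budget check. Figures marked "assumed" are illustrative
# guesses, not measurements from this server.

def power_headroom(psu_watts, loads):
    """Return (total_draw, headroom) in watts for a dict of name -> watts."""
    total = sum(loads.values())
    return total, psu_watts - total

loads = {
    "cpu": 180,               # Threadripper 2950X rated TDP
    "gpu": 250,               # assumed worst case for the 2080 Super
    "hdd_spinup": 10 * 25,    # assumed ~25 W per HDD while spinning up
    "solid_state": 4 * 8,     # assumed ~8 W per NVMe/SSD
    "board_fans_hba": 80,     # assumed
}

total, headroom = power_headroom(850, loads)
print(f"estimated peak draw: {total} W, headroom: {headroom} W")
```

Keep in mind that a total-wattage budget like this says nothing about how the load is split across individual power cables or connectors, so it can't rule out a problem on a single SATA power run.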

 

This was happening once every 1-2 months for about 6-12 months, then stopped about 6 months ago. Sometimes only 1 disk would fail, other times it would be multiple disks in series.

 

So far, I've tried 2 different LSI 9207-8i cards (same firmware on both), reseated the SATA & power cables, and even swapped the SATA cables. None of these had any effect. Then randomly, about 3 months after the last attempted fix, the failures stopped. After about 6 months of no issues, the disk failures have restarted as of today.

 

Today, I had Disk 3 fail, then during the Disk 3 rebuild, Disk 2 failed. Disk 3 is still rebuilding.

 

Any idea what may be causing the issue or any recommendations as far as what to do / where to look to fix this issue?

tower-diagnostics-20210226-1534.zip
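One way to check whether the failures cluster on particular devices is to count kernel I/O error lines per device in the syslog. This is only a sketch: it assumes the log contains the usual Linux kernel message form "I/O error, dev sdX, sector N", and the helper name `io_errors_by_device` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines containing "I/O error, dev sdf, sector 12345"
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector \d+")

def io_errors_by_device(lines):
    """Count I/O error lines per block device name."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines, only to show the shape of the input.
sample = [
    "kernel: blk_update_request: I/O error, dev sdf, sector 1024",
    "kernel: blk_update_request: I/O error, dev sdf, sector 2048",
    "kernel: blk_update_request: I/O error, dev sde, sector 4096",
    "kernel: some unrelated message",
]

for device, count in io_errors_by_device(sample).most_common():
    print(device, count)
```

To run it against a real log, pass an open file instead of the sample list, for example `io_errors_by_device(open("syslog.txt", errors="replace"))`, where `syslog.txt` is a placeholder for wherever the syslog was extracted to.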

Edited by Giggity_Grant
Corrected disk #'s

I will try moving the Seagates over to the mobo SATA ports after the rebuild completes. Unfortunately, I don't have a spare power supply.

 

After rebuilding Disk 3, I started rebuilding Disk 2. About 10% into the Disk 2 rebuild, Disk 3 failed again. Looks like it will be a few days until I can try using the mobo SATA ports.

Edited by Giggity_Grant
Corrected disk #'s

Well, I finished the Disk 2 rebuild, but while starting the Disk 3 re-rebuild, the Web GUI dropped out and can no longer be reached.

 

After the Disk 2 rebuild, I began the process of rebuilding Disk 3 (again, for the 2nd time in 5 days):

  • Stopped array
  • De-selected the HDD for Disk 3
  • Started array
  • Stopped array
  • Assigned the HDD to Disk 3

As soon as I hit the "start" button to start the array, the Web GUI lost all connectivity.

 

Additional Tests / Symptoms

  • Unraid server no longer appears as an active client on my network
  • Tried pinging the Unraid server IP. This was unsuccessful; the server could not be reached
  • Tried to SSH into the Unraid server - "connection could not be established. network is unreachable"
  • Tried reaching various -arr docker containers via their IP address & port number in a web browser, but no docker containers could be reached
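The checks above can be repeated in one go with a small script. This is a minimal sketch using plain TCP connection attempts; the function names are made up for the example, and which ports to probe (such as 22 for SSH or 80 for the Web GUI) is an assumption that depends on how the server is configured.

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "network is unreachable".
        return False

def check_services(host, ports, timeout=3.0):
    """Map each service name to True/False depending on reachability."""
    return {name: tcp_reachable(host, port, timeout) for name, port in ports.items()}
```

Calling something like `check_services("192.168.1.100", {"ssh": 22, "web gui": 80})` (the address here is a placeholder) returns a dict of results. If every port is unreachable and ping fails too, that suggests the whole host is off the network rather than a single service being down.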

 

Not sure what to do now, other than force an unsafe shutdown (cut power), reboot, and try to start the Disk 3 rebuild process again.


@JorgeB I went ahead and re-wired the entire server, putting the Seagate drives on the motherboard SATA ports. I also replaced the SATA power cable that fed these drives.

 

This seems to have resolved the problem for now.

 

Unfortunately, I had already done a hard restart on the server by the time I saw your message, but I've attached a Diagnostics report that I just pulled in the event that it might be helpful.

tower-diagnostics-20210308-2040.zip
