Need help with a failed drive

Seanmc980 · September 10, 2023

Hello, I went to shut down my server and noticed that one of the drives in my array had a red X next to it. It listed 64 errors in the errors column. I tried to restart, hoping that a reboot would just fix the issue (knowing that it probably wouldn't), but the drive is still red. When I first noticed this, the drive was missing its temperature reading. The system hung and never rebooted. I issued a shutdown, and eventually it did shut down, but it indicated that it wasn't a clean shutdown. I started the array in maintenance mode and started a parity check. I feel like this is probably a 20-hour waste of time, so I'm reaching out to support for help with getting this back up and running.

After the reboot, the log files seem to have been cleared. The drive is still showing under the array, but the drive's letters have changed: it went from SDK to SDG, and the drive is also shown further down under historical devices.

This array is made up of four 12TB drives, with one parity, one 2TB drive, and three unassigned drives. I want to make sure I perform the right steps; this is my main server for all my media and backups. I really do not want to screw this up, so please, someone walk me through the process. I found a couple of forum posts, but they linked to outdated and removed posts.

JonathanM · September 10, 2023

Attach the diagnostics zip file to your next post in this thread.

Seanmc980 · September 11, 2023

I did reboot, as mentioned. Hopefully something in these logs will be helpful. Thank you.

btch-diagnostics-20230910-2024.zip

trurl · September 11, 2023

1 hour ago, Seanmc980 said:

started a parity check

Fortunately you didn't start a correcting parity check

Sep 10 14:07:13 BTCH kernel: mdcmd (36): check nocorrect

since what you need to do is rebuild the disabled disk using the existing parity.

The syslog indicates emulated disk3 mounted before you restarted in maintenance mode, and disk3 SMART looks OK. It's probably just a connection problem, but I can't say for sure: it happened before the reboot, so there is nothing in the syslog about it.

It should be OK to rebuild to the same disk, but you should check connections first.

https://docs.unraid.net/unraid-os/manual/storage-management/#rebuilding-a-drive-onto-itself

Do you have backups of anything important and irreplaceable?
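For anyone who wants to confirm which kind of parity check was started, the syslog line quoted above can be searched for programmatically. This is only a minimal sketch: the function name `parity_check_modes` is hypothetical, and it assumes that a check is non-correcting only when the `mdcmd ... check` line carries an explicit `nocorrect` argument, as in the line from this thread. Any other form is reported as "possibly correcting" rather than guessed at.

```python
import re

# Matches md driver command lines such as the one quoted in this thread:
#   Sep 10 14:07:13 BTCH kernel: mdcmd (36): check nocorrect
MDCMD_CHECK = re.compile(r"kernel: mdcmd \(\d+\): check\b[ \t]*(?P<arg>\S*)", re.IGNORECASE)


def parity_check_modes(syslog_text):
    """Return (line, is_non_correcting) for every 'mdcmd (N): check' line found."""
    results = []
    for line in syslog_text.splitlines():
        match = MDCMD_CHECK.search(line)
        if match:
            # Assumption: only an explicit 'nocorrect' argument means a read-only check.
            is_non_correcting = match.group("arg").lower() == "nocorrect"
            results.append((line.strip(), is_non_correcting))
    return results


if __name__ == "__main__":
    sample = "Sep 10 14:07:13 BTCH kernel: mdcmd (36): check nocorrect"
    for line, safe in parity_check_modes(sample):
        print("non-correcting" if safe else "possibly correcting", "|", line)
```

To use it on real data, read the syslog from the extracted diagnostics zip into a string and pass it to the function.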

Seanmc980 · September 11, 2023

I tried the steps you suggested, but the drive came back with errors. The rebuild/sync is paused at 10%. I've attached the diagnostics zip. What should I do now?

btch-diagnostics-20230911-0632.zip

JorgeB · September 11, 2023

Looks more like a power/connection issue, replace cables/swap slot and try again.

Seanmc980 · September 11, 2023

22 minutes ago, JorgeB said:

Looks more like a power/connection issue, replace cables/swap slot and try again.

How can you tell? I can replace the power cable and swap SATA. I'll report back.

Seanmc980 · September 11, 2023

This drive was powered by an expansion SATA power connector, one of those old-style HDD to SATA power adapters. The one power cable was split three times to power 2 fans, then the SATA adapter. It was powering 4 drives, including the one with errors. I wasn't aware that this was a terrible idea until now. The power supply is an older modular ATX style, but I lost all the expansion cables in a move. I've ordered a new power supply, and the server is shut down until I get it swapped and properly powered. I'll report back when I get it retested. Thanks for your time.

Edited September 11, 2023 by Seanmc980

Seanmc980 · September 11, 2023

The other drives (without errors) are all powered off the built-in SATA power harness, so this does point towards a bad connection or bad power.

trurl · September 11, 2023

Bad communication with a disk can be caused by bad cables (power or SATA), bad connectors (power and splitters or SATA, either end), or loose connections (power and SATA, either end).

Each connection must sit squarely on the connector, with no tension in the cable that might cause it to move. Don't bundle data cables or you could get crosstalk interference. Don't put more than 4 drives on a single PSU cable.
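One way to look for this kind of communication trouble yourself is to count link-reset messages per ATA port in the syslog. The sketch below is illustrative only. The function name `link_trouble_by_port` is hypothetical, the substrings are messages the Linux libata driver commonly logs when a SATA link drops or is reset, and none of them are quoted from the diagnostics in this thread. Disks attached through a SAS HBA will not show up as `ataN` ports, so an empty result does not prove the connections are good.

```python
import re
from collections import Counter

# Substrings libata commonly logs when a SATA link drops or is reset.
# Assumption: indicative only, not taken from this thread's diagnostics.
LINK_TROUBLE = (
    "hard resetting link",
    "SATA link down",
    "COMRESET failed",
    "link is slow to respond",
)

# Captures the port name from prefixes such as "ata5:" or "ata5.00:".
ATA_PORT = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def link_trouble_by_port(syslog_text):
    """Count syslog lines that suggest link trouble, grouped by ATA port."""
    counts = Counter()
    for line in syslog_text.splitlines():
        if any(marker in line for marker in LINK_TROUBLE):
            match = ATA_PORT.search(line)
            counts[match.group(1) if match else "unknown"] += 1
    return counts
```

A port that accumulates many such lines while its neighbours stay quiet is a reasonable place to start when reseating or replacing cables.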
