Seanmc980 Posted September 10, 2023
Hello, I went to shut down my server and noticed that one of my drives in the array had a red X next to it. It listed 64 errors in the errors column. When I first noticed this, the drive was missing its temperature reading. I tried to restart, hoping that a reboot would just fix the issue (knowing that it probably wouldn't), but the system hung and never rebooted. I issued a shutdown, and eventually it did shut down, but it indicated that it wasn't a clean shutdown. After booting again, the drive is still red.
I started the array in maintenance mode and started a parity check. I feel like this is probably a 20-hour waste of time, so I'm reaching out for help with getting this back up and running. After the reboot, the log files seem to have been cleared. The drive is still showing under the array, but its device letters have changed: it went from SDK to SDG, and the drive is also shown below under historical devices.
This array is made up of 4 12TB drives, with 1 parity, one 2TB drive and 3 unassigned drives. I want to make sure I perform the right steps. This is my main server for all my media and backups, and I really do not want to screw this up, so please, someone walk me through the process. I found a couple of forum posts, but they linked to outdated and removed posts.
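On the change from SDK to SDG: Linux assigns `sdX` names in detection order at boot, so they can shift between reboots without anything being wrong with the disk. The entries under `/dev/disk/by-id` are built from model and serial number and stay the same. A minimal sketch, assuming a standard Linux `/dev/disk/by-id` directory (the function name `map_ids_to_devices` is made up for illustration), that shows which `sdX` name each disk currently has:

```python
import os

def map_ids_to_devices(by_id_dir="/dev/disk/by-id"):
    """Map each persistent disk ID to the kernel name it currently resolves to."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        if "-part" in name:
            continue  # skip partition entries, keep whole disks only
        # Each entry is a symlink to the current device node, e.g. ../../sdg
        target = os.path.realpath(os.path.join(by_id_dir, name))
        mapping[name] = os.path.basename(target)
    return mapping

if __name__ == "__main__" and os.path.isdir("/dev/disk/by-id"):
    for disk_id, dev in map_ids_to_devices().items():
        print(f"{dev}\t{disk_id}")
```

Comparing the serial number in the ID before and after a reboot confirms it is the same physical drive, whatever letter it ends up with.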
JonathanM Posted September 10, 2023
Attach the diagnostics zip file to your next post in this thread.
Seanmc980 Posted September 11, 2023
I did reboot, as mentioned. Hopefully something in these logs will be helpful. Thank you.
btch-diagnostics-20230910-2024.zip
trurl Posted September 11, 2023
1 hour ago, Seanmc980 said: "started a parity check"
Fortunately, you didn't start a correcting parity check:
Sep 10 14:07:13 BTCH kernel: mdcmd (36): check nocorrect
What you need to do is rebuild the disabled disk using the existing parity. The syslog indicates that emulated disk3 mounted before you restarted in maintenance mode, and disk3 SMART looks OK. It is probably just a connection problem, but I can't say for sure, since it happened before the reboot and so there is nothing in the syslog about it.
It should be OK to rebuild to the same disk, but you should check connections first.
https://docs.unraid.net/unraid-os/manual/storage-management/#rebuilding-a-drive-onto-itself
Do you have backups of anything important and irreplaceable?
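To check this yourself, you can search the syslog from the diagnostics zip for the `mdcmd ... check` line quoted above. A minimal sketch, assuming the log line format shown in this thread; the helper name `find_parity_checks` is made up for illustration, and it only flags the `nocorrect` argument seen here, without guessing what other arguments would look like:

```python
import re

# Matches kernel lines such as:
#   "Sep 10 14:07:13 BTCH kernel: mdcmd (36): check nocorrect"
MDCMD_CHECK = re.compile(r"kernel: mdcmd \((\d+)\): check\b\s*(\S*)")

def find_parity_checks(syslog_lines):
    """Return (command_number, is_non_correcting) for each parity check line."""
    results = []
    for line in syslog_lines:
        match = MDCMD_CHECK.search(line)
        if match:
            non_correcting = match.group(2).lower() == "nocorrect"
            results.append((int(match.group(1)), non_correcting))
    return results

sample = ["Sep 10 14:07:13 BTCH kernel: mdcmd (36): check nocorrect"]
print(find_parity_checks(sample))  # [(36, True)]
```

A `True` in the second position means the check was started without corrections, so parity was not rewritten and can still be used for the rebuild.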
Seanmc980 Posted September 11, 2023
I tried the steps you suggested, but the drive came back with errors. It's paused at 10% of the rebuild/sync. I've attached the diagnostics zip. What should I do now?
btch-diagnostics-20230911-0632.zip
Solution JorgeB Posted September 11, 2023
Looks more like a power/connection issue. Replace the cables or swap the slot and try again.
Seanmc980 Posted September 11, 2023
22 minutes ago, JorgeB said: "Looks more like a power/connection issue, replace cables/swap slot and try again."
How can you tell? I can replace the power cable and swap the SATA cable. I'll report back.
Seanmc980 Posted September 11, 2023
This drive was powered by an expansion SATA power connector, one of those old-style HDD-to-SATA power adapters. The one power cable was split three times to power 2 fans and then the SATA adapter, which was powering 4 drives, including the one with errors. I wasn't aware that this was a terrible idea until now. The power supply is an older modular ATX style, but I lost all the expansion cables in a move. A new power supply is ordered, and the server is shut down until I get it swapped and properly powered. I'll report back when I get it retested. Thanks for your time.
Edited September 11, 2023 by Seanmc980
Seanmc980 Posted September 11, 2023
The other drives (without errors) are all powered from the power supply's built-in SATA power harness, so this does point towards a bad connection or bad power.
trurl Posted September 11, 2023
Bad communication with a disk can be caused by bad cables (power or SATA), bad connectors (power and splitters or SATA, either end), or loose connections (power and SATA, either end). Each connection must sit squarely on the connector, with no tension in the cable that might cause it to move. Don't bundle data cables or you could get crosstalk interference. Don't put more than 4 drives on a single PSU cable.
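The last rule of thumb is easy to check against a written-down wiring plan. A minimal sketch, where the drive and cable labels are hypothetical and the limit of 4 is taken from the post above rather than from any PSU specification:

```python
from collections import Counter

MAX_DRIVES_PER_CABLE = 4  # rule of thumb from the post above, not a PSU spec

def overloaded_cables(drive_to_cable, limit=MAX_DRIVES_PER_CABLE):
    """Return {cable: drive_count} for every cable feeding more than `limit` drives."""
    counts = Counter(drive_to_cable.values())
    return {cable: n for cable, n in counts.items() if n > limit}

# Hypothetical layout: five drives hanging off one split cable, two on another.
layout = {
    "disk1": "cable_A", "disk2": "cable_A", "disk3": "cable_A",
    "disk4": "cable_A", "disk5": "cable_A",
    "parity": "cable_B", "cache": "cable_B",
}
print(overloaded_cables(layout))  # {'cable_A': 5}
```

This only counts drives. It does not account for fans or adapters sharing the same cable, which also add load and extra connection points.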