zLynde Posted April 2, 2022

Hey all,

I've come up against a problem that is above my skill set, and I hope someone here will be able to help. I checked up on my server yesterday and found that one of my drives had failed (disk 4). Its SMART health still showed okay, but it was an older drive, so I went ahead and replaced it with a new drive that I had at the ready. The new drive appeared to mount okay, but when I checked back on the parity rebuild this morning, I saw that it too had failed.

My first move is going to be to replace the SATA cable, as I've heard that's a common cause of issues like this. Beyond that, though, I'm not sure what to do. I've attached a diagnostics file below in the hope that it will shed some light on the issue. Old threads here have helped me fix many problems in the past, and I'm hoping you guys can do it again here. Thanks for your help!

hal-diagnostics-20220402-1034.zip
Squid Posted April 2, 2022

It's hard to discern why the drive(s) failed without diagnostics taken before the reboot, while they were still listed as disabled. Cabling is always the go-to, as hard drives are actually one of the most reliable components in any given system, whereas SATA connections have not been since day 1.
zLynde Posted April 3, 2022

8 hours ago, Squid said: It's hard to discern why the drive(s) failed without diagnostics taken before the reboot, while they were still listed as disabled. Cabling is always the go-to, as hard drives are actually one of the most reliable components in any given system, whereas SATA connections have not been since day 1.

Thanks for the tip. I realized afterwards that I should have downloaded diagnostics before shutting down the server. The drive reconnected after a restart, and I'm attempting the parity rebuild now. If it disconnects again, I'll make sure to download diagnostics before shutting down.
zLynde Posted April 3, 2022

Quick update: the data rebuild ran overnight but shows an estimated duration of 300+ days, which I can't imagine is right. New logs are attached, though drive 4 did not disconnect this time. I'm going to stop the rebuild and replace the cable before another attempt.

hal-diagnostics-20220403-0911.zip
itimpi Posted April 3, 2022

The diagnostics show continual resets on disk1 and disk4, which explains the excessive time. I would carefully check the cabling to these drives.
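For anyone who wants to look for these resets themselves: they normally appear as kernel libata messages in the syslog inside the diagnostics zip. Below is a minimal Python sketch that tallies them per ATA port. It assumes the lines follow the usual libata wording (`ata5: hard resetting link`), and the sample lines are invented for illustration, not copied from the attached diagnostics. Matching an `ataN` port to an Unraid disk number still has to be done by hand.

```python
import re
from collections import Counter

# Matches kernel lines such as "ata5: hard resetting link" (assumed format).
RESET_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (?:hard|soft) resetting link")


def count_link_resets(lines):
    """Return a Counter mapping ATA port name (e.g. 'ata5') to reset count."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    "Apr  3 02:11:07 hal kernel: ata5: hard resetting link",
    "Apr  3 02:11:12 hal kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "Apr  3 02:14:40 hal kernel: ata2: hard resetting link",
    "Apr  3 02:19:03 hal kernel: ata5: hard resetting link",
]

if __name__ == "__main__":
    for port, n in count_link_resets(SAMPLE_LOG).most_common():
        print(f"{port}: {n} link resets")
```

A port that keeps resetting while the others stay quiet points at that port's cable, connector, or drive rather than at the array as a whole.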
zLynde Posted April 3, 2022

1 hour ago, itimpi said: The diagnostics show continual resets on disk1 and disk4, which explains the excessive time. I would carefully check the cabling to these drives.

Thanks for parsing that. I did swap the cable on drive 4, and the rebuild is now showing a much more reasonable 20-hour estimate at an acceptable data transfer speed. I'll monitor it for now and swap the cable on drive 1 if problems persist.

For anyone stumbling upon this thread in the future, it seems swapping SATA cables is step 1 for a disconnected drive. If there are no further updates, take that as a sign that a replacement cable was the only troubleshooting step required in this particular case. Big thanks to itimpi for the help!
zLynde Posted May 2, 2022

The saga continues! After a month of normal operation, I checked in on my server today. The monthly parity check had started yesterday, but I could tell from the fan speed that it was still going this morning. When I pulled up the server GUI, it showed several hundred days remaining at a speed of only 500 KB/s, and read errors were piling up on disk 4 (which was replaced just one month ago).

I've attached logs from this second attempt, and I went ahead and swapped the cables on drives 1 and 4 again this morning, just to be sure that the cables were not the issue. Any help you guys can provide is greatly appreciated. Thanks!

hal-diagnostics-20220502-1138.zip
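To put those numbers in perspective, the remaining time is essentially the remaining data divided by the current speed. A rough Python sketch follows. It assumes a hypothetical 8 TB disk (the thread does not state the drive size), an assumed healthy speed of about 110 MB/s, and decimal units throughout.

```python
def eta_days(remaining_bytes, speed_bytes_per_s):
    """Estimated time to finish, in days, at a constant transfer speed."""
    if speed_bytes_per_s <= 0:
        raise ValueError("speed must be positive")
    return remaining_bytes / speed_bytes_per_s / 86_400  # seconds per day


DISK_BYTES = 8 * 10**12   # hypothetical 8 TB disk, not stated in the thread
SLOW = 500 * 10**3        # 500 KB/s, the speed reported in the post
HEALTHY = 110 * 10**6     # assumed ~110 MB/s for a healthy drive

if __name__ == "__main__":
    print(f"at 500 KB/s: {eta_days(DISK_BYTES, SLOW):.0f} days")
    print(f"at 110 MB/s: {eta_days(DISK_BYTES, HEALTHY) * 24:.1f} hours")
```

With these assumed figures the slow case works out to roughly 185 days and the healthy case to about 20 hours, so an estimate in the hundreds of days is what a drive limping along at a few hundred KB/s would be expected to show.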
JorgeB Posted May 2, 2022

There are still issues with disks 1 and 4, especially disk 4. Also, two of the SATA ports are set to IDE; you should change that in the BIOS, though those are not the ports currently having issues.
zLynde Posted May 2, 2022 (Edited May 2, 2022 by zLynde)

The short SMART test on disk 4 completed without error. Running the extended SMART test now.
zLynde Posted May 2, 2022

4 minutes ago, JorgeB said: There are still issues with disks 1 and 4, especially disk 4. Also, two of the SATA ports are set to IDE; you should change that in the BIOS, though those are not the ports currently having issues.

Wonderful. I'm starting to wonder if it's an issue with the SATA ports on the mobo. Any ideas for testing that theory? I've got a cold spare here, but this is gonna cost me a fortune if I've got to swap the drive every month lol.
JorgeB Posted May 2, 2022

Make sure you swap/replace both the power and SATA cables first, as one of those is usually the culprit, and it's also much easier and cheaper to do first.