
Original and Replacement Drives Failed



Hey all, I've come up against a problem that is above my skillset, and hope someone here will be able to help.

 

I checked up on my server yesterday and found that one of my drives (disk 4) had failed. Its SMART health still showed okay, but it was an older drive, so I went ahead and replaced it with a new drive that I had at the ready. The new drive appeared to mount okay, but when I checked back on the parity rebuild this morning, I saw that it too had failed.

 

My first move is going to be to replace the SATA cable, as I've heard that's a common cause of issues like this. Beyond that, though, I'm not sure what to do. I've attached a diagnostics file below, in the hope that it will shed some light on the issue.

 

Old threads here have helped me fix many problems in the past, and I'm hoping you guys can do it again here. Thanks for your help!

hal-diagnostics-20220402-1034.zip

8 hours ago, Squid said:

It's hard to discern why the drive(s) failed without diagnostics before a reboot when they were listed as being disabled.

 

Cabling is always the go-to as hard drives are actually one of the most reliable components in any random system, whereas SATA connections since day 1 have not been.

Thanks for the tip. I realized afterward that I should have downloaded diagnostics before shutting down the server. The drive reconnected after a restart, and I'm attempting the parity rebuild now. If it disconnects again, I'll make sure to grab diagnostics before shutting down.

1 hour ago, itimpi said:

The diagnostics show continual resets on disk1 and disk4 which explains the excessive time.   I would carefully check cabling to these drives.

Thanks for parsing that. I did swap the cable on drive 4, and the rebuild is now showing a much more reasonable 20-hour estimate at an acceptable data transfer speed. I'll monitor it for now, and swap the cable on drive 1 if problems persist.

 

For anyone stumbling upon this thread in the future, it seems swapping SATA cables is step 1 for a disconnected drive. If there are no further updates, take that as a sign that a replacement cable was the only troubleshooting step required in this particular case. Big thanks to itimpi for the help!
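
For anyone who wants to look for the "continual resets" itimpi mentioned in their own logs, here's a rough Python sketch. It assumes the syslog contains Linux kernel lines of the form `ataN: hard resetting link` (or `soft resetting link`), and it only counts them per ATA port. The function name and the sample lines are made up for illustration; they are not taken from the diagnostics attached to this thread.

```python
import re
from collections import Counter

# Matches kernel messages such as "ata5: hard resetting link".
RESET_RE = re.compile(r"\b(ata\d+): (?:hard|soft) resetting link")


def count_link_resets(lines):
    """Return a Counter mapping each ATA port (e.g. 'ata5') to its reset count."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines, for illustration only (not from the real diagnostics).
SAMPLE = [
    "May  2 09:00:01 server kernel: ata5: hard resetting link",
    "May  2 09:00:06 server kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "May  2 09:03:12 server kernel: ata5: hard resetting link",
    "May  2 09:04:40 server kernel: ata2: hard resetting link",
]

if __name__ == "__main__":
    # Ports with the most resets are listed first.
    for port, n in count_link_resets(SAMPLE).most_common():
        print(f"{port}: {n} link resets")
```

To try it on a real log, pass an open file instead of `SAMPLE`, using whatever path your extracted syslog has. Note that this only tells you which ATA port is resetting; matching a port to a particular disk still means reading the surrounding log lines.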

  • 5 weeks later...

The saga continues!

After a month of normal operation, I checked in on my server today. The monthly parity check had started yesterday, but I could tell from the fan speed that it was still going this morning. When I pulled up the server GUI, it showed several hundred days remaining at a speed of only 500 KB/s, and read errors were piling up on disk 4 (which was replaced just 1 month ago). I've attached diagnostics from this second round, and I went ahead and swapped the cables on drives 1 and 4 again this morning, just to be sure that the cables were not the issue.

 

Any help you guys can provide is greatly appreciated. Thanks!

 

 

hal-diagnostics-20220502-1138.zip
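
As a sanity check on the "several hundred days remaining" figure: the estimate is roughly the remaining data divided by the current speed. Here's a small Python sketch of that arithmetic. The 8 TB disk size is purely an assumption for illustration (the actual disk size isn't stated in this thread), and `days_remaining` is a hypothetical helper, not anything from the server software.

```python
SECONDS_PER_DAY = 86_400


def days_remaining(remaining_bytes, bytes_per_second):
    """Estimate the days left to read remaining_bytes at a constant speed."""
    if bytes_per_second <= 0:
        raise ValueError("speed must be positive")
    return remaining_bytes / bytes_per_second / SECONDS_PER_DAY


# ASSUMPTION: an 8 TB disk, chosen only to illustrate the scale.
ASSUMED_DISK_BYTES = 8 * 10**12
REPORTED_SPEED = 500 * 10**3  # the 500 KB/s mentioned in the post above

if __name__ == "__main__":
    days = days_remaining(ASSUMED_DISK_BYTES, REPORTED_SPEED)
    print(f"About {days:.0f} days at 500 KB/s for the assumed disk size")
```

Under that assumed size it works out to roughly 185 days, so an estimate in the hundreds of days is about what you'd expect when a link is constantly resetting and throughput collapses to the KB/s range.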

4 minutes ago, JorgeB said:

There are still issues with disks 1 and 4, especially 4. Also, 2 of the SATA ports are set to IDE; you should change that in the BIOS, but those are not the ports that are currently having issues.

Wonderful. I'm starting to wonder if it's an issue with the SATA ports on the mobo. Any ideas for testing that theory?

 

I've got a cold spare here, but this is gonna cost me a fortune if I've got to swap the drive every month lol.

Link to comment
