
Disk Error - Looking for Advice


3doubled


Hi,

 

I'm having a disk error that keeps forcing one of my drives into 'soft resetting link'. I'm only seeing it on one drive, but I may have seen similar behavior on other drives before. I'm a bit worried that my motherboard's SATA controller is the problem. I've attached my most recent syslog and a SMART report for the drive showing the errors (ATA6, which is sdf according to the ata_devices.sh script). I ran a short SMART test on that drive about an hour ago, and it reported no errors.
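The ata_devices.sh script isn't reproduced in this thread. As a rough sketch of the same ATA-port-to-device mapping, assuming a Linux system where a disk on a libata (AHCI) port has a resolved /sys/block/sdX path containing an "ataN" component, something like the following could list it. The function names are hypothetical, and disks behind a SAS HBA such as the SASLP-MV8 will typically show no ataN port at all.

```python
import os
import re
from typing import Dict, Optional

# Assumption: a libata disk's resolved sysfs path contains a component like
# "/ata6/"; disks behind SAS HBAs usually do not, so they map to None.
ATA_COMPONENT = re.compile(r"/(ata\d+)/")


def ata_port_from_sysfs_path(resolved_path: str) -> Optional[str]:
    """Return the 'ataN' component of a resolved sysfs block path, if any."""
    match = ATA_COMPONENT.search(resolved_path)
    return match.group(1) if match else None


def map_block_devices(sys_block: str = "/sys/block") -> Dict[str, Optional[str]]:
    """Map each sdX device under sys_block to its ataN port (or None)."""
    if not os.path.isdir(sys_block):
        return {}
    mapping = {}
    for name in sorted(os.listdir(sys_block)):
        if name.startswith("sd"):
            # /sys/block entries are symlinks into /sys/devices/...
            resolved = os.path.realpath(os.path.join(sys_block, name))
            mapping[name] = ata_port_from_sysfs_path(resolved)
    return mapping


if __name__ == "__main__":
    for dev, port in map_block_devices().items():
        print(f"{dev}: {port or 'no ataN port found'}")
```

Resolving the symlink rather than reading kernel logs keeps the mapping independent of boot order, which matters once cables and ports start getting shuffled.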

 

EDIT: I've posted an updated syslog. The errors continued overnight, and I also noticed some actual read errors, as well as link resets on ata3 and ata4.
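To see whether the resets and read errors cluster on one port or spread across several, a minimal sketch like this one tallies them per ATA port from a saved syslog. The two regular expressions are assumptions based on typical libata kernel messages ("ata6: soft resetting link", "ata6.00: failed command: READ FPDMA QUEUED"), not patterns taken from the attached logs, so they may need adjusting. It requires Python 3.8+.

```python
import re
from collections import Counter, defaultdict

# Assumed message formats (typical libata kernel lines; adjust to your syslog):
#   "... kernel: ata6: soft resetting link"
#   "... kernel: ata6.00: failed command: READ FPDMA QUEUED"
LINK_RESET = re.compile(r"\b(ata\d+): (?:soft|hard) resetting link")
FAILED_READ = re.compile(r"\b(ata\d+)\.\d+: failed command: READ")


def summarize_ata_errors(lines):
    """Count link resets and failed READ commands per ATA port."""
    counts = defaultdict(Counter)
    for line in lines:
        if m := LINK_RESET.search(line):
            counts[m.group(1)]["link_resets"] += 1
        elif m := FAILED_READ.search(line):
            counts[m.group(1)]["failed_reads"] += 1
    # Convert to plain dicts for easier printing and comparison.
    return {port: dict(c) for port, c in counts.items()}
```

For example, passing an open handle to Syslog_-_2015-06-15.txt would return a dict keyed by port, such as {"ata6": {"link_resets": ..., "failed_reads": ...}}, which makes it easy to compare ata3, ata4, and ata6 side by side.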

 

EDIT2: The Unraid Main page now shows read errors on 3 of my drives: disk1 (sdf, ata6), disk3 (sde, ata4), and disk12 (sdd, ata3). It also lists 47,218,319,685,376 writes to disk3... which is concerning.

 

EDIT3: I stopped the array, which took a couple of minutes, and now I have a red ball on disk3... this problem is really snowballing.

 

EDIT4: Things are really hitting the fan now. After the last power cycle (to add a new HDD in case I need to replace disk3), disk12 has disappeared. I've attached the latest diagnostic report in the post below. The disk was not detected because I knocked a SATA cable loose while checking for loose cables...

 

I was running Unraid 6.0-rc3 with no modifications, Docker containers, or VMs. I upgraded to Unraid 6.0.0 during troubleshooting.

 

Thanks for the help in advance!

disk1-attributes-ATA6-sdf.txt

ata_devices_output_2015-06-15.txt

Syslog_-_2015-06-15.txt


My apologies for the mix-up. I'm not sure what happened, but when I updated my syslog this morning I apparently saved an empty file. I've attached my first syslog to the original post. It shows the link resets, but not the more recent read errors or whatever led to the red ball. I'll see if I can recover the more recent syslog when I get home, but if Unraid didn't save it to the flash drive, it may have been lost when I rebooted after upgrading to the latest version of Unraid 6.0.


1) I've attached the diagnostics output from after an upgrade and reboot. I did the upgrade and reboot hoping to get you the diagnostics file you asked for, thinking I had saved my entire syslog from the last two days. However, I was wrong. My apologies for my stupidity; I used to run Unmenu with a plugin for automatic syslog saves, but I haven't gotten around to installing Unmenu since upgrading to Unraid 6.

 

2) Fortunately, I left a syslog browser window open, so it captures most of the events of the past 24 hours, but it is not technically complete.

 

3) After rebooting, disk3 now appears as faulty (an X on the Dashboard). This is suspicious, since the original problem started with disk1.

 

Thanks for your patience, I've been all over the place today.

Syslog_-_2015-06-16.zip

tower-diagnostics-20150616-1937.zip


A quick update. I bought a new 4TB WD Red in case I need to replace the "failed" 1.5TB disk. I powered down and added the disk on a spare SATA port on my Supermicro SASLP-MV8, intending to preclear it so it would be ready as a replacement. Upon rebooting, a second drive was reported as failed: disk12.

 

Now this is getting worrisome, as I cannot rebuild if both disks have truly failed.

 

I've attached the diagnostics from the last power-up.

 

Thanks again in advance for the help.

tower-diagnostics-20150617-2025.zip


My PSU is a Corsair TX650W. It has handled 14 drives for over a year and the server for 6 years, so it has a good track record so far.

 

Funnily enough, I did a wiggle check of my SATA cables when I installed the new drive, but I must have loosened one of the cables in the process. Good call on the cable check, thanks.

 

So I'm back to only one "failed" drive (disk3, sde) after seeing numerous link resets and read errors on disk1, disk3, and disk12. I suspect that disk3 has actually failed, but I'm not sure it was the source of all the link resets and read errors on the other disks. Could the failing disk3 have been overloading the SATA controller, causing link resets and read errors on the other disks?

 

If so, I'll try preclearing the new drive and then rebuilding the failed drive.


Sorry for the slow reply; I was away over the weekend.

 

I've attached my latest syslog and a screenshot of my Unraid Main page. There is not much to see in the screenshot: disk3 is disabled because it is no longer detected (it must have totally failed).

 

Over the weekend I used the preclear script on the new 4TB Red drive. I did one run with it connected to my Supermicro SASLP-MV8 and then two runs connected to my motherboard's SATA controller, with no errors after all 3 runs. I take it both controllers are fine?

 

My initial worry was the errors in my earlier syslogs: I was seeing read errors on the other disks (not disk3). Could a failing disk cause read errors on other disks? Now that I've started a rebuild of disk3, I see a few errors in my syslog (see attached). The most worrisome errors begin at 20:36:53 and end at 20:37:46. The other odd message, which I haven't seen before, appears about 1,000 times: "program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO".
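To pull out just the 20:36:53 to 20:37:46 window and count the smartctl warnings, a small sketch along these lines could help. It assumes classic syslog timestamps ("Jun 22 20:36:53 Tower kernel: ...") with zero-padded 24-hour times, all from a single day; the helper names are hypothetical.

```python
import re

# Assumes classic syslog lines such as "Jun 22 20:36:53 Tower kernel: ...",
# i.e. zero-padded 24-hour times, all from the same day.
TIMESTAMP = re.compile(r"^\w{3}\s+\d{1,2}\s+(\d{2}:\d{2}:\d{2})\s")
SMARTCTL_WARNING = "program smartctl is using a deprecated SCSI ioctl"


def lines_between(lines, start, end):
    """Yield lines whose HH:MM:SS time falls within [start, end], inclusive."""
    for line in lines:
        m = TIMESTAMP.match(line)
        # Zero-padded HH:MM:SS strings sort the same way as the times they encode.
        if m and start <= m.group(1) <= end:
            yield line


def count_smartctl_warnings(lines):
    """Count the deprecated-ioctl warnings logged for smartctl."""
    return sum(1 for line in lines if SMARTCTL_WARNING in line)
```

Comparing the time strings directly avoids date parsing, at the cost of breaking if the window crosses midnight.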

 

Thanks

tower-diagnostics-20150622-2143.zip


Archived

This topic is now archived and is closed to further replies.
