
Syslog reports disk errors, but no redball


cpthook


Hello forum

 

Last evening I received an e-mail alert from the "Fix Common Problems" tool that my Parity disk (sdc) had reported errors. Sure enough, the number 460 was in the error column next to Parity. I was doing nothing special when the syslog reported the errors, but I do have automated services running in the background, e.g., mover, appsharebackup, etc. I'm a bit puzzled by this because the Parity disk is still active and not red-balled. The tool reported the following: "If the disk has not been disabled, then unRaid has successfully rewritten the contents of the offending sectors back to the hard drive." Parity is 4TB in size.

 

What I've done so far:

 

- I immediately wanted to test the integrity of my Parity drive, so I ran a Parity check and went to sleep, only to wake up and find the unRAID GUI unresponsive. The server itself was still active: VMs and APPSW containers were running, and TOWER was responding to pings.

- At this point, I bounced my server from the command line and followed the recommendation from Fix Common Problems to run a SMART test. The extended test is currently stuck at 90% after more than 14 hours; however, a quick SMART test that I ran before the extended one reported no errors.

- I have yet to shut down my server and check the SATA cable connections, but I will once the SMART test is complete.

 

Questions:

 

1.  Should I wait for the extended SMART test to complete, and how long should I expect it to take on a 4TB WD drive?

2.  Should I suspect imminent drive failure at this point? I've ordered a replacement WD Red NAS drive just in case.

3.  What other areas should I check that may be causing these errors, in order to rule out drive failure? The drive has been in operation for less than one year. This point may be moot, but I still wanted to mention it.

 

Thanks in advance for any replies and assistance. I've included the latest unRAID diagnostic report for review.

 

 

 

towerii-diagnostics-20161024-1928.zip

TOWERII.PNG.655352ae0bb154eea2fb457a5c5cbb4d.PNG


The errors in the syslog indicate an interface issue. Check or replace both cables (power and SATA), and move the disk to a different SATA port if one is available.
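For anyone who wants to see which port the errors are attached to before swapping cables, here is a minimal Python sketch that counts error-like kernel lines per ATA port in a saved syslog. It assumes the usual libata prefixes (`ata1:`, `ata1.00:`); the keyword list and the function name `count_ata_errors` are my own choices for illustration, not anything unRAID provides, so adjust them to whatever your log actually contains.

```python
import re
from collections import Counter

# Matches kernel libata prefixes such as "ata1:" or "ata1.00:" and
# captures the port number.
ATA_PREFIX = re.compile(r"\bata(\d+)(?:\.\d+)?:")

# Assumed keywords that tend to appear in libata error reports;
# this list is illustrative, not exhaustive.
ERROR_HINTS = ("exception", "error", "failed", "reset")


def count_ata_errors(lines):
    """Return a Counter mapping ATA port number to the count of error-like lines."""
    counts = Counter()
    for line in lines:
        match = ATA_PREFIX.search(line)
        if match and any(hint in line.lower() for hint in ERROR_HINTS):
            counts[int(match.group(1))] += 1
    return counts
```

To use it, open the syslog file and pass the file object (or any list of lines) to `count_ata_errors`. If the errors follow the disk when it is moved to another port, the cable and port are less likely to be the cause.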

 

Thanks for the response. When running a Parity check, errors begin to show up on the Parity disk at around 62% and the speed slows to a crawl. This has happened at the exact same point in the Parity check on 3 separate occasions, the most recent being after I did what you suggested in your last post. Note that the updated log now references the errors on a different SATA port (ata1), and the error activity is almost identical.

 

SMART test run on the disk itself finally finished but reported no errors.

syslog_10_25.txt


"When running Parity check, errors begin to show up on Parity disc at around 62% and speed slows down to a crawl.  This has happened at the exact same time during the Parity Check process on 3 separate occasions"

 

Errors that recur at the same position on a different port point to a disk problem, despite the healthy SMART report. The next step would be to replace the disk with a spare to confirm.
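To put a rough number on where "around 62%" falls on the disk, here is a small Python sketch. It assumes the parity check percentage tracks the position on the 4TB parity disk, that 4TB means 4 × 10^12 bytes, and that logical sectors are 512 bytes. None of these are confirmed by the diagnostics, and the GUI percentage is rounded, so treat the result as a ballpark only.

```python
def position_at_percent(capacity_bytes, percent, sector_size=512):
    """Return (byte_offset, sector_index) for a whole-number percentage into a disk."""
    byte_offset = capacity_bytes * percent // 100
    return byte_offset, byte_offset // sector_size


# Assumed capacity: 4 TB in decimal bytes; the real drive will differ slightly.
CAPACITY = 4 * 10**12

offset, sector = position_at_percent(CAPACITY, 62)
print(f"about {offset / 10**12:.2f} TB in, near sector {sector:,}")
```

Under those assumptions the trouble starts roughly 2.48 TB into the disk. Errors that cluster around the same region on every pass are consistent with a weak area on the platters rather than a cable or port.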


Well, I sure hope my rambling about this issue will help others in the future. The replacement disk is installed and the Parity Sync completed with 0 errors. My server is back up and running, but I'm still concerned about why unRAID detected read errors on the old disk while its SMART report is clean. An RMA under these circumstances may not be easy, even though the warranty is good until 10/17.

 

Thanks again for the support johnnie.black. 


Archived

This topic is now archived and is closed to further replies.
