Help with SMART report



I have a drive in my array that has been disabled but seems to pass the SMART test. I'm a bit of a noob at this and am probably missing something, so any help would be appreciated.

 

A little backstory: about a week ago this drive was put in "disabled" status. I stopped the array, removed the drive from the array, started the array, then stopped it again and re-added the drive. That worked for a little while, but then the drive was disabled again. I have checked the cables to make sure they are all properly connected, but that hasn't stopped the drive from being disabled.

 

I thought I would check in the forums here before I went out and bought a new drive.

WDC_WD40EFRX-68N32N0_WD-WCC7K6VNN8P2-20191231-1728.txt


The disk has 6 errors logged:

 

Error 6 [5] occurred at disk power-on lifetime: 16712 hours (696 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

 

The extended (long) test also did not pass; it ended with "read failure":

 

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       60%     16754         3045234944

 

If the long test fails again, you should consider the disk bad.
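For anyone new to reading these logs, the fields that matter in the self-test row above (status and LBA of first error) can be pulled out with a short script. This is a minimal sketch, assuming the row layout shown in the post (smartctl's self-test log table); `parse_selftest_row` and `ROW_RE` are hypothetical names, not part of smartctl or Unraid.

```python
import re
from typing import Optional, TypedDict


class SelfTestResult(TypedDict):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: Optional[int]  # None when the log shows "-"


# Hypothetical pattern for one self-test log row, assuming the layout above.
# The description is assumed to end at the first run of two or more spaces.
ROW_RE = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<desc>.+?)\s{2,}"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\d+|-)\s*$"
)


def parse_selftest_row(line: str) -> Optional[SelfTestResult]:
    """Return the fields of one self-test log row, or None if it doesn't match."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    lba = match.group("lba")
    return SelfTestResult(
        num=int(match.group("num")),
        description=match.group("desc"),
        status=match.group("status"),
        remaining_pct=int(match.group("remaining")),
        lifetime_hours=int(match.group("hours")),
        first_error_lba=None if lba == "-" else int(lba),
    )


if __name__ == "__main__":
    # Row copied from the log quoted above.
    row = "# 1  Extended offline    Completed: read failure       60%     16754         3045234944"
    result = parse_selftest_row(row)
    print(result["status"], result["first_error_lba"])
```

Run against the row from this thread, it prints `Completed: read failure 3045234944`. A test that stopped with 60% remaining and reported a first-error LBA is what "did not pass" means here.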

  • 3 weeks later...

Hi folks, I have a similar scenario. I have been running this Unraid array for years and, apart from a few new hard disks replacing some older ones, have had no issues with it until today, when one of my drives became "red-balled" (a "Red X"). I have run a SMART diagnostic report, and it doesn't appear to show any issues with the key stats "Raw Read Error Rate" and "Reallocated Sector Count". Before rebuilding the drive from my parity (which is valid), I wanted someone more knowledgeable about reading SMART reports to check whether this drive is indeed developing issues and should be replaced, or whether it is reasonably safe to rebuild to the same disk.


SMART report attached.

 

Thanks in advance.

Best wishes

Mike.

WDC_WD20EARX-00PASB0_WD-WCAZAD181918-20200120-1457.txt


Unfortunately you rebooted, so there is nothing in the syslog to tell us what may have caused the disk to be disabled.

 

Your Unraid version and some of your plugins are not up-to-date. Any reason why?

 

Also, many of your disks are still formatted as ReiserFS.

 

SMART for that disk looks OK, but it is pretty old. Safest approach would be to rebuild to a new disk and keep the original as a backup in case there is a problem with the rebuild.

 

I didn't check SMART for the other disks. Do any of them have SMART warnings on the Dashboard?

 

If you really want to rebuild to that same disk then you should run an extended SMART test on it first.

 

 


Hi there - thanks for the quick reply.

 

No real reason why my plugins weren't up to date; I rarely use them. I have now updated them to the latest versions.

I do, however, keep track of my Unraid version. When I pressed the little "i" icon next to the version (which is how I have updated it in the past), it suggested this was the latest version. Having forced an operating system check, though, I notice that, as you suggest, v6.8.1 is available, so I will update now. Thanks for the tip.

 

I appreciate the suggestion about the drive's age. I will run a full extended SMART test before considering rebuilding onto it, but will also look at purchasing a replacement disk.

 

None of the other drives show any SMART errors.

 

Quick question: if I were to replace that 2TB drive with a 6TB drive (my parity drive is 6TB), will it happily rebuild onto the larger drive, or does the replacement have to be the same size as the disk it replaces?

 

Thanks for your help.

Cheers,

Mike.

  • 3 weeks later...

Hi guys, apologies for returning with the same scenario again. After running an extended SMART test (which found no errors), checking the cables and rebuilding the drive from parity, I had no issues for the last few weeks, but now the same disk has red-balled again (a red X these days). The array had been off for the last few days, and turning it on disabled Disk 5; hopefully my diagnostics file captures why UnRAID deactivated it. I have attached it here. I would appreciate thoughts on what could be going on with this disk, as extensive testing revealed no issues. If it is ultimately the cause, I'm happy to replace it with a brand new one, unless of course it turns out to be the cables; they look absolutely fine to the naked eye, but again I would welcome feedback.

Thanks in advance.
Cheers

Mike.

bargetower-diagnostics-20200209-2126.zip


Disk dropped offline:

 

Feb  9 21:21:08 BargeTower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Feb  9 21:21:08 BargeTower kernel: ata6.00: link offline, clearing class 1 to NONE
Feb  9 21:21:09 BargeTower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Feb  9 21:21:09 BargeTower kernel: ata6.00: link offline, clearing class 1 to NONE
Feb  9 21:21:10 BargeTower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Feb  9 21:21:10 BargeTower kernel: ata6.00: disabled
Feb  9 21:21:10 BargeTower kernel: ata6.00: detaching (SCSI 6:0:0:0)

 

Check connections and post new diags.


Hi guys, thanks for the above. I have tried 3 different SATA cables (none of them previously used) on that particular disk and, interestingly, it is now showing in the unRAID array as "Not Installed" (new diagnostics attached). Previously it was just taken out of service, but now it isn't even being detected. This has not happened to me before.

I'm trying to figure out whether the hard disk has just "died" or whether the power cable is the problem. This single SATA-style power cable from the motherboard has two connectors: one at the end of the cable going into another hard disk, and the one for this hard disk "halfway" down the cable (I'm sure you know what I'm describing). Since the other disk on the same cable works, that suggests to me the cable itself must be okay (or perhaps that's not right).

 

Is it most likely that the hard disk has just "died"? I just didn't think this is how a hard disk would behave. I have another hard disk that I took out of service in my unRAID array and that is sitting on a shelf. Is it worth plugging that one in (the data on it is old and could simply be overwritten) to see whether this hard disk is the problem, rather than replacing it with a new drive, or is there something more sinister going on here?

I find it hard to believe that only this one port on the PCI card in my motherboard (added for more SATA ports) would stop working while the others are fine, so I would welcome insight into what might be going on here. On the face of it, it does sound like the hard disk has completely failed, but I'm not ruling out power either.

Best wishes,
Mike.

bargetower-diagnostics-20200212-1657.zip


Disk isn't initializing correctly:

Feb 12 16:54:34 BargeTower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Feb 12 16:54:38 BargeTower kernel: ata6: limiting SATA link speed to 1.5 Gbps
Feb 12 16:54:39 BargeTower kernel: ata6: SATA link down (SStatus 0 SControl 310)
Feb 12 16:54:42 BargeTower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Feb 12 16:55:03 BargeTower kernel: ata6: limiting SATA link speed to 1.5 Gbps
Feb 12 16:55:04 BargeTower kernel: ata6: SATA link down (SStatus 0 SControl 310)
Feb 12 16:55:07 BargeTower kernel: ata6: SATA link down (SStatus 0 SControl 300)
Feb 12 16:55:07 BargeTower kernel: ata6: limiting SATA link speed to 1.5 Gbps
Feb 12 16:55:09 BargeTower kernel: ata6: SATA link down (SStatus 0 SControl 310)

 

Try swapping this disk's SATA cable with another disk's (swap them at the disks only), so that it also uses a different SATA port. If it still behaves the same, the disk is likely dead.
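The pattern in both syslog excerpts above is the same: repeated `SATA link down` messages on one port (`ata6`), ending with the device being `disabled` in the first case and with the kernel repeatedly `limiting SATA link speed to 1.5 Gbps` in the second. A minimal sketch for spotting this in a saved syslog, assuming lines in exactly the format quoted in this thread; `summarize_ata_events` and `ATA_RE` are hypothetical names, not part of Unraid or its diagnostics.

```python
import re
from collections import defaultdict

# Hypothetical pattern for kernel lines such as
# "Feb  9 21:21:10 BargeTower kernel: ata6.00: disabled".
# Captures the port ("ata6") and drops the device suffix (".00").
ATA_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")


def summarize_ata_events(lines):
    """Count link-down and speed-limit messages per ATA port and note disables."""
    summary = defaultdict(lambda: {"link_down": 0, "speed_limited": 0, "disabled": False})
    for line in lines:
        match = ATA_RE.search(line)
        if match is None:
            continue  # not an ataN kernel message
        port, message = match.groups()
        message = message.strip()
        if message.startswith("SATA link down"):
            summary[port]["link_down"] += 1
        elif message.startswith("limiting SATA link speed"):
            summary[port]["speed_limited"] += 1
        elif message == "disabled":
            summary[port]["disabled"] = True
    return dict(summary)


if __name__ == "__main__":
    # A few lines copied from the Feb 9 and Feb 12 excerpts in this thread.
    excerpt = [
        "Feb  9 21:21:10 BargeTower kernel: ata6: SATA link down (SStatus 0 SControl 300)",
        "Feb  9 21:21:10 BargeTower kernel: ata6.00: disabled",
        "Feb 12 16:54:38 BargeTower kernel: ata6: limiting SATA link speed to 1.5 Gbps",
        "Feb 12 16:54:39 BargeTower kernel: ata6: SATA link down (SStatus 0 SControl 310)",
    ]
    for port, counts in summarize_ata_events(excerpt).items():
        print(port, counts)
```

Grouping by port rather than by disk name is deliberate: after swapping cables as suggested above, a fault that follows the disk shows up on a different `ataN` number, while a fault that stays on the same `ataN` points at the port or cable.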


Thanks for that! I did use three completely separate SATA cables (two of which have never been used; the third has been used but works correctly), so I'm convinced the cables are fine. I do think testing a different SATA port makes sense.

Just a quick check on that front: if I take the SATA cable that is currently connected to one of my other disks and try it on this one (just to see whether the disk is indeed dead), will that damage my array? Or, when I plug the working disk back in afterwards (having tested the port/cable), will the array simply resume its current state? I ask because I don't want to inadvertently cause another issue by testing this one.

Put another way: if I unplug Disk 4 while Disk 5 is failing, connect the SATA cable I know works to Disk 5, and Disk 5 then works again, will the array become problematic because Disk 4 is now recorded as "Not Connected"? Or will my array be okay when I reconnect the cables the way they were, conclude the disk is dead, and replace it?

Just being super conservative, appreciate your advice!
Cheers

Mike

MikeyJeff said:

Put another way, if I unplug "Disk 4" with Disk 5 currently failing, and have the SATA cable that I know works into Disk 5, and then Disk 5 then works again, will the array become problematic because now Disk 4 has been recorded as "Not Connected" or will my array be okay, I then reconnect the cables the way they were, deduce the disk is now dead and replace the disk.

If another disk goes missing, the array won't start and will warn you about it; in that case, just power down and swap the cables back.


Hi folks, thanks for your assistance with the above. After replacing the SATA cable and checking the power cable, it turns out the hard disk had simply died: as soon as I put another hard disk in, the array recognised it immediately. The replacement is another WDC 2TB drive that I took out of service when I recently got a new 6TB WDC Red NAS drive, so it will suffice for now, but I'd like the brain trust's view on whether it is in good condition. The SMART report passes according to UnRAID, but the reallocated sector count bothers me. It is only 3 and isn't increasing, but I wanted to know whether anything else is amiss with this drive and I should consider replacing it, or whether it would be appropriate to keep it and simply keep an eye on SMART attributes 5, 187, 197, 198 and 199.
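One way to "keep an eye on" those attributes is to record their raw values once as a baseline and flag any that grow later. This is a minimal sketch, assuming the attribute table layout printed by `smartctl -A` (columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); `parse_attributes`, `increased_since` and the sample rows are hypothetical, not taken from the attached report.

```python
from typing import Dict, Iterable, Tuple

# SMART attribute IDs mentioned in the post.
WATCHED_IDS = (5, 187, 197, 198, 199)


def parse_attributes(lines: Iterable[str]) -> Dict[int, int]:
    """Map attribute ID -> leading integer of RAW_VALUE for each table row."""
    raw_values: Dict[int, int] = {}
    for line in lines:
        parts = line.split(maxsplit=9)
        # Data rows start with a numeric ID and have all 10 columns;
        # the header ("ID# ...") and other text are skipped.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        tokens = parts[9].split()
        if tokens and tokens[0].isdigit():
            raw_values[int(parts[0])] = int(tokens[0])
    return raw_values


def increased_since(baseline: Dict[int, int], current: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """Return watched attributes whose raw value grew: {id: (old, new)}."""
    return {
        attr: (baseline.get(attr, 0), current[attr])
        for attr in WATCHED_IDS
        if attr in current and current[attr] > baseline.get(attr, 0)
    }


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    baseline = parse_attributes([
        "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE",
        "  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       3",
        "197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0",
    ])
    later = parse_attributes([
        "  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       3",
        "197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2",
    ])
    print(increased_since(baseline, later))
```

Comparing against a baseline, rather than alerting on any non-zero value, matches the approach described in the post: a reallocated count of 3 that stays at 3 is not flagged, while any growth in a watched attribute is.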

Thanks again folks, SMART report attached for the replacement drive.

Cheers,
Mike

bargetower-smart-20200218-0837.zip

