
Slow Parity Check


blu3_v2


First a bit of background.

 

Last week our home suffered a power failure. Unfortunately my APC BE700G-AZ UPS failed, causing my unRAID server to undergo a 'dirty' shutdown.

 

After purchasing a new UPS I restarted the server, which triggered an automatic parity check. This check completed successfully in around 7-8 hours (approx. 130 MB/sec on a 4 TB parity drive), which is normal for my system, so I thought everything was fine. The server has remained powered on ever since without issue.

 

However, my scheduled monthly parity check, which runs on the 1st of the month, took twice as long at 14+ hours (averaging 75 MB/sec). I was not accessing or using the server in any way either time, and the system has been up the whole time in between.
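
As a rough sanity check on those figures, the expected duration is just the parity drive's capacity divided by the average speed. A minimal Python sketch, assuming a nominal 4 TB (4 x 10^12 bytes) parity drive and decimal megabytes; the function name is only illustrative:

```python
def parity_check_hours(capacity_bytes: int, avg_mb_per_sec: float) -> float:
    """Estimate parity check duration in hours at a given average speed."""
    seconds = capacity_bytes / (avg_mb_per_sec * 1_000_000)
    return seconds / 3600


CAPACITY = 4 * 10**12  # nominal 4 TB parity drive (assumption)

for speed in (130, 75):
    hours = parity_check_hours(CAPACITY, speed)
    print(f"{speed} MB/sec -> {hours:.1f} hours")
```

At 75 MB/sec this comes out at just under 15 hours, which matches the 14+ hours I saw, so the reported average speed and the elapsed time agree with each other.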

 

I have attached diagnostics to this post. Could someone please take a look and offer any advice? It seems strange that there would be such a difference between parity checks with no other variables changed.

 

Thanks in advance

watchtower-diagnostics-20150801-1751.zip

Link to comment

A drive has failed and must be replaced:

Device Model:     HGST HDS724040ALE640
Serial Number:    PK1311PBGJKHRX
LU WWN Device Id: 5 000cca 23dc787a8
Firmware Version: MJAOA580
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Aug  1 17:51:19 2015 AEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.
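
For anyone who wants to check this themselves, the overall-health line can be pulled out of a saved SMART report with a few lines of Python. This is only a sketch: it parses text in the format shown above, and the helper name `overall_health` is made up for illustration.

```python
import re
from typing import Optional


def overall_health(report: str) -> Optional[str]:
    """Return the overall-health verdict (e.g. 'PASSED' or 'FAILED!'), or None if absent."""
    match = re.search(
        r"^SMART overall-health self-assessment test result:\s*(\S+)",
        report,
        flags=re.MULTILINE,
    )
    return match.group(1) if match else None


# Excerpt copied from the report above.
sample = """\
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
"""

verdict = overall_health(sample)
if verdict != "PASSED":
    print(f"WARNING: drive reports {verdict}")
```

Anything other than PASSED on that line is worth treating as a warning.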

 

What does unRAID Main show for this drive? This drive should still be under warranty.

Link to comment


 

Thanks for picking that up.

 

unRAID still shows that drive as 'green-balled' on the Main page and says its SMART status is fine, with a thumbs-up on the Dashboard, which seems a bit odd.

 

This is the third drive in the same slot in my N36L Microserver to die, so I am beginning to suspect there may be a PSU issue. I shut down unRAID after posting my diagnostics last night and checked that the power/SATA cables were connected and the drive was fully seated in the slot, and everything was fine.

 

I will see what happens in the next few days. The drive is an RMA replacement, so it is definitely still under warranty, being only a few months old.

 

 

Link to comment

What version of unRAID are you running?

If there is a FAILING NOW attribute, that should surely send off a notification.

 

Currently running 6.0.0.

 

I have just run the diagnostics script again (attached), and this time the disk in question is listed as Pre-Fail in the SMART report. This is after shutting down the server and checking the cabling etc.

 

Are there any other checks I need to do on the system?

watchtower-diagnostics-20150802-1448.zip

Link to comment

Maybe bonienl has some input on this. I would have expected the front end to reveal a FAILING NOW attribute.

If you don't add that particular SMART attribute to the ones which are monitored (by default: 5, 187, 188, 197, 198) then you don't get notifications on it. But you're right that if ANY attribute is failing then a notification should be sent.
Link to comment


 

Currently no notification is sent when an attribute goes into the FAILED state; the only way to see it is by generating a SMART report.
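
To illustrate the gap, here is a rough Python sketch of the rule discussed above: flag any attribute whose WHEN_FAILED column reads FAILING_NOW, whether or not its ID is in the monitored list. It assumes the usual column layout of the `smartctl -A` attribute table. The sample rows are invented for illustration and are not taken from the attached diagnostics, and the function names are hypothetical, not part of unRAID.

```python
MONITORED_IDS = {5, 187, 188, 197, 198}  # defaults quoted in this thread

# Invented sample rows in smartctl -A layout (NOT from the attached diagnostics).
SAMPLE_TABLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0007   016   016   024    Pre-fail  Always   FAILING_NOW 1234
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
"""


def parse_attributes(table: str) -> list:
    """Parse attribute rows into dicts; the header and other non-data lines are skipped."""
    rows = []
    for line in table.splitlines():
        fields = line.split(None, 9)  # RAW_VALUE may itself contain spaces
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "when_failed": fields[8],
            "raw": fields[9],
        })
    return rows


def failing_attributes(rows: list) -> list:
    """Return every attribute currently failing, monitored or not."""
    return [row for row in rows if row["when_failed"] == "FAILING_NOW"]


for row in failing_attributes(parse_attributes(SAMPLE_TABLE)):
    status = "monitored" if row["id"] in MONITORED_IDS else "not monitored"
    print(f"Attribute {row['id']} ({row['name']}) is FAILING_NOW ({status})")
```

In the invented sample, the failing attribute is not one of the five default IDs, which is exactly the case where nothing would be sent today.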

 

This item has been added to my todo list together with the enhancements proposed by RobJ; now I only need to find the time.

Link to comment

When this appears, it is cause for alarm.

 

SMART overall-health self-assessment test result: FAILED!

Drive failure expected in less than 24 hours. SAVE ALL DATA.

 

In my experience, SMART doesn't catch every situation, but when it warns you like this, it's going to be a near-death experience.

Granted, a drive could last hours, days, or months in this state, but it could also go in the next few minutes.

If the firmware's own diagnostics say something is wrong, you'd better believe it! LOL!

Link to comment

Check the raw SMART report. If anything says FAILING NOW, or this message shows up:

 

SMART overall-health self-assessment test result: FAILED!

Drive failure expected in less than 24 hours. SAVE ALL DATA.

 

then RMA the drive immediately.

Frankly, the prior report already stated this, so I would prepare to replace it.

If I had an onsite spare, I would be preclearing it now.

There really could be an issue with the PSU, which may be why a spin-up error was posted.

Do you have any additional internal drives, or just the 4 drives for the Microserver?

Link to comment


 

My server currently has 4x HGST 4 TB drives as the array and 1x HGST 500 GB 2.5" laptop drive as a cache drive.

 

Others have installed more drives without issues using the included PSU, so it should have enough power. Plus, it is a little strange that only one slot is suffering errors, as all of the slots are supplied off the same rail and set of Molex cables.

Link to comment

Archived

This topic is now archived and is closed to further replies.
