
Disk disabled, content emulated (DRDY ERR ICRC ABRT)


stealth82


Hello, I think this could be my first disk dying of old age, but I wanted some confirmation from the experts here.

Today I was manually copying data with mc from my cache drive to /mnt/disk2 when a notification promptly came in on my iPhone. It was unRAID telling me something was wrong...

 

Now disk2 is emulated, and I tried to check the SMART results to see what happened. Problem is... it says the disk is unavailable and can't be spun up for diagnostics.

 

I checked the syslog, which I attached, and looked up the two errors I saw there: DRDY ERR and ICRC ABRT.

 

They should be, respectively:

 

Drive media issue #1: These are almost always associated with bad sectors.

Drive media issue #2: a pretty good indicator of a poor quality SATA cable
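Errors like these can be filtered out of a syslog with a quick grep. A minimal sketch, with illustrative stand-in lines rather than lines copied from the attached log (the real file is in the diagnostics zip):

```shell
# Sample ATA error lines of the kind the kernel logs for a failing
# link or drive; NOT copied from the actual syslog in this thread.
cat > /tmp/syslog_sample <<'EOF'
kernel: ata5.00: failed command: WRITE FPDMA QUEUED
kernel: ata5.00: status: { DRDY ERR }
kernel: ata5.00: error: { ICRC ABRT }
kernel: mdcmd (59): spindown 2
EOF
# Pull out just the two signatures discussed above.
grep -E 'DRDY ERR|ICRC ABRT' /tmp/syslog_sample
```

Run against the real syslog, this shows at a glance how many commands failed and on which ata port, which helps tell a one-off cable hiccup from a steady stream of errors.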

 

Now the last one made me think. A few weeks ago I bought a Supermicro AOC-SASLP-MV8 controller and two Mini-SAS (SFF-8087) to 4x SATA forward breakout cables. Until a few moments ago, though, I had no issues whatsoever.

 

Is it possible that just one sub-cable out of 4 is bad?

Should I be worried about the cabling, or could the cause simply be the disk's old age?

I say old age because it has shouldered 4y, 6m, 9d, 14h of service so far (I read that stat from its sibling; I have 2 disks bought in the same period).

 

A new 4TB drive is on the way now, and I will have to go through a parity swap procedure when it arrives. Any suggestions before getting into that, or should I just give up on the old disk?

tower-diagnostics-20151130-1731.zip

Link to comment

Are you sure? If it reads "A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.", is that a sign it looks OK?

 

WDC_WD20EARS-00MVWB0_WD-WMAZA0747093-20151130-1731.txt

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.13-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               /1:0:1:0
Product:              
User Capacity:        600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
Physical block size:  1549687900 bytes
Lowest aligned LBA:   14896
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
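For anyone hitting the same message: this is the retry smartmontools is suggesting, spelled out. `/dev/sdX` is a placeholder for the failed disk's device node, and on a SAS HBA like the AOC-SASLP-MV8 forcing the SAT passthrough type may also be needed (the garbage "Vendor: /1:0:1:0" output suggests smartctl fell back to treating the drive as SCSI). The commands are only printed here, not executed:

```shell
# The '-T permissive' flag tells smartctl to continue past a failed
# mandatory SMART command; '-d sat' forces the SCSI-to-ATA translation
# layer, which some HBAs need before SMART data comes through at all.
printf '%s\n' \
  'smartctl -a -T permissive /dev/sdX' \
  'smartctl -a -d sat -T permissive /dev/sdX'
```

If both still return nonsense capacity and block-size values like the ones above, the drive itself is most likely not answering sanely.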

Link to comment


Sorry, my bad. I was looking at the SMART report for disk1. Replace the drive.

 

 

Link to comment

OK, I think the worst case scenario has just occurred.

I wanted to take the disk out, but since I had bought a SATA cage and rewired everything, I wanted to give the disk another try.

The disk came back online and reported no errors. I guess a wire really had come loose - it wasn't the disk; I can't come up with any other explanation.

 

Anyway I put it back into the array and unRAID started rebuilding it.

Some hours into the rebuild, the parity drive started throwing errors (843 in the errors column):

 

ID#	Attribute	Flag	Value	Worst	Thresh	Type	Updated	When failed	Raw value
187	Reported uncorrect	0x0032	017	017	000	Old age	Always	Never	83
197	Current pending sector	0x0012	100	099	000	Old age	Always	Never	128
198	Offline uncorrectable	0x0010	100	099	000	Old age	Offline	Never	128
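Pulled out mechanically, those three rows tell the story. A rough sketch, with the rows pasted into a here-doc as reported above: any nonzero raw value on attributes 187/197/198 means the parity drive has sectors it could not read back.

```shell
# $1 is the attribute ID, $NF the raw value (last whitespace field).
# Nonzero raw counts on 187/197/198 indicate unreadable sectors.
awk '$NF + 0 > 0 { printf "SMART %s: raw value %s (nonzero = bad sectors)\n", $1, $NF }' <<'EOF'
187 Reported uncorrect 0x0032 017 017 000 Old age Always Never 83
197 Current pending sector 0x0032 100 099 000 Old age Always Never 128
198 Offline uncorrectable 0x0010 100 099 000 Old age Offline Never 128
EOF
# flags all three attributes, e.g. "SMART 187: raw value 83 ..."
```

During a rebuild, every one of those unreadable parity sectors translates into a stripe that cannot be reconstructed correctly on the new disk.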

 

The disk that is being rebuilt is toast - its data can't be trusted, so I'm toast too. Am I right?  :'(

Link to comment


That disk should be replaced. Most likely the parity issues are a connection problem caused by your rewiring, since its SMART looked good in your diagnostics. Check your connections, remove the bad drive, and reboot. You should be able to see whether the data is being emulated. If so, you will be able to rebuild onto a new disk.
Link to comment

Unfortunately, I don't think so. The parity drive has always been attached directly to the motherboard with a cable I have no reason to doubt. The connection was, and is, solid.

 

That drive, though, had given me this very same error in the past. After that I put it under observation and ran a couple of preclears on it, and it seemed fine (I think somewhere under 100 reallocated sectors, but no further growth in pending sectors). I guess the best thing would have been to trash it rather than risk it... but I didn't have any disk to spare at the time.

 

Is there any way I can find out which sectors of the rebuilt drive were affected?

 

What I would like to do, if I can isolate the problem, is replace the parity drive with a new disk, but what you are saying makes me think I could try to rebuild again from the "faulty" parity drive. I really don't know what to do now.

Link to comment

I attached a new diagnostic file.

 

I'd really love to know if there's any way to track down whether the rebuild was affected - I think it was - and what data, if any, the bad sectors "landed" on. I say "if any" because the rebuilt disk is 75% full and the errors started appearing in the last 25% of the rebuild, I think. I don't know whether that might mean there were no files there, just empty space being rebuilt.

 

Any insight?

 

P.S. Why does unRAID consider the rebuilt disk OK, considering it knows there were read errors from the parity?

tower-diagnostics-20151203-1024.zip

Link to comment

I don't know much about testing btrfs disks for corruption or fixing them. You can try searching, but I don't think there is much documented in our forum or wiki. Maybe out in the wild wild web, where btrfs is more widely used, there is some documentation you could google.
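For anyone landing here with the same question: a minimal sketch of what a btrfs corruption check could look like, assuming disk2 is indeed btrfs-formatted and mounted at /mnt/disk2 as in this thread. `btrfs scrub` verifies all data and metadata against btrfs's own checksums, so it can report which files are corrupt. The commands are only echoed here, so nothing runs against a live array by accident:

```shell
# Dry run: print the commands rather than execute them.
# 'scrub start -B' runs in the foreground and prints a summary when done;
# 'scrub status' shows progress/results; 'device stats' shows cumulative
# read/write/corruption error counters for the device.
for cmd in \
  "btrfs scrub start -B /mnt/disk2" \
  "btrfs scrub status /mnt/disk2" \
  "btrfs device stats /mnt/disk2"
do
  echo "would run: $cmd"
done
```

If the scrub reports uncorrectable errors, the kernel log (dmesg) normally names the affected files, which would answer the "what data did the bad sectors land on" question above.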

Link to comment

Well, I don't know how to interpret this, but...

 

Inspired by this thread, I just gave a couple of more tries to the issue.

 

I ran a parity sync without corrections, and it finished just a few minutes ago. The reads/writes columns at the end showed 1883 errors, but apart from that, no sync errors?!? How should I interpret the 0 sync errors count? I don't know.

 

Anyway, I'm burying all this. A new parity disk is in the array now and the sync is in progress.

 

tower-diagnostics-20151206-1534.zip

Link to comment

Archived

This topic is now archived and is closed to further replies.
