[SOLVED] Pending Sector and Uncorrectable errors

FreeMan · July 10, 2018

I received notifications this morning of 8 Pending Sectors and 8 Offline Uncorrectable errors on an... elderly... (4.75 power-on years) 2TB Seagate Barracuda. I just got home, ran a short SMART test (attached) and have started a long SMART test. Reallocated sector count is still at 0.
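For anyone following along, the three counters mentioned above (reallocated, pending, offline uncorrectable) can be pulled out of a SMART attribute table with a few lines of Python. This is only a sketch: it assumes the text looks like the attribute table printed by `smartctl -A`, and the sample rows below are illustrative, using the counts from this post rather than the attached report.

```python
import re

# Standard SMART attribute IDs that are commonly watched for disk health.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

def parse_smart_attributes(text):
    """Return {attribute_id: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in WATCHED:
                # The raw value is the 10th column; keep its leading integer.
                match = re.match(r"\d+", fields[9])
                if match:
                    found[attr_id] = int(match.group())
    return found

# Illustrative sample only (not taken from the attached SMART report).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

attrs = parse_smart_attributes(SAMPLE)
for attr_id, raw in sorted(attrs.items()):
    print(f"{WATCHED[attr_id]}: {raw}")
```

A non-zero raw value for 197 or 198 is the kind of reading that triggered the notifications described here.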

I was planning on upgrading to a new 8TB parity and moving the current 4TB parity drive into the array as a replacement for an equally (or more) elderly 1TB drive. Should I plan on replacing this one first instead?

Extended SMART results will be posted as soon as they're available.

nas-smart-20180710-1845.zip

Edited July 17, 2018 by FreeMan
resolved

FreeMan · July 11, 2018

OK, SMART report attached. It shows errors, but I haven't reviewed it yet; I'm doing this via TeamViewer from the office...

I got an additional warning this morning that there was an error, then a notification that it was recovered.

Extended SMART and fresh diagnostics attached.

Of note, the power went out at the house while we were away on vacation. The UPS did its thing and shut the server down properly. It remained powered off for about a week while we were gone, and these issues cropped up about 48 hours after turning it back on.

nas-diagnostics-20180711-1012.zip

nas-smart-20180711-0841.zip

Edited July 14, 2018 by FreeMan
spelling/grammar

JorgeB · July 11, 2018

Extended SMART test failed = disk needs to be replaced

FreeMan · July 11, 2018

Extended SMART test failed = disk needs to be replaced

Thanks, Johnnie.

It was bound to happen eventually. I thought I was going to get a jump on it and start replacing the oldest drives before that happened...

Guess I got started a bit too late. Thank goodness for CrashPlan - that disk has all of our family photos on it, and I just got an email from CP this morning that backups are complete.

FreeMan · July 11, 2018

That was odd...

I just got 2 notifications:

@14:56

Notice [NAS] - array turned good

{numbers}

Array has 0 disks with read errors

normal

@14:57

Notice [NAS] - Parity check started

{numbers}

Size 4TB

Warning

I was going to ask about running a non-correcting parity check, just to ensure that everything was good. (My last one ran with no issues on 1 June; the server was down on 1 July.) I don't think I want to run a correcting parity check because disk 7 is known to be failing, and I don't want a read issue there to cause a change to parity and corrupt a (believed/known) good parity in case of complete disk failure.

Does running a non-correcting parity check make sense at this point, or should I just pop down to the shop, pick up a new 8TB drive, and do a parity/data-drive switcharoo?

FreeMan · July 11, 2018

I've just ordered a new WD MyBook external drive from Newegg and I'll stop by the warehouse to pick it up tomorrow. (The bonus of having a warehouse in the city where you live. The downside - I have to pay sales tax.)

Since the preclear plugin seems to be out of fashion these days, what's recommended to test the drive for infant mortality prior to shucking and installing internally?

JonathanM · July 11, 2018

38 minutes ago, FreeMan said:

I've just ordered a new WD MyBook external drive from Newegg and I'll stop by the warehouse to pick it up tomorrow. (The bonus of having a warehouse in the city where you live. The downside - I have to pay sales tax.)

Since the preclear plugin seems to be out of fashion these days, what's recommended to test the drive for infant mortality prior to shucking and installing internally?

Plug it into a Windows box and run the wddiag suite on it. http://downloads.wdc.com/windlg/WinDlg_v1_31.zip

A sequence of writing zeroes and then a long SMART test would accomplish something very similar to preclear.

Just be sure to keep the drive cool, the externals don't have the best circulation so I'd put a fan blowing on it.
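On a Linux box, the zero-fill plus long SMART test sequence might look roughly like the commands built below. This is a hedged sketch, not a tested burn-in tool: the helper name `burn_in_commands` and the `/dev/sdX` path are placeholders, the script only prints the commands rather than running them, and the `dd` step destroys everything on whatever device it is pointed at.

```python
def burn_in_commands(device):
    """Build (but do not run) the commands for a zero-fill and long SMART test.

    `device` is a placeholder such as /dev/sdX; double-check it before
    running anything, because the dd step overwrites the whole disk.
    """
    if not device.startswith("/dev/"):
        raise ValueError("expected a block device path such as /dev/sdX")
    return [
        # 1. Write zeroes across the entire disk (destructive).
        ["dd", "if=/dev/zero", f"of={device}", "bs=1M", "status=progress"],
        # 2. Start the extended self-test; it runs in the background on the drive.
        ["smartctl", "-t", "long", device],
        # 3. Once the test has finished, review the full SMART report.
        ["smartctl", "-a", device],
    ]

for argv in burn_in_commands("/dev/sdX"):
    print(" ".join(argv))
```

Some USB enclosures do not pass SMART commands through cleanly, so the `smartctl` steps may need extra options or may not work until the drive is connected directly.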

FreeMan · July 11, 2018

Thanks, @jonathanm. File downloaded & ready to fire up.

JorgeB · July 12, 2018

11 hours ago, FreeMan said:

I don't think I want to run a correcting parity check because disk 7 is known to be failing, and I don't want a read issue there to cause a change to parity and corrupt a (believed/known) good parity in case of complete disk failure.

Definitely don't, it might corrupt parity.
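To illustrate why, here is a toy model of single (XOR) parity in Python. It is a simplified sketch, not how the array is implemented: the "disks" are two-byte strings, and it assumes the failing disk silently returns wrong data instead of reporting a read error.

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (single-parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(missing_index, blocks, parity):
    """Reconstruct one block from the other blocks plus parity."""
    others = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_parity(others + [parity])

# Three toy data disks and the parity computed while all were healthy.
disks = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
good_parity = xor_parity(disks)

# During a check, the failing disk (index 2) returns one wrong byte.
misread = [disks[0], disks[1], b"\x0f\xf1"]

# A correcting check rewrites parity to match whatever was read.
corrected_parity = xor_parity(misread)

# If disk 2 then dies, only the untouched parity recovers the real data.
from_good = rebuild(2, disks, good_parity)
from_corrected = rebuild(2, disks, corrected_parity)

print(from_good == disks[2])
print(from_corrected == disks[2])
```

With the original parity the rebuilt block matches the real contents; with the "corrected" parity the rebuild reproduces the misread data instead.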

11 hours ago, FreeMan said:

Does running a non-correcting parity check make sense at this point, or should I just pop down to the shop, pick up a new 8TB drive, and do a parity/data-drive switcharoo?

Don't see the point in running a check, just replace the disk.

FreeMan · July 13, 2018

21 hours ago, johnnie.black said:

Definitely don't, it might corrupt parity.

Currently, I have a 4TB drive as my parity. I just picked up an 8TB drive, and once the initial testing to ensure there will be no infant mortality happens, it will become my new parity and the existing 4TB will replace the failing 2TB disk7.

Replacing parity with a larger drive is simple - shut down the array, put the larger disk in, assign the biggest disk to the parity slot & let it rebuild parity. However, I'm not 100% convinced this is a good idea, because it's possible I may have a bad file or two on the failing drive.

Replacing the data drive is simple, except that I don't have a 4TB or smaller drive to replace it with.

I do have just enough space on other drives to be able to scatter the data from the failing disk to other disks, remove the failing disk from the array, then swap parity & add the former parity back into the array.

What is the best procedure to do what I need to do?

remotevisitor · July 13, 2018

You want the "parity swap" procedure ... https://lime-technology.com/wiki/The_parity_swap_procedure

This procedure copies your existing parity to the new (larger) disk. When that is done it rebuilds the failed data drive onto the disk that was the old parity disk.
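The two phases described above can be sketched with the same kind of toy XOR model. This is only an illustration under simplifying assumptions: all "disks" are the same size (the real procedure exists precisely because the new parity disk is larger), and the function name `parity_swap` is made up for this example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_swap(data_disks, failed_index, old_parity):
    """Toy model of the two phases: copy parity, then rebuild the failed disk.

    Returns (new_parity, rebuilt_data). In the real procedure the rebuilt
    data ends up on the disk that used to hold parity.
    """
    # Phase 1: copy the existing parity onto the new parity disk.
    new_parity = bytes(old_parity)
    # Phase 2: rebuild the failed disk from the surviving disks plus parity.
    survivors = [d for i, d in enumerate(data_disks) if i != failed_index]
    rebuilt = xor_blocks(survivors + [new_parity])
    return new_parity, rebuilt

data = [b"ab", b"cd", b"ef"]
parity = xor_blocks(data)

new_parity, rebuilt = parity_swap(data, failed_index=1, old_parity=parity)
print(rebuilt)
```

Note that the failed disk's own contents are never read in phase 2, which is why the procedure works even when that disk is unreadable.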

Make sure you understand what this process involves, as you must do all the necessary steps. Ask for help from the experts here if you are unsure or need further clarification on the steps involved.

FreeMan · July 14, 2018

Thanks, @remotevisitor. I knew the instructions were out there somewhere.

The new drive has been tested and zeroed (as part of the testing) and parity swap is in progress at 13% complete on writing parity to the new drive.

Guess I should plan on another drive sooner rather than later to replace a few of the senior 1TB drives I've currently got.

FreeMan · July 15, 2018

The parity copy has completed, and it's now rebuilding data onto the former parity disk.

Is there any reason that there are no user shares available at this point?

Current diagnostics attached. Previous diagnostics (showing existing shares) are in posts above.

nas-diagnostics-20180715-1015.zip

Squid · July 15, 2018

If nothing else, you found a bug in CA Backup where it was still attempting to back up even though the destination didn't exist (because the array wasn't started). This had the effect of spamming your logs with all the errors about xattr.

Edited July 15, 2018 by Squid

FreeMan · July 15, 2018

11 minutes ago, Squid said:

If nothing else, you found a bug in CA Backup

Glad I could help!

Do I need to manually recreate all my shares, or are they likely to come back on their own after a reboot? I'm sitting at 56% complete on the drive rebuild right now, so I'm not rebooting any time soon, but if I need to manually recreate, I'll get started on it right now so the array is usable again.

Squid · July 15, 2018

I think you started the array in Maintenance mode, as I don't see the disks being mounted before CA started throwing all of those errors. I.e., you'll have to wait until the rebuild is done.

FreeMan · July 15, 2018

I saw the checkbox to do so, but I don't think I did, though obviously I could be wrong.

Is there any other way to check on that while the rebuild is happening?

Squid · July 15, 2018

OK. Dug through further, and you did start the array. But when it wasn't started, CA still attempted the backup. You still get the prize for the bug find. Unfortunately, CA basically spammed the crap out of your syslog because of this. Fixed.

FreeMan · July 15, 2018

Question still stands: Do I have to manually recreate the shares (in which case I'll figure everything out now), or will they recreate on a reboot (or just stop/start the array)?

2 hours ago, Squid said:

You still get the prize for the bug find.

What do I get? My very own, unautographed, digital image of @Squid in my thread?

FreeMan · July 17, 2018

For the record, the shares did get created when I rebooted the server after the disk rebuild.
