FreeMan Posted July 10, 2018 (edited) I received notifications this morning of 8 Pending Sectors and 8 Offline Uncorrectable errors on an... elderly... (4.75 power-on years) 2TB Seagate Barracuda. I just got home, ran a short SMART test (attached), and have started a long SMART test. Reallocated sector count is still at 0. I was planning on upgrading to a new 8TB parity drive and moving the current 4TB parity drive into the array as a replacement for an equally (or more) elderly 1TB drive. Should I plan on replacing this one first instead? Extended SMART results will be posted as soon as they're available. nas-smart-20180710-1845.zip Edited July 17, 2018 by FreeMan: resolved
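The three counters mentioned in this post usually correspond to SMART attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector), and 198 (Offline_Uncorrectable). The following is a minimal Python sketch of pulling those raw values out of a `smartctl -A` style attribute table. The `SAMPLE` text is hand-written for illustration using the counts quoted above (0, 8, 8); it is not taken from the attached report, and the helper names `raw_values` and `needs_attention` are hypothetical.

```python
# Illustrative only: SAMPLE is a hand-written table in the style of
# `smartctl -A` output, using the counts quoted in the post (0, 8, 8).
# It is NOT copied from the attached SMART report.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

# Attribute IDs for reallocated, pending, and offline-uncorrectable sectors.
WATCHED = {5, 197, 198}


def raw_values(table):
    """Return {attribute_name: raw_value} for the watched attribute IDs."""
    found = {}
    for line in table.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped.
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) in WATCHED:
            found[fields[1]] = int(fields[9])
    return found


def needs_attention(values):
    """Return the names of watched attributes whose raw value is non-zero."""
    return sorted(name for name, raw in values.items() if raw > 0)


values = raw_values(SAMPLE)
flagged = needs_attention(values)
print(values)
print(flagged)
```

A non-zero pending or uncorrectable count is only a warning sign on its own; as the replies in this thread note, a failed extended self-test is what settles the question here.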
FreeMan Posted July 11, 2018 (edited) OK, SMART report attached. It shows errors, but I've not reviewed it; I'm doing this via TeamViewer from the office... I got an additional warning this morning that there was an error, then a notification that it was recovered. Extended SMART and fresh diagnostics attached. Of note, the power went out at the house while we were gone on vacay. The UPS did its thing and shut the server down properly. It remained powered off for about a week while we were gone, and these issues cropped up about 48 hours after turning it back on. nas-diagnostics-20180711-1012.zip nas-smart-20180711-0841.zip Edited July 14, 2018 by FreeMan: spelling/grammar
JorgeB Posted July 11, 2018 Extended SMART test failed = disk needs to be replaced.
FreeMan Posted July 11, 2018 "Extended SMART test failed = disk needs to be replaced" Thanks, Johnnie. It was bound to happen eventually. I thought I was going to get a jump on it and start replacing the oldest drives before that happened... Guess I got started a bit too late. Thank goodness for CrashPlan - that disk has all of the family photos on it, and I just got an email from CP this morning that backups are complete.
FreeMan Posted July 11, 2018 That was odd... I just got 2 notifications:
@14:56 Notice [NAS] - array turned good {numbers} Array has 0 disks with read errors normal
@14:57 Notice [NAS] - Parity check started {numbers} Size 4TB Warning
I was going to ask about running a non-correcting parity check just to ensure that everything was good (my last one ran with no issues on 1 June; the server was down on 1 July). I don't think I want to run a correcting parity check because disk 7 is known to be failing, and I don't want a read issue there to cause a change to parity and corrupt a (believed/known) good parity in case of complete disk failure. Does running a non-correcting parity check make sense at this point, or should I just pop down to the shop, pick up a new 8TB drive, and do a parity/data-drive switcharoo?
FreeMan Posted July 11, 2018 I've just ordered a new WD MyBook external drive from Newegg and I'll stop by the warehouse to pick it up tomorrow. (The bonus of having a warehouse in the city where you live. The downside - I have to pay sales tax.) Since the preclear plugin seems to be out of fashion these days, what's recommended to test the drive for infant mortality prior to shucking and installing it internally?
JonathanM Posted July 11, 2018 38 minutes ago, FreeMan said: "Since the preclear plugin seems to be out of fashion these days, what's recommended to test the drive for infant mortality prior to shucking and installing internally?" Plug it in to a Windows box and run the wddiag suite on it. http://downloads.wdc.com/windlg/WinDlg_v1_31.zip A sequence of writing zeroes and then a long SMART test would accomplish something very similar to preclear. Just be sure to keep the drive cool; the externals don't have the best circulation, so I'd put a fan blowing on it.
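One way to picture the "write zeroes, then read everything back" half of that suggestion is a read-back check that a zeroed target really contains only zeros. The Python sketch below runs against an ordinary temporary file; pointing it at a real block device is an assumption that is not tested here and would need elevated privileges. The helper name `first_nonzero_offset` is hypothetical, and this is not what the WD diagnostic tool or the preclear plugin actually does internally.

```python
import os
import tempfile
from typing import Optional


def first_nonzero_offset(path: str, chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the byte offset of the first non-zero byte, or None if all zero."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None  # reached end of file without finding a non-zero byte
            if chunk.count(0) != len(chunk):
                # Something in this chunk is non-zero; find the exact byte.
                for i, b in enumerate(chunk):
                    if b != 0:
                        return offset + i
            offset += len(chunk)


# Demonstration on a small stand-in "disk image" (64 KiB of zeros).
with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "zeroed.img")
    with open(image, "wb") as f:
        f.write(bytes(64 * 1024))
    clean_result = first_nonzero_offset(image, chunk_size=4096)

    # Simulate a sector that did not retain the zeros that were written.
    with open(image, "r+b") as f:
        f.seek(12345)
        f.write(b"\x01")
    dirty_result = first_nonzero_offset(image, chunk_size=4096)

print(clean_result, dirty_result)
```

The full read pass is the useful part for burn-in: it forces every sector to be read once, which is roughly what the long SMART test then repeats from the drive's side.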
FreeMan Posted July 11, 2018 Thanks, @jonathanm. File downloaded & ready to fire up.
JorgeB Posted July 12, 2018 11 hours ago, FreeMan said: "I don't think I want to run a correcting parity check because disk 7 is known to be failing, and I don't want a read issue there to cause a change to parity and corrupt a (believed/known) good parity in case of complete disk failure." Definitely don't, it might corrupt parity. 11 hours ago, FreeMan said: "Does running a non-correcting parity check make sense at this point, or should I just pop down to the shop, pick up a new 8TB drive, and do a parity/data-drive switcharoo?" Don't see the point in running a check, just replace the disk.
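The risk described here can be shown with a toy model. The Python sketch below assumes single parity is a bytewise XOR across the data disks, which is how Unraid's first parity disk is commonly described; the disk contents, the three-disk layout, and the helper `xor_blocks` are all made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy array: two healthy data disks plus the failing "disk 7" (4 bytes each).
disk1 = bytes([0x11, 0x22, 0x33, 0x44])
disk2 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
disk7 = bytes([0x01, 0x02, 0x03, 0x04])
good_parity = xor_blocks([disk1, disk2, disk7])

# With good parity, a dead disk 7 can be rebuilt exactly.
rebuilt_from_good = xor_blocks([disk1, disk2, good_parity])

# A correcting check trusts whatever the data disks return. Suppose the
# failing disk returns one wrong byte during the check...
misread_disk7 = bytes([0x01, 0xFF, 0x03, 0x04])
corrected_parity = xor_blocks([disk1, disk2, misread_disk7])

# ...then parity is "corrected" to match the bad read, and a later rebuild
# faithfully reproduces the wrong byte instead of the original data.
rebuilt_from_bad = xor_blocks([disk1, disk2, corrected_parity])

print(rebuilt_from_good == disk7)   # rebuild from untouched parity
print(rebuilt_from_bad == disk7)    # rebuild after the "correction"
```

A non-correcting check only reports the mismatch and leaves parity alone, so the known-good parity is still available for the rebuild.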
FreeMan Posted July 13, 2018 21 hours ago, johnnie.black said: "Definitely don't, it might corrupt parity." Currently, I have a 4TB drive as my parity. I just picked up an 8TB drive, and once the initial testing to ensure there will be no infant mortality is done, it will become my new parity and the existing 4TB will replace the failing 2TB disk7. Replacing parity with a larger drive is simple - shut down the array, put the larger disk in, assign the biggest disk to the parity slot & let it rebuild parity. However, I'm not 100% convinced this is a good idea, because it's possible I may have a bad file or two on the failing drive. Replacing the data drive is simple, except that I don't have a 4TB or smaller drive to replace it with. I do have just enough space on other drives to be able to scatter the data from the failing disk to other disks, remove the failing disk from the array, then swap parity & add the former parity back into the array. What is the best procedure to do what I need to do?
remotevisitor Posted July 13, 2018 You want the "parity swap" procedure: https://lime-technology.com/wiki/The_parity_swap_procedure This procedure copies your existing parity to the new (larger) disk. When that is done, it rebuilds the failed data drive onto the disk that was the old parity disk. Make sure you understand what this process involves, as you must ensure that you do all the necessary steps. Ask for help from the experts here if you are unsure or need further clarification on the steps involved.
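The two phases described in this post (copy parity to the larger disk, then rebuild the failed disk onto the old parity disk) can be sketched as a toy model in Python. It assumes XOR single parity and assumes that space beyond the end of a smaller disk counts as zeros; both are simplifications for illustration, the byte values and one-byte-per-"TB" sizes are invented, and the helpers `pad` and `parity_of` are hypothetical. The wiki page linked above remains the reference for the real steps.

```python
def pad(block: bytes, size: int) -> bytes:
    """Treat space beyond the end of a smaller disk as zeros (assumption)."""
    return block + bytes(size - len(block))


def parity_of(disks, size: int) -> bytes:
    """XOR all disks together, each zero-padded to `size` bytes."""
    out = bytearray(size)
    for disk in disks:
        for i, b in enumerate(pad(disk, size)):
            out[i] ^= b
    return bytes(out)


# Toy scale: one byte stands in for one "TB".
disk_a = bytes([0x5A])                       # healthy 1 "TB" data disk
disk7 = bytes([0x0F, 0xF0])                  # failing 2 "TB" data disk
old_parity = parity_of([disk_a, disk7], 4)   # current 4 "TB" parity disk

# Phase 1: copy the existing parity onto the new 8 "TB" disk. No data disk
# reaches into the extra space, so that region is modelled as zeros.
new_parity = pad(old_parity, 8)

# Phase 2: rebuild disk 7 onto the old 4 "TB" parity disk. XOR is symmetric,
# so XOR-ing the surviving data with parity yields the missing disk.
rebuilt = parity_of([disk_a, new_parity[:4]], 4)

print(rebuilt[:2] == disk7)                            # original data recovered
print(parity_of([disk_a, rebuilt], 8) == new_parity)   # new array is consistent
```

Because parity is copied rather than recomputed, the failing disk is never read during phase 1, which is the property FreeMan was worried about in the previous post.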
FreeMan Posted July 14, 2018 Thanks, @remotevisitor. I knew the instructions were out there somewhere. The new drive has been tested and zeroed (as part of the testing), and the parity swap is in progress at 13% complete on writing parity to the new drive. Guess I should plan on another drive sooner rather than later to replace a few of the senior 1TB drives I've currently got.
FreeMan Posted July 15, 2018 The parity copy has completed, and it's now rebuilding data onto the former parity disk. Is there any reason that there are no user shares available at this point? Current diagnostics attached. Previous diagnostics (showing existing shares) are in posts above. nas-diagnostics-20180715-1015.zip
Squid Posted July 15, 2018 (edited) If nothing else, you found a bug in CA Backup where it was still attempting to back up even though the destination didn't exist (because the array wasn't started). This had the effect of spamming your logs with all the errors about xattr. Edited July 15, 2018 by Squid
FreeMan Posted July 15, 2018 11 minutes ago, Squid said: "If nothing else, you found a bug in CA Backup" Glad I could help! Do I need to manually recreate all my shares, or are they likely to come back on their own after a reboot? I'm sitting at 56% complete on the drive rebuild right now, so I'm not rebooting any time soon, but if I need to manually recreate them, I'll get started on it right now so the array is usable again.
Squid Posted July 15, 2018 I think you started the array in Maintenance mode, as I don't see the disks being mounted before CA started throwing all of those errors. I.e., you'll have to wait until the rebuild is done.
FreeMan Posted July 15, 2018 I saw the checkbox to do so, but I don't think I did, though obviously I could be wrong. Is there any other way to check on that while the rebuild is happening?
Squid Posted July 15, 2018 Ok. Dug through further, and you did start the array. But when it wasn't started, CA still attempted the backup. You still get the prize for the bug find. Unfortunately, CA basically spammed the crap out of your syslog because of this. Fixed.
FreeMan Posted July 15, 2018 Question still stands: Do I have to manually recreate the shares (in which case I'll figure everything out now), or will they be recreated on a reboot (or just a stop/start of the array)? 2 hours ago, Squid said: "You still get the prize for the bug find." What do I get? My very own, unautographed, digital image of @Squid in my thread?
FreeMan Posted July 17, 2018 For the record, the shares did get created when I rebooted the server after the disk rebuild.