Failing drive and ungodly long parity-sync



I just recently had to replace a dead flash drive for the server. Right now it's in parity-sync, but the estimated time has gone from about 1000 minutes to 316303.2 minutes. I noticed a drive has over 6k errors and needs to be replaced, but can I do this before the sync is done? Is there anything I can do, or do I have to wait it out?
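To put those two estimates in perspective, here is a quick conversion sketch. The function names are made up for illustration; the only inputs are the two minute counts quoted in the post.

```python
def minutes_to_hours(minutes):
    """Convert a duration in minutes to hours."""
    return minutes / 60


def minutes_to_days(minutes):
    """Convert a duration in minutes to days."""
    return minutes / (60 * 24)


initial_estimate = 1000.0      # minutes, first estimate quoted in the post
current_estimate = 316303.2    # minutes, estimate after the slowdown

print(f"initial: {minutes_to_hours(initial_estimate):.1f} hours")
print(f"current: {minutes_to_days(current_estimate):.1f} days")
print(f"slowdown: about {current_estimate / initial_estimate:.0f}x")
```

An estimate that grows from under a day to more than half a year usually means the sync is stalling on read errors rather than simply running slowly.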

 

UnRaid: Ver 4.7

Unraid 4.7.png

Link to comment

Moved to Legacy Support.

 

How is it that you are just now coming to the forum with your first post and it's about a very very old version of unRAID? Many of us haven't worked with that version, and I just barely remember it myself.

 

Stop the parity sync until we can get a better idea of what the problem is.

 

Unfortunately, getting useful diagnostics from that old version is a lot more trouble than we have to go to on the latest versions.

 

We need the syslog and SMART report for that drive giving errors. Even better would be syslog and SMART report for all drives. If you were on V6 you could get all of this in a nice zip to post for us, but instead you will have to get each separately. If you want you could zip them yourself and then you would only need to attach one thing to your next post.

 

See here:

 

https://lime-technology.com/forums/topic/9277-how-to-report-a-defect-and-capture-syslog-and-smart-reports/

 

 

Link to comment

Be sure to read the first several posts at that link I gave so you know how to get syslog and SMART reports.

 

Did you have to go into the case to replace the flash drive? Sometimes people will disturb the disk connections if they open the case.

 

It would also be useful if you could tell us a little about your hardware. It would be nice if you could easily upgrade to V6 after we get this problem squared away.

Link to comment

Well I've never had a problem with 4.7 so I didn't think there was a reason to upgrade, although I have seen the nifty things that the new versions offer. Plus I thought you had to pay for another license. I've attached a zip file with all the smart reports for each drive and the system log.

 

As for the hardware:

MB: ASRock FM2A85X Extreme6

CPU: AMD A4-5300

Mem: 1GB DDR3-1066

Controllers: Adaptec 1430SA x 2

 

Is there anything else you need?

Many thanks!

 

 

 

HIVE Syslog and Smart Reports.zip

Link to comment

I'm still using my USB and license that I got when I had 4.7.  Upgrades are free and so far it doesn't look like that will change any time soon.  But I would probably pay for upgrades if I was stuck on a version that only supports 2TB drives like 4.7.  And then again when the VM manager and Docker were added.

Link to comment

You have multiple disks with issues. Unfortunately, the syslog has rotated and is only showing all the recent errors, but none of the old information that would make it possible for me to identify each disk by their assigned slot. I can see the serial numbers in the SMART though and that will be enough. You could perhaps get older syslogs from /var/log/syslog.1, /var/log/syslog.2, etc. but it's probably not necessary. The latest unRAID makes all this much easier.
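If the rotated logs do turn out to be needed, gathering them into one attachment could look roughly like the sketch below. It assumes Python is available somewhere the logs can be read (for example on a PC after copying /var/log off the server); the function name and the output file name are hypothetical.

```python
import zipfile
from pathlib import Path


def bundle_syslogs(log_dir, zip_path):
    """Zip syslog, syslog.1, syslog.2, ... from log_dir into zip_path.

    Returns the names of the files that were added.
    """
    # Matches the current syslog plus any rotated copies next to it.
    log_files = sorted(p for p in Path(log_dir).glob("syslog*") if p.is_file())
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for log_file in log_files:
            archive.write(log_file, arcname=log_file.name)
    return [log_file.name for log_file in log_files]


# Example call (paths are assumptions, adjust to where the logs really are):
# bundle_syslogs("/var/log", "/boot/syslogs.zip")
```

Only the file names are stored in the archive, so the zip unpacks into a flat list of logs.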

 

Also, the latest unRAID helps you keep track of impending issues by notifying you immediately by email or other agent when, for example, a disk's SMART data begins to show problems. We may have problems saving all your data in this current state since you have multiple unreliable disks, and parity plus all other disks must be read reliably in order to rebuild any disk.
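Why every other disk has to read cleanly can be shown with a toy model. This is only a sketch, assuming single parity is a plain byte-wise XOR across the data disks; the disk contents are made-up four-byte "sectors", not anything from this server.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each shrunk to a single 4-byte sector.
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAA, 0xBB, 0xCC, 0xDD])
disk3 = bytes([0x01, 0x02, 0x03, 0x04])

parity = xor_blocks([disk1, disk2, disk3])

# Rebuilding disk2 needs parity AND every surviving data disk.
rebuilt = xor_blocks([parity, disk1, disk3])
print(rebuilt == disk2)

# If disk3 returns one bad byte during the rebuild, that same byte
# position comes out wrong on the rebuilt disk as well.
bad_disk3 = bytes([0x01, 0xFF, 0x03, 0x04])
corrupt = xor_blocks([parity, disk1, bad_disk3])
print(corrupt == disk2)
```

The rebuilt disk is only as good as the worst read among parity and the surviving disks, which is why pending sectors on other disks matter here.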

 

The disk you flagged as having errors is, according to SMART, actually FAILING NOW and must be replaced immediately. Unfortunately, you also have 2 other disks with pending sectors, so they can't really be trusted to accurately rebuild the failing disk. Those disks should also be replaced ASAP, but of course you can only rebuild one at a time, and the FAILING NOW disk takes priority. I guess we will have to start there and hope for the best. If you wind up with a corrupt rebuild, we can possibly repair the filesystem and save most things.

 

Why did you decide to do a parity sync anyway? That probably has corrupted parity somewhat, which of course also makes an accurate rebuild unlikely.

 

One last comment, you don't really have enough RAM for an upgrade to V6. I haven't checked the specs for the other hardware.

 

Let us know if you need more details about how to proceed with rebuilding the failing disk to a new disk.

 

Device Model:     Hitachi HDS723020BLA642
Serial Number:    MN1220F326RLAD
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1975
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       2392
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       31

Device Model:     WDC WD20EARS-22MVWB0
Serial Number:    WD-WCAZA3935061
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       1
196 Reallocated_Event_Count 0x0032   199   199   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       17
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1

Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA5742681
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
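The lines above follow the usual smartctl attribute layout (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value). A rough sketch of picking out the worrying ones automatically is below; the set of watched attribute IDs and the function name are my own choices for illustration, not anything unRAID does.

```python
# Attribute lines copied from the Hitachi report above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 1975
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       2392
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       31
"""

# IDs whose non-zero raw value deserves attention (an assumption for this sketch).
WATCHED = {5, 196, 197, 198}


def flag_attributes(report):
    """Return a short description for each attribute that looks bad."""
    findings = []
    for line in report.splitlines():
        fields = line.split()
        # Skip anything that is not a 10-column attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name = int(fields[0]), fields[1]
        when_failed, raw = fields[8], fields[9]
        if when_failed != "-":
            findings.append(f"{name}: {when_failed} (raw {raw})")
        elif attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            findings.append(f"{name}: raw value {raw}")
    return findings


for finding in flag_attributes(SAMPLE):
    print(finding)
```

A non-dash when-failed column is the strongest signal; non-zero pending or reallocated counts are a warning to test the disk further.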

 

Link to comment

Another approach would be to create a new array with only the good disks, sync parity, then see if you can mount the bad disks outside the array (another thing that is much simpler in V6) and try to copy their contents. That has the advantage of getting the good disks protected, but it means you can't really rebuild any of the bad disks and will just have to hope you can read them well enough to get something off them.

 

Do you have any backups?

Link to comment

Sigh, well if I replace the failing disk how would I go about it? Would it be the same steps to replace the other two drives?

 

Also, I didn't actually start the parity sync; it did it on its own when I replaced the USB drive and started it back up.

Link to comment
4 hours ago, Imba said:

I didn't actually start the parity sync; it did it on its own when I replaced the USB drive and started it back up.

 

It must have seen a super.dat that was copied while the array was not stopped and assumed an unclean shutdown. The latest unRAID does a non-correcting parity check after an unclean shutdown.

 

Can you add RAM? Maybe V6 with only the NAS functionality would work with 1GB, but it would be very tight.

 

Here is a link to the upgrading wiki:

 

https://lime-technology.com/wiki/index.php/Upgrading_to_UnRAID_v6

Link to comment
18 hours ago, pwm said:

Older versions of unRAID were more interested in doing corrective parity sync.

I believe you mean check; a sync always writes.

 

18 hours ago, pwm said:

Didn't all versions before version 6 default to having the 'correcting' checkbox set even if someone wanted to manually start a parity scan?

It still does. You need to uncheck the "write corrections to parity" box before starting a non-correcting manual check, though newer releases do default to non-correcting after an unclean shutdown.

Link to comment

unRAID really should stay away from writing corrections unless the user more or less forces that operation. In case there is something wrong, the user should be given the full set of options of what steps to try to recover - which means the most recent parity must be left intact.
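The difference between the two kinds of check can be sketched in a few lines. This is a toy model under the assumption that parity is a byte-wise XOR of the data disks; the `correct` flag stands in for the "write corrections to parity" checkbox and is not unRAID's actual code.

```python
from functools import reduce
from operator import xor


def parity_check(data_disks, parity, correct=False):
    """Count parity mismatches; overwrite parity only when correct=True."""
    errors = 0
    for i in range(len(parity)):
        expected = reduce(xor, (disk[i] for disk in data_disks))
        if parity[i] != expected:
            errors += 1
            if correct:
                parity[i] = expected  # the old parity byte is gone after this
    return errors


good_disk1 = bytearray([0x12, 0x34])
good_disk2 = bytearray([0xAA, 0xBB])
parity = bytearray(a ^ b for a, b in zip(good_disk1, good_disk2))

# disk2 now returns a bad second byte, as a failing disk might.
flaky_disk2 = bytearray([0xAA, 0x00])

# Non-correcting: the mismatch is reported and parity still describes the good data.
untouched = bytearray(parity)
print(parity_check([good_disk1, flaky_disk2], untouched))
print(bytes(p ^ d for p, d in zip(untouched, good_disk1)) == bytes(good_disk2))

# Correcting: parity is rewritten to match the bad read, so the good data is lost.
rewritten = bytearray(parity)
print(parity_check([good_disk1, flaky_disk2], rewritten, correct=True))
print(bytes(p ^ d for p, d in zip(rewritten, good_disk1)) == bytes(good_disk2))
```

In the correcting case the check "succeeds" by making parity agree with whatever the flaky disk returned, which is exactly the recovery option that gets thrown away.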

Link to comment

Ok so I'm confused as to what steps I should be taking, I can replace the failing hard drive and more than likely add more RAM. But I don't understand how to go about all this. 

Upgrade before anything?

Replace drive first?

How to get info from the failing drives?

 

I'm sorry UnRAID seems to be beyond the limits of my usual comprehension.

Link to comment
On 5/18/2018 at 11:07 PM, trurl said:

Do you have any backups?

 

This is probably the first thing to consider. If you have any important and irreplaceable files that you don't have backed up then try to copy them from the server to your PC.

Link to comment
  • 2 weeks later...

I don't have many overly important files on the server, but there are some things that I would like to save, of course. So should I just start the server again, stop the sync, and try to pull from the failing drive? Or do I have to pull files in general (e.g. from shares as opposed to the drive that is failing)?

Link to comment

Did you run an extended test on the drives with pending sectors to confirm if they are failing or not?

 

On 5/19/2018 at 7:41 AM, johnnie.black said:

You should run an extended test on those WD disks with pending sectors. They can sometimes show false positives, i.e., the disks may be fine for now. The Hitachi is definitely failing.
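For reference, an extended test is normally started with smartctl's `-t long` option and the result read back later with `-a`. The sketch below only wraps those two commands; the device path is a placeholder, and it assumes smartctl is on the PATH and Python 3.7 or newer is available on whatever machine runs it.

```python
import subprocess


def extended_test_command(device):
    """Command line that asks the drive to start a long (extended) self-test."""
    return ["smartctl", "-t", "long", device]


def report_command(device):
    """Command line that prints the full SMART report, including self-test results."""
    return ["smartctl", "-a", device]


def run(command):
    """Run a command and return its text output without raising on non-zero exit."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


# Example (placeholder device name, replace with the real one for each WD disk):
# print(run(extended_test_command("/dev/sdX")))
# ...wait for the test to finish, which can take hours on a 2TB disk...
# print(run(report_command("/dev/sdX")))
```

The test runs inside the drive itself, so the first command returns right away and the outcome only shows up in a later report.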

 

Link to comment
  • 2 weeks later...
