[SOLVED] Unraid 4.5.6, Parity drive failing, data drive with uncorrectable error


After a power failure, the performance was so bad we could not watch a movie (it would play a couple of frames, then wait 20-30 seconds over and over).

 

I got the syslog and SMART reports for all of the drives (1 parity and 6 data).

The SMART report for the parity drive says it will fail in less than 24 hours.
One of the data drives has Current_Pending_Sector = 1, and Offline_Uncorrectable = 1.  I ran short and long tests on this drive.
None of the other drives show any errors (I ran long tests on all of them).

The drives show up in the management console with no errors.  It says Parity is valid, and shows 8 parity errors found when the last check ran after the power failure.

 

I think I need to replace the parity drive, but I'm concerned about what will happen with the data drive with the pending sector.

I'm guessing that I should do the following:
1. Replace parity drive.
2. Bring up Unraid and assign the new drive as the parity drive.
3. Rebuild the parity drive.

 

Will I be able to re-build the parity drive when the data drive has these errors?

 

I am new to all of this.  Any help would be appreciated.

 

Thanks,

Frank

smartrepg_long.txt

smartreph_long.txt

Syslog.txt

Link to comment
34 minutes ago, FrankS said:

I think I need to replace the parity drive,

With 3800+ reallocated sectors (and it has gone through its pool of spare sectors available for reallocation, hence the failing status now), it's a definite replace.

 

35 minutes ago, FrankS said:

Will I be able to re-build the parity drive when the data drive has these errors?

Since it has 1 pending sector, parity will rebuild fine.  But if/when you choose to replace the data drive, you *may* end up with a corrupted file, because you're rebuilding the parity drive with info from a drive that may or may not read the proper information from that sector.  After rebuilding the data drive, run another parity check to see if parity still matches.
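
To make that concern concrete, here is a minimal Python sketch of single parity as a byte-wise XOR across the data drives. The two-byte "sectors", the drive contents, and the helper name `xor_blocks` are all made up for illustration and are not taken from unRaid itself. The only point is that a rebuild hands back whatever was fed into parity, including a misread sector.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (the idea behind single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 2-byte "sectors" at the same offset on three data drives.
good = [b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]

# Assume drive 1 misreads its pending sector while parity is being rebuilt.
misread = [good[0], b"\x0f\xf1", good[2]]
parity = xor_blocks(misread)

# Later, drive 1 is replaced and rebuilt from parity plus the other drives.
rebuilt = xor_blocks([parity, good[0], good[2]])

print(rebuilt == good[1])     # False: the rebuilt sector carries the misread byte
print(rebuilt == misread[1])  # True
```

A parity check right after the rebuild would still pass in this case, since parity and data agree with each other. The damage, if any, is limited to the file that owns that sector.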

 

BTW, after you handle the drive replacements you should really consider upgrading to 6.5.3+.  3800 reallocated sectors is a definite no-no, and any 6.x version of unRaid would have warned you about this drive long before it hit that point.  I don't mind a few reallocated sectors, but some users insist that any drive with even a single reallocated sector is worthy of replacement.
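
On an older release without those built-in warnings, a check like this can be scripted by hand against saved SMART reports. The sketch below is Python and assumes the usual `smartctl -A` attribute table layout (ten whitespace-separated columns ending in RAW_VALUE). The sample text is made up to mirror the values mentioned in this thread, not copied from the attached reports, and treating any non-zero raw value as worth a look is my own assumption, not an unRaid rule.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def flag_attributes(report, watched=WATCHED):
    """Return {attribute: raw_value} for watched attributes with a non-zero raw value."""
    flagged = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in watched:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                flagged[fields[1]] = int(raw)
    return flagged

# Illustrative sample only (assumed values, shaped like `smartctl -A` output).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""

print(flag_attributes(SAMPLE))
# {'Current_Pending_Sector': 1, 'Offline_Uncorrectable': 1}
```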

Link to comment

Thanks for responding.

 

Most of the data is media (movies, TV series etc.), so I don't have backups because of the size.  Losing it would suck, but not be the end of the world!

I have some other data (family photos) that I need to verify I have copies of, and try to copy if I don't have backups.

 

 

Link to comment

Thank you!  I was worried that the rebuild would fail, and I would lose everything.  I guess I could handle losing a file or two.  I knew I needed to replace the parity drive, assumed the same for the data drive, and had already ordered replacements.  I was just unclear if I needed to do something special because of the errors on the data drive.

 

Also, do I need to preclear the new drives?  I've read through the documentation and forum posts and am confused about this (both if I need to do it, and how to do it).

 

I will definitely look into upgrading to unRaid 6.  It looks like there have been many useful changes.

 

 

 

 

Link to comment
Just now, FrankS said:

Also, do I need to preclear the new drives?

It certainly doesn't hurt.  But in my mind, a rebuild of the parity/data drive, followed by a non-correcting parity check (not sure whether 4.x allows a non-correcting check, though) and then checking SMART, is effectively the same thing as a single pass of preclear, and saves you time/power.

Link to comment

Thanks again.  But what about the data drive?

Link to comment
45 minutes ago, FrankS said:

Thanks.  I'll see if I can figure out how to do it for the data drive.

 

For your very old version you would have to go back to the original Joe L. script. You can ignore any of the more recent discussions of preclear.

 

https://lime-technology.com/forums/topic/2732-preclear_disksh-a-new-utility-to-burn-in-and-pre-clear-disks-for-quick-add/

 

I tend to agree with Squid as far as preclearing a replacement disk. Rebuild followed by parity check is a pretty good test, and arguably is better than leaving a drive with known issues in the array while you do a lengthy preclear. Preclearing makes more sense for testing drives that will be added to a new slot in the array.

Link to comment

That makes sense.  Thanks.

Link to comment

OK, so I did the following:

Stopped the array, and shut down the system.
Replaced the 2TB parity drive.
Restarted unRaid.


The management console had a blue orb next to the parity drive and green for the data drives.  I checked the devices page and everything was correct.
I checked the box under the start button, and clicked start.
The management console became unresponsive.

 

I used the /root/mdcmd status command to see the progress and it shows the following:
mdState=STARTED
mdNumProtected=7
mdNumDisabled=0
mdNumInvalid=1
mdInvalidDisk=0
mdResync=1953514552
mdResyncPos=512
mdResyncPrcnt=0.0
mdResyncFinish=591840.2
mdResyncSpeed=0

and for the new disk:
diskState.0=6
rdevStatus.0=DISK_INVALID
rdevNumErrors.0=0
rdevLastIO.0=1535561073

I've run this several times, and the only thing changing is the mdResyncFinish value.
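
For anyone watching a rebuild the same way, here is a rough Python sketch that parses the `key=value` lines and compares two snapshots to see whether the resync position is moving. It assumes `mdResyncPos` and `mdResync` are in the same units, so that their ratio gives progress. That is my reading of the output above, not documented behavior. The function names are hypothetical, and the snapshot is trimmed to the fields the sketch uses.

```python
def parse_status(text):
    """Parse key=value lines into a dict of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            status[key] = value
    return status

def resync_stalled(before, after):
    """True if the resync position did not advance between two snapshots."""
    return int(after["mdResyncPos"]) <= int(before["mdResyncPos"])

# Fields copied from the status output above.
SNAPSHOT = """\
mdState=STARTED
mdResync=1953514552
mdResyncPos=512
mdResyncSpeed=0
"""

first = parse_status(SNAPSHOT)
second = parse_status(SNAPSHOT)  # stands in for a later run that looks the same
percent = 100.0 * int(first["mdResyncPos"]) / int(first["mdResync"])
print(f"{percent:.1f}% done, stalled: {resync_stalled(first, second)}")
# 0.0% done, stalled: True
```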

 

Now, I checked the syslog (attached as a zip file) and it is logging these lines over and over:
Aug 29 12:55:50 Tower kernel: ata1: failed to resume link (SControl FFFFFFFF)
Aug 29 12:55:51 Tower kernel: ata2: failed to resume link (SControl FFFFFFFF)
Aug 29 12:55:51 Tower kernel: ata4: failed to resume link (SControl FFFFFFFF)
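
A quick way to see which ports are involved is to tally these messages per ATA port. This is only a sketch in Python: the regular expression is written against the three lines above, and the function name is made up. Mapping `ataN` back to a physical drive or controller still has to be done from the boot section of the syslog.

```python
import re
from collections import Counter

# Matches kernel lines like "... kernel: ata1: failed to resume link (...)".
LINK_ERROR = re.compile(r"kernel: (ata\d+): failed to resume link")

def count_link_errors(lines):
    """Count 'failed to resume link' messages per ATA port."""
    counts = Counter()
    for line in lines:
        match = LINK_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The three lines quoted above.
SAMPLE = [
    "Aug 29 12:55:50 Tower kernel: ata1: failed to resume link (SControl FFFFFFFF)",
    "Aug 29 12:55:51 Tower kernel: ata2: failed to resume link (SControl FFFFFFFF)",
    "Aug 29 12:55:51 Tower kernel: ata4: failed to resume link (SControl FFFFFFFF)",
]

print(sorted(count_link_errors(SAMPLE)))
# ['ata1', 'ata2', 'ata4']
```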

 

What should I do?

 

syslog_newparity.zip

Edited by FrankS
sending zip instead of txt for size
Link to comment

Obviously it is having trouble communicating with at least 3 disks. Hardware problem of some kind. Bad connections, cables, ports, controllers, PSU?

 

Bad connections (all SATA and power, both ends) are a good place to start, especially in situations where you have been in the case messing around. SATA cables should be unbundled so they don't crosstalk, and each connector should sit squarely on its connection with no strain that might cause it to shift or come loose.

Link to comment

Thanks trurl.

 

The 3 drives getting errors are all on a PCIe controller card.  I tried re-seating the card and checked all of the cables, but there was no change.

 

The card is a Rosewill RC-218 4-port PCIe x4 SATA, and I'm not sure what to replace it with (it is no longer available).

I'm looking at the unRaid Hardware Compatibility page to see if I can figure out what to get.

Link to comment
4 hours ago, FrankS said:

The card is a Rosewill RC-218 4-port PCIe x4 SATA

That same card has been working well for me for many years, but I suppose nothing lasts forever. I especially like the ability to put some ports as eSATA. I use that feature in V6 with Unassigned Devices to create backup disks for offsite storage.

 

Not sure what to suggest for a replacement these days though.

Link to comment

Thanks everyone for your help.  Once I got past the hardware problems, I got the server back up.

 

I performed the following steps:

Replaced the parity drive and let unRAID rebuild it.

Ran a parity check.

Replaced the data drive and let unRAID rebuild it.

Ran a parity check.

 

The system is running fine with no errors.

Link to comment
  • FrankS changed the title to [SOLVED] Unraid 4.5.6, Parity drive failing, data drive with uncorrectable error
