[SOLVED] Help please. Red balled drive replaced. Same failures.


My v5 unRAID server has been running for 5+ years without a failure. One day, out of curiosity, I went into Main to check the status and found a red balled drive. I shut the server down, reseated all the cables, restarted, and ran the trust parity rebuild, and everything seemed fine for a couple of weeks. I checked in again recently and found it red balled again. So I pulled and replaced the drive; however, I moved it to another controller and assigned it as the previous drive. I started the rebuild again and let it run over the weekend, as I was leaving town for the holiday. I came home and everything looked like it was back to normal, so I started a parity check, since the one scheduled at the beginning of this month didn't run due to the drive issues. This parity check was also cancelled or stopped, and the same drive red balled again. The attached syslog is pretty much Greek to me, so if someone could guide me in correcting these issues I would much appreciate it.

Thank you.

V

syslog-2016-09-05.txt

Link to comment
Did you test the new drive? Post SMART for disk9.

 

If you were on V6 you would have gotten emails telling you about the redball instead of having to discover it by accident who knows how long after it happened. It's not clear exactly what you did to rebuild, since you mention trusting parity, which usually means a New Config. You definitely should have rebuilt the data disk from the existing parity, since the array had the valid data for that disk but the disk did not, and there is no telling how out-of-sync that data drive was since you didn't know when it was redballed.

 

Perhaps you already know this, but unRAID disables a disk (redball) when a write to it fails, and from that point until the drive is rebuilt, unRAID will not use the disk again. Instead it will emulate the disk by calculating its data from all other disks plus parity. It emulates the drive whether reading or writing, so any writes made to the disk after it was redballed exist only in the parity array and not on the disk itself.
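
For anyone following along, here is a rough sketch of what "emulating" a disk means, assuming single parity computed as a bytewise XOR across the data disks. The disk contents and the helper name `xor_blocks` are made up for illustration; this is not unRAID's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, one small block each (made-up contents).
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0f\x0e\x0d\x0c"
disk3 = b"\xaa\xbb\xcc\xdd"

# Parity is the XOR of every data disk.
parity = xor_blocks([disk1, disk2, disk3])

# With disk2 disabled, its contents are recomputed from parity plus the other disks.
emulated_disk2 = xor_blocks([parity, disk1, disk3])

# A write to the emulated disk only updates parity; the physical disk2 goes stale.
new_disk2 = b"\x00\x00\x00\x01"
parity_after_write = xor_blocks([disk1, new_disk2, disk3])
emulated_after_write = xor_blocks([parity_after_write, disk1, disk3])
```

Here `emulated_disk2` matches the original `disk2`, while `emulated_after_write` matches `new_disk2` and no longer matches what is physically on the disabled drive. That is why the physical disk cannot simply be trusted again after a redball.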

 

If you insist on staying with v5 you should at least upgrade to the latest v5.0.6.

 

 

Link to comment

Thank you for the response.  Attached is the smart report for disk9.

I have not kept up with the latest unRAID versions and info, so I am ignorant of the functionality of v6. However, I did not want to upgrade with a red balled disk. I do appreciate the suggestion to upgrade, and I will once this is corrected.

The first red balled drive I rebuilt with the 'trust my array' procedure, as I was sure I hadn't written to it. As stated previously, it was working again as usual until I checked again days or a week later and saw it disabled again. That was when I physically replaced the drive and rebuilt the new drive from parity. The unRAID 'Main' page showed everything was successful and running; however, I needed to run a manual parity check because the previous drive issues had caused the scheduled one to cancel. I haven't had a successful scheduled parity check in 2 months because of the drive issues.

Thanks again.

V

smart.txt

Link to comment

The first red balled drive I rebuilt with the 'trust my array' procedure, as I was sure I hadn't written to it.

You may not have directly written to it, but a write failed, or unRAID wouldn't have red balled it. There is a good chance you have minor file system corruption as a result of using the 'trust my array' procedure.

 

The way unRAID works, when a file is read and the drive fails to respond, it spins up the rest of the drives, calculates what the data in that spot should be, and attempts to write that calculated data back to the unresponsive drive. If the write succeeds, the error counter for the drive is incremented and life goes on. If the write fails, unRAID red balls the drive, and ALL further activity happens without the red balled drive. Any file system housekeeping and any writes happen only on the emulated drive, which is guaranteed by definition to be different from the physical drive.
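
The read-failure path described above could be sketched roughly like this. `ToyDisk`, `reconstruct`, `read_block`, and the integer "blocks" are hypothetical names for illustration, and the logic only follows the description in this post, not unRAID's driver source.

```python
from functools import reduce
from operator import xor

class ToyDisk:
    """Hypothetical stand-in for one drive holding a single integer block."""
    def __init__(self, block, read_ok=True, write_ok=True):
        self.block = block
        self.read_ok = read_ok
        self.write_ok = write_ok
        self.errors = 0        # error counter shown next to the drive
        self.disabled = False  # True once the drive is red balled

def reconstruct(others, parity):
    """Recompute the missing block from parity and every other disk (XOR assumed)."""
    return reduce(xor, (d.block for d in others), parity)

def read_block(target, others, parity):
    """Read from target; on failure, rebuild the data and try to write it back."""
    if not target.disabled and target.read_ok:
        return target.block
    rebuilt = reconstruct(others, parity)
    if not target.disabled:
        if target.write_ok:
            target.block = rebuilt   # write-back succeeded: count the error, carry on
            target.errors += 1
        else:
            target.disabled = True   # write-back failed: red ball, emulate from now on
    return rebuilt

others = [ToyDisk(0b1010), ToyDisk(0b0110)]
good_value = 0b0011
parity = 0b1010 ^ 0b0110 ^ good_value

flaky = ToyDisk(block=0, read_ok=False)                 # read fails, write works
dead = ToyDisk(block=0, read_ok=False, write_ok=False)  # read and write both fail

flaky_result = read_block(flaky, others, parity)
dead_result = read_block(dead, others, parity)
```

Both reads return the correct data, but only `flaky` stays in the array (with its error counter bumped), while `dead` ends up disabled and is served from the emulated copy on every later read.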

Link to comment

That drive is seriously bad, very close to failing completely.  Reallocated_Sector_Ct VALUE is down to 12, perhaps lower than I've ever seen one, Current_Pending_Sector has hit bottom at 001, and a count of over 3600 bad sectors still to deal with.  If parity can be trusted, replace with a new drive as soon as possible.  If you aren't sure about your parity, then copy off all data of importance first, then the rest of the data, and then replace the drive.
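
As a rough aid for reading a SMART report, the sketch below pulls the attributes mentioned above out of a `smartctl -A` style attribute table. The sample text is invented for illustration and is NOT the attached smart.txt. The function names, the `WATCHED` list, and the "any non-zero raw count is worth a look" rule are my own assumptions.

```python
# Attributes that usually deserve attention when their raw count is above zero.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# Made-up sample in the usual attribute-table layout (not a real drive's report).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   040   040   036    Pre-fail  Always       -       1200
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       800
197 Current_Pending_Sector  0x0012   090   090   000    Old_age   Always       -       150
"""

def parse_attributes(report):
    """Return {attribute name: {value, worst, thresh, raw}} from the table rows."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = {
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": fields[9],
            }
    return attrs

def worrying(attrs):
    """List watched attributes whose raw count is greater than zero."""
    flagged = []
    for name in WATCHED:
        entry = attrs.get(name)
        if entry and entry["raw"].isdigit() and int(entry["raw"]) > 0:
            flagged.append(name)
    return flagged

flagged = worrying(parse_attributes(SAMPLE))
```

The normalized VALUE column counts down toward THRESH as a drive degrades, while RAW_VALUE for these attributes is the actual sector count, so both are worth looking at.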

Link to comment

Thank you RobJ for looking at the drive, I will replace it again.  However I'm not sure what indications I would have to NOT trust parity.  This drive is seldom written to directly by me, and until it was disabled I have had no errors.  So I guess my question is what would tell me that parity was not to be trusted for a rebuild?

Thanks again.

V

Link to comment

I'm a little confused by what you actually mean when you say you used the "trust my array" procedure. Maybe it means something else on v5 but usually "trust my array" means you have set a new configuration and you don't want unRAID to rebuild parity. To get from there to actually rebuilding a data disk is possible but requires a few extra steps. Do you have a wiki link or something so we could see what procedure you followed?

 

Also, why did you use a bad drive for the rebuild? Do you not test your drives before using them in your server? It is very important that all bits of all drives be trustworthy since all bits of all drives will be needed to rebuild when one of them fails.

 

And, V6 would have told you that you were using a bad drive.

Link to comment

I did not set a new config. I just followed what it said for re-enabling a drive, per the 1st bullet point:

https://lime-technology.com/wiki/index.php/Troubleshooting#Re-enable_the_drive

 

https://lime-technology.com/wiki/index.php/Make_unRAID_Trust_the_Parity_Drive,_Avoid_Rebuilding_Parity_Unnecessarily

 

You are correct, I did not preclear the replacement drive as I was in a hurry heading out of town.  Believe it or not that drive had been sitting cold in the server 'brand new' for the last 4 years.  Go figure.

Thanks.

V

Link to comment

The first link gives you the second link, and the second link is basically telling you how to avoid rebuilding anything, parity or data disk. But your OP said "trust parity rebuild", so I guess that is why I was confused.

 

So I assume the first time you didn't really rebuild anything.

 

These details can be critical when deciding how to proceed. It sounds as though we have no reason to think either the data disk or the parity disk is valid at this point.

 

Do you have the original drive that failed? It might actually be in better condition than the one you replaced it with.

 

 

Link to comment

Yes I do have the original failed drive.

 

When I used the 'trust my array' procedure, the 'Main' page most definitely said it was rebuilding data on drive 9, so I have no reason to doubt that it actually did.

How do I proceed?

Thanks.

V

You must not have followed the "trust" procedure.

 

So just to recap, what is the current state of your array? Disk9 still redball and all other drives green? Can you browse the files on disk9? If all other drives are green then you should be able to read the emulated disk9 to see what is on it.

Link to comment

Ok, I went back and reread the procedure myself, and I agree with you that I must not have done the 'trust' procedure as you mention. I can agree because it states NOT to do it with disabled drives. So I followed just the 're-enable a drive' directions, under the assumption, as I said previously, that I trusted the current parity and that it was a cable issue. Again, as I stated, after the rebuild it was online and the status was green. I run headless, so I have to intentionally log in, and when I did recently it had been red balled again. It was at that point that I replaced the drive.

 

Currently I tried to reinstall and assign the original red balled drive however it will not allow it as the replacement was larger.  So... where do I go now?  New current status is disk9 unassigned, all others green.

Thanks,

V

 

Link to comment

This is another scenario that would be handled better with V6. I won't go into the details about that.

 

Unfortunately with V5, I think unRAID is going to start a parity check if you New Config with the old disk, even if you do the trust procedure.

 

Probably the best thing would be to copy everything off the emulated drive onto another computer. Then we can consider different ways to proceed.

 

Before you do that though, get a SMART for all of your other disks. We don't want one of them to fail while we are using them all to emulate the disabled disk. And I think you should be able to get SMART from the old disk without actually assigning it to any slot.

 

Link to comment

Is there a way to do that all at once or do I have to do it for each drive individually?

Link to comment

No way in V5
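
If that answer is about the web interface, one possible workaround is a small script that runs `smartctl -a` against each device in turn. This is only a sketch: it assumes `smartctl` is on the PATH and that Python is available on whatever machine the drives are attached to (a stock v5 install may not have Python, in which case the same loop can be written in shell). `smart_command` and `collect_reports` are names I made up.

```python
import subprocess

def smart_command(device):
    """Build the smartctl call for one device; -a prints all SMART information."""
    return ["smartctl", "-a", device]

def collect_reports(devices, run=subprocess.run):
    """Run smartctl once per device and return {device: report text}."""
    reports = {}
    for device in devices:
        result = run(smart_command(device), capture_output=True, text=True)
        reports[device] = result.stdout
    return reports
```

Calling `collect_reports(sorted(glob.glob("/dev/sd?")))` after `import glob` would cover every whole-disk node that matches that pattern, including a drive that is plugged in but not assigned to a slot. The `/dev/sd?` pattern is an assumption about how the drives are named on your system.
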
Link to comment

Ok, I'm getting the gist of how seriously outdated V5 is. Can I, or should I, update to help this along? Is that possible?

Thanks.

V

I would wait until I had my array stable again before tackling the upgrade to V6. Not sure what you would gain if you upgraded to the latest V5.

 

The notifications in V6 are just what you need if you intend to set it and forget it like this.

Link to comment

I think I would still proceed by copying all the data from the emulated disk somewhere else, unless you already have it backed up. Then we would have something to fall back on in the event there were any problems with trying to rebuild it again.

 

On that subject, do you have backups of all your critical data? unRAID parity protection is not a substitute for backups.

Link to comment
