
redball disk



I got a red ball on a data drive, so I ran a smartctl short test.  I see a power count error?    I tried swapping the drive to a different port and got the same thing.

 

Not great - you have a lot of things going on here.

 

1 - you have 4 reallocated sectors

2 - you have 3 pending sectors (these have been marked as bad, but have not yet been reallocated). A parity check should force these to be reallocated, or sometimes the drive changes its mind and clears the pending flag, but pending sectors are dangerous if you don't clear them.

 

You also had an ATA error - "Error: UNC at LBA = 0x0344e86c = 54847596". UNC indicates an uncorrectable read error. This error occurred only a few power-on hours before the SMART report was taken.
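If it helps to keep an eye on these, here's a minimal Python sketch, assuming smartctl (smartmontools) is installed; /dev/sdX is just a placeholder for the suspect drive, and the attribute names are the usual ones from smartctl's "-A" table (column layout can vary slightly by drive/firmware, so treat this as illustrative):

```python
# Minimal sketch: pull the raw values of the SMART attributes discussed
# above out of smartctl's "-A" table. /dev/sdX is a placeholder.
import subprocess

WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
         "Offline_Uncorrectable", "UDMA_CRC_Error_Count")

out = subprocess.run(
    ["smartctl", "-A", "/dev/sdX"], capture_output=True, text=True
).stdout

for line in out.splitlines():
    fields = line.split()
    # Table rows look like: ID# ATTRIBUTE_NAME FLAG VALUE ... RAW_VALUE
    if len(fields) >= 10 and fields[1] in WATCH:
        print(f"{fields[1]}: raw = {fields[-1]}")
```

Non-zero raw values for the first two attributes are exactly the reallocated/pending sectors called out above.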

 

It looks like the drive is over 3 years old. I am not sure what the warranty is on these, but I would be looking to RMA it if you can. You could try running parity checks and watching for these errors to clear, but if this drive has been running clean for 3 years and this is just starting to happen, it is likely the beginning of more problems to come.

 


After I made the second screenshot, I noticed there is no parity check button.  I think it's because I still have a red-balled drive in my array.

 

You are running some WebGUI plugin or enhancement I don't use.

 

If your array is started (which it appears to be), and you can see the contents of disk4 (which should be simulated from parity plus all the other array disks), then you can rebuild the failed disk.
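For anyone wondering what "simulated" means here - a toy sketch of single (XOR) parity, which is the principle unRAID uses to reconstruct a missing data disk on the fly. The disk contents are made-up bytes; the real array does this per-block in the md driver:

```python
# Toy illustration: with XOR parity, a missing disk's contents equal the
# parity XOR'd with every surviving data disk.
disks = {
    "disk1": bytes([0x10, 0x22, 0x3C]),
    "disk2": bytes([0xA1, 0x05, 0xFF]),
    "disk3": bytes([0x7E, 0x90, 0x08]),
}

# Parity is the byte-wise XOR of all data disks.
parity = bytes(b1 ^ b2 ^ b3 for b1, b2, b3 in zip(*disks.values()))

# If disk2 red-balls, simulate it from parity + the surviving disks.
simulated_disk2 = bytes(
    p ^ b1 ^ b3 for p, b1, b3 in zip(parity, disks["disk1"], disks["disk3"])
)
assert simulated_disk2 == disks["disk2"]
print("disk2 simulated correctly:", simulated_disk2.hex())
```

The same math is why a rebuild onto a replacement drive works: unRAID just writes that simulated stream to the new disk.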

 

unRAID will not let you start an array in an invalid state.

 

I am not sure what it means that "data is invalid" for parity. Does not make sense to me.


One of my drives was showing as red-balled after it overheated. This is what I did:

 

1 - Stop the array.

2 - Un-assign the red-balled drive.

3 - Start the array with it un-assigned (this will cause unRAID to forget its model/serial number).

4 - Stop the array once more.

5 - Re-assign the drive (it will think it is a replacement for the missing drive and will re-construct onto it when you re-start the array).

6 - Start the array.

 

It rebuilt fine with no errors.

I was then able to do a parity check with no errors. Not sure if this will work for you (or if it's even recommended).

 

I can now replace the drive when I have a new one (as I don't want a somewhat damaged drive in my array).



This is a method often used when it has been determined that the problem was NOT the disk itself - cabling, for example. A SMART report might help. The problem with leaving a drive of unknown health in the array is that it might fail while you are trying to rebuild another disk, and then you have lost both.


Apparently you do not have valid parity data -- if that's in fact the case you can't rebuild the failed disk.  Before doing anything else, boot to Safe Mode and post what the stock Web GUI shows, just to confirm.

 

If parity is in fact not valid, then I'd simply do a New Config that does NOT include the failed drive;  let it build parity;  run a parity check to confirm all is well;  and THEN either (a) attempt to recover the data from your failed drive on another system or (b) if you have backups, just copy the data that was on that drive back to the array from your backups.

 


Also, my Parity Status = "Data is invalid" - should I run a parity check again before I swap out the dead drive?

 

Just to be very clear -- once you have a failed drive (red-balled) you can NOT update your parity.  If the parity isn't valid at that point, there's nothing you can do about it.

 



 

In case it would be helpful, I took a look at the source code for the Dynamix webGui plugin to see what it means by "data is invalid".  It looks at the output of "mdcmd status": if mdNumInvalid is NOT zero, it then checks the value of mdInvalidDisk; if mdInvalidDisk IS zero it prints "parity is invalid", and if mdInvalidDisk is NOT zero it prints "data is invalid".
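Rendered as a rough Python sketch (the actual plugin is PHP, and the sample "mdcmd status" output below is made-up - a real dump has many more KEY=VALUE lines):

```python
# Sketch of the Dynamix label logic described above. Sample output is
# hypothetical; only the two keys the plugin checks matter here.
sample_mdcmd_status = """\
mdState=STARTED
mdNumInvalid=1
mdInvalidDisk=4
"""

def parity_status_label(raw: str) -> str:
    values = dict(
        line.split("=", 1) for line in raw.splitlines() if "=" in line
    )
    if int(values.get("mdNumInvalid", "0")) == 0:
        return "valid"                 # nothing is invalid
    if int(values.get("mdInvalidDisk", "0")) == 0:
        return "parity is invalid"     # the invalid slot is the parity disk
    return "data is invalid"           # the invalid slot is a data disk

print(parity_status_label(sample_mdcmd_status))  # -> data is invalid
```

So "Data is invalid" just means the invalid slot reported by the md driver is a data disk rather than the parity disk.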


A redball is rarely a failed disk.

 

Usually a redball means a bad or loose SATA or power cable. But it could also be a bad port on the controller, a bad drive cage, a poorly routed SATA cable (picking up interference), or something else interfering with the drive getting power or data.


A redball is rarely a failed disk.

 

???

 

It's true that the UnRAID manual doesn't say a red ball status means a disk has failed ... specifically, it says "... Red: the disk is disabled."

 

But that DOES mean there was a write FAILURE ... and I'm not sure I'd agree that this is "rarely" due to a failed disk.    It certainly isn't ALWAYS a failed disk ... but it's certainly not a possibility I'd discount as "rare."

 

It's certainly true, however, that there are several other things that need to be checked before concluding that the disk is bad.  Look at its SMART data; check the cables (both data and power);  reseat it if it's in a hot-swap bay;  and try a different port if you have spare SATA ports.

 

I suppose it's a matter of semantics ... but I don't think "rare" is a correct assessment in this case.

 


How about semi-unlikely?  :P

 

I have been a user since 2007 and never had a disk fail that way. And based on the forums, the vast majority of reported red balls are not failed disks.

 

Disks don't usually just up and die. They usually show signs of impending issues through SMART reports. Even the gentlest nudge on a SATA cable is enough to cause problems. Drive cages are very important IMO, as they keep users from cracking open the case and risking this problem. They add cost and could potentially cause their own issues, but my experience is that users with these cages have far fewer red ball problems than users without - including me, who had a ton of redballs before going with the cages. I am a real convert in this sense. I used to argue they were an unnecessary cost. Now I think they are indispensable.


Semi-unlikely?    I'd have said something like "A red ball doesn't necessarily mean the disk has actually failed."

 

But "semi-unlikely" is fine  :)

 

r.e. drive cages -- they're certainly convenient (I have quite a few) ... but they DO add one additional connection to the mix (the SATA cable is still there ... it's just connected to the cage's connector instead of directly to the disk) -- which is yet another failure point.    It is, of course, much easier to re-seat a drive in a cage than it is to open the case and reseat the cables -- but the cables can still come loose due to case movement, etc.    Statistically, the more connections, the higher the likelihood of an issue with one of them ... but I agree that if you're moving drives around or otherwise working inside a case it increases the likelihood of a SATA cable problem;  and you're probably doing that a lot less on systems with the cages.

 

I HAD switched completely to the Cooler-Master 4-in-3's [ http://www.newegg.com/Product/Product.aspx?Item=N82E16817993002&cm_re=cooler_master_4-in-3-_-17-993-002-_-Product ]  as a preference over the hot-swap cages due to the much better cooling (they outperform every cage I've tried -- SuperMicro, Icy-Dock, I-Star);  but the next time I need one I'm going to use the new Icy-Dock Black Vortex cages with the front fans, which have the same cooling as the Cooler-Master cages.  They are available in both an open-frame version [cables go directly to the drives ... allowing them to also support IDE drives] and a hot-swap version (SATA only, but cables go to the cage).

 

http://www.newegg.com/Product/Product.aspx?Item=N82E16817198059&cm_re=Icy_Dock_4-in-3-_-17-198-059-_-Product

http://www.newegg.com/Product/Product.aspx?Item=N82E16817994171&cm_re=Icy_Dock_4-in-3-_-17-994-171-_-Product

 

The much-better cooling is a reasonable trade-off for the 20% reduction in the number of drives supported (4 per cage instead of 5) ... and with the higher capacity of today's drives you can still get 24TB of space per cage (4 x 6TB)  :)

 



 

I don't like these debates between moderators, because I think they muddy the issue for users. I really think we are closely aligned on this topic.

 

You may say "statistically speaking" each connection adds risk. That is true and I get it AND I EVEN SAID IT IN MY POST.

 

They add cost and could potentially cause their own issues ...

 

But "statistically speaking" when looking at real world issues, you'd be hard pressed to find a user that got a red ball that was traced back to a bad drive cage. "Statistically speaking" we get users with red balls without cages traced back to some form of cabling issue all the time.


I agree we've all had our own experiences ...

 

The ONLY red balls I've had in 5+ years of using UnRAID on 4 different systems have been due to drives that needed to be re-seated in their cages (both Icy Dock and SuperMicro).    Probably due to a bit of movement when the system was opened to blow out the dust or clean the filters.  I've NEVER had a loose cable problem on the drives connected internally.

 

I think the simple fact is that MOVEMENT can cause issues with any connection -- I suspect users who "build and forget" have far more reliable systems than those who are constantly "fiddling" with their systems, regardless of how the drives are mounted.

 

But I agree we've talked this issue to death.  Enough  :)

 

 

