Drive Reconstruction Stalled (drive icon grey triangle?)



Unraid 6 RC3

I have a 1TB drive that was accidentally unplugged when a server was moved. It came up as failed (of course), so after some fiddling and a reboot or two I opened the case to see what the issue was. Unplugged. Easy, right?

So I plugged it back in and rebooted. The drive then responded, so I ran a SMART check on it and left it to do what it does.

This morning I checked to see what's going on. The drive is still active, but... the drive icon is a grey triangle. That state isn't on the list!

Short and extended self-tests ran with no problems.

http://my.jetscreenshot.com/12412/20150525-vfqk-47kb.jpg

Screenshot of Main

http://my.jetscreenshot.com/12412/20150524-oigx-94kb.jpg

http://my.jetscreenshot.com/12412/20150525-8eru-61kb.jpg

Screenshot of unmenu - disk array status - disk_dsbl?

http://my.jetscreenshot.com/12412/20150524-4pae-150kb.jpg

I've run a SMART check a couple of times: no errors reported.

SMART report, disk attributes & syslog zip attached.

Thanks in advance, guys, for your wisdom and experience!

Disk_Failure.zip


In order to get unRaid to rebuild the drive back onto itself, you've got to stop the array, change disk6 to not installed, and restart the array. Then stop the array again and change disk6 back to what it's supposed to be. When you start the array, it will begin to rebuild the drive.


Thanks, Squid! "Content being reconstructed"... Is this the same process you'd use to upgrade a disk, i.e. if I wanted to replace the 1TB with a 3TB or whatever?

Yes. But your parity drive has to be as large as or larger than the largest data drive.
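A minimal Python sketch of that sizing rule, assuming drive sizes are given in bytes. `upgrade_allowed` and the slot names are hypothetical illustrations, not anything unRaid itself exposes, and the sketch checks only the parity-size rule stated above.

```python
from typing import Dict

TB = 10**12  # drive vendors quote decimal terabytes


def upgrade_allowed(parity_size: int, data_sizes: Dict[str, int],
                    slot: str, new_size: int) -> bool:
    """Hypothetical check: after putting a drive of `new_size` into `slot`,
    is the parity drive still at least as large as every data drive?"""
    proposed = dict(data_sizes)   # copy so the caller's dict is untouched
    proposed[slot] = new_size     # the upgraded drive replaces the old one
    return parity_size >= max(proposed.values())
```

For example, swapping a 1TB disk6 for a 3TB drive is fine behind a 3TB parity drive, but not behind a 2TB one.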

And might I also say, Squid, thanks for not embarrassing me by pointing out that I pulled the SMART report on the wrong drive! Duh... ;) Need to get more sleep!

#youguysaregreat!

Didn't even look at it.

The drive reconstruction ran for a few minutes, then this displayed...

http://my.jetscreenshot.com/12412/20150525-qrq8-120kb.jpg

Disk ID shows this: http://my.jetscreenshot.com/12412/20150525-4zh5-36kb.jpg

But the cool thing is that my drive has been expanded to a 600PB drive!  ;)

http://my.jetscreenshot.com/12412/20150525-svby-48kb

I've replaced disk 6 and started a rebuild with another precleared drive. It immediately said it was rebuilding, then reported a command timeout 3.

http://my.jetscreenshot.com/12412/20150525-fzkq-139kb.jpg

All of these drives came from operational (and error-free) Windows-based systems, and all passed multiple preclears. Why so many errors under Linux?!? Is it more picky or something?


Well... I think this must be a controller problem. I came back after I had swapped a drive, replaced the cables and started the rebuild, and a while after the rebuild started, the very same port is reporting bad data... So perhaps the SATA adapter is bad? Syslog attached.

http://my.jetscreenshot.com/12412/20150525-ze7a-102kb.jpg

I guess this is what you get when you try to use up 'stuff' lying around. Probably better just to sell it on eBay and buy something new... sheesh...

syslog20150525-1.zip


So are Linux drivers just more particular? Or is a better way to say it, less tolerant of marginal hardware? I've had several drives that passed preclear several times go 'bad', yet they format just fine under Windows and don't have any chkdsk issues... Plus there was the side-track trying to get that ASUS P8Z68-V Pro running. I switched to the ASUS M5A97 EVO and it's been working for several days with no issues. Mind you, it's bare and just idling, but I couldn't get the P8Z68 to do that!

Next step is to plant the AOC HBA and see if it pukes...


So are Linux drivers just more particular? Or is a better way to say it, less tolerant of marginal hardware? I've had several drives that passed preclear several times go 'bad', yet they format just fine under Windows and don't have any chkdsk issues.

chkdsk is tolerant of errors that Linux reports. Most of the time the drive's own SMART report is the best indicator of issues, in either Windows or Linux. Since Windows doesn't need 100% of the drive to be flawless, a marginal drive will function just fine in a Windows environment but puke in an unRaid server. unRaid requires 100% accuracy over the entire disk surface* of every drive to properly recover 1 failed drive. *(subject to the actual sizes of the drives involved)
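A rough sketch of checking the SMART report programmatically, assuming the attribute table printed by `smartctl -A /dev/sdX` (smartmontools) is passed in as text. The set of watched attribute IDs is an assumption about counters worth a second look (for example 188 Command_Timeout, or 199 UDMA_CRC_Error_Count, which is commonly associated with cable or connection problems), not an official list. Raw-value formats vary by vendor, so the sketch flags any nonzero number in the raw field.

```python
import re
from typing import Dict

# Assumed watch list (attribute ID -> name); not an official or exhaustive set.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    188: "Command_Timeout",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}


def flag_smart_attributes(smartctl_a_output: str) -> Dict[str, str]:
    """Return {attribute_name: raw_value} for watched attributes whose raw
    value contains any nonzero number."""
    flagged: Dict[str, str] = {}
    for line in smartctl_a_output.splitlines():
        # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        fields = line.split(maxsplit=9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header line or something that isn't an attribute row
        raw = fields[9].strip()
        if int(fields[0]) in WATCHED and any(int(n) > 0 for n in re.findall(r"\d+", raw)):
            flagged[fields[1]] = raw
    return flagged
```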

Many years ago I made the mistake of putting some marginal drives in an unRaid server thinking, "hey, if one fails, no big deal, I'll just rebuild it." That logic failed big time when I had a second failure while trying to rebuild a drive. Now I no longer tolerate any marginal drives in my array. First sign of trouble, it's out of there. I've seen too many customers' drives fail with very little or no warning, so I'm very conservative with my own servers.
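To see why a second problem during a rebuild is fatal, here is a minimal Python sketch of single parity, assuming plain XOR parity across the data drives. Each disk is modelled as a list of equal-sized sectors and `None` stands for a sector that can't be read; all names are hypothetical and this is a toy model of the idea, not unRaid's code. A lost disk is rebuilt sector by sector from parity plus every other disk, so one unreadable sector anywhere else means that sector of the rebuilt disk is gone.

```python
from functools import reduce
from typing import List, Optional

Sector = Optional[bytes]  # None models an unreadable (bad) sector


def xor_sectors(sectors: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length sectors."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))


def build_parity(disks: List[List[bytes]]) -> List[bytes]:
    """Parity sector i is the XOR of sector i on every data disk."""
    return [xor_sectors(list(row)) for row in zip(*disks)]


def rebuild(parity: List[Sector], disks: List[List[Sector]], failed: int) -> List[Sector]:
    """Reconstruct disk `failed` from parity and ALL other disks.
    A sector comes back as None if parity or any surviving disk can't read it."""
    rebuilt: List[Sector] = []
    for i, p in enumerate(parity):
        inputs = [p] + [d[i] for j, d in enumerate(disks) if j != failed]
        if any(s is None for s in inputs):
            rebuilt.append(None)  # a second bad read: this sector is unrecoverable
        else:
            rebuilt.append(xor_sectors(inputs))
    return rebuilt
```

With disk1 missing and every other sector readable, `rebuild` returns disk1 exactly; mark one sector of another disk as `None` and the same sector of disk1 comes back as `None`.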


Preclear (or clear) uses every part of a drive. Building or rebuilding parity or a data disk uses every part of every drive. I don't think Windows knows whether a disk has problems until it tries to read or write data to a part that has problems, which may never happen. Chkdsk only tests for file system problems unless you do a surface scan.
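To illustrate the difference, here is a minimal read-only Python sketch of a "surface scan" that reads every block of a file or device end to end and records the offsets where a read fails, the kind of whole-drive pass that preclear or a rebuild forces and a plain file system check skips. Assumptions: a Unix-like OS (it uses `os.pread`), the function name is hypothetical, and reads go through the OS page cache, so it's an illustration rather than a substitute for preclear or `badblocks`. Pointing it at a raw device such as `/dev/sdX` would need root.

```python
import os
from typing import List, Tuple


def surface_read_scan(path: str, block_size: int = 1 << 20) -> Tuple[int, List[int]]:
    """Read `path` end to end, one block at a time, without writing anything.

    Returns (blocks_checked, bad_offsets), where bad_offsets lists the byte
    offset of every block whose read raised OSError (e.g. EIO on a bad sector).
    """
    bad_offsets: List[int] = []
    blocks_checked = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # file size, or device size on Linux
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                os.pread(fd, length, offset)  # touch every byte of this block
            except OSError:
                bad_offsets.append(offset)
            blocks_checked += 1
            offset += length
    finally:
        os.close(fd)
    return blocks_checked, bad_offsets
```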