[Solved] HDD Failure (or not?)


Probably a stupid question, but after upgrading to unRAID 6 one disk drive is listed as "Faulty" on the dashboard and as "Device is disabled, Contents emulated" on the main page. I thought it might be the SATA cable (it was a little wobbly on the HDD side), so I replaced it with a brand new one. However, the error persists. I'm not sure the disk is really faulty, though: the SMART test returns OK with no errors reported, and a reiserfsck check shows that the filesystem is there and looks healthy.

What is the best approach in this scenario?

 

Thanks D.

 

REISERFS Check Output:

reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/md1
Will put log info to 'stdout'
###########
reiserfsck --check started at Sat Jul 25 22:02:04 2015
###########
Replaying journal:
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
Checking internal tree..  finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
Leaves 390322
Internal nodes 2457
Directories 16848
Other files 143264
Data block pointers 365776590 (23 of them are zero)
Safe links 0
###########
reiserfsck finished at Sat Jul 25 22:19:54 2015
###########

Link to comment

Once a disk has been marked as "disabled" by unRAID, the condition is not cleared until the disk is rebuilt. At this point writes are no longer going to the physical drive: the disk is being 'emulated' from the combination of parity plus the other drives, so you need to do a rebuild to ensure that no data loss occurs.

If you happen to have a spare disk, rebuild onto the spare so that you can later test the "faulty" disk outside the array. If (as in this case) you think the cause was not the disk itself but an external factor, you can rebuild the disk back onto itself. The process in this case is:

1. Stop the array.
2. Unassign the disk that is marked as disabled.
3. Start the array. You will be warned that the array will be unprotected, but this is OK as at this point we are just trying to get unRAID to 'forget' the serial number of the disabled disk.
4. Stop the array.
5. Assign the disk again. You will be told that starting the array will cause a rebuild.
6. Start the array and the rebuild of the disk will start.

When the rebuild completes, the 'disabled' state will be removed.

 

If you are rebuilding onto a spare disk then steps 2 to 4 can be omitted as it is not necessary to get unRAID to 'forget' the serial number of the disabled disk, although following the full process listed above will still work.

Link to comment

Is this rebuild full or incremental? I haven't made any writes at all to the affected disk, so an incremental rebuild should be super fast. If the rebuild has to be full, I would probably take the disk out to get it tested more thoroughly, as you mentioned, and then put it back as a "new" disk and rebuild the array.

 

D.

Link to comment

The rebuild always covers the full disk. unRAID makes no assumptions about the current contents, so for every sector it writes out what it thinks should be there. For most of the disk this means unRAID will simply be overwriting sectors with the same data that is already present, but that does no damage.
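To illustrate why the rebuild touches every sector, here is a minimal Python sketch. It assumes single XOR parity and models each disk as a list of equal-sized byte strings; the names `xor_blocks` and `rebuild_disk` are made up for this example and are not unRAID code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_disk(parity, surviving_disks):
    """Recompute every sector of the missing disk from parity plus the other disks.

    Nothing is assumed about what the target disk currently holds, so every
    sector is recomputed, whether or not it differs from the old contents.
    """
    rebuilt = []
    for sector_index, parity_sector in enumerate(parity):
        others = [disk[sector_index] for disk in surviving_disks]
        rebuilt.append(xor_blocks([parity_sector, *others]))
    return rebuilt


# Toy array (hypothetical data): 3 data disks, 2 sectors each, 4 bytes per sector.
disk1 = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"]
disk2 = [b"\xaa\xbb\xcc\xdd", b"\x00\x00\x00\x00"]
disk3 = [b"\x0f\x0e\x0d\x0c", b"\xff\x00\xff\x00"]
parity = [xor_blocks(sectors) for sectors in zip(disk1, disk2, disk3)]

# Pretend disk1 is the disabled disk and rebuild it from the rest.
rebuilt_disk1 = rebuild_disk(parity, [disk2, disk3])
print(rebuilt_disk1 == disk1)
```

The same XOR is what lets the disabled disk be 'emulated' while it is out of the array.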

 

It is your decision as to whether you simply do the rebuild and see if it completes without error, or remove it to carry out more comprehensive tests before doing the rebuild.  The rebuild is itself quite a comprehensive test so might well be sufficient. 

 

If you go the rebuild-on-to-itself route you should check the SMART attributes at the end to see if everything still looks good.  A non-correcting parity check is also worth doing as it validates that what was written is what is being read back (on the assumption that 0 errors are found).
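As a rough sketch of what a non-correcting check does conceptually (again assuming single XOR parity, with toy in-memory disks and made-up function names rather than anything from unRAID itself), the check recomputes parity from the data disks and only reports mismatches:

```python
def xor_sector(sectors):
    """XOR a group of equal-length sectors into one sector."""
    result = bytearray(len(sectors[0]))
    for sector in sectors:
        for i, byte in enumerate(sector):
            result[i] ^= byte
    return bytes(result)


def non_correcting_check(parity, data_disks):
    """Return indices of sectors whose stored parity does not match the data.

    Nothing is written back, which is what makes the check non-correcting.
    """
    mismatches = []
    for index, stored in enumerate(parity):
        expected = xor_sector([disk[index] for disk in data_disks])
        if expected != stored:
            mismatches.append(index)
    return mismatches


# Hypothetical two-disk array with three 2-byte sectors per disk.
disk1 = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
disk2 = [b"\x10\x20", b"\x30\x40", b"\x50\x60"]
parity = [xor_sector(pair) for pair in zip(disk1, disk2)]

clean = non_correcting_check(parity, [disk1, disk2])

# Simulate a sector on disk1 that does not read back as it was written.
damaged_disk1 = [disk1[0], b"\xff\xff", disk1[2]]
damaged = non_correcting_check(parity, [damaged_disk1, disk2])
print(clean, damaged)
```

An empty mismatch list corresponds to the "0 errors" result mentioned above.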

Link to comment

Did as you said and it looks like everything is just fine. I was confused by the fact that there was no way to get the disk working again without doing a rebuild, which seemed to me a rather extreme way to solve a trivial intermittent problem. I thought there must be some way to check the disk against parity to prove its contents are OK without a rebuild, but apparently not.

 

Thanks

 

D.

Link to comment

The problem is that a disk being disabled means that writes to it have failed, so there is a discrepancy between the data that is actually on the disk and what should be there. unRAID does not keep a record of exactly which sectors failed to write, so the only way to determine which ones could be in error is to go through the whole disk. This is in effect what the rebuild does.

 

It might seem like a trivial error, but if unRAID did not do what it does you would have corruption on the disk with no way to determine what the corruption affected.

 

It might be nice if unRAID had a way of recording exactly which sectors had been written to the emulated disk but not to the physical disk, so that a much faster recovery could be done by rewriting just those sectors. However, this would be a non-trivial enhancement and I suspect it is unlikely to ever happen.
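Purely to illustrate the idea, such a record could look something like the Python sketch below. This is hypothetical: `DirtySectorLog` and `incremental_resync` are invented names, unRAID has no such feature as far as this thread is concerned, and a real implementation would also have to persist the log safely across crashes.

```python
class DirtySectorLog:
    """Hypothetical record of sectors written while the physical disk was disabled."""

    def __init__(self):
        self._dirty = set()

    def record_missed_write(self, sector_index):
        self._dirty.add(sector_index)

    def pending(self):
        return sorted(self._dirty)

    def clear(self):
        self._dirty.clear()


def incremental_resync(physical_disk, emulated_disk, log):
    """Copy only the logged sectors from the emulated disk to the physical one."""
    resynced = log.pending()
    for index in resynced:
        physical_disk[index] = emulated_disk[index]
    log.clear()
    return resynced


# Toy disks: four sectors each, identical until the physical disk drops out.
physical = [b"aa", b"bb", b"cc", b"dd"]
emulated = list(physical)
log = DirtySectorLog()

# Two writes land on the emulated disk while the physical disk is disabled.
for index, data in [(1, b"BB"), (3, b"DD")]:
    emulated[index] = data
    log.record_missed_write(index)

touched = incremental_resync(physical, emulated, log)
print(touched, physical == emulated)
```

Only two of the four sectors are rewritten here, which is the speed-up the poster was hoping for. The cost is that every missed write must be logged reliably, which is where the complexity comes from.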

Link to comment
