Drive DISABLED all of a sudden, HELP - no idea what to do


I'm a medical professional, requesting your smart people's help!

 

Unraid 6.8.3

Setup: 3 x Seagate 6 TB drives + 1 x WD 4 TB drive + a cache SSD. One 6 TB drive is the parity drive. The disabled drive is "Disk3", the most recent drive that I added to the server.

 

All of a sudden, one 6 TB drive is "disabled" and in "emulation mode". I can read files that are on that drive, but I'm assuming "emulation mode" means that's only possible because of the parity drive.

 

I don't know enough about any of this so please bear with my ignorance.  I looked around some of the posts and here's what I did

 

1.   Rebooted a couple of times from the GUI - no change

2.  Unplugged the SATA cable and reconnected it, then tried a different SATA cable. No difference

3.  I downloaded the SMART file for that drive and it's attached here - from what I can glean it says PASSED 

4.  Going through some of the posts here there was a suggestion to follow the steps outlined here https://wiki.unraid.net/UnRAID_6/Storage_Management#Checking_a_File_System  

5.  So I went into "maintenance mode", clicked on the drive in question, then clicked the Check button to run the file system check. I copied the output into a text document, which is attached

 

I have no idea what to do now - please help. Has the drive gone bad?

 

Thank you in advance

 

klingon-smart-20210303-2130.zip filecheck.txt

 

2021-03-03_21-50-44.jpg


Disk3's SMART report looks OK and, as you mentioned, the disk is being emulated successfully. Since you rebooted before getting us the diagnostics, we can't see any details about what disabled the disk, except of course that a write to it failed, which is the reason a disk gets disabled.

 

Check all connections, power and SATA, both ends including splitters.

 

Then you will have to rebuild the disk since it is out-of-sync and the emulated contents need to be written back to the disk.

 

Safest approach is to rebuild to another disk and keep the original as it is in case of problems during rebuild. But should be OK to rebuild to the same disk.

 

Do you have backups of everything important and irreplaceable?


Hi - thank you for the reply

 

Yes, I'm realizing I should have gotten diagnostics before rebooting. I guess the instinct to "turn it off and on" got the better of me

 

I have everything backed up.  

 

I guess my next (stupid) question is what are the exact steps to do the rebuild?  Can you provide an "idiot's guide"?

 

Thank you in advance


Thank you.  I looked at the article you referred to

 

Quick question regarding the steps:

 

1. Stop array

2. Unassign disabled disk

3. Start array so the missing disk is registered

 

Does "unassign disabled disk" mean change disk3 to "no device" as shown in the picture?

 

Thank you 

 

 

image.thumb.png.a8f64e32df9e346d66426a61d1a2d84e.png


Thank you.  I followed the steps and as soon as I made the drive unassigned this error popped up.  I continued and now have it doing a "rebuild".  Is this error message suggesting a drive failure?

 

Thank you immensely for the help

 

One other question - while this rebuild is happening, should I refrain from using Plex or writing anything to the server, or is it OK to do so?

 

2021-03-04_0-29-20.jpg


CRC errors are connection issues not disk issues. That single CRC error was already there in your previous diagnostics. You can acknowledge that count by clicking on the SMART warning on the Dashboard page and it will warn you again if it increases.

 

Accessing files while rebuilding is allowed but will slow down rebuild and slow down file access.
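The "acknowledge the count, warn again if it increases" behaviour can be sketched roughly as below. This is only an illustration, not Unraid's own code: the `SAMPLE` text is a made-up excerpt in the column layout that `smartctl -A` typically prints for SATA drives, and the function names `crc_error_count` and `needs_warning` are hypothetical.

```python
# Made-up excerpt in the usual `smartctl -A` column layout (not from the attached report).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1234
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
"""


def crc_error_count(smart_text):
    """Return the raw UDMA_CRC_Error_Count value, or None if the attribute is absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last one.
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])
    return None


def needs_warning(current, acknowledged):
    """Warn only when the counter has grown past the value the user acknowledged."""
    return current > acknowledged


acknowledged = crc_error_count(SAMPLE)       # user acknowledges the existing count
print(needs_warning(acknowledged, acknowledged))      # count unchanged, no new warning
print(needs_warning(acknowledged + 1, acknowledged))  # count increased, warn again
```

The point of comparing against the acknowledged value is that the counter never resets, so a single old error stays harmless while any new error, which hints at a still-flaky connection, stands out.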

1 hour ago, SavellM said:

Similar issue to myself.

According to his description, rebooting was tried after the disk was already disabled. No evidence rebooting caused the disabled disk. In fact, since it was a disk just added, bad connection is the likely cause.


So after 15 hours of "rebuilding" it's all good! Disk3 is back online and I got these messages

image.thumb.png.36c7dedd374aa6d22da448efb858a9cb.png

 

I'd like to thank everyone for their suggestions, especially @trurl. I really, really appreciate the time you took to reply

 

Moving forward, can I pick your brains on

1.  Why this might have happened (if I understand correctly, this wasn't a hard drive hardware issue)
2.  More importantly, how to prevent it from happening again?

 

Thank you 

1 hour ago, ptcadoc said:

pick your brains

Connection issues are much more common than drive issues. Make sure the SATA and power cable connectors fit squarely on the disk connections, with no tension on the cables that might cause them to move. Do not bundle cables to try to make things look nice; that can cause the tension I mentioned, or make moving one cable disturb others. All you need is to have it neat enough for good airflow.

 

Other connections along the way, such as power splitters, can be involved. And sometimes cables are bad.

 

Some controllers also have issues but these are usually confined to specific models.

So let me see if I understood correctly

 

This could still have been a cable/connection issue? But when I unplugged the SATA cable and plugged it back in, the drive still remained "disabled". Doesn't that suggest the cable was NOT the issue?

 

Do I have ways to prevent this from happening again?

 

Thank you 


Here is how this works.

 

When a write to a disk fails, Unraid disables it. But that failed write is still used to update parity.

 

After a disk is disabled, it is no longer used. Instead, it is emulated from the parity calculation. The data can be read from the parity calculation, and writes to the emulated disk happen by updating parity. But the physical disk isn't used because it is out-of-sync with the parity calculation.

 

When you rebuild the disk, the emulated contents are written to the disk and it becomes enabled again because it is in-sync after the rebuild. The failed write that disabled the disk, and any subsequent writes to the emulated disk, are recovered.
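A toy model may make the sequence easier to follow. It is only a sketch under loud assumptions: each "disk" is a single integer, parity is a plain XOR of the data disks, and the class `ToyArray` and its methods are invented for illustration. It is not how Unraid is implemented internally.

```python
from functools import reduce
from operator import xor


class ToyArray:
    """Toy single-parity array: each 'disk' holds one integer, parity is their XOR."""

    def __init__(self, data):
        self.disks = list(data)             # physical contents of each data disk
        self.parity = reduce(xor, data, 0)  # parity "disk"
        self.disabled = None                # index of the disabled disk, if any

    def emulated(self, i):
        """Compute disk i from parity plus every other data disk."""
        others = (d for j, d in enumerate(self.disks) if j != i)
        return reduce(xor, others, self.parity)

    def read(self, i):
        return self.emulated(i) if i == self.disabled else self.disks[i]

    def write(self, i, value, fails=False):
        old = self.read(i)
        self.parity ^= old ^ value          # parity is updated even if the disk write fails
        if fails:
            if self.disabled not in (None, i):
                raise RuntimeError("single parity cannot cover a second failed disk")
            self.disabled = i               # a failed write disables the disk
        if i != self.disabled:
            self.disks[i] = value           # a disabled disk is never touched

    def rebuild(self):
        """Write the emulated contents back to the disk and re-enable it."""
        i = self.disabled
        self.disks[i] = self.emulated(i)
        self.disabled = None


array = ToyArray([5, 9, 12])
array.write(2, 7, fails=True)         # the failed write disables disk index 2
array.write(2, 3)                     # a later write only updates parity
print(array.disks[2], array.read(2))  # stale physical value vs. emulated value
array.rebuild()
print(array.disks[2], array.disabled)
```

After the failed write the physical disk still holds its stale value while reads return the emulated one; the rebuild copies the emulated value back, which is why both the failed write and the later writes are recovered.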

 


I see, thank you. That makes sense now

 

For my own learning/understanding, how long could this "emulated mode" have continued? In other words, could I have continued to "write" to Disk3 for quite a while and, if so, at what point would things have finally broken down?


It could continue indefinitely, but since the disk was disabled, you no longer had any redundancy.

 

Parity is a common concept in computers and communications. It is basically the same idea wherever it is used. Parity is just an extra bit that allows a missing bit to be calculated from all the other bits. Parity by itself can recover nothing; all the other disks are required.

 

If there had been a problem with another disk, you would have had no way to recover, since you already had a missing disk (though since each disk is independent in Unraid, nothing on the other disks would be lost). Depending on the exact situation, in that case you might have lost some data.
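The "extra bit" idea can be shown in a few lines. This sketch assumes the usual textbook form of parity, a bitwise XOR across all data disks; the helper names `parity_of` and `recover_missing` and the 4-bit example values are made up for illustration.

```python
from functools import reduce
from operator import xor


def parity_of(values):
    """XOR of all values: each parity bit makes the count of 1s in that bit position even."""
    return reduce(xor, values, 0)


def recover_missing(surviving, parity):
    """Recover the one missing value from parity plus ALL the surviving values."""
    return parity_of(surviving) ^ parity


disks = [0b1011, 0b0110, 0b1100]   # three tiny 4-bit "data disks"
parity = parity_of(disks)

# One disk missing: parity plus every other disk gives it back exactly.
print(recover_missing([disks[0], disks[2]], parity) == disks[1])

# Two disks missing: parity only tells us the XOR of the two missing values,
# so every possible 4-bit value for one of them is still consistent.
xor_of_missing = parity ^ disks[0]
candidates = [(a, xor_of_missing ^ a) for a in range(16)]
print(len(candidates), (disks[1], disks[2]) in candidates)
```

With one disk gone the answer is unique; with two gone there are as many consistent answers as there are possible values, which is why a single parity disk gives no redundancy once a disk is already disabled.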

Got it.  I'm impressed (from an uninformed person's perspective) with Unraid.  Previously I had a FreeNAS server and had nothing but problems including a failed disk rebuild after a drive failed and repeated glitches so I finally switched around 6 months ago.  With Unraid, a glitch happened and got rectified easily, although only after I sought help as I didn't know how to go about it.

 

Thanks again

6 hours ago, ptcadoc said:

For my own learning/understanding, how long could this "emulated mode" have continued? In other words, could I have continued to "write" to Disk3 for quite a while and, if so, at what point would things have finally broken down?

Do you have some kind of notifications set up so that you can act quickly if Unraid detects an issue?

You can set up various types of notifications, from email to push notifications on a phone.

Better to tackle a problem early than to wait and find yourself in a bad situation with several undetected failures.
