Disk in error state (disk dsbl) but passes SMART


I'm fairly new to Unraid and just had an 8TB drive go bad. I have one 10TB parity drive, so the data is okay.

 

I got the following two notifications:

 

Unraid array errors: 12-05-2020 07:56

Warning [TOWER] - array has errors
Array has 1 disk with read errors

 

Unraid Disk 13 error: 12-05-2020 07:56

Alert [TOWER] - Disk 13 in error state (disk dsbl)

 

I ran both a short and an extended SMART test, and both passed. There are no entries in the SMART error log, and I see no failures under the disk attributes. However, when I view the array, the drive shows 16 errors.

 

The drive may be a couple of years old, but it wasn't opened and powered on until a little over four months ago when I set my system up. I have had other random issues in the past that caused unexpected restarts. I'm not sure whether that could have been a factor in this instance.


Can someone please review the diagnostics and provide any insights? I have not restarted since I saw the notification. I'm in the process of emptying another 8TB drive to move into this slot in case the drive is indeed bad.

 

tower-diagnostics-20200514-2055.zip


Disk13 looks OK. It seems to have been disabled a couple of days ago when writes to it failed. It may just be a bad connection, so you should check that.

 

It will have to be rebuilt to get it enabled again since it is now out of sync. You can either rebuild to the same disk or to another. If you rebuild to another then you can keep the original as a backup in case there is any problem rebuilding, but it is probably safe to rebuild to the same disk if all of your other disks are OK.

 

Do any have SMART warnings on the Dashboard?

 

 

1 hour ago, trurl said:

Do any have SMART warnings on the Dashboard?

19 minutes ago, Dradder1 said:

On the dashboard the drive shows as being healthy but disabled.

I meant do ANY of your disks have SMART warnings. ALL disks are needed to rebuild the disabled disk.

 


All other drives in the array are healthy. I checked my cache drive (SSD) and it does show a SMART error on the Dashboard.

 

I set up my shares so that the first x disks are allocated to movies and the last x disks to TV shows. I checked the contents of the cache drive: it has some empty TV show folders but no files at all. At one point I enabled the cache drive for movies and TV shows to help speed up importing data. After I had a couple of issues with my SSD not fully emptying, I disabled cache use in both shares.

2 hours ago, Dradder1 said:

All other drives in the array are healthy

I should have said all disks in the array are needed to rebuild, so you are good on that.

2 hours ago, Dradder1 said:

cache drive (SSD) and it does show a SMART error

Looks like a CRC error. That is usually a connection issue, but since it is just the one, I would simply acknowledge it by clicking on that warning on the Dashboard. It will not warn you again unless the count increases.
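The acknowledge behaviour described here can be pictured as comparing the current raw CRC count against the value that was last acknowledged. This is only a minimal Python sketch of that idea, not Unraid's actual code; the function name and the sample counts are hypothetical.

```python
def should_warn(current_count: int, acknowledged_count: int) -> bool:
    """Warn only when the raw CRC error count has grown since it was acknowledged."""
    return current_count > acknowledged_count


# Hypothetical count at the moment the Dashboard warning was clicked.
acknowledged = 1

print(should_warn(1, acknowledged))  # count unchanged since acknowledgement
print(should_warn(2, acknowledged))  # count has gone up
```

A count that keeps climbing after being acknowledged would point back at the cable or connector rather than at the drive itself.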


Thanks for the direction. I'm in the process of using Unbalance to move files off the disabled drive. Once done I'll proceed with the steps provided:

 

Stop array

Unassign disabled disk

Start array with disabled disk unassigned

Stop array

Reassign disabled disk

Start array to begin rebuild

8 hours ago, Dradder1 said:

I'm in the process of using Unbalance to move files off the disabled drive. Once done I'll proceed with the steps provided

Why?

 

While the disk is disabled, it is not used at all. It is emulated by the parity calculation, which reads all of the other disks. So moving files from the disk actually works all of the disks harder than usual, and your array is currently unprotected since you only have single parity and a disabled disk.

 

Also, the whole point of rebuilding is to get the disk back with all of its data intact. So, you are only making your system work harder, and you are unprotected until the rebuild is complete, and you are delaying that rebuild.

 

Your data is actually safer if you go ahead with the rebuild instead of moving data from the emulated disk to other disks in the array.
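To illustrate why every other disk is read while the disk is emulated, here is a minimal Python sketch of single-parity reconstruction using byte-wise XOR. The tiny 4-byte "disks" and the helper name `xor_blocks` are made up for illustration; this is a sketch of the general technique, not Unraid's implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte "disks"; real disks are just much longer byte sequences.
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAB, 0xCD, 0xEF, 0x01])
disk13 = bytes([0xDE, 0xAD, 0xBE, 0xEF])  # the disk that gets disabled

# Parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk13])

# With disk 13 disabled, its contents are recomputed from parity plus
# every remaining data disk. Each emulated read touches all of them.
emulated = xor_blocks([parity, disk1, disk2])
assert emulated == disk13
```

In this sketch, losing any second disk before the rebuild finishes would leave the XOR with two unknowns, which is why the array counts as unprotected in the meantime.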


Does it matter if I've not had a parity check in more than a couple of weeks?

 

If not, and to make sure I do it correctly, are the steps I've listed below right?

 

1. Stop array

2. Unassign disabled disk - Do I a) set "no device" for the disabled disk in the Main tab, or b) exclude the disabled disk under the Shares tab where I assigned the drive?

3. Start array with disabled disk unassigned

4. Stop array

5. Reassign disabled disk - In Main tab do I change from "no device" to the same drive/serial no.?

6. Start array to begin rebuild

21 hours ago, Dradder1 said:

Does it matter if I've not had a parity check in more than a couple of weeks?

If your last parity check had zero errors, then it should be fine. If it didn't, why didn't you do something about it then? Exactly zero parity errors is the only acceptable result.
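As a rough illustration of what a parity check verifies, the sketch below counts the positions where the stored parity byte disagrees with the XOR of the data disks. It is a simplified Python model under the assumption of single XOR parity; the function name and the sample bytes are hypothetical.

```python
def count_parity_errors(data_disks, parity):
    """Count positions where stored parity differs from the XOR of the data disks."""
    errors = 0
    for position, parity_byte in enumerate(parity):
        expected = 0
        for disk in data_disks:
            expected ^= disk[position]
        if expected != parity_byte:
            errors += 1
    return errors


# Hypothetical 3-byte disks.
data = [bytes([1, 2, 3]), bytes([4, 5, 6])]
good_parity = bytes([1 ^ 4, 2 ^ 5, 3 ^ 6])
stale_parity = bytes([1 ^ 4, 2 ^ 5, 0])  # last byte is out of sync

print(count_parity_errors(data, good_parity))
print(count_parity_errors(data, stale_parity))
```

Any position counted as an error here would be rebuilt with the wrong byte on the replacement disk, which is why nothing other than zero is acceptable before a rebuild.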

 

21 hours ago, Dradder1 said:

1. Stop array

2. Unassign disabled disk - Do I a) set "no device" for the disabled disk in the Main tab, or b) exclude the disabled disk under the Shares tab where I assigned the drive?

User shares are completely irrelevant for this, and in fact, as far as user shares are concerned, the emulated disk still has all its files.

3. Start array with disabled disk unassigned

4. Stop array

5. Reassign disabled disk - In Main tab do I change from "no device" to the same drive/serial no.?

6. Start array to begin rebuild

 


There was a section on the troubleshooting wiki called "Re-enable the drive", for when you want to re-use a drive that was reported bad but that you are confident is good. I used this to confirm the steps about setting the drive to "no device" without actually removing it physically. They were the same steps you provided earlier.

https://wiki.unraid.net/Troubleshooting#Re-enable_the_drive

 

The data rebuild took 1 day 8 hours but finished successfully, re-using the same drive. Before I started the process, I shut down and made sure all cables were securely connected to all drives, which they appeared to be.

 

On a side note, apart from my drives, all other internal components are at least 8 years old and, sure enough, after completing the data rebuild the server became unresponsive. I connected a monitor and got no signal. It looks like I'll be replacing the internal components very soon, but that's another topic.

 

Thank you for your help with this. I appreciate it.
