Device Disabled, from syslog: was (sde), is now (sds)


Solved by trurl.

Unraid disables a disk when a write to it fails, for whatever reason. After a disk is disabled, it isn't used again until it is rebuilt (or a New Config is done), but the array can continue to function as if the disk were still present.

 

Any read of the disabled disk instead reads all the other disks and gets the data from the parity calculation. Any write to the disabled disk updates parity as if the disk had been written. This is often referred to as emulation.
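To make the emulation idea concrete, here is a minimal Python sketch assuming single parity computed as a byte-wise XOR across the data disks. The disk contents are made-up 4-byte blocks, and the helper names `xor_blocks` and `read_emulated` are hypothetical; a real array does this per sector in the driver, not in Python.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte "sectors" on three data disks.
disks = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(*disks)


def read_emulated(disks: list, parity: bytes, missing: int) -> bytes:
    """Reconstruct the disabled disk's block from parity plus every other disk."""
    others = [d for i, d in enumerate(disks) if i != missing]
    return xor_blocks(parity, *others)


# Reading "disk 1" without touching it: parity XOR disk 0 XOR disk 2.
emulated = read_emulated(disks, parity, missing=1)
```

Every other disk has to be read to answer a single read of the emulated disk, which is why all the remaining disks need to read reliably while a disk is being emulated.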

 

The initial failed write that disabled the disk, and any subsequent writes to the disabled disk, can be recovered by rebuilding. So, if you don't rebuild the disk, anything written to the disk after it became disabled will be lost.
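The same toy XOR model shows why writes made while the disk is emulated exist only in parity. A write to the emulated disk is applied as a parity update (new parity = old parity XOR old data XOR new data), the physical disk is never touched, and only a rebuild puts the new data onto a disk. This is a sketch under the same single-XOR-parity assumption, with hypothetical two-byte blocks and a hypothetical `emulated_write` helper.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def emulated_write(parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Apply a write to an emulated disk purely as a parity update."""
    return xor_blocks(parity, old_data, new_data)


# Hypothetical two-byte blocks on three data disks; disk 1 is disabled.
disks = [b"\x10\x20", b"\x01\x02", b"\xaa\xbb"]
parity = xor_blocks(*disks)
physical_disk1 = disks[1]  # frozen: the disabled disk is no longer written

# Write new data to the emulated disk 1.
old_emulated = xor_blocks(parity, disks[0], disks[2])
new_data = b"\x55\x66"
parity = emulated_write(parity, old_emulated, new_data)

# A rebuild recovers the new data; the physical disk never saw it.
rebuilt = xor_blocks(parity, disks[0], disks[2])
```

If you skip the rebuild and put the physical disk back (for example with New Config), `physical_disk1` is what you get, and the write is lost.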

 

But in your case, given the other issues that might make rebuilding more of a problem, it seems better to just accept the current state of the physical disk.

 

I forgot about disk3 when I was suggesting New Config, though. I recommend leaving that one out since it is failing. Maybe you can copy its data as an Unassigned Device after you get parity rebuilt. If you want, you can move other disks up in the assignments when you do New Config, or just leave disk3 unassigned so you can add a disk there later.

 

Okey-dokey.

 

I guess I will do things step by step: 

1. Stop everything, unmount the array, and spot-check disk8's data

2. New Config, restoring disk8 in place and removing disks 10 to 14

2b. Fine-tune notifications so I notice sooner when an error occurs

3. Rebuild parity1 (I can't stop services, so this will be a long and delicate step; if a disk fails during it I will lose that disk's data, and disk3 is at risk)

4. Schedule and run extended SMART self-tests on all disks

5. Replace disk3 with a new one

6. Move data from disk2 to other disks in the same share (3-6)

7. Make disk2 (20TB) a second parity disk (I really hope that one day I'll be able to have more than one (two) Unraid arrays in one installation)

8. Continue moving data onto the "backup" share, adding disks when needed

Edited by skler
27 minutes ago, skler said:

PS: is it possible to back up the current config when I create a new config?

Your configuration is on the boot flash in the config folder. This includes your disk assignments, and everything else configured in the webUI. You must always have a current backup of your configuration.
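A minimal sketch of taking such a backup by hand before a New Config, assuming the flash drive is mounted at `/boot` (so the config folder is `/boot/config`). The function name `backup_config` and the archive naming are hypothetical; it only uses Python's standard `shutil.make_archive`.

```python
import shutil
import time
from pathlib import Path


def backup_config(config_dir: str = "/boot/config", dest_dir: str = ".") -> Path:
    """Zip config_dir into dest_dir as a timestamped archive and return its path.

    Assumes the Unraid flash is mounted at /boot; the archive name is hypothetical.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = Path(dest_dir).resolve() / f"unraid-config-{stamp}"
    # make_archive appends ".zip" and returns the full archive path as a string.
    archive = shutil.make_archive(str(base), "zip", root_dir=config_dir)
    return Path(archive)
```

Pointing `dest_dir` at storage other than the flash drive itself keeps the copy usable if the flash drive is the thing that fails.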

 

But after New Config and starting the array to rebuild parity, it's not clear what point there would be in going back to a previous disk assignment configuration.

Some critical information is clipped at right in your screenshot.

 

The Errors column would show if there were I/O errors going on during rebuild.

 

More importantly, the FS, Size, Used, and Free columns. If all disks are mounting correctly, these will show that information, and hopefully it is as expected. But if a disk is unmountable, those columns would show that it is unmountable instead, and the filesystem of the disk would need repair. Any disk that hasn't been formatted yet is also unmountable, so in that case it would be expected to display as unmountable.

 

All this is an excuse to explain some more details about how things work.

 

You must never format an unmountable disk in the array if the disk is supposed to have data on it.

 

Format is a write operation. It writes an empty filesystem to the disk. Unraid treats this write operation just as it does any other, by updating parity so the array will be in sync.

 

So, if you format a disk in the array, parity agrees the disk has an empty filesystem and any data it might have had can't be recovered from parity.

 

This also applies if the disk is disabled and being emulated by parity. When you rebuild a disk, the rebuilt disk will have the contents of the emulated disk, assuming everything works correctly. So, if you format a disabled/emulated disk in the array, the only thing that can be rebuilt is an empty filesystem.
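Continuing the same kind of toy XOR model (single parity assumed, hypothetical blocks), this sketch treats a format of the emulated disk as an ordinary write of an "empty filesystem" block. Parity is updated to match, so the rebuild faithfully reproduces the empty filesystem and the original data is gone. `EMPTY_FS` is a stand-in, not what a real filesystem looks like on disk.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Stand-in for "a freshly written empty filesystem" (hypothetical, not a real header).
EMPTY_FS = b"\x00\x00\x00\x00"

disk0 = b"\x11\x22\x33\x44"
disk1 = b"\xde\xad\xbe\xef"  # real data on the disk that is now disabled/emulated
parity = xor_blocks(disk0, disk1)

# Formatting the emulated disk1 is just another write, so parity is updated.
emulated_before = xor_blocks(parity, disk0)  # what the emulated disk held
parity = xor_blocks(parity, emulated_before, EMPTY_FS)

# Rebuilding disk1 now reproduces exactly what parity agrees with: the empty filesystem.
rebuilt = xor_blocks(parity, disk0)
```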

 

It is a common mistake to think an unmountable disk should be formatted so that it can then be recovered from parity. Don't.

 

Usually, if a disabled/emulated disk is unmountable, you would check filesystem to repair the filesystem of the emulated disk before rebuilding it.

A 20TB parity rebuild will take more than a day. My usual estimate is 2-3 hours per TB, so more like 2 full days.

 

I know my 8TB monthly parity check takes about 16 hours, which I let run 6 hours at a time during the night when nobody is using the server, so it finishes on the third night.
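The arithmetic behind these estimates, as a small sketch. The 2-3 hours per TB figure is the rule of thumb from the post, not a measured rate for any particular hardware, and the function names are hypothetical.

```python
import math


def rebuild_hours(size_tb: float, hours_per_tb: float) -> float:
    """Estimated wall-clock hours for a full parity sync/check of size_tb."""
    return size_tb * hours_per_tb


def nights_needed(total_hours: float, hours_per_night: float) -> int:
    """How many nightly windows it takes if the operation is paused between them."""
    return math.ceil(total_hours / hours_per_night)


# 20 TB parity at the 2-3 h/TB rule of thumb.
low, high = rebuild_hours(20, 2), rebuild_hours(20, 3)  # 40 and 60 hours
low_days, high_days = low / 24, high / 24  # about 1.7 and 2.5 days

# The 8 TB check: about 16 h (roughly 2 h/TB), run 6 h per night.
nights = nights_needed(16, 6)  # 3
```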

 

Post new diagnostics if things don't seem to be going well, or if you just want us to take a look.

5 minutes ago, trurl said:

Some critical information is clipped at right in your screenshot.

 

The Errors column would show if there were I/O errors going on during rebuild.

 

OK, at the moment everything is fine, but I only started a couple of minutes ago.

[Screenshot: array status, 2024-01-08 19:17:52]

5 minutes ago, trurl said:

This also applies if the disk is disabled and being emulated by parity. When you rebuild a disk, the rebuilt disk will have the contents of the emulated disk, assuming everything works correctly. So, if you format a disabled/emulated disk in the array, the only thing that can be rebuilt is an empty filesystem.

 

It is a common mistake to think an unmountable disk should be formatted so that it can then be recovered from parity. Don't.

 

Usually, if a disabled/emulated disk is unmountable, you would check filesystem to repair the filesystem of the emulated disk before rebuilding it.

 

Great advice! I think I now understand a bit of how parity works, and what rebuilding in place and rebuilding to a new disk each do. What I'm less clear on is error handling: doesn't parity take care of errors? If an error occurs, is parity in error too and no longer usable?

 

Another thing is the disk report: if a disk is about to fail, can that be sent as a notification in the array report?

 

4 minutes ago, trurl said:

Post new diagnostics if things don't seem to be going well, or if you just want us to take a look.

 

Things seem good. By the way, diagnostics are attached (I rebooted before creating the new config).

littleboy-diagnostics-20240108-1924.zip

Often if there is filesystem corruption, parity will agree with that. If the disk is disabled/emulated and the filesystem is corrupt, then corrupt filesystem is definitely what the array agrees with since the physical disk isn't being used.

 

If something is out of sync, so that the emulated filesystem appears corrupt, there is no way to know which disk is to blame, since parity doesn't know. I have seen cases where the emulated filesystem was corrupt because another disk in the array wasn't being read reliably, while the filesystem on the physical disk was not corrupt, though it may have been missing some writes made since it was disabled.
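A toy illustration of why parity can't say which disk is to blame, again assuming single XOR parity and made-up one-byte blocks. The same bit flip on two different disks produces exactly the same mismatch against parity, so a check can detect that something is out of sync but not where. The `syndrome` helper is a hypothetical name for "data XOR parity".

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def syndrome(data_disks: list, parity: bytes) -> bytes:
    """Bits where the data disks disagree with parity (all zero means in sync)."""
    return xor_blocks(parity, *data_disks)


disks = [b"\x0f", b"\xf0", b"\x33"]
parity = xor_blocks(*disks)

# The same single-bit error, once on disk 0 and once on disk 1 (hypothetical).
bad_on_disk0 = [b"\x0e", b"\xf0", b"\x33"]
bad_on_disk1 = [b"\x0f", b"\xf1", b"\x33"]

s0 = syndrome(bad_on_disk0, parity)
s1 = syndrome(bad_on_disk1, parity)
# s0 == s1: parity sees a mismatch in the same bit but cannot say which disk caused it.
```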

 

Best bet is to repair the emulated filesystem then rebuild. If the emulated filesystem is mountable, the rebuild should be mountable assuming rebuild goes well.

 

Sometimes if we aren't reasonably sure about how well the system is working, we will suggest rebuilding to another disk instead of on top of the same disk. That way the original physical disk is still available with its contents in case of problems rebuilding.


Unraid monitors certain SMART attributes and if they increase, you will get a warning. The global default for these monitored attributes is in Disk Settings. Also, each disk has overrides for these global settings. We usually recommend adding SMART attributes 1 and 200 for monitoring on all WD disks because the way the WD firmware represents these particular attributes is a simple counter which is easier to monitor than the way other manufacturers represent these.

 

If you do get a SMART warning on the Dashboard page you can click on it to Acknowledge and it will warn again if it increases.
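A rough model of the monitor-and-acknowledge behaviour described above. The attribute IDs in `GLOBAL_ATTRS` are an illustrative set, not a copy of Unraid's actual defaults (check those in Disk Settings), and `DiskMonitor` is a hypothetical class, not Unraid's own implementation.

```python
from dataclasses import dataclass, field

# Illustrative attribute IDs only; check Disk Settings for Unraid's real defaults.
# 5 Reallocated_Sector_Ct, 187 Reported_Uncorrect, 188 Command_Timeout,
# 197 Current_Pending_Sector, 198 Offline_Uncorrectable, 199 UDMA_CRC_Error_Count
GLOBAL_ATTRS = {5, 187, 188, 197, 198, 199}
# Suggested extras for WD disks: 1 Raw_Read_Error_Rate, 200 Multi_Zone_Error_Rate
WD_EXTRA = {1, 200}


@dataclass
class DiskMonitor:
    """Hypothetical per-disk monitor: warn when a raw counter rises past its acknowledged value."""

    attrs: set
    acknowledged: dict = field(default_factory=dict)

    def check(self, raw: dict) -> list:
        """Return the monitored IDs whose raw value is above the last acknowledged value."""
        return sorted(a for a in self.attrs if raw.get(a, 0) > self.acknowledged.get(a, 0))

    def acknowledge(self, raw: dict) -> None:
        """Record current values so only further increases warn again."""
        for a in self.attrs:
            self.acknowledged[a] = raw.get(a, 0)
```

`DiskMonitor(GLOBAL_ATTRS | WD_EXTRA)` models a WD disk with the per-disk override applied; after `acknowledge`, the same values stay quiet and only a further increase warns again.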

 

UDMA CRC ERRORS are almost always connection problems. I typically Acknowledge the occasional CRC error, maybe reseat the plug next time I'm in the case, and investigate further if they keep increasing or increase by a lot.

 

Reallocated sectors: it depends on how many there are and how fast they increase. Disks have some spare sectors for this purpose, but if the count gets too large or increases too fast, replace the disk.

 

Pending sectors can be more troublesome. These are sectors that couldn't be reliably read, and will be reallocated next time they are written. Sometimes these will take care of themselves, sometimes it might be worth rebuilding the disk just to ensure the rewrite/reallocation happens.
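The rules of thumb in the last three paragraphs can be written as a small triage function. The numeric thresholds are hypothetical placeholders chosen only to make the sketch runnable; the post deliberately gives no numbers, so pick your own.

```python
# Hypothetical placeholder thresholds: tune these to your own comfort level.
CRC_JUMP = 10
REALLOC_MAX = 50
REALLOC_JUMP = 10


def triage(attr_id: int, previous: int, current: int) -> str:
    """Rough reading of one SMART raw counter, following the rules of thumb above."""
    delta = current - previous
    if delta <= 0:
        return "no change"
    if attr_id == 199:  # UDMA_CRC_Error_Count: almost always the connection
        return "acknowledge, reseat cable" if delta < CRC_JUMP else "investigate connection"
    if attr_id == 5:  # Reallocated_Sector_Ct: spare sectors are finite
        if current < REALLOC_MAX and delta < REALLOC_JUMP:
            return "acknowledge, keep watching"
        return "replace disk"
    if attr_id == 197:  # Current_Pending_Sector: reallocated on the next write
        return "watch; consider rebuilding to force the rewrite"
    return "review manually"
```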

4 minutes ago, trurl said:

UDMA CRC ERRORS are almost always connection problems

These don't necessarily even cause I/O errors. The disk firmware says the data received doesn't pass a checksum test. The data is usually resent so things continue to work just fine.

 

And connection problems often will not result in CRC errors, because the disk never received any data to do the checksum on.
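A toy model of the two cases, using Python's `zlib.crc32` as a stand-in for the interface checksum (the real link protocol and its retry logic are more involved). A corrupted frame fails the check and is resent, so the transfer still succeeds while the CRC counter goes up; a dead link delivers nothing to check, so it fails with an I/O error and no CRC error is counted. The `transfer` function and its parameters are hypothetical.

```python
import zlib


def transfer(payload: bytes, corrupt_attempts: int = 0, link_up: bool = True,
             max_retries: int = 3) -> tuple:
    """Toy link: send a non-empty payload with a CRC-32, resend on mismatch.

    Returns (data, crc_error_count). Raises OSError if nothing gets through.
    """
    if not link_up:
        # Nothing arrives, so there is nothing to checksum: no CRC error is counted.
        raise OSError("no response from disk")
    expected_crc = zlib.crc32(payload)
    crc_errors = 0
    for attempt in range(max_retries + 1):
        received = payload
        if attempt < corrupt_attempts:
            # Flip one bit in transit to simulate a marginal cable or connector.
            received = bytes([payload[0] ^ 0x01]) + payload[1:]
        if zlib.crc32(received) == expected_crc:
            return received, crc_errors
        crc_errors += 1  # the kind of event a UDMA CRC error counter records
    raise OSError("too many CRC errors")
```

For example, `transfer(b"data", corrupt_attempts=2)` still returns the data, with two CRC errors recorded along the way.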


A disk can become disabled without triggering any warning for the SMART attributes.

6 minutes ago, trurl said:

connection problems often will not result in CRC errors

But you will still get a warning because the disk became disabled.
