Device Disabled, from syslog: was (sde) is now (sds)

skler · January 7

emhttpd: error: hotplug_devices, 1713: No such file or directory (2): tagged device WDC_WD60EFRX-68MYMN1_WD-WX11DB4H8T33 was (sde) is now (sds)

is it possible to restore the device? data seems available

littleboy-diagnostics-20240107-1006.zip

JorgeB · January 7

Looks more like a power/connection issue, disk14 read errors on the other hand look like an actual disk problem, you should run an extended SMART test.

skler · January 7

the compromised disk is: disk8, actually is no more included into the array:

and recognized as sds:

if I stop the array and change the disk8 from sde to sds is a good practice? or I should reboot everything?

Edited January 7 by skler
added more infos

JorgeB · January 7

Like mentioned that looks more like a power/connection problem, check/replace cables and if the emulated disk is mounting you can rebuild on top

trurl · January 7

Post new diagnostics after fixing cables

skler · January 7

sorry jorge but I don't understand, what you mean with rebuild on top?

the procedure is:
- restart unraid

- don't start the array

- change the disk8 with sds or sde if after reboot is recognized again as sde

- start the array

- start the parity check

is this right?

I can do something without stopping the service, and stopping the array?

skler · January 7

1 minute ago, trurl said:

Post new diagnostics after fixing cables

I don't have cables, is the backplane of the Dell R740xd

trurl · January 7

Previous diagnostics shows disabled/emulated disk8 was mounting, so it should be OK to rebuild on top assuming no other problems. But some of your disks are pretty old. Do any show SMART warnings ( 👎 ) on the Dashboard page?

49 minutes ago, JorgeB said:

disk14 read errors on the other hand look like an actual disk problem, you should run an extended SMART test.

trurl · January 7

1 minute ago, trurl said:

assuming no other problems. But some of your disks are pretty old. Do any show SMART warnings ( 👎 ) on the Dashboard page?

Every bit of every array disk must be reliably read to reliably rebuild a missing or disabled disk.

trurl · January 7

Forget about the sds vs sde, there is nothing you can do about it anyway, and Unraid doesn't care. It only cares about the serial numbers and whether a disk got disconnected. And it did disconnect, which is why sd changed when it reconnected.

Do you have another copy of anything important and irreplaceable?

5 minutes ago, trurl said:

rebuild on top

https://docs.unraid.net/unraid-os/manual/storage-management/#rebuilding-a-drive-onto-itself

skler · January 7

3 minutes ago, trurl said:

Previous diagnostics shows disabled/emulated disk8 was mounting, so it should be OK to rebuild on top assuming no other problems. But some of your disks are pretty old. Do any show SMART warnings ( 👎 ) on the Dashboard page?

yes, I have some but this was healthy

errors btw are not "so bad" I guess, just few sector relocation (for ex. disk3):

skler · January 7

4 minutes ago, trurl said:

Do you have another copy of anything important and irreplaceable?

no, I have an zfs snapshot on another disk and the parity

6 minutes ago, trurl said:

https://docs.unraid.net/unraid-os/manual/storage-management/#rebuilding-a-drive-onto-itself

Ok I will follow this guide. Just one question, the reconstruct process will preserve existing files or will format the device? I mean will be something fast if is all in place or not?

trurl · January 7

Disk 3 has this:

# 1  Extended offline    Completed: read failure       90%     49686         5011200

so it needs replacing

Disk4 and disk7 haven't had an extended self-test.

And, as already mentioned

1 hour ago, JorgeB said:

disk14 read errors on the other hand look like an actual disk problem, you should run an extended SMART test.

I would worry whether you can safely rebuild anything with these other disk problems. If you had dual parity maybe it would be OK.

4 minutes ago, skler said:

no, I have an zfs snapshot on another disk and the parity

Parity is not a substitute for backup.

I think I would make sure I had another copy of anything important and irreplaceable on another system before attempting anything else.

trurl · January 7

And I definitely would not rebuild on top with those other disk problems. If you insist on rebuilding despite all this, then rebuild to another disk and keep the original as it is since rebuild is likely to have problems.

trurl · January 7

23 minutes ago, trurl said:

Every bit of every array disk must be reliably read to reliably rebuild a missing or disabled disk.

We already know disk3 can't be reliably read.

trurl · January 7

1 minute ago, trurl said:

rebuild to another disk

https://docs.unraid.net/unraid-os/manual/storage-management/#normal-replacement

trurl · January 7

9 minutes ago, trurl said:

I think I would make sure I had another copy of anything important and irreplaceable on another system before attempting anything else.

This should be your first concern.

trurl · January 7

How long have you been running like this?

Do you have Notifications setup to alert you immediately by email or other agent as soon as a problem is detected? Don't let one problem become multiple problems (already has) and data loss.

skler · January 7

First of all thanks you all for support. I appreciate it a lot.

First things I would do is to run an extended self-test but I can't do it due to the needs to stop the spin-down delay. But, when I change the settings I have a 502 error:

Jan  7 16:05:11 littleboy nginx: 2024/01/07 16:05:11 [error] 61578#61578: *16713838 connect() to unix:/var/run/emhttpd.socket failed (11: Resource temporarily unavailable) while connecting to upstream, client: 192.168.3.3, server: , request: "POST /update.htm HTTP/2.0", upstream: "http://unix:/var/run/emhttpd.socket:/update.htm", host: "10.1.10.191", referrer: "https://10.1.10.191/Settings/DiskSettings"

Moreover at the moment the system is running a read check I guess (I didn't anything) :

Maybe I didn't gave the right attention to what is happening, I build the array with unRaid few weeks ago using some disks from an old (not unraid) NAS.

Now, I can't backup all now, there are more than 60TB of data and I don't have another system for this kind of stuffs.

Disks with smart errors in array are: Disk3, Disk4, Disk7

Disk8 is disabled.

Disk7 is not included in the shares, is a TimeMachine backup. I can lose this data, are not so relevant.

Disk8 have some backups, I have a backup of it as a zfs snapshot on Disk3 (that have smart errors)

Disk3, Disk4 have some data, if I can preserve them is very good, I can't backup them, but if lost maybe I can recover them in some way.

===========================================================================================================================

What you suggest to do at this point?

4 hours ago, trurl said:

https://docs.unraid.net/unraid-os/manual/storage-management/#normal-replacement

Do a normal replacement on Disk8 or Disk3?

===========================================================================================================================

4 hours ago, trurl said:

Do you have Notifications setup to alert you immediately by email or other agent as soon as a problem is detected? Don't let one problem become multiple problems (already has) and data loss.

To be honest I receive tons of notification and I don't understand what are the most relevant, but I didn't received anything that tells me to change some disks. Only this morning I had a report on Disk8 error:

(Disk3 and Disk4 are reported as "OK" btw)

I had some warning on read errors (but it says that the sector was reallocated without problems)

And few days ago some normal state:

Maybe I have to activate some notifications on smart results?

Edited January 7 by skler

trurl · January 7

This is how I have Notifications setup.

I get all my notifications by email so I don't have to check the webUI to know about them.

trurl · January 7

Disk3 is has failed an extended test and needs to be replaced. Obviously it is not a good place for the snapshots.

Cancel the read check. Stop using your server for anything until we get all this fixed.

3 hours ago, skler said:

I build the array with unRaid few weeks ago using some disks from an old (not unraid) NAS.

I always say each additional disk is an additional point of failure. I recommend only having as many disks as you need for capacity. Most of your disks seem to be empty or nearly so. And at least some of them shouldn't be used anyway.

Looks like disks 7 and 8 are the only disks with any significant amount of data. Since the disk formerly assigned as disk8 is currently unassigned, can you mount it as an Unassigned Device? If so it might be a good idea to start over with that disk, the disk currently assigned as disk7, and maybe just a few of the newer larger disks in a New Configuration and rebuild parity with just those.

skler · January 8

9 hours ago, trurl said:

Disk3 is has failed an extended test and needs to be replaced. Obviously it is not a good place for the snapshots.

How did you find this? With notices I can activate report of that kind?

9 hours ago, trurl said:

Cancel the read check. Stop using your server for anything until we get all this fixed.

I always say each additional disk is an additional point of failure. I recommend only having as many disks as you need for capacity. Most of your disks seem to be empty or nearly so. And at least some of them shouldn't be used anyway.

That's a good point, I've started with this project in mind and only 4 disks but adding a disk is a really slow procedure due to the fact that building the parity is a long process, sometimes I will fill a disk quickly than the parity process ends, so I plan to add everything and change them with something with more capacity when/if fails or full.

TIPs: is there a solution to speed up the pairity process when adding new disks? if I zeroing a disk the process will run?

9 hours ago, trurl said:

Looks like disks 7 and 8 are the only disks with any significant amount of data. Since the disk formerly assigned as disk8 is currently unassigned, can you mount it as an Unassigned Device? If so it might be a good idea to start over with that disk, the disk currently assigned as disk7, and maybe just a few of the newer larger disks in a New Configuration and rebuild parity with just those.

yes, if I mount it all works. It is on zfs, manual mounting it I have all data and snapshots:

My actually load is this:

I've everything full except:

disk7: is the TimeMachine not included in shares

disks: 8-11 my "backup" share, I'm loading data in these days.

disks: 12-14 (not planned to be used at the moment, I can remove them if adding them back will not be a process that will require a week or more)

In the future, if the feature will be available, I'd like to create a second array with its parity with disks 8-14 for my backups (rarely used data), if the feature will not be released in the next version then I will add a second parity for everything

Disk3 is already planned to be changed with something else, before the error of disk8 was raised I've started to moving the data from disk3 to disk8, then I will remove this disk from the array and use it as unassigned device for testing things (it will fails soon I know, but In the meantime I can use it for non critical stuffs as I'm already doing with unassigned "dev2")

If I can follow this procedure:

15 hours ago, trurl said:

https://docs.unraid.net/unraid-os/manual/storage-management/#rebuilding-a-drive-onto-itself

I guess will be for me the best solution, due to the fact maybe was just an electrical problem (maybe with rain I had a problem with the electrical implant) I don't have an UPS at the moment

If not can I just exclude disk8? the "backup" share will be automatically populated on disks: 9-11?

trurl · January 8

19 hours ago, trurl said:

Disk3 has failed an extended test and needs to be replaced.

9 hours ago, skler said:

How did you find this?

This line in the SMART report for that disk:

On 1/7/2024 at 8:37 AM, trurl said:

# 1  Extended offline    Completed: read failure       90%     49686         5011200

9 hours ago, skler said:

adding a disk is a really slow procedure due to the fact that building the parity is a long process

The normal way of adding disks to the array does not require rebuilding parity.

https://docs.unraid.net/unraid-os/manual/storage-management/#adding-disks

Since physical disk8 seems to have its data and nothing is wrong with it, I would New Config with that disk assigned as disk8, go ahead and keep disk9 since it is 6TB, leave the others out, and rebuild parity. Personally, I would consider adding parity2 also since you will have so many disks and plan to have more. Of course, it would have to be 20TB parity2 since you already have 20TB data disks in the array. And if you could add it now you could build parity and parity2 at the same time when you New Config.

trurl · January 8

9 hours ago, skler said:

My actually load is this

I see that now in your screenshots.

20 hours ago, trurl said:

Looks like disks 7 and 8 are the only disks with any significant amount of data.

I guess I need to adjust my reading of diagnostics somewhat with the new ZFS capabilities. I don't use it myself.

skler · January 8

26 minutes ago, trurl said:

Since physical disk8 seems to have its data and nothing is wrong with it, I would New Config with that disk assigned as disk8, go ahead and keep disk9 since it is 6TB, leave the others out, and rebuild parity. Personally, I would consider adding parity2 also since you will have so many disks and plan to have more. Of course, it would have to be 20TB parity2 since you already have 20TB data disks in the array. And if you could add it now you could build parity and parity2 at the same time when you New Config.

Thanks a lot @trurl, this was an enterprise grade support.

I will double check all data in disk, to be sure there is nothing corrupted. I guess that starting from a new config will reset the parity disk and if there are some error in mine data it can't be recovered. BTW in both case seems to be the best solution, because I can restore the snapshot from disk3 to there.

Thanks a lot again.

Device Disabled, from syslog: was (sde) is now (sds)

Recommended Posts

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Join the conversation