
Device Disabled, from syslog: was (sde) is now (sds)


Solved by trurl


The disk in question is disk8, which is currently no longer included in the array:

[screenshot, 2024-01-07 13:30]

and recognized as sds:

[screenshot, 2024-01-07 13:32]

 

Would it be good practice to stop the array and reassign disk8 from sde to sds, or should I reboot everything?


Sorry Jorge, but I don't understand: what do you mean by rebuild on top?

 

The procedure is:

- restart Unraid
- don't start the array
- reassign disk8 to sds (or to sde, if after the reboot it is recognized as sde again)
- start the array
- start the parity check

 

Is this right?

Or can I do something without stopping the service and without stopping the array?


Previous diagnostics show the disabled/emulated disk8 was mounting, so it should be OK to rebuild on top, assuming no other problems. But some of your disks are pretty old. Do any show SMART warnings ( 👎 ) on the Dashboard page?

 

49 minutes ago, JorgeB said:

disk14 read errors on the other hand look like an actual disk problem, you should run an extended SMART test.
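If it helps, the extended test can also be started from a terminal with smartctl instead of the GUI; a minimal sketch, with /dev/sdX standing in for the real device (note the disk has to stay spun up for the whole test):

smartctl -t long /dev/sdX       # start the extended (long) self-test; it runs inside the drive
smartctl -l selftest /dev/sdX   # check the self-test log later for the result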

 

1 minute ago, trurl said:

assuming no other problems. But some of your disks are pretty old. Do any show SMART warnings ( 👎 ) on the Dashboard page?

Every bit of every array disk must be reliably read to reliably rebuild a missing or disabled disk.


Forget about the sds vs sde, there is nothing you can do about it anyway, and Unraid doesn't care. It only cares about the serial numbers and whether a disk got disconnected. And it did disconnect, which is why sd changed when it reconnected.
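If you want to double-check which sdX letter currently belongs to which physical drive, a quick look from a terminal (generic Linux commands, nothing Unraid-specific) is something like:

ls -l /dev/disk/by-id/              # symlinks named by model and serial point at the current sdX node
lsblk -o NAME,SIZE,SERIAL,MODEL     # the same mapping shown as a table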

 

Do you have another copy of anything important and irreplaceable?

 

5 minutes ago, trurl said:

rebuild on top

 

https://docs.unraid.net/unraid-os/manual/storage-management/#rebuilding-a-drive-onto-itself

3 minutes ago, trurl said:

Previous diagnostics show the disabled/emulated disk8 was mounting, so it should be OK to rebuild on top, assuming no other problems. But some of your disks are pretty old. Do any show SMART warnings ( 👎 ) on the Dashboard page?

 

 

Yes, I have some, but this disk was healthy:

 

[screenshot, 2024-01-07 14:24]

 

The errors, by the way, are not "so bad" I guess, just a few sector reallocations (for example, disk3):

 

[screenshot, 2024-01-07 14:26]

 

 

4 minutes ago, trurl said:

Do you have another copy of anything important and irreplaceable?

No, I have a ZFS snapshot on another disk, plus the parity.

 

6 minutes ago, trurl said:

OK, I will follow this guide. Just one question: will the rebuild process preserve the existing files, or will it format the device? In other words, will it be quick if all the data is already in place?

 


Disk 3 has this:

# 1  Extended offline    Completed: read failure       90%     49686         5011200

so it needs replacing

 

Disk4 and disk7 haven't had an extended self-test.

 

And, as already mentioned

1 hour ago, JorgeB said:

disk14 read errors on the other hand look like an actual disk problem, you should run an extended SMART test.

 

I would worry whether you can safely rebuild anything with these other disk problems. If you had dual parity maybe it would be OK.

 

4 minutes ago, skler said:

No, I have a ZFS snapshot on another disk, plus the parity.

Parity is not a substitute for backup.

 

I think I would make sure I had another copy of anything important and irreplaceable on another system before attempting anything else.

 

 

 


How long have you been running like this?

 

Do you have Notifications set up to alert you immediately by email or another agent as soon as a problem is detected? Don't let one problem turn into multiple problems (it already has) and data loss.


First of all, thank you all for the support. I appreciate it a lot.

 

The first thing I would do is run an extended self-test, but I can't, because I first need to disable the spin-down delay, and when I try to change that setting I get a 502 error:

 

Jan  7 16:05:11 littleboy nginx: 2024/01/07 16:05:11 [error] 61578#61578: *16713838 connect() to unix:/var/run/emhttpd.socket failed (11: Resource temporarily unavailable) while connecting to upstream, client: 192.168.3.3, server: , request: "POST /update.htm HTTP/2.0", upstream: "http://unix:/var/run/emhttpd.socket:/update.htm", host: "10.1.10.191", referrer: "https://10.1.10.191/Settings/DiskSettings"
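The log only tells me that nginx couldn't reach the emhttpd socket at that moment. If it's useful, I can check from SSH whether emhttpd is still running and look for related errors with generic commands like:

ps aux | grep '[e]mhttpd'                     # is the emhttpd process still alive?
grep -i emhttp /var/log/syslog | tail -n 50   # recent emhttpd-related log lines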

 

Moreover, at the moment the system is running a read check, I guess (I didn't start anything):

[screenshot, 2024-01-07 15:55]

 

Maybe I didn't pay enough attention to what was happening. I built the array with Unraid a few weeks ago using some disks from an old (non-Unraid) NAS.

 

I can't back everything up right now: there are more than 60TB of data and I don't have another system for that kind of thing.

 

Disks in the array with SMART errors: Disk3, Disk4, Disk7.

Disk8 is disabled.

 

[screenshot, 2024-01-07 16:10]

 

Disk7 is not included in the shares; it is a TimeMachine backup. I can lose this data, it is not that important.
Disk8 has some backups; I have a backup of it as a ZFS snapshot on Disk3 (which has SMART errors) (see the command sketch after this list).
Disk3 and Disk4 have some data. If I can preserve it, very good; I can't back it up, but if it is lost I may be able to recover it some other way.
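If it's worth doing, I could copy that snapshot off the failing Disk3 with standard ZFS tools before touching anything else; a rough sketch (the pool and dataset names below are invented, I'd take the real ones from zfs list):

zfs list -t snapshot                                              # find the snapshot's real name
zfs send disk3pool/backup@snap | zfs receive disk8pool/backup     # stream it onto a dataset on a healthy disk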

 

===========================================================================================================================

 

What do you suggest I do at this point?

4 hours ago, trurl said:

Should I do a normal replacement of Disk8, or of Disk3?

 

===========================================================================================================================

 

 

4 hours ago, trurl said:

Do you have Notifications set up to alert you immediately by email or another agent as soon as a problem is detected? Don't let one problem turn into multiple problems (it already has) and data loss.

To be honest, I receive tons of notifications and I don't understand which are the most relevant, but I never received anything telling me to replace any disks. Only this morning I got a report about the Disk8 error:

[screenshot, 2024-01-07 16:29]

(Disk3 and Disk4 are reported as "OK" btw) 

 

I had some warnings about read errors (but it says the sector was reallocated without problems):

[screenshot, 2024-01-07 16:24]

 

And a few days ago, a normal status report:

[screenshot, 2024-01-07 16:23]

 

Maybe I need to enable some notifications for SMART results?

 

 


Disk3 has failed an extended test and needs to be replaced. Obviously it is not a good place for the snapshots.

 

Cancel the read check. Stop using your server for anything until we get all this fixed.

3 hours ago, skler said:

I built the array with Unraid a few weeks ago using some disks from an old (non-Unraid) NAS.

I always say each additional disk is an additional point of failure. I recommend only having as many disks as you need for capacity. Most of your disks seem to be empty or nearly so. And at least some of them shouldn't be used anyway.

 

Looks like disks 7 and 8 are the only disks with any significant amount of data. Since the disk formerly assigned as disk8 is currently unassigned, can you mount it as an Unassigned Device? If so it might be a good idea to start over with that disk, the disk currently assigned as disk7, and maybe just a few of the newer larger disks in a New Configuration and rebuild parity with just those.

9 hours ago, trurl said:

Disk3 has failed an extended test and needs to be replaced. Obviously it is not a good place for the snapshots.

 

How did you find this? Can I enable notifications that report this kind of thing?

 

9 hours ago, trurl said:

Cancel the read check. Stop using your server for anything until we get all this fixed.

I always say each additional disk is an additional point of failure. I recommend only having as many disks as you need for capacity. Most of your disks seem to be empty or nearly so. And at least some of them shouldn't be used anyway.

 

That's a good point. I started this project with only 4 disks in mind, but adding a disk is a really slow procedure because building parity takes a long time, and sometimes I fill a disk faster than the parity process finishes. So I planned to add everything now and replace disks with higher-capacity ones when/if they fail or fill up.

 

Tip request: is there a way to speed up the parity process when adding new disks? If I zero a disk first, will the process still run?

 

9 hours ago, trurl said:

Looks like disks 7 and 8 are the only disks with any significant amount of data. Since the disk formerly assigned as disk8 is currently unassigned, can you mount it as an Unassigned Device? If so it might be a good idea to start over with that disk, the disk currently assigned as disk7, and maybe just a few of the newer larger disks in a New Configuration and rebuild parity with just those.

 

Yes, if I mount it everything works. It is ZFS; mounting it manually, I have all the data and snapshots:

[screenshot, 2024-01-08 05:11]
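For reference, the manual mount is basically a read-only pool import; roughly (the pool name below is a stand-in for the real one):

zpool import -o readonly=on -R /mnt/temp poolname   # import the pool read-only under a temporary mount point
zfs list -r -t snapshot poolname                    # confirm the data and snapshots are visible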

 

My current usage is this:

[screenshot, 2024-01-08 05:17]

 

Everything is full except:

disk7: the TimeMachine disk, not included in shares
disks 8-11: my "backup" share; I'm loading data onto them these days
disks 12-14: not planned to be used at the moment; I can remove them if adding them back later won't take a week or more

 

In the future, if the feature becomes available, I'd like to create a second array with its own parity, using disks 8-14 for my backups (rarely used data). If that feature isn't released in the next version, I will add a second parity disk for everything.

 

Disk3 is already planned to be replaced with something else. Before the disk8 error came up I had started moving the data from disk3 to disk8; afterwards I will remove disk3 from the array and use it as an unassigned device for testing (it will fail soon, I know, but in the meantime I can use it for non-critical stuff, as I'm already doing with the unassigned "dev2").

 

If I can follow this procedure: 

15 hours ago, trurl said:

I guess that would be the best solution for me, since maybe it was just an electrical problem (perhaps the rain caused an issue with the electrical system); I don't have a UPS at the moment.

 

If not, can I just exclude disk8? Will the "backup" share automatically be populated onto disks 9-11?

 

 

 

 

  • Solution
19 hours ago, trurl said:

Disk3 has failed an extended test and needs to be replaced.

 

9 hours ago, skler said:

How did you find this?

 

This line in the SMART report for that disk:

On 1/7/2024 at 8:37 AM, trurl said:
# 1  Extended offline    Completed: read failure       90%     49686         5011200
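That log can also be pulled up from a terminal if you prefer; for example (with /dev/sdX replaced by the actual device):

smartctl -l selftest /dev/sdX   # prints the drive's self-test history, including failed extended tests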

 

9 hours ago, skler said:

adding a disk is a really slow procedure because building parity takes a long time

The normal way of adding disks to the array does not require rebuilding parity.

 

https://docs.unraid.net/unraid-os/manual/storage-management/#adding-disks

 

Since physical disk8 seems to have its data and nothing is wrong with it, I would New Config with that disk assigned as disk8, go ahead and keep disk9 since it is 6TB, leave the others out, and rebuild parity. Personally, I would consider adding parity2 also since you will have so many disks and plan to have more. Of course, it would have to be 20TB parity2 since you already have 20TB data disks in the array. And if you could add it now you could build parity and parity2 at the same time when you New Config.

 

9 hours ago, skler said:

My current usage is this

I see that now in your screenshots.

 

20 hours ago, trurl said:

Looks like disks 7 and 8 are the only disks with any significant amount of data.

I guess I need to adjust my reading of diagnostics somewhat with the new ZFS capabilities. I don't use it myself.

26 minutes ago, trurl said:

Since physical disk8 seems to have its data and nothing is wrong with it, I would New Config with that disk assigned as disk8, go ahead and keep disk9 since it is 6TB, leave the others out, and rebuild parity. Personally, I would consider adding parity2 also since you will have so many disks and plan to have more. Of course, it would have to be 20TB parity2 since you already have 20TB data disks in the array. And if you could add it now you could build parity and parity2 at the same time when you New Config.

 

Thanks a lot @trurl, this was enterprise-grade support.

 

I will double-check all the data on the disk to be sure nothing is corrupted. I guess that starting from a New Config will reset the parity disk, so if there are errors in my data they cannot be recovered from parity. In any case it seems to be the best solution, because I can restore the snapshot from disk3 onto it.
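For the double-check on the ZFS disk I'll probably rely on a scrub, something like (the pool name is a placeholder again):

zpool scrub poolname       # read every block and verify it against its checksum
zpool status -v poolname   # watch progress and list any files with errors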

 

Thanks a lot again. 
