Parity drive disabled



So, I checked all connections, moved the two problematic drives (3 and 4) to bays vacated by removing unused drives, and rebooted (GUI is fine now). Ran diagnostics, attached.

 

For info: all drives in the array are on a hot-swap backplane, with 8 SATA ports connected to the SAS card by two SAS-to-SATA 4-way breakout cables. I chose the card after a long conversation (well, several) on this forum, as others had had success with them.

tower-diagnostics-20230215-1623.zip


You have to start the array to start the docker service. You can't start the array because you have a missing disk and a disabled disk, but only single parity.

 

It would be possible to New Config/Rebuild parity without the missing disk, then the disabled disk would be accepted just as it is. But you would have no way to rebuild the missing disk.

 

It is also possible to get Unraid to accept disk4 just as it is and rebuild disk3 instead. How well that would work depends on how out-of-sync things are.

 

17 hours ago, trurl said:

parity rebuild had not finished when you were having problems with those 2 data disks, so not clear parity build would have been good

  

Might even be best if both data disks were used in a New Config just as they are and try parity rebuild again. Assuming we can get disk3 to show up.

 

Do you have backups of anything important and irreplaceable?

13 minutes ago, banterer said:

if we can at least get disk4 up and running, my data should be safe?

According to those first diagnostics you posted, both disk3 and disk4 were 80% full. You have to have disk3 to rebuild disk4. Or, you have to have disk4 to rebuild disk3. Not sure how well rebuild would work in either case depending on how good parity was or how out-of-sync disk4 had gotten while it was disabled.

 

21 minutes ago, trurl said:

Do you have backups of anything important and irreplaceable?

 

Yeah, this is where I say 'no' and you tell me I'm stupid, right? The truth is a lot more complicated than that. E.g. right now I'm supposed to be doing stuff to something in the cloud that this server is the offsite backup for. I can't do the cloud stuff because I can't risk it without a backup. And there are a dozen other complicated things that would take too long to explain.

 

What would you do, out of the two options (force-accept disk4, or New Config)?

1 hour ago, trurl said:

backups of anything important and irreplaceable?

I mention backups because, often, not everything is important and irreplaceable. So it might make sense to concentrate our efforts on things that are.

 

Disk3 never gave a SMART report even on your first diagnostics, so couldn't really see how healthy it was. Was it an old disk? Or maybe a new disk that hadn't been tested?

 

Did it have any data on it that it might be worth going to some extra trouble to try to recover?

1 hour ago, trurl said:

Disk3 never gave a SMART report even on your first diagnostics, so couldn't really see how healthy it was. Was it an old disk? Or maybe a new disk that hadn't been tested?

 

Did it have any data on it that it might be worth going to some extra trouble to try to recover?

 

It was one of the older ones. But I don't really know which data was on it, as the folders are all split. Is that something I can find out somehow?

4 minutes ago, banterer said:

Is that something I can find out somehow?

Only by elimination unless we can recover its data.

 

We do have this much information from those first diagnostics when disk3 was apparently working well enough to read.

appdata                           shareUseCache="prefer"  # Share exists on cache, disk1, disk2
b-----s                           shareUseCache="no"      # Share exists on disk1, disk2, disk3, disk4
c--v                              shareUseCache="yes"     # Share exists on disk1, disk3, disk4
domains                           shareUseCache="prefer"  # Share exists on cache
d-------s                         shareUseCache="yes"     # Share exists on disk1, disk2
isos                              shareUseCache="yes"     # Share exists on disk1, disk2
kr                                shareUseCache="prefer"  # Share does not exist
m---a                             shareUseCache="no"      # Share exists on disk1, disk2, disk3, disk4
system                            shareUseCache="prefer"  # Share exists on disk1
v-s                               shareUseCache="prefer"  # Share exists on disk1
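If the disk (or its emulated stand-in) ever mounts again, you can see which shares have data on it directly from the console, since each top-level folder on a disk's mount point belongs to the user share of the same name. A minimal sketch, assuming the standard Unraid mount point `/mnt/disk3` (it prints nothing if the disk isn't mounted):

```shell
# list_shares: show each top-level share directory on a given disk mount
# and roughly how much space that share's slice uses on this disk.
list_shares() {
  for d in "$1"/*/; do
    [ -d "$d" ] || continue   # skip if the glob didn't match anything
    du -sh "$d"
  done
}

list_shares /mnt/disk3   # no output if disk3 isn't mounted
```

Running the same thing against `/mnt/disk1`, `/mnt/disk2`, etc. lets you work out by elimination what must have lived on disk3.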

 

 


Just had a closer look at your most recent diagnostics. Disks 1, 2 are the only healthy disks you have in the array. That is going to make things very tricky indeed. We can't count on parity or disk4 even if their contents were good.

 

At this point we would probably begin talking about cloning disk4 since you can't rebuild it. And we would probably give up on disk3 entirely.
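For anyone following along, cloning a failing disk is usually done with GNU ddrescue from a rescue environment or with the array stopped. A sketch only; the device names sdX/sdY are placeholders that must be verified with lsblk or the Main page before running, because swapping source and destination destroys the data:

```shell
SRC=/dev/sdX          # placeholder: the failing disk4 (read from)
DST=/dev/sdY          # placeholder: the spare disk (gets overwritten)
MAP=/root/disk4.map   # map file lets ddrescue resume and track bad areas

# Guard: do nothing until both placeholders point at real block devices.
if [ -b "$SRC" ] && [ -b "$DST" ]; then
  ddrescue -f -d "$SRC" "$DST" "$MAP"       # first pass: grab the easy data, skip bad areas
  ddrescue -f -d -r3 "$SRC" "$DST" "$MAP"   # second pass: retry the bad sectors 3 times
fi
```

The map file is the important part: it records which sectors were recovered, so the retry pass only hammers the genuinely bad spots.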

 

Why were you using all these bad disks? And looking at those first diagnostics, you were apparently ignoring this crucial warning from Fix Common Problems.

Jan 29 17:18:01 Tower root: Fix Common Problems: Warning: No destination (browser / email / agents set for Warning level notifications

You must set up Notifications to alert you immediately by email or another agent as soon as a problem is detected. Don't allow one problem to become multiple problems and data loss.
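Notifications are configured under Settings -> Notification Settings. Once an agent is set up, you can fire a test message from the console with Unraid's bundled notify script. The path and flags below are from memory, so treat them as assumptions and check the script's help output on your own box first:

```shell
NOTIFY=/usr/local/emhttp/webGui/scripts/notify   # usual location on Unraid (assumption)
if [ -x "$NOTIFY" ]; then
  "$NOTIFY" -e "Test" -s "Test notification" \
            -d "If you can read this, alerts are working" -i "warning"
fi
```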

 

Likely you had multiple problems to begin with. Notifications would have been screaming at you about all those bad disks. They would also all have had SMART warnings (👎) on the Dashboard page. Probably parity and disk4 are showing those now if they are spun up.
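Pending sectors are also visible from the console, for anyone who wants to check without the GUI. A sketch using smartctl (SMART attribute 197, Current_Pending_Sector; the raw value is the last column of `smartctl -A` output):

```shell
# Print the pending-sector count for each SATA device that answers SMART.
# /dev/sd? is a glob over whatever devices exist; harmless if none respond.
for dev in /dev/sd?; do
  [ -b "$dev" ] || continue
  count=$(smartctl -A "$dev" 2>/dev/null | awk '/Current_Pending_Sector/ {print $10}')
  if [ -n "$count" ]; then
    echo "$dev pending sectors: $count"
  fi
done
```

Anything nonzero on a parity or data disk is a disk you should not trust for a rebuild.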

2 minutes ago, trurl said:

We can see if the array can emulate disk3 but I don't have much hope.

 

We will have to have a spare 3TB disk that can play the role of disk 3 so we can disable it instead of disk4.

 

Ok, so first step: get another disk. I guess it can be bigger than 3TB, as long as it's at least 3TB, right?

 

Can you list out the steps I should take?

7 minutes ago, trurl said:

When do you expect to have a spare disk?

Ultimately, you need to replace parity and disk4 as well, regardless of how disk3 comes out. But maybe wait and see before getting those extra disks, in case it makes sense to get a larger parity drive.

 

Expect to work on this for a few days with no guarantees.


I'm going to summarize where we are and suggest ways to proceed in case anyone else wants to get involved.

 

Multiple disk problems during parity rebuild. Since then, parity and disk4 have many pending sectors. Disks 1, 2 seem OK.

 

Disk4 currently disabled, disk3 missing and presumed dead.

 

I am thinking about trying to get the array to emulate disk3 instead of currently disabled disk4 using the usual trick of New Config/Trust parity to get all disks into the array including a spare disk3, then disable disk3.

 

I would expect emulated disk3 to be unmountable at that point, and usually we would repair filesystem before rebuilding. But since we aren't rebuilding on top of the original, and the whole array is pretty shaky anyway, maybe do the repair after rebuild (if we can even get that far).
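For anyone unfamiliar with the repair step: with the array started in maintenance mode, the emulated disk is reachable as a parity-protected md device, and the filesystem check is run against that device so parity stays in sync. A sketch, assuming disk3 is XFS and the device is /dev/md3 (the exact device name varies by Unraid version; the GUI's Check Filesystem option on the disk3 page does the same thing):

```shell
DEV=/dev/md3   # parity-protected device for disk3 (assumption; verify on your version)
if [ -b "$DEV" ]; then
  xfs_repair -n "$DEV"   # -n: read-only check, report problems without changing anything
  xfs_repair "$DEV"      # real repair; it will tell you if it needs -L (zero the log)
fi
```

Always do the `-n` dry run first so you can see how bad the damage is before committing.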

 

Another thing that might be considered is cloning parity and disk4 before doing anything else, but that would mean even more spare disks.

 

I'm going to ping some of the usual suspects

@JonathanM

@itimpi

and of course

@JorgeB (way past bedtime in that part of the world, so it will probably be some hours before any response)

