Sometimes when running a scheduled parity check I don't get enough throughput from the system for other operations, so I need to pause it. The problem is that I sometimes forget to unpause the check so it can continue. It'd be great if the pause function had a little drop-down next to it that allowed one to 'Pause for Y hours' (2, 4, 8, 12, 24, 48), or maybe an X option to set a custom value.
Good point, they were a group of metadata files that 'corrupted'.
I suppose it's possible they were updated from somewhere, all on the same drive, between the time the checksums were made and the time the check ran.
That would explain that.
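One way to tell a legitimate update apart from silent corruption is to compare the file's modification time with the time its checksum was recorded. The sketch below is only an illustration of that reasoning: `classify`, `stored_hash`, and `hashed_at` are hypothetical names, and this is not how the Dynamix File Integrity plugin stores or checks its data.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def classify(path, stored_hash, hashed_at):
    """Return 'ok', 'modified', or 'suspect' for one file.

    stored_hash: hex digest recorded earlier (hypothetical record).
    hashed_at:   Unix timestamp of when that digest was recorded.
    """
    if sha256_of(path) == stored_hash:
        return "ok"
    # Hash differs: a newer mtime points at an ordinary rewrite of the file.
    if os.path.getmtime(path) > hashed_at:
        return "modified"
    # Hash differs but the file claims to be unchanged: worth investigating.
    return "suspect"
```

A batch of metadata files that all come back as 'modified' would fit the explanation above; files that come back as 'suspect' would not.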
I'm running Dynamix File Integrity and for the first time have some bitrot corruption on one of my drives.
What's the protocol when this happens?
Can I replace the drive with a larger drive, format the removed drive to reset the sectors, and use it to upgrade another drive?
Or is it a sign of drive failure that's just going to get worse, so I should put it out to pasture?
Unfortunately it didn't work.
jonnie was correct. I even tried to start in maintenance mode as a first step, thinking it would not mount any of the drives as a precaution. By the time I was able to stop it I had 7303 parity writes.
I ended up popping in a replacement drive and starting from scratch with a 'new' array.
I mounted the 'broken' drive outside the array and once the initial parity is rebuilt (protecting the rest of the array) I'll copy the data back over from lost+found and run a crc check on it vs the backup data.
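For the CRC comparison, here is a rough sketch of how it could be done. It assumes files recovered into lost+found may no longer carry their original names, so it matches on content (size plus CRC32) instead of on path. The function names and directory arguments are placeholders of mine, not part of any Unraid tooling.

```python
import os
import zlib
from collections import defaultdict

def crc32_of(path, chunk_size=1 << 20):
    """CRC32 of a file, read in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

def walk_files(root):
    """Yield the full path of every regular file below root."""
    for folder, _dirs, names in os.walk(root):
        for name in names:
            yield os.path.join(folder, name)

def index_by_content(root):
    """Map (size, crc32) to the list of files under root with that content key."""
    index = defaultdict(list)
    for path in walk_files(root):
        index[(os.path.getsize(path), crc32_of(path))].append(path)
    return index

def match_recovered(recovered_root, backup_root):
    """Pair each recovered file with backup files that have the same size and CRC."""
    backup_index = index_by_content(backup_root)
    matched, unmatched = {}, []
    for path in walk_files(recovered_root):
        key = (os.path.getsize(path), crc32_of(path))
        if key in backup_index:
            matched[path] = backup_index[key]
        else:
            unmatched.append(path)
    return matched, unmatched
```

Anything that lands in `unmatched` is either damaged or was never in the backup. CRC32 is a weak check, so a stronger hash would be the safer choice for anything important.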
Thanks for all your help. I really appreciate the support you provide this community when things go south.
I'll open up a new thread for Tom asking that 'Trust My Parity' actually does so.
Let me preface this with: Ughhhhhh
This seems to come down to an EEOC error.
When reassigning drives, I accidentally swapped the Parity Drive with a Data disk.
This is why the one drive was unmountable and why no superblock could be found.
Now, you may take a moment to slap me upside the head for being stupid.
...Go ahead. I'll wait...
Now with that out of the way here is my plan.
Since my parity drive was not touched beyond the superblock scans, that should be ok.
The rest of the array was not really written to during the shrink EXCEPT for the drive that was accidentally made the 'new parity' - disk 1
This means I just have to address that one disk.
I was thinking of this:
1) re-initializing the array
2) set the old assignments
3) tell Unraid to 'Trust my Parity'
4) Shut down and fail disk 1 by replacing the drive
5) Let a rebuild occur on disk 1
In theory this should get me back to square 1ish at which point I can run a crc check on the data against the backups
Does this sound reasonable? Is there something I'm not thinking of?
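To spell out why the plan hinges on the other disks and the old parity being untouched: with single parity, each parity byte is the XOR of the bytes at the same position on every data disk, so one missing disk can be recomputed from parity plus the survivors. The toy sketch below uses short byte strings to stand in for whole disks, and the function names are my own.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(parity, surviving):
    """Recompute the one missing data block from parity and the surviving blocks."""
    return xor_blocks([parity, *surviving])

# Toy 'disks': three data blocks and the parity computed from them.
disk1 = b"\x01\x02\x03"
disk2 = b"\x10\x20\x30"
disk3 = b"\xaa\xbb\xcc"
parity = xor_blocks([disk1, disk2, disk3])

# Disk 1 is lost; parity and the other disks are unchanged, so it rebuilds exactly.
rebuilt = rebuild_missing(parity, [disk2, disk3])

# If a surviving disk changed after parity was computed, the rebuild is wrong.
changed_disk2 = b"\x11\x20\x30"
bad_rebuild = rebuild_missing(parity, [changed_disk2, disk3])
```

Here `rebuilt` equals `disk1`, while `bad_rebuild` does not. Any writes to the other disks or to the old parity drive since parity was last valid would show up as corruption on the rebuilt disk 1, which is why the CRC check against backups afterwards matters.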
Yup. I fell in the camp of converting ALL my disks over to XFS for Unraid 6.
I have backups, but the hardest part is figuring out what was on this particular disk in relation to the others to restore.
Since my backups aren't actively online it's harder to run a straight diff on the whole array with ViceVersa or the like.
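Once the backups are reachable, one way to work out what lived on the lost disk is to list every relative path in the backup and subtract whatever still exists on the surviving disks. This is a minimal sketch that assumes the backup mirrors the share layout, so relative paths line up with those under each disk mount. The function names and any mount paths passed in are placeholders.

```python
import os

def relative_files(root):
    """Set of file paths below root, expressed relative to root."""
    found = set()
    for folder, _dirs, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found

def missing_from_array(backup_root, surviving_disk_roots):
    """Relative paths present in the backup but on none of the surviving disks."""
    present = set()
    for root in surviving_disk_roots:
        present |= relative_files(root)
    return sorted(relative_files(backup_root) - present)
```

The result is only a candidate restore list: it would also include files deleted from the array on purpose since the backup was taken.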
Appreciate the UPS suggestions. What I've had has been good enough for a while.
Yup. There's a UPS. I forced the bad shutdown.
That said, my UPS is just a generic one. Is there one you recommend that can trigger a graceful shutdown?
(xfs_repair is still searching for a secondary superblock)
Thanks BRIT, good to hear about BTRFS (a lot less work for me too)
I'm running the commands on the /dev/sd# with the array not started
Since the drive is 'unmountable' it wouldn't be available on /dev/md#
Parity is already shot since I was trying to shrink the array at the time.
Yes I caused it.
I'll need to replace it since I can't find a valid superblock for the drive. I figure I'll put another drive in the array and run this corrupted drive outside the array and copy what I can off of it to the replacement drive.
Good thought. I ran an xfs_repair -n to check and that's when it replied 'Sorry, could not find valid secondary superblock option' (which I found odd).
I'll likely have to blow it out with an xfs_repair -L to at least hope I can easily recover the data.