Array data loss during parity rebuild


Fuggin


I updated my dual parity drives, and while the parity was rebuilding, the scheduler happened to take over and wrote corrections to the rebuilding parity, causing me to lose about half of my array data.

 

Can I put the old dual parity drives back in and rebuild the array from them?


I am confused, as writing parity should never affect the content of the data drives :(    Perhaps if you post your system’s diagnostics zip file, we might be able to get a better idea of what happened and the current state of the system.

 

BTW:  it is recommended that scheduled parity checks are set to be non-correcting - your description makes it sound as if this is not how you have yours set?

8 minutes ago, Fuggin said:

and the parity was rebuilding when the scheduler happened to take over and wrote corrections to the re-building parity causing me to lose about half of my array data.

That doesn't make much sense. IIRC, if the parity check is scheduled to start during a rebuild/re-sync, it just starts over the actual operation Unraid was doing. Please post the diagnostics.

26 minutes ago, itimpi said:

BTW:  it is recommended that scheduled parity checks are set to be non-correcting - your description makes it sound as if this is not how you have yours set?

Yes...my corrections were set on (I have always had it on, assuming the whole point was to write corrections on the array if there was a problem)....the timing of upgrading the parity drives with my monthly parity scheduler messed it up, I think....I've upgraded parity drives before, but this is the first time I have ever lost data as a result.

 

Diags attached...thanks for the help.

 

tower-diagnostics-20211102-1238.zip

4 minutes ago, Fuggin said:

Yes...my corrections were set on (I have always had it on, assuming the whole point was to write corrections on the array if there was a problem)

No - the idea is to identify that you have a problem that you can investigate further.   If the problem is a data drive, then you do NOT want it to cause invalid corrections to be made to parity, which could then prejudice recovery of that drive’s contents if it is later replaced.   I would only recommend having the option to correct parity set for a check which is initiated manually after you have decided that is the most appropriate action.

 

It is still not clear how data loss can result from building parity, as that only reads from the data drives. 
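A minimal Python sketch of why a correcting check is risky, assuming a simplified single-parity model where parity is the byte-wise XOR of the data disks. The tiny "disks", the `parity_check` helper, and the failure scenario are all hypothetical. If a data disk starts returning bad data, a non-correcting check only reports the mismatch, so the original contents can still be rebuilt from parity; a correcting check rewrites parity to match the bad data, and a later rebuild then faithfully reproduces the corruption.

```python
from functools import reduce


def xor_parity(disks: list[bytes]) -> bytes:
    """Byte-wise XOR across equally sized disks (simplified single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def reconstruct(disks: list[bytes], parity: bytes, missing: int) -> bytes:
    """Rebuild one disk from parity plus all the other disks."""
    others = [d for i, d in enumerate(disks) if i != missing]
    return xor_parity(others + [parity])


def parity_check(disks: list[bytes], parity: bytes, correct: bool) -> tuple[bool, bytes]:
    """Hypothetical check: returns (mismatch_found, parity_after_check)."""
    expected = xor_parity(disks)
    mismatch = expected != parity
    if mismatch and correct:
        return True, expected  # correcting check: parity rewritten to match current data
    return mismatch, parity    # non-correcting check: parity left untouched


# Hypothetical 3-disk array with 4-byte "disks"; parity starts out valid.
good = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(good)

# Disk 1 silently starts returning bad data (e.g. a failing drive).
bad = list(good)
bad[1] = b"\x10\xff\x30\x40"

found, kept = parity_check(bad, parity, correct=False)
found2, overwritten = parity_check(bad, parity, correct=True)

print(reconstruct(bad, kept, 1) == good[1])         # True: original data recoverable
print(reconstruct(bad, overwritten, 1) == good[1])  # False: corruption baked into parity
```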

1 minute ago, itimpi said:

No - the idea is to identify that you have a problem that you can investigate further.   If the problem is a data drive, then you do NOT want it to cause invalid corrections to be made to parity, which could then prejudice recovery of that drive’s contents if it is later replaced.   I would only recommend having the option to correct parity set for a check which is initiated manually after you have decided that is the most appropriate action.

 

Still not clear how data loss can result from building parity as that is only reading from the data drives. 

Well...crap...I didn't know that...ok...gonna leave it off from here on out...

 

Anyways...any insight on what caused the data loss would be helpful...if the data drives are fine, can I still put my old parity drives back and rebuild the array?


Putting the original parity drives back is just going to result in the system rebuilding parity on those drives to match the data drives, so it will not affect data visibility.    
 

The diagnostics did not show the array started in normal mode, so we could not see whether any data drives are not mounting (showing as unmountable) due to file-system-level corruption, which would mean their contents would not show until the appropriate action is taken to repair the file system.

2 hours ago, JorgeB said:

Everything looks normal to me: all disks are mounting and none of them is empty. It certainly didn't have anything to do with the scheduled check. Where are you missing data from?

TV_Shows share...odd thing though, not everything in that share was lost...just about 10-20 files still remained. I looked through the files and can't find anything that would have caused their deletion...

2 minutes ago, trurl said:

Are you sure you didn't accidentally move them? Or maybe one of your dockers moved or deleted them?

 

Why do you have 100G for docker.img and for libvirt.img? 20G is often more than enough for docker.img, and I don't think anyone has ever needed more than the default 1G for libvirt.img.

I never move them manually....it's all handled by sabnzbd/radarr/sonarr....  All I know is that all I did was install new larger parity drives and this happened. I am not worried about the data loss but I just need help knowing where to look and seeing what happened so I don't do it again. This is the first time since I started using Unraid (2012-2013-ish) that I have lost this much data.

As for the img sizes...I don't know.....did that so long ago...

1 minute ago, Fuggin said:

All I know is that all I did was install new larger parity drives and this happened.

Are you absolutely sure it didn't happen before replacing parity? As already mentioned, parity contains none of your data, and rebuilding parity changes none of the disks that have your data.

2 minutes ago, trurl said:

Are you absolutely sure it didn't happen before replacing parity? As already mentioned, parity contains none of your data, and rebuilding parity changes none of the disks that have your data.

I am absolutely certain...My array was 97% full while it was rebuilding....it was almost done rebuilding when I went to bed. Woke up and blammo....array was only 60% full...it's just bizarre.

7 minutes ago, trurl said:

Did all of that data exist when you booted your server Oct 28 17:57:47 ?

yes...

 


total 16K
drwxr-xr-x 34 root   root  680 Nov  2 13:12 ./
drwxr-xr-x 20 root   root  440 Nov  2 13:22 ../
drwxrwxrwx  1 nobody users  90 Nov  2 13:22 cache/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk1/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk10/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk11/
drwxrwxrwx  9 nobody users 129 Nov  2 13:22 disk12/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk13/
drwxrwxrwx  8 nobody users 141 Nov  2 13:22 disk14/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk15/
drwxrwxrwx  8 nobody users 141 Nov  2 13:22 disk16/
drwxrwxrwx  8 nobody users 113 Nov  2 13:22 disk17/
drwxrwxrwx  8 nobody users 141 Nov  2 13:22 disk18/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk19/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk2/
drwxrwxrwx  8 nobody users 141 Nov  2 13:22 disk20/
drwxrwxrwx  8 nobody users 141 Nov  2 13:22 disk21/
drwxrwxrwx  7 nobody users 130 Nov  2 13:22 disk22/
drwxrwxrwx  9 nobody users 158 Nov  2 13:22 disk3/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk4/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk5/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk6/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk7/
drwxrwxrwx  9 nobody users 161 Nov  2 13:22 disk8/
drwxrwxrwx  8 nobody users 141 Nov  2 13:22 disk9/
drwxrwxrwt  2 nobody users  40 Oct 28 17:58 disks/
drwxrwxr-x  2 nobody users   6 Nov  2 13:22 movies-cache/
drwxrwxrwt  2 nobody users  40 Oct 28 17:58 remotes/
drwxrwxrwx  2 nobody users   6 Nov  2 18:42 sabcomplete/
drwxr-xr-x  2 root   root   40 Nov  2 13:12 sabcomplete-cache/
drwxrwxrwx  4 nobody users  35 Nov  2 16:06 sabdownloads/
drwxrwxrwx  3 nobody users  22 Nov  2 13:22 tv-cache/
drwxrwxrwx  1 nobody users 161 Nov  2 18:42 user/
drwxrwxrwx  1 nobody users 161 Nov  2 13:22 user0/

 

23 minutes ago, trurl said:

booted your server Oct 28 17:57:47

Lots of stuff happening after that boot with disk assignments; wrong or missing array disks, pool disks assigned or not, parity rebuilds started then cancelled.

 

In addition to cache pool, you had pools named movies-cache, tv-cache, sabdownloads, and sabcomplete-cache. Later, it looks like you got rid of sabcomplete-cache and now have sabcomplete. But, there is still a path in /mnt for sabcomplete-cache as seen in your ls output above and as noted by FCP:

Nov  2 13:22:08 Tower root: Fix Common Problems: Error: Invalid folder sabcomplete-cache contained within /mnt

Do you have any docker or anything else that specifies the path /mnt/sabcomplete-cache ?

 

Did you ever have any user share that specified sabcomplete-cache ?

 

Anything else you can tell us about all the things you did since booting and before this last parity rebuild?


Your array disks are all mountable, and each contains a lot of data. Your pools are nearly empty. It looks like sdh (sabcomplete) had some problems but was later formatted.

 

If there are any missing files, someone or something must have deleted them. Parity rebuild can't delete files because parity doesn't know anything about files, only bits. And parity rebuild doesn't change anything on any data disk. It reads all data disks to get the result of the parity calculation(s) to write to the parity disk(s), but it isn't reading files, just bits.

 

Even if you were rebuilding a data disk, a problem rebuilding wouldn't result in deleted files, because rebuild from parity doesn't know anything about files. You would get filesystem corruption and unmountable disk instead of missing files.
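To make the "bits, not files" point concrete, here is a hedged Python sketch of a parity sync and a data-disk rebuild under a simplified XOR model (real parity2 is computed differently, and the byte strings standing in for disks are made up). The sync only reads the data disks and writes a new parity buffer; in this model, smaller disks simply contribute zeros past their end. Rebuilding a disk is the same XOR over parity and the remaining disks, so a bad rebuild would yield wrong bytes (and typically an unmountable filesystem), never a tidy set of deleted files.

```python
import hashlib


def parity_sync(data_disks: list[bytes], parity_size: int) -> bytes:
    """Compute XOR parity by reading every data disk; nothing is written back to them."""
    if any(len(disk) > parity_size for disk in data_disks):
        raise ValueError("parity must be at least as large as the largest data disk")
    parity = bytearray(parity_size)  # the only buffer this function writes to
    for disk in data_disks:
        for offset, value in enumerate(disk):
            parity[offset] ^= value  # raw bytes only: no file names, no directories
    return bytes(parity)


def rebuild_disk(other_disks: list[bytes], parity: bytes, disk_size: int) -> bytes:
    """Reconstruct a missing data disk by XORing parity with all remaining data disks."""
    return parity_sync(other_disks + [parity], len(parity))[:disk_size]


# Hypothetical three-disk array; the byte strings stand in for raw disk contents.
disks = [b"TV_Shows/ep01.mkv", b"Movies/a.mkv", b"docs"]
before = [hashlib.sha256(d).hexdigest() for d in disks]

parity = parity_sync(disks, parity_size=32)
after = [hashlib.sha256(d).hexdigest() for d in disks]
rebuilt = rebuild_disk([disks[0], disks[2]], parity, len(disks[1]))

print(before == after)      # True: the sync never modifies a data disk
print(rebuilt == disks[1])  # True: rebuild recovers the exact bytes
```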

 

 

13 hours ago, trurl said:

If there are any missing files, someone or something must have deleted them. Parity rebuild can't delete files because parity doesn't know anything about files, only bits. And parity rebuild doesn't change anything on any data disk. It reads all data disks to get the result of the parity calculation(s) to write to the parity disk(s), but it isn't reading files, just bits.

Thank you. I am going to look into Sonarr logs and see if there was something there that caused the files to get deleted since it was mostly TV media files.

15 hours ago, trurl said:

still a path in /mnt for sabcomplete-cache as seen in your ls output above and as noted by FCP:

Nov  2 13:22:08 Tower root: Fix Common Problems: Error: Invalid folder sabcomplete-cache contained within /mnt

Do you have any docker or anything else that specifies the path /mnt/sabcomplete-cache ?

You may have to reboot to get rid of that. And if it comes back then something must be creating it by specifying that path. Since there isn't any storage mounted at that path, it would be a path in rootfs (RAM).
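A small Python sketch for spotting such leftovers, assuming the usual layout where every real disk, pool, and share directory under /mnt is its own mount point. The `unmounted_dirs` helper is hypothetical and is not part of Unraid or Fix Common Problems. It lists directories directly under /mnt that are not mount points, i.e. plain folders living in rootfs (RAM), which is what /mnt/sabcomplete-cache would be if nothing is mounted there.

```python
import os


def unmounted_dirs(root: str = "/mnt") -> list[str]:
    """Return directories directly under root that are not mount points."""
    result = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # os.path.ismount is False for an ordinary folder on the same filesystem as root
        if os.path.isdir(path) and not os.path.ismount(path):
            result.append(path)
    return result


if __name__ == "__main__" and os.path.isdir("/mnt"):
    for path in unmounted_dirs():
        print(f"{path} is not a mount point (plain folder in rootfs/RAM)")
```

Anything this reports that comes back after a reboot is most likely being recreated by a container or plugin that still references that path.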


So...I know what I did wrong...and I thought nothing of it, since I have done this many times and never had problems.

 

I hit new config in tools, added the larger, new parity drives, rearranged the physical locations of the data disks AND their order in the array.....

 

Pretty sure that did the damage. Lesson learned...

37 minutes ago, Fuggin said:

I hit new config in tools, added the larger, new parity drives, rearranged the physical locations of the data disks AND their order in the array.....


Unless you assigned a data drive to a parity slot it doesn't matter what order the drives are assigned. The only time order matters is when you don't want to rebuild parity2.

 

And in any case, New Config will only write to the drives in the parity slots, and only read from the drives in the data slots. Parity contains none of your data.
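The parity2 point can be sketched in Python, assuming parity2 follows the Linux RAID-6 Q-syndrome scheme (each slot's data multiplied by a slot-dependent power of 2 in GF(2^8) before XORing). This is a simplification for illustration, not Unraid's actual code, and the two-byte "disks" are made up. Swapping two data disks leaves parity1 (plain XOR) unchanged but changes parity2, which is why slot order only matters when you want to keep an existing parity2 valid.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) using the RAID-6 polynomial x^8+x^4+x^3+x^2+1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D  # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow2(i: int) -> int:
    """Return 2**i in GF(2^8)."""
    r = 1
    for _ in range(i):
        r = gf_mul(r, 2)
    return r


def p_parity(disks: list[bytes]) -> bytes:
    """Parity1 model: plain XOR, so slot order does not matter."""
    out = bytearray(len(disks[0]))
    for d in disks:
        for k, v in enumerate(d):
            out[k] ^= v
    return bytes(out)


def q_parity(disks: list[bytes]) -> bytes:
    """Parity2 model: each byte is weighted by a coefficient that depends on its slot."""
    out = bytearray(len(disks[0]))
    for slot, d in enumerate(disks):
        coef = gf_pow2(slot)
        for k, v in enumerate(d):
            out[k] ^= gf_mul(coef, v)
    return bytes(out)


disks = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
swapped = [disks[1], disks[0], disks[2]]  # same disks, first two slots exchanged

print(p_parity(disks) == p_parity(swapped))  # True
print(q_parity(disks) == q_parity(swapped))  # False
```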


Yeah then I am really confused...I know I didn't assign a data drive to a parity slot...I've done this before...it just doesn't make sense what happened.

 

All my drives showed up...I assigned my new parity drives first because their serial numbers were unique and similar enough, then the rest were the data drives that were in the system already...I just went down the line and re-assigned them. I started the array, it asked to rebuild the parity drives, and it went ahead and did so. It finished rebuilding overnight, I woke up, and 48TB was gone.

