FQs19 Posted May 9, 2022 (edited)

My Unraid server has been one problem after another. I just moved it to a new case without a SATA backplane, because the old backplane was causing CRC errors on my disks, but now I've lost one of my NVMe cache pools, called nvmecache, which held both my Docker containers and a couple of shares, including my appdata. I have no idea when or how I lost just that one cache pool, because I had to restart the server several times, including for a motherboard BIOS update. Unfortunately, I had those shares set to cache 'Only' and don't have a backup of the files. Will this FAQ procedure help me recover my cache pool?

I almost feel like I should just wipe absolutely everything, including all my data, and start again. I'm spending so much time trying to recover files and drives that it might be easier to spend the 12 months re-ripping my media onto a fresh server.

I will absolutely turn off array Auto-Start from now on and never use it again. Funny thing is, I've had Auto-Start off since the start of all my issues, so I'm really not sure what happened.

FYI: I'm on 6.9.2. I have two cache pools, one called Arraycache and the other called nvmecache. Both use NVMe drives, both are supposed to be RAID1, and each pool has identical drives in it. This has happened to me before, but last time I only lost my appdata share, which I was able to restore after recreating the cache pool and restoring appdata with the Backup/Restore Appdata utility.

Any help is appreciated.

threadripper19-diagnostics-20220508-2158.zip
JorgeB Posted May 9, 2022 (Solution)

May 8 21:10:17 ThreadRipper19 kernel: BTRFS: device fsid 067b1393-440b-497e-b688-fb11e9c6611d devid 3 transid 645077 /dev/nvme3n1p1 scanned by udevd (2760)
May 8 21:10:17 ThreadRipper19 kernel: BTRFS: device fsid 067b1393-440b-497e-b688-fb11e9c6611d devid 2 transid 291831 /dev/nvme0n1p1 scanned by udevd (2792)

As you can see, the transid of one of the devices is way off; it should be the same for all pool members. Compare with the other pool below:

May 8 21:10:17 ThreadRipper19 kernel: BTRFS: device fsid 565f4d7d-1b62-4a67-9414-ac108e2553f3 devid 3 transid 31253 /dev/nvme1n1p1 scanned by udevd (2741)
May 8 21:10:17 ThreadRipper19 kernel: BTRFS: device fsid 565f4d7d-1b62-4a67-9414-ac108e2553f3 devid 2 transid 31253 /dev/nvme2n1p1 scanned by udevd (2727)

The difference is so large that it couldn't come from a device just dropping a few writes, so some corruption likely happened here. Also note that Ryzen with overclocked RAM, like you have, is known in some cases to corrupt data, so I suggest fixing that.

You can first try restoring the superblock from backup to see if it helps; do it for both pool devices:

btrfs-select-super -s 1 /dev/nvme3n1p1
btrfs-select-super -s 1 /dev/nvme0n1p1

Then reboot and post new diags.
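[Editor's aside, not part of JorgeB's instructions: the generation (transid) stored in each device's superblock can also be checked directly with btrfs-progs, as a sanity check before and after the restore. A minimal sketch:]

# print the 'generation' field (the transid) from each nvmecache member's superblock
btrfs inspect-internal dump-super /dev/nvme3n1p1 | grep -w generation
btrfs inspect-internal dump-super /dev/nvme0n1p1 | grep -w generation
# healthy RAID1 pool members should report the same, or nearly the same, generation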
FQs19 Posted May 9, 2022 (Author)

5 hours ago, JorgeB said: … Ryzen with overclocked RAM, like you have, is known in some cases to corrupt data, so I suggest fixing that …

I didn't even think to turn off the memory overclock, thanks for the FAQ link on that. Here's the memory I'm using: [image] According to the FAQ, I should only go as high as DDR4-3200. I'm wondering if I should just turn off D.O.C.P. and let it run at its base speed of 2400 at 1.2 V, or manually set the memory to 3200 at 1.35 V?

I turned off Global C-State, but couldn't find anything in my ASUS ROG Zenith II Extreme Alpha BIOS for "Power Supply Idle Control". I also turned off ErP Ready in the BIOS.

I'll reboot with the memory set to the default of DDR4-2400, try restoring the superblocks, then send you new diagnostics. Thanks
FQs19 Posted May 9, 2022 Author Share Posted May 9, 2022 @JorgeB I restored the superblock on both nvme drives: Spoiler root@ThreadRipper19:~# btrfs-select-super -s 1 /dev/nvme3n1p1 using SB copy 1, bytenr 67108864 root@ThreadRipper19:~# btrfs-select-super -s 1 /dev/nvme0n1p1 checksum verify failed on 451530457088 found 000000B6 wanted 00000000 checksum verify failed on 451530489856 found 000000B6 wanted 00000000 checksum verify failed on 451530539008 found 000000B6 wanted 00000000 checksum verify failed on 411917156352 found 000000B6 wanted 00000000 checksum verify failed on 411917189120 found 000000B6 wanted 00000000 checksum verify failed on 411889811456 found 000000B6 wanted 00000000 checksum verify failed on 411914977280 found 000000B6 wanted 00000000 checksum verify failed on 411888648192 found 000000B6 wanted 00000000 checksum verify failed on 412077031424 found 000000B6 wanted 00000000 checksum verify failed on 411892908032 found 000000B6 wanted 00000000 checksum verify failed on 411889827840 found 000000B6 wanted 00000000 checksum verify failed on 411917205504 found 000000B6 wanted 00000000 checksum verify failed on 411917221888 found 000000B6 wanted 00000000 checksum verify failed on 412108701696 found 000000B6 wanted 00000000 checksum verify failed on 412679864320 found 00000037 wanted 00000055 checksum verify failed on 412484960256 found 000000B6 wanted 00000000 checksum verify failed on 411996897280 found 000000B6 wanted 00000000 checksum verify failed on 412487876608 found 000000B6 wanted 00000000 checksum verify failed on 411987017728 found 0000002E wanted FFFFFF8A checksum verify failed on 411996930048 found 000000B6 wanted 00000000 checksum verify failed on 411996979200 found 000000B6 wanted 00000000 checksum verify failed on 411997044736 found 000000B6 wanted 00000000 checksum verify failed on 411867660288 found 000000B6 wanted 00000000 checksum verify failed on 411988262912 found 0000002C wanted FFFFFF98 checksum verify failed on 411997077504 found 000000B6 wanted 00000000 checksum verify failed on 411997126656 found 000000B6 wanted 00000000 checksum verify failed on 411989655552 found 000000A2 wanted 00000034 checksum verify failed on 412143370240 found 0000001F wanted 0000000D checksum verify failed on 411772092416 found 000000BF wanted FFFFFFA3 checksum verify failed on 412722511872 found 000000E4 wanted FFFFFF85 checksum verify failed on 411772108800 found 000000F7 wanted 00000043 checksum verify failed on 411997143040 found 000000B6 wanted 00000000 checksum verify failed on 412119367680 found 000000B6 wanted 00000000 checksum verify failed on 411997159424 found 000000B6 wanted 00000000 checksum verify failed on 411997175808 found 000000B6 wanted 00000000 checksum verify failed on 412119384064 found 000000B6 wanted 00000000 checksum verify failed on 412108718080 found 000000B6 wanted 00000000 checksum verify failed on 412673245184 found 0000000D wanted FFFFFF8B checksum verify failed on 412673261568 found 000000A9 wanted FFFFFFC5 checksum verify failed on 412488024064 found 000000B6 wanted 00000000 checksum verify failed on 412486598656 found 000000B6 wanted 00000000 checksum verify failed on 412488040448 found 000000B6 wanted 00000000 checksum verify failed on 411785969664 found 000000B6 wanted 00000000 checksum verify failed on 411781431296 found 000000EC wanted 00000009 checksum verify failed on 412128313344 found 000000B6 wanted 00000000 checksum verify failed on 411790196736 found 000000B6 wanted 00000000 checksum verify failed on 411790491648 found 000000B6 
wanted 00000000 checksum verify failed on 411790884864 found 000000B6 wanted 00000000 checksum verify failed on 412713205760 found 00000002 wanted FFFFFFFE checksum verify failed on 411790966784 found 000000B6 wanted 00000000 checksum verify failed on 411993669632 found 000000B6 wanted 00000000 checksum verify failed on 411792343040 found 000000B6 wanted 00000000 checksum verify failed on 411792621568 found 000000B6 wanted 00000000 checksum verify failed on 411786018816 found 000000B6 wanted 00000000 checksum verify failed on 412527607808 found 000000B6 wanted 00000000 checksum verify failed on 412543631360 found 000000B6 wanted 00000000 checksum verify failed on 412000616448 found 000000B6 wanted 00000000 checksum verify failed on 411999436800 found 000000B6 wanted 00000000 checksum verify failed on 411857518592 found 0000004F wanted 0000005F checksum verify failed on 411994161152 found 000000B6 wanted 00000000 checksum verify failed on 412679471104 found 00000014 wanted 0000002E checksum verify failed on 412679487488 found 00000068 wanted FFFFFF95 checksum verify failed on 412707536896 found 00000043 wanted 00000066 checksum verify failed on 412713254912 found 0000002B wanted FFFFFF9A checksum verify failed on 411800797184 found 000000B6 wanted 00000028 checksum verify failed on 411801010176 found 00000045 wanted 0000006B checksum verify failed on 412488073216 found 000000B6 wanted 00000000 checksum verify failed on 412486893568 found 000000B6 wanted 00000000 checksum verify failed on 412116418560 found 000000B6 wanted 00000000 checksum verify failed on 412842098688 found 000000B6 wanted 00000000 checksum verify failed on 411786067968 found 000000B6 wanted 00000000 checksum verify failed on 412128329728 found 000000B6 wanted 00000000 checksum verify failed on 412773695488 found 000000A2 wanted FFFFFFCB checksum verify failed on 411900624896 found 000000B6 wanted 00000000 checksum verify failed on 412137570304 found 000000E5 wanted 0000000D checksum verify failed on 412035186688 found 00000034 wanted 00000000 checksum verify failed on 412028993536 found 0000000C wanted 00000000 checksum verify failed on 412424994816 found 000000F3 wanted 0000000D checksum verify failed on 412048588800 found 000000A9 wanted 00000000 checksum verify failed on 412054110208 found 00000033 wanted 00000000 checksum verify failed on 412048621568 found 00000014 wanted 00000000 checksum verify failed on 411841544192 found 0000004B wanted 0000000E checksum verify failed on 411786100736 found 000000B6 wanted 00000000 checksum verify failed on 412586737664 found 00000093 wanted 0000000D checksum verify failed on 411809873920 found 00000027 wanted 00000039 checksum verify failed on 411786133504 found 000000B6 wanted 00000000 checksum verify failed on 411786149888 found 000000B6 wanted 00000000 checksum verify failed on 411830026240 found 00000072 wanted 00000008 checksum verify failed on 411862335488 found 00000047 wanted 0000006A checksum verify failed on 411878277120 found 000000B6 wanted 00000000 checksum verify failed on 411786215424 found 000000B6 wanted 00000000 checksum verify failed on 412134031360 found 000000B6 wanted 00000000 checksum verify failed on 411899969536 found 000000B6 wanted 00000000 checksum verify failed on 411786297344 found 000000B6 wanted 00000000 checksum verify failed on 411928330240 found 000000ED wanted 00000054 checksum verify failed on 411772272640 found 00000054 wanted 0000000C checksum verify failed on 411937046528 found 00000089 wanted 0000007C checksum verify failed on 411899609088 
found 000000B6 wanted 00000000 checksum verify failed on 411899707392 found 000000B6 wanted 00000000 checksum verify failed on 411899723776 found 000000B6 wanted 00000000 checksum verify failed on 412134047744 found 000000B6 wanted 00000000 checksum verify failed on 412825632768 found 000000CA wanted FFFFFFB4 checksum verify failed on 411875950592 found 000000B6 wanted 00000000 using SB copy 1, bytenr 67108864 root@ThreadRipper19:~#

Here's the diagnostics after rebooting and before starting the array: threadripper19-diagnostics-20220509-1128.zip

And here's the diagnostics after starting the array: threadripper19-diagnostics-20220509-1140.zip

The nvmecache pool mounted and I can see my data, but unfortunately not all of it is there. The music share isn't that big a deal, I can always re-rip that stuff, but I don't know about the appdata or system shares.

I just noticed that I won't be able to restore my appdata because I had 'Delete backups if they are this many days old:' set to 15. I never realized there's no option in the Backup/Restore Appdata utility to keep a certain number of backups; it seems really silly not to have one. How will that affect my Plex docker? What about my system share and the dockers in it?
JorgeB Posted May 9, 2022

27 minutes ago, FQs19 said: unfortunately not all of it is there

That's not unexpected: one of the devices had a bogus, overly large transid, so any data saved with a transid larger than the actual one would be lost.

29 minutes ago, FQs19 said: How will that affect my Plex docker?

Can't really help with that, I've never used Plex. As for the other dockers, if their appdata is current they should work; if not, they'll need to be reconfigured. The docker image itself can easily be recreated if needed.
FQs19 Posted May 9, 2022 (Author)

Just now, JorgeB said: … any data saved with a transid larger than the actual one would be lost …

Well, I really appreciate you getting my pool back online. I can reconfigure my dockers and Plex, although getting Plex back to the state it was in before this mess is going to be a nightmare for sure.

Do you have any suggestions on how I can keep a backup of my appdata and system shares on the array while still using the cache pool? I was going to submit a feature request for a 'Keep this number of backups' option in the Backup/Restore Appdata utility. I guess I could set up a user script that runs rsync to a separate share on my array (see the sketch below). Thoughts?
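[Editor's aside: a minimal sketch of the kind of rsync user script mentioned above. The pool mount point /mnt/nvmecache and the backup share /mnt/user/backups are assumptions for illustration, not paths from this thread.]

#!/bin/bash
# Example only: copy appdata and system from the nvme pool to a share on the array.
# Note: ideally stop the Docker service first so files aren't copied mid-write.
rsync -a /mnt/nvmecache/appdata/ /mnt/user/backups/appdata/
rsync -a /mnt/nvmecache/system/  /mnt/user/backups/system/

[Scheduled through the User Scripts plugin, something like this keeps a plain-file copy on the array even if the cache pool is lost again.]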
JorgeB Posted May 9, 2022

14 minutes ago, FQs19 said: Do you have any suggestions on how I can keep a backup of my appdata and system shares on the array while still using the cache pool?

There are several ways; I have a script that takes a snapshot every day and then sends it to another pool.
FQs19 Posted May 9, 2022 (Author)

9 minutes ago, JorgeB said: I have a script that takes a snapshot every day and then sends it to another pool.

Oh nice. Would you mind sharing that script?
JorgeB Posted May 9, 2022

I'm really bad with scripts; mine works for me because I know its limitations. You're better off using, for example, the snapshots plugin.
FQs19 Posted May 9, 2022 (Author)

1 minute ago, JorgeB said: You're better off using, for example, the snapshots plugin.

Will do. Thanks
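[Editor's addendum for anyone finding this thread later: the snapshot-and-send approach JorgeB describes looks roughly like the sketch below. This is not his actual script; the paths /mnt/nvmecache and /mnt/arraycache/backups, and the assumption that appdata is a btrfs subvolume on a btrfs destination pool, are only for illustration.]

#!/bin/bash
# Example only: take a daily read-only snapshot of appdata and send it to a second btrfs pool.
# Assumes /mnt/nvmecache/appdata is a btrfs subvolume and /mnt/arraycache/backups already exists.
snap=/mnt/nvmecache/.snaps/appdata_$(date +%Y%m%d)
mkdir -p /mnt/nvmecache/.snaps
btrfs subvolume snapshot -r /mnt/nvmecache/appdata "$snap"
# Full send; an incremental send would pass the previous snapshot with -p <parent>.
btrfs send "$snap" | btrfs receive /mnt/arraycache/backups/

[Run on a daily schedule, this gives a point-in-time copy that survives the kind of pool loss described above, as long as the destination pool stays healthy.]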