(Solved) NVME Cache Pool Errored - Unmountable: No File System


Solved by JorgeB

Recommended Posts

My Unraid server has been one problem after another.

I just moved it to a new case without a SATA backplane, because the backplane was causing CRC errors on my disks. But now I've lost one of my NVMe cache pools, called nvmecache, which held both my Dockers and a couple of shares, including my appdata. I have no idea when or how I lost just that one pool, because I had to restart the server several times, including for a motherboard BIOS update.

Unfortunately, I had those shares' cache setting set to 'Only' and don't have a backup of the files.

 

Will this FAQ procedure help me recover my Cache Pool?

 

I almost feel like I should just wipe absolutely everything including all my data and just start again.

I'm spending so much time trying to recover files and drives that it might be easier to just spend the 12 months re-ripping my media to a fresh server.

 

I will absolutely turn off Auto-Start array from now on and never use it again. 

Funny thing is, I've had Auto-Start off since the start of all my issues. So I'm really not sure what happened. 

 

FYI:

Running Unraid 6.9.2.

I have two cache pools, one called Arraycache and the other called nvmecache. Both use NVMe drives.

They are supposed to be RAID1, with identical drives in each pool.

This has happened to me before, but last time I only lost my appdata share, which I was able to restore after recreating the cache pool and restoring appdata with the Backup/Restore utility.

 

Any help is appreciated. 

 

[Screenshot attachments: Unraid GUI, 2022-05-08]

threadripper19-diagnostics-20220508-2158.zip

Link to comment
  • Solution
May  8 21:10:17 ThreadRipper19 kernel: BTRFS: device fsid 067b1393-440b-497e-b688-fb11e9c6611d devid 3 transid 645077 /dev/nvme3n1p1 scanned by udevd (2760)
May  8 21:10:17 ThreadRipper19 kernel: BTRFS: device fsid 067b1393-440b-497e-b688-fb11e9c6611d devid 2 transid 291831 /dev/nvme0n1p1 scanned by udevd (2792)

 

As you can see, the transid of one of the devices is way off; it should be the same for all pool members. Compare the other pool below:

 

May  8 21:10:17 ThreadRipper19 kernel: BTRFS: device fsid 565f4d7d-1b62-4a67-9414-ac108e2553f3 devid 3 transid 31253 /dev/nvme1n1p1 scanned by udevd (2741)
May  8 21:10:17 ThreadRipper19 kernel: BTRFS: device fsid 565f4d7d-1b62-4a67-9414-ac108e2553f3 devid 2 transid 31253 /dev/nvme2n1p1 scanned by udevd (2727)

 

The difference is so large that it couldn't happen just from a device losing a few writes, so some corruption likely occurred here. Also note that Ryzen with overclocked RAM, like you have, is known to corrupt data in some cases, so I suggest fixing that.
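For anyone wanting to spot this kind of mismatch themselves, here is a small sketch (not part of the original advice) that groups the kernel's btrfs device-scan lines by fsid and flags pools whose members report different transids. The field positions assume the exact log format shown above.

```shell
# Flag btrfs pools whose member devices report different transids.
# Feed it the syslog, e.g.:  check_transids < /var/log/syslog
check_transids() {
  awk '/BTRFS: device fsid/ {
      # pick out the values following the "fsid" and "transid" keywords
      for (i = 1; i <= NF; i++) {
          if ($i == "fsid")    fsid = $(i + 1)
          if ($i == "transid") tid  = $(i + 1)
      }
      if (fsid in seen && seen[fsid] != tid)
          print "transid mismatch in pool " fsid
      seen[fsid] = tid
  }'
}
```

Run against the log excerpts above, this would report only the 067b1393… pool, since the 565f4d7d… pool's members agree.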

 

You can first try restoring the superblock from backup to see if it helps. Do it for both pool devices:


 

btrfs-select-super -s 1 /dev/nvme3n1p1
btrfs-select-super -s 1 /dev/nvme0n1p1

 

Then reboot and post new diags.

Link to comment
5 hours ago, JorgeB said:

[quote of the solution above: one pool device has a wildly mismatched transid, likely corruption; run btrfs-select-super -s 1 on both pool devices, then reboot and post new diags]

I didn't even think to turn off the memory overclock. Thanks for the FAQ link on that.

 

Here's the memory I'm using:

[Screenshot attachment: installed memory specs]

 

and according to the FAQ: [screenshot of the FAQ's memory-speed table]

I should only go as high as DDR4-3200.

I'm wondering if I should just turn off D.O.C.P. and let the memory run at its base speed of DDR4-2400 at 1.2 V, or manually set it to 3200 at 1.35 V.

 

I turned off Global C-State, but couldn't find anything in my ASUS ROG Zenith II Extreme Alpha BIOS for "Power Supply Idle Control".

I also turned off ErP Ready in my BIOS.

 

I'll reboot with the memory set to the default of DDR4-2400, try restoring the superblocks, then send you new diagnostics.

Thanks

 

 

Link to comment

@JorgeB

 

I restored the superblock on both nvme drives:

Spoiler

root@ThreadRipper19:~# btrfs-select-super -s 1 /dev/nvme3n1p1
using SB copy 1, bytenr 67108864
root@ThreadRipper19:~# btrfs-select-super -s 1 /dev/nvme0n1p1
checksum verify failed on 451530457088 found 000000B6 wanted 00000000
checksum verify failed on 451530489856 found 000000B6 wanted 00000000
checksum verify failed on 451530539008 found 000000B6 wanted 00000000
checksum verify failed on 411917156352 found 000000B6 wanted 00000000
checksum verify failed on 411917189120 found 000000B6 wanted 00000000
checksum verify failed on 411889811456 found 000000B6 wanted 00000000
checksum verify failed on 411914977280 found 000000B6 wanted 00000000
checksum verify failed on 411888648192 found 000000B6 wanted 00000000
checksum verify failed on 412077031424 found 000000B6 wanted 00000000
checksum verify failed on 411892908032 found 000000B6 wanted 00000000
checksum verify failed on 411889827840 found 000000B6 wanted 00000000
checksum verify failed on 411917205504 found 000000B6 wanted 00000000
checksum verify failed on 411917221888 found 000000B6 wanted 00000000
checksum verify failed on 412108701696 found 000000B6 wanted 00000000
checksum verify failed on 412679864320 found 00000037 wanted 00000055
checksum verify failed on 412484960256 found 000000B6 wanted 00000000
checksum verify failed on 411996897280 found 000000B6 wanted 00000000
checksum verify failed on 412487876608 found 000000B6 wanted 00000000
checksum verify failed on 411987017728 found 0000002E wanted FFFFFF8A
checksum verify failed on 411996930048 found 000000B6 wanted 00000000
checksum verify failed on 411996979200 found 000000B6 wanted 00000000
checksum verify failed on 411997044736 found 000000B6 wanted 00000000
checksum verify failed on 411867660288 found 000000B6 wanted 00000000
checksum verify failed on 411988262912 found 0000002C wanted FFFFFF98
checksum verify failed on 411997077504 found 000000B6 wanted 00000000
checksum verify failed on 411997126656 found 000000B6 wanted 00000000
checksum verify failed on 411989655552 found 000000A2 wanted 00000034
checksum verify failed on 412143370240 found 0000001F wanted 0000000D
checksum verify failed on 411772092416 found 000000BF wanted FFFFFFA3
checksum verify failed on 412722511872 found 000000E4 wanted FFFFFF85
checksum verify failed on 411772108800 found 000000F7 wanted 00000043
checksum verify failed on 411997143040 found 000000B6 wanted 00000000
checksum verify failed on 412119367680 found 000000B6 wanted 00000000
checksum verify failed on 411997159424 found 000000B6 wanted 00000000
checksum verify failed on 411997175808 found 000000B6 wanted 00000000
checksum verify failed on 412119384064 found 000000B6 wanted 00000000
checksum verify failed on 412108718080 found 000000B6 wanted 00000000
checksum verify failed on 412673245184 found 0000000D wanted FFFFFF8B
checksum verify failed on 412673261568 found 000000A9 wanted FFFFFFC5
checksum verify failed on 412488024064 found 000000B6 wanted 00000000
checksum verify failed on 412486598656 found 000000B6 wanted 00000000
checksum verify failed on 412488040448 found 000000B6 wanted 00000000
checksum verify failed on 411785969664 found 000000B6 wanted 00000000
checksum verify failed on 411781431296 found 000000EC wanted 00000009
checksum verify failed on 412128313344 found 000000B6 wanted 00000000
checksum verify failed on 411790196736 found 000000B6 wanted 00000000
checksum verify failed on 411790491648 found 000000B6 wanted 00000000
checksum verify failed on 411790884864 found 000000B6 wanted 00000000
checksum verify failed on 412713205760 found 00000002 wanted FFFFFFFE
checksum verify failed on 411790966784 found 000000B6 wanted 00000000
checksum verify failed on 411993669632 found 000000B6 wanted 00000000
checksum verify failed on 411792343040 found 000000B6 wanted 00000000
checksum verify failed on 411792621568 found 000000B6 wanted 00000000
checksum verify failed on 411786018816 found 000000B6 wanted 00000000
checksum verify failed on 412527607808 found 000000B6 wanted 00000000
checksum verify failed on 412543631360 found 000000B6 wanted 00000000
checksum verify failed on 412000616448 found 000000B6 wanted 00000000
checksum verify failed on 411999436800 found 000000B6 wanted 00000000
checksum verify failed on 411857518592 found 0000004F wanted 0000005F
checksum verify failed on 411994161152 found 000000B6 wanted 00000000
checksum verify failed on 412679471104 found 00000014 wanted 0000002E
checksum verify failed on 412679487488 found 00000068 wanted FFFFFF95
checksum verify failed on 412707536896 found 00000043 wanted 00000066
checksum verify failed on 412713254912 found 0000002B wanted FFFFFF9A
checksum verify failed on 411800797184 found 000000B6 wanted 00000028
checksum verify failed on 411801010176 found 00000045 wanted 0000006B
checksum verify failed on 412488073216 found 000000B6 wanted 00000000
checksum verify failed on 412486893568 found 000000B6 wanted 00000000
checksum verify failed on 412116418560 found 000000B6 wanted 00000000
checksum verify failed on 412842098688 found 000000B6 wanted 00000000
checksum verify failed on 411786067968 found 000000B6 wanted 00000000
checksum verify failed on 412128329728 found 000000B6 wanted 00000000
checksum verify failed on 412773695488 found 000000A2 wanted FFFFFFCB
checksum verify failed on 411900624896 found 000000B6 wanted 00000000
checksum verify failed on 412137570304 found 000000E5 wanted 0000000D
checksum verify failed on 412035186688 found 00000034 wanted 00000000
checksum verify failed on 412028993536 found 0000000C wanted 00000000
checksum verify failed on 412424994816 found 000000F3 wanted 0000000D
checksum verify failed on 412048588800 found 000000A9 wanted 00000000
checksum verify failed on 412054110208 found 00000033 wanted 00000000
checksum verify failed on 412048621568 found 00000014 wanted 00000000
checksum verify failed on 411841544192 found 0000004B wanted 0000000E
checksum verify failed on 411786100736 found 000000B6 wanted 00000000
checksum verify failed on 412586737664 found 00000093 wanted 0000000D
checksum verify failed on 411809873920 found 00000027 wanted 00000039
checksum verify failed on 411786133504 found 000000B6 wanted 00000000
checksum verify failed on 411786149888 found 000000B6 wanted 00000000
checksum verify failed on 411830026240 found 00000072 wanted 00000008
checksum verify failed on 411862335488 found 00000047 wanted 0000006A
checksum verify failed on 411878277120 found 000000B6 wanted 00000000
checksum verify failed on 411786215424 found 000000B6 wanted 00000000
checksum verify failed on 412134031360 found 000000B6 wanted 00000000
checksum verify failed on 411899969536 found 000000B6 wanted 00000000
checksum verify failed on 411786297344 found 000000B6 wanted 00000000
checksum verify failed on 411928330240 found 000000ED wanted 00000054
checksum verify failed on 411772272640 found 00000054 wanted 0000000C
checksum verify failed on 411937046528 found 00000089 wanted 0000007C
checksum verify failed on 411899609088 found 000000B6 wanted 00000000
checksum verify failed on 411899707392 found 000000B6 wanted 00000000
checksum verify failed on 411899723776 found 000000B6 wanted 00000000
checksum verify failed on 412134047744 found 000000B6 wanted 00000000
checksum verify failed on 412825632768 found 000000CA wanted FFFFFFB4
checksum verify failed on 411875950592 found 000000B6 wanted 00000000
using SB copy 1, bytenr 67108864
root@ThreadRipper19:~# 
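For context (standard btrfs layout, not something stated in the thread): btrfs keeps superblock mirrors at fixed byte offsets on each device, which is why `btrfs-select-super -s 1` reports "using SB copy 1, bytenr 67108864" above. A quick sanity check:

```shell
# btrfs superblock mirror offsets (standard layout):
#   copy 0 at 64 KiB, copy 1 at 64 MiB, copy 2 at 256 GiB
# -s 1 therefore reads the mirror at byte 67108864, matching the output above.
echo $(( 64 * 1024 * 1024 ))
```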

 

Here's the diagnostics after rebooting and before starting the array:

threadripper19-diagnostics-20220509-1128.zip

 

and here's the diagnostics after starting the array:

threadripper19-diagnostics-20220509-1140.zip

 

The nvmecache pool mounted and I can see my data, but unfortunately the data isn't all there.

The music share isn't a big deal; I can always re-rip that stuff. But I don't know about the appdata or system shares.

Just noticed that I won't be able to restore my appdata, because I had 'Delete backups if they are this many days old:' set to 15.

I never noticed that there's no option in the Backup/Restore Appdata utility to keep a certain number of backups. It seems really silly not to have one.

How will that affect my Plex docker?

 

What about my system share and the dockers in it?

Link to comment
27 minutes ago, FQs19 said:

unfortunately the data isn't all there. 

That's not unexpected. One of the devices had a bogus, much larger transid, so any data that was saved with a transid larger than the actual one would be lost.

 

29 minutes ago, FQs19 said:

How will that affect my Plex docker?

Can't really help with that; I've never used Plex.

 

As for the other dockers: if their appdata is current they should work; if not, they'll need to be reconfigured. The docker image can easily be recreated if needed.

Link to comment
Just now, JorgeB said:

[quote of the reply above]

Well, I really appreciate you getting my pool back online. 

I can reconfigure my dockers and plex.

Getting Plex back to the state it was in before this mess is going to be a nightmare for sure. 

 

Do you have any suggestions on how I can have a backup of my appdata and system shares on my array, but also use the cache pool?

I was going to submit a feature request for a 'Keep this number of backups' option in the Backup/Restore Appdata utility.

I guess I can set up a user script that runs rsync to a separate share on my array.

Thoughts?

Link to comment
  • FQs19 changed the title to (Solved) NVME Cache Pool Errored - Unmountable: No File System
