Maddeen Posted July 1, 2020 (edited)
Hi, I don't know how it happened, but one of my cache drives disappeared after a reboot. After another reboot it's back, but only as an "unassigned device". The other cache drive says "Unmountable: No file system". I took a sneak peek at what happens when stopping the array and attaching the drive as cache again --> Warning: All data on the drive will be deleted! Now I'm stuck... what do I do now?
JorgeB Posted July 2, 2020
Please post the diagnostics: Tools -> Diagnostics
Maddeen Posted July 2, 2020
@johnnie.black - here are the diagnostics. Hopefully you can help me, because my isos share was also on the cache pool - I thought it was safe while using a pool. AppData was backed up successfully with the CA tool, so that's fine. But for now (logically) all shares (appdata, domains, isos and system) are gone... And if it comes to the worst and I (or you) can't restore those shares, I'll need some help to start from scratch. I don't think it's just a matter of "build a new cache pool" and restore appdata with the CA tool...
v1ew-s0urce-diagnostics-20200702-1235.zip
JorgeB Posted July 2, 2020
Doesn't look very good, since the pool is being detected as a single device. If it was a redundant pool (the default) you can try mounting just the other device. To do that, try this: stop the array; if Docker/VM services are using the cache pool, disable them; unassign the current cache device; start the array to make Unraid "forget" the current cache config; stop the array; reassign the other cache device (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device); start the array. If it doesn't work, post new diags.
Maddeen Posted July 2, 2020
OK -- sadly I have to work for about 3 hours before I'm home, but then I'll try it. Docker and VMs are not usable because these services are also configured to run only on the cache drive! So, to be sure that I'm doing everything correctly, step by step:
1) Stop the array
2) Unassign the current cache disk (sdb!) --> so no cache drives are configured
3) Start the array with no cache disks to make Unraid forget any cache config
4) Stop the array
5) Attach both cache disks (sdb and sdc) --> You said "there can't be a warning"... don't you mean "there can be a warning"?! Because I already tried to assign the unassigned drive as a cache drive yesterday, and it immediately showed me the "all existing data..." warning.
6) Start the array
JorgeB Posted July 2, 2020
6 minutes ago, Maddeen said: "5) Attach both cache disks (sdb and sdc)"
No, assign only the other device (currently sdc).
8 minutes ago, Maddeen said: "You said "there can't be a warning"... don't you mean "there can be a warning"?!"
I mean there can't - that's why we start the array first without any cache device assigned.
Maddeen Posted July 2, 2020 (edited)
@johnnie.black - done. But now I get the error "Unmountable: No pool uuid" (see screenshot - new diagnostics attached). Additionally, the other cache drive (sdb) now also shows as unmountable, as you can see in the screenshot - everything is greyed out. And I got these two mail warnings:
First:
Event: Unraid Cache disk error
Subject: Alert [V1EW-S0URCE] - Cache disk in error state (disk missing)
Description: No device identification ()
Importance: alert
Second:
Event: Unraid Cache disk message
Subject: Notice [V1EW-S0URCE] - Cache disk returned to normal operation
Description: Samsung_SSD_850_EVO_500GB_S2RBNXAH243889M (sdc)
Importance: normal
v1ew-s0urce-diagnostics-20200702-1629.zip
JorgeB Posted July 2, 2020
Yep, as suspected it doesn't look good: that device was removed from the pool, and the other one has a damaged superblock. You can try these recovery options (see the sketch below); you can try them against both devices, but if it works it should be with sdb.
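For reference, the usual btrfs recovery steps behind that advice look roughly like this (a sketch only, not Unraid-specific; the device name and the mount/restore paths are placeholders, adapt them to your system):

# try to repair the superblock from one of its backup copies
btrfs rescue super-recover -v /dev/sdb1

# if that works, attempt a read-only degraded mount using the backup tree roots
mkdir -p /x
mount -o degraded,usebackuproot,ro /dev/sdb1 /x

# last resort: pull files off the unmountable filesystem without mounting it
btrfs restore -v /dev/sdb1 /mnt/disk1/restore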
Maddeen Posted July 2, 2020
Thanks - I'll try that. If it's not possible, is there any best practice to start from scratch? My plan would be:
1) Add both cache drives
2) Format them and bring Unraid to a clean state
3) Create the folders (appdata, domains, isos and system) on the cache drive
4) Set up the shares
5) Restore AppData via the CA plugin
6) Deactivate the Docker service and activate it again (to write a new docker.img to the cache drive)
7) Download the needed Dockers - all settings should be restored automatically (if I remember right)
8) Deactivate the VM service and activate it again (to write a new libvirt.img to the cache drive)
9) Done
And just for my information, because I'm not a specialist, let me ask the following...
Do you have any idea what caused my problem? I don't want to have this happen again. Or is there any option to periodically back up the shares (domains, isos and system) to the array?
Why is the cache pool configured as RAID 1 but wasn't restorable? I don't see the benefit of a pool when - in case of crashes like mine - all data is lost...
Thank you very much for all your explanations and the time you spend on helping me!!
JorgeB Posted July 2, 2020
16 minutes ago, Maddeen said: "My plan would be"
Looks OK to me; the docker image should be recreated.
17 minutes ago, Maddeen said: "Do you have any idea what caused my problem? I don't want to have this happen again."
The problem was created by starting the array with a missing cache device - this removed one of the devices from the pool. The remaining device should still work, assuming it was a redundant pool, but there's some issue with its superblock. I already asked LT not to allow the array to auto-start with a missing cache device, and hopefully that will be implemented soon; but for now, and for anyone using a cache pool, it's best to disable array auto-start and always check that every device is present before starting.
20 minutes ago, Maddeen said: "Why is the cache pool configured as RAID 1 but wasn't restorable?"
There's an issue with the remaining cache device's superblock, not sure why - it's not common. A btrfs maintainer might be able to help restore it without data loss; you'd need to ask for help on IRC or the mailing list, both are mentioned in the FAQ linked earlier.
Maddeen Posted July 2, 2020 (edited)
Thanks again for your help. Hopefully LT will hear (or already has heard) your voice, because that's exactly what I thought when reading your explanation. Why does Unraid perform an array auto-start when it recognizes that drives are missing and potential data loss might happen? 🥴 That behavior makes the complete "pool feature" useless and suggests a redundancy which is not there - imho! Is there any feature request queue where I can add my points and affect the priority of implementation?
Until then I'll follow your advice and disable array auto-start after power down / reboot.
I'll read through your link -- luckily there is no sensitive data on the cache, and before I waste a btrfs maintainer's time on "not very urgent data", I'll start from scratch. Appdata was backed up with the CA plugin, Docker container settings will be restored by default, domains - I've got none, and the isos share only included the Win10 installation image and 2-3 virtIO images.
I'll post an update on whether I could restore my data... or not. But as I said - it's not the end of the world!! Thank you very much!! 🤙
JorgeB Posted July 2, 2020
33 minutes ago, Maddeen said: "Is there any feature request queue where I can add my points and affect the priority of implementation?"
I recently traded messages with Tom about this, but it won't hurt to add a post to the feature requests.
Maddeen Posted July 2, 2020
@johnnie.black - seems that I'm a lucky boy (see screenshot). But - as you can also see in the screenshot - my command-line skills are pretty shitty. 🙈 So before I self-destruct my luck... are these the correct commands for copying the folders to my array?
cp -r /appdata/ /mnt/user/unraid_backup/appdata/
cp -r /system/ /mnt/user/unraid_backup/system/
cp -r /isos/ /mnt/user/unraid_backup/isos/
Actually there are no "appdata", "system" or "isos" folders in the target folder "unraid_backup" - so does it work as written, or do I need to create the folders at the target first?
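For anyone reading along later, a minimal sketch of how such a copy could look from the console, assuming the recovered data is visible under /mnt/cache and the backup share under /mnt/user/unraid_backup (both paths are assumptions, not confirmed in this thread); cp -r creates the target subfolders itself, so they don't need to exist beforehand:

# make sure the destination share folder exists (assumed array share "unraid_backup")
mkdir -p /mnt/user/unraid_backup

# copy each folder from the recovered pool to the array
cp -r /mnt/cache/appdata /mnt/user/unraid_backup/
cp -r /mnt/cache/system /mnt/user/unraid_backup/
cp -r /mnt/cache/isos /mnt/user/unraid_backup/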
JorgeB Posted July 2, 2020
Use the Krusader docker or Midnight Commander (mc on the console), it's much easier.
Maddeen Posted July 2, 2020
Mhh, but my Dockers are on that cache drive 🙈 Right now I can't even start the Docker service because all the necessary resources are on the cache disks... Or would you prefer that I first set up Docker to run on the array, copy the files, and then move all Docker resources back to the cache?
JorgeB Posted July 2, 2020
Not following - if you have a backup, restore it to the new pool; if you don't, you'll need to re-create them.
Maddeen Posted July 2, 2020 (edited)
Mhh, I started Midnight Commander via the web terminal but wasn't able to use the copy command, because it needs a press of F5... and F5 reloads the page. For heaven's sake... the solution is right in front of my nose and I can't catch it. I'll look for a workaround to disable F5 in the browser so I can use F5 to start copying...
UPDATE: Got it!!! ESC + number = F-key... In my case --> ESC+5 = F5 --- copying starts 🙌
Maddeen Posted July 2, 2020
@johnnie.black - OK, now I'm completely confused. I copied all the data to my array, stopped the array, added both cache drives, and formatted the one where Unraid said "Unmountable". And... magic... my pool is restored. All data is back... including VMs and Docker containers 🤔 WTF?!
Currently it's doing something, because first I received this warning:
Event: Unraid Cache disk message
Subject: Warning [V1EW-S0URCE] - Cache pool BTRFS too many profiles (You can ignore this warning when a cache pool balance operation is in progress)
Description: Samsung_SSD_850_EVO_500GB_S2RBNX0HB13914M (sdb)
Importance: warning
I checked the system log - something like this is going on:
Jul 2 22:04:46 v1ew-s0urce kernel: BTRFS info (device sdb1): found 703 extents
Jul 2 22:04:46 v1ew-s0urce kernel: BTRFS info (device sdb1): found 703 extents
Jul 2 22:04:46 v1ew-s0urce kernel: BTRFS info (device sdb1): found 703 extents
Jul 2 22:04:46 v1ew-s0urce kernel: BTRFS info (device sdb1): relocating block group 198764920832 flags data
Jul 2 22:04:51 v1ew-s0urce kernel: BTRFS info (device sdb1): found 584 extents
Jul 2 22:04:51 v1ew-s0urce kernel: BTRFS info (device sdb1): found 584 extents
Jul 2 22:04:51 v1ew-s0urce kernel: BTRFS info (device sdb1): found 584 extents
Jul 2 22:04:51 v1ew-s0urce kernel: BTRFS info (device sdb1): relocating block group 197691179008 flags data
Jul 2 22:04:56 v1ew-s0urce kernel: BTRFS info (device sdb1): found 1069 extents
Jul 2 22:04:56 v1ew-s0urce kernel: BTRFS info (device sdb1): found 1069 extents
Jul 2 22:04:56 v1ew-s0urce kernel: BTRFS info (device sdb1): relocating block group 196617437184 flags data
Jul 2 22:05:01 v1ew-s0urce kernel: BTRFS info (device sdb1): found 926 extents
Jul 2 22:05:01 v1ew-s0urce kernel: BTRFS info (device sdb1): found 926 extents
Jul 2 22:05:01 v1ew-s0urce kernel: BTRFS info (device sdb1): relocating block group 195543695360 flags data
Jul 2 22:05:06 v1ew-s0urce kernel: BTRFS info (device sdb1): found 394 extents
Jul 2 22:05:07 v1ew-s0urce kernel: BTRFS info (device sdb1): found 394 extents
Jul 2 22:05:07 v1ew-s0urce kernel: BTRFS info (device sdb1): relocating block group 194469953536 flags data
Jul 2 22:05:12 v1ew-s0urce kernel: BTRFS info (device sdb1): found 554 extents
Jul 2 22:05:12 v1ew-s0urce kernel: BTRFS info (device sdb1): found 554 extents
Jul 2 22:05:12 v1ew-s0urce kernel: BTRFS info (device sdb1): relocating block group 193396211712 flags data
Jul 2 22:05:17 v1ew-s0urce kernel: BTRFS info (device sdb1): found 673 extents
Jul 2 22:05:17 v1ew-s0urce kernel: BTRFS info (device sdb1): found 673 extents
And when clicking on the first cache drive --> btrfs balance status: is running...
Is everything fine now?! Or do I need to double-check some things?? Or run any self-check/repair mechanisms? I'm not sure if my current state is safe, or if it's like a damaged car - running its last miles until the next little hiccup results in a complete crash.
JorgeB Posted July 3, 2020
10 hours ago, Maddeen said: "And... magic... my pool is restored. All data is back... including VMs and Docker containers 🤔 WTF?!"
That's weird, please post diags if you didn't reboot afterwards.
10 hours ago, Maddeen said: "Is everything fine now?!"
What I would guess is that it recovered/replaced the damaged superblock and is now balancing the data onto the other device. If it's showing data and your pool was redundant, all data should be correct.
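To keep an eye on the pool from the console, a few standard btrfs commands can be used (a sketch; /mnt/cache is the usual Unraid mount point for the cache pool - adjust if yours differs):

# show whether the balance is still running
btrfs balance status /mnt/cache

# per-device error counters, these should all be zero
btrfs device stats /mnt/cache

# verify checksums of all data and metadata once the balance is done
btrfs scrub start /mnt/cache
btrfs scrub status /mnt/cache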
Maddeen Posted July 3, 2020
@johnnie.black - weird, that's the correct wording. No restart since yesterday - here's the new diagnostics file. But what I don't understand is this information (screenshot)... why does it say there is only 90GB (total) space and 86GB used? (I've got two 500GB SSDs as cache drives in RAID1.) And below it says "Data to scrub = 173GB"... and - to complete the confusion - the "Main" tab says the cache drive has about 93GB used (screenshot #2). All this information doesn't fit together or isn't consistent - imho...
v1ew-s0urce-diagnostics-20200703-1600.zip
JorgeB Posted July 3, 2020
11 minutes ago, Maddeen said: "why does it say there is only 90GB (total)"
That's normal with btrfs, it's the allocated size: btrfs first creates empty data and metadata chunks before writing data into them, so there are 90GiB of chunks on disk and 86GiB are used.
12 minutes ago, Maddeen said: "And below it says "Data to scrub = 173GB"..."
raid1, so 86GiB x 2.
12 minutes ago, Maddeen said: "to complete the confusion - the "Main" tab says the cache drive has about 93GB used"
Because the GUI uses GB, not GiB; 86GiB = 93GB.
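To make the unit difference concrete (simple arithmetic, not taken from the diagnostics): 86 GiB = 86 x 1024^3 bytes ≈ 92.3 x 10^9 bytes, i.e. roughly 92-93 GB, which matches the Main tab figure; and with raid1 keeping two copies of everything, the scrub has about 2 x 86 GiB ≈ 172-173 GiB of data to read.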
JorgeB Posted July 3, 2020
17 minutes ago, Maddeen said: "weird, that's the correct wording"
I can't really understand what happened, and why the pool recovered after a format attempt. I can see that wiping a device failed because it was busy, which is likely part of what helped. Still, most likely you were just lucky - that should never happen.
Maddeen Posted July 3, 2020
23 hours ago, johnnie.black said: "I can't really understand what happened... most likely you were just lucky"
Haha - sometimes you just need luck... Thanks again for your valuable feedback! It helped me a lot in clearing up my blind spots with Unraid.
PicoCreator Posted June 15, 2021 (edited)
I faced a similar experience when a reboot with a bad SATA card occurred, knocking out my entire cache array. Because no actual drives were lost, after rewiring to another SATA port (SATA 2, sadly) I am able to see the entire btrfs filesystem, even though I'm unable to add the devices back into Unraid (they are unassigned and show a warning that all data will be formatted when reassigning).
$ btrfs filesystem show
Label: none uuid: 0bfdf8d7-1073-454b-8dec-5a03146de885
Total devices 6 FS bytes used 1.37TiB
devid 2 size 111.79GiB used 37.00GiB path /dev/sdo1
devid 3 size 223.57GiB used 138.00GiB path /dev/sdm1
devid 4 size 223.57GiB used 138.00GiB path /dev/sdi1
devid 5 size 1.82TiB used 1.60TiB path /dev/sdd1
devid 6 size 1.82TiB used 1.60TiB path /dev/sde1
devid 7 size 111.79GiB used 37.00GiB path /dev/sdp1
... there will probably be other btrfs drives listed here if you have them as well ...
While attempting to remount this cache pool using the steps linked earlier, I was unfortunately faced with an error:
$ mount -o degraded,usebackuproot,ro /dev/sdo1 /dev/sdm1 /dev/sdi1 /dev/sdd1 /dev/sde1 /dev/sdp1 /recovery/cache-pool
mount: bad usage
So alternatively I mounted using the UUID (with /recovery/cache-pool being the recovery folder I created):
$ mount -o degraded,usebackuproot,ro --uuid 0bfdf8d7-1073-454b-8dec-5a03146de885 /recovery/cache-pool
With that I presume I can then safely remove the drives from the cache pool (for the last 2 disks that were left), and slowly reorganize and recover the data manually.
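As a side note on why the first command fails: mount only accepts a single source device - with a multi-device btrfs pool you mount any one member (or mount by UUID, as above) and the kernel assembles the rest after a device scan. Once the pool is mounted read-only, pulling the data off before touching the pool could look roughly like this (a sketch; the destination path on the array is an assumption):

# make the kernel aware of all btrfs member devices
btrfs device scan

# mount the pool read-only via any one member device
mkdir -p /recovery/cache-pool
mount -o degraded,usebackuproot,ro /dev/sdo1 /recovery/cache-pool

# copy everything to a safe location on the array first
rsync -a --progress /recovery/cache-pool/ /mnt/user/cache_rescue/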
JorgeB Posted June 15, 2021
14 minutes ago, PicoCreator said: "I am able to see the entire btrfs filesystem, even though I'm unable to add the devices back into Unraid (they are unassigned and show a warning that all data will be formatted when reassigning)"
You can do this: stop the array; if Docker/VM services are using the cache pool, disable them; unassign all cache devices; start the array to make Unraid "forget" the current cache config; stop the array; reassign all cache devices (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device); re-enable Docker/VMs if needed; start the array.