July 21, 2015: I just took two brand-new 500GB SSDs and installed them in my server, created two cache slots, assigned the SSDs and started the array. It said the first drive in the pool was unmountable and needed formatting, so I did that. I started using the cache to set up some dockers, etc., but then I looked at the dashboard and something doesn't seem right. It shows the size of the pool as 1TB (if this is RAID-1, shouldn't it be 500GB?). Used and free space for the cache drive look correct, but the "pool of two devices" line shows 516GB used, which makes no sense. Also, I tried stopping the array and unassigning the cache 2 drive. When I start the array again, it shows the assigned cache disk as unmountable and asks me to format it again. This can't be right. If I assign cache 2 again, everything is "working" again. I tried running a balance, which seems to run OK but has no effect on any of the above. After running the balance, if I go back to the cache page where I execute the balance utility, it says "No balance found on '/mnt/cache'". Based on this behavior, it seems to me that if a cache drive were to fail, I would not have a working cache! Could someone please let me know what's going on here? Log and screenshot attached. Thanks! (Attachment: log.txt)
July 21, 2015: Makes sense to me, actually. But hey, I'm weird! The pool comprises two 500GB disks, so its size is 500GB + 500GB = 1000GB. How those 1000GB are used doesn't change the total; there are still 1000GB in the pool, and they can be used in different ways. Since this pool is set to RAID-1 redundancy and both disks are the same size, half of that space is used for redundancy. That leaves 500GB for storage, of which you use about 16GB for dockers and such. So the pool shows 500GB - 16GB = 484GB free and 500GB + 16GB = 516GB used.
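The arithmetic in the reply above can be written out as a small Python sketch. It is a simplified model, assuming two equal-size devices and a RAID-1 profile in which every byte of data is stored twice; `raid1_pool_usage` is a hypothetical helper, not part of unRAID or btrfs, and real btrfs figures also include allocated metadata and system chunks, so they will not match exactly.

```python
def raid1_pool_usage(device_sizes_gb, data_gb):
    """Simplified model of the pool figures described above (sizes in GB).

    Assumes a RAID-1 pool built from equal-size devices, where every
    byte of data is stored twice.
    """
    total = sum(device_sizes_gb)  # raw size of the pool: 500 + 500
    usable = total / 2            # half of the raw space holds the mirror copy
    free = usable - data_gb       # space left for new data
    used = total - free           # mirror half plus the data itself
    return total, used, free


if __name__ == "__main__":
    total, used, free = raid1_pool_usage([500, 500], 16)
    print(f"size={total}GB used={used:.0f}GB free={free:.0f}GB")
    # prints: size=1000GB used=516GB free=484GB
```

Counting "used" as total minus free is what makes the dashboard's 516GB line up with 500GB of mirror space plus 16GB of data.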
July 21, 2015 (Author): OK, that logic makes sense. But why does the first cache drive show as unmountable, with the system wanting to format it, when I unassign the cache 2 drive? Shouldn't the data still be available on cache 1? Isn't that the point of a redundant cache pool?
July 21, 2015 (Author): Further to my question above, how can I simulate a drive failure in the cache pool to test that it's working? When I unassign the cache 2 drive and restart the array, the other cache disk comes up as unmountable and unRAID asks to format it. I was led to believe by Limetech that the cache would keep operating in a degraded performance state if a drive were to fail. Hoping someone can help me out with some answers here...
July 21, 2015 (Community Expert), replying to the question above about simulating a drive failure: If you want to simulate a failure, you should not unassign the disk. Instead, leave it assigned and do something like disconnecting its power or SATA cable.
July 21, 2015 (Author): OK, I can do that and will try it shortly. So when I unassign the cache 2 disk, restart the array, and the other cache disk shows as unmountable, is that normal, expected behavior?
July 22, 2015 (Author): I stopped the array, powered down and disconnected cache 2. Then I booted back up and the cache was functional. The log showed that a balance was underway, and when it finished it printed the following line, which I believe is correct:

Tower kernel: BTRFS info (device sdf1): disk deleted missing

HOWEVER, I then powered back down, reconnected cache 2 and powered back up. Although the Main tab shows both drives as operational as before, when I click the cache link it shows the following under the Pool information heading:

Label: none  uuid: 0f3b251d-d4f8-4786-9cc4-e79c77251a11
    Total devices 1 FS bytes used 15.17GiB
    devid 1 size 465.76GiB used 20.03GiB path /dev/sdh1

btrfs-progs v4.0.1

...and when I try to run a balance, it prints the following in the log:

Jul 21 20:20:28 Tower php: /sbin/btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache &>/dev/null &
Jul 21 20:20:28 Tower kernel: BTRFS error (device sdh1): unable to start balance with target data profile 16

Why can't I run the balance on the pool? Don't I need to re-balance now that I've added the second drive back into the pool?
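The number at the end of that kernel error is a btrfs block-group profile bit, and 16 (1 << 4) is the RAID1 bit. A RAID1 target needs at least two devices, while the pool information above reports `Total devices 1`, which suggests the reconnected drive was no longer a member of the filesystem when the balance was attempted. The Python sketch below decodes the value; the bit values follow the kernel's `BTRFS_BLOCK_GROUP_*` profile definitions, while the helper names and the small minimum-device table are hypothetical and cover only the profiles needed here.

```python
# Profile bits from the btrfs on-disk format (BTRFS_BLOCK_GROUP_* flags).
PROFILE_BITS = {
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
    1 << 7: "RAID5",
    1 << 8: "RAID6",
}

# Assumption for this sketch: only the two profiles discussed here.
# RAID1 keeps its two copies on two different devices; DUP keeps both
# copies on the same device.
MIN_DEVICES = {"DUP": 1, "RAID1": 2}


def decode_profile(target):
    """Return the profile names whose bits are set in a balance target."""
    return [name for bit, name in PROFILE_BITS.items() if target & bit]


def can_convert(profile, num_devices):
    """True if a filesystem with num_devices devices can hold this profile."""
    return num_devices >= MIN_DEVICES[profile]


if __name__ == "__main__":
    profile = decode_profile(16)[0]  # the number in the kernel error
    print(profile)                   # RAID1
    print(can_convert(profile, 1))   # False: the pool reports "Total devices 1"
    print(can_convert(profile, 2))   # True once both SSDs are members
```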
July 22, 2015 (Author): So I decided to just blow away the cache pool and start over, since I had only just started playing with dockers and didn't have much to lose. I assigned one drive to cache, formatted it, stopped the array, assigned cache 2, started the array and then ran a balance. Docker was still enabled with a 15G image, which is why some of the space is used. Now I see the following:

Label: none  uuid: c15451c8-8f62-45a4-b6b1-c3eba515dc09
    Total devices 2 FS bytes used 15.00GiB
    devid 1 size 465.76GiB used 17.03GiB path /dev/sdh1
    devid 2 size 465.76GiB used 17.03GiB path /dev/sdm1

btrfs-progs v4.0.1

This makes me nervous about recovering from a failed drive in the cache pool. When I simulated the failure, the cache was still fine, but when I reconnected the "failed" drive, well, I explained what happened in my previous post. Can anyone provide some insight into this? Before I start installing and configuring production dockers and VMs, I want to be sure that this all works properly and that I can recover from a drive failure. Did I do something wrong in my drive failure simulation?
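One way to sanity-check a pool before trusting it with production data is to confirm that both SSDs actually appear as members in the pool information, not just on the Main tab. The Python sketch below parses the two outputs pasted in the posts; the helper names are hypothetical, the regular expressions assume the `btrfs-progs v4.0.1` format shown there, and a two-member listing does not by itself prove the data is stored as RAID1 (`btrfs filesystem df /mnt/cache` reports the profile per data type).

```python
import re

# Matches one "devid N size X used Y path /dev/..." entry of the
# pool information text shown in the posts (format assumed from them).
DEVID_RE = re.compile(
    r"devid\s+(\d+)\s+size\s+(\S+)\s+used\s+(\S+)\s+path\s+(\S+)"
)


def pool_members(show_output):
    """Return (declared device count, list of (devid, path) entries)."""
    match = re.search(r"Total devices (\d+)", show_output)
    declared = int(match.group(1)) if match else 0
    members = [(int(devid), path)
               for devid, _size, _used, path in DEVID_RE.findall(show_output)]
    return declared, members


def has_two_members(show_output):
    """True if both the header and the devid lines list at least 2 devices."""
    declared, members = pool_members(show_output)
    return declared >= 2 and len(members) >= 2


AFTER_RECONNECT = (
    "Label: none uuid: 0f3b251d-d4f8-4786-9cc4-e79c77251a11 "
    "Total devices 1 FS bytes used 15.17GiB "
    "devid 1 size 465.76GiB used 20.03GiB path /dev/sdh1 btrfs-progs v4.0.1"
)
AFTER_REBUILD = (
    "Label: none uuid: c15451c8-8f62-45a4-b6b1-c3eba515dc09 "
    "Total devices 2 FS bytes used 15.00GiB "
    "devid 1 size 465.76GiB used 17.03GiB path /dev/sdh1 "
    "devid 2 size 465.76GiB used 17.03GiB path /dev/sdm1 btrfs-progs v4.0.1"
)

if __name__ == "__main__":
    print(pool_members(AFTER_RECONNECT))   # (1, [(1, '/dev/sdh1')])
    print(has_two_members(AFTER_REBUILD))  # True
```

The equal "used 17.03GiB" on both devids in the rebuilt pool is consistent with mirrored allocation, whereas the earlier output lists only /dev/sdh1.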
July 22, 2015 (Community Expert): There was some discussion over here. I don't know of anybody who has posted about success.
July 22, 2015 (Author), replying to the post above: LOL, I was the one asking those questions :-) Hoping Limetech can chime in on this one again!