alexricher Posted March 5, 2016
Good day Unraid Community,
I've been running Unraid with a cache drive (1TB WD Black) for a long time. When cache pools were introduced, I added a 500GB SSD for a total of 2 drives. Unraid reports the resulting pool as 750GB (1TB + 500GB). Recently, the WD 1TB drive has been giving me tons of errors in the syslog, and SMART is telling me it's slowly dying, with pending sectors and offline uncorrectable errors. I'd like to remove that drive from the cache pool and go back to only 1 drive: the 500GB SSD. I don't mind losing the redundancy for the moment, as I've made a backup.
Now the issue: I cannot for the life of me find out how to remove the drive from the cache pool successfully. I've read multiple forum posts over the last week, and most people end up replacing their drive. There was a thread a while ago (https://lime-technology.com/forum/index.php?topic=39774.0) that was closer to my need; I replied to it, but no one answered back, probably because it was too old.
This morning, my Dockers and VMs stopped working because of the faulty drive. I've tried the steps mentioned in the threads I found, but whenever I remove the 1st drive (1TB), I cannot move the cache pool back to 1 drive. It remains at 2. It cannot be rebalanced unless both drives are present. If I start the array without both drives, the cache won't mount. The syslog has been giving me tons of errors about the drive, so for the last week I've been running without Dockers/VMs, but I'd like to fix this now.
Would anyone be kind enough to point me in the right direction? I love Unraid, and it's only when things start acting up that you realize how dependent we are on our Unraid servers. Don't hesitate to ask if you need more info to help me solve this issue. I'll be spending the day trying to fix this... Thanks for your help and have a great day!
trurl Posted March 5, 2016
How full was your cache pool? You really only had 500GB with that configuration regardless of what unRAID reported. Search for johnnie.black posts about btrfs cache. He has posted a lot about this. See search tips in my sig.
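
The 500GB figure follows from how btrfs RAID1 places data: every chunk is stored twice, on two different devices, so the smaller drive limits a two-drive pool. Below is a minimal Python sketch of that arithmetic. The function name and the simple two-copy model are assumptions for illustration, not anything Unraid or btrfs exposes. The reported 750GB is consistent with simply halving the combined raw size, which is what the second calculation shows.

```python
def btrfs_raid1_usable(sizes_gb):
    """Estimate usable capacity of a btrfs RAID1 pool (two copies per chunk).

    Hypothetical helper: each chunk needs space on two *different* devices,
    so usable space is capped both by half the raw total and by how much
    capacity exists outside the largest device.
    """
    total = sum(sizes_gb)
    largest = max(sizes_gb)
    return min(total / 2, total - largest)


pool = [1000, 500]                     # 1TB HDD + 500GB SSD, as in this thread

usable = btrfs_raid1_usable(pool)      # capped by the 500GB SSD
naive_half_of_raw = sum(pool) / 2      # the figure a simple "raw / 2" would give

print(f"usable estimate: {usable} GB, raw/2: {naive_half_of_raw} GB")
```

With two equal 500GB drives the same function also gives 500GB, so in this configuration the extra half of the 1TB drive never adds usable space.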
alexricher (Author) Posted March 5, 2016
Thanks for your reply, trurl! My cache was almost full; I was probably using more than 400GB. I've backed up and removed almost everything from the cache drive, and I'm now trying a rebalance to see if it helps. I'll do a search as suggested, hoping to find more useful information, and report back if it still isn't clear.
Helmonder Posted March 5, 2016
You need to verify this, but as far as I understand, the btrfs cache pool makes sure every file is written to two different media. That would mean you can just remove the failing drive and add another.
JorgeB Posted March 5, 2016
If you try to start the array with one cache pool disk unassigned, you get an unmountable cache. You have to physically disconnect the disk from the server (or format it).
Helmonder Posted March 5, 2016
The pool will then think the drive has failed. That is how the redundancy should work. Then add another drive to get it back. I am not sure how you would continue with only one drive.
JorgeB Posted March 5, 2016
Helmonder said: "The pool will then think the drive has failed. That is how the redundancy should work. Then add another drive to get it back. I am not sure how you would continue with only one drive."
If you remove one disk from the pool (don't forget it has to be disconnected) and start the array, the cache will be rebalanced to the single disk. In the future you can add another disk and it will rebalance again.
Helmonder Posted March 5, 2016
Aha, so you will never be in a "degraded state"? With one disk fewer, it will just "transform" into a single-disk "pool"?
JorgeB Posted March 5, 2016
Helmonder said: "Aha, so you will never be in a 'degraded state'? With one disk fewer, it will just 'transform' into a single-disk 'pool'?"
It will be in a degraded state if one pool disk drops offline but, contrary to what happens with an array disk, it won't redball. The easiest way to tell is if one disk stops showing temperature info. But if you remove the failed disk and start the array without it, it will automatically rebalance to a single disk.
alexricher (Author) Posted March 5, 2016
That's interesting info! Thanks johnnie.black and everyone else for this valuable info. I wasn't aware that I had to disconnect the drive from the server for it to rebalance to a single disk. I was under the impression that leaving it unassigned would be enough for Unraid to understand that the disk isn't available. I must admit I find it a bit unusual that even with only 1 drive assigned as cache it remains a pool of only 1 drive. Not that I mind; I actually like this idea, because whenever I get a new drive I can assign it and restore redundancy. I'll disconnect the drive and report back. Thanks guys!
alexricher (Author) Posted March 5, 2016
Alright, so I've unplugged the 1TB WD HDD and started the array with only the 2nd disk present (the first one shows "Not Installed"). At first nothing happened, so I went into the cache settings and ran a balance manually. After it completed, I was able to access my cache drive's content and start my Dockers!
Now, the next questions:
Unraid states on the Main page that my cache size is 750GB. How is that possible if I currently only have 500GB installed?
When I check the used size for all content on the cache share, I seem to be using only 243GB, yet it tells me I only have 97GB left on the drive. 243GB+97GB=340GB... Do I truly have 500GB available? Is there a way I can get back that missing space?
Will my cache always remain in this state (where it's missing the first disk and says 750GB)? If so, is there a way to resolve this?
Thanks for your continuous support!
[EDIT]: Even though I see 97GB left available, I cannot get any more content onto the drive. For instance, Sabnzbd tells me 97GB is left, but when I extract a 2GB RAR file, it says "disk full"! Now I'm even more confused...
JorgeB Posted March 5, 2016
It should rebalance automatically, but it takes a while. When it's done, space is reported correctly.
alexricher (Author) Posted March 5, 2016
JorgeB said: "It should rebalance automatically, but it takes a while. When it's done, space is reported correctly."
Thanks johnnie. If I run a "Balance" manually and it completes, should it reflect the right amount of space? I've noticed this:
btrfs filesystem show:
    Label: none  uuid: a6bc1b21-937a-40b1-a1d8-08ebcbb8f147
    Total devices 2 FS bytes used 270.98GiB
    devid 2 size 465.76GiB used 273.03GiB path /dev/sdn1
    *** Some devices missing
    btrfs-progs v4.1.2
btrfs filesystem df:
    Data, RAID1: total=270.00GiB, used=269.35GiB
    System, RAID1: total=32.00MiB, used=80.00KiB
    Metadata, RAID1: total=3.00GiB, used=1.63GiB
    GlobalReserve, single: total=512.00MiB, used=0.00B
If I read correctly, it now states 270GB as my total space with 269GB used...
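
A note on reading those numbers: in `btrfs filesystem df`, `total=` is the space btrfs has allocated to chunks of that type, not the size of the device, so "total=270.00GiB" does not mean the pool has shrunk to 270GB. The short Python sketch below adds up the figures quoted in the post above. It assumes each RAID1 chunk currently has one copy on the surviving device and that GlobalReserve is carved out of metadata rather than allocated separately; the variable names are made up for illustration.

```python
# Figures copied from the `btrfs filesystem df` output in the post (GiB).
data_total = 270.00
metadata_total = 3.00
system_total = 32.00 / 1024          # 32.00MiB expressed in GiB

# Figures copied from the `btrfs filesystem show` output (GiB).
device_size = 465.76
device_used = 273.03                 # "used" on the devid line = allocated chunks

# Assumption: one copy of every RAID1 chunk lives on the remaining device,
# and GlobalReserve is part of the metadata allocation.
allocated = data_total + metadata_total + system_total
unallocated = device_size - allocated

print(f"allocated to chunks: {allocated:.2f} GiB (devid line says {device_used} GiB)")
print(f"not yet allocated:   {unallocated:.2f} GiB")
```

The sum of the chunk totals matches the "used 273.03GiB" on the devid line, which suggests the rest of the 465.76GiB device is still unallocated rather than lost.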
JonathanM Posted March 5, 2016
Hmm. I wonder if it's trying to maintain RAID1 fault tolerance with a single device.
alexricher (Author) Posted March 5, 2016
JonathanM said: "Hmm. I wonder if it's trying to maintain RAID1 fault tolerance with a single device."
I'm no expert, but I was under the same assumption... If so, this is still dangerous, no? I mean, if the drive fails, its fault tolerance is on the same drive; therefore, no backup whatsoever. What's my next step in order to get the most out of my 500GB?
JorgeB Posted March 5, 2016
Do you see any read/write activity on the cache disk? This normally works like this:
- start the array with one cache disk missing
- the balance will begin; there will be some read/write activity, and it will take some time depending on whether it's an SSD or HDD
- when done, read/write activity stops and btrfs filesystem show will show the new number of devices (in your case it should be one) and will stop saying that a device is missing.
If there's no read/write activity, something went wrong.
alexricher (Author) Posted March 5, 2016
I see write activity in the "Stats" tab; however, I cannot tell which drive it is. But if I look at the cache details page, I see changes in this section:
    Label: none  uuid: a6bc1b21-937a-40b1-a1d8-08ebcbb8f147
    Total devices 2 FS bytes used 267.05GiB
    devid 2 size 465.76GiB used 293.03GiB path /dev/sdm1
    *** Some devices missing
    btrfs-progs v4.1.2
btrfs filesystem df:
    >>>>>> Data, RAID1: total=260.00GiB, used=254.27GiB <<<<<<<<
    Data, single: total=30.00GiB, used=11.15GiB
    System, single: total=32.00MiB, used=80.00KiB
    Metadata, RAID1: total=3.00GiB, used=1.62GiB
    GlobalReserve, single: total=512.00MiB, used=0.00B
Every time I refresh the page, I get a new RAID1 total:
    Data, RAID1: total=256.00GiB, used=250.26GiB
    (...)
    Data, RAID1: total=247.00GiB, used=241.07GiB
Is this what we're looking for?
JorgeB Posted March 5, 2016
Yes, it looks like it's rebalancing: RAID1 data should decrease and single data should increase. Let it run.
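
To put a rough number on "RAID1 should decrease and single should increase", the Data lines of `btrfs filesystem df` can be compared between refreshes. This is a minimal Python sketch, not an Unraid or btrfs tool: the function names are hypothetical, the regular expression assumes the Data totals are printed in GiB as they are in this thread, and the share of allocated space is only an approximate progress indicator.

```python
import re

# Matches lines such as "Data, RAID1: total=260.00GiB, used=254.27GiB".
# Assumption: Data totals are reported in GiB, as in the outputs in this thread.
DATA_LINE = re.compile(r"Data, (\w+): total=([\d.]+)GiB")


def data_totals_by_profile(df_text):
    """Return {profile: allocated GiB} for the Data lines of `btrfs filesystem df`."""
    totals = {}
    for profile, total in DATA_LINE.findall(df_text):
        totals[profile] = totals.get(profile, 0.0) + float(total)
    return totals


def single_fraction(df_text):
    """Fraction of allocated Data space that already uses the single profile."""
    totals = data_totals_by_profile(df_text)
    allocated = sum(totals.values())
    return totals.get("single", 0.0) / allocated if allocated else 0.0


# Output pasted from the thread while the balance was running.
sample = """\
Data, RAID1: total=260.00GiB, used=254.27GiB
Data, single: total=30.00GiB, used=11.15GiB
System, single: total=32.00MiB, used=80.00KiB
Metadata, RAID1: total=3.00GiB, used=1.62GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

print(data_totals_by_profile(sample))
print(f"single share of Data allocation: {single_fraction(sample):.1%}")
```

If that share stops growing between refreshes while devices are still reported missing, the conversion has probably stalled rather than finished.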
alexricher (Author) Posted March 5, 2016
JorgeB said: "Yes, it looks like it's rebalancing: RAID1 data should decrease and single data should increase. Let it run."
Awesome, thanks johnnie.black! Will do, and I'll report back if there is anything unusual.
alexricher (Author) Posted March 5, 2016
It seems to have stalled:
    Data, RAID1: total=238.00GiB, used=231.97GiB
    Data, single: total=52.00GiB, used=33.46GiB
    System, single: total=32.00MiB, used=80.00KiB
    Metadata, RAID1: total=3.00GiB, used=1.62GiB
    GlobalReserve, single: total=512.00MiB, used=0.00B
There are no more changes in [Data, single] or [Data, RAID1]. Is it safe to assume it's done balancing everything? I was expecting the single total to be bigger... Any thoughts?
JorgeB Posted March 5, 2016
The balance is complete when "btrfs filesystem show" displays "Total devices 1" with no mention of "Some devices missing". If it still shows that and there's no activity, something went wrong, possibly damage to the filesystem caused by the bad disk. You can try stopping and starting the array again; if there's no progress, it's probably best to back up the cache, reformat it and restore the data.
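
The completion criterion described above can be checked mechanically against the text printed by `btrfs filesystem show`. The Python sketch below is only an illustration: `balance_complete` is a hypothetical helper, the first sample is the degraded output posted earlier in the thread, and the second sample is a made-up example of what a finished single-device pool might print.

```python
import re


def balance_complete(show_text, expected_devices=1):
    """Apply the rule from the post to `btrfs filesystem show` output.

    Complete means: "Total devices <expected_devices>" is reported and the
    "Some devices missing" warning is absent. Hypothetical helper.
    """
    match = re.search(r"Total devices (\d+)", show_text)
    if match is None:
        return False                      # unrecognised output: treat as not done
    device_count = int(match.group(1))
    return device_count == expected_devices and "Some devices missing" not in show_text


# Output posted earlier in this thread (pool still degraded).
degraded = """\
Label: none  uuid: a6bc1b21-937a-40b1-a1d8-08ebcbb8f147
    Total devices 2 FS bytes used 270.98GiB
    devid 2 size 465.76GiB used 273.03GiB path /dev/sdn1
    *** Some devices missing
"""

# Hypothetical output after a successful conversion (assumed, not from the thread).
finished = """\
Label: none  uuid: a6bc1b21-937a-40b1-a1d8-08ebcbb8f147
    Total devices 1 FS bytes used 270.98GiB
    devid 2 size 465.76GiB used 273.03GiB path /dev/sdn1
"""

print("degraded sample complete?", balance_complete(degraded))
print("finished sample complete?", balance_complete(finished))
```

A check like this only confirms the end state; it says nothing about why a balance stalled, so the stop/start and backup/reformat advice still applies when it keeps reporting an incomplete pool.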
alexricher (Author) Posted March 6, 2016
Thanks for your reply. Yeah, something must have gone wrong, because it stopped balancing. After a while I stopped the array and restarted it, and now whatever I do the cache shows as "Unmountable"... I guess the next step is reformatting the cache SSD and starting from scratch from a backup?
alexricher (Author) Posted March 6, 2016
In case someone else wonders, I've rebuilt the cache from a backup and I'm now back in business. No more cache pooling for now; when I'm ready, I'll add another drive and recreate the pool. Thanks everyone for your help, it's really appreciated! Gotta love this Unraid community. Well worth my license cost! Keep on rockin' with this nice piece of software!
This topic is now archived and is closed to further replies.