lepis0 Posted October 17, 2022

Hi, I replaced one of my cache disks with a new, larger one. Now when I start the array I can't stop it, because stopping is disabled with the message that a BTRFS operation is running. The syslog has thousands of rows of:

unraid kernel: BTRFS info (device sdh1): found 1 extents, stage: update data pointers

The cache balance status looks normal, but the Balance button does nothing, and the pool itself also looks normal. I have now waited about a week for that BTRFS operation to finish. Diagnostics are attached. What should I do to get my cache back to a normal state?

unraid-diagnostics-20221017-1449.zip
JorgeB Posted October 17, 2022

The log is spammed with Nvidia-related issues, but it looks like the btrfs balance is stuck and there is still a device missing. Type this in the console:

btrfs dev del missing /mnt/cache

and post the output.
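For reference, a missing device shows up explicitly in `btrfs filesystem show` output, so you can check for one before (and after) attempting the removal. A minimal sketch; the sample output below is made up for illustration, not taken from this system — on a live server you would pipe the real `btrfs filesystem show /mnt/cache` instead:

```shell
# Illustrative check: a degraded pool prints a "missing" line in
# `btrfs filesystem show <mountpoint>` output. The sample text here is
# a stand-in for demonstration purposes only.
sample_output='Label: none  uuid: 00000000-0000-0000-0000-000000000000
        Total devices 2 FS bytes used 1.77TiB
        devid    2 size 931.51GiB used 552.00GiB path /dev/sdh1
        *** Some devices missing'

if printf '%s\n' "$sample_output" | grep -qi 'missing'; then
    echo "pool reports a missing device"
fi
```

Only once the output no longer mentions a missing device has the `btrfs dev del missing` step actually completed.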
lepis0 Posted October 17, 2022

Hi, the Nvidia drivers are waiting for a reboot of the server, but I can't reboot yet because I can't stop the array. Here is the output of the command:

root@unraid:~# btrfs dev del missing /mnt/cache
ERROR: unable to start device remove, another exclusive operation 'device remove' in progress
JorgeB Posted October 17, 2022

It's still trying to remove the device and it's stuck; AFAIK there's no option to cancel that. Since the pool is accessible, make sure backups are up to date, then reboot by typing "reboot" in the console. If it reboots, post new diagnostics after array start.
lepis0 Posted October 18, 2022

Diagnostics are attached.

unraid-diagnostics-20221018-1942.zip
JorgeB Posted October 18, 2022 (Solution)

It's still looping on deleting the missing device; this is the first time I've seen this. I suggest backing up and re-formatting the pool.
lepis0 Posted October 18, 2022 (edited)

I'm backing up the data now. How can I reformat the pool?

edit: I managed to reformat the pool. Thank you JorgeB for the help.

Edited October 18, 2022 by lepis0
atvking Posted May 9, 2023

On 10/18/2022 at 3:36 PM, lepis0 said:

I'm backing up data now. How can I reformat the pool? edit. I managed to reformat pool. Thank you JorgeB for help

Can you share how you managed to reformat the pool if you were unable to stop the array? I'm having the same issue right now, and it looks like a reformat is my only option.
lepis0 Posted May 9, 2023

If I remember correctly, I first backed up all the data from the pool, then stopped the array, removed the cache pool, and created a new one with the same disks, which also reformatted them. Then I restored the backed-up data to the new pool.
miicar Posted August 24, 2023 (edited)

One of my servers has been doing a disk replace on a pool for a couple of weeks now... it's still moving extents etc. in the log, so I'm assuming it's doing what it's meant to. I guess it's moving them at 5 1/4" floppy speeds? Not sure why it's taking this long (I added a 4TB drive to replace a 350GB). "btrfs fi show" tells me it's moved about 48GB in the last 36 hours... I miss dial-up, I think it was faster. I need to reboot my server, but I'm scared of losing the pool if I do...

Edited August 24, 2023 by miicar
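For scale, the figure quoted above (roughly 48 GiB relocated in about 36 hours) works out to well under 1 MiB/s:

```shell
# Back-of-envelope throughput from the numbers in the post above:
# ~48 GiB relocated in ~36 hours.
awk 'BEGIN { gib = 48; hours = 36; printf "%.2f MiB/s\n", gib * 1024 / (hours * 3600) }'
```

At that rate, filling a 4TB replacement would take on the order of months, so the relocation is effectively stalled rather than merely slow.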
JorgeB Posted August 24, 2023

You can post diagnostics to see if there's something there.
miicar Posted October 28, 2023 (edited)

On 10/18/2022 at 11:53 AM, JorgeB said:

It's still looping on deleting the missing device, first time I see this, suggest backing up and re-formatting the pool.

I think I am having the same issue... it's been "removing" a drive for a month, over many reboots. The raid5 (yeah, I know... call me crazy) BTRFS pool is made up of old retired disks with plenty of repaired sectors. The pool is not meant to be fast or incredibly safe. It worked great for my needs for the last couple of years, as a Storj node and for surveillance camera recording (archived footage we want to keep gets moved to the Unraid array by the surveillance software), and I replaced really bad drives along the way without issue (a day or two max for the rebuilds).

This time I tried just pulling a smaller HDD, as it was the only one of its size left and I was happy with the current size (and had replacement disks if one died). Recently, Unraid started telling me one of the other drives in the pool is missing, but it's not... it's still there, error-free and seemingly working. And now simply hitting reboot doesn't do much while this issue is going on: the log shows one line about rebooting, then it sits there until I type "powerdown -r"... then it ACTUALLY starts rebooting.

I am in the process of backing up and formatting, but I can't get the array to stop without rebooting with auto-start turned off so I can add the backup disks to a temporary pool, so it's taking a long time. Hopefully these logs show a possible bug. It could also be that I don't really know what I'm doing and royally messed some things up. Anyway, here are the diags! Thanks.

elmstorage-diagnostics-20231027-2153.zip

(I should add that I was in the middle of trying to manually stop the array when I took these diags... so some things might be extra.)

Edited October 28, 2023 by miicar
JorgeB Posted October 28, 2023

6 hours ago, miicar said:

but i cant get the array to stop

Besides the btrfs pool, this one is also not unmounting:

Oct 27 21:43:38 ELMSTORAGE root: cannot unmount '/mnt/cache/Docker': pool or dataset is busy
Oct 27 21:43:38 ELMSTORAGE root: cannot unmount '/mnt/cache': pool or dataset is busy
miicar Posted October 28, 2023

8 hours ago, JorgeB said:

Besides the btrfs pool, this one is also not unmounting:

Oct 27 21:43:38 ELMSTORAGE root: cannot unmount '/mnt/cache/Docker': pool or dataset is busy
Oct 27 21:43:38 ELMSTORAGE root: cannot unmount '/mnt/cache': pool or dataset is busy

I think that is due to the Storj DB files being on the (SSD) cache; there is a Storj process that hates how slow BTRFS gets during a rebuild or while a drive is degraded. My guess is that's what was holding it back. I can post today's diags. The array is in production and running fine overall; I'm still moving files off this pool to change it to a z-pool, but the move is painfully slow.

elmstorage-diagnostics-20231028-1251.zip
JorgeB Posted October 29, 2023

Mover can be very slow with many small files.
miicar Posted October 30, 2023

I have avoided Mover for that reason... I'm trying to use the File Manager plugin (since Krusader doesn't seem to like accessing some of the folders). It's been at 95% for a day now... no movement in the space used on either disk though...
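As an aside on moving huge numbers of small files: per-file copies pay filesystem metadata overhead on every single file, so streaming the whole tree as one tar pipe is often much faster. A hedged sketch; temp dirs stand in for the real pool and target paths:

```shell
# Copy a tree of many small files as a single tar stream instead of
# file-by-file. Temp dirs stand in for the real pool and target paths.
src=$(mktemp -d)
dst=$(mktemp -d)
for i in 1 2 3; do echo "payload $i" > "$src/file$i.txt"; done

# One writer, one reader, one pipe -- no per-file round trips
(cd "$src" && tar cf - .) | (cd "$dst" && tar xf -)

ls "$dst"
```

On a degraded raid5 pool the disks themselves may still be the bottleneck, but this at least removes the per-file overhead from the equation.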
miicar Posted November 2, 2023 (edited)

I figured out how to stop the "remove" procedure, so the backup is going a tiny bit faster now. I thought I would try removing the "missing" devid 1 that shows when I type "btrfs fi show". It has zero used space, so it should just go away and let the pool balance properly, right? I typed "btrfs device remove "devid 1" /mnt/(pool name)", but it tells me "ERROR: not a block device: devid 1". I am probably typing that syntax wrong... (I'll admit, I made that line up from reading --help output.) I would like to keep this BTRFS, but I keep running into stability issues with this fs (partly from lack of proper understanding, I'm sure).

Edited November 2, 2023 by miicar
JorgeB Posted November 2, 2023

Try:

btrfs device remove missing /mnt/(pool name)

The pool must be balanced to a profile that allows a device removal.
miicar Posted November 3, 2023 (edited)

13 hours ago, JorgeB said:

Try btrfs device remove missing /mnt/(pool name)

OK, yeah, I tried that last night too... this was its reply:

:~# btrfs device remove missing /mnt/Servernstorj
ERROR: cannot access '/mnt/Servernstorj': No such file or directory

I tried with all caps, as it shows, and all lowercase... no dice. I also tried pulling the disks out of the pool assignment and putting them in UD to mount (something I'd done before to rescue my last crashed pool). It wouldn't mount, in UD or by CLI; something about a missing profile tree (I should have taken a screenshot). So I put the disks back in the pool as they were, and it mounts immediately (degraded and slow, of course).

This pool is going to get smashed and rebuilt as soon as the painfully slow file move is done, but if y'all wanna debug it before I do that, I'm game to play along! Or maybe my use case is so rare that it's not really worth troubleshooting for the average user (a mixed bag of random-sized, not-so-healthy SATA, and sometimes even IDE, drives in the forbidden BTRFS raid5). I have multiple pools that are for production and contain healthy drives that get swapped out at the first sign of distress; those pools don't give me any issues (although I'm moving most of them to ZFS now that Unraid supports it). I don't know if this is more a BTRFS issue or a matter of how Unraid is asking it to do things. This is the second time a drive removal has ended up destroying a BTRFS pool for me in Unraid, using the listed methods to replace or remove a drive through the GUI.

My personal takeaway in all this is that I'm going to do these operations through the CLI going forward. I think that will entice me to look at the status of the pool between steps and catch potential issues before I compound more on top; that's something Unraid's GUI doesn't give you much insight into between rebuilds. And really, I don't want it to give more info on the GUI side of things... more stuff happening in the background means a slower OS in the long run. I also wonder if the CLI will let me manage pools while the rest of the system is still live... which would be wonderful. I chose this path; guess I gotta learn to walk it like a proper penguin! Thx, C

Edited November 3, 2023 by miicar
JorgeB Posted November 3, 2023

Please post the diagnostics.
miicar Posted November 3, 2023 (edited)

So, late last night I accidentally rebooted this server while attempting to reboot another one (got my browsers and sleep mixed up). After the reboot, the BTRFS operation started automatically again; I had to go to work, so I left it. Just got home, and it's done!

watch -n 10 sudo btrfs fi show

Label: 'Emergency_ONLY_UNraid_Spare_Drive_(SmartErrors)' uuid: c9fdb522-0c4c-4b78-89e7-c9518d596bf1
Total devices 5 FS bytes used 1.77TiB
devid 2 size 931.51GiB used 552.00GiB path /dev/sdh1
devid 4 size 931.51GiB used 552.00GiB path /dev/sdi1
devid 7 size 931.51GiB used 552.03GiB path /dev/sdb1
devid 8 size 3.64TiB used 569.03GiB path /dev/sdj1
devid 9 size 3.64TiB used 467.00GiB path /dev/sdl1

It no longer shows a missing device. The space used isn't the same across like-sized disks, as I would have expected; I might attempt a balance and see what happens. Is there anything I can check so I can trust this pool without formatting and starting over? It's still moving files uselessly slowly, but I think that's partly due to the tiny size of each file, and one of the drives might be quitting. Here are today's diags...

elmstorage-diagnostics-20231103-1917.zip

Edited November 4, 2023 by miicar
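As a sanity check on that output: summing the per-device `used` figures gives the raw on-disk allocation, which should exceed the 1.77 TiB of file data because raid5 stores parity (plus metadata) on top of it. A quick calculation from the numbers shown above:

```shell
# Sum the per-device "used" column from the `btrfs fi show` output above
# (figures copied from the post; all "used" values are in GiB).
printf '%s\n' \
  "devid 2 size 931.51GiB used 552.00GiB path /dev/sdh1" \
  "devid 4 size 931.51GiB used 552.00GiB path /dev/sdi1" \
  "devid 7 size 931.51GiB used 552.03GiB path /dev/sdb1" \
  "devid 8 size 3.64TiB used 569.03GiB path /dev/sdj1" \
  "devid 9 size 3.64TiB used 467.00GiB path /dev/sdl1" |
awk '{ gsub(/GiB/, "", $6); total += $6 }
     END { printf "%.2f GiB allocated across devices\n", total }'
```

That comes to about 2.63 TiB of raw allocation against 1.77 TiB of data, which is plausible for raid5 data plus parity, so the uneven per-device figures are not by themselves a sign of trouble.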
JorgeB Posted November 4, 2023

Looks fine to me. You can run a balance, but IMHO there's not much point; it's pretty well balanced, just one drive has more metadata.
miicar Posted November 4, 2023 (edited)

10 hours ago, JorgeB said:

Looks fine to me, you can run a balance but IMHO not much point, it's pretty well balanced, just one drive has more metadata.

Fair enough. I guess the question remains: can I trust this pool again, or should I keep moving the millions of tiny files off it and rebuild? I was trying to figure out why I couldn't break the 20MB/s read/write wall... I couldn't even get close most of the time (and not just with the tiny files, in general). (I was wrong about the speed issues; it seems able to go over 100MB/s again, which, for the drives I'm using, is fine.)

It seems, from my own experience and from reading through others', that removing a drive from a BTRFS pool is a risky move. Sometimes the FS doesn't want to let it go (it's happened to me on two different pools now). Drive adds and replacements, while slow in raid5, have completed without issue, but every problem I've had with BTRFS revolves around removing a drive. It always seems to leave stuff dangling behind, in my experience. Then (before I knew to check), I would add or swap another drive and all hell would break loose!

Edited November 4, 2023 by miicar
JorgeB Posted November 5, 2023

17 hours ago, miicar said:

It seems (from my own and reading through others experiences), removing a drive from a BTRFS pool is a risky move to do.

Using btrfs raid5/6 is risky in general, since it's considered experimental. Unless you really need the flexibility and can live with the risk, I would recommend converting to zfs raidz.
miicar Posted November 5, 2023

8 hours ago, JorgeB said:

Using btrfs raid5/6 is risky in general, since it's considered experimental, unless you really need the flexibility and can live with the risk I would recommend converting to zfs raidz.

Yeah, the reason this pool exists is to make the most of the space on random HDDs. I accept the risk and only use it for non-critical data (Steam games, printer scans, a wallpaper share, surveillance camera cache, etc.).