[ SOLVED ] pool rebalancing failing


I upgraded from 6.8.3 to 6.9.1 and, while at it, finally upgraded my flash drive from an old 2 GB stick from sometime before 2010. VMs and dockers seem to be working fine now, and the main array and cache are normal. I created a new pool "archive_one" with 6 x 4 TB SAS drives and am now trying to rebalance it to RAID6 instead of the default RAID1. I started it last night and it ran for over 8 hours displaying this:

    Data, RAID1: total=1.00GiB, used=0.00B
    System, RAID1: total=32.00MiB, used=16.00KiB
    Metadata, RAID1: total=2.00GiB, used=128.00KiB
    GlobalReserve, single: total=3.25MiB, used=16.00KiB

btrfs balance status:

    Balance on '/mnt/archive_one' is running
    2 out of about 3 chunks balanced (3 considered),  33% left

Since then I have tried again a couple of times while also watching the log (below). Sometimes it would go straight to the above display and then nothing; sometimes the UI showed something like the above but with 1 of 3 chunks balanced, 2 considered, 66% left. Both UI displays had the same log output, a quick blurb and then nothing else printed, and the Main page shows no reads/writes on the drives, which confirms that nothing is happening.

 

Apr 6 19:40:13 Raza ool www[15440]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_balance 'start' '/mnt/archive_one' ''
Apr 6 19:40:13 Raza kernel: BTRFS info (device sdak1): balance: start -d -m -s
Apr 6 19:40:13 Raza kernel: BTRFS info (device sdak1): relocating block group 9861857280 flags metadata|raid1
Apr 6 19:40:13 Raza kernel: BTRFS info (device sdak1): found 3 extents, stage: move data extents
Apr 6 19:40:13 Raza kernel: BTRFS info (device sdak1): relocating block group 9828302848 flags system|raid1
Apr 6 19:40:14 Raza kernel: BTRFS info (device sdak1): found 1 extents, stage: move data extents
Apr 6 19:40:14 Raza kernel: BTRFS info (device sdak1): relocating block group 8754561024 flags data|raid1
Apr 6 19:40:14 Raza kernel: BTRFS info (device sdak1): balance: ended with status: 0

 

 

Posting this as a new thread because the reply on my original post said to, and noted that the "balance start command is missing some arguments".

raza-diagnostics-20210407-0234.zip


Command/CLI output:

root@Raza:~# btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /mnt/archive_one
Done, had to relocate 3 out of 3 chunks

Log:

Apr 7 03:45:46 Raza kernel: BTRFS info (device sdak1): balance: start -dconvert=raid6 -mconvert=raid1c3 -sconvert=raid1c3
Apr 7 03:45:46 Raza kernel: BTRFS info (device sdak1): setting incompat feature flag for RAID1C34 (0x800)
Apr 7 03:45:46 Raza kernel: BTRFS info (device sdak1): relocating block group 14223933440 flags metadata|raid1
Apr 7 03:45:47 Raza kernel: BTRFS info (device sdak1): found 3 extents, stage: move data extents
Apr 7 03:45:47 Raza kernel: BTRFS info (device sdak1): relocating block group 14190379008 flags system|raid1
Apr 7 03:45:47 Raza kernel: BTRFS info (device sdak1): setting incompat feature flag for RAID56 (0x80)
Apr 7 03:45:47 Raza kernel: BTRFS info (device sdak1): relocating block group 13116637184 flags data|raid1
Apr 7 03:45:47 Raza kernel: BTRFS info (device sdak1): balance: ended with status: 0

and it now shows up as RAID6 in the pool settings, and the pool capacity is correct at 16 TB. I also have archive_two, likewise empty, if there are other things you need tested for debugging. I'm not putting anything in either pool yet because I'm considering switching to 3 pools of 8 drives instead of 4 pools of 6.
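If anyone wants to confirm the profiles from the console rather than the pool settings page, a quick check (mount point assumed, same idea as the output at the top of the thread) is:

    # per-profile breakdown; Data should show RAID6 and
    # Metadata/System should show RAID1C3 after the convert
    btrfs filesystem df /mnt/archive_one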


Safe mode + GUI, local browser, same results. I did it that way so there is no chance it could be any kind of plugin/extension in the browser or an Unraid plugin.

 

When I boot into GUI or safe mode + GUI, if I have a monitor plugged in and am using the iDRAC console, then after the boot select menu and part of the boot text both are just a black screen. Not sure if that is an Unraid thing or a hardware thing. Web access still works, but local does not. I did not try non-GUI mode. Integrated graphics... I was using the front VGA on the 720xd, not the rear one, if that makes any difference. I don't think I ever had both going at the same time before, so I cannot say whether this is new. It is also rare to need local access anyway; I just have the default boot set to GUI so that if I do need it, it is there.

 

A main array drive failed, and I don't have a replacement for it, so I am using unBALANCE to move the emulated contents off. At ~25 MB/s for the ~7 TB left, that works out to about a 75-hour ETA. In the meantime I am leaving optional things offline to reduce array stress (like binhexrtorrentvpn). Because I wanted to resolve this drive first, I have not installed 6.9.2 yet.
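(Rough math on that ETA: ~7 TB is about 7,000,000 MB, and at ~25 MB/s that is roughly 280,000 seconds, or about 78 hours, so ~75 hours is in the right ballpark.)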

 

I did go ahead and delete archive_two and add 2 of its drives to make archive_one an 8-drive pool. The balance in the web UI had the same results, so I used the command to rebalance it since I had added 2 drives (even though the pool is still empty); a sketch of that command is below. In a few days, after resolving the failed drive and updating to 6.9.2, I'll try again to see if the web UI calls the right command.
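For the record, this is roughly what I ran again; a sketch assuming the same targets as before (RAID6 data, RAID1C3 metadata) and the same mount point:

    # re-run the conversion so the two newly added devices get used;
    # appending ",soft" to each convert would skip chunks already in the target profile
    btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /mnt/archive_one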

 

Thanks for the help so far :)

On 4/9/2021 at 1:57 AM, JorgeB said:

If booting in safe mode doesn't help you should create a bug report.

After taking care of the failed drive, clearing off 4 more of the 4 TB drives that will be in the 8-drive archive_two pool, and dealing with a boot USB issue, I updated to 6.9.2 with no problems. After the upgrade I assigned the 8 drives to the new pool archive_two, started the array, and then used the GUI to set it to RAID6, and this time it worked. Since it worked I forgot to copy the command from the syslog, so I just did balances from raid6 -> raid0 -> raid6. If someone else has issues, maybe this could be of help.

Apr 19 23:35:10 Raza ool www[29765]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_balance 'start' '/mnt/archive_two' '-dconvert=raid6,soft -mconvert=raid1c3,soft'
Apr 19 23:35:10 Raza kernel: BTRFS info (device sdx1): balance: start -dconvert=raid6,soft -mconvert=raid1c3,soft -sconvert=raid1c3,soft
Apr 19 23:35:10 Raza kernel: BTRFS info (device sdx1): balance: ended with status: 0
Apr 19 23:37:11 Raza ool www[29765]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_balance 'start' '/mnt/archive_two' '-dconvert=raid0,soft -mconvert=raid1,soft'
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): balance: start -dconvert=raid0,soft -mconvert=raid1,soft -sconvert=raid1,soft
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): relocating block group 24894242816 flags metadata|raid1c3
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): relocating block group 24860688384 flags system|raid1c3
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): relocating block group 24827133952 flags system|raid1c3
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): relocating block group 18384683008 flags data|raid6
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): relocating block group 11942232064 flags data|raid6
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): relocating block group 11908677632 flags system|raid1c3
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): relocating block group 10834935808 flags metadata|raid1c3
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): found 3 extents, stage: move data extents
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): clearing incompat feature flag for RAID1C34 (0x800)
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): relocating block group 4392484864 flags data|raid6
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): clearing incompat feature flag for RAID56 (0x80)
Apr 19 23:37:12 Raza kernel: BTRFS info (device sdx1): balance: ended with status: 0
Apr 19 23:37:19 Raza ool www[36771]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_balance 'start' '/mnt/archive_two' '-dconvert=raid6,soft -mconvert=raid1c3,soft'
Apr 19 23:37:19 Raza kernel: BTRFS info (device sdx1): balance: start -dconvert=raid6,soft -mconvert=raid1c3,soft -sconvert=raid1c3,soft
Apr 19 23:37:19 Raza kernel: BTRFS info (device sdx1): setting incompat feature flag for RAID56 (0x80)
Apr 19 23:37:19 Raza kernel: BTRFS info (device sdx1): setting incompat feature flag for RAID1C34 (0x800)
Apr 19 23:37:19 Raza kernel: BTRFS info (device sdx1): relocating block group 54019489792 flags data|raid0
Apr 19 23:37:19 Raza kernel: BTRFS info (device sdx1): relocating block group 52945747968 flags metadata|raid1
Apr 19 23:37:19 Raza kernel: BTRFS info (device sdx1): relocating block group 52912193536 flags system|raid1
Apr 19 23:37:19 Raza kernel: BTRFS info (device sdx1): relocating block group 44322258944 flags data|raid0
Apr 19 23:37:19 Raza kernel: BTRFS info (device sdx1): relocating block group 35732324352 flags data|raid0
Apr 19 23:37:19 Raza kernel: BTRFS info (device sdx1): relocating block group 27142389760 flags data|raid0
Apr 19 23:37:20 Raza kernel: BTRFS info (device sdx1): relocating block group 27108835328 flags system|raid1
Apr 19 23:37:20 Raza kernel: BTRFS info (device sdx1): relocating block group 27075280896 flags system|raid1
Apr 19 23:37:20 Raza kernel: BTRFS info (device sdx1): relocating block group 27041726464 flags system|raid1
Apr 19 23:37:20 Raza kernel: BTRFS info (device sdx1): relocating block group 25967984640 flags metadata|raid1
Apr 19 23:37:20 Raza kernel: BTRFS info (device sdx1): found 3 extents, stage: move data extents
Apr 19 23:37:20 Raza kernel: BTRFS info (device sdx1): balance: ended with status: 0
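The ",soft" modifier that the 6.9.2 UI adds only converts chunks that are not already in the target profile, which is why the first re-run above ended immediately with nothing to relocate. A minimal sketch of the equivalent console command (pool mount point assumed):

    # "soft" skips chunks already in the requested profile, so this is
    # safe to re-run; drop ",soft" to rewrite every chunk regardless
    btrfs balance start -dconvert=raid6,soft -mconvert=raid1c3,soft /mnt/archive_two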

 

 

 

Speaking of USB, is using USB 3 still a bad thing? I migrated from a 10+ year old 2 GB drive to a new 64 GB USB 3.1 drive. After having the issue I got new USB 2 drives to use if need be, but I haven't switched yet since I didn't want to blacklist the 3.1 drive if possible. I got it working by making a copy of its contents, reflashing it from the My Servers backup, and then copying all the contents back (for custom scripts and SSH keys). My local backup wasn't up to date, so I had to go around that way. The drive works fine, but it was not bootable until I reflashed it. I'm assuming the boot flag got cleared somehow, which would have been an easier fix. So: is USB 3 still a thing to avoid, or is that old news?

 

Setting this as solved since the balance issue seems to be fine. Should it still be filed as a 6.9.1 bug for documentation's sake, assuming that was the issue?

3 hours ago, Cull2ArcaHeresy said:

Speaking of usb, is using 3 still a bad thing? I migrated from a 10+ year old 2gig to a new 64gig usb3.1 drive.

 

Theoretically it should not matter, but anecdotal evidence suggests USB 2 is still more reliable. I think one of the reasons may be that USB 3 drives seem to run much hotter than USB 2 ones, and that is not good for the electronics of something left permanently plugged in. I would expect this to improve over time as the chipsets get more energy efficient.

1 hour ago, itimpi said:

anecdotal evidence suggests USB2 is still more reliable

Spaceinvaderone's video was all USB 3 (or at least the top 3 were), but I know that was just extreme-case stress testing with temperature readings. I've only had the one issue, which was easy to fix (I'll check the boot flag first if it happens again instead of reimaging), but if it recurs I'll move my license to a new USB 2 drive. I was hoping by now that the anecdotal evidence was historical rather than current, but I guess we're not there yet, given the amount of old hardware still in use and/or the tech inside the USB 3 drives.

