January 13, 2025

I wanted to replace my cache drives. They have been running great, but all of the drives in the pool are now 97 months old, so I figured it was time to be proactive. The pool was one 512GB and two 240GB drives. Instead of a "replace" command, I chose to use "add" and then "remove". I successfully added a fourth 240GB drive (using "btrfs device add"). However, when I then tried to remove the old drive, I got an error. I did a balance but it didn't help:

    root@Tower:~# btrfs balance start -dusage=75 /mnt/cache/
    Done, had to relocate 0 out of 258 chunks
    root@Tower:~# btrfs fi us -T /mnt/cache
    Overall:
        Device size:                   1.12TiB
        Device allocated:            514.06GiB
        Device unallocated:          633.59GiB
        Device missing:                  0.00B
        Device slack:                    0.00B
        Used:                        494.13GiB
        Free (estimated):            325.41GiB      (min: 325.41GiB)
        Free (statfs, df):           289.73GiB
        Data ratio:                       2.00
        Metadata ratio:                   2.00
        Global reserve:              426.64MiB      (used: 0.00B)
        Multiple profiles:                  no

                 Data      Metadata  System
    Id Path      RAID1     RAID1     RAID1    Unallocated Total     Slack
    -- --------- --------- --------- -------- ----------- --------- -----
     1 /dev/sdk1  92.00GiB         -        -   131.57GiB 223.57GiB     -
     3 /dev/sdf1  90.00GiB   1.00GiB        -   132.57GiB 223.57GiB     -
     4 /dev/sdl1 255.00GiB   2.00GiB 32.00MiB   219.91GiB 476.94GiB     -
     5 /dev/sdo1  73.00GiB   1.00GiB 32.00MiB   149.54GiB 223.57GiB     -
    -- --------- --------- --------- -------- ----------- --------- -----
       Total     255.00GiB   2.00GiB 32.00MiB   633.59GiB   1.12TiB 0.00B
       Used      246.38GiB 703.03MiB 80.00KiB

    root@Tower:~# btrfs device remove /dev/sdk /mnt/cache
    ERROR: error removing device '/dev/sdk': No data available
    root@Tower:~#

I had backed up the cache contents, but is there any way to do this without deleting and entirely recreating my cache pool?
January 13, 2025 Author

Whoops. I had specified the whole drive, not the partition. "btrfs device remove /dev/sdk1 /mnt/cache" did the trick.
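
For anyone who hits the same "No data available" error, here is a rough sketch of a pre-check that the path passed to "btrfs device remove" is one the pool actually lists. The function names pool_members and is_pool_member are made up for this example, and the sample device lines are the ones that show up in the log later in this thread (after sdk1 had already been removed), so treat it as an illustration rather than a tested procedure.

```shell
# Hypothetical helper: list member device paths from
# `btrfs filesystem show` output read on stdin.
pool_members() {
    awk '$1 == "devid" { print $NF }'
}

# Hypothetical helper: succeed only if path $1 appears exactly
# in the member list (same stdin as above).
is_pool_member() {
    pool_members | grep -qxF -- "$1"
}

# Device lines as logged in this thread after sdk1 was removed.
sample_show='devid 3 size 223.57GiB used 124.00GiB path /dev/sdf1
devid 4 size 476.94GiB used 249.03GiB path /dev/sdl1
devid 5 size 223.57GiB used 124.03GiB path /dev/sdo1'

# The whole-disk name is not a member; the partition is.
for dev in /dev/sdo /dev/sdo1; do
    if printf '%s\n' "$sample_show" | is_pool_member "$dev"; then
        echo "$dev: pool member"
    else
        echo "$dev: not listed as a pool member"
    fi
done
```

On a live system the same check would presumably be fed from the real command, for example "btrfs filesystem show /mnt/cache | is_pool_member /dev/sdk1".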
January 13, 2025 Community Expert

Note that you will need to re-import the pool for the GUI to be in sync. If you need help with that, let us know, and post which release you are running.
January 13, 2025 Author

37 minutes ago, JorgeB said:

    Note that you will need to re-import the pool for the GUI to be in sync, if you need help with that let us know, and post which release you are running.

Is it this? https://docs.unraid.net/unraid-os/manual/storage-management/

    The device is now removed from the pool. You don't need to stop the array now, but at the next array stop you need to make Unraid forget the now-deleted member. To achieve that:

    1) Stop the array
    2) Unassign all pool devices
    3) Start the array to make Unraid "forget" the pool config (if the docker and/or VM services were using that pool, it is best to disable those services before starting, or Unraid will recreate the images somewhere else, assuming they are using /mnt/user paths)
    4) Stop the array (re-enable docker/VM services if disabled above)
    5) Re-assign all pool members except the removed device
    6) Start the array

After removing all of the cache drives and hitting 'start', it is taking forever for the GUI to come back. If I open another browser window, I can see that the drive pool is actually online (but not the cache pool yet). The GUI is still busy and shows "Starting...". I think that this is probably ok, but I will give it an hour or so and see what happens.

By the way, the quoted bit above was helpful, but I did find one oversight (at least for me). In addition to the above points about disabling docker and VM, I discovered that I had syslog pointing to my (now missing) cache pool. I was able to disable syslog while the array was being brought up, so it was not that big a deal.

Edited January 13, 2025 by tcharron
January 13, 2025 Author

39 minutes ago, JorgeB said:

    It shouldn't take a long time, post the current syslog.

Here it is. FYI, I had added the /dev/sdo device to the btrfs pool last night, and everything was working ok with 4 drives in the pool. I removed sdk this morning, and everything was still working ok. It was only after I took the array down to tell Unraid that the pool only had 3 devices that the problem started. I see in this log that sdo became unmountable after trying to bring the array back up. That should be recoverable given the redundancy of btrfs (I confirmed it had balanced properly before taking the array down), but I think it's a good thing that I have a copy of the cache contents elsewhere! Let me know what you think.

syslog.zip
January 13, 2025 Community Expert

The last start I see is with all the pool devices unassigned, and it's full of spam.
January 13, 2025 Author

Not sure what spam you refer to, but I suspect you mean the syslog errors. Those were all related to having the syslog daemon point to the cache drive. That activity stopped when I disabled syslog. The GUI did finally complete the process of bringing the array up, and the status now shows as "Started". There is a gap in the syslog entries of about 24 minutes:

    Jan 13 10:23:31 Tower emhttpd: Starting File Activity...
    ... [errors related to the misconfigured syslog]
    Jan 13 10:57:15 Tower file.activity: File Activity inotify starting
    Jan 13 10:57:15 Tower inotifywait[8273]: Setting up watches. Beware: since -r was given, this may take a while!
    Jan 13 10:57:17 Tower root: Delaying execution of fix common problems scan for 10 minutes

I had 1) shut down array; 2) removed all the cache devices; 3) started array; 4) stopped array; 5) added cache devices; 6) restarted. I think I should have deleted the cache pool after step 2 and then recreated it. So I tried adding the cache pool again, including that step (i.e., deleting the cache pool and recreating it).

Now, I get these in my log when bringing the system up:

    Jan 13 12:28:02 Tower emhttpd: shcmd (1687651): mkdir -p /mnt/cache
    Jan 13 12:28:02 Tower emhttpd: /sbin/btrfs filesystem show /dev/sdl1 2>&1
    Jan 13 12:28:02 Tower emhttpd: Label: none uuid: 100735db-0e88-4450-a406-40f3efdd2bb7
    Jan 13 12:28:02 Tower emhttpd: #011Total devices 3 FS bytes used 247.08GiB
    Jan 13 12:28:02 Tower emhttpd: #011devid 3 size 223.57GiB used 124.00GiB path /dev/sdf1
    Jan 13 12:28:02 Tower emhttpd: #011devid 4 size 476.94GiB used 249.03GiB path /dev/sdl1
    Jan 13 12:28:02 Tower emhttpd: #011devid 5 size 223.57GiB used 124.03GiB path /dev/sdo1
    Jan 13 12:28:02 Tower emhttpd: /mnt/cache uuid: 100735db-0e88-4450-a406-40f3efdd2bb7
    Jan 13 12:28:02 Tower emhttpd: shcmd (1687652): mount -t btrfs -o noatime,space_cache=v2 -U 100735db-0e88-4450-a406-40f3efdd2bb7 /mnt/cache
    Jan 13 12:28:02 Tower root: mount: /mnt/cache: wrong fs type, bad option, bad superblock on /dev/sdk1, missing codepage or helper program, or other error.

What is surprising is the reference to "bad superblock on /dev/sdk1". sdk1 is the disk that was removed from the pool several hours earlier:

    Jan 13 09:20:19 Tower kernel: BTRFS info (device sdk1): device deleted: /dev/sdk1

This is a bit academic for me since I can recover from the backup of the cache, but solving it may be helpful for someone else. Let me know if there's any other info or tests that I can help with.
January 13, 2025 Author

The rsyslog was turned off before the syslog above was extracted. I've just rebuilt the cache pool from scratch and it's working now. I'll reboot tonight as I need to remove some physical drives. I'll post back here if it is still slow to bring the array up.
January 15, 2025 Author

I am running 6.12.10. When the array is brought online, it still takes forever. The array and cache are up, but something is causing a huge pause in activity, and the SMB shares are not available for around 30 minutes or more. I looked at the "Main" page in a browser, and could see the 'reads' count rising slowly for each of my drives (sequentially, not all at once). I was able to use 'ps' and figured out that it is caused by these processes:

    UID        PID  PPID  C STIME TTY          TIME CMD
    root      4277  4276  0 23:34 ?        00:00:01 find /mnt/disk4 -type d
    root      4276 22055  0 23:34 ?        00:00:00 /bin/bash /usr/local/emhttp/plugins/file.activity/scripts/rc.file.activity start
    root     22055 22054  0 23:14 ?        00:00:00 /bin/bash /usr/local/emhttp/plugins/file.activity/scripts/rc.file.activity start
    root     22054 21657  0 23:14 ?        00:00:00 /bin/bash /usr/local/emhttp/plugins/file.activity/event/disks_mounted disks_mounted
    root     21657 10575  0 23:14 ?        00:00:00 /bin/bash /usr/local/sbin/emhttp_event disks_mounted
    root     10575     1  0 Jan13 ?        00:04:13 /usr/local/bin/emhttpd

I could see the 'find' command above progressing through each of my drives, and the 'reads' counts rising in the web GUI. Once that finished with the last drive, the array status in the web GUI changed from "Starting..." to "Started", and the SMB shares became available. My syslog showed a 34 minute gap just now as I brought the array up:

    Jan 14 23:14:15 Tower emhttpd: Starting File Activity...
    Jan 14 23:14:16 Tower rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="22046" x-info="https://www.rsyslog.com"] start
    Jan 14 23:48:07 Tower file.activity: File Activity inotify starting
    Jan 14 23:48:07 Tower inotifywait[25916]: Setting up watches. Beware: since -r was given, this may take a while!
    Jan 14 23:48:09 Tower unassigned.devices: Mounting 'Auto Mount' Devices...
    Jan 14 23:48:09 Tower emhttpd: Starting services...
    Jan 14 23:48:09 Tower emhttpd: shcmd (202405): /etc/rc.d/rc.samba restart
    Jan 14 23:48:11 Tower root: Starting Samba: /usr/sbin/smbd -D

I wonder if the problem is file.activity and the related inotify? That message didn't show up in the log until after the 34 minute pause, but that seems like the kind of thing that would be scanning files.
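
Spotting a pause like this by eye gets tedious in a long syslog, so here is a minimal sketch that reports the largest gap between consecutive lines. The function name largest_gap is invented for this example, and it assumes the classic "Mon DD HH:MM:SS" timestamp format with all lines falling in the same month. The sample lines are copied from the excerpt above.

```shell
# Hypothetical helper: print the largest gap in seconds between
# consecutive syslog lines on stdin, then the lines before and after it.
# Assumes "Mon DD HH:MM:SS" timestamps within a single month.
largest_gap() {
    awk '
    BEGIN { max = 0 }
    {
        split($3, t, ":")
        now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
        if (NR > 1 && now - prev > max) {
            max = now - prev
            before = prevline
            after = $0
        }
        prev = now
        prevline = $0
    }
    END { printf "%d\n%s\n%s\n", max, before, after }'
}

# Four lines taken from the syslog excerpt in this post.
sample_log='Jan 14 23:14:15 Tower emhttpd: Starting File Activity...
Jan 14 23:14:16 Tower rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="22046" x-info="https://www.rsyslog.com"] start
Jan 14 23:48:07 Tower file.activity: File Activity inotify starting
Jan 14 23:48:09 Tower emhttpd: Starting services...'

printf '%s\n' "$sample_log" | largest_gap
```

With these lines the first output line is 2031 (seconds), which is 33 minutes 51 seconds and matches the roughly 34 minute pause. On a real server the input would more likely be the full log, e.g. "largest_gap < /var/log/syslog".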
January 15, 2025 Community Expert

Try booting in safe mode first to rule out any plugin issues. If it's the same, post new diags after the server is working normally.
January 16, 2025 Author

The "slow array start" problem is caused by the file.activity plugin. That plugin was updated November 25, 2024, but users will not be affected until the array is restarted. I described the issue here: https://forums.unraid.net/topic/54808-file-activity-plugin-how-can-i-figure-out-what-keeps-spinning-up-my-disks/page/17/#findComment-1512201
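
In case it helps someone track down a similar stall, the way the plugin was identified earlier in the thread was by following parent PIDs from the busy 'find' process back up to emhttpd. A rough sketch of that walk is below. The function name ancestry is made up for this example, it assumes the standard "ps -ef" column layout (UID PID PPID C STIME TTY TIME CMD), and the sample rows are the ones posted on January 15.

```shell
# Hypothetical helper: given `ps -ef` output (with header) on stdin and a
# starting PID in $1, print each process in the chain up through its parents.
ancestry() {
    awk -v pid="$1" '
    NR > 1 {
        ppid[$2] = $3
        cmd = $8
        for (i = 9; i <= NF; i++) cmd = cmd " " $i
        command[$2] = cmd
    }
    END {
        # Stop when the parent is not in the listing, or after 64 hops
        # as a guard against malformed input.
        for (hops = 0; hops < 64 && (pid in command); hops++) {
            print pid ": " command[pid]
            pid = ppid[pid]
        }
    }'
}

# Rows copied from the ps listing earlier in this thread.
sample_ps='UID PID PPID C STIME TTY TIME CMD
root 4277 4276 0 23:34 ? 00:00:01 find /mnt/disk4 -type d
root 4276 22055 0 23:34 ? 00:00:00 /bin/bash /usr/local/emhttp/plugins/file.activity/scripts/rc.file.activity start
root 22055 22054 0 23:14 ? 00:00:00 /bin/bash /usr/local/emhttp/plugins/file.activity/scripts/rc.file.activity start
root 22054 21657 0 23:14 ? 00:00:00 /bin/bash /usr/local/emhttp/plugins/file.activity/event/disks_mounted disks_mounted
root 21657 10575 0 23:14 ? 00:00:00 /bin/bash /usr/local/sbin/emhttp_event disks_mounted
root 10575 1 0 Jan13 ? 00:04:13 /usr/local/bin/emhttpd'

printf '%s\n' "$sample_ps" | ancestry 4277
```

On a live server the input would presumably come straight from the tool, e.g. "ps -ef | ancestry 4277" with whatever PID the slow process has at the time.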