NotHere Posted October 7, 2023 Share Posted October 7, 2023 Good day all. I started my Unraid from scratch the other day due to some user error issues. All good, I got new disks for ZFS pools so I was not that disappointed. Luckily Unraid is smart and I can use the same disks so I didn't lose any data, just time setting up. I also used some backups from app Backup/Restore appdata. Worked well from what I can see. Oct 7 07:21:03 haberworld emhttpd: shcmd (819806): /usr/sbin/zpool export cache Oct 7 07:21:03 haberworld root: cannot unmount '/mnt/cache/system': pool or dataset is busy Oct 7 07:21:03 haberworld emhttpd: shcmd (819806): exit status: 1 Oct 7 07:21:03 haberworld emhttpd: shcmd (819807): /usr/sbin/zpool export fast Oct 7 07:21:03 haberworld root: cannot unmount '/mnt/fast/appdata': pool or dataset is busy Oct 7 07:21:03 haberworld emhttpd: shcmd (819807): exit status: 1 Oct 7 07:21:03 haberworld emhttpd: Retry unmounting disk share(s)... Oct 7 07:21:08 haberworld emhttpd: Unmounting disks... Oct 7 07:21:08 haberworld emhttpd: shcmd (819808): /usr/sbin/zpool export cache Oct 7 07:21:08 haberworld root: cannot unmount '/mnt/cache/system': pool or dataset is busy Oct 7 07:21:08 haberworld emhttpd: shcmd (819808): exit status: 1 Oct 7 07:21:08 haberworld emhttpd: shcmd (819809): /usr/sbin/zpool export fast Oct 7 07:21:08 haberworld root: cannot unmount '/mnt/fast/appdata': pool or dataset is busy Oct 7 07:21:08 haberworld emhttpd: shcmd (819809): exit status: 1 Oct 7 07:21:08 haberworld emhttpd: Retry unmounting disk share(s)... My issue is that i cant ever STOP the array now. It is always stuck on "RETRY UNMOUNTING DISK SHARE(S)". 
After a Google search, I found this thread: https://forums.unraid.net/topic/141479-6122-array-stop-stuck-on-retry-unmounting-disk-shares/#comment-1283203 I read it and tried some of the things there. Based on the info provided by `ljm42`, I should be able to run a command like `umount /var/lib/docker`, and as he said, "The array should then stop and prevent an unclean shutdown." So looking at my logs, I ran `umount /usr/sbin/zpool`, but the output was `umount: /usr/sbin/zpool: not mounted.` So then I tried `umount /mnt/cache/system`, but the output was `umount: /mnt/cache/system: target is busy.` Does this mean I have to hold the power button on my system every time I want to stop the array? I have uploaded my diagnostics file; hopefully this is enough to understand what's going on. Thanks for your time.

haberworld-diagnostics-20231007-0721.zip
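The `target is busy` error means some process still has an open file or working directory inside the dataset. A minimal sketch of how to find the culprit, assuming only a standard Linux /proc layout (`fuser -vm /mnt/cache/system` gives the same answer if psmisc is installed; the default path is the one from the log above):

```shell
#!/bin/bash
# Sketch: list the PIDs that keep a mount point busy by scanning /proc.
# Needs nothing beyond a standard Linux /proc; run as root so all
# processes are visible.
busy_pids() {
    local target=$1 pid
    for pid in /proc/[0-9]*; do
        # a process pins the mount if its cwd or an open fd lives inside it
        if ls -l "$pid/cwd" "$pid"/fd 2>/dev/null | grep -qF -- "-> $target"; then
            echo "${pid#/proc/} $(cat "$pid/comm" 2>/dev/null)"
        fi
    done
}
busy_pids "${1:-/mnt/cache/system}"
```

Once the offending PIDs are known, stopping the matching container, VM, or shell (or a `kill` as a last resort) should let the `zpool export` go through.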
JorgeB Posted October 7, 2023

See if this helps: https://forums.unraid.net/topic/141479-6122-array-stop-stuck-on-retry-unmounting-disk-shares/?do=findComment&comment=1281063
NotHere Posted October 7, 2023

Thanks for the info. I had tried that before; I should have posted it. This is what I got; it seems that libvirt might still be running? I tried some commands but still nothing.
JorgeB Posted October 8, 2023

This doesn't make much sense: it's showing as mounted. I assume `umount /mnt/cache/system/libvirt/libvirt.img` will have the same result?
NotHere Posted October 9, 2023

On 10/8/2023 at 5:55 AM, JorgeB said: "It's showing as mounted, I assume umount /mnt/cache/system/libvirt/libvirt.img will have the same result?"

That does not work either; I can't stop the array at all. I just don't know what I can do about it. I uploaded another diagnostics file while the array is stuck on "Retry unmounting disk share(s)".

haberworld-diagnostics-20231009-0806.zip
JorgeB Posted October 9, 2023

According to the diags, libvirt is no longer mounted (not sure why it's still listed on /dev/loop2), but the problem is related to docker; both zfs pools are failing to unmount:

Oct 9 08:03:26 haberworld emhttpd: shcmd (1667435): /usr/sbin/zpool export cache
Oct 9 08:03:26 haberworld root: cannot unmount '/var/lib/docker/zfs/graph/ad462dd82dfdc1afc3644458289265a6328203159fde95d5eb91f1de4afc6c6a': unmount failed
Oct 9 08:03:26 haberworld emhttpd: shcmd (1667435): exit status: 1
Oct 9 08:03:26 haberworld emhttpd: shcmd (1667436): /usr/sbin/zpool export fast
Oct 9 08:03:26 haberworld root: cannot unmount '/mnt/fast/appdata': pool or dataset is busy
Oct 9 08:03:26 haberworld emhttpd: shcmd (1667436): exit status: 1

Post the output of: `/etc/rc.d/rc.docker status`
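For reference, a hedged sketch of the manual teardown order implied here: stop the docker service first, then unmount its path, then export the pools. The `rc.docker` path is the one referenced above and the pool names (cache, fast) come from the log; the `retry` helper is illustrative, not an Unraid command:

```shell
#!/bin/bash
# Sketch: manual teardown order when emhttpd is stuck retrying.
retry() {                # retry <tries> <cmd...>: re-run until it succeeds
    local tries=$1 i
    shift
    for ((i = 1; i <= tries; i++)); do
        "$@" && return 0
        sleep 1
    done
    return 1
}
if [ -x /etc/rc.d/rc.docker ]; then      # only on a real Unraid system
    /etc/rc.d/rc.docker stop             # stop docker so /var/lib/docker frees up
    retry 5 umount /var/lib/docker || echo "/var/lib/docker still busy"
    retry 5 zpool export cache     || echo "pool cache still busy"
    retry 5 zpool export fast      || echo "pool fast still busy"
fi
```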
NotHere Posted October 9, 2023

7 hours ago, JorgeB said: "Post the output of: /etc/rc.d/rc.docker status"

It shows "not mounted" while I am trying to stop the array. (I did the steps before, I just don't have a screenshot of it, but if you need one I can stop the array again to test it out.)
JorgeB Posted October 10, 2023

11 hours ago, NotHere said: "Shows 'not mounted'"

Are you sure it's not stopped?
NotHere Posted October 10, 2023

4 hours ago, JorgeB said: "Are you sure it's not stopped?"

I mean, after I clicked Stop on the array, I ran all the commands I could find in the previously mentioned thread, even navigated away from the server page and went back, and it still showed "Array Stopping • Retry unmounting disk share(s)..." and none of the disks showed the way they usually do when the array is not mounted. This time I did notice that when I ran `losetup`, only 2 entries came up and not 3 like every other time. I clicked the Shutdown button to see if it would shut down the system this time, and it did. The fact is that I can't seem to stop the array, but if the system is shutting down, I guess that's a win for now. I will keep testing this twice a day to see if anything changes.

haberworld-diagnostics-20231010-0811.zip
NotHere Posted October 15, 2023

Good day. Any chance there is a fix for this? Am I doing something wrong? I noticed that now nothing is shown when running `losetup`, unlike in my previous posts, and I am able to reboot or shut down, but I am still unable to stop the array. I downloaded another diagnostics file while it was stuck, in case it helps.

haberworld-diagnostics-20231015-1110.zip
JorgeB Posted October 16, 2023

Oct 15 11:08:37 haberworld emhttpd: shcmd (1409596): umount /var/lib/docker
Oct 15 11:08:37 haberworld root: umount: /var/lib/docker: target is busy.

The problem still appears to be docker.
NotHere Posted October 16, 2023

4 hours ago, JorgeB said: "Problem still appears to be docker."

I see. As of right now I don't think it's going to bother me too much; I can just restart the server and then I'll be able to do whatever I need to the disks. However, is this an issue that affects just me? I haven't seen any other thread like this since 6.12.2, so I assume it's just me.
JorgeB Posted October 16, 2023

3 minutes ago, NotHere said: "is this an issue that is just for me?"

It's not a common issue with v6.12.4.
NotHere Posted October 16, 2023

7 hours ago, JorgeB said: "It's not a common issue with v6.12.4."

Understood. I'll just wait for the next update and see if I still have the same issue. Thanks for your time.
WillCroPoint Posted October 18, 2023

I have a similar issue, I guess: my cache disk, which holds the docker and libvirt images, does not unmount, leading to a forced shutdown and a parity rebuild on startup (in my case, about eleven hours). FYI, I use both docker and VMs. The problem is, I do not have time to bring relevant information to this discussion right now. Maybe I will be able to try a few things and provide a diagnostics file in a few days. Cheers.
NotHere Posted October 19, 2023

On 10/18/2023 at 2:57 AM, WillCroPoint said: "I have a similar issue, I guess, my cache disk [...] does not unmount leading to a 'forced shutdown' and a parity rebuild on startup."

Ah, so I am not alone. That's kinda good news, LOL. I would go back to an xfs cache drive, but I'll keep giving this zfs a try. I never had this issue when the format was xfs, though...
localhost Posted October 20, 2023

This has been my experience too since going from 6.11.5 to 6.12.4; I haven't had a clean shutdown since. My drive formats/config are as they were before the update. I had been running zfs without issue for a long time before 6.12.x. My cache is btrfs, though.
localhost Posted October 20, 2023

I'm wondering if 6.12 has a FUSE filesystem issue. Enabling exclusive shares in the global share settings has allowed me to cleanly stop the array for the first time since updating from 6.11.5.
grateful-carcinogen6157 Posted October 20, 2023

Same issue here after upgrading to 6.12.2 and creating a zfs pool for cache. Unknown if those two are related, but based on similar posts suggesting problems with the OS, I am now on 6.12.4.

For me this is definitely a docker img issue which can be traced to a container error. The problem starts with a slow or unresponsive Unraid web UI. I was able to isolate this to the binhex-Delugevpn container becoming unresponsive, which develops after days or weeks with no issues. The Deluge web UI never fully loads, but all other containers work, though they are slow to load. Running `docker stop binhex-Delugevpn` gets the Unraid UI working immediately and all other containers/services/VMs are normal. However, Deluge does not actually stop, will not restart, nor force-update; running docker stop again returns "docker is not running" (or something of that nature). Stopping the docker service from the settings does stop the service and remove the Docker tab from the Unraid UI, but restarting gives the error "docker service failed to start." Now any attempt to stop the array puts it in the loop:

/usr/sbin/zpool export cache
cannot unmount '/mnt/cache/system': pool or dataset is busy

Commands I ran to try to stop the loop:

umount /var/lib/docker – returns "command not found"
umount /mnt/cache/system/libvirt/libvirt.img – returns "not mounted"
umount /usr/sbin/zpool – returns "not mounted"
umount -l /dev/loop2 – returns "not mounted"
umount /dev/loop3 – completes but does not stop the loop
shutdown -r now – finally kills the loop after a few minutes but starts a parity check on restart for the unclean shutdown

I need to capture the container log from the terminal next time to figure out why Deluge becomes unresponsive. The array will cleanly stop with VMs and containers running prior to this container error.

I have followed SpaceInvader One's guide for docker repair, both for the container and for a complete docker img rebuild, but the problem still continues. For what it's worth, I am using an unassigned HDD for deluge downloads, and they are transferred directly to the array after completion; they do not hit the cache drive. Unraid is not reporting any disk errors on any drives.
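Several of the `umount` attempts above fail simply because the target is not a mount point at all. Checking `/proc/mounts` first avoids that guesswork; a sketch, where the mount points are the ones mentioned in this thread and `/etc/libvirt` as the libvirt.img mount point is an assumption about the usual Unraid layout:

```shell
#!/bin/bash
# Sketch: consult /proc/mounts before unmounting, instead of guessing
# which of the images/pools is still attached.
is_mounted() {            # is_mounted <mountpoint>: exit 0 if mounted
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' /proc/mounts
}
for mp in /var/lib/docker /etc/libvirt /mnt/cache/system /mnt/fast/appdata; do
    if is_mounted "$mp"; then
        umount "$mp" || echo "$mp is mounted but busy"
    else
        echo "$mp is not mounted, nothing to do"
    fi
done
# loop devices backing docker.img / libvirt.img show up here; a stale one
# can be detached with `losetup -d /dev/loopN` once nothing uses it
if command -v losetup >/dev/null; then
    losetup -a
fi
```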
NotHere Posted October 20, 2023

Interesting. I am going to do a new clean install, and this time I will not install DelugeVPN, and see if that works at all. Yes, I have done several clean installs and then restored all my docker images, and the issue still happens, so it may have to do with Deluge. I'll be back when it's all done to see if that worked.
localhost Posted October 21, 2023

On 10/20/2023 at 4:52 PM, grateful-carcinogen6157 said: "Same issue here after upgrading to 6.12.2 and creating a zfs pool for cache. [...] Unraid is not reporting any disk errors on any drives."

Are your dockers pointing to /mnt/cache/appdata or /mnt/user/appdata? If /mnt/user, do you have exclusive shares turned on, and does the appdata share show as exclusive? Either pointing to /mnt/cache or turning on exclusive shares fixed the unmounting issue for me, and my issue was not exclusive to Deluge (I have since moved to qbittorrentvpn, as Deluge performance drops off significantly relative to the number of active downloads).

This hasn't solved all my issues; I've had one crash since. However, that's down from 1-2 crashes per day to one in the last couple of days. The syslog server didn't write the log to the designated share, so I've set it to mirror to the flash drive and am waiting for the next one. This system was 100% stable on 6.11.5 for a long time and has been crashing frequently after updating to 6.12.4 a few days ago, so I'm expecting a software bug, unless 6.11.5 was just so much better that it was able to hide a hardware fault.

I've read about the issues with macvlan; I use VLANs and custom networks for some dockers. I switched to ipvlan before the last crash, so I'm interested to see what the log shows for the next one. I may have to experiment with the network config further. I've just been avoiding it the last couple of days; it's been a long time since I worked in IT, and after a decade of working the stupid hours the motorsport industry requires, I'm lucky if I can remember where I live some days, so I need to do some refreshing first.
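One way to verify the distinction being drawn here is to check which filesystem a path actually lands on: `/mnt/user/*` resolves through the shfs FUSE layer, while `/mnt/cache/*` hits the pool directly, so the reported filesystem types differ. A sketch with illustrative paths (the exact type strings depend on the system):

```shell
#!/bin/bash
# Sketch: show which filesystem each path actually lands on; a FUSE-family
# type indicates the shfs layer, zfs/btrfs the pool itself.
for p in /mnt/user/appdata /mnt/cache/appdata; do
    if [ -e "$p" ]; then
        stat -f -c '%-25n %T' "$p"   # prints path and filesystem type
    fi
done
# bind-mount sources of running containers, if docker is available:
if command -v docker >/dev/null; then
    docker ps -q | xargs -r docker inspect \
        --format '{{.Name}}: {{range .Mounts}}{{.Source}} {{end}}'
fi
```

Container paths listed as /mnt/user/... go through FUSE on every access; repointing them to /mnt/cache/... (or making the share exclusive) takes shfs out of the picture.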
NotHere Posted October 22, 2023

I don't have Exclusive Shares on; it's off. It seems like a feature for people who know a bit more than I do. What I want to talk about is your crashes. I never mentioned it before in my main thread, but my server crashes all the time, at least 3 times a week :/. I am going to assume it's because of this issue, since you have the issue as well and mentioned the crashing. I just hope it gets fixed soon.
localhost Posted October 23, 2023

TBH I can't really help on the crashes; I'm experiencing what many others are reporting on this version of Unraid. Unexplained high CPU loads, then a slow GUI, usually followed by a crash in the near future. Then it's fine after a reboot until that repeats. I've been asking myself for some time why I run Unraid on a primary server, and really I think my solution is going to be TrueNAS. Unraid serves me well as a VM host, but over the years it has been flaky as a services server, and since I've been exclusively zfs for a long time now, I think it's time.

PS. I just caught it bogging down and rebooted before a crash could come; still, there is nothing interesting in the syslog.
grateful-carcinogen6157 Posted October 23, 2023

I don't think this is a permission issue. There is no problem shutting down the array with all containers and VMs running until the container crash. I had another crash this weekend and it followed the same pattern, starting with Deluge. After doing research on some seemingly normal entries in the deluge logs, I came across a post which circles back to a larger issue that I don't quite understand but may affect other download containers (posts by binhex on pages 393 and 395). This seems to have been a known issue since at least April 2023, and I am not sure if it has been resolved. I only started seeing this issue after upgrading from 6.11 and setting up the ZFS pool. Either way, I restarted the server and installed the container version recommended in that thread. Time will tell.
localhost Posted October 23, 2023

1 hour ago, grateful-carcinogen6157 said: "I don't think this is a permission issue. [...] Time will tell."

You are having clean shutdowns/reboots until a container has an error? And you can stop the array with all disks cleanly unmounting in normal operation? I wasn't able to cleanly shut down or stop Unraid since 6.12.4 until bypassing FUSE; since then I've had no issues with disks failing to unmount. This hasn't resolved the stability issues, though; it's just one issue since 6.12.4 resolved.