[6.12.4] ARRAY STOP STUCK ON "RETRY UNMOUNTING DISK SHARE(S)"



Good day all. I started my Unraid server from scratch the other day due to some user-error issues. All good; I got new disks for the ZFS pools, so I was not that disappointed. Luckily Unraid is smart and I could use the same disks, so I didn't lose any data, just the time spent setting up. I also used some backups from the Appdata Backup/Restore app, which worked well from what I can see.

 

Oct  7 07:21:03 haberworld emhttpd: shcmd (819806): /usr/sbin/zpool export cache
Oct  7 07:21:03 haberworld root: cannot unmount '/mnt/cache/system': pool or dataset is busy
Oct  7 07:21:03 haberworld emhttpd: shcmd (819806): exit status: 1
Oct  7 07:21:03 haberworld emhttpd: shcmd (819807): /usr/sbin/zpool export fast
Oct  7 07:21:03 haberworld root: cannot unmount '/mnt/fast/appdata': pool or dataset is busy
Oct  7 07:21:03 haberworld emhttpd: shcmd (819807): exit status: 1
Oct  7 07:21:03 haberworld emhttpd: Retry unmounting disk share(s)...
Oct  7 07:21:08 haberworld emhttpd: Unmounting disks...
Oct  7 07:21:08 haberworld emhttpd: shcmd (819808): /usr/sbin/zpool export cache
Oct  7 07:21:08 haberworld root: cannot unmount '/mnt/cache/system': pool or dataset is busy
Oct  7 07:21:08 haberworld emhttpd: shcmd (819808): exit status: 1
Oct  7 07:21:08 haberworld emhttpd: shcmd (819809): /usr/sbin/zpool export fast
Oct  7 07:21:08 haberworld root: cannot unmount '/mnt/fast/appdata': pool or dataset is busy
Oct  7 07:21:08 haberworld emhttpd: shcmd (819809): exit status: 1
Oct  7 07:21:08 haberworld emhttpd: Retry unmounting disk share(s)...

 

My issue is that I can't ever STOP the array now; it is always stuck on "RETRY UNMOUNTING DISK SHARE(S)". After a Google search, I found this thread: https://forums.unraid.net/topic/141479-6122-array-stop-stuck-on-retry-unmounting-disk-shares/#comment-1283203 I read it and tried some of the things there. Based on the info provided by `ljm42`, I should be able to run a command like `umount /var/lib/docker`, and as he said, "The array should then stop and prevent an unclean shutdown." So, looking at my logs, I ran

umount /usr/sbin/zpool

but the output was

umount: /usr/sbin/zpool: not mounted.

so then I tried

umount /mnt/cache/system

but the output was

umount: /mnt/cache/system: target is busy.
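For anyone following along, these are the kinds of generic commands that should show what is still holding a mount busy (standard Linux tools, nothing Unraid-specific, and I have not confirmed they lead to a fix here):

fuser -vm /mnt/cache/system                     # list processes with files open on that mount
lsof +D /mnt/cache/system                       # same idea via lsof (can be slow on large trees)
zfs list -r -o name,mounted,mountpoint cache    # show which datasets in the pool are still mounted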

 

Does this mean I have to HOLD the power button on my system every time I want to stop the array?

 

I have uploaded my diagnostics file; hopefully this is enough to understand what's going on.

 

Thanks for your time.

haberworld-diagnostics-20231007-0721.zip

On 10/8/2023 at 5:55 AM, JorgeB said:

This doesn't make much sense:

image.png

 

It's showing as mounted, I assume

umount /mnt/cache/system/libvirt/libvirt.img 

will have the same result?

It does not work either :/ I can't stop the array at all. I just don't know what I can do about it.

 

Uploaded another diagnostics file while the array is just stuck on "RETRY UNMOUNTING DISK SHARE(S)".

 

Screenshot 2023-10-09 at 8.04.39 AM.png

haberworld-diagnostics-20231009-0806.zip


According to the diags libvirt is no longer mounted, and I'm not sure why it's still listed on /dev/loop2, but the problem is related to Docker; both ZFS pools are failing to unmount:

 

Oct  9 08:03:26 haberworld emhttpd: shcmd (1667435): /usr/sbin/zpool export cache
Oct  9 08:03:26 haberworld root: cannot unmount '/var/lib/docker/zfs/graph/ad462dd82dfdc1afc3644458289265a6328203159fde95d5eb91f1de4afc6c6a': unmount failed
Oct  9 08:03:26 haberworld emhttpd: shcmd (1667435): exit status: 1
Oct  9 08:03:26 haberworld emhttpd: shcmd (1667436): /usr/sbin/zpool export fast
Oct  9 08:03:26 haberworld root: cannot unmount '/mnt/fast/appdata': pool or dataset is busy
Oct  9 08:03:26 haberworld emhttpd: shcmd (1667436): exit status: 1

 

Post the output of:

/etc/rc.d/rc.docker status
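As a generic illustration only (not taken from the diagnostics; "cache" is simply the pool name used in this thread, and I am assuming the rc script accepts stop the same way it accepts status), the manual sequence when Docker is the holdout would look roughly like:

mount -t zfs | grep /var/lib/docker    # list any docker-created zfs datasets still mounted
/etc/rc.d/rc.docker stop               # stop the Docker service by hand
umount /var/lib/docker                 # release the docker mount, then retry the array stop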

 

7 hours ago, JorgeB said:

Post the output of:

/etc/rc.d/rc.docker status

Shows "not mounted" while I am trying to STOP the array. (I did the steps before just dont have a SS of it, bit if you need it I can do another STOP the array to test it out).

Edited by NotHere
4 hours ago, JorgeB said:

Are you sure it's not stopped?

I mean, after I clicked STOP on the array, ran all the commands I could find in the previously mentioned thread, and even navigated away from the server and went back, it still showed "Array Stopping•Retry unmounting disk share(s)..." and none of the disks showed the way they usually do when the array is not mounted.

 

This time I did notice that when I ran "losetup", only 2 entries came up and not 3 like every other time. I clicked on the SHUTDOWN button to see if it would shut down the system this time, and it did.
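For reference, the command that lists those entries together with the image file behind each one:

losetup -a    # prints every /dev/loopN in use and its backing file (e.g. the libvirt image)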

 

The fact is that I can't seem to stop the array, but if the system is shutting down, I guess that's a win for now.

 

I will keep testing this twice a day to see if anything changes.

Screenshot 2023-10-10 at 8.04.51 AM.png

Screenshot 2023-10-10 at 8.11.13 AM.png

haberworld-diagnostics-20231010-0811.zip

On 10/10/2023 at 3:57 AM, JorgeB said:

Are you sure it's not stopped?

Good day. Any chance there is a fix for this? Am I doing something wrong?

 

I noticed that NOW, when running `losetup`, nothing is shown (unlike in my previous posts), and I am able to REBOOT or SHUTDOWN, but I am still unable to stop the array.

 

I downloaded another diagnostics file while it was stuck, in case it helps.

Screenshot 2023-10-15 at 11.12.01 AM.png

haberworld-diagnostics-20231015-1110.zip

4 hours ago, JorgeB said:
Oct 15 11:08:37 haberworld emhttpd: shcmd (1409596): umount /var/lib/docker
Oct 15 11:08:37 haberworld root: umount: /var/lib/docker: target is busy.

Problem still appears to be docker.

I see. I mean, as of right now I don't think it's going to bother me so much; I can just restart the server and then I'll be able to do whatever I need to the disks. However, is this an issue that affects just me? I haven't seen any other thread like this since 6.12.2, so I assume it's just me.


I have a similar issue, I guess: my cache disk - holding the docker and libvirt images - does not unmount, leading to a "forced shutdown" and a parity rebuild on startup (in my case, about eleven hours :( ). FYI, I use both Docker and VMs.

 

The problem is, I do not have time to bring relevant information to this discussion right now. Maybe I will be able to try a few things and provide a diagnostics file in a few days.

 

Cheers.

On 10/18/2023 at 2:57 AM, WillCroPoint said:

I have a similar issue, I guess: my cache disk - holding the docker and libvirt images - does not unmount, leading to a "forced shutdown" and a parity rebuild on startup (in my case, about eleven hours :( ). FYI, I use both Docker and VMs.

 

The problem is, I do not have time to bring relevant information to this discussion right now. Maybe I will be able to try a few things and provide a diagnostics file in a few days.

 

Cheers.

Ahh, so I am not alone. That's kinda good news, LOL.

 

I would go back to an XFS cache drive, but I'll keep giving ZFS a try. I never had this issue when the format was XFS, though...


Same issue here after upgrading to 6.12.2 and creating a ZFS pool for cache. Unknown if those two are related, but based on similar posts suggesting problems with the OS, I am now on 6.12.4.

For me this is definitely a docker img issue which can be traced back to a container error.

The problem starts with a slow or unresponsive Unraid web UI. I was able to isolate this to the binhex-Delugevpn container becoming unresponsive, which develops after days or weeks with no issues. The Deluge web UI never fully loads, and all other containers work but are slow to load.

Running "docker stop binhex-Delugevpn" gets the Unraid UI working immediately, and all other containers/services/VMs are normal. However, Deluge does not actually stop, will not restart, and will not force update. Running docker stop again... returns "docker is not running" (or something of that nature).

Stopping the Docker service from the settings does stop the service and removes the Docker tab from the Unraid UI, but restarting it gives the error "docker service failed to start."

 

Now any attempt to stop the array puts it in the loop:

/usr/sbin/zpool export cache
cannot unmount '/mnt/cache/system': pool or dataset is busy

 

Commands I ran to try to stop the loop:

umount /var/lib/docker – returns “command not found”

umount /mnt/cache/system/libvirt/libvirt.img – returns “not mounted”

umount /usr/sbin/zpool – returns “not mounted”

umount -l /dev/loop2 – returns “not mounted”

umount /dev/loop3 – completes but does not stop the loop

 

shutdown -r now – finally kills the loop after a few minutes, but starts a parity check on restart due to the unclean shutdown.
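For completeness, the gentler sequence the replies above point toward before falling back to shutdown -r now (untested on my box; the pool name and paths are the ones appearing in this thread, and I am assuming rc.docker accepts stop) would be roughly:

/etc/rc.d/rc.docker stop     # stop the Docker service manually if the GUI toggle fails
umount -l /var/lib/docker    # lazy-unmount the docker path if it stays busy
zpool export cache           # the export that emhttpd itself keeps retrying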

 

I need to capture the container log from the terminal next time to figure out why Deluge becomes unresponsive. The array will cleanly stop with VMs and containers running prior to this container error. I have followed Spaceinvader One's guide for docker repair, both for the container and for the complete docker img rebuild, but the problem still continues.
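A rough way to do that, assuming the standard docker CLI and writing the log to the flash drive so it survives a reboot (the output file name is just an example; adjust the container name to whatever yours is called):

mkdir -p /boot/logs
docker logs --tail 500 binhex-Delugevpn > /boot/logs/delugevpn-$(date +%Y%m%d-%H%M).log 2>&1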

 

For what it’s worth, I am using an unassigned HDD for Deluge downloads, and they are transferred directly to the array after completion; they do not hit the cache drive. Unraid is not reporting any disk errors on any drives.


Interesting. I am going to do a new clean install and I will not install DelugeVPN anymore; I will see if that works at all. Yes, I have done several clean installs and then restored all my docker images, and the issue still happens. It may have to do with Deluge. I'll be back when it's all done to see if that works.

On 10/20/2023 at 4:52 PM, grateful-carcinogen6157 said:

Same issue here after upgrading to 6.12.2 and creating a ZFS pool for cache. Unknown if those two are related, but based on similar posts suggesting problems with the OS, I am now on 6.12.4.

For me this is definitely a docker img issue which can be traced back to a container error.

The problem starts with a slow or unresponsive Unraid web UI. I was able to isolate this to the binhex-Delugevpn container becoming unresponsive, which develops after days or weeks with no issues. The Deluge web UI never fully loads, and all other containers work but are slow to load.

Running "docker stop binhex-Delugevpn" gets the Unraid UI working immediately, and all other containers/services/VMs are normal. However, Deluge does not actually stop, will not restart, and will not force update. Running docker stop again... returns "docker is not running" (or something of that nature).

Stopping the Docker service from the settings does stop the service and removes the Docker tab from the Unraid UI, but restarting it gives the error "docker service failed to start."

Now any attempt to stop the array puts it in the loop:

/usr/sbin/zpool export cache
cannot unmount '/mnt/cache/system': pool or dataset is busy

Commands I ran to try to stop the loop:

umount /var/lib/docker – returns “command not found”

umount /mnt/cache/system/libvirt/libvirt.img – returns “not mounted”

umount /usr/sbin/zpool – returns “not mounted”

umount -l /dev/loop2 – returns “not mounted”

umount /dev/loop3 – completes but does not stop the loop

shutdown -r now – finally kills the loop after a few minutes, but starts a parity check on restart due to the unclean shutdown.

I need to capture the container log from the terminal next time to figure out why Deluge becomes unresponsive. The array will cleanly stop with VMs and containers running prior to this container error. I have followed Spaceinvader One's guide for docker repair, both for the container and for the complete docker img rebuild, but the problem still continues.

For what it’s worth, I am using an unassigned HDD for Deluge downloads, and they are transferred directly to the array after completion; they do not hit the cache drive. Unraid is not reporting any disk errors on any drives.

Are your dockers pointing to /mnt/cache/appdata or /mnt/user/appdata? If /user, do you have exclusive shares turned on, and does the appdata share show as exclusive? Either pointing to /cache or turning on exclusive shares fixed the unmounting issue for me, and my issue was not exclusive to Deluge (I have since moved to qbittorrentvpn, as Deluge performance drops off significantly relative to the number of active downloads).
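One quick way to check, as I understand the 6.12 exclusive-share mechanism (an exclusive share is exposed as a plain symlink under /mnt/user instead of going through the FUSE layer):

ls -ld /mnt/user/appdata    # a symlink pointing at /mnt/<pool>/appdata indicates exclusive access
findmnt /mnt/user           # shows the shfs (FUSE) mount that non-exclusive shares go through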

This hasn't solved my issues; I've had one crash since. However, it's down from 1-2 crashes per day to one in the last couple of days. The syslog server didn't write the log to the designated share, so I've set it to mirror to the flash and am waiting for the next one.
This system was 100% stable on 6.11.5 for a long time and has been crashing frequently after updating to 6.12.4 a few days ago, so I'm expecting a software bug, unless 6.11.5 was just so much better that it was able to hide a hardware fault.
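With the mirror-to-flash option enabled, the copy that should survive a hard crash lives on the flash drive (if I recall the path correctly):

tail -n 100 /boot/logs/syslog    # syslog mirrored to flash, readable after an unclean reboot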

 

I've read about the issues with macvlan; I use VLANs and custom networks for some dockers. I switched to ipvlan before the last crash, so I am interested to see what the log shows for the next one. I may have to experiment with the network config further; I've just been avoiding it the last couple of days, as it's been a long time since I worked in IT, and after a decade of working the stupid hours the motorsport industry requires, I'm lucky if I can remember where I live some days, so I need to do some refreshing first, really...

Edited by localhost

I don't have Exclusive Shares on; it's OFF. It seems like a feature for people who know a bit more than I do.

 

What I want to talk about is your crashes. I never mentioned it before in my main thread, but my server crashes all the time, at least 3 times a week :/. I am going to assume it's because of this issue, since you have the same issue and mentioned the crashing. I just hope it gets fixed soon.


TBH I can't really help on the crashes. I'm experiencing what many others are reporting on this version of Unraid: unexplained high CPU loads, then a slow GUI, usually followed by a crash in the near future. Then it's fine after a reboot until that repeats.
I've been asking myself for some time why I run Unraid on a primary server, and really I think my solution is going to be TrueNAS. Unraid serves me well as a VM host, but over the years it has been flaky as a services server, and since I've been exclusively ZFS for a long time now, I think it's time.

PS. I just caught it bogging down and rebooted before a crash could happen; still, there is nothing interesting in the syslog.

Edited by localhost

I don’t think this is a permission issue. There is no problem shutting down the array with all containers and VMs running until the container crash happens.

Had another crash this weekend and it followed the same pattern, starting with Deluge. After doing research on some seemingly normal entries in the Deluge logs, I came across this post, which circles back to a larger issue that I don’t quite understand but may affect other download containers?

 

Posts by Binhex on page 393 and 395 

This seems to have been a known issue since at least April 2023, and I am not sure if it has been resolved. I only started seeing this issue after upgrading from 6.11 and setting up a ZFS pool. Either way, I restarted the server and installed the container version recommended in that thread. Time will tell.

 

 

1 hour ago, grateful-carcinogen6157 said:

I don’t think this is a permission issue. There is no problem shutting down the array with all containers and VMs running until the container crash happens.

Had another crash this weekend and it followed the same pattern, starting with Deluge. After doing research on some seemingly normal entries in the Deluge logs, I came across this post, which circles back to a larger issue that I don’t quite understand but may affect other download containers?

Posts by Binhex on page 393 and 395

This seems to have been a known issue since at least April 2023, and I am not sure if it has been resolved. I only started seeing this issue after upgrading from 6.11 and setting up a ZFS pool. Either way, I restarted the server and installed the container version recommended in that thread. Time will tell.

 

 

 

You are having clean shutdowns/reboots until a container has an error? And you can stop the array with all disks cleanly unmounting in normal operation?
I wasn't able to cleanly shut down or stop Unraid since 6.12.4 until bypassing FUSE; since then I've had no issues with disks failing to unmount.

This, however, hasn't resolved the stability issues; it's just one of the issues since 6.12.4 resolved.

