  • Unraid 7.0.0-beta2 - Array Stop & NFS issues


    jpatriarca
    • Urgent

    Hi team,

    I have a newly installed server running 7.0.0-beta2 and I'm seeing two issues with my install that seem to share the same root cause: the NFS service.

     

    Issue #1

     

    Anytime I need to stop the array, the log shows that one or more mount locations are busy and the system is unable to unmount them. No matter how long I wait, they never unmount, even if the mounts on the client servers (several physical machines plus Proxmox VMs and CTs) are unmounted and closed before I stop the array.

    If I then press shutdown/reboot, every time the system restarts a new parity check begins, which takes around 2 days to complete.

     

    Issue #2

     

    Regarding the servers connected to Unraid over NFS: sometimes the NFS service hangs on Unraid, which prevents my servers from connecting to it. Even if I restart the NFS or RPC services on the Unraid server, I cannot get the Unraid NFS shares to connect again. Yet if I run showmount against Unraid, all exported shares are still listed, from any server. I've also tried restarting my servers and re-connecting to the NFS shares, but I get the same timeout error when connecting.
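
    For reference, this is roughly what I run when it happens (a sketch only; the rc script paths are the stock Slackware ones Unraid ships, and the hostname is my server):

    # on the Unraid server
    /etc/rc.d/rc.rpc restart        # restart rpcbind / the RPC services
    /etc/rc.d/rc.nfsd restart       # restart the NFS server daemons
    # from any client, the exports are still listed even while the mounts time out
    showmount -e unraid.internal.patriarca.pt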

    Anytime I get these errors, without exception, I'm forced to restart the server and go through issue #1 and the consequences described above.

     

    I'm sharing the diagnostics saved on my USB drive from the last restart I had to do this afternoon, related to issue #2; the parity check it triggered is running now.

     

    Both issues were also present on 7.0.0-beta1. This behavior was not seen on my previous NAS, a Synology.

     

    Looking forward to your feedback.

    unraid-diagnostics-20240813-1621.zip




    User Feedback

    Recommended Comments

    Re: issue #1, several zfs datasets are failing to unmount:

     

    Aug 13 16:21:24 unraid root: cannot unmount '/mnt/disk5': pool or dataset is busy
    Aug 13 16:21:24 unraid root: cannot unmount '/mnt/disk3/media': pool or dataset is busy
    Aug 13 16:21:24 unraid root: cannot unmount '/mnt/disk3': pool or dataset is busy
    Aug 13 16:21:25 unraid root: cannot unmount '/mnt/disk2/proxmox_bck': pool or dataset is busy
    Aug 13 16:21:25 unraid root: cannot unmount '/mnt/disk2/proxmox': pool or dataset is busy
    Aug 13 16:21:25 unraid root: cannot unmount '/mnt/disk2/media': pool or dataset is busy
    Aug 13 16:21:25 unraid root: cannot unmount '/mnt/disk2': pool or dataset is busy
    

    Possibly because of the NFS issue. Can you please confirm whether you can stop the array normally if you do it before issue #2 happens?

    Link to comment

    I cannot stop the array normally. To be honest, since I started using Unraid with the beta, I have never been able to stop the array normally: every time I try, I get the "cannot unmount" errors in the log and then have to force a restart to recover.

     

    I would expect any client to simply get a timeout and Unraid to stop the array even with files open. So some other NFS problem must be involved here, based on the description in my initial post, and it is related to issue #1.

     

    Even with those "cannot unmount" messages, I can confirm that all clients connected to the Unraid NFS service were hung and unable to reach the NFS shares. After the restart, everything went back to normal on the connected clients.

    Edited by jpatriarca
    Adding last paragraph with additional information and context
    Link to comment

    So to confirm: if you start the array and stop it immediately, before any NFS issues occur, you still cannot stop it?

     

    That would mean something else is preventing the datasets from unmounting. If that is the case, try to rule out plugins by rebooting in safe mode, and rule out the Docker and VM services by leaving them disabled initially.

    Link to comment

    Still unable to stop the array; this is what I'm getting right now after trying to stop it:

     

    Quote

    Aug 13 18:48:32 unraid root: cannot unmount '/mnt/disk3/media': pool or dataset is busy
    Aug 13 18:48:32 unraid root: cannot unmount '/mnt/disk3': pool or dataset is busy
    Aug 13 18:48:32 unraid root: cannot unmount '/mnt/disk2/proxmox_bck': pool or dataset is busy
    Aug 13 18:48:32 unraid root: cannot unmount '/mnt/disk2/media': pool or dataset is busy
    Aug 13 18:48:32 unraid root: cannot unmount '/mnt/disk2': pool or dataset is busy

     

    Now I need to force a reboot of Unraid and start again.

    I will try rebooting into safe mode with no plugins enabled and only the NFS shares active, and then try to stop and unmount the array.

     

    I will share the result later today or tomorrow in this thread. Thanks

    Link to comment

    Hi @JorgeB, I was able to test the mount/unmount behavior of my Unraid array in safe mode, and it mounts and unmounts correctly, both with all servers connected over NFS and with Docker active or inactive. Attached is a new diagnostics file from this session and the tests performed, for your analysis.

     

    At this moment, my strong bet is that once I move out of Safe Mode and mount the array, one of the community apps is the root cause of this behavior.

     

    Question: does the diagnostics file include logs or information about the installed community apps that might show what is causing this lockdown and preventing the array from unmounting safely?

     

    Thanks

    unraid-diagnostics-20240813-2226.zip

    Link to comment

    If it works in safe mode it suggests a plugin issue; I recommend uninstalling or disabling all plugins, then adding them back one by one and retesting.

    Link to comment

    I was able to test Unraid outside of Safe Mode after removing some of the installed plugins, and I could stop the array correctly without having to restart the server. So I'm now narrowing this down to NFS and the shares that are active between my servers and Unraid. For some reason the logs don't explain, Unraid cannot close those connections in a controlled way. If I disable NFS entirely, mounting/unmounting the array works as expected, which rules out Docker and the plugins as the cause.

     

    Also, I've tested using Unraid with no servers connected to it through NFS, and with the NFS service disabled entirely, and in both cases I could stop and restart the array without issues.

     

    There are more and more indications that both issues described in my initial post are related: something in the NFS service in Unraid is preventing the array from being unmounted, and it is causing the severe NFS hangs that force me to restart.

     

    The NFS connections between my servers and Unraid are a set of fstab NFS mounts (2 physical and 2 virtual servers) plus Proxmox mount points to a single NFS share on the two physical servers that act as Proxmox hosts.

     

    During the unmount process, after the logs report that the shares are busy, I can go to each server and unmount every NFS share that was mounted. Even after all those shares are unmounted, Unraid keeps reporting that the shares are still busy.

     

    I couldn't find any command or tool in Unraid that would tell me which shares and files are preventing Unraid from unmounting the array so I could do some debugging. If there is any such tool or log, please let me know and I will proceed with those tests.

    Link to comment
    2 hours ago, jpatriarca said:

    I couldn't find any command or tool in Unraid that would tell me which shares and files are preventing Unraid from unmounting the array so I could do some debugging. If there is any such tool or log, please let me know and I will proceed with those tests.

    Install the Open Files plugin. It will show you which files are open, so you can see which ones might prevent a shutdown. You are also given the opportunity to kill the tasks holding those files open.
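
    From the command line you can get roughly the same picture (a sketch; lsof and fuser are available on a stock Unraid install):

    # list processes with files open under a busy mount point
    lsof /mnt/disk3
    fuser -vm /mnt/disk3        # -v verbose, -m every process using the mount
    # kill the offending process if needed (<PID> is a placeholder)
    kill <PID>
    # note: files held by the in-kernel NFS server are not tied to a userspace
    # process, so they may not show up here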

     

    I will review your diagnostics and see if I find anything.

    Link to comment

    Hi @JorgeB & @dlandon, more details that might be useful:

     

    - the root cause is in the NFS service

    - I isolated the scenario to a single VM mounting an NFS share (Proxmox Backup Server), with all the other mounts disabled

    - when stopping the array, I get "pool or dataset is busy" for the mounts that the NFS client reads from

    - if I go to the server and unmount the NFS share, I get confirmation of the unmount, but Unraid continues to state that the pool and share are busy

    - I need to force a restart of Unraid to be able to start the array again

     

    I've tried changing the NFS mount options in several ways to avoid this behavior, such as the hard/soft, timeo and retrans options. On the other side, I have export settings in the NFS share configuration in Unraid, related to uid/gid, that were migrated from the old storage to avoid permission issues. I don't know whether this has any influence, or whether there are other export options that could be used to avoid this.
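
    For example, one variation I tried was switching the clients to soft mounts (illustrative only; my current entries use the default hard behavior, as shown further down):

    # soft: the client returns an error after retrans retries instead of retrying forever
    unraid.internal.patriarca.pt:/mnt/user/media /mnt/unraid/media nfs4 defaults,nolock,soft,timeo=50,retrans=4 0 0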

     

    What surprises me is that the NFS service in Unraid does not take the timeout into consideration and does not force the closure of the mounts being accessed by the client servers. That forced closure (with the associated risk of losing data, of course) is the behavior I've seen on other platforms, like Synology.

     

    Hope this additional information helps.

    Link to comment
    3 hours ago, JorgeB said:

    @dlandon any ideas for how to confirm if NFS is preventing the datasets from unmounting?

    Those shares with the 'busy' log messages are all exported with NFS. Looking at the log, it appears Unraid is doing all the right things to get the remote NFS client files released. I'm suspecting that the pool or dataset will have to be force exported to get it out of the busy state. I did a little test, and a zfs pool mounted remotely over NFS shows as busy even when no files are open.
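
    Roughly what my test looked like (a sketch; the dataset, address and mount point names are placeholders, not taken from your diagnostics):

    # on the server: a zfs dataset exported over NFS
    zfs create cache/nfstest                        # mounted at /mnt/cache/nfstest
    exportfs -o rw 192.168.1.50:/mnt/cache/nfstest
    # on the remote client
    mount -t nfs4 server:/mnt/cache/nfstest /mnt/nfstest
    # back on the server, with no files open by the client:
    zfs unmount cache/nfstest                       # reported "pool or dataset is busy"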

     

    You know better than I do how Unraid unmounts zfs pools.

    • Like 1
    Link to comment
    1 hour ago, dlandon said:

    I did a little test and a zfs pool mounted remotely using NFS shows that the pool is busy even if no files are open.

    I cannot reproduce that; I can stop the array with a share on a zfs pool exported over NFS and mounted on a different server with UD.

    Link to comment

    Thanks both for your feedback. @JorgeB, I can stop the array with open NFS connections as long as no files are open. With files open, I cannot stop the array and I see the behavior I've been describing: Unraid tries to close the array and the NFS exports while a file is open, cannot do it, and no timeout is ever triggered. What doesn't fit in this picture is what I mentioned earlier: unmounting all the NFS mounts on my servers changes nothing, and Unraid just keeps waiting in a loop. My expectation was that, after the NFS share is unmounted on the client servers, the open files would be discarded, a timeout would occur and Unraid would unmount the zfs pool safely. That is not what's happening.

     

    Based on that, and to prove the point, I ran some tests this afternoon: I reproduced the behavior, got the busy messages for multiple shares in Unraid, and then stopped the NFS service manually by running /etc/rc.d/rc.nfsd stop. A couple of seconds later (10 to 20) the array stopped and unmounted safely.
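
    In other words, the sequence that currently works for me is roughly:

    # array stop already requested, shares reported busy in the syslog
    exportfs -v                  # optional: see which shares are still exported
    /etc/rc.d/rc.nfsd stop       # stop the NFS server daemons manually
    # about 10-20 seconds later the pending array stop completes and the datasets unmount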

     

    I'm sharing a new diagnostics file, and you should see this happening in the system log attached to it, somewhere between 14:00 and 15:00.

     

    Please take into consideration that this is only about issue #1; issue #2 is still to be reviewed and discussed, if you don't mind.

    unraid-diagnostics-20240814-1705.zip

    Link to comment

    OK, so failing to unmount with open files is expected. Whether it should still fail after the NFS shares are unmounted on the remote servers I don't know, I'm not really familiar with NFS, nor whether the current behavior can be improved.

    Link to comment

    I'm seeing some log entries with regard to NFS:
     

    Aug 14 11:03:03 unraid rc.nfsd: /usr/sbin/rpc.mountd
    Aug 14 11:03:03 unraid rpc.mountd[5605]: Version 2.6.4 starting
    Aug 14 11:03:03 unraid kernel: rpc-srv/tcp: nfsd: got error -32 when sending 20 bytes - shutting down socket
    ### [PREVIOUS LINE REPEATED 1 TIMES] ###
    Aug 14 11:03:03 unraid rc.nfsd: NFS server daemon...  Started.

    This indicates a problem occurred with the NFS client that temporarily caused nfsd to shut down the socket (error -32 is EPIPE, i.e. the peer closed the connection).

     

    I assume you are mounting the NFS shares with fstab entries.  Can you post the fstab file on your client?

    Link to comment

    As for issue #2, we are seeing this kind of issue on some Unraid 6.12 servers and are looking into it. We are not sure whether it also affects Unraid 7.0. The issue seems to be related to a kernel release where nfsd does not terminate reliably, so an nfsd restart does not work.

     

    Does issue #2 occur on a clean boot with the array auto-starting? Or after an array stop and restart?

    Link to comment

    Sure, these are the fstab entries on my 4 servers:

     

    Server 1 - Proxmox server using CT mount points

     

    unraid.internal.patriarca.pt:/mnt/user/media /mnt/unraid/media nfs4 defaults,nolock,timeo=50,retrans=4 0 0

     

    These are the mount details:

     

    unraid.internal.patriarca.pt:/mnt/user/media on /mnt/unraid/media type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=50,retrans=4,sec=sys,clientaddr=192.168.20.21,local_lock=none,addr=192.168.20.199)

     

    Server 2 - Proxmox server using CT mount points

     

    unraid.internal.patriarca.pt:/mnt/user/media /mnt/unraid/media nfs4 defaults,nolock,timeo=50,retrans=4 0 0

     

    These are the mount details:

     

    unraid.internal.patriarca.pt:/mnt/user/media on /mnt/unraid/media type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=50,retrans=4,sec=sys,clientaddr=192.168.20.22,local_lock=none,addr=192.168.20.199)

     

    Server 3 - Virtual Server with Debian

     

    unraid.internal.patriarca.pt:/mnt/user/media    /mnt/unraid/media       nfs4 defaults,nolock,timeo=50,retrans=4 0 0
    unraid.internal.patriarca.pt:/mnt/user/personal /mnt/unraid/personal    nfs4 defaults,nolock,timeo=50,retrans=4 0 0

     

    These are the mount details:

     

    unraid.internal.patriarca.pt:/mnt/user/proxmox_bck on /mnt/proxmox_bck type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=50,retrans=4,sec=sys,clientaddr=192.168.10.102,local_lock=none,addr=192.168.20.199)

     

    Server 4 - Virtual server with Proxmox Backup Server

     

    unraid.internal.patriarca.pt:/mnt/user/proxmox_bck /mnt/proxmox_bck nfs4 defaults,nolock,timeo=50,retrans=4 0 0

     

    In addition to the fstab configuration on servers #1 and #2, there's a PVE storage configuration that creates another NFS mount for an LVM volume, where I keep the hard drive images for a single virtual server. This is the configuration:

     

    nfs: unraid-lvm
            export /mnt/user/proxmox
            path /mnt/pve/unraid-lvm
            server unraid.internal.patriarca.pt
            content images,iso,vztmpl
            options timeo=50,retrans=4
            preallocation off

     

    These are the mount details (the same for server #1 and server #2, which share the storage configuration through Proxmox):

     

    unraid.internal.patriarca.pt:/mnt/user/proxmox on /mnt/pve/unraid-lvm type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=50,retrans=4,sec=sys,clientaddr=192.168.20.21,local_lock=none,addr=192.168.20.199)

     

    Hope it helps

    Link to comment
    4 minutes ago, dlandon said:

    As for issue #2, we are seeing this kind of issue on some Unraid 6.12 servers and are looking into it. We are not sure whether it also affects Unraid 7.0. The issue seems to be related to a kernel release where nfsd does not terminate reliably, so an nfsd restart does not work.

     

    Does issue #2 occur on a clean boot with the array auto-starting? Or after an array stop and restart?

    I've seen the behavior on a clean boot. Since I also hit the unmount problem described above, I had to restart the server before I could stop and start the array again. So all the NFS hangs I've seen happened after a clean boot, after several hours or days of uptime. The first diagnostics file shared in this thread should have some info about this nfsd service hang.

    Link to comment

    Hi @dlandon and @JorgeB,

    Is there any update on either issue? Were you able to test my scenario for issue #1? And are there any updates or an expected fix (in the next 7.0.0 beta) for the kernel issue with NFS reported as issue #2?

     

    I'm still facing the same issues as reported.

     

    Looking forward to your updates. Thank you.

    Link to comment

    Unraid 7.0 beta 3 will be released soon. We recommend you try it and see if the issues are resolved. There is a good chance the NFS issue (#2) will be fixed, and I suspect it may also fix your first issue.

    Link to comment



