nicr4wks Posted June 17, 2020

Hi all. I've been trying to track down this intermittent issue for the last couple of weeks.

** EDIT - as of an hour ago restarting Unraid no longer fixes the issue; Samba is constantly slow/inaccessible **
** Double edit - after running for 5 hours the server has recovered and is operational again **

The symptom is that Samba shares from Unraid become incredibly slow: browsing folders takes ~10 seconds to load each directory, and transfers to/from the server crawl to <5 KB/s from every client. While the transfer issues are present, I have noticed this process consuming 100% of one core, and it sits like that for hours:

/usr/local/sbin/shfs /mnt/user -disks 4094 -o noatime,allow_other -o remember=0

To get the server back to normal operation it needs to be restarted, which works fine via the GUI. Other parts of Unraid continue to work without issue: VMs run at full speed, Docker is responsive, and the web GUI loads fine. I can even access the disks via SCP at full speed; it's just Samba that dies.

In previous threads, shfs problems seem to stem from ReiserFS disks, of which I have none. I also have no cache drive.

Any ideas?

Unraid: 6.8.3
CPU: Intel® Xeon® CPU E5-1620 @ 3.60GHz
Memory: 56 GiB DDR3 Multi-bit ECC
Network: eth0: 1000 Mbps, full duplex, MTU 1500

tower-diagnostics-20200617-1109.zip
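For anyone else wanting to check for the same thing, this is roughly how I've been spotting it over SSH -- plain top/ps, nothing Unraid-specific, and it assumes a single shfs instance (no cache drive here) so pidof -s grabs the right PID:

# one-shot snapshot of the busiest processes; shfs sits right at the top at ~100% CPU
top -b -n 1 | head -n 20

# how long that shfs instance has been running vs. how much CPU time it has burned
ps -o pid,etime,time,pcpu,args -p $(pidof -s shfs)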
jonp Posted June 19, 2020

Hi there,

Sorry to hear you're having issues. I'm curious whether any of your VMs are passing through a PCI device. If so, I'd like to ask that you disable IOMMU on your motherboard, reboot, and try again with your VMs off. From researching some items in the log, it appears that in some cases certain hardware has been known to cause issues when IOMMU is enabled.

It would also help to get your full hardware specs (motherboard, etc.) so we know what kind of gear we are working with.

It's also odd to hear that things were slow, then reboots stopped being effective at resolving the issue, and then after a while everything was normal again. That is more indicative of something else on the network than of the server itself; otherwise you'd expect the behavior to remain consistent.

Let us know if the behavior happens again, and if so, please try the test I'm proposing and tell us whether it has any impact.
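If you want to double-check that the BIOS change actually took effect after the reboot, something along these lines from the terminal should tell you -- just a quick sketch, not an official procedure:

# any DMAR/IOMMU lines here mean the kernel found and enabled the IOMMU
dmesg | grep -i -e dmar -e iommu | head

# an empty directory here generally means no IOMMU is currently active
ls /sys/class/iommu/

# shows whether any intel_iommu= / iommu= flags are being passed at boot (e.g. from syslinux.cfg)
cat /proc/cmdline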
EthanBB Posted June 19, 2020

I have exactly the same problem. IOMMU is off in my case and I have no VMs running. I rebooted twice yesterday, but today it's slow again.

alexandria-diagnostics-20200619-2224.zip
nicr4wks Posted June 19, 2020

5 hours ago, jonp said:
... I'd like to ask that you disable IOMMU on your motherboard, reboot, and try again with your VMs off. ... It would also help to get the full hardware specs from you (motherboard, etc.) ...

Thanks for taking a look. It's running on an HP Z600 workstation; the motherboard is identified as "Hewlett-Packard 158A Version 0.00".

I have one device passed through to a VM, a "Realtek Corp. RTL2838 DVB-T (0bda:2838)" connected via USB. I should also mention that when the issue occurs I have tried stopping ALL VMs and Dockers, and this does not change any of the symptoms at all. I'll physically remove the Realtek USB device and keep the VM off over the next week for testing. I'll also note that this hardware configuration has not changed and has run flawlessly for the last 4 years.

I do not believe network performance plays any part in this. As I mentioned in the original post, when SMB is not responding or running slowly I can use SCP over the same network, from the same client computer, to access the disks directly at full speed. It seems like whatever the shfs process is stuck doing is the cause of the performance issue.
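In case it helps anyone else narrow it down, this is roughly how I've been separating the shfs layer from the disks and the network (the path below is just an example -- substitute any large file that actually lives on one of your array disks):

# read a large file straight from the array disk -- this bypasses shfs entirely
dd if=/mnt/disk1/Media/example.mkv of=/dev/null bs=1M count=1024

# read the same file through the user share, i.e. through the shfs FUSE mount
dd if=/mnt/user/Media/example.mkv of=/dev/null bs=1M count=1024

If shfs itself is the bottleneck, the /mnt/user read should come out noticeably slower than the /mnt/disk1 read. Keep in mind the second read can be partly served from the page cache by the first, so it flatters shfs if anything; use two different files (or drop caches in between) for a stricter comparison.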
EthanBB Posted June 22, 2020

Any updates? My server is basically unusable because of this bug. It flip-flops: bugged for 5-8 hours, then fine for another 5-8 hours, then bad again.
nicr4wks Posted June 23, 2020

14 hours ago, EthanBB said:
Any updates? My server is basically unusable because of this bug. ...

I've heard nothing. Can you SSH in to your server and run top? It would be interesting to see whether yours has the same stuck shfs process.

I also tried downgrading to 6.8.2; the problem is still present there.
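If it helps, this is roughly what I'd run -- same assumption as before that there's only one shfs instance, so pidof -s grabs the right PID. On my box it looks like a single thread spinning, which would match it pegging exactly one core:

# per-thread view of just the shfs process -- shows whether a single thread is pegged
top -H -p $(pidof -s shfs)

# or a one-shot snapshot that's easy to paste into the thread
top -b -H -n 1 -p $(pidof -s shfs) | head -n 20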
EthanBB Posted June 23, 2020

4 hours ago, nicr4wks said:
... Can you SSH in to your server and run top? It would be interesting to see whether yours has the same stuck shfs process.

Screenshot from yesterday morning: https://imgur.com/IgsKqLy - yep, same as you. But I'm currently at 1331 hours of CPU time.