High CPU, broken SAMBA - server down



Hi All.

 

I've been trying to track down this intermittent issue for the last couple of weeks.

 

** EDIT - as of an hour ago restarting unraid no longer fixes the issue, samba is constantly slow/inaccessible **

** Double edit - after running for 5hrs server has recovered and operational again **

 

The symptom is that Samba shares on Unraid become incredibly slow: browsing folders takes ~10 seconds per directory, and transfers to/from the server crawl at <5KB/s on every client.

 

While the transfer issues are present, I've noticed that this process consumes 100% of one core and stays like that for hours:

/usr/local/sbin/shfs /mnt/user -disks 4094 -o noatime,allow_other -o remember=0
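
For anyone wanting to check for the same symptom, here is a minimal sketch that filters a process listing for anything at or above a CPU threshold. The function name busy_procs and the default threshold of 90 are hypothetical choices for illustration, not anything Unraid ships. Note that the pcpu figure reported by ps is averaged over the lifetime of the process, so top gives a better instantaneous reading.

```shell
# Hypothetical helper: print processes whose %CPU is at or above a threshold.
# Expects "PID %CPU COMMAND..." lines on stdin, for example from:
#   ps -eo pid,pcpu,args --no-headers
busy_procs() {
    threshold="${1:-90}"   # assumed default; pass another value as $1
    # Compare the second column (%CPU) numerically and print matching lines unchanged.
    awk -v t="$threshold" '$2 + 0 >= t + 0 { print }'
}
```

Run on the server as: ps -eo pid,pcpu,args --no-headers | busy_procs 90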

 

To get the server to resume normal operation it needs to be restarted, which works fine via the GUI.

 

Other parts of Unraid continue to work without issue: VMs run at full speed, Docker is responsive, and the WebGUI loads fine. I can even access the disks via SCP at full speed. It's just Samba that dies.

 

In previous threads, shfs problems seemed to stem from reiserfs disks, of which I have none. I also have no cache drive.

 

Any ideas?

 

Unraid: 6.8.3
CPU: Intel® Xeon® CPU E5-1620 @ 3.60GHz
Memory: 56 GiB DDR3 Multi-bit ECC
Network: eth0: 1000 Mbps, full duplex, mtu 1500

 

tower-diagnostics-20200617-1109.zip

Edited by nicr4wks

Hi there,

 

Sorry to hear you're having issues.  I'm curious if any of your VMs are taking advantage of passing through a PCI device.  If so, I'd like to ask that you disable IOMMU on your motherboard, reboot, and try again with your VMs off.  From doing some research on some items in the log, it appears that in some cases, certain hardware has been known to cause issues when IOMMU is enabled.  It would also help to get the full hardware specs from you (motherboard, etc.) to know what kind of gear we are working with.

 

It's also odd to hear that things were slow, then reboots stopped being effective at resolving the issue, then after a while everything was normal again.  That is really odd and more indicative of something else on the network than the server itself (otherwise you'd expect the behavior to remain consistent).

 

Let us know if the behavior happens again and if so, please try the test I am proposing and let us know if that has any impact.

5 hours ago, jonp said:

Sorry to hear you're having issues.  I'm curious if any of your VMs are taking advantage of passing through a PCI device.  If so, I'd like to ask that you disable IOMMU on your motherboard, reboot, and try again with your VMs off.  From doing some research on some items in the log, it appears that in some cases, certain hardware has been known to cause issues when IOMMU is enabled.  It would also help to get the full hardware specs from you (motherboard, etc.) to know what kind of gear we are working with.

 

It's also odd to hear that things were slow, then reboots stopped being effective at resolving the issue, then after a while everything was normal again.  That is really odd and more indicative of something else on the network than the server itself (otherwise you'd expect the behavior to remain consistent).

Thanks for taking a look.

 

It's running on an HP Z600 workstation; the motherboard is identified as "Hewlett-Packard 158A Version 0.00".

 

I have one device passed through to a VM, a "RealtekCorp. RTL2838 DVB-T (0bda:2838)" connected via USB. I should also mention that when the issue occurs I have tried stopping ALL VMs and Docker containers, and this does not change any of the symptoms at all. I'll physically remove the Realtek USB device and keep the VM off over the next week for testing. I'll also note that this hardware configuration has not changed and has run flawlessly for the last 4 years.

 

I do not believe network performance plays any part in this. As I mentioned in the original post, when SMB is not responding or running slow I can use SCP over the same network, from the same client computer, to access the disks directly at full speed.

 

It seems like whatever the shfs process is stuck doing is the cause of the performance issue.

 

14 hours ago, EthanBB said:

Any updates?

My server is basically unusable because of this bug. It flip-flops between being bugged for 5-8 hours, then is fine for another 5-8h ... and then bad again.

I've heard nothing. Can you SSH in to your server and run top? It would be interesting to see if yours has the same stuck shfs process.
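
If it helps, a rough sketch of that check is below. It pulls the PID and %CPU of any shfs row out of top's batch output. The function name shfs_cpu is made up for this example, and it assumes top's default column layout (PID first, %CPU ninth, COMMAND twelfth), so adjust the column numbers if your top is configured differently.

```shell
# Hypothetical helper: read `top -b -n 1` output on stdin and print
# "PID %CPU" for every row whose COMMAND column is exactly "shfs".
# Assumes the default top layout:
#   PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
shfs_cpu() {
    awk '$12 == "shfs" { print $1, $9 }'
}
```

From a client it could be run as: ssh root@tower 'top -b -n 1' | shfs_cpu (the hostname tower is an assumption; use your own server's name or IP).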

 

I also tried downgrading to 6.8.2, where the problem is still present.
