Debian Linux machine(s) losing SMB/CIFS connection to unraid share (cache: yes) when transferring a file to it.


enJOyIT


Hi,

 

a really strange problem...

 

I have plenty of Debian Linux VMs running on Proxmox that are connected to several unraid user shares. They keep unexpectedly losing the connection to these shares. Not every share is affected at the same time: sometimes share1 loses the connection, another time it's share2. I can't tell when it happens... Yesterday every machine was connected to every share, and today one machine lost the connection to one share. The other unraid shares on this machine are OK, as are the other machines!

 

I moved to unraid from openmediavault and NEVER had such issues there, so I presume it's related to unraid.

 

When this happens, the mounted folder isn't readable anymore and I get "Cannot read file"; in Midnight Commander the folder shows up as "?serien" or "?filme".

 

I'm mounting the shares via /etc/fstab:


//192.168.20.215/filme /mnt/filme cifs x-systemd.automount,username=plex,password=xxxxxxxx 0 0
//192.168.20.215/serien /mnt/serien cifs x-systemd.automount,username=plex,password=xxxxxxx 0 0
//192.168.20.215/musik /mnt/musik cifs x-systemd.automount,username=plex,password=xxxxxxx 0 0

 

Maybe there is something I am missing?

Today it happened again:

 

image.png.9f7ea52ddaca262dc45560984a10aece.png

 

No logs on my client?!

 

It just dropped the connection. I transferred some files from disk to disk (both in the array) minutes before (but nothing related to "serien"). Some files of "serien" were transferred to the cache dir at the same time. Is it related to that?!

 

The only info I have from this timeframe is from the unraid server:

Jan 28 06:04:27 unraid kernel: mdcmd (49): set md_num_stripes 1280
Jan 28 06:04:27 unraid kernel: mdcmd (50): set md_queue_limit 80
Jan 28 06:04:27 unraid kernel: mdcmd (51): set md_sync_limit 5
Jan 28 06:04:27 unraid kernel: mdcmd (52): set md_write_method
Jan 28 06:09:52 unraid smbd[23687]: [2022/01/28 06:09:52.014082,  0] ../../source3/smbd/smb2_read.c:255(smb2_sendfile_send_data)
Jan 28 06:09:52 unraid smbd[23687]:   smb2_sendfile_send_data: sendfile failed for file xxxx/xxxxx/yyyyyyyy.zzz (Connection reset by peer) for client ipv4:192.168.20.221:36518. Terminating

 

Client ipv4:192.168.20.221 is not the client PC with the failed connection shown at the top, it's a different one! But it has the same error:

 

image.png.5fc4a156b831df5d7886d516b71cb396.png

 

strange!


Additional info:

 

I reconnected the shares and then started the mover to move some data from the cache to the array and bam, the next share dropped:

Unbenannt.png.46e934d97023d66c70ce3bb2ad183391.png

 

But no logs... on the client or the unraid server.

 

It must have something to do with the mover. After I reconnected the share (umount "filme" and mount -a), which worked, the mover was still moving files... and within the next minute "serien" dropped...
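To catch a dropped share without waiting for an application to fail, a small check like the one below could be run on the client. This is only a sketch: `check_mount` is a hypothetical helper name, the 5-second timeout is an arbitrary choice, and it assumes the coreutils `timeout` command is available (it is on a stock Debian install).

```shell
#!/bin/sh
# Hypothetical helper: report whether a mount point can still be listed.
# Prints "ok" if `ls` succeeds within 5 seconds, otherwise "stale".
check_mount() {
    dir=$1
    if timeout 5 ls "$dir" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "stale"
    fi
}

# Check the three mount points used in the fstab entries of this thread.
for m in /mnt/filme /mnt/serien /mnt/musik; do
    echo "$m: $(check_mount "$m")"
done
```

A share reported as "stale" can then be reconnected the same way as described above (umount the affected mount point, then mount -a).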

 

"mount" on the client gives me this:

//192.168.20.215/musik on /mnt/musik type cifs (rw,relatime,vers=3.1.1,cache=strict,username=plex,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.20.215,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,x-systemd.automount)
//192.168.20.215/filme on /mnt/filme type cifs (rw,relatime,vers=3.1.1,cache=strict,username=plex,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.20.215,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,x-systemd.automount)
//192.168.20.215/serien on /mnt/serien type cifs (rw,relatime,vers=3.1.1,cache=strict,username=plex,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.20.215,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,x-systemd.automount)

 

What is going on here?????

 


I can reproduce this error now. It's definitely connected to the mover/cache drive. If I have files on the cache and start the mover, the connection to the share that contains the moved files drops at the end.

 

It even drops if I just copy files (via Windows) to the share (cache: yes). Maybe it's locking the drive for a second and my Linux machine thinks the mount is gone?!?!

 

Is there anyone from Limetech who can analyse this behaviour? I've attached the diagnostics file. But IMHO there isn't really anything interesting in it, because I don't get any entries in the log files...

 

Similar threads (it's macos but the behaviour is the same):

Same but on reddit:

 

I need help! :/

 

P.S. I'm getting two new SSDs (these are my cache drives) tomorrow (my current ones have no TRIM support with my HBA, an LSI SAS3008). Maybe (but I doubt it) that will help?

 

unraid-diagnostics-20220128-0958.zip


Sorry for spamming, but I want to keep the posts separate so they stay in chronological order...

 

I enabled mover-logging, but still no helpful information:

Jan 28 11:48:24 unraid emhttpd: shcmd (94072): /usr/local/sbin/mover |& logger &
Jan 28 11:48:24 unraid root: mover: started
Jan 28 11:48:24 unraid move: move: file /mnt/cache/download/nzbget/completed/xxxxxxxxx/yyyyyyyyyy.abc
Jan 28 11:49:19 unraid root: mover: finished

 


I think you are running into the stale file handle issue with CIFS mounts. What I think is happening is that you are referencing a file on the cache, and when it is moved the file handle changes and you can no longer access it.

 

When UD (Unassigned Devices Plugin) mounts a CIFS share, it uses the 'noserverino' parameter that prevents the stale file handle.

 

Example UD mount command:

/sbin/mount -t 'cifs' -o rw,noserverino,nounix,iocharset=utf8,file_mode=0777,dir_mode=0777,uid=99,gid=100,credentials='/tmp/unassigned.devices/credentials_Public' '//MEDIASERVER/Public' '/mnt/remotes/MEDIASERVER_Public'

 

Basically, 'noserverino' makes the client use a locally generated 'ino' (inode number) and not the server's 'ino', which can change if the file is moved.
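Applied to the fstab entries from the first post, that could look like the sketch below. The only change is the added 'noserverino' option; everything else is copied from the original entries, and whether this alone is enough is an assumption, not something confirmed in this thread.

```
# /etc/fstab (sketch): original entries with noserverino added
//192.168.20.215/filme  /mnt/filme  cifs x-systemd.automount,noserverino,username=plex,password=xxxxxxxx 0 0
//192.168.20.215/serien /mnt/serien cifs x-systemd.automount,noserverino,username=plex,password=xxxxxxx 0 0
//192.168.20.215/musik  /mnt/musik  cifs x-systemd.automount,noserverino,username=plex,password=xxxxxxx 0 0
```

After editing, the shares need to be unmounted and mounted again for the new option to take effect; the output of "mount" should then show noserverino instead of serverino.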

 

Take a look at all the parameters used here and see if any others would apply to your situation.

 

Give that a try. Let me know how it goes and we can work on it some more.

There have been some Samba changes in 6.10. One of the latest changes is the addition of Samba security. Sorry I can't be more specific. There is so much going by me in development that I can't keep track of it all. If you have the stomach for it, you could run the latest 6.10rc2, or wait for 6.10 final and take a look at it then.

 

For the moment, a wait-and-see approach might be best.