
SMB failing after some time



Hello,

I have an issue I can't figure out.

I have the same issue on all servers no matter which one shares the data, but to keep things simple, let's take the following case.

I have a couple of Unraid servers. One of them is my main data storage. On it, I have some SMB shares so that the other Unraid servers can mount them using Unassigned Devices (yes, I have tried NFS, but it really doesn't work any better).

 

Mounting the share works well from any server, and my Docker containers can access the share without issue.

But from time to time (almost every day this week, less often last week) the share just... stops working. When I run the ls command in the remote folder, it says permission denied on some shares (not all), even though they all use the same credentials.
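A minimal sketch for narrowing down which shares fail when this happens: it tries to list each mount point, roughly what running ls in each remote folder does, and classifies the result as ok, permission denied, missing, or some other error. The default path /mnt/remotes/NOSTROMO_temp is hypothetical, modelled on the naming Unassigned Devices uses in the logs further down (for example /mnt/remotes/AURORA_tdownloaded); pass your own mount points instead. A mount that is completely hung would block inside os.listdir rather than return an error.

```python
import os
import sys


def check_share(path):
    """List one mounted share and classify the result like a quick `ls` would."""
    try:
        os.listdir(path)
        return "ok"
    except PermissionError:
        # EACCES/EPERM: what ls prints as "Permission denied"
        return "permission denied"
    except FileNotFoundError:
        return "missing"
    except OSError as exc:
        # Any other failure (stale handle, host down, ...) keeps its message
        return f"error: {exc}"


def check_shares(paths):
    """Return {path: status} so the failing shares stand out from the working ones."""
    return {path: check_share(path) for path in paths}


if __name__ == "__main__":
    # Hypothetical default path; pass your own mount points as arguments
    targets = sys.argv[1:] or ["/mnt/remotes/NOSTROMO_temp"]
    for path, status in check_shares(targets).items():
        print(f"{status:<20} {path}")
```

Run right after a failure with every remote mount as an argument, this would show at a glance whether only some shares report permission denied while the others still list.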

 

For now, my workaround is to stop the array and start it again, and then everything works fine (unmounting and mounting again does not help).

 

 

Here are the logs, in case you can help. My main data storage is Nostromo, and Halcyon is just another server trying to access it (a fresh install with not much on it).

 

For information, the error "kernel: traps: lsof[23970] general protection fault" appears on all my servers, so I guess it is a "normal" error...

 

Thanks

 

 

halcyon-diagnostics-20221005-0736.zip nostromo-diagnostics-20221005-0736.zip


Share permissions just failed again (second time today).

It made me notice that the time was not set correctly on one of the servers, but that shouldn't change anything.

I'll wait until next time and post new logs with the right timestamps, but I can't see anything other than "kernel: traps: lsof[18502] general protection fault ip:14f91cbbd4ee sp:8da6349c47b6e45d error:0 in libc-2.36.so[14f91cba5000+16b000]"
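Since this fault shows up on every server, one way to check whether it is the same fault each time is to compare where it happens inside the library. The absolute ip changes between runs because of address randomization, but ip minus the mapping base (the first number in the brackets) stays the same for the same faulting instruction in the same libc build. A small sketch that decodes such a line; it only makes lines comparable and says nothing about the cause.

```python
import re

# Fields printed in a kernel "traps: ... general protection fault" line
TRAP_RE = re.compile(
    r"traps: (?P<comm>[^\[\s]+)\[(?P<pid>\d+)\] general protection fault "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>\d+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (program, library, offset of ip inside the library) or None."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip, base, size = (int(m[key], 16) for key in ("ip", "base", "size"))
    # The same faulting instruction gives the same offset even when ASLR moves base
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("ip is outside the reported mapping")
    return m["comm"], m["obj"], hex(offset)


line = ("kernel: traps: lsof[18502] general protection fault ip:14f91cbbd4ee "
        "sp:8da6349c47b6e45d error:0 in libc-2.36.so[14f91cba5000+16b000]")
print(fault_offset(line))  # ('lsof', 'libc-2.36.so', '0x184ee')
```

If the lines collected from Nostromo, Halcyon and Aurora all decode to the same offset, that would suggest one recurring lsof/libc crash rather than unrelated faults on each machine.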

I haven't had a problem on the other server since I moved radarr/sonarr/lidarr off it.

Could the share being used inside the containers be the source of my problem?

 

 


It is now failing a couple of times a day.

I need to stop all containers, then unmount and remount to make it work again. Docker has to have something to do with it...

 

On Nostromo (the server with the shares) I can see this:

Oct 5 14:52:07 Nostromo rpc.mountd[9333]: authenticated mount request from 192.168.1.75:757 for /mnt/user/Music_data (/mnt/user/Music_data) 
Oct 5 14:52:08 Nostromo rpc.mountd[9333]: authenticated mount request from 192.168.1.75:828 for /mnt/user/Video_data (/mnt/user/Video_data) 
Oct 5 14:52:08 Nostromo rpc.mountd[9333]: authenticated mount request from 192.168.1.75:980 for /mnt/user/temp (/mnt/user/temp)

but it doesn't say anything failed, and I don't see anything on the client side...

 

halcyon-diagnostics-20221006-1659.zip

Edited by Nexius2

The general protection fault errors are on every server and don't seem to be the issue (they are always there, at any time).

I did the opposite: I stopped containers one by one until I could mount a share. Krusader seems to cause the permission problems... I don't know why. I'll try without Krusader to see if it continues.
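The stop-one-at-a-time elimination in the post above can be written down as a small loop. This is a sketch with hypothetical callbacks: stop and share_works stand in for whatever actually stops a container and tests the mount (on Unraid that might be docker stop with the container name, and an ls or remount of the share). The container names in the demo are just the ones mentioned in this thread, and the demo's "share works" rule is made up.

```python
from typing import Callable, Iterable, Optional


def find_blocking_container(
    containers: Iterable[str],
    stop: Callable[[str], None],
    share_works: Callable[[], bool],
) -> Optional[str]:
    """Stop containers one at a time until the share works again.

    Returns the container whose stop coincided with recovery, or None if the
    share was already fine or never recovered. If several containers are
    involved, this is only the last one needed, so restart the others and
    re-test before blaming it alone.
    """
    if share_works():
        return None
    for name in containers:
        stop(name)
        if share_works():
            return name
    return None


if __name__ == "__main__":
    # Stand-in callbacks for illustration only
    stopped = set()
    culprit = find_blocking_container(
        ["radarr", "sonarr", "Krusader", "lidarr"],
        stop=stopped.add,
        share_works=lambda: "Krusader" in stopped,
    )
    print(culprit)  # Krusader
```

Restarting everything except the suspect afterwards, and confirming the share keeps working, is what separates "this container breaks it" from "this was simply the last one stopped".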

Edited by Nexius2

It just happened again between my last message and now...

halcyon-diagnostics-20221012-0825.zip

 

I've seen this:

Oct 12 08:09:30 Halcyon kernel: CIFS: VFS: \\NOSTROMO\temp error -11 on ioctl to get interface list
Oct 12 08:09:30 Halcyon kernel: CIFS: VFS: \\NOSTROMO\Music_data error -11 on ioctl to get interface list
Oct 12 08:09:30 Halcyon kernel: CIFS: VFS: \\NOSTROMO\Video_data error -11 on ioctl to get interface list

but I don't know why.
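The numbers in these CIFS kernel messages are negative errno values. On Linux, 11 is EAGAIN ("Resource temporarily unavailable"), and 111, which appears in the mount failure quoted later in the thread, is ECONNREFUSED ("Connection refused"). A small sketch to decode them straight from a log line, assuming it is run on Linux, since errno numbering differs between platforms.

```python
import errno
import os
import re

# CIFS kernel messages report failures as negative errno values
CODE_RE = re.compile(r"(?:error|return code =) (-\d+)")


def decode_cifs_errors(line):
    """Return [(code, errno name, description)] for every code found in a log line.

    errno numbers are platform-specific: run this on Linux so the numbers
    mean the same thing they meant in the Unraid kernel log.
    """
    decoded = []
    for raw in CODE_RE.findall(line):
        num = -int(raw)  # the kernel prints -errno, so flip the sign
        name = errno.errorcode.get(num, "UNKNOWN")
        decoded.append((int(raw), name, os.strerror(num)))
    return decoded


log = r"CIFS: VFS: \\NOSTROMO\temp error -11 on ioctl to get interface list"
for entry in decode_cifs_errors(log):
    print(entry)
```

Note the raw string for the log line: without it, the \t in \\NOSTROMO\temp would be read as a tab.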

Edited by Nexius2

Looking at your diagnostics I see an issue with one server:

Oct 10 08:56:15 Halcyon unassigned.devices: Warning: shell_exec(/bin/df '/mnt/remotes/AURORA_tdownloaded' --output=size,used,avail | /bin/grep -v '1K-blocks' 2>/dev/null) took longer than 5s!

This is generally indicative of network or remote server connection issues.
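The warning comes from Unassigned Devices timing a df call on the remote mount. A sketch of the same kind of check, assuming Python 3 is available on the server: os.statvfs (roughly the call df relies on) runs in a daemon thread, and if it has not returned within the timeout, the mount is reported as hung instead of blocking the caller. The 5 second default only mirrors the threshold in the log line and is not a recommended value; the blocked call is left running in the background, not cancelled.

```python
import os
import sys
import threading


def probe_mount(path, timeout=5.0):
    """Run statvfs() on `path` in a worker thread; report 'hung' if it doesn't return in time."""
    result = {}

    def worker():
        try:
            st = os.statvfs(path)
            result["status"] = "ok"
            result["free_bytes"] = st.f_bavail * st.f_frsize
        except OSError as exc:
            result["status"] = f"error: {exc}"

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        # Still blocked, typical of an unresponsive remote; the thread is abandoned
        return {"status": "hung"}
    return result


if __name__ == "__main__":
    # Mount point taken from the log above; pass another one as the first argument
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/remotes/AURORA_tdownloaded"
    print(probe_mount(target))
```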

 

That server is having a tough time with a CIFS mount:

Oct  9 06:46:29 Halcyon unassigned.devices: Mount SMB share '//AURORA/tdownloaded' using SMB 3.1.1 protocol.
Oct  9 06:46:29 Halcyon unassigned.devices: Mount SMB command: /sbin/mount -t cifs -o rw,noserverino,nounix,iocharset=utf8,file_mode=0777,dir_mode=0777,uid=99,gid=100,vers=3.1.1,credentials='/tmp/unassigned.devices/credentials_tdownloaded' '//AURORA/tdownloaded' '/mnt/remotes/AURORA_tdownloaded'
Oct  9 06:46:29 Halcyon kernel: CIFS: Attempting to mount \\AURORA\tdownloaded
Oct  9 06:46:29 Halcyon kernel: CIFS: VFS: Error connecting to socket. Aborting operation.
Oct  9 06:46:29 Halcyon kernel: CIFS: VFS: cifs_mount failed w/return code = -111
Oct  9 06:46:29 Halcyon unassigned.devices: SMB 3.1.1 mount failed: 'mount error(111): could not connect to 192.168.1.70Unable to find suitable address. '.

 

What is that server?

 

I would stop mounting that server's share with UD and see if that stops your SMB issues.

Replying to dlandon's post above:

Aurora is a third server, and its share does not unmount from Halcyon (or at least it mounts back afterwards). I would guess the errors are due to high CPU usage stalling the server. Aurora is pretty much always between 90 and 100% CPU 🙂

 

My issue is between Halcyon and Nostromo, or between Aurora and Nostromo, because my shares are on Nostromo (and it is rarely over 40% CPU).

 

 

But maybe I'm wrong and I have some sort of network issue between all my servers 😕


Today, I had a script use a share mount. In fact, it was the "backup/restore appdata" plugin that used an Unassigned Devices mount for the backup, and it failed.

There is something not working well with Unraid mounts. When I search the forum, I see lots of "kernel: traps: lsof[****] general protection fault ******* in libc-2.36.so" and other similar errors.

 

I thought my servers were just failing because of slow responses caused by high CPU usage, but really, I'm beginning to doubt it.

 

What is the best practice for making shares on Unraid?

 
