SMB failing after some time

Nexius2 · October 5, 2022

Hello,

I have an issue I can't figure out.

I have the same issue on all servers no matter which one shares the data, but to keep it simple, let's take the following case.

I have a couple of Unraid servers. One of them is my main data storage. On it, I have some SMB shares so that the other Unraid servers can connect to it using Unassigned Devices (yes, I have tried NFS, but it really doesn't work any better).

Mounting the share works well from any server and my dockers can access the share without issue.

But from time to time (almost every day this week, less often last week) the share just.... stops working.... and when I run the ls command in the remote folder, it says permission denied on some shares (not all), even though all of them use the same credentials.
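To record which mounts are affected when this happens, a small script can do the same thing as running ls in each remote folder by hand. This is only a minimal sketch: it assumes the shares are mounted as directories under a single root (the logs later in this thread show Unassigned Devices using /mnt/remotes), and the function name check_mounts is made up for illustration.

```python
import os


def check_mounts(root):
    """Try to list every entry directly under root and report the outcome.

    Returns a dict mapping each entry name to "ok", "permission denied",
    or "error: <reason>". Plain files are skipped.
    """
    results = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        try:
            os.listdir(path)  # same access that a plain `ls` needs
        except NotADirectoryError:
            continue  # a regular file, not a mount point
        except PermissionError:
            results[name] = "permission denied"
        except OSError as exc:
            results[name] = f"error: {exc.strerror}"
        else:
            results[name] = "ok"
    return results
```

Calling check_mounts("/mnt/remotes") on the client and printing the result with a timestamp (for example from a scheduled job) would show when each share flips from "ok" to "permission denied", which can then be matched against the syslog.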

For now, my solution is to stop the array and start it again, and then everything works fine (unmounting and mounting again does not work).

Here are the logs if you can help me. My main data storage is Nostromo and Halcyon is just another server trying to access it (fresh new install with not much on it).

For information, the error "kernel: traps: lsof[23970] general protection fault" is on all my servers, so I guess it is a "normal" error ...

thanks

halcyon-diagnostics-20221005-0736.zip nostromo-diagnostics-20221005-0736.zip

JorgeB · October 5, 2022

Nothing obvious in the logs, do you know the time in the log when it stopped working?

Nexius2 · October 5, 2022

Share permission just failed again (second time today).

It made me notice the time was not set correctly on one of the servers, but that shouldn't change anything.

I'll wait until next time and give new logs with the right timestamps, but I can't see anything other than "kernel: traps: lsof[18502] general protection fault ip:14f91cbbd4ee sp:8da6349c47b6e45d error:0 in libc-2.36.so[14f91cba5000+16b000]"

I haven't had a problem on the other server since I moved radarr/sonarr/lidarr from it.

Could the share in the containers be the source of my problem?

Nexius2 · October 6, 2022

It is now failing a couple of times a day.

I need to stop all containers, unmount and mount to make it work again. Docker has to have something to do with it...

On Nostromo (the server with the shares) I can see this:

Oct 5 14:52:07 Nostromo rpc.mountd[9333]: authenticated mount request from 192.168.1.75:757 for /mnt/user/Music_data (/mnt/user/Music_data) 
Oct 5 14:52:08 Nostromo rpc.mountd[9333]: authenticated mount request from 192.168.1.75:828 for /mnt/user/Video_data (/mnt/user/Video_data) 
Oct 5 14:52:08 Nostromo rpc.mountd[9333]: authenticated mount request from 192.168.1.75:980 for /mnt/user/temp (/mnt/user/temp)

But it doesn't say that it fails, and I don't see anything on the client side....

halcyon-diagnostics-20221006-1659.zip

Edited October 6, 2022 by Nexius2

JorgeB · October 6, 2022

Does it coincide with the logged lsof general protection faults?

Stop all dockers and start enabling one by one, might help find the culprit.

Nexius2 · October 6, 2022

The protection fault errors are on every server and don't seem to be the issue (they are always there, at any time).

I did the opposite: I stopped containers one by one until I could mount a share. Krusader seems to cause permission problems.... I don't know why... I'll try without Krusader to see if it continues.

Edited October 6, 2022 by Nexius2

Nexius2 · October 7, 2022

Krusader is not the faulty one 😞

It keeps failing.

I see "Send error in SessSetup = -35"

and I noticed there is a permission failure... but why?

halcyon-diagnostics-20221007-1331.zip

Nexius2 · October 11, 2022

changed:

- tunable (support hard links) in Settings/Global share settings to NO

- all DNS to the same server

- local master to only the data server in Settings/workgroup settings/ local master

None of these resolved my issue.

I just updated to 6.11.1 and it hasn't failed yet (2 days).

Nexius2 · October 12, 2022

Well, it failed again....

Does anybody have an idea why mounts could fail so regularly?

halcyon-diagnostics-20221012-0759.zip

Nexius2 · October 12, 2022

It just happened again between my last message and now....

halcyon-diagnostics-20221012-0825.zip

I've seen this:

Oct 12 08:09:30 Halcyon kernel: CIFS: VFS: \\NOSTROMO\temp error -11 on ioctl to get interface list
Oct 12 08:09:30 Halcyon kernel: CIFS: VFS: \\NOSTROMO\Music_data error -11 on ioctl to get interface list
Oct 12 08:09:30 Halcyon kernel: CIFS: VFS: \\NOSTROMO\Video_data error -11 on ioctl to get interface list

but I don't know why.
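As far as I know, the negative numbers in these kernel CIFS messages are Linux errno values with the sign flipped, so they can at least be turned into names. Below is a rough sketch that pulls the share and the code out of lines like the ones above. The lookup table is hard-coded with the standard Linux numbering for the codes seen in this thread (-11, -35, -111, plus 13 for "permission denied"); the function and variable names are invented for the example, and the table only names the error, it does not explain its cause.

```python
import re

# Linux errno numbers (generic numbering) for the codes seen in this thread.
# Hard-coded on purpose so the result does not depend on the local platform.
LINUX_ERRNO = {
    11: ("EAGAIN", "Resource temporarily unavailable"),
    13: ("EACCES", "Permission denied"),
    35: ("EDEADLK", "Resource deadlock avoided"),
    111: ("ECONNREFUSED", "Connection refused"),
}

# Matches e.g.: CIFS: VFS: \\NOSTROMO\temp error -11 on ioctl ...
CIFS_ERROR = re.compile(r"CIFS: VFS: (?P<share>\\\\\S+) error (?P<code>-\d+)")


def decode_cifs_errors(log_text):
    """Return a list of (share, errno_number, errno_name, description)."""
    found = []
    for match in CIFS_ERROR.finditer(log_text):
        number = abs(int(match.group("code")))
        name, text = LINUX_ERRNO.get(number, ("UNKNOWN", "not in table"))
        found.append((match.group("share"), number, name, text))
    return found
```

Feeding it the syslog text from the diagnostics would give one tuple per matching line, so the three lines above all decode to EAGAIN, which reads as "try again" rather than as a credentials problem.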

Edited October 12, 2022 by Nexius2

JorgeB · October 12, 2022

Try booting Unraid in safe mode to rule out any plugin.

dlandon · October 12, 2022

Looking at your diagnostics I see an issue with one server:

Oct 10 08:56:15 Halcyon unassigned.devices: Warning: shell_exec(/bin/df '/mnt/remotes/AURORA_tdownloaded' --output=size,used,avail | /bin/grep -v '1K-blocks' 2>/dev/null) took longer than 5s!

This is generally indicative of network or remote server connection issues.
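To see how often a remote mount is slow to answer, the same kind of check can be timed by hand. This is only an illustrative sketch and not how Unassigned Devices itself does it: the helper name timed_run is made up, and the 5 second default simply mirrors the threshold in the warning above.

```python
import subprocess
import time


def timed_run(cmd, timeout_s=5.0):
    """Run cmd and return (finished, seconds).

    finished is False when the command was still running after timeout_s
    and had to be killed.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,  # a non-zero exit code still counts as "finished"
        )
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return finished, time.monotonic() - start
```

For example, timed_run(["/bin/df", "/mnt/remotes/AURORA_tdownloaded"]) returns how long df took on that mount. Running it every few minutes and logging the result would show whether slow responses line up with the times the shares stop working.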

That server is having a tough time with a CIFS mount:

Oct  9 06:46:29 Halcyon unassigned.devices: Mount SMB share '//AURORA/tdownloaded' using SMB 3.1.1 protocol.
Oct  9 06:46:29 Halcyon unassigned.devices: Mount SMB command: /sbin/mount -t cifs -o rw,noserverino,nounix,iocharset=utf8,file_mode=0777,dir_mode=0777,uid=99,gid=100,vers=3.1.1,credentials='/tmp/unassigned.devices/credentials_tdownloaded' '//AURORA/tdownloaded' '/mnt/remotes/AURORA_tdownloaded'
Oct  9 06:46:29 Halcyon kernel: CIFS: Attempting to mount \\AURORA\tdownloaded
Oct  9 06:46:29 Halcyon kernel: CIFS: VFS: Error connecting to socket. Aborting operation.
Oct  9 06:46:29 Halcyon kernel: CIFS: VFS: cifs_mount failed w/return code = -111
Oct  9 06:46:29 Halcyon unassigned.devices: SMB 3.1.1 mount failed: 'mount error(111): could not connect to 192.168.1.70Unable to find suitable address. '.
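Return code -111 is "connection refused" on Linux, meaning nothing accepted the TCP connection at that moment. A quick way to test this independently of the mount is to try a plain TCP connection to the SMB port (445). The sketch below assumes the server is reachable by name or IP from the client; the function name smb_port_reachable is hypothetical.

```python
import socket


def smb_port_reachable(host, port=445, timeout_s=3.0):
    """Return (True, None) if a TCP connection succeeds, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, None
    except OSError as exc:  # covers refused, timed out and name lookup errors
        return False, str(exc)
```

For example, smb_port_reachable("192.168.1.70") checks the address from the log above. If this fails at the same times the mount fails, the problem is more likely the network or the SMB service on that server than the credentials.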

What is that server?

I would not mount that server share with UD and see if it stops your SMB issues.

Nexius2 · October 12, 2022

8 hours ago, JorgeB said:

Try booting Unraid in safe mode to rule out any plugin.

I have never tried safe mode, but the mounts are made with Unassigned Devices.... no plugin, no mount, I would say... no?

Edited October 12, 2022 by Nexius2

Nexius2 · October 12, 2022

6 hours ago, dlandon said:

Looking at your diagnostics I see an issue with one server:

Oct 10 08:56:15 Halcyon unassigned.devices: Warning: shell_exec(/bin/df '/mnt/remotes/AURORA_tdownloaded' --output=size,used,avail | /bin/grep -v '1K-blocks' 2>/dev/null) took longer than 5s!

This is generally indicative of network or remote server connection issues.

That server is having a tough time with a CIFS mount:

Oct  9 06:46:29 Halcyon unassigned.devices: Mount SMB share '//AURORA/tdownloaded' using SMB 3.1.1 protocol.
Oct  9 06:46:29 Halcyon unassigned.devices: Mount SMB command: /sbin/mount -t cifs -o rw,noserverino,nounix,iocharset=utf8,file_mode=0777,dir_mode=0777,uid=99,gid=100,vers=3.1.1,credentials='/tmp/unassigned.devices/credentials_tdownloaded' '//AURORA/tdownloaded' '/mnt/remotes/AURORA_tdownloaded'
Oct  9 06:46:29 Halcyon kernel: CIFS: Attempting to mount \\AURORA\tdownloaded
Oct  9 06:46:29 Halcyon kernel: CIFS: VFS: Error connecting to socket. Aborting operation.
Oct  9 06:46:29 Halcyon kernel: CIFS: VFS: cifs_mount failed w/return code = -111
Oct  9 06:46:29 Halcyon unassigned.devices: SMB 3.1.1 mount failed: 'mount error(111): could not connect to 192.168.1.70Unable to find suitable address. '.

What is that server?

I would not mount that server share with UD and see if it stops your SMB issues.

Aurora is a third server, and its share does not unmount from Halcyon (or at least it mounts back afterwards). I would guess the errors are due to high CPU usage stalling the server. Aurora is pretty much always between 90 and 100% CPU 🙂

My issue is between Halcyon and Nostromo, or between Aurora and Nostromo, because my shares are on Nostromo (and it is rarely over 40% CPU).

But maybe I'm wrong and I have some sort of network issue between all my servers 😕

Nexius2 · October 15, 2022

Today I had a script use a share mount. In fact, it was the "backup/restore appdata" plugin, which used an Unassigned Devices mount as its backup target, and it failed.

There is something not working well with Unraid mounts. When I search the forum I see lots of "kernel: traps: lsof[****] general protection fault ******* in libc-2.36.so" and other similar errors.

I thought my servers were just failing because of slow responses caused by high CPU usage, but really I'm beginning to doubt that.

What is the best practice for setting up shares on Unraid?

dlandon · October 15, 2022

6 hours ago, Nexius2 said:

What is the best practice for setting up shares on Unraid?

What kind of shares are you asking about? Remote server shares?

Nexius2 · October 15, 2022

6 hours ago, dlandon said:

What kind of shares are you asking about? Remote server shares?

Yes, the ones made with Unassigned Devices. Unless there is another/better method?

thanks

dlandon · October 15, 2022

11 minutes ago, Nexius2 said:

Yes, the ones made with Unassigned Devices. Unless there is another/better method?

thanks

UD is the best method.
