Nexius2 Posted October 5, 2022

Hello,

I have an issue I can't figure out. It happens on all my servers no matter which one shares the data, but to keep it simple, take the following case.

I have a couple of Unraid servers. One of them is my main data storage. It has some SMB shares that the other Unraid servers connect to using Unassigned Devices (yes, I have tried NFS, but it really doesn't work any better). Mounting the shares works well from any server, and my Docker containers can access them without issue.

But from time to time (almost every day this week, less often last week) a share just stops working. When I run ls in the remote folder, it says "permission denied" on some shares (not all), even though they all use the same credentials. For now, my workaround is to stop the array and start it again, after which everything works fine (unmounting and mounting again does not work).

Here are the logs, if you can help me. My main data storage is Nostromo, and Halcyon is just another server trying to access it (a fresh install with not much on it). For information, the error "kernel: traps: lsof[23970] general protection fault" appears on all my servers, so I guess it is a "normal" error...

Thanks

Attachments: halcyon-diagnostics-20221005-0736.zip, nostromo-diagnostics-20221005-0736.zip

JorgeB Posted October 5, 2022

Nothing obvious in the logs. Do you know the time in the log when it stopped working?

Nexius2 (Author) Posted October 5, 2022

Share permissions just failed again (second time today). It made me notice that the time was not set correctly on one of the servers, but that shouldn't change anything. I'll wait until next time and post new logs with the right timestamps, but I can't see anything other than:

kernel: traps: lsof[18502] general protection fault ip:14f91cbbd4ee sp:8da6349c47b6e45d error:0 in libc-2.36.so[14f91cba5000+16b000]

I haven't had a problem on the other server since I moved radarr/sonarr/lidarr off it. Could the share being used inside the containers be the source of my problem?

Nexius2 (Author) Posted October 6, 2022 (edited October 6, 2022)

It is now failing a couple of times a day. I need to stop all containers, then unmount and mount, to make it work again. Docker has to have something to do with it...

On Nostromo (the server with the share) I can see this:

Oct 5 14:52:07 Nostromo rpc.mountd[9333]: authenticated mount request from 192.168.1.75:757 for /mnt/user/Music_data (/mnt/user/Music_data)
Oct 5 14:52:08 Nostromo rpc.mountd[9333]: authenticated mount request from 192.168.1.75:828 for /mnt/user/Video_data (/mnt/user/Video_data)
Oct 5 14:52:08 Nostromo rpc.mountd[9333]: authenticated mount request from 192.168.1.75:980 for /mnt/user/temp (/mnt/user/temp)

But it doesn't say that anything fails, and I don't see anything on the client side...

Attachment: halcyon-diagnostics-20221006-1659.zip

JorgeB Posted October 6, 2022

Does it coincide with the logged lsof general protection faults? Stop all Docker containers and start enabling them one by one; that might help find the culprit.

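The "enable one by one" procedure suggested above could be scripted roughly as follows. This is only a sketch under assumptions, not anything from Unassigned Devices or Unraid itself: the function names, the share path passed in, and the soak time are all hypothetical, and because the failure in this thread is intermittent, a short soak period may well miss it.

```python
import os
import subprocess
import time


def share_readable(path):
    """Return True if the mounted share can be listed, False on any OS error
    (for example the "permission denied" described in this thread)."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False


def docker_start(name):
    """Start one existing container by name using the standard `docker start` command."""
    subprocess.run(["docker", "start", name], check=True)


def find_culprit(containers, share_path, start=docker_start,
                 check=share_readable, soak_seconds=600):
    """Start containers one at a time, in the given order.

    After each start, wait `soak_seconds` (hypothetical default) and probe the
    share. Returns the name of the first container after which the share is no
    longer readable, or None if the share survived all of them.
    """
    for name in containers:
        start(name)
        time.sleep(soak_seconds)
        if not check(share_path):
            return name
    return None
```

A call might look like `find_culprit(["radarr", "sonarr", "lidarr"], "/mnt/remotes/NOSTROMO_temp")`, where the container names are the ones mentioned in this thread and the mount path is an assumed example following the `/mnt/remotes/...` pattern. All containers would need to be stopped first, and a `None` result only means the share did not fail within the soak window.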
Nexius2 (Author) Posted October 6, 2022 (edited October 6, 2022)

The protection fault errors are on every server and don't seem to be the issue (they are always there, at any time).

I did the opposite: I stopped containers one by one until I could mount a share. Krusader seems to cause the permission problems... I don't know why. I'll try running without Krusader to see if it continues.

Nexius2 (Author) Posted October 7, 2022

Krusader is not the faulty one 😞 It keeps failing. I see "Send error in SessSetup = -35", and I noticed there is a permission failure... but why?

Attachment: halcyon-diagnostics-20221007-1331.zip

Nexius2 (Author) Posted October 11, 2022

Changed:
- tunable (support hard links) in Settings/Global share settings to NO
- all DNS to the same server
- local master to only the data server in Settings/workgroup settings/local master

None of these resolved my issue.

I just updated to 6.11.1; it hasn't failed yet (2 days).

Nexius2 (Author) Posted October 12, 2022

Well, it failed again... Does anybody have an idea why mounts could fail so regularly?

Attachment: halcyon-diagnostics-20221012-0759.zip

Nexius2 (Author) Posted October 12, 2022 (edited October 12, 2022)

It just happened again between my last message and now...

Attachment: halcyon-diagnostics-20221012-0825.zip

I've seen this:

Oct 12 08:09:30 Halcyon kernel: CIFS: VFS: \\NOSTROMO\temp error -11 on ioctl to get interface list
Oct 12 08:09:30 Halcyon kernel: CIFS: VFS: \\NOSTROMO\Music_data error -11 on ioctl to get interface list
Oct 12 08:09:30 Halcyon kernel: CIFS: VFS: \\NOSTROMO\Video_data error -11 on ioctl to get interface list

but I don't know why.

JorgeB Posted October 12, 2022

Try booting Unraid in safe mode to rule out any plugin.

dlandon Posted October 12, 2022

Looking at your diagnostics I see an issue with one server:

Oct 10 08:56:15 Halcyon unassigned.devices: Warning: shell_exec(/bin/df '/mnt/remotes/AURORA_tdownloaded' --output=size,used,avail | /bin/grep -v '1K-blocks' 2>/dev/null) took longer than 5s!

This is generally indicative of network or remote server connection issues.

That server is having a tough time with a CIFS mount:

Oct 9 06:46:29 Halcyon unassigned.devices: Mount SMB share '//AURORA/tdownloaded' using SMB 3.1.1 protocol.
Oct 9 06:46:29 Halcyon unassigned.devices: Mount SMB command: /sbin/mount -t cifs -o rw,noserverino,nounix,iocharset=utf8,file_mode=0777,dir_mode=0777,uid=99,gid=100,vers=3.1.1,credentials='/tmp/unassigned.devices/credentials_tdownloaded' '//AURORA/tdownloaded' '/mnt/remotes/AURORA_tdownloaded'
Oct 9 06:46:29 Halcyon kernel: CIFS: Attempting to mount \\AURORA\tdownloaded
Oct 9 06:46:29 Halcyon kernel: CIFS: VFS: Error connecting to socket. Aborting operation.
Oct 9 06:46:29 Halcyon kernel: CIFS: VFS: cifs_mount failed w/return code = -111
Oct 9 06:46:29 Halcyon unassigned.devices: SMB 3.1.1 mount failed: 'mount error(111): could not connect to 192.168.1.70Unable to find suitable address. '.

What is that server? I would not mount that server share with UD and see if it stops your SMB issues.

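The numbers in the kernel CIFS messages quoted in this thread (-11 on the ioctl errors, -111 on the failed mount) are negated Linux errno values. As a reading aid, here is a minimal sketch that pulls the code out of such a line and names it. The function name, the regular expression, and the small lookup table are assumptions made for illustration; the table is hardcoded with the common Linux numbering so that the result does not depend on the platform running the script.

```python
import re

# Common Linux errno numbers (hardcoded on purpose; only a few entries,
# chosen because they relate to errors reported in this thread).
LINUX_ERRNO = {
    11: ("EAGAIN", "Resource temporarily unavailable"),
    13: ("EACCES", "Permission denied"),
    111: ("ECONNREFUSED", "Connection refused"),
}

# Matches "error -11" and "return code = -111" as they appear in the logs above.
CODE_RE = re.compile(r"(?:error|return code =)\s*(-\d+)")


def decode_kernel_error(line):
    """Return (code, name, description) for a kernel log line, or None if the
    line carries no negative error code in one of the two known shapes."""
    match = CODE_RE.search(line)
    if match is None:
        return None
    code = -int(match.group(1))  # kernel logs the negated errno
    name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in this table"))
    return code, name, text
```

Read this way, the -111 in the mount failure means the connection to AURORA was refused, and the -11 entries mean the request for the interface list could not be completed at that moment. Neither code by itself explains the "permission denied" seen on the Nostromo shares.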
Nexius2 (Author) Posted October 12, 2022 (edited October 12, 2022)

8 hours ago, JorgeB said:
    Try booting Unraid in safe mode to rule out any plugin.

I've never tried safe mode, but the mounts are made with Unassigned Devices... no plugin, no mount, I would say... no?

Nexius2 (Author) Posted October 12, 2022

6 hours ago, dlandon said:
    Looking at your diagnostics I see an issue with one server: [...] What is that server? I would not mount that server share with UD and see if it stops your SMB issues.

Aurora is a third server, and its share does not unmount from Halcyon (or at least it mounts back afterwards). I would guess the errors are due to high CPU usage that stalls the server; Aurora is pretty much always between 90 and 100% CPU 🙂

My issue is between Halcyon and Nostromo, or between Aurora and Nostromo, because my shares are on Nostromo (and it is rarely over 40% CPU).

But maybe I'm wrong and I have some sort of network issue between all my servers 😕

Nexius2 (Author) Posted October 15, 2022

Today I had a script use a share mount. In fact, it was the "backup/restore appdata" plugin, which used an Unassigned Devices mount as its backup target, and it failed.

There is something not working well with Unraid mounts. When I search the forum I see lots of "kernel: traps: lsof[****] general protection fault ******* in libc-2.36.so" and other similar errors. I thought my servers were just failing because of slow responses caused by high CPU usage, but I'm really beginning to doubt that.

What is the best practice for making shares on Unraid?

dlandon Posted October 15, 2022

6 hours ago, Nexius2 said:
    what is, the best practice to make shares on unraid?

What kind of shares are you asking about? Remote server shares?

Nexius2 (Author) Posted October 15, 2022

6 hours ago, dlandon said:
    What kind of shares are you asking about? Remote server shares?

Yes, the ones mounted with Unassigned Devices. Unless there is another/better method? Thanks

dlandon Posted October 15, 2022

11 minutes ago, Nexius2 said:
    yes, the one with unassigned device. unless there is a other/better method? thanks

UD is the best method.