Southweave Posted February 13, 2023 (edited)

Description

After updating Unraid from version 6.10.3 to 6.11.5, Windows backups occasionally fail. The root cause seems to be that SMB locks are not handled properly.

When a backup runs, the required disks are all spun up and the transfer works properly. Data is transferred to Unraid and everything is fine until the end; there something goes wrong and the process never finishes. For example, today it backed up 17GB of data, but the file has the suffix .tmp and will probably be deleted the day after tomorrow.

I never had the issue on 6.10.3, and when I reverted to 6.10.3 the problem didn't reoccur.

If I kill the process that holds the SMB lock, the backup sometimes works and sometimes doesn't. There have been times when the problem didn't occur for a week, and times when backups failed for the whole week.

Pictures of smbstatus, the SMB settings and Tips & Tweaks have been included. The smbstatus picture shows the locked file, which shouldn't be locked anymore (the process had already failed). I also included server diagnostics. The server was restarted this morning, after which I attempted to back up once again.

I'll gladly answer any questions, this problem has been driving me nuts.

Systems

Unraid
- Version 6.11.5
- Supermicro X9DR3-F
- Intel E5-2680 v2 (2x)
- 40Gib DDR3 ECC memory
- 40Gib NIC

Windows #1
- Windows 10
- 10Gib NIC
- Intel system

Windows #2
- Windows 11
- 10Gib NIC
- Intel system

Backup software:
- Acronis
- Macrium Reflect

I initially used Acronis; however, I thought the issue was Acronis-specific and migrated to Macrium Reflect. Macrium Reflect has the same issue.

What I've tried:
- Enabling and disabling SMB Multi Channel
- Resetting the Tips & Tweaks plugin settings
- Removing all custom Samba configuration (which included RSS & Multi Channel prior to the update)
- Changing backup software
- Switching backups to a rotation, so that only one Windows system backs up per day and there is no possibility of a conflict
- Creating separate Unraid users for backing up
- Changing "ulimit -Hn" and "ulimit -Sn" to 80000 (it used to be 40960)
- Turning the SMB share public (not hidden)
- Googling the issue over and over during the last month with different wording, trying to find the same issue described. I couldn't find anything that would help me
- Definitely something more

Backup schedule
- Both Windows systems attempt to back up at 09:00; if the computer is not on, the backup is initiated on boot
- Windows #2 is usually on at 09:00, so that's when it backs up
- Windows #1 is usually not on at 09:00, and its backup usually occurs later in the day
- Both Windows systems have issues

This issue has occurred for a while now and, if my memory serves me right, it first occurred the very day I upgraded to 6.11.5.

Unraid now has an SMB version which supports Multi Channel, and with Multi Channel there's also a bug: https://bugzilla.samba.org/show_bug.cgi?id=11897
It seems like a similar issue, but I am not knowledgeable enough to assert:
- That it has the same root cause
- How to fix it
- Why it occurs for me

server-diagnostics-20230213-1110.zip

Edited March 17, 2023 by Southweave: Remove hidden content section
JorgeB Posted February 13, 2023

If "Enhanced macOS interoperability" is enabled, try disabling it (Settings -> SMB).
Southweave (Author) Posted February 13, 2023

25 minutes ago, JorgeB said: If "Enhanced macOS interoperability" is enabled try disabling it (Settings -> SMB)

Sadly it is already disabled and has been for over a year. The settings are also visible in the pictures; I'll remove them from hidden content to make them more visible.
JorgeB Posted February 13, 2023

Do you know the time you saw the error? Looks like nothing relevant was logged on Unraid.
Southweave (Author) Posted February 13, 2023

2 minutes ago, JorgeB said: Do you know the time you saw the error? Looks like nothing relevant logged on Unraid.

Sadly the Macrium Reflect logs don't show the time of the exception (or anything useful at all). However, the backup start time is 13.02.2023 10:43:56, and I guess the backup lasted a couple of minutes at best. The temporary file is also gone by now, so I can't verify its last-modified time.
JorgeB Posted February 13, 2023

Yep, nothing logged, suggesting the problem is not on the Unraid side.
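One way to double-check the "nothing logged" conclusion is to pull only the syslog lines that fall around the backup start time quoted above (13.02.2023 10:43:56). The following is a minimal Python sketch, not something used in the thread: it assumes classic syslog timestamps such as "Feb 13 10:43:56" at the start of each line, and the function name `lines_near`, the window sizes and the sample lines are all hypothetical.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, start, before_min=5, after_min=30):
    """Return the log lines stamped within a window around `start`."""
    earliest = start - timedelta(minutes=before_min)
    latest = start + timedelta(minutes=after_min)
    hits = []
    for line in log_lines:
        try:
            # Classic syslog stamps carry no year, so borrow it from `start`.
            stamp = datetime.strptime(
                f"{start.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a timestamped line, skip it
        if earliest <= stamp <= latest:
            hits.append(line)
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines; real ones would come from the syslog
    # file inside the diagnostics zip.
    sample = [
        "Feb 13 10:30:00 Server smbd[1]: too early",
        "Feb 13 10:45:10 Server smbd[1]: inside the window",
        "Feb 13 12:00:00 Server smbd[1]: too late",
    ]
    start = datetime.strptime("13.02.2023 10:43:56", "%d.%m.%Y %H:%M:%S")
    for hit in lines_near(sample, start):
        print(hit)
```

If the window comes back empty even with a higher Samba log level, that supports the reading that the server side did not record the failure.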
Frank1940 Posted February 13, 2023 (edited)

I do monthly manual backups using Windows Explorer to copy the entire contents of the Documents folder/directory to my Unraid server. When I first started doing this (probably ten years ago), I noticed that certain data files were failing to be copied over. Windows Explorer provides the user with details about which files fail to copy and why. I investigated at that time and found that these files were the data files used by my primary e-mail program. That program opened those files when it started and kept them open until it was closed. (Windows placed a 'lock' on these files and that lock was preventing the copy. Obviously, not every program would always have open files in the Documents folder and, in that case, the copy would proceed without a problem. But that is the luck of the draw. And when one is looking for the cause of a problem, that bit of random-ism can lead to erroneous conclusions!)

My solution was a simple one. I now close all programs before I start the backup. (I have also observed that not every program handles data and configuration files in the same manner. Some store them outside of the normal folders that are considered to be a part of user documents. Others will open the files, read the data and close them as soon as they have everything they need.)

Yes, I do know that automated backup programs like Macrium and Acronis are nice and can be very convenient, but sometimes doing things the hard way will provide information about why things are not working, because you are working much closer to the software layer that is actually doing the work and any error messages are not being filtered by additional layers of software that are designed to provide a more convenient user experience.

Edited February 13, 2023 by Frank1940
Southweave (Author) Posted February 13, 2023

34 minutes ago, JorgeB said: Yep, nothing logged, suggesting the problem is not on the Unraid side.

While I do agree that it's not Unraid's fault, I do believe it has something to do with my specific configuration. I just don't know what, and I don't have any more ideas for how to debug this mess, which is why I created this topic.

The reasons why I think it's related to my Unraid setup:
- The issue started occurring on 6.11.5
- If I revert to 6.10.3, the problem does not reoccur (but because it's not consistent I can't be 100% sure)
- The issue occurs on both Windows 10 and Windows 11
- The issue occurs with both Acronis and Macrium Reflect

While that doesn't necessarily mean it's my Unraid setup that causes it, it sure does seem like it. But of course it wouldn't be the first time that everything seems to point at Unraid and it's not actually Unraid's fault.
Southweave (Author) Posted February 13, 2023 (edited)

51 minutes ago, Frank1940 said: I do monthly manual backups using Windows Explorer to copy the entire contents of the Documents folder/directory to my Unraid server. When I first started doing this (probably ten years ago), I noticed that certain data files were failing to be copied over. Windows Explorer provides the user with details about which files fail to copy and why. I investigated at that time and found that these files were the data files used by my primary e-mail program. That program opened those files when it started and kept them open until it was closed. (Windows placed a 'lock' on these files and that lock was preventing the copy. Obviously, not every program would always have open files in the Documents folder and, in that case, the copy would proceed without a problem. But that is the luck of the draw. And when one is looking for the cause of a problem, that bit of random-ism can lead to erroneous conclusions!) My solution was a simple one. I now close all programs before I start the backup. (I have also observed that not every program handles data and configuration files in the same manner. Some store them outside of the normal folders that are considered to be a part of user documents. Others will open the files, read the data and close them as soon as they have everything they need.) Yes, I do know that automated backup programs like Macrium and Acronis are nice and can be very convenient, but sometimes doing things the hard way will provide information about why things are not working, because you are working much closer to the software layer that is actually doing the work and any error messages are not being filtered by additional layers of software that are designed to provide a more convenient user experience.

While it does seem logical that a local file is blocking the actual backup, I believe that's not the case.

- If I remember correctly, the Acronis logs stated that there was an error locking a file on the SMB share (because it was already locked by the same process)
- Macrium states "An unexpected network error occurred"
- Today, for the first time, I also got the error "Backup aborted! - Open file failed - \\SMB_SHARE... - Error - The process cannot access the file because it is being used by another process". This probably occurred because the 09:00 backup failed and left the SMB locks in place

I can imagine a local problem starting to occur at the same time I updated Unraid, but I believe that would be a leap, especially with two PCs starting to have the issue.

If the consensus is that it's not a problem with my Unraid configuration, then I'll just revert to 6.10.3 and run it for a month again. The probability that the issue occurs within a month is high if the cause is not my Unraid configuration.

Also, Acronis & Macrium don't seem to care about locked files. I mean, you can literally restore your computer from the image. I don't know what black magic is going on there.

Edited February 13, 2023 by Southweave: Improved explanation
Frank1940 Posted February 13, 2023

13 minutes ago, Southweave said: Also Acronis & Macrium don't seem to care about locked files. I mean you can literally restore your computer from the image. I don't know what black magic is going on there.

I am aware of this advertised capability in the creation of image files, and I suspect that they have written a low-level subroutine that allows them to read locked files, essentially bypassing Windows and working at the disk level.

Is this failure repeatable? A couple of thoughts here:
- Have you tried a second run after the first one fails?
- Have you tried making the same backup (when a failure occurs) to (say) a USB drive to see if that fails?

From your description, I am assuming that you are making an image of your system. I use Macrium to make an image of my Windows OS SSD drive every other month. My Documents folder/directory is on a hard drive inside the Windows machine. This Macrium image file is placed on that hard drive when it is created and later copied up to my Unraid server. (I tried to use Windows image software to do this years ago but ran into permission/ownership issues when copying it to the server. 🤕 ) I have never had a problem copying the Macrium image file to Unraid. (When I do this I am storing the images from six Windows computers: two desktops, one laptop, and three microcomputers that are used as home theater PCs.)
Southweave (Author) Posted February 13, 2023 (edited)

Quote: Is this failure repeatable? Have you tried second run after the first one fails?

Yes, if the backup fails you can repeat the failure by reattempting. I'd guess that the odds of the second backup failing are ~50%. The usual process to reattempt:
1. Check if the SMB lock is still active (smbstatus)
2. If yes, kill the process (kill {PID})
3. Restart the backup
4. Fails? Start again from #1

Quote: Have you tried making the same backup (when a failure occurs) to (say) a USB drive to see if that fails?

No, but that's a good suggestion. However, because the odds of the second backup also failing are ~50%, it would take a while to actually conclude that it works. For that reason I think it would be simpler for me to revert Unraid to the previous version. However, if reverting proves to be okay, I'm still not any closer to the actual solution.

EDIT: The backup has failed the whole day, and I just attempted to back up to another drive on the PC itself. It was successful.

Quote: From your description, I am assuming that you are making an image of your system.

Correct, first a full image backup and then differential image backups. Both types fail.

I'm also going to include the "testparm -v" global output; can you please verify this against yours or just check it over? It is entirely possible I've changed something in the past when attempting to get a 10Gb connection between Unraid and other systems.

Spoiler

# Global parameters
[global]
abort shutdown script =
add group script =
additional dns hostnames =
add machine script =
addport command =
addprinter command =
add share command =
add user script =
add user to group script =
afs token lifetime = 604800
afs username map =
aio max threads = 100
algorithmic rid base = 1000
allow dcerpc auth level connect = No
allow dns updates = secure only
allow insecure wide links = No
allow nt4 crypto = No
allow trusted domains = Yes
allow unsafe cluster upgrade = No
apply group policies = No
async dns timeout = 10
async smb echo handler = No
auth event notification = No
auto services =
binddns dir = /var/lib/samba/bind-dns
bind interfaces only = No
browse list = Yes
cache directory = /var/cache/samba
change notify = Yes
change share command =
check password script =
cldap port = 389
client ipc max protocol = default
client ipc min protocol = default
client ipc signing = default
client lanman auth = No
client ldap sasl wrapping = sign
client max protocol = default
client min protocol = SMB2_02
client NTLMv2 auth = Yes
client plaintext auth = No
client protection = default
client schannel = Yes
client signing = default
client smb encrypt = default
client smb3 encryption algorithms = AES-128-GCM, AES-128-CCM, AES-256-GCM, AES-256-CCM
client smb3 signing algorithms = AES-128-GMAC, AES-128-CMAC, HMAC-SHA256
client use kerberos = desired
client use spnego principal = No
client use spnego = Yes
cluster addresses =
clustering = No
config backend = file
config file =
create krb5 conf = Yes
ctdbd socket =
ctdb locktime warn threshold = 0
ctdb timeout = 0
cups connection timeout = 30
cups encrypt = No
cups server =
dcerpc endpoint servers = epmapper, wkssvc, rpcecho, samr, netlogon, lsarpc, drsuapi, dssetup, unixinfo, browser, eventlog6, backupkey, dnsserver
deadtime = 10080
debug class = No
debug encryption = No
debug hires timestamp = Yes
debug pid = No
debug prefix timestamp = No
debug syslog format = No
winbind debug traceid = No
debug uid = No
dedicated keytab file =
default service =
defer sharing violations = Yes
delete group script =
deleteprinter command =
delete share command =
delete user from group script =
delete user script =
dgram port = 138
disable netbios = Yes
disable spoolss = Yes
dns forwarder =
dns port = 53
dns proxy = Yes
dns update command = /usr/sbin/samba_dnsupdate
dns zone scavenging = No
dns zone transfer clients allow =
dns zone transfer clients deny =
domain logons = No
domain master = Auto
dos charset = CP850
dsdb event notification = No
dsdb group change notification = No
dsdb password event notification = No
enable asu support = No
enable core files = Yes
enable privileges = Yes
encrypt passwords = Yes
enhanced browsing = Yes
enumports command =
eventlog list =
get quota command =
getwd cache = Yes
gpo update command = /usr/sbin/samba-gpupdate
guest account = nobody
host msdfs = Yes
hostname lookups = No
idmap backend = tdb
idmap cache time = 604800
idmap gid =
idmap negative cache time = 120
idmap uid =
include system krb5 conf = Yes
init logon delay = 100
init logon delayed hosts =
interfaces =
iprint server =
kdc enable fast = Yes
keepalive = 300
kerberos encryption types = all
kerberos method = default
kernel change notify = Yes
kpasswd port = 464
krb5 port = 88
lanman auth = No
large readwrite = Yes
ldap admin dn =
ldap connection timeout = 2
ldap debug level = 0
ldap debug threshold = 10
ldap delete dn = No
ldap deref = auto
ldap follow referral = Auto
ldap group suffix =
ldap idmap suffix =
ldap machine suffix =
ldap max anonymous request size = 256000
ldap max authenticated request size = 16777216
ldap max search request size = 256000
ldap page size = 1000
ldap passwd sync = no
ldap replication sleep = 1000
ldap server require strong auth = Yes
ldap ssl = start tls
ldap suffix =
ldap timeout = 15
ldap user suffix =
lm announce = Auto
lm interval = 60
load printers = No
local master = Yes
lock directory = /var/cache/samba
lock spin time = 200
log file =
logging = 0
log level = 1
log nt token command =
logon drive =
logon home = \\%N\%U
logon path = \\%N\%U\profile
logon script =
log writeable files on exit = No
lpq cache time = 30
lsa over netlogon = No
machine password timeout = 604800
mangle prefix = 1
mangling method = hash2
map to guest = Bad User
max disk size = 0
max log size = 10000
max mux = 50
max open files = 40960
max smbd processes = 0
max stat cache size = 512
max ttl = 259200
max wins ttl = 518400
max xmit = 16644
mdns name = netbios
message command =
min domain uid = 1000
min receivefile size = 0
min wins ttl = 21600
mit kdc command =
multicast dns register = No
name cache timeout = 660
name resolve order = lmhosts wins host bcast
nbt client socket address = 0.0.0.0
nbt port = 137
ncalrpc dir = /var/run/samba/ncalrpc
netbios aliases =
netbios name = SERVER
netbios scope =
neutralize nt4 emulation = No
nmbd bind explicit broadcast = Yes
nsupdate command = /usr/bin/nsupdate -g
nt hash store = always
ntlm auth = ntlmv1-permitted
nt pipe support = Yes
ntp signd socket directory = /var/lib/samba/ntp_signd
nt status support = Yes
null passwords = Yes
obey pam restrictions = No
old password allowed period = 60
oplock break wait time = 0
os2 driver map =
os level = 20
pam password change = No
panic action =
passdb backend = smbpasswd
passdb expand explicit = No
passwd chat = *new*password* %n\n *new*password* %n\n *changed*
passwd chat debug = No
passwd chat timeout = 2
passwd program =
password hash gpg key ids =
password hash userPassword schemes =
password server = *
perfcount module =
pid directory = /var/run
preferred master = Auto
prefork backoff increment = 10
prefork children = 4
prefork maximum backoff = 120
preload modules =
printcap cache time = 750
printcap name = /dev/null
private dir = /var/lib/samba/private
raw NTLMv2 auth = No
read raw = Yes
realm =
registry shares = No
reject md5 clients = No
reject md5 servers = No
remote announce =
remote browse sync =
rename user script =
require strong key = Yes
reset on zero vc = No
restrict anonymous = 0
root directory =
rpc big endian = No
rpc server dynamic port range = 49152-65535
rpc server port = 0
rpc start on demand helpers = Yes
samba kcc command = /usr/sbin/samba_kcc
security = USER
server max protocol = SMB3
server min protocol = SMB2
server multi channel support = No
server role = auto
server schannel = Yes
server services = s3fs, rpc, nbt, wrepl, ldap, cldap, kdc, drepl, winbindd, ntp_signd, kcc, dnsupdate, dns
server signing = default
server smb3 encryption algorithms = AES-128-GCM, AES-128-CCM, AES-256-GCM, AES-256-CCM
server smb3 signing algorithms = AES-128-GMAC, AES-128-CMAC, HMAC-SHA256
server string = Unraid
set primary group script =
set quota command =
show add printer wizard = No
shutdown script =
smb1 unix extensions = No
smb2 disable lock sequence checking = No
smb2 disable oplock break retry = No
smb2 leases = Yes
smb2 max credits = 8192
smb2 max read = 8388608
smb2 max trans = 8388608
smb2 max write = 8388608
smbd profiling level = off
smb passwd file = /var/lib/samba/private/smbpasswd
smb ports = 445 139
socket options = TCP_NODELAY
spn update command = /usr/sbin/samba_spnupdate
stat cache = Yes
state directory = /var/lib/samba
svcctl list =
syslog = 0
syslog only = No
template homedir = /home/%D/%U
template shell = /bin/false
time server = No
timestamp logs = Yes
tls cafile = tls/ca.pem
tls certfile = tls/cert.pem
tls crlfile =
tls dh params file =
tls enabled = Yes
tls keyfile = tls/key.pem
tls priority = NORMAL:-VERS-SSL3.0
tls verify peer = as_strict_as_possible
unicode = Yes
unix charset = UTF-8
unix password sync = No
use mmap = Yes
username level = 0
username map =
username map cache time = 0
username map script =
usershare allow guests = No
usershare max shares = 0
usershare owner only = Yes
usershare path = /var/lib/samba/usershares
usershare prefix allow list =
usershare prefix deny list =
usershare template share =
utmp = No
utmp directory =
winbind cache time = 300
winbindd socket directory = /var/run/samba/winbindd
winbind enum groups = No
winbind enum users = No
winbind expand groups = 0
winbind max clients = 200
winbind max domain connections = 1
winbind nested groups = Yes
winbind normalize names = No
winbind nss info = template
winbind offline logon = No
winbind reconnect delay = 30
winbind refresh tickets = No
winbind request timeout = 60
winbind rpc only = No
winbind scan trusted domains = No
winbind sealed pipes = Yes
winbind separator = \
winbind use default domain = No
winbind use krb5 enterprise principals = Yes
wins hook =
wins proxy = No
wins server =
wins support = No
workgroup = WORKGROUP
write raw = Yes
wtmp directory =
fruit:nfs_aces = No
idmap config * : range = 3000-7999
idmap config * : backend = tdb
access based share enum = No
acl allow execute always = Yes
acl check permissions = Yes
acl flag inherited canonicalization = Yes
acl group control = No
acl map full control = Yes
administrative share = No
admin users =
afs share = No
aio read size = 0
aio write behind =
aio write size = 0
allocation roundup size = 0
available = Yes
blocking locks = Yes
block size = 1024
browseable = Yes
case sensitive = Auto
check parent directory delete on close = No
comment =
copy =
create mask = 0777
csc policy = manual
cups options =
default case = lower
default devmode = Yes
delete readonly = No
delete veto files = No
dfree cache time = 0
dfree command =
directory mask = 0777
directory name cache size = 100
dmapi support = No
dont descend =
dos filemode = No
dos filetime resolution = No
dos filetimes = Yes
durable handles = Yes
ea support = Yes
fake directory create times = No
fake oplocks = No
follow symlinks = Yes
smbd force process locks = No
force create mode = 0000
force directory mode = 0000
force group =
force printername = No
force unknown acl user = No
force user =
fstype = NTFS
guest ok = No
guest only = No
hide dot files = Yes
hide files =
hide new files timeout = 0
hide special files = No
hide unreadable = No
hide unwriteable files = No
honor change notify privilege = No
hosts allow =
hosts deny =
include = /etc/samba/smb-shares.conf
inherit acls = No
inherit owner = no
inherit permissions = No
invalid users = root
kernel oplocks = No
kernel share modes = No
level2 oplocks = Yes
locking = Yes
lppause command =
lpq command = lpq -P'%p'
lpresume command =
lprm command = lprm -P'%p' %j
magic output =
magic script =
mangled names = illegal
mangling char = ~
map acl inherit = No
map archive = No
map hidden = No
map readonly = yes
map system = No
max connections = 0
max print jobs = 1000
max reported print jobs = 0
min print space = 0
msdfs proxy =
msdfs root = No
msdfs shuffle referrals = No
nt acl support = Yes
ntvfs handler = unixuid, default
oplocks = Yes
path =
posix locking = Yes
postexec =
preexec =
preexec close = No
preserve case = Yes
printable = No
print command = lpr -r -P'%p' %s
printer name =
printing = bsd
printjob username = %U
print notify backchannel = No
queuepause command =
queueresume command =
read list =
read only = Yes
root postexec =
root preexec =
root preexec close = No
server smb encrypt = default
short preserve case = Yes
smbd async dosmode = No
smbd getinfo ask sharemode = Yes
smbd max async dosmode = 0
smbd max xattr size = 65536
smbd search ask sharemode = Yes
spotlight = No
spotlight backend = noindex
store dos attributes = Yes
strict allocate = No
strict locking = Auto
strict rename = No
strict sync = Yes
sync always = No
use client driver = No
use sendfile = Yes
valid users =
veto files =
veto oplock files =
vfs objects =
volume =
volume serial number = -1
wide links = Yes
write list =

Edited February 13, 2023 by Southweave: Added more context
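The first two reattempt steps in the post above (check smbstatus for a leftover lock, then find the PID to kill) can be sketched in Python. This is an illustration under assumptions, not a tool from the thread: it assumes the "Locked files" table printed by `smbstatus -L` starts each data row with a numeric PID, and the helper name `pids_locking` and the sample rows are hypothetical. It only reports PIDs; whether to kill them is left to the operator.

```python
def pids_locking(smbstatus_text, name_fragment):
    """Collect PIDs from lock-table rows that mention `name_fragment`."""
    pids = set()
    for line in smbstatus_text.splitlines():
        fields = line.split()
        # Assumption: data rows start with a numeric PID, while the
        # header and the dashed ruler do not.
        if fields and fields[0].isdigit() and name_fragment in line:
            pids.add(int(fields[0]))
    return sorted(pids)


if __name__ == "__main__":
    # Hypothetical output; on the server this text would be captured
    # from `smbstatus -L`.
    sample = (
        "Locked files:\n"
        "Pid    User(ID)  DenyMode   Access    R/W     Oplock  SharePath         Name       Time\n"
        "----------------------------------------------------------------------------------------\n"
        "4242   1000      DENY_NONE  0x12019f  RDWR    NONE    /mnt/user/backup  image.tmp  Mon Feb 13 10:44:01 2023\n"
        "5151   1000      DENY_NONE  0x100081  RDONLY  NONE    /mnt/user/media   movie.mkv  Mon Feb 13 09:00:00 2023\n"
    )
    print(pids_locking(sample, ".tmp"))
```

Matching on a substring of the whole row is deliberately crude, because file names can contain spaces and column positions vary with content.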
Frank1940 Posted February 13, 2023

I also believe there was an upgrade to the version of Samba recently, so that remains another variable in the equation. Ver 6.12 will use another, even newer version of Samba. (I know this because there was a posting about a Samba update to fix security issues and it was stated that a newer version was slated for Ver 6.12.)

I am attaching a copy of my output of testparm -v to this post. I leave the cross-checking to you...

I checked in the Unraid syslog for the server (version 6.11.5) where I store my backups, which is currently using version 4.17.3 of Samba. I am running Windows 10 PRO version 22H2 on all my Windows computers. I have build 19045.2486 on my primary desktop.

Here is the info on Macrium:

I wish you luck in resolving this. (I have been working/playing with computers since about 1982, starting with a Radio Shack Color Computer, and there have been many times when I found that it can be quicker to find a way around a problem rather than solving it--- coward's way out, I know... 😈 )

Testparm.txt
Southweave (Author) Posted February 13, 2023

36 minutes ago, Frank1940 said: I also believe there was an upgrade to the version of Samba recently, so that remains another variable in the equation. Ver 6.12 will use another, even newer version of Samba. (I know this because there was a posting about a Samba update to fix security issues and it was stated that a newer version was slated for Ver 6.12.) I am attaching a copy of my output of testparm -v to this post. I leave the cross-checking to you... I checked in the Unraid syslog for the server (version 6.11.5) where I store my backups, which is currently using version 4.17.3 of Samba. I am running Windows 10 PRO version 22H2 on all my Windows computers. I have build 19045.2486 on my primary desktop. Here is the info on Macrium: I wish you luck in resolving this. (I have been working/playing with computers since about 1982, starting with a Radio Shack Color Computer, and there have been many times when I found that it can be quicker to find a way around a problem rather than solving it--- coward's way out, I know... 😈 ) Testparm.txt

I compared our testparm results and there are only a few differences:
- You seem to have syslog on
- Your case sensitive is Yes while mine is Auto
- You show dot files
- Your max log size is 5 000 while mine is 10 000
- And some naming differences

Nothing seems substantial. It must be something else...

It does seem like finding a way around the problem would be quicker by now...

Thank you for your help!
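A comparison like the one above can be done mechanically instead of by eye. The following is a minimal Python sketch under stated assumptions: both inputs are saved `testparm -v` outputs, only the [global] section is compared, and the function names `parse_testparm` and `diff_settings` are hypothetical.

```python
def parse_testparm(text):
    """Parse the [global] section of `testparm -v` output into a dict."""
    settings = {}
    section = "global"  # assume global until another header appears
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        key, sep, value = line.partition("=")
        if sep and section == "global":
            # Samba parameter names ignore case; normalise spacing too.
            settings[" ".join(key.split()).lower()] = value.strip()
    return settings


def diff_settings(mine, theirs):
    """Map each differing key to a (mine, theirs) pair; None means absent."""
    keys = sorted(set(mine) | set(theirs))
    return {
        k: (mine.get(k), theirs.get(k))
        for k in keys
        if mine.get(k) != theirs.get(k)
    }


if __name__ == "__main__":
    # Two tiny excerpts using values mentioned in the post above.
    mine = parse_testparm("[global]\ncase sensitive = Auto\nmax log size = 10000\n")
    theirs = parse_testparm("[global]\ncase sensitive = Yes\nmax log size = 5000\n")
    for key, (a, b) in diff_settings(mine, theirs).items():
        print(f"{key}: mine={a!r} theirs={b!r}")
```

Keys that exist on only one side show up with None on the other, which would also surface the "naming differences" between Samba versions.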
Solution

Southweave (Author) Posted March 17, 2023

Follow-up on this topic. I changed a lot of settings ~1 month ago and I haven't had any issues since then.

Actions I took:
- Deleted the preclear plugin (I don't think this had any effect)
- Deleted the Tips & Tweaks plugin (I don't think this had any effect)
- Deleted the Nerds plugin (I don't think this had any effect; it wasn't supposed to be used anyway)
- In Dynamix File Integrity, set "Automatically protect new and modified files:" to disabled
- Allowed incremental (not only full and differential) backups with Macrium
- Allowed writing backups to the cache drive. This couldn't be done with Acronis because it writes everything into a single large file, while Macrium creates a new file for every backup

I believe the Dynamix File Integrity change or directing backups to the cache drive fixed the issue. But because I decided to go nuclear and try everything at once, I can't be sure.