Weird Issue With SMB (and maybe Cache?)

I have a weird issue that came out of the blue. I use Macrium Reflect to create disk images (backups and such) from machines, and I store them on an SMB share on my Unraid server. I restore these images frequently and have used this method for a couple of years without any issues, but suddenly the restore no longer works. Literally, I restored one image successfully, and another image 5 minutes later failed. The error log always says the same thing: it can't read the source file because of data corruption. Since then I have tried many times over many days, with many different images, machines, and installations, and the restore always fails.


I suspected that the files were corrupted, so I ran MD5 checks on multiple files that previously worked but now fail to restore, and all are intact on the server's disks.
I copied many images to local storage on a separate PC and tried to restore them from there. No luck; the restore fails with the same error.
Macrium Reflect has a built-in verifier that checks image files for corruption. Running it on the files on the server, and even on the local copies, fails (Verification Failed).
I created a new image file, saved it on local storage, then copied it to the same share on the server. I checked the MD5 of both files after the copy and they are the same. Restoring this image from local storage succeeds no matter how many times I try. Restoring it from the server succeeds the first time, but the second time it fails with the same error. At this point I was monitoring the server's network output speed and disk usage, and the network input speed of the machine restoring the image. During the first (successful) restore, all three showed activity: disk usage (reading the image file from disk) and network output (sending it to the client) on the server side, and network input (restoring data) on the machine being restored. The second time, however, only network activity was present on both systems. There was no disk read activity on the server, as if the server already knew what was there and simply sent it.
I have a 512GB Samsung cache SSD in the server. The previous observation prompted me to check this drive, because how else would the server know what data to send to a client without reading it from disk? The drive is in perfect condition: no bad clusters or sectors, nothing. (I have to mention that I have never observed any disk activity on the cache drive.) Just to be sure, I also checked all the other disks for errors. Although cache is not enabled on the share where the images are stored, I even removed the cache drive from the system and redid all my previous tests. The results are the same; the issue persists. It seems that even without any cache drive, there is still some kind of cache on a share that has cache disabled. Weird.
This prompted me to check the system memory. There are multiple VMs running at all times without any issues, using around 75% of the system RAM, but I have seen weirder things... I ran memtest and the RAM is fine.
I also tried copying files from the server via SSH. This copies the file correctly (the MD5 is the same), but through SMB it is corrupt. So basically I suspect something is wrong with SMB, but I have no idea what it could be. I have searched for "SMB Cache Unraid disable" and every combination of those terms I can think of, but all that comes up are posts that treat the cache as a write mechanism for writing to the share, and that is not this.
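For anyone who wants to narrow down a mismatch like the SSH-versus-SMB one described above, here is a minimal Python sketch. It is only an illustration, not something from the original tests: the function names `file_md5` and `first_mismatch` and the 4 MiB chunk size are assumptions. It hashes a file in chunks and, given two copies of the same image (for example one fetched over SSH and one read from the mounted SMB share), reports the offset of the first chunk that differs.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary choice for this sketch


def file_md5(path, chunk_size=CHUNK):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        while True:
            block = fh.read(chunk_size)
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()


def first_mismatch(path_a, path_b, chunk_size=CHUNK):
    """Compare two files chunk by chunk.

    Returns the byte offset at which the first differing chunk starts,
    or None if both files are identical (including in length).
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return offset
            if not a:  # both reads are empty: end of both files
                return None
            offset += len(a)
```

Calling `first_mismatch` with the path of the SSH copy and the path of the same file on the SMB share would show whether the difference starts at a consistent offset, which might help tell a stale cached read apart from random corruption.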

One more interesting thing I have noticed: if I stop the array and then start it again, the restore works for about half an hour every time. After that the issue comes back and the same symptoms appear again. I have no extra start scripts and nothing special installed that would explain this.
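The "works for about half an hour, then fails" pattern could be timed more precisely with a small watcher that re-reads the same file over SMB at intervals and notes when its hash first stops matching a known-good value. The sketch below is hypothetical: the names `md5_of` and `watch_reads`, the parameters, and the polling approach are assumptions for illustration, not part of the original troubleshooting. Client-side caching may also affect what such repeated reads actually exercise.

```python
import hashlib
import time


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def watch_reads(path, expected_md5, attempts, interval_s,
                sleep=time.sleep, clock=time.monotonic):
    """Re-read `path` up to `attempts` times, pausing `interval_s` between reads.

    Returns (attempt_number, elapsed_seconds) for the first read whose MD5
    differs from `expected_md5`, or None if every read matched.
    """
    start = clock()
    for attempt in range(1, attempts + 1):
        if md5_of(path) != expected_md5:
            return attempt, clock() - start
        if attempt < attempts:
            sleep(interval_s)
    return None
```

Started right after an array restart, with `expected_md5` taken from the copy verified on the server itself, this would record roughly how long reads over the share stay correct.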


With this post, my hope is that someone more knowledgeable than me will have some idea of what I should test or do to fix this issue that came out of nowhere. As I said in the beginning, I had no such issue until a couple of days ago. The only change I have made to the Unraid install since then is upgrading it to the current newest version.
Sorry that this is such a long one, but I have tried to include every detail I have collected and observed over the past couple of days, in the hope that someone can suggest something. Every idea is appreciated.
Thank you.

Post diagnostics.

  • Author

Hi,

Thank you for posting; see attached. I tried it over the weekend, but it never finished. The diagnostic script went through some files in /mnt/cache that are not even there. I know they are not there because the cache drive is currently empty. This morning I tried once again and it hung, but I found the finished file in /boot/logs, so I could post it here.

Thanks for any suggestion.

beast-diagnostics-20231119-1412.zip

  • 3 weeks later...
  • Author

I have not replied to this in a while, but the issue has been resolved. The problem was, as it seems, nothing.

A few days after I posted this, I was still trying things and experimenting. Finally I pushed the "restart everything" button. Until then I had already restarted things several times, and it never fixed it. But this time I did a complete network restart, and I mean totally complete: everything from routers to switches to servers and everything else on the network, including even the modem from the ISP. The problem went away and everything is now working perfectly. I have to reiterate that I had previously restarted all of these things one by one, and even several at a time, and it never fixed it, so I have no idea where the problem was. But I am very happy to finally be rid of it.

Thanks to anyone who even read the post.

 
