
Windows Explorer + applications getting stuck waiting for unRAID


madshi


Hi there,

 

I've always had this problem, but today, after upgrading from v5 to v6, it's gotten worse. E.g. I've just unRARed a big (50GB) archive, using the cache drive as both source and destination path. Read and write speeds seem to be OK, but Windows Explorer (and all applications trying to access unRAID) very often get stuck, getting no reply for multiple seconds. It's a very annoying problem and happens all the time. Windows 8.1 x64.

 

(This is with all drives spun up.)

 

Here's the ZIP from the v6 Diagnostics tool:

 

http://madshi.net/towerDiagnostics.zip

 

Is there anything suspicious you experts can see?

 

FWIW, in this situation with v5 the unRAID web interface usually got stuck for the same time, too. This doesn't seem to happen with v6 anymore, so that's at least some progress. Also, I had rather bad write speeds with v5, which seem to be much better with v6. So overall it's a nice improvement for me. But this "getting stuck" problem is really getting on my nerves, so getting that solved would be awesome!

Link to comment

I don't know the specific reason it's getting stuck, but I do see a few things -

 

* Syslog says the Cache drive is full, and it's currently showing only 40GB free space.  Obviously that's a problem when you're trying to copy 50GB to it.  See if you can clear some room on it.  You have a Minimum space (Min. free space) set on it that should be good, though you might make it a little larger, to more quickly block saving of files larger than that.  It's a 1.5TB drive.  I noticed that ALL of your drives are rather full!

 

* The fact that the Cache drive is full, means writes to shares will have to go directly to the shares, updating parity, which makes them slower, and causes heavier overall system utilization, which *could* be a factor in slowing down (possibly pausing) other operations.

 

* There's an issue with the Mover, an odd error I don't understand.  Check the configuration for the Mover, and its scheduling.  There's no schedule at all, right now.  If you don't see anything wrong, try redoing it, just to see if it will save something that disables the error.  But it's possible that the error is completely harmless, and irrelevant.

 

* I also noticed that you are still running in IDE emulation mode for all of your motherboard SATA ports.  I'm sure I must have seen that in the past when you requested help, so I apologize if I forgot to mention changing that.  You need to go into the BIOS settings, look for the onboard SATA settings, and change it from IDE emulation to a native SATA mode, preferably AHCI.  This may provide slightly better performance, and a little better safety, since IDE mode ties the drives in pairs.  If one goes down, the other on its channel may also go down.  It won't do that in AHCI mode.  I can't say whether you will be able to see the speed improvement or not, in AHCI mode.

Link to comment

Thanks for your reply!

 

I went directly to the cache drive, bypassing the user shares. The cache drive is full now after the unrar, that's correct, but there was still enough space left to do the unraring. Parity etc. should not have been involved at all, since I read from and wrote directly to the tower/cache share.

 

Also I can see that writing on the server itself is not a problem, because nzbget in the background is working nicely all the time. So to my eyes it looks like a network or samba problem.

 

The mover scheduler is intentionally empty because I prefer to initiate the moving manually.

 

Good tip about the IDE mode, will look into that, but I can't imagine that it could have to do with the problem? Because the issue appears to be limited to accessing the server from external PCs via network somehow...

 

Just to make sure my description is clear: When unrar is running on my Windows PC, it appears to be working fine for some seconds, then suddenly it doesn't respond for several seconds. If I then click on the unrar window, the title bar reads "not responding", then after a couple of seconds (can be 30 seconds, or shorter or longer), it's alive again, only to get stuck again a couple of seconds later...

Link to comment

Here's another big reason: the Reiser file system has trouble when it's close to full, and will have long pauses when you are writing to the last 5% to 10% of the drive.  I would either keep at least 200GB free (or more for best performance), or switch to XFS for that drive.  We don't have a lot of experience yet with filling up an XFS drive, but so far reports seem to be better.  Of course, a large SSD formatted with XFS would be even better!

Link to comment

Ok, thanks, Reiser having trouble when it's close to full might be an explanation. Will check if emptying the cache drive helps.

 

Was planning to replace the cache drive with a big fast SSD soon, maybe together with a 10Gbps network card...  ;D

Link to comment

Unfortunately, that wasn't it.

 

Cache drive is now only 30% full, but still the same freezes occur. I've just started another unRAR from/to cache share. It unrars for about 30 seconds, then it gets stuck for about 30 seconds, then it unrars for about 30 seconds, then it gets stuck again. It's not always 30 seconds, sometimes a bit less, sometimes a bit more. But that's roughly the pattern...  :'(

 

Any ideas what I could try?

Link to comment

Ok, I'll take your advice to switch to xfs the next time the disk is completely empty. But the freezes I'm currently having are with a reiserfs disk that is only 30% full. So that can't be the problem, right? Or can the disk having been full multiple times already have negative follow-up effects even now, although I already (mostly) emptied the disk?

Link to comment
can the disk having been full multiple times already have negative follow-up effects even now, although I already (mostly) emptied the disk?

Yes. I've seen marked performance improvements by reformatting, even if you reformat to reiserfs. I've yet to see just how resilient xfs is to the same treatment, but I suspect over time we will see the same thing, where a well used filesystem can be sped up by formatting to provide a fresh journal area.
Link to comment
  • 3 weeks later...

Ok, I'm on 6.1.2 now, have reformatted the drive to xfs and no real improvement.

 

WinRAR still freezes, regularly, every 30 seconds, for about 15 seconds, when trying to unRAR something (source and destination of unRAR operation both cache share).

 

I've done some more tests. If I use the Windows Explorer to copy 40GB from the cache share to my local hard drive, I get about 30-70 MB/s read speed, without any freezes. If I use the Windows Explorer to copy 40GB to the cache share from my local hard drive, I get about 40-70 MB/s write speed, without any freezes. See image attachments for those two tests. However, if I use the Windows Explorer to copy 40GB from cache share to cache share, I get 70 MB/s copy speed, for about 10 seconds, then it freezes for about 20 seconds. The Explorer copy window is cleverer than WinRAR here. WinRAR becomes totally unresponsive. The Explorer copy window at least stays responsive, but the speed indicator goes to 0 KB/s for those 20 seconds freeze time.

 

So there's a clear pattern: Just reading works well. Just writing works well. But reading from and writing to the cache share at the same time causes Samba to get totally stuck very regularly. Since it's such a regular pattern and 100% reproducible, there must be some logical explanation?

 

Any ideas how I could get to the bottom of this?
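
One way to pin down the stall pattern independently of WinRAR and the Explorer would be to copy a file chunk by chunk and time every chunk. The following is only a rough sketch, not something taken from unRAID or Samba: the function name `copy_and_log_stalls`, the 1 MiB chunk size, and the 2 second stall threshold are all made up for illustration. Note that the OS may buffer writes, so a stall can show up a few chunks after the server actually stopped responding.

```python
import sys
import time


def copy_and_log_stalls(src, dst, chunk_size=1024 * 1024, stall_s=2.0):
    """Copy src to dst in chunks and record every chunk that took too long.

    Returns (bytes_copied, stalls), where stalls is a list of
    (offset_in_bytes, seconds_taken) for each chunk whose read + write
    took at least stall_s seconds.
    """
    stalls = []
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            start = time.monotonic()
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            elapsed = time.monotonic() - start
            if elapsed >= stall_s:
                stalls.append((copied, elapsed))
            copied += len(chunk)
    return copied, stalls


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python stalls.py <source file> <destination file>
    total, found = copy_and_log_stalls(sys.argv[1], sys.argv[2])
    print(f"copied {total} bytes, {len(found)} stalled chunk(s)")
    for offset, seconds in found:
        print(f"  offset {offset}: {seconds:.1f} s")
```

Running it once with source and destination both on the cache share, and once with only one side on the share, should show whether the stalls really appear only in the share-to-share case. The `chunk_size` parameter also makes it easy to check whether the size of the copy buffer changes anything.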

[Attachment: readPerformance.png]

[Attachment: writePerformance.png]

Link to comment

It sounds like different buffering strategies are being applied.  It's probably always reading at 70, but the buffer size used can make a big difference in how smooth it looks, compared to the speed polling interval.  Copying involves reading enough to fill a specific buffer size then transferring it, then waiting while you refill the buffer and repeating.  If you use a small buffer size, then it's quick to fill and only a small wait, so your speed looks like 70, 0, 70, 0,... in very quick intervals.  If that cycle is somewhat shorter than your speed check interval, it appears to be relatively smooth, with an average speed about half of the max speed.  If your buffer is very large, with a smaller speed check interval, then you see nothing but 70, then nothing but 0, etc, and you worry about timeouts kicking in.  Larger buffers are usually slightly faster, with less overhead, but can be too large for the circumstances.  In this case, you want smaller buffers.
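
To make the buffer-versus-polling idea concrete, here is a toy simulation. It is purely illustrative and assumes the simplified model described above: the copier alternates between refilling a buffer (nothing is transferred) and transferring it at a fixed rate, and both phases take equally long. The function name `sampled_speeds`, the 1 second polling interval, and the two buffer sizes are hypothetical values picked for the example.

```python
def sampled_speeds(buffer_mb, rate_mb_s=70.0, poll_ms=1000, duration_ms=8000):
    """Return the speed (MB/s) a progress window would show per polling interval.

    The simulated copier alternates between a refill phase (no transfer)
    and a transfer phase (rate_mb_s), each lasting buffer_mb / rate_mb_s.
    """
    # Length of one phase in whole milliseconds (at least 1 ms).
    phase_ms = max(1, round(buffer_mb / rate_mb_s * 1000))
    samples = []
    busy_ms = 0  # milliseconds spent transferring in the current interval
    for t in range(duration_ms):
        # Even-numbered phases refill the buffer, odd-numbered ones transfer.
        if (t // phase_ms) % 2 == 1:
            busy_ms += 1
        if (t + 1) % poll_ms == 0:
            samples.append(rate_mb_s * busy_ms / poll_ms)
            busy_ms = 0
    return samples


if __name__ == "__main__":
    print("small buffer (0.7 MB):", sampled_speeds(0.7))
    print("large buffer (280 MB):", sampled_speeds(280))
```

In this model the small buffer shows a steady half-rate figure in every interval, while the large buffer shows several intervals of 0 followed by several at full speed.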

 

This by itself doesn't explain the cache to cache problem though.  But often different copying and buffering strategies are applied by what the copying agent thinks provides better performance.  It must believe a larger buffer is advantageous with a copy between 2 remote locations, same actual volume.  I wonder if you can trick it by mapping a local drive to one path, and copying to it.  Even if it figures out that it's remote, it still may not apply the same strategy.

 

Some tools can be configured as to what buffer sizes to use for different types of copies.  My Total Commander allows that.  I wonder if winrar has that capability.  You want to try configuring it to use smaller buffers.

Link to comment

This would make sense if we were talking about micro stutters. But there's no data transfer at all for 15 consecutive seconds! That's an eternity in the computer world.

 

I can't imagine that this is caused by the client side. Furthermore, the same problem occurs both when using WinRAR *and* the Windows Explorer. So two totally different applications, same problem (although the lengths and intervals of the freezes differ somewhat).

 

At the moment when the freeze occurs, I also can't refresh an Explorer window that shows the contents of the cache drive. If I press F5 to refresh it, the whole window gets stuck for those 15 seconds, too. It's pretty clear that Samba/unRAID doesn't reply at all during those 15 seconds. So application B gets stuck reading from the cache share when application A does a copy operation. This very much looks like a server side issue to me. But I've no idea how to further debug it...

 

(Neither WinRAR nor the Windows Explorer allow me to specify any buffer sizes.)

Link to comment

I'm not 100% sure, to be honest. I saw such freezing problems once in a while with unRAID 5.x, too, but I had always thought it was due to sleeping disks needing time to spin up. I've never carefully analyzed the situation until now.

 

And yes, I would suspect Samba, too. Or a weird SATA/cache/buffer issue. Sadly, the syslog doesn't seem to say anything. Is there any way to make Samba write detailed logs?
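
For reference, Samba's verbosity is controlled by the `log level` parameter in the `[global]` section of smb.conf. The fragment below is only a sketch: it assumes that unRAID 6 picks up extra Samba settings from `/boot/config/smb-extra.conf`, and the level (3) and size limit are example values, not recommendations from this thread. Where the log ends up depends on the existing `log file` setting.

```ini
# Assumed location on unRAID 6: /boot/config/smb-extra.conf
[global]
    # Higher values produce more detail; 3 is already quite verbose.
    log level = 3
    # Maximum log size in KB before Samba rotates the file.
    max log size = 5000
```

The level can also be raised on a running server with `smbcontrol smbd debug 3`, which avoids a Samba restart and can be set back afterwards the same way.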

Link to comment

Archived

This topic is now archived and is closed to further replies.
