Issue writing huge files in rc16c


Recommended Posts

I do have some problems with this new release, sorry.

 

Something I had in the past has come back with 16c. It has to do with writing and allocating huge files. These files are 30GB/40GB/50GB in size, and space is pre-allocated for them by ImgBurn (a Windows application). With 16c the pre-allocation gets interrupted and an 'unknown network error' is shown on my Windows 8 machines.

 

There's nothing in the syslog.

 

I know this is very vague, and I don't have any knowledge of the parameters that changed in 16c, but it looks to me as if some timeouts fire earlier than with 16b.

 

Link to comment

This might be an issue...  Perhaps a side effect of 16c, or a side effect of the free space available on the user's server.

I do have some problems with this new release, sorry.

 

Something I had in the past has come back with 16c. It has to do with writing and allocating huge files. These files are 30GB/40GB/50GB in size, and space is pre-allocated for them by ImgBurn (a Windows application). With 16c the pre-allocation gets interrupted and an 'unknown network error' is shown on my Windows 8 machines.

 

There's nothing in the syslog.

 

I know this is very vague, and I don't have any knowledge of the parameters that changed in 16c, but it looks to me as if some timeouts fire earlier than with 16b.

Link to comment

I do have some problems with this new release, sorry.

 

Something I had in the past has come back with 16c. It has to do with writing and allocating huge files. These files are 30GB/40GB/50GB in size, and space is pre-allocated for them by ImgBurn (a Windows application). With 16c the pre-allocation gets interrupted and an 'unknown network error' is shown on my Windows 8 machines.

 

There's nothing in the syslog.

 

I know this is very vague, and I don't have any knowledge of the parameters that changed in 16c, but it looks to me as if some timeouts fire earlier than with 16b.

Can you post the output of:

df

and also describe which user share you are writing to and how that share is configured (which allocation method, and how the min free space is set for that share).

If you are using a "disk" share, let's just see whether it has the space needed, since the allocation method would not apply.

 

Also, please post the output of:

ulimit -a

Link to comment

I used to have this kind of a problem on 'very full' file systems on slower 5400 RPM drives.

 

 

When I watched the machines, no writes were occurring on the Windows station; however, the output drive was thrashing, with no activity on the parity drive.

I'm guessing it was a reiserfs superblock/free-space search or something. Once space was found, either the writes started or the timeout had already occurred on the Windows Samba client.

After the timeout, I would attempt it again, and it would work right away the second time.

While this doesn't pinpoint the problem, it can possibly serve as a test to see whether it's the same issue, i.e. an initial reiserfs allocation timeout.

Link to comment

I used to see this error quite a lot. If it is the same error I used to see, the following will be true: you don't have a cache drive, and the folder you are writing the files to contains more than 40 files. What I found is that the allocation of the file happens in the background; it is just too slow and does not allocate the whole space before Windows times out.

 

http://lime-technology.com/forum/index.php?topic=20757.15

 

If you create a new, empty folder and then try the copy again into that folder (which contains no files), it should copy fine if it is the same error.
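
If you want to check whether you are in that situation before starting the copy, a quick look from the server console is enough. This is only a sketch; the share and folder names below are placeholders for whatever you actually write to:

ls /mnt/user/MyShare/Images | wc -l     # number of entries in the usual target folder
mkdir /mnt/user/MyShare/EmptyTest       # fresh, empty folder to point the copy at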

 

I now have a cache drive and never see the problem, so I have no idea whether it got worse or better with any of the rc releases.

Link to comment

I had this sort of problem when trying to set up large TrueCrypt volumes on a disk via Samba (i.e. hundreds of gigabytes in size).

 

That would have been way back in 4.x though. So if this is a similar thing (and it sounds like it) then it's probably been around for a while.

 

I can't remember how I worked around it. I think I just kept trying until it eventually managed to do it. Not much use for diagnostics.

 

Link to comment

Just to add details.

 

In my experience (only mine):

It ALWAYS happened on 5400 RPM hard drives that were nearly full with a huge number of files, no matter how much RAM I had in the system.

1GB, 2GB, 4GB or 8GB.

The server always showed a huge amount of activity on the drive, as if it were searching for something before the data could be written.

Even if I ran a find across the whole drive before writing to it, the Samba write timeout problem still happened. I knew no data was moving across the network because I was using TeraCopy.

 

Link to comment

Thanks for your answers. All drives are pretty full, but I did not see these problems with 16b. The share in question spans 12 of the 14 data drives. allocation_method=highwater with min_free_space=0 and split_level=999.

 

It's no show-stopper for me, as I can pre-produce the huge files on the local machines and copy them over to the tower --> this always works. Sometimes writing to an individual disk instead of the share helps, but not always. I don't have a cache drive.

 

If it happens next time, I will try the empty-folder trick and report back.

 

ulimit -a

 

root@Tower2:~# ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31851
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 40960
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31851
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

 

df

 

root@Tower2:~# df
Filesystem           1K-blocks      Used Available Use% Mounted on
tmpfs                   131072       188    130884   1% /var/log
/dev/sda                990960     40544    950416   5% /boot
/dev/md1             2930177100 2848898612  81278488  98% /mnt/disk1
/dev/md2             2930177100 2876879764  53297336  99% /mnt/disk2
/dev/md3             2930177100 2857527848  72649252  98% /mnt/disk3
/dev/md4             2930177100 2847528452  82648648  98% /mnt/disk4
/dev/md5             2930177100 2850985712  79191388  98% /mnt/disk5
/dev/md6             2930177100 2848203856  81973244  98% /mnt/disk6
/dev/md7             2930177100 2854532148  75644952  98% /mnt/disk7
/dev/md8             1465093832 1385516916  79576916  95% /mnt/disk8
/dev/md9             1465093832 1375018804  90075028  94% /mnt/disk9
/dev/md10            1465093832 1388977328  76116504  95% /mnt/disk10
/dev/md11            1465093832 1311724844 153368988  90% /mnt/disk11
/dev/md12            2930177100 2472392152 457784948  85% /mnt/disk12
/dev/md13            2930177100 2761381736 168795364  95% /mnt/disk13
/dev/md14            2930177100 2758653556 171523544  95% /mnt/disk14
shfs                 35162146328 33438221864 1723924464  96% /mnt/user

 

Thanks

 

Link to comment

Thanks for your answers. All drives are pretty full, but I did not see these problems with 16b. The share in question spans 12 of the 14 data drives. allocation_method=highwater with min_free_space=0 and split_level=999.

 

Thanks

 

Did you read that min_free_space SHOULD BE equal to approximately twice the size of the largest file to be written to the share?  If that is not the source of your problem, it soon will be!
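
For example, with 40GB/50GB files that works out to roughly 100GB of minimum free space. Assuming the share's min free space field is entered in KB (check the help text next to the field in your release), that would be roughly:

echo $((2 * 50 * 1000 * 1000))    # 100000000 KB, i.e. about 100GB of minimum free space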

Link to comment

Thanks for your answers. All drives are pretty full,

That is entirely the issue. You need a bigger server, man; I can recommend one...

 

but I did not see these problems with 16b. The share in question spans 12 of the 14 data drives. allocation_method=highwater with min_free_space=0

Coincidence.  You were probably only slightly less than nearly full then.

 

and split_level=999.

Leaving the 'split level' field blank accomplishes the same thing as setting it to a very high number.

Link to comment

That is entirely the issue.  You need a bigger server man, I can recommend one..

 

Does this mean I have 2TB free on a server and can't write 40GB to it?

 

And does it mean I have at least 80GB free on a single drive and can't write 40GB to it?

 

Time for a new filesystem instead of new hardware ;-)

 

Link to comment

Does this mean I have 2TB free on a server and can't write 40GB to it?

 

And does it mean I have at least 80GB free on a single drive and can't write 40GB to it?

It means you are nearly 98% full. You might squeeze a little more onto disk11 and disk12.

 

Time for a new filesystem instead of new hardware ;-)

Yes, it turns out reiserfs is notorious for slowing way down the closer it gets to 100% full. But you will find that most other file systems do as well. Sorry, there's not a lot I can do about it right now.

Link to comment

I haven't tested this particular issue for a while since I installed a cache drive, but I saw it when my disks were only 50% full, and it also still happened when the drive being written to was empty. I might disable the cache drive on the share and see what happens.

Link to comment

 

Can you try making a new folder before you start the 40GB copy?

 

I can say from experience that I have seen the same issues. If you have the time or ability, try the following.

Do the copy while you are near the machine and it is idle and freshly booted.

 

Start your copy. See if the first allocation of the file only utilizes the drive in question.

It should be seeking furiously. Either the copy will time out, or you will see it go very slowly utilizing the data and parity drive.

 

In my case, I was using disk shares and I would see this behavior. I've never used the user shares so I cannot comment there.
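
For anyone who wants to watch this from the server console rather than from the drive lights, something along these lines should work. This is a rough sketch; md1 and md12 are just examples taken from the df output above, so substitute whichever devices you are interested in:

watch -n 1 'grep -w -e md1 -e md12 /proc/diskstats'   # the read/write counters should climb while the client still appears idle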

Link to comment

I haven't tested this particular issue for a while since I installed a cache drive, but I saw it when my disks were only 50% full, and it also still happened when the drive being written to was empty. I might disable the cache drive on the share and see what happens.

 

 

To a 5400 RPM drive or a 7200 RPM drive?

I noticed the behavior was more prominent on the slower drives near high capacity.

 

I wonder if there's a Samba timeout value that can be adjusted?
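
If anyone wants to dig into that, a possible starting point is to dump the effective Samba configuration on the server and look at the allocation-related options. A sketch only; testparm ships with Samba, and 'strict allocate' is the smb.conf parameter that controls whether Samba physically allocates the full file size when a client requests it:

testparm -sv 2>/dev/null | grep -i -e 'strict allocate' -e 'deadtime'

The hard timeout itself seems to come from the Windows client rather than from Samba, though, so server-side tuning can probably only make the allocation finish sooner, not extend the deadline.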

Link to comment

The empty-folder trick works for me. Incredible.

 

What I did is:

 

I tried to create a big file in one of my shared folders (287 files in the directory; the directory itself is a direct child of the shared folder, with no additional subdirectories below). This was aborted with an unknown network error, twice. I then created an empty folder 'Temp' in the root of this shared folder, and I could create the big file there successfully.

 

I could confirm that twice today.

 

 

P.S.: Most of my drives are 7200rpm.

 

Link to comment

This is an interesting issue ... but I don't think it's strictly due to copying large files.  Just to confirm, I just copied two very large image files to my RC16c server ... one was 99GB, the other 65GB ... and they're both nested several directories into the share [//Tower/Backups/Images of my systems/PCName/Date of Image/<file goes here>]

 

Worked perfectly ... and I just ran a validation to confirm they were indeed good copies.

 

I noted that you can also do the same -- you indicated that if you created the large files on another PC, they would copy fine.

 

So ... the question is just what is ImgBurn doing that doesn't "play right" with UnRAID.

 

Just for grins, when I get a chance I'll create an image file directly to the UnRAID server (using Image for Windows)  and see if it also works okay.

 

However ... I think it's very likely that your issue is simply that your server is so full that the initial file write by ImgBurn is written okay on one drive, but then it attempts to expand that file and the drive is too full for that.  I don't believe UnRAID will span files across drives (Tom, WeeboTech, etc. can confirm that).  That's why if you copy the large file by itself it works ... because the size is "known" when the copy starts, so it's allocated on a drive that has enough room.

 

A simple test of that possibility:  Choose a drive that has plenty of extra space;  and then create an image to a folder on that drive => nest the folder as deep as you typically would, as I don't think this has anything to do with the depth of the folder.
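
A server-side variation of the same test, for comparison, is to write a large file straight to one disk from the console and time it. This is only a rough sketch and has nothing to do with ImgBurn; /mnt/disk12/scratch is just an example path on a disk with plenty of free space:

mkdir -p /mnt/disk12/scratch
time dd if=/dev/zero of=/mnt/disk12/scratch/test40g.img bs=1M count=40960   # write ~40GB of zeros
rm /mnt/disk12/scratch/test40g.img                                          # clean up afterwards

If that completes at a sane speed while the ImgBurn pre-allocation still fails, the slowdown is more likely in the allocation path than in raw write speed.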

 

As for speed ... writes to very full drives definitely slow WAY down => in my experience it really doesn't matter if they're 5400, 5900, or 7200 rpm drives  (Although obviously the faster drive, which also has notably faster seek times, will finish the "where to write" process quicker).

 

Link to comment
I think it's very likely that your issue is simply that your server is so full that the initial file write by ImgBurn is written okay on one drive, but then it attempts to expand that file and the drive is too full for that.

 

I don't think that this is the reason. There's no drive expansion during the copy, because there's enough room on the drive that unRAID selected for that particular file. And I can recreate the issue even when writing to a single disk instead of a shared folder.

 

 

But I'm still impressed. This morning I had to create a lot of huge images on my two LimeTech towers, and when using an empty folder it always works. When using my usual folders it always fails (100% OK vs. 100% fail on huge files).

 

 

What does ImgBurn do? There's an option to preallocate files, and it is turned on by default. I will switch that option off and see what happens. Here's a quote from their manual:

 

Allocate Files On Creation: Files created in 'Read/Build Mode' will be preallocated. This cuts down on fragmentation.

 

 

I know that this issue has to do with nearly full arrays. But I want to fill my array close to its maximum and not waste 2TB per machine (4TB in total) just because of the nature of unRAID/ReiserFS.

 

Link to comment

FWIW I just finished doing an image directly to the server from Image for Windows on one of my other machines.  54GB file ... sent directly to a well-nested folder, and it worked perfectly.

 

However, none of my drives are close to full on my backup server (at least 1TB free on all of them).

 

My Media server is a different story ... several of those drives are VERY full [in a couple cases only 20MB or so of free space].  But I wrote directly to those drives to fill the last bit up ... and they're all static data (DVDs), so once they were full, they're never modified again.    The last few writes to the drives were quite slow.

 

One other thought:  Have you had this problem with Windows 7?

 

Link to comment

One other thought:  Have you had this problem with Windows 7?

 

Yes. This is not the first time my unRAID machine has become full, and it is something that bit me in the past as well.

 

 

I'm a little bit further now: With this ImgBurn option "Settings/General/Page 1/AllocateFilesOnCreation=off" everything works. Do I need to worry about fragmentation now ;-)

 

Link to comment

With this ImgBurn option "Settings/General/Page 1/AllocateFilesOnCreation=off" everything works. Do I need to worry about fragmentation now ;-)

 

Good ... glad it's working.  No, I wouldn't worry about fragmentation.  My understanding is that Reiser does a much better job of minimizing this ... I simply don't think it's an issue with UnRAID.
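
If you ever want to spot-check it, fragmentation can be inspected from the server console. A sketch; filefrag comes from e2fsprogs and may or may not be present on your unRAID build, and the path below is only an example:

filefrag /mnt/disk5/Images/example_image.iso   # reports how many extents the file occupies

A 40GB file split across a handful of extents is nothing to worry about for this kind of use.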

 

Link to comment

I'm a little bit further now: With this ImgBurn option "Settings/General/Page 1/AllocateFilesOnCreation=off" everything works. Do I need to worry about fragmentation now ;-)

 

Only if you have multiple writes from multiple processes simultaneously.

 

What I've noticed in the past: drives that have not had any new files allocated for a while (not sure what the time frame is) will show this initial allocation delay if they are nearly full, hold a lot of files, and are slower drives.

 

Once you make the new 'initial' directory and/or file, all the necessary file system structures are in RAM, so any network access that allocates a new file can do so at a faster speed.

 

This is why I asked if you could observe the lights on your drives.

 

It's been a known issue that nearly full drives get slower at allocating new files and/or directories, at least with reiserfs. I've not experienced it the same way with ext3 or the Veritas file system.

With reiserfs, once that new directory or file is opened and ready for writing, the next operations go faster.

 

Over time the filesystem structures get flushed out (or you reboot) and the process occurs again.

This was a major reason for me to resort to a cache drive. I could write at optimal speeds then move the files manually in place via rsync.
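
For anyone curious what that looks like in practice, a minimal sketch (the share name and target disk are placeholders; adjust to your own layout, and note this bypasses the normal unRAID mover):

rsync -av --remove-source-files /mnt/cache/MyShare/ /mnt/disk3/MyShare/   # copy to the array disk, then delete the transferred source files from the cache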

Link to comment

This was a major reason for me to resort to a cache drive. I could write at optimal speeds then move the files manually in place via rsync.

 

r.e. "move the files manually in place via rsync" ==>  Is this the UnRAID "mover" ... or do you do this independently?    ... and if the latter, Why?

 

Link to comment