unRAID Server Release 5.0-beta9 Available



Syslog is attached.

Did you capture this syslog after running some writes that were slow?

 

Yes, I started the upload of an 8GB ISO file via SMB, but stopped it after a few seconds when I saw that the transfer speeds were terribly slow.

 

 

I can't explain that.  Transfers to a 4-year-old Celeron-based server with old SATA controllers run around 25MB/sec to the array, and double that to the cache disk.  Maybe repeat the experiment without any add-ons.  If it's still slow, perhaps repeat with the parity disk disabled (though obviously this will invalidate your parity disk).


 

I think I am having speed-related issues as well, Tom.

 

I was wondering why my backups were taking so long, and it turned out that writes to parity-protected shares were down in single figures!

 

I will post the results of my tests tomorrow, as it's now my bedtime!


Thanks for the tips Tom,

 

I'll do some more testing and digging.

 

Peter / Tom, according to the logs enabled with the LOG_MAXDEBUG line, it's 1800s / 30 mins; see below:

 

Jul 20 11:35:08 unRAID cnid_dbd[26478]: Checkpoint interval: 1800 seconds. Next checkpoint: Jul 20 12:05:08.


Some testing has shown:

 

Time-Machine Share: Single WD10EARS + not cached (in array)

Media Share: WD20EARS + HD154UI + cached (in array)

Cache: 7K1000.B

 

Note: All mods apart from unMenu were disabled for this testing.

The test was to copy, and copy back, a 500MB DMG file from my SSD, which has read/write speeds of 250/100 MB/sec, so it's not the bottleneck.

 

Cache

AFP: Read: 67.5 MB/sec  –  Write: 67.6 MB/sec (5b6: 82.2 MB/sec write)

 

TM Share

SMB: Read: 39.8 MB/sec  –  Write: 8.4 MB/sec

AFP: Read: 59 MB/sec  –  Write: 16.9 MB/sec (5b6: 20.0 MB/sec write)

 

Media Share

SMB: Read: 46.9 MB/sec  –  Write: 6.8 MB/sec

AFP: Read: 78.7 MB/sec  –  Write: 63 MB/sec (5b6: 25.4 MB/sec write – so you've fixed something here!)

 

 

For 5b6 results see: http://lime-technology.com/forum/index.php?topic=13195.msg125081#msg125081

 

 

Cache transfer history:

Read:

cache_AFP_read.png

 

Write:

cache_AFP_write.png

 

 

TM Share write transfer history

AFP:

Time-Machine_AFP_write.png

 

SMB:

Time-Machine_SMB_write.png

 

 

Media Share write transfer history

SMB:

Media_SMB_write.png

 

 

Syslog is attached.

 

That seem normal or is there something up?

syslog-2011-07-20.txt


That seem normal or is there something up?

I can't answer the above question, but I can tell you that you need to use a bigger file.

 

Something in the multi-GB range (i.e. between 5GB and 10GB) should do.

 

A 500MB file could be cached in RAM and therefore not give accurate results.  Please run the tests again, multiple times with several different 5GB to 10GB files, and then average the results.
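A quick way to generate such files is a sketch like the following (the output path and size are examples; /dev/urandom is used so the data is incompressible and nothing in the path can flatter the numbers):

```shell
# Create an incompressible test file of a given size for transfer tests.
# $1 = output path, $2 = size in MiB (e.g. 5120 for ~5GB)
make_testfile() {
  dd if=/dev/urandom of="$1" bs=1M count="$2" status=none
}

# e.g. make_testfile /mnt/user/Media/test1.bin 5120
```

Repeating the copy with a few different files and averaging the timings, as suggested above, smooths out caching effects.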


I'm very happy to say... I've got to the bottom of the AFP issues!

 

Tom, thank you for your 'light reading' of BerkeleyDB. I'm very happy to say that I didn't need it to find the problem, and that you haven't compiled BerkeleyDB wrong.

 

To start with, as I said in a previous post, AFP has never worked properly on unRAID. It has worked, and continues to work, just not as it is supposed to. I'll explain more about this a little later.

 

Secondly, I should point out that the issues that I've been researching are not related to transfer speed, only to sluggish browsing through a directory tree in your share.

 

In short, the problem is with inodes. <- For some of you, perhaps Tom included, this may be enough for you to think 'Oh now I understand where the problem lies...' for others, I will continue...

 

 

How AFP (Netatalk) Works

 

AFP works by creating a database of file and folder locations for each share. When you first browse into a share, a database daemon for that share starts up, and looks to see if a database file exists for that share already. If it exists, this file is read and on you go with your browsing. If the file doesn't exist, it is created.

 

This database file actually starts pretty small, but grows as you browse through your directory structure. As you browse into a folder you've not looked into before, the files and folders inside are added to the database so that next time you go in there, browsing will be instantaneous. Obviously any new files, modifications or moves will also update the database.

 

Problems arise if you make a change to the files or folders without using AFP, by using SMB or telnet for example. This is because the databases only get updated if the AFP daemons know that a change has taken place.

 

This isn't a big problem, because you'll probably only be changing a few files, and there is some intelligence built into the database process to correct errors in the database as it finds them. For a few file change errors, you probably wouldn't notice the performance hit when browsing, but for a whole directory full of errors you definitely would.

 

The Issue

 

The odd behaviour that I was seeing can be summarised as:

 

After stopping and starting the array or rebooting the system, browsing through the directory tree in Finder was very, very slow, but only the first time.

 

The diagnosis

 

Tom was kind enough to recommend turning on logging for the process which handles the AFP databases. What I found was that after a stop/start or reboot, this process would give errors like this:

 

Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_add(did:2, 'HD-DVD Rips', dev/ino:0xd/0x112) {start}
Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_lookup(): START
Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_lookup(name:'HD-DVD Rips', did:2, dev/ino:0xd/0x112) {devino: 1, didname: 1}
Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_lookup: CNID mismatch: (DID:2/'HD-DVD Rips') --> 7611 , (0xd/0x112) --> 6223
Jul 20 20:38:32 unRAID cnid_dbd[7992]: cnid_delete: CNID 6223 deleted
Jul 20 20:38:32 unRAID cnid_dbd[7992]: cnid_delete: CNID 7611 deleted
Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_add(did:2, 'HD-DVD Rips', dev/ino:0xd/0x112): {adding to database ...}
Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_add(did:2, 'HD-DVD Rips', dev/ino:0xd/0x112): Added with CNID: 7620

 

There seems to be a mismatch somewhere in the database (the CNID database) for every directory, requiring deletion of the old record and addition of a new one. This takes a long time if there are a lot of files in the folder. So why is there a mismatch?

 

After a lot of Googling and finding no answers, I was about to give up when I thought... "dev/ino... what could that be?" 'ino' sounds a bit like inode, so I decided to take a look at the inodes of a few files.

 

root@unRAID:~# ls -i /mnt/user/Movies/
288 1080p\ Re-encodes/  285 DVD\ Rips/               279 SD\ Re-encodes/
282 720p\ Re-encodes/   274 HD-DVD\ Rips/            249 Temporary\ Items/
16 Blu-ray\ Rips/      246 Network\ Trash\ Folder/

 

Those of you good at maths will be able to see that the "HD-DVD Rips" folder has an inode of 274, or 0x112 in hex!
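The decimal inode from `ls -i` and the hex dev/ino value in the cnid_dbd log are the same number in different bases, which is easy to check in a bash shell:

```shell
# 274 (decimal, from ls -i) and 0x112 (hex, from the cnid_dbd log)
printf '%x\n' 274     # hex form of the inode
printf '%d\n' 0x112   # decimal form of the logged dev/ino value
```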

 

We can conclude then that the AFP database is created based on the inode of files.

 

A quick stop and start of the array makes it obvious that the user shares created with the 'fuse' file system get a different inode value every time, and that's the cause of the problem.
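One way to verify this yourself (a sketch; the share path is an example) is to record a directory's inode, stop and start the array, and compare:

```shell
# Print the inode number of a path; on an unRAID fuse user share this
# comes back different after every array stop/start.
inode_of() {
  stat -c %i "$1"
}

# e.g. inode_of "/mnt/user/Movies/HD-DVD Rips"   # run before and after
```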

 

After every stop/start or reboot, the AFP databases appear to be intact, but are 100% full of errors that need to be corrected.

 

This problem has been here in unRAID all along

 

As far as I know, nothing has changed in the user share 'fuse' file system implementation since AFP support was added to unRAID, which means that this issue has been here all along. I actually suspected this, but noticed that browsing performance got much worse in beta9 after Netatalk was upgraded to version 2.1.5.

 

The main change from Netatalk 2.0.5 to 2.1.5 relating to this is that 2.1.5 by default stores a cache of the database (or part of it) in every .AppleDouble folder. The purpose was to speed up browsing a little, and also to provide an additional method of disaster recovery should the database become corrupt.

 

In unRAID, the database never actually becomes corrupt; it just becomes full of errors. In the latest version of unRAID, fixing these errors means writing the changes to every .AppleDouble folder in the share, as well as to the .AppleDB folder in the root of the share. This is a lot of small writes, which on an unRAID parity-protected array can take a lot of time.
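To get a feel for how many small writes a full correction pass implies, you can count the .AppleDouble folders in a share (a sketch; the path is an example):

```shell
# Count the .AppleDouble directories under a share -- each one gets
# rewritten when the CNID cache has to be corrected.
count_appledouble() {
  find "$1" -type d -name .AppleDouble | wc -l
}

# e.g. count_appledouble /mnt/user/Movies
```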

 

Netatalk 2.0.5 only stored the database files in the .AppleDB folder, so although the problem existed, it may have been slightly less obvious.

 

The Fix

 

I'm actually not sure how this issue can be resolved. Tom knows much more about the 'fuse' file system used for user shares; maybe there is an option that can preserve inodes between file system creations.

 

It's over to you for now Tom, unless I'm able to find a fix by tinkering over the next few days.

 

The Hacks that improve things a bit

 

A few posts ago I recommended moving the databases outside the parity-protected array, possibly onto the RAM disk. Although this may seem like a bad idea at first, because these databases would be lost on every reboot, it actually makes a lot of sense: the RAM disk is the fastest disk in the system, and every reboot makes the AFP database 100% invalid anyway!

 

I also recommended turning on the option 'nocnidcache' in the AppleVolumes.default- configuration file. This will not prevent .AppleDouble folders from being created for resource forks as needed, but will stop the AFP database cache from being stored in them.

 

Neither of these suggestions will fix the problem, but they will minimize its impact.
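For reference, here is a sketch of what those two tweaks might look like as a volume line (the share path, name, and database path are examples; the dbpath and nocnidcache options are described in the Netatalk AppleVolumes.default man page):

```
# Move the CNID database off the array and stop the per-folder cache:
/mnt/user/Movies "Movies" cnidscheme:dbd dbpath:/var/tmp/netatalk/Movies options:nocnidcache
```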

 

One last note

 

During my tinkering, I was also able to investigate the other errors that AFP was throwing up about extended attributes:

 

Jul 16 23:07:44.859913 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "Audio" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860087 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "BD-Backup" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860233 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "Movies" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860384 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "Software" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860490 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "TV" does not support Extended Attributes, using ea:ad instead

 

This is caused by another change in Netatalk 2.1.5, where extended attributes are stored to disk by default, which the 'fuse' file system seems not to like.

 

Adding the option

ea:none

or

ea:ad

to the AppleVolumes.default- file will instruct the AFP daemons either not to bother with extended attributes at all (which seems reasonable) or to store them in the .AppleDouble folders.
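As a sketch (the share path and name are examples), the volume line would look like one of these:

```
# Skip extended attributes entirely:
/mnt/user/Movies "Movies" ea:none
# ...or store them in the .AppleDouble files:
/mnt/user/Movies "Movies" ea:ad
```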


That seem normal or is there something up?

I can't answer the above question, but I can tell you that you need to use a bigger file.

 

Something in the multi-GB range (i.e. between 5GB and 10GB) should do.

 

A 500MB file could be cached in RAM and therefore not give accurate results.  Please run the tests again, multiple times with several different 5GB to 10GB files, and then average the results.

 

That's not very real world though, is it?

 

Currently I'm doing a Time Machine backup and it's barely getting past 1 MB/sec.

 

It's easy to get high transfer speeds with big files; the smaller ones (which are also the most common) are where the work has to be done. 1 MB/sec is not quick enough, tbqh.

 

However, I am currently downloading Lion; I will test with my MacBook Pro in an hour's time.


Very small files are always much, much slower. That's why hard disk specifications quote 'random writes' and 'sequential writes' as a minimum, and full performance tests are done with multiple file sizes. If you want to test sequential write speed, you'll need to use a very big file, as Prostuff says.

 

I'm not sure exactly what data it is that you're writing, or why your performance is that slow, but Prostuff does have a point.

 

The whole concept of 'tar' was built on this: making all the small files into one big tarball makes them much quicker to move about.



 

I'm well aware of all that; I have been a hardware enthusiast for many years ;)

 

However, when I know the WD10EARS can do at least 10MB/sec, and more like 50MB/sec, when writing small files, something is wrong.

 

See:

time_machine_speed_AFP.png

 

(Ignore the receive, that's Lion being downloaded)

 

The performance that TM is getting to the TM share is *not* acceptable, something, somewhere needs a tweak. I'd be happy with just 10 MB/sec, but not less than that.

 

CPU usage is <10% with both cores sitting at 800 MHz, and disk writes (to both parity and the WD10EARS) are below 500KB/sec!


More stuff

 

I am not saying you are not correct, and something may be able to be tweaked.

 

You also have to realize that every write to a drive with parity protection will be much slower.  Small random writes that require parity to be updated are always going to be the slowest.



 

Yes, of course.

 

However, do you think 1MB/sec in this day and age is ok?

 

If unRAID is going to have TM support, it might as well not bother at the speeds I'm getting ;)


As I already mentioned in my post yesterday, I tried it with an 8GB ISO file and I have exactly the same problem as Concorde Rules.

I barely get 3.8MB/s from my Windows 7 PC; under the same circumstances with beta6a I get around 40MB/s.

 

So this is not an issue of uploading many small files; there have also been other users in the 5.0 development forum complaining about really slow write speeds with this beta.

 

So something must have changed between beta6a and beta9.


For those experiencing transfer slowdowns: does your motherboard use a Realtek NIC?

Concorde Rules - I can see in the syslog that yours does.

 

Apparently this is a known issue:

https://bbs.archlinux.org/viewtopic.php?id=115644&p=1

and a workaround is mentioned here:

http://mailman.archlinux.org/pipermail/arch-dev-public/2011-June/020672.html

 

So I'll research this some more and release a fix.
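If you want to check whether your own server is affected, grepping a saved syslog for the Realtek driver names is a quick test (a sketch; the filename is an example):

```shell
# Print any syslog lines mentioning the Realtek r8169/r8168 drivers
realtek_lines() {
  grep -iE 'r8169|r8168' "$1"
}

# e.g. realtek_lines syslog-2011-07-20.txt
```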

 



 

 

Tom, you may want to look here: darkside40 did some digging through the posted syslogs and all of them are using Realtek NICs.

 

My production server is running a Realtek NIC, though I have not updated it to the latest beta.



 

 

That would be absolutely great. While waiting for the fix, I think I will go back to 5.0beta6a.


I got the Intel NIC and had problems with WOL and auto-mounting my shares with speeding_ant's script. I reverted back to my Realtek and my network freeze issues seemed to disappear (but this was also on 5.0b, I believe). Speeds seem to be OK, though I haven't transferred more than 10GB at once, so I'm not sure if that's a factor.


For those experiencing transfer slowdowns: does your motherboard use a Realtek NIC? [...] So I'll research this some more and release a fix.

 

 

Wow, ok.

 

Thanks for your time on this!

 

On a side note, Lion seems to work fine [as in, it loads and transfers] with 5b9. No idea about TM though, as my Mac Pro won't be updated for a while.


I've got two issues with beta9.

 

I upgraded from beta6, and after a reboot my AMD X2 5000+ is stuck at 2600 MHz and does not scale its speed. It does, however, when I run the command

modprobe powernow-k8

but it only scales down to 1000 MHz. It used to scale down to 800 MHz with beta6.
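The frequency steps the driver actually exposes can be read from sysfs; here's a sketch using the standard cpufreq paths (entries that don't exist on a given kernel are simply skipped):

```shell
# Show the scaling steps and governor the kernel exposes after
# `modprobe powernow-k8`, if the cpufreq sysfs entries are present.
show_cpufreq() {
  for f in scaling_available_frequencies scaling_governor scaling_cur_freq; do
    p="/sys/devices/system/cpu/cpu0/cpufreq/$f"
    if [ -r "$p" ]; then
      echo "$f: $(cat "$p")"
    fi
  done
}

show_cpufreq
```

If 800 MHz is missing from scaling_available_frequencies, the driver itself is no longer offering that step.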

 

The other issue is the unMENU image server going crazy and spawning processes. See here for further info: http://lime-technology.com/forum/index.php?topic=5568.msg133452#msg133452

 



 

Then I would suggest you not use it.
