Nezil

Members • 130 posts
Everything posted by Nezil

  1. Tom, could you detail the 'other changes to Netatalk' a bit, please? Did you have a chance to solve the inode issues that I identified?
  2. I think no-one would officially endorse running a beta system to store your really important data. Having said that, a number of users (including me) are doing just that. Beta6a was labelled 'Stable', but there was an announcement about a bug, discovered in all versions prior to beta8 I believe, that could in some situations result in data corruption (something to do with writing to the array during a rebuild). Personally I've had no problems, but I've only been playing with unRAID for about 2 months, and I'm running on un-certified hardware as well. If you want to be 100% safe, go with 4.7 on certified hardware (if you can still find the parts); otherwise you take a calculated risk like some of us here.
  3. Very small files are always much, much slower. That's why hard disk specifications quote 'Random Writes' and 'Sequential Writes' as a minimum, and full performance tests are done with multiple file sizes. If you want to test the sequential write speed, you'll need to use a very big file, as Prostuff says. I'm not sure exactly what data you're writing, or why your performance is that slow, but Prostuff does have a point. The whole concept of 'tar' was built on this... making all the small files into one big tarball makes them much quicker to move about.
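As a rough illustration of that tar approach (the paths here are only examples, not anything from my setup), packing the small files first turns thousands of tiny writes into one large, near-sequential write on the array:

# Pack a directory full of small files into a single tarball written straight to the array;
# one big sequential write instead of thousands of tiny ones.
tar cf /mnt/user/Backup/smallfiles.tar -C /path/to/smallfiles .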
  4. I'm very happy to say... I've got to the bottom of the AFP issues! Tom, thank you for your 'light reading' of BerkeleyDB; I'm happy to report that I didn't need it to find the problem, and that you haven't compiled BerkeleyDB wrong.

To start with, as I said in a previous post, AFP has never worked properly on unRAID. It has worked, and continues to work, just not as it is supposed to. I'll explain more about this a little later. Secondly, I should point out that the issues I've been researching are not related to transfer speed, only to sluggish browsing through a directory tree in your share.

In short, the problem is with inodes. <- For some of you, perhaps Tom included, this may be enough to think 'Oh, now I understand where the problem lies...'; for everyone else, I will continue...

How AFP (Netatalk) Works

AFP works by creating a database of file and folder locations for each share. When you first browse into a share, a database daemon for that share starts up and looks to see whether a database file already exists for that share. If it exists, the file is read and on you go with your browsing. If it doesn't exist, it is created. This database file actually starts out pretty small, but grows as you browse through your directory structure. As you browse into a folder you've not looked into before, the files and folders inside are added to the database, so that the next time you go in there, browsing is instantaneous. Obviously any new files created, modifications or movements will also update the database.

Problems arise if you make a change to the files or folders without using AFP, for example via SMB or telnet. The databases only get updated if the AFP daemons know that a change has taken place. This isn't a big problem on its own, because you'll probably only be changing a few files, and there is some intelligence built into the database process to correct errors as it finds them. For a few changed files you probably wouldn't notice the performance hit when browsing, but for a whole directory full of errors you definitely would.

The Issue

The odd behaviour that I was seeing can be summarised as: after stopping and starting the array, or rebooting the system, browsing through the directory tree in Finder was very, very slow, but only the first time.

The Diagnosis

Tom was kind enough to recommend turning on logging for the process which handles the AFP databases. What I found was that after a stop/start or reboot, this process would give errors like this:

Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_add(did:2, 'HD-DVD Rips', dev/ino:0xd/0x112) {start}
Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_lookup(): START
Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_lookup(name:'HD-DVD Rips', did:2, dev/ino:0xd/0x112) {devino: 1, didname: 1}
Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_lookup: CNID mismatch: (DID:2/'HD-DVD Rips') --> 7611 , (0xd/0x112) --> 6223
Jul 20 20:38:32 unRAID cnid_dbd[7992]: cnid_delete: CNID 6223 deleted
Jul 20 20:38:32 unRAID cnid_dbd[7992]: cnid_delete: CNID 7611 deleted
Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_add(did:2, 'HD-DVD Rips', dev/ino:0xd/0x112): {adding to database ...}
Jul 20 20:38:32 unRAID cnid_dbd[7992]: dbd_add(did:2, 'HD-DVD Rips', dev/ino:0xd/0x112): Added with CNID: 7620

There seems to be a mismatch somewhere in the database (called CNID) for every directory, requiring deletion of the old record and addition of a new one. This takes a long time if there are a lot of files in the folder. So why is there a mismatch? After a lot of Googling, and finding no answers, I was about to give up when I thought... "dev/ino... what could that be?" 'ino' sounds a bit like inode, so I decided to take a look at the inode of a few files:

root@unRAID:~# ls -i /mnt/user/Movies/
288 1080p\ Re-encodes/   285 DVD\ Rips/              279 SD\ Re-encodes/
282 720p\ Re-encodes/    274 HD-DVD\ Rips/           249 Temporary\ Items/
 16 Blu-ray\ Rips/       246 Network\ Trash\ Folder/

Those of you good at maths will see that the "HD-DVD Rips" folder has an inode of 274, which is 0x112 in hex! We can conclude that the AFP database is keyed on the inode of files. A quick stop and start of the array and it becomes obvious that the user shares created with the 'fuse' file system have a different inode value every time, and that's the cause of the problem. After every stop/start or reboot, the AFP databases appear to be intact, but they are 100% full of errors that need to be corrected.

This problem has been in unRAID all along

As far as I know, nothing has changed in the user share 'fuse' file system implementation since AFP support was added to unRAID, which means this issue has been here all along. I actually suspected this, but noticed that browsing performance got much worse in beta9 after Netatalk was upgraded to version 2.1.5. The main relevant change from Netatalk 2.0.5 to 2.1.5 is that 2.1.5 by default stores a cache of the database (or part of it) in every .AppleDouble folder. The purpose of this was to speed up browsing a little, and also to provide an additional method of disaster recovery should the database become corrupt. In unRAID the database never appears to be corrupt; it just becomes full of errors. In the latest version of unRAID, fixing these errors means writing the changes to every .AppleDouble folder in the share, as well as to the .AppleDB folder in the root of the share. That is a lot of small writes, which on an unRAID parity-protected array can take a lot of time. Netatalk 2.0.5 only stored the database files in the .AppleDB folder, so although the problem existed, it was slightly less obvious.

The Fix

I'm actually not sure how this issue can be resolved. Tom knows much more about the 'fuse' file system used for user shares; maybe there is an option that can preserve inodes between file system creations. It's over to you for now Tom, unless I'm able to find a fix by tinkering over the next few days.

The Hacks that improve things a bit

A few posts ago I recommended moving the databases outside the parity-protected array, possibly into the RAM disk. Although this may seem like a bad idea at first, because these databases would be lost on every reboot, it actually makes a lot of sense: the RAM disk is the fastest disk in the system, and every reboot makes the AFP database 100% invalid anyway! I also recommended turning on the option 'nocnidcache' in the AppleVolumes.default- configuration file. This will not prevent .AppleDouble folders being created for resource forks as needed, but will prevent the AFP database cache from being stored in them as well. Neither of these suggestions will fix the problem, but they will minimise its impact (a configuration sketch follows at the end of this post).

One last note

During my tinkering, I was also able to investigate the other errors that AFP was throwing up about extended attributes:

Jul 16 23:07:44.859913 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "Audio" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860087 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "BD-Backup" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860233 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "Movies" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860384 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "Software" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860490 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "TV" does not support Extended Attributes, using ea:ad instead

This is caused by another change in Netatalk 2.1.5, where extended attributes are by default stored to disk, which the 'fuse' file system seems not to like. Adding the option ea:none or ea:ad to the AppleVolumes.default- file will instruct the AFP daemons either not to bother with extended attributes (which seems reasonable) or to store them in the .AppleDouble files instead.
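For anyone who wants to try the two hacks plus the extended attribute option together, this is roughly what it looks like in /etc/netatalk/AppleVolumes.default-. The /var/dbd location is just my choice of RAM-disk directory, and you should double-check the option names against your Netatalk version:

# Directory on the RAM disk to hold the CNID databases; $v expands to the volume (share) name.
mkdir -p /var/dbd

# Default options applied to every AFP share: keep the CNID cache out of the .AppleDouble
# folders, store the databases under /var/dbd, and skip extended attributes entirely.
:DEFAULT: cnidscheme:dbd options:upriv,usedots,nocnidcache ea:none dbpath:/var/dbd/$v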
  5. Thanks for the tips Tom, I'll do some more testing and digging. Peter / Tom, according to the logs enabled with the LOG_MAXDEBUG line, it's 1800 seconds / 30 mins; see below:

Jul 20 11:35:08 unRAID cnid_dbd[26478]: Checkpoint interval: 1800 seconds. Next checkpoint: Jul 20 12:05:08.
  6. Warlock, that's a feature, nothing new to beta9. More details like this can be found in the wiki.
  7. OK... an update, and Tom, I now see exactly what you mean about the issue. My suggestions for 'nocnidcache' and moving the dbpath for .AppleDB folders off the array were simply to improve the speed of creating and rebuilding the CNID database, something that should not be necessary once the main problem is fixed.

I tried the following things:

If you stop and then restart the AFP daemons while the array is running, all of the __db.00x files are removed, but they're rebuilt from the log.0000000001 file at a pretty high speed (about 11 seconds for a full find . command on my setup). The real problem is that the CNID backup in the log.0000000001 file stops working when the array is stopped and then re-started.

I thought perhaps this was because the AFP daemons see that the shares no longer exist in the AppleVolumes.default file once the array is stopped, and perhaps mark the log file as no longer important - I was wrong! I prevented the AppleVolumes.default file from being changed, stopped and restarted the array, and it didn't make any difference. The databases could not be recovered and took the long time (~4 mins) to be rebuilt.

Next I thought that maybe the AFP daemons noticed that the folder /mnt/user no longer existed, and somehow marked the log file as no longer important - I was wrong again! I stopped the AFP daemons manually, and moved the rc.atalk file to prevent the AFP daemons from starting while the /mnt/user folder was gone (the sequence I used is sketched below). I then stopped and restarted the array, moved the rc.atalk file back, manually restarted AFP, and it didn't make any difference. The databases could not be recovered and took the long time (~4 mins) to be rebuilt.

Now, like you Tom, I am stumped. The only possible thing I can think of is that the CNID databases know what has changed by the modification time of the folders. When the array is stopped, these folders don't exist, and they are created again as part of starting the array. This makes the creation date different every time. I was going to try and test this, but I'm not sure how I can prevent emhttp from doing this on shut-down and start-up; I think it's over to you now!
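This is roughly the sequence I used for that second test (a sketch; the 'parked' filename is arbitrary, and the array stop/start in the middle was done from the web UI):

# Stop the AFP daemons and park the init script so nothing can restart them automatically.
/etc/rc.d/rc.atalk stop
mv /etc/rc.d/rc.atalk /etc/rc.d/rc.atalk.parked

# ...stop and then restart the array from the web UI...

# Put the script back and bring AFP up again by hand.
mv /etc/rc.d/rc.atalk.parked /etc/rc.d/rc.atalk
/etc/rc.d/rc.atalk start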
  8. Thanks for taking the time to respond to my posts Tom, though I do still have some questions.

Quote: "Nezil - nice analysis, but unfortunately a lot of it is wrong/misleading. The .AppleDB folder contains the Berkeley Database files used to store CNID-to-file mappings required by AFP. The files starting with "__", such as __db.001, are temporary files used during operation; in my testing they take very little time to create upon startup of AFP daemons. The log.0000000001 file stores DB transactions (i.e. a journal of sorts) that lets Berkeley DB recover from a system crash. In my testing, I've found that "restoring" the DB using this file is what's taking all the time, and I haven't determined why this is the case yet."

I think the point I was making is that the time to "restore" the __db.00x files from the log file is basically identical to having to re-create them. More concerning, of course, is that they have to be re-created at all. Your point is that they only need to be recovered after a system crash, but I have found that starting and stopping the array without a system crash requires them to be rebuilt / recovered as well. The other interesting thing to note is that these files are not built upon startup of the AFP daemons. The .AppleDB folder structure IS indeed built at that time, and takes less than a second to build, but it is not populated until you browse through all the shares, and this is what takes the time... every time you start / stop the array.

Quote: "You definitely don't want these in a RAM disk, maybe the Cache disk, but probably the extra complication this entails is not worth it, but something I'm looking into."

I agree that you wouldn't want these in the RAM disk, but only because they don't persist across a reboot. The reason I suggested this is that, as described above, the CNID database doesn't seem to survive a start / stop of the array currently, and so has to be re-built anyway. With this in mind, a RAM disk location would at least speed up the creation of the databases that seems necessary on every start / stop. If it's possible to prevent the CNID database needing to be re-created every start / stop / reboot (see point 3), then having these files on the array may not be a problem. They might take a long time to create, but that would only be necessary once, so it wouldn't be an issue. It's also interesting to note that having these files on the array means that the disk containing the CNID database needs to be spun up every time you browse the share, which would not be necessary with SMB.

Quote: "The .AppleDouble files have always been part of netatalk, and are required to implement resource forks. If you use the 'nocnidcache' option, and you lose the main cnid2.db, then all your resource forks are gone. This is not recommended, and adds very little to decrease performance."

I think there is some confusion here; my understanding is that in Netatalk 2.1.x the CNID files are stored inside .AppleDB AND the .AppleDouble folder in each directory. In previous versions of Netatalk, the CNID files were only stored in .AppleDB. The 'nocnidcache' option does not stop .AppleDouble folders from being created when they're needed for resource forks, it just stops the CNID cache being stored there as well. I believe that this does have a big performance impact: when browsing into a directory that has several hundred sub-directories, several hundred new files need to be created on the array, which requires writing to the disks.
Quote: "Again, making changes to AppleVolumes.default does not cause "CNID databases to be deleted / re-created", nor should stop/start of the various AFP daemons. A big change between netatalk 2.0 and 2.1 was the requirement of using Berkeley DB 4.6 or later (previously DB 4.4 was the requirement). This required me to create a slack package for DB 4.6, which I did by modifying the slack build scripts for 4.4. It's possible I have some configure options incorrect for this version."

I think this suggestion was actually the most important one, as I was trying to solve the need to re-create / restore the databases every start / stop. In my testing, when the array is stopped, the logs show the AppleVolumes.default- file being copied to the AppleVolumes.default location, and on inspection you can see that all the shares are missing from the file. This makes sense, so that AFP does not list the shares while the array is stopped. When this happens, all of the __db.00x files are deleted as well, which would explain why they need to be re-created / restored (a quick way to check this is sketched below). If I call /etc/rc.d/rc.atalk stop before stopping the array, the __db.00x files remain; but when I start the array again, the __db.00x files are deleted during the process. I would expect AFP not to work after starting the array, but it does; you must therefore be starting the AFP daemons during the array startup process.

My guess at the process of re-starting the array is therefore:

Stopping array
- Stop array clicked in web UI
- AFP daemon stopped
- AppleVolumes.default file modified to remove shares
- AFP daemon re-started
- AFP daemon sees that user shares are no longer required / present and deletes the CNID database files
- User shares unmounted

Starting array
- Start array clicked in web UI
- User shares mounted
- AFP daemon stopped
- AppleVolumes.default file modified to add shares
- AFP daemon re-started

I may be wrong about the exact order, and was unable to test. I was simply trying to give you a pointer for something to look at as the cause of the database loss on start / stop / reboot.
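The quick check I mean is nothing more than listing the database directory before and after each step. With the dbpath moved to /apps/dbd, as in my other tests in this thread, it looks like this (adjust the path if your .AppleDB folders are still in the share roots):

# List the Berkeley DB files for one share; the __db.00x files are the temporary ones
# that disappear when the array is stopped and started.
ls -l /apps/dbd/Movies/

# With the default dbpath, the same files live in the root of each share:
ls -la /mnt/user/Movies/.AppleDB/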
  9. Saves the day: in an earlier post (about beta8d), Tom explains the purpose of this folder. In short... you need it... don't delete it... unRAID created it for moving files from the cache drive. See this quote:
  10. So back to the AFP issues... Having moved these .AppleDB database files to another disk, and noticing a performance benefit from having done so, I have done some investigation to see if I can get these database files to not need to be re-created on every boot. I've also done some testing with the previous Netatalk in beta8d (v2.0.5), again with the dbpath variable set to a separate disk. I'll start with the results from that version.

5.0b8d / Netatalk 2.0.5

After boot up, I removed all the CNID .AppleDB folders, and ran my find . commands on each of the shares to build the CNID databases (see the sketch at the end of this post). Once this was done, browsing was fast on all machines in my network, and I checked the resulting database file sizes by running du /apps/dbd/ -h --max-depth=1; results below:

root@unRAID:/apps/dbd# du -h --max-depth 1
452K ./TV
3.0M ./Audio
312K ./Movies
48K ./BD-Backup
32K ./Software
3.8M .

Total data for the CNID databases is 3.8 MB, quite reasonable. I rebooted the system (which caused a kernel Oops again - seems it wasn't an issue with the Biostar card! Syslog attached). After the reboot, the CNID databases were still present, and had the same combined size of 3.8 MB; this didn't seem to make browsing any better however, and after running all the find . commands again, another du showed:

root@unRAID:/apps/dbd/BD-Backup# du /apps/dbd/ -h --max-depth=1
861K /apps/dbd/TV
4.5M /apps/dbd/Audio
581K /apps/dbd/Movies
48K /apps/dbd/BD-Backup
32K /apps/dbd/Software
6.0M /apps/dbd/

The size had nearly doubled! I wonder how long this will go on for? Stopping and starting the array also required the CNID databases to be re-created, but with different results; this time the du command showed only slightly larger sizes than the previous run: 6.4 MB total.

A few other bits of information:
- Running find /Volumes on the Mac, with all the unRAID AFP shares mounted, takes 3 minutes 40 seconds on my system (if these CNID database files were on the array drives, it would take even longer, as parity would be involved in writing the files)
- Running the same command takes just 3 seconds to complete once the databases are created
- The killall -HUP afpd command issued when minor changes to shares are made in the web UI doesn't affect the CNID databases; performance isn't affected
- Running /etc/rc.d/rc.atalk restart extends the time for the next run of the find command to 11 seconds; hardly any performance decrease, and back to the normal 3 seconds after that
- Starting and stopping the array, or rebooting the system, causes the CNID databases to be re-created, probably because the AFP shares in AppleVolumes.default are removed and then re-created in the process. In time this may cause the database files to grow to a potentially large size.

As an aside, version 2.0.x of Netatalk doesn't ever seem to store a CNID cache in the .AppleDouble files, and these are therefore not created as you browse around as they are by default in 2.1.x. The nocnidcache variable therefore seems to have no effect in 2.0.x. Time to move on to beta9 testing...

5.0b9 / Netatalk 2.1.5

Running the same tests as before (deleting all CNID files, and rebuilding them from scratch with the newer Netatalk) results in the following from du:

root@unRAID:/apps/dbd# du . -h --max-depth=1
3.5M ./TV
18M ./Audio
2.5M ./Movies
27M ./BD-Backup
968K ./Software
51M .

- The file sizes are bigger, but it didn't take very much longer to create them - 4 minutes 18 seconds.
- Re-running the command gives the same 3 second performance as the previous version.

Where did all that data come from? If you look inside the .AppleDB folders, on both Netatalk 2.0.5 and 2.1.5, there are four files called lock, log.0000000001, cnid2.db and db_err.log. Netatalk 2.1.5 however also has several files labelled __db.00x, and it's these that make up the majority of the additional data. Perhaps 2.0.5 is storing these in RAM? When a share is removed, added or changed by making changes to the AppleVolumes.default file, as happens when starting or stopping the array, all of these __db.00x files are deleted and must be re-created again.

- As before, if a killall -HUP afpd command is called, performance is not really affected
- If the array is started and stopped, or the system is rebooted, the databases need to be completely rebuilt, slowing everything down again.

Conclusion / Suggestions

1. Storing the .AppleDB files on a disk outside the array, or possibly the RAM disk, will improve performance when initially creating the databases. There may also be some compatibility issues with having these on the fuse file system that's used for user shares. This can be changed with the dbpath: variable in AppleVolumes.default-.
2. Using the .AppleDouble files to store the CNID database cache (introduced in 2.1.5) causes performance, and possibly again compatibility, issues on user shares. This can be disabled by adding the nocnidcache option to the AppleVolumes.default- file.
3. The current method of starting and stopping the array involves repeatedly restarting the AFP daemons, and moving and making changes to the AppleVolumes.default file. This causes all the CNID databases to be deleted / re-created, which may not be necessary. I would like to investigate what happens if you change the order of starting and stopping the AFP daemons. For example, the AFP daemons are stopped when the array stops, before the AppleVolumes.default- changes are carried out, and only re-started after the array has started again and any changes to the AppleVolumes.default- files have taken place.

My suggestion for 3. would mean that AFP services would not be running while the array is stopped, but I don't see this as a problem, as nothing can be connected to in that case anyway.

Syslog_2_17-07-11.txt
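The find . commands I keep referring to are just a way of forcing the CNID daemons to index every folder in one pass instead of as you browse. Run from the Mac with the shares mounted, it's simply (share names here are mine, substitute your own):

# Walk every mounted AFP share so the CNID databases are built in one go;
# the output isn't needed, only the directory traversal.
find /Volumes > /dev/null

# Or target individual shares:
find /Volumes/Movies /Volumes/TV > /dev/null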
  11. Yes, that's right, SASLP; I've modified the original post. Syslog attached from the latest crash on reboot... just now! It looks again like it's /dev/sdh related, though this is my USB flash drive now; for this boot, I'd moved the Samsung disk to the SATA controller. Next test... remove the Biostar card. syslog_17-07-11.txt
  12. As requested by Madburg:

Case: Lian Li PC-P50B
Motherboard: Foxconn H67MP
Processor: Intel Core i3 2100
Controller 1: onboard H67 chipset (2 x SATA III, 4 x SATA II)
  Drives: 2 x WD20EVDS, 2 x WD20EADS, 1 x WD20EARS, 1 x WD10EAVS
Controller 2: Supermicro AOC-SASLP-MV8 (8 x SATA II), firmware 3.10.21
  Drives: 1 x WD10EAVS, 4 x WD10EADS
Controller 3: Biostar DCS3A [ASMedia 106x chipset] (2 x SATA III)
  Drives: none so far
Controller 4: USB HDD case, connected and mounted internally
  Drives: 20GB Samsung SATA HDD (from an Xbox 360) - used for Plex & Slimserver apps
Backplanes: 3 x IcyDock MB455SPF-B
Cabling: standard SATA cables from the onboard and Biostar controllers, 3Ware forward breakouts from the Supermicro

Some notes:

It always seems to be the USB HDD that causes the kernel Oops (sdh mentioned in the log I posted earlier). This drive is connected to the internal USB header. The reason for doing this is that I don't use a cache drive, and want to maximise the available storage in my case, thereby using all 15 drive cages in the backplanes for storage. I mount the drive at /mnt/apps, and also 'bind mount' /tmp to a sub-directory on the drive. The /tmp folder needs to be mounted because Plex writes the video chunks for HTTP Live Streaming to the /tmp folder, and it would quickly fill up the RAM disk otherwise.

I have a script that is called based on the 'emhttp_event' to start up my add-ons. This is necessary because Plex and Slimserver might prevent the array from stopping if they're holding on to files in the array for any reason. I recently added the 'stopped' case to my script, to try and stop the kernel Oops... it didn't work though! My script is shown below:

# A script to start up and shut down services when the array starts and stops
case $1 in
  svcs_started)
    if [ -L /dev/disk/by-id/usb-SAMSUNG_HM020GI_A10040007188-0:0-part1 ]; then
      # Mount the apps drive if it isn't already mounted
      if ! mount | grep /mnt/apps\ ; then
        mount /dev/disk/by-id/usb-SAMSUNG_HM020GI_A10040007188-0:0-part1 /mnt/apps
      fi
      # Copy the current /tmp contents onto the apps drive, then bind mount it over /tmp
      if ! mount | grep /tmp; then
        cd /tmp; tar cf - . | (cd /mnt/apps/.tmp; tar xf -)
        mount --bind /mnt/apps/.tmp /tmp
      fi
      # Start Plex Media Server if it isn't running
      if ! ps -ef | grep .Plex/Plex\ Media\ Server | grep -v grep; then
        /mnt/apps/.Plex/start.sh >> /Library/Logs/.Plex\ Media\ Server.log 2>&1 &
        logger -t Plex\ Media\ Server Started
      fi
      # Start Squeezebox Server if it isn't running
      if ! ps -ef | grep slimserver.pl | grep -v grep; then
        /mnt/apps/.Squeezebox/SqueezeboxServer/slimserver.pl --nosb1slimp3sync --logdir /var/log --cachedir /mnt/apps/.Squeezebox/Cache/ --noupnp --daemon --user neil
        logger -t Squeezebox\ Server Started
      fi
    fi;;
  svcs_restarted)
    # Same actions as svcs_started
    if [ -L /dev/disk/by-id/usb-SAMSUNG_HM020GI_A10040007188-0:0-part1 ]; then
      if ! mount | grep /mnt/apps\ ; then
        mount /dev/disk/by-id/usb-SAMSUNG_HM020GI_A10040007188-0:0-part1 /mnt/apps
      fi
      if ! mount | grep /tmp; then
        cd /tmp; tar cf - . | (cd /mnt/apps/.tmp; tar xf -)
        mount --bind /mnt/apps/.tmp /tmp
      fi
      if ! ps -ef | grep .Plex/Plex\ Media\ Server | grep -v grep; then
        /mnt/apps/.Plex/start.sh >> /Library/Logs/.Plex\ Media\ Server.log 2>&1 &
        logger -t Plex\ Media\ Server Started
      fi
      if ! ps -ef | grep slimserver.pl | grep -v grep; then
        /mnt/apps/.Squeezebox/SqueezeboxServer/slimserver.pl --nosb1slimp3sync --logdir /var/log --cachedir /mnt/apps/.Squeezebox/Cache/ --noupnp --daemon --user neil
        logger -t Squeezebox\ Server Started
      fi
    fi;;
  stopping_svcs)
    if mount | grep /mnt/apps\ ; then
      # Stop Plex, release the /tmp bind mount, then stop Squeezebox
      if ps -ef | grep .Plex/Plex\ Media\ Server | grep -v grep; then
        kill -s INT `ps -ef | grep .Plex/Plex\ Media\ Server | grep -v grep | awk '{print $2}'`
        logger -t Plex\ Media\ Server Stopped
      fi
      if mount | grep /tmp; then
        umount /tmp
      fi
      if ps -ef | grep slimserver.pl | grep -v grep; then
        kill -s INT `ps -ef | grep slimserver.pl | grep -v grep | awk '{print $2}'`
        logger -t Squeezebox\ Server Stopped
      fi
    fi;;
  stopped)
    # Finally unmount the apps drive once everything has stopped
    if mount | grep /mnt/apps\ ; then
      umount /mnt/apps
    fi;;
esac

The next thing I will try is to use the last spare SATA channel on the Biostar card to connect the Samsung 20GB applications drive, rather than using USB. I'm hoping this might solve the issue, but I'm not confident.
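For anyone copying this approach: the script just takes the emhttp event name as its first argument, so assuming you save it somewhere like /boot/custom/services.sh (that path is only an example, I haven't given the real one here), you can smoke-test each case from the console before wiring it up to the events:

# Hypothetical location for the script; substitute wherever you keep yours.
sh /boot/custom/services.sh svcs_started
# Check the results: the apps drive and the /tmp bind mount should both be present.
mount | grep -E '/mnt/apps|/tmp'
# And the reverse path:
sh /boot/custom/services.sh stopping_svcs
sh /boot/custom/services.sh stopped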
  13. As I was writing the last post, I had the annoying kernel Oops message regarding my USB HDD for apps. This time it happened when I stopped the array in the web UI, but in the past it's happened when I request a reboot. The important lines from the syslog are:

Jul 16 22:57:59 unRAID kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
Jul 16 22:57:59 unRAID kernel: IP: [] queue_delayed_work_on+0x33/0xbf
Jul 16 22:57:59 unRAID kernel: *pdpt = 00000000377fb001 *pde = 0000000000000000
Jul 16 22:57:59 unRAID kernel: Oops: 0000 [#1] SMP
Jul 16 22:57:59 unRAID kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.6/2-1.6:1.0/host10/target10:0:0/10:0:0:0/block/sdh/stat
Jul 16 22:57:59 unRAID kernel: Modules linked in: md_mod xor mvsas i2c_i801 r8169 libsas ahci i2c_core scsi_transport_sas libahci [last unloaded: md_mod]
Jul 16 22:57:59 unRAID kernel:
Jul 16 22:57:59 unRAID kernel: Pid: 9857, comm: umount Not tainted 2.6.37.6-unRAID #3 To be filled by O.E.M. To be filled by O.E.M./H67MP-S/-V/H67MP
Jul 16 22:57:59 unRAID kernel: EIP: 0060:[] EFLAGS: 00210246 CPU: 2
Jul 16 22:57:59 unRAID kernel: EIP is at queue_delayed_work_on+0x33/0xbf
Jul 16 22:57:59 unRAID kernel: EAX: f8522138 EBX: ffffffff ECX: f8522134 EDX: 00000000
Jul 16 22:57:59 unRAID kernel: ESI: 00000000 EDI: f8522134 EBP: f6fb5e44 ESP: f6fb5e38
Jul 16 22:57:59 unRAID kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jul 16 22:57:59 unRAID kernel: Process umount (pid: 9857, ti=f6fb4000 task=f7506b10 task.ti=f6fb4000)
Jul 16 22:57:59 unRAID kernel: Stack:
Jul 16 22:57:59 unRAID kernel: f8512000 df6e9b54 df6e9b00 f6fb5e50 c1038a6f 0000000a f6fb5eb4 c10d467a
Jul 16 22:57:59 unRAID kernel: f8512000 00000012 00000000 f0b97700 0001c499 037da370 00000000 00000010
Jul 16 22:57:59 unRAID kernel: df6e9b18 00000012 00000004 f8512000 00001da9 00000000 dfe73000 d7085fc0
Jul 16 22:57:59 unRAID kernel: Call Trace:
Jul 16 22:57:59 unRAID kernel: [] ? queue_delayed_work+0x1b/0x1e
Jul 16 22:57:59 unRAID kernel: [] ? do_journal_end+0x747/0x92a
Jul 16 22:57:59 unRAID kernel: [] ? journal_end_sync+0x5b/0x63
Jul 16 22:57:59 unRAID kernel: [] ? reiserfs_sync_fs+0x32/0x51
Jul 16 22:57:59 unRAID kernel: [] ? __sync_filesystem+0x53/0x65
Jul 16 22:57:59 unRAID kernel: [] ? sync_filesystem+0x2c/0x40
Jul 16 22:57:59 unRAID kernel: [] ? generic_shutdown_super+0x1d/0xb5
Jul 16 22:57:59 unRAID kernel: [] ? kill_block_super+0x1d/0x31
Jul 16 22:57:59 unRAID kernel: [] ? reiserfs_kill_sb+0x7d/0x80
Jul 16 22:57:59 unRAID kernel: [] ? deactivate_locked_super+0x1a/0x36
Jul 16 22:57:59 unRAID kernel: [] ? deactivate_super+0x32/0x36
Jul 16 22:57:59 unRAID kernel: [] ? mntput_no_expire+0xb0/0xcc
Jul 16 22:57:59 unRAID kernel: [] ? sys_umount+0x8f/0x98
Jul 16 22:57:59 unRAID kernel: [] ? sys_oldumount+0xd/0xf
Jul 16 22:57:59 unRAID emhttp: shcmd (124): cp /etc/netatalk/AppleVolumes.default- /etc/netatalk/AppleVolumes.default
Jul 16 22:57:59 unRAID kernel: [] ? syscall_call+0x7/0xb
Jul 16 22:57:59 unRAID kernel: Code: d6 53 89 c3 f0 0f ba 29 00 19 d2 31 c0 85 d2 0f 85 9d 00 00 00 83 79 10 00 74 04 0f 0b eb fe 8d 41 04 39 41 04 74 04 0f 0b eb fe 06 02 b8 08 00 00 00 75 19 89 c8 e8 9d e6 ff ff 85 c0 74 08
Jul 16 22:57:59 unRAID kernel: EIP: [] queue_delayed_work_on+0x33/0xbf SS:ESP 0068:f6fb5e38
Jul 16 22:57:59 unRAID kernel: CR2: 0000000000000000
Jul 16 22:57:59 unRAID kernel: ---[ end trace cf51328d94327b1e ]---

I'll have to do a hard reset to get things going again. This might be similar to MrLondon's post further up the thread.
  14. Thanks prostuff. A couple of things that I probably should add to what I said before.

Going through and deleting all of your .AppleDouble files may not be a good idea. These files don't only contain the CNID database cache, they also contain resource forks, which from my understanding can be important to Mac users. I've tried to find more information about exactly what would be in a resource fork, but have not been able to find out a lot. It seems that on older Macintosh systems, each file had two components: a data fork and a resource fork. Only Apple file systems support the resource fork component, and .AppleDouble files are therefore created to hold this resource fork information on other file systems. On newer Mac OS X systems, I don't believe there is anything important in these resource forks, but I have heard that some MS Office for Mac applications do use them. In my unRAID server, the only files I have are media files, with all the information in the data fork. I can therefore be pretty confident in deleting the .AppleDouble files. If this is not the case for you, YOU HAVE BEEN WARNED: deleting the .AppleDouble files could have bad consequences.

A few people, including me, have noticed error messages like this in the syslog:

Jul 16 22:29:02 unRAID shfs/user: shfs_setxattr: setxattr: /mnt/disk1/Audio (22) Invalid argument
Jul 16 22:29:02 unRAID shfs/user: shfs_setxattr: setxattr: /mnt/disk1/BD-Backup (22) Invalid argument
Jul 16 22:29:02 unRAID shfs/user: shfs_setxattr: setxattr: /mnt/disk1/Movies (22) Invalid argument
Jul 16 22:29:02 unRAID shfs/user: shfs_setxattr: setxattr: /mnt/disk1/Software (22) Invalid argument
Jul 16 22:29:02 unRAID shfs/user: shfs_setxattr: setxattr: /mnt/disk1/TV (22) Invalid argument

I turned on afpd logging by adding this line to the /etc/netatalk/afpd.conf file:

-setuplog "default log_info /var/log/afpd.log"

On restarting afpd, I was able to see this information in the afpd.log file:

Jul 16 23:07:44.859913 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "Audio" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860087 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "BD-Backup" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860233 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "Movies" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860384 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "Software" does not support Extended Attributes, using ea:ad instead
Jul 16 23:07:44.860490 afpd[1858] {volume.c:1907} (W:AFPDaemon): volume "TV" does not support Extended Attributes, using ea:ad instead

This corresponds to the messages above (ignore the time differences, I had to reboot; see below!). I think this means that the errors in the syslog can be safely ignored.
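If you want to watch these warnings appear yourself, the only extra steps after adding the -setuplog line are to restart the AFP daemons and follow the log (both commands are the same ones used elsewhere in this thread):

# Restart netatalk so afpd re-reads afpd.conf, then watch the new log file.
/etc/rc.d/rc.atalk restart
tail -f /var/log/afpd.log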
  15. OK, I've spent quite a lot of time today playing with Netatalk, and trying to understand exactly how it works... this is what I've learnt so far.

Looking at this quote from the Netatalk 2.1 help pages: And the warning below it:

It seems that browsing a directory tree on an AFP share is not really browsing a tree structure at all... each file and folder has a unique ID, which needs to be created by Netatalk when you browse a share. My testing has shown that the database is actually only built as you move through a share, with a new .AppleDouble file being created for every subdirectory of a directory that you enter. If, like me, you have a Movie / TV Show / Music directory with a lot of subdirectories, there is a very long delay while each folder and file is indexed by the CNID database daemon. Once it's done, browsing folders is very fast, as we're just looking at the database for the structure. I actually found out that if you mount each of your shares and then run a find . command in the OS X Terminal, you can build this CNID database in one go, rather than have it build as you browse.

I figured I may have some corruption in the CNID database, so I decided to rebuild it from scratch. In order to do this, I stopped netatalk by running /etc/rc.d/rc.atalk stop, and then deleted all the .AppleDB and .AppleDouble files on all of the disks in the array by running find disk*/ -name '.AppleD*' -exec rm -r {} \; from the /mnt directory (a dry-run sketch of this follows at the end of this post). After I'd done this I was able to restart netatalk and then browse around to rebuild the database.

This all sounds good so far, but I discovered other issues... On re-booting, the CNID database seems to be re-created, so connecting and browsing become slow again. Occasionally I've had errors about the CNID database being corrupt after rebooting, even after I'd completely re-created it in the previous boot.

I wondered if there might be some issues creating these .AppleDouble and .AppleDB files on the fuse file system, and had read that there is a way to move the .AppleDB folders to a different location, and to prevent the CNID data from being cached in the .AppleDouble folders. To do this, you need to edit the /etc/netatalk/AppleVolumes.default- file (make sure you edit the file with the trailing -, as it's used by unRAID each time you make a change to your shares configuration). At the bottom of the file is a line that lists the defaults for AFP shares, where you can add the options 'nocnidcache' and 'dbpath:'. First I created a subdirectory in /var to hold these databases in the RAM disk. The line in my AppleVolumes.default- file then looked like this:

:DEFAULT: cnidscheme:dbd options:upriv,usedots,nocnidcache dbpath:/var/dbd/$v

I stopped netatalk, removed all the CNID files using the process above, then restarted everything. Without .AppleDouble directories being created, the initial browsing of folders is much, much quicker. The downside to putting the .AppleDB folders in /var is that they disappear on reboot. This may not be an issue if they really are being re-generated on each boot anyway, of course. I've also tried having the dbpath for the .AppleDB files on my USB disk that I've been using for Plex and Slimserver, though I'm still having kernel Oops issues when I try to reboot with this drive in place.
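If you're going to try the same clean-out, I'd suggest listing what matches before actually deleting anything; run from /mnt, it's just the same find with -print first (a sketch using the exact pattern from this post):

cd /mnt
# Dry run: show every .AppleDB and .AppleDouble folder that would be removed.
find disk*/ -name '.AppleD*' -print
# Once you're happy, stop netatalk, delete them, and start it again.
/etc/rc.d/rc.atalk stop
find disk*/ -name '.AppleD*' -exec rm -r {} \;
/etc/rc.d/rc.atalk start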
  16. Yes, I agree that it does get better once it finally connects... So far the delay in connecting is about 2 or 3 minutes, and the OS X Finder locks up for the duration, so it's pretty annoying. One thing I haven't tried is connecting only to individual disks rather than user shares. I know that there are issues with sharing both via AFP, and given the choice I'd rather share the user shares. The errors that we're seeing are all related to the user share file system.
  17. I have so far found AFP to be painfully slow at initially connecting in beta9. The logs only show a lot of:

Jul 15 16:29:05 unRAID shfs/user: shfs_setxattr: setxattr: /mnt/cache/TV (95) Operation not supported

Nothing else is obvious other than that. I've tried deleting all the .AppleDB files to have them re-created after the upgrade to beta9, and while this worked the first time (after waiting ages for what I assume was them being rebuilt the first time I tried to connect to each share), I've just tried connecting from a different computer on the network and it's really slow again. AFP was never fast in previous versions, but it was better than this. I may have to downgrade to beta8d for the time being. I'm surprised that I'm the only one seeing this. Is really no-one else having performance issues with beta9 / Snow Leopard?
  18. Me too, that's why I didn't put it in there at first... unfortunately it seems now that it's only the graphics slot that will work at full speed with this card on my board.
  19. Thanks for your tips Johnm. I moved the card into the other slot on the motherboard, the one probably meant for the graphics card, and it seems to be working well now, with 2 simultaneous pre-clears going at the speed I would expect and 'top' reporting a %wa of between 20 and 30 - much better. I actually selected my motherboard because it had 1 x PCIe 16x, 1 x PCIe 4x and 2 x PCIe 1x slots. I thought this would mean I'd be able to fit 2 x AOC-SASLP-MV8 cards and 2 x 2-port SATA cards, and with the 6 onboard SATA ports get as many as 26 drives running from it! This may not be the case now, however. Before I mark this thread as solved, I'll wait for the pre-clears to finish and then move the card back into the earlier slot to confirm that this was the problem; I am a little concerned that I might have fixed it with a simple reboot.
  20. As Joe said, I believe this is the issue I reported a few pages back... the free edition does not currently work with beta8c, and Tom has stated that this will be fixed in beta8d. I actually reported it as either a virtual machine or free edition error, but the other reports are showing that my second guess was right: it's the free edition that doesn't work.

Regarding this message: "What Apple OS are you running? I've been able to reproduce it with Snow Leopard and Lion"

I'm running Snow Leopard, and have not had any problems (besides it being slow to first connect) with AFP. I'm not surprised that there are problems with Lion though; it's not even officially released yet, and unRAID's netatalk and avahi are not the latest versions as far as I know. I hope Tom is able to update these in a future version, but things are quite usable already in Snow Leopard. I can't speak for Time Machine though... I don't use that.
  21. This last weekend, I added a Supermicro AOC-SASLP-MV8 controller card, allowing me to get my server up to 14 drives. I've actually got an additional 2-port SATA3 card that would bring me to 16 drives, but that didn't want to play nice, and wouldn't allow the system to boot once the MV8 was inserted (I'll make another post about this).

What I'm wondering is what sort of performance I should expect from this card, because so far I've found it to be painfully slow. I've got only 4 drives connected to it so far, all off one socket via a forward breakout cable. Of these 4 drives, 1 is the cache drive, 2 are new and waiting for pre-clear, and 1 is an array drive. My understanding was that because this is a PCIe 4x card, there was enough bus bandwidth to get full speed out of 8 drives at a maximum of SATA 2 speed, more than enough for standard spinning disks. I'm not seeing this though... I started a pre-clear on both the new drives, and the whole server ground to a halt. I was able to log in and load 'top', where I could see that the CPU was hovering at about 90 %wa, which I believe means waiting for IO. The result of this is that the pre-clear performance slowed down to about 2.6 MB/s, and it took about 10 minutes to update the progress screen to show me this, as opposed to the 10 seconds that it usually updates at. If I run only 1 pre-clear, the %wa in 'top' is reduced, but it doesn't seem to make a difference to the performance at all. I'm not sure if it will help, but I've attached the syslog that covers the initial boot.

I guess I should add a few bits of info to this...
System: Foxconn H67MP-S
Core i3 2100
4 GB RAM
unRAID 5.0beta8c
syslog.txt
  22. I've run into some problems with the latest beta (beta8c) that are causing the use of USB-mounted drives to become unreliable, resulting in kernel 'Oops' errors. Tom has said this should be fixed in the next build (beta8d), but you should be aware of this issue if you're considering following my information above. Thanks for your time by the way, Joe; I actually worked this out today myself, but because of the issue I haven't posted an update yet.
  23. I don't have the system logs from this any more I'm afraid, I was kinda hoping that you would have some ideas, sorta like you did! I actually think you might be right with your guess, because the errors that popped up on the console did include something about a /dev/sdx error every time. Once beta8d is available, I'll move things back over to the USB and give it a go! Thanks again!
  24. Thanks Tom. I've also noticed some other strange behavior that I'm hoping you can help with... In order to allow me to have the maximum number of drives in my case for the array, I'd installed a USB HDD for my add-on applications (so far Plex and Squeezebox Server). This also had some other beneficial side effects: running these applications on the cache disk meant that they had to be stopped before the array could be stopped in the web UI, while on a disk outside the array they could run on their own, doing whatever they pleased without interfering. Since the update to beta8c, I've been getting a lot of 'kernel Oops' errors when stopping the array; this doesn't happen if I take the USB disk out and run things from the cache drive. To be honest, I had noticed these errors before, so it may not be a beta8c thing, though in this latest version it's much more repeatable (it happens every time I stop the array if I've ever started any processes on the USB HDD). If it helps, the USB drive was mounted at /mnt/apps, shared over SMB by adding to the smb-extra file in /boot/config (an example stanza is below), and formatted with reiserfs.
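For completeness, the SMB export of that drive is just a short stanza appended to the smb-extra file; the share name and the exact settings below are illustrative rather than copied from my config:

[apps]
  path = /mnt/apps
  browseable = yes
  read only = no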
  25. Now that it's the weekend and I've got some time, I thought I'd try to do the upgrade to 5.0-beta8c. I usually start by testing updates, and indeed any customisations, on a virtual machine that I have set up with tiny 8GB dummy drives. Because I'm using a Mac, it's not possible to actually use a USB disk for this, so in the past I've used an extra SATA dummy disk as the flash, which has worked fine. Obviously I'm only able to simulate the free version of unRAID like this, but that's fine for testing. With 5.0-beta8c, I'm not able to do this because I'm getting the 'Segmentation fault' error that users of beta8a were reporting on the first page of this thread. I'm wondering if that's something that can be fixed early, as using a VM for development and testing is very valuable. I'm also thinking that this issue might affect anyone trying to use the free version with a real flash drive, though I'm not able to test this.