April 19, 2010
I've searched and searched to no avail, so I offer the following issue to the gurus out there.

I have an RB-1210 (2GB) at home and an MD-1510 (7400 & 4GB) at work. Both are on gigabit networks: Cat6 (and 10Gbit fiber at work), gigabit switches throughout, etc. Both have a selection of quality drives (mostly Seagates and Samsungs, with the occasional WDC). The RB-1210 is running 4.5 and the 1510 is running 4.4.x. I have a Raptor in the cache slot at work and a Samsung F1 in the cache slot at home.

I regularly get ~50MB/s writes both at home and at work. But at home I get a 3-15(!) second pause after writing each and every file to the RB-1210. I was hoping adding the cache drive would help, but it made NO difference (I told all of the relevant shares to use the cache disk).

Anyone have any suggestions? I've tried all forms of disk queueing with nearly no difference in performance, and no real correlation with this "hanging."

And now the really ugly part: on occasion we get errors with backups that run to the lime box at work, and they're of a variety that leaves me wondering if the same thing isn't happening, where it just hangs for a little bit (and subsequent operations get you a stale NFS handle because the lime box doesn't respond).

In no case does this have to be perfect; I'd happily even take something of a cut in overall performance to stop or reduce the hanging. Any suggestions would be much appreciated. TIA
April 19, 2010
Sounds more like a networking issue to me.
April 19, 2010
What are you monitoring with? Some things like WinXP will display their cache, not the real write status like Vista/7. This is why a Vista/7 copy looks slower: it's showing what's been written, not what's been copied to the system cache. It's a big problem with USB drives, as you think stuff's copied but it takes a good few seconds before the drive actually writes, which makes a delay between files.
April 19, 2010 Author
> What are you monitoring with? Some things like WinXP will display their cache, not the real write status like Vista/7. This is why a Vista/7 copy looks slower: it's showing what's been written, not what's been copied to the system cache. It's a big problem with USB drives, as you think stuff's copied but it takes a good few seconds before the drive actually writes, which makes a delay between files.

No XP here. With Nautilus, Dolphin, Thunar, and Mac OS X, I experience the same issue.
April 19, 2010
>> What are you monitoring with? Some things like WinXP will display their cache, not the real write status like Vista/7. This is why a Vista/7 copy looks slower: it's showing what's been written, not what's been copied to the system cache. It's a big problem with USB drives, as you think stuff's copied but it takes a good few seconds before the drive actually writes, which makes a delay between files.

> No XP here. With Nautilus, Dolphin, Thunar, and Mac OS X, I experience the same issue.

Long ago I recall this being an issue if you did not define a nameserver on unRAID. Do you have one defined?
April 19, 2010
See if it happens when moving stuff on the box itself, not over the network... maybe look at the HDD LED?
April 19, 2010 Author
> Sounds more like a networking issue to me.

Joe, how so? How does the network factor in with the end of a file? Shouldn't this happen at other times as well then? Your reputation preceding you, I will test with a crossover cable going directly between the RB-1200 and another machine and see what happens.

In the interest of full disclosure (and while I try to maintain an open mind), I really don't think it's a networking issue. Not to repeat myself, but if it were, I'd think it would hang or bog down in the middle of copying 5 & 10GB files, which it doesn't seem to do. It does, however, seem to finish the copy and then pause a moment after copying the 10GB file. It feels like what you see in other systems when parity is calculated after the fact, and the write status isn't returned until after the parity has been calculated and written.
April 19, 2010
>> Sounds more like a networking issue to me.

> Joe, how so? How does the network factor in with the end of a file? Shouldn't this happen at other times as well then? Your reputation preceding you, I will test with a crossover cable going directly between the RB-1200 and another machine and see what happens. In the interest of full disclosure (and while I try to maintain an open mind), I really don't think it's a networking issue. Not to repeat myself, but if it were, I'd think it would hang or bog down in the middle of copying 5 & 10GB files, which it doesn't seem to do. It does, however, seem to finish the copy and then pause a moment after copying the 10GB file. It feels like what you see in other systems when parity is calculated after the fact, and the write status isn't returned until after the parity has been calculated and written.

Parity is calculated as you write to the drive; there is no delay. (Unless you are writing to a cache drive, in which case the copy to the protected array happens in the middle of the night.)
April 20, 2010 Author
Joe,

Strangely, it may have been DNS, though I can see no earthly reason why. I did have the router's IP address in there for DNS, which works, albeit not ideally. I've now put in the IPs of actual DNS servers, and it seems like it might be better, but it's hard to tell.

It's hard to tell because I'm also doing a drive upgrade right now on the RB1200, and I'm still getting the occasional pause, though it seems less severe and I'm willing to attribute it to the ongoing rebuild/expansion. (I've noticed the pausing/hanging for weeks; the rebuild just started 2 hours ago.)

I'll post something more conclusive after the rebuild and after testing with a crossover cable.
April 20, 2010
For the longest time I had these being invoked:

```shell
nameserver_ip="192.168.2.1"

# Append the nameserver only when grep reports "no match" (exit status 1).
grep -F "$nameserver_ip" /etc/resolv.conf >/dev/null 2>&1
if [ $? = 1 ]
then
    echo "nameserver $nameserver_ip" >>/etc/resolv.conf
fi
```

and this:

```shell
#!/bin/bash
# Fixes /etc/hosts with proper hostname information

HOSTNAME=$(hostname)

if grep "$HOSTNAME" /etc/hosts >/dev/null
then
    echo "hostname: '$HOSTNAME' already in hosts. skipping"
    exit
fi

# Remove this crappy entry.
grep -v 'darkstar.example.net' < /etc/hosts > /tmp/hosts

# Get current ifconfig information and use it to get address
ifconfig | awk -vhostname="$HOSTNAME" '
{
   # inet addr:192.168.1.178  Bcast:192.168.1.255  Mask:255.255.255.0
   # $1   $2                  $3                   $4
   if ( /inet addr:/ && /Bcast:/ && /Mask:/ )
   {
      addr=$2
      gsub("addr:","",addr);
      printf("%s\t%s\n",addr,hostname);
   }
}
' >> /tmp/hosts

# Fall back to loopback if no interface address was found.
if ! grep "$HOSTNAME" /tmp/hosts > /dev/null
then
    printf "127.0.0.1\t%s\n" "$HOSTNAME" >> /tmp/hosts
fi

cat /tmp/hosts > /etc/hosts
rm -f /tmp/hosts
```

Obviously, you would need to put in the correct nameserver for your network.
April 20, 2010 Author
One other thing worth mentioning: this never seems to happen with read operations. I can copy off the lime box without pauses/hangs. Even with the rebuild/upgrade going, I'm getting a solid 11.5MB/s (+/- 1MB/s).
April 20, 2010
Similar delays have been observed when there was no Master Browser on the subnet. Have you tried enabling unRAID to be the Master Browser? (shot in the dark)
April 20, 2010
WOW, that script looked familiar... It looked like something I would write. Then I realized: I did. Hahahaha, threw me for a loop. That was funny.
April 20, 2010 Author
> Similar delays have been observed when there was no Master Browser on the subnet. Have you tried enabling unRAID to be the Master Browser? (shot in the dark)

Thanks, if only: local master=yes
April 20, 2010
I too think that it is likely something network related. More specifically, it may be Samba related. Have a look at my smb-extra.conf and see if you can pick anything useful from there.
April 20, 2010 Author
> I too think that it is likely something network related. More specifically, it may be Samba related. Have a look at my smb-extra.conf and see if you can pick anything useful from there.

Good stuff, but this is happening with NFS, so I don't know how it could relate to my issues.
April 20, 2010
There are other events that occur on unRAID, such as buffer caching and reiserfs allocation.

When copies fail at the start, I look at reiserfs allocation delay. This is prominent with 5400 RPM drives and a near-capacity drive.

When copies fail at the end, I look at how much data was moved, how much RAM there is, and how busy the parity drive is. A lot of data gets cached, and when the file is finally closed, the data is flushed, causing a massive parity update.

shfs usage takes some performance out of the equation. Does the issue occur when writing directly to a disk share?

Not having DNS set up correctly could cause issues. This is usually on initial connect.
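One way to check the flush theory above is to watch how much written data is still sitting in RAM while a copy runs. The following is only a sketch: it assumes a Linux kernel that exposes the `Dirty:` and `Writeback:` counters in `/proc/meminfo`, and the helper name `dirty_summary` is made up for illustration.

```shell
#!/bin/sh
# Print the Dirty and Writeback counters (in kB) from meminfo-style input.
# Reads stdin so it can be fed /proc/meminfo or a saved copy of it.
# "dirty_summary" is a hypothetical helper name, not an unRAID tool.
dirty_summary() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { printf "%s %s kB\n", $1, $2 }'
}

# Typical use on the server while a copy is running (Linux only):
#   while sleep 1; do dirty_summary < /proc/meminfo; done
```

If `Dirty` climbs during the copy and then drains during the pause at the end of the file, that would be consistent with a large flush on close. It would not prove it.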
April 20, 2010 Author
> There are other events that occur on unRAID, such as buffer caching and reiserfs allocation. When copies fail at the start, I look at reiserfs allocation delay. This is prominent with 5400 RPM drives and a near-capacity drive. When copies fail at the end, I look at how much data was moved, how much RAM there is, and how busy the parity drive is. A lot of data gets cached, and when the file is finally closed, the data is flushed, causing a massive parity update. shfs usage takes some performance out of the equation. Does the issue occur when writing directly to a disk share? Not having DNS set up correctly could cause issues. This is usually on initial connect.

I like this theory; it fits with what I know of my symptoms. What do you recommend as a next step to resolving this? Does this really still apply if parity is being written later, as I'm using a cache drive?
April 20, 2010
> Does this really still apply if parity is being written later, as I'm using a cache drive?

It's a buffer cache flush. Go on your unRAID system, do a top and watch the %wa. If you have high %wa then there is a wait on I/O.

> I have a Raptor in the cache slot at work and a Samsung F2 in the cache slot at home. I regularly get ~50MB/s writes both at home and at work. But at home I get a 3-15(!) second pause after writing each and every file to the RB-1210. I was hoping adding the cache drive would help, but it made NO difference (I told all of the relevant shares to use the cache disk).

It could have something to do with the drive model. That is a 5400 RPM drive. Try using a 7200 RPM drive or a Samsung F3. I benchmarked the Samsung F3 at 80MB/s writes, 149MB/s reads. I benchmarked the Seagate 1.5TB 7200 RPM drive at 125MB/s writes on an empty filesystem.

You can try this: cd to your cache drive and run a dd speed test.

```shell
dd if=/dev/zero of=test.dd bs=1024 count=4096000
```
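The %wa figure that top shows can also be logged from a script, which makes it easier to line up with the moment a copy pauses. This is a sketch under assumptions: it relies on the aggregate `cpu` line of `/proc/stat`, where the fifth counter after the label is iowait, and `iowait_pct` is a made-up helper name.

```shell
#!/bin/sh
# iowait_pct "<first cpu line>" "<second cpu line>"
# Takes two samples of the aggregate "cpu" line from /proc/stat and prints
# the share of elapsed CPU time spent in iowait, as a whole percentage.
# "iowait_pct" is a hypothetical helper name used only for this sketch.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
    {
        # Sum user..steal (fields 2-9); guest time is already part of user.
        n = (NF < 9) ? NF : 9
        total = 0
        for (i = 2; i <= n; i++) total += $i
        t[NR] = total
        w[NR] = $6          # iowait counter
    }
    END {
        d = t[2] - t[1]
        if (d > 0) printf "%d\n", (w[2] - w[1]) * 100 / d
        else print 0
    }'
}

# Typical use on the server (Linux only):
#   a=$(grep "^cpu " /proc/stat); sleep 5; b=$(grep "^cpu " /proc/stat)
#   iowait_pct "$a" "$b"
```

Sampling over the few seconds right after a file finishes copying should show whether the pause coincides with a spike in I/O wait.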
April 20, 2010 Author
> It's a buffer cache flush. Go on your unRAID system, do a top and watch the %wa. If you have high %wa then there is a wait on I/O.

And this is exactly what I've seen in the past. What's the remedy?

> It could have something to do with the drive model. That is a 5400 RPM drive. Try using a 7200 RPM drive or a Samsung F3. I benchmarked the Samsung F3 at 80MB/s writes, 149MB/s reads. I benchmarked the Seagate 1.5TB 7200 RPM drive at 125MB/s writes on an empty filesystem. You can try this: cd to your cache drive and run a dd speed test. dd if=/dev/zero of=test.dd bs=1024 count=4096000

I had the drive model wrong; it's actually an F1. Sorry for the confusion, I'll edit my original post. (SAMSUNG HD753LJ SATAII 7200rpm, 32MB). Here's a full inventory.

St Slot Disk Dev Share Manu Model Name
OK       parity  sdj  Seagate  2T    ST32000542AS         6XW01DZQ
OK       disk1   sdl  Samsung  1T    SAMSUNG HD103UJ      S1PVJ1CQ900899
OK       disk2   sdk  Seagate  2T    ST32000542AS         6XW0TEXW
OK       disk3   sdm  Samsung  1T    SAMSUNG HD103UI      S1LMJ90S312595
OK       disk4   sda  Seagate  1T    ST31000340NS         9QJ2MMX3
OK       disk5   sdb  WD       1T    WDC WD10EACS-00ZJB0  WD-WCASJ0604634
OK       disk6   sdc  WD       1T    WDC WD10EACS-07D6B0  WD-WCAU45498602
OK       disk7   sdd  Seagate  1T    ST31000340AS         9QJ032XS
OK       disk8   sde  WD       1T    WDC WD10EACS-00ZJB0  WD-WCASJ1188131
OK       disk9   sdg  WD       1T    WDC WD10EACS-00ZJB0  WD-WCASJ0601314
INVALID  disk10  sdh  Samsung  2T    SAMSUNG HD203WI      S1UYJ1CZ312372  (This is the drive in the process of replacing a smaller one)
BT       flash   sdf  Unknown  2G    Lexar JD FireFly     SE3OUE0PQP65A9KF61PF
MT       cache   sdi  Samsung  750G  SAMSUNG HD753LJ      S13UJ1MPC16788
April 20, 2010
Try the speed test below; that gives you the best possible case for any writes. Check your syslog and make sure you are operating at 3.0 Gb/s.
April 20, 2010 Author
> Try the speed test below; that gives you the best possible case for any writes. Check your syslog and make sure you are operating at 3.0 Gb/s.

Sorry, I forgot to say: I am planning on running the speed test once the rebuild is complete. I've checked all of the drives: no jumpers, they're all at 3Gb/s. Strangely, syslog restarted in the night, resetting the output, so the log doesn't go back far enough. Thanks for all the help.
April 20, 2010
> Strangely, syslog restarted in the night, resetting the output, so the log doesn't go back far enough.

There would be another syslog, such as syslog.0 or syslog.1.
April 20, 2010 Author
>> Strangely, syslog restarted in the night, resetting the output, so the log doesn't go back far enough.

> There would be another syslog, such as syslog.0 or syslog.1.

Right you are!

```
aobrien@aob-tower:~/Desktop$ grep link lime-syslog*
Apr 19 16:53:18 lime kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata2: hard resetting link
Apr 19 16:53:18 lime kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata3: hard resetting link
Apr 19 16:53:18 lime kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata4: hard resetting link
Apr 19 16:53:18 lime kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata6: hard resetting link
Apr 19 16:53:18 lime kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata7: hard resetting link
Apr 19 16:53:18 lime kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 19 16:53:18 lime kernel: ata8: hard resetting link
Apr 19 16:53:18 lime kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
```
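Output like the grep above can be condensed into one line per port, showing the last negotiated speed and how many hard resets were logged. This is a rough sketch that assumes the kernel message wording seen in this thread ("SATA link up <speed> Gbps" and "hard resetting link"); the function name `sata_link_summary` is invented for the example.

```shell
#!/bin/sh
# Summarize SATA link messages from syslog text on stdin:
# one line per ataN port with its last reported speed and reset count.
# "sata_link_summary" is a hypothetical helper name.
sata_link_summary() {
    awk '
    {
        # Locate the "ataN:" token, wherever the syslog prefix puts it.
        port = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9]+:$/) { port = $i; sub(/:$/, "", port); pos = i; break }
        if (port == "") next
        seen[port] = 1
    }
    /SATA link up/        { speed[port] = $(pos + 4) " " $(pos + 5) }
    /hard resetting link/ { resets[port]++ }
    END {
        for (p in seen) {
            s = (p in speed) ? speed[p] : "unknown"
            printf "%s %s resets=%d\n", p, s, resets[p]
        }
    }' | sort
}

# Typical use, with the file names from the post above:
#   cat lime-syslog* | sata_link_summary
```

In the log posted above every entry carries the same timestamp, so the summary would only describe what happened at that one moment. Comparing it with a later syslog would show whether resets recur during normal operation.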
April 20, 2010
>> It's a buffer cache flush. Go on your unRAID system, do a top and watch the %wa. If you have high %wa then there is a wait on I/O.

> And this is exactly what I've seen in the past. What's the remedy?

I used to experience heavy I/O waits in the past on my (underpowered) server. They were gone once I switched the scheduler and tweaked the buffer settings, as you'll find in my 'go' script that I linked you to in my earlier post. You can experiment with that.

---

> Apr 19 16:53:18 lime kernel: ata2: hard resetting link
> Apr 19 16:53:18 lime kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Apr 19 16:53:18 lime kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Apr 19 16:53:18 lime kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Apr 19 16:53:18 lime kernel: ata3: hard resetting link
> Apr 19 16:53:18 lime kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Apr 19 16:53:18 lime kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Apr 19 16:53:18 lime kernel: ata4: hard resetting link
> Apr 19 16:53:18 lime kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Apr 19 16:53:18 lime kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Apr 19 16:53:18 lime kernel: ata6: hard resetting link
> Apr 19 16:53:18 lime kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Apr 19 16:53:18 lime kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Apr 19 16:53:18 lime kernel: ata7: hard resetting link
> Apr 19 16:53:18 lime kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Apr 19 16:53:18 lime kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Apr 19 16:53:18 lime kernel: ata8: hard resetting link

That doesn't look good! Can it be cabling problems?
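The 'go' script mentioned above is not shown in the thread, so the following is only a guess at the kind of scheduler check and change it might contain. It assumes the standard Linux sysfs file `/sys/block/<dev>/queue/scheduler`, where the active scheduler appears in square brackets; the helper names and the choice of `deadline` are illustrative assumptions, not the poster's actual settings.

```shell
#!/bin/sh
# active_scheduler "<contents of a queue/scheduler file>"
# Prints the name shown in square brackets, e.g. "cfq" for
# "noop anticipatory deadline [cfq]". Hypothetical helper name.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# List the active I/O scheduler for every sdX disk (Linux only).
show_schedulers() {
    for f in /sys/block/sd*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f#/sys/block/}
        dev=${dev%%/*}
        printf '%s %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
    done
}

# Switching one disk, as root (assumes the kernel offers "deadline";
# sdX is a placeholder for the real device name):
#   echo deadline > /sys/block/sdX/queue/scheduler
```

A change made this way lasts only until reboot, which is presumably why it would live in a startup script.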