ro345 Posted July 16, 2016

I posted this a few months ago and got no response. I've since upgraded to Unraid 6 and I'm still seeing it. Load on the server will spike for 15 or 30 minutes at a time. The server basically becomes unresponsive, shares disconnect, and running any "hdparm -tT" command on any drive leads to a hung session. There is nothing in the logs at all to indicate a problem.

The top output below was captured while the server was doing a parity check. It's been running for about 6 hours with 6 more hours to go. The load spikes for 15 or 30 minutes at a time, and the server becomes unresponsive (I can still run top and the CLI is responsive, but hdparm -tT hangs, I cannot copy files to the server, and the share is unresponsive). Notice the very high load.

top - 10:22:37 up 7:13, 3 users, load average: 14.47, 9.39, 5.03
Tasks: 302 total, 2 running, 300 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.4%us, 0.6%sy, 0.0%ni, 90.6%id, 8.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 66037640k total, 55752540k used, 10285100k free, 612692k buffers
Swap: 0k total, 0k used, 0k free, 54185592k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 4603 root  20   0     0    0    0 S    9  0.0 138:39.53 unraidd
 4550 root  20   0     0    0    0 D    2  0.0  34:48.37 mdrecoveryd
 4530 root  20   0 89120 3452 2960 S    1  0.0   1:00.03 emhttp
    8 root  20   0     0    0    0 S    0  0.0   0:48.83 rcu_preempt
 1085 root  20   0     0    0    0 S    0  0.0   0:11.12 kworker/8:1
 1166 root   0 -20     0    0    0 S    0  0.0   0:51.37 kworker/10:1H
 4545 root   0 -20     0    0    0 S    0  0.0   0:50.38 kworker/11:1H
 4693 root  20   0  619m  25m  552 S    0  0.0  80:30.11 shfs
 7722 root  20   0     0    0    0 S    0  0.0   0:10.40 kworker/11:2
17944 root  20   0 26624 4772 4316 S    0  0.0   0:00.02 sshd
    1 root  20   0  4368 1400 1300 S    0  0.0   0:14.69 init
    2 root  20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root  20   0     0    0    0 S    0  0.0   0:18.13 ksoftirqd/0
    5 root   0 -20     0    0    0 S    0  0.0   0:00.00 kworker/0:0H
    6 root  20   0     0    0    0 S    0  0.0   0:00.01 kworker/u48:0
    9 root  20   0     0    0    0 S    0  0.0   0:00.00 rcu_sched
   10 root  20   0     0    0    0 S    0  0.0   0:00.00 rcu_bh
It's a fairly powerful box. I have dual hex-core Xeons, 64GB of RAM, and an M1015 controller in IT mode. Even under high load, I would still expect Samba to respond in a timely fashion, within a few seconds rather than several minutes. Right now, I can't even do an "hdparm -tT /dev/xxx". It just sits and waits. When not in the degraded condition, I can copy files out of the server at wire speed, normally around 110 megabytes per second. The degraded condition tends to come and go. It seems to happen most when running preclear or forced parity checks, but it also happens at other times randomly. No issues in the SMART report. Thanks for any help.
JonathanM Posted July 16, 2016

Diagnostics zip? Without that to go on, my blind guess is that you are using ReiserFS disks, and are running into the extended housekeeping that a large fragmented nearly full ReiserFS disk experiences when you try to write to it.
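One way to check the ReiserFS guess without a diagnostics zip is to look at the filesystem type column in /proc/mounts. The following is only a minimal sketch: the function name `mounts_by_fstype` and the sample text are made up for illustration and are not taken from the server in this thread.

```python
def mounts_by_fstype(mounts_text):
    """Group mount points by filesystem type.

    Expects text in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    grouped = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        grouped.setdefault(fstype, []).append(mount_point)
    return grouped


# Hypothetical sample, made up for illustration (not taken from the server).
SAMPLE = """\
/dev/md1 /mnt/disk1 reiserfs rw,noatime 0 0
/dev/md2 /mnt/disk2 xfs rw,noatime 0 0
/dev/sda1 /boot vfat rw 0 0
"""

# List the mount points that use reiserfs in the sample.
print(mounts_by_fstype(SAMPLE).get("reiserfs", []))
```

On a live server, the contents of /proc/mounts would be passed in place of the sample string.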
ro345 Posted July 16, 2016 Author

I think you are on to something. Even though I have nearly 1TB left, the filesystem is very full.

root@Tower:/var/log# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           128M  1.9M  127M   2% /var/log
/dev/sda1       7.5G   62M  7.5G   1% /boot
/dev/md1        3.7T  3.6T  115G  97% /mnt/disk1
/dev/md2        3.7T  3.5T  147G  97% /mnt/disk2
/dev/md3        3.7T  3.6T  112G  97% /mnt/disk3
/dev/md4        2.8T  2.6T  149G  95% /mnt/disk4
/dev/md5        3.7T  3.5T  230G  94% /mnt/disk5
/dev/md6        3.7T  3.5T  230G  94% /mnt/disk6
shfs             21T   20T  981G  96% /mnt/user
/dev/loop0      1.8M   80K  1.6M   5% /etc/libvirt

Does that kind of utilization cause these issues in reiserfs? Is there anything I can do about it (short of adding more disks)? Would converting to XFS help (I'd rather not do that)? Can reiserfs be optimized? I can't post the diagnostics zip due to my company's security rules (serial numbers, IP addresses, etc.). Thanks!
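For anyone wanting to watch for this condition, here is a rough sketch that picks the nearly full filesystems out of `df -h` output. The function name `nearly_full` and the 90% threshold are assumptions chosen for illustration, not a known ReiserFS limit; the input is the df output from the post above. It assumes mount points contain no spaces.

```python
def nearly_full(df_output, threshold=90):
    """Return (mount_point, use_percent) pairs at or above threshold.

    The threshold is an illustrative assumption, not a ReiserFS-specific value.
    """
    flagged = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Expect: filesystem, size, used, avail, use%, mount point
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        use = int(fields[4].rstrip("%"))
        if use >= threshold:
            flagged.append((fields[5], use))
    return flagged


# df -h output as posted in this thread.
DF_OUTPUT = """\
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           128M  1.9M  127M   2% /var/log
/dev/sda1       7.5G   62M  7.5G   1% /boot
/dev/md1        3.7T  3.6T  115G  97% /mnt/disk1
/dev/md2        3.7T  3.5T  147G  97% /mnt/disk2
/dev/md3        3.7T  3.6T  112G  97% /mnt/disk3
/dev/md4        2.8T  2.6T  149G  95% /mnt/disk4
/dev/md5        3.7T  3.5T  230G  94% /mnt/disk5
/dev/md6        3.7T  3.5T  230G  94% /mnt/disk6
shfs             21T   20T  981G  96% /mnt/user
/dev/loop0      1.8M   80K  1.6M   5% /etc/libvirt
"""

for mount, use in nearly_full(DF_OUTPUT):
    print(f"{mount}: {use}% used")
```

With the numbers posted here, every array disk and the user share land above the threshold, while /var/log, /boot and /etc/libvirt do not.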
Squid Posted July 16, 2016

With dual hex-core CPUs, a load average of 14 is actually nothing. You've got a total of 12 cores + 12 hyperthreaded cores, and load averages are related to total cores. A load average of 24 means that all cores are fully utilized. Anything less means you still have room to spare. On the other hand, if you only had a single core, then 14 would be rather extreme. http://www.howtogeek.com/194642/understanding-the-load-average-on-linux-and-other-unix-like-systems/

As JonathanM stated, it's probably reiserfs.
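The arithmetic behind that comparison can be sketched as follows. The helper name `load_per_logical_cpu` is made up for illustration, and the count of 24 logical CPUs assumes 2 sockets x 6 cores x 2 hyperthreads as described in the thread. The load figures are the ones from the posted top output.

```python
def load_per_logical_cpu(load_avg, logical_cpus):
    """Normalize a load average by the number of logical CPUs."""
    return load_avg / logical_cpus


# Assumption: 2 sockets x 6 cores x 2 hyperthreads = 24 logical CPUs.
LOGICAL_CPUS = 24

# Load averages from the top output posted in this thread.
for label, load in (("1 min", 14.47), ("5 min", 9.39), ("15 min", 5.03)):
    ratio = load_per_logical_cpu(load, LOGICAL_CPUS)
    print(f"{label}: {ratio:.2f} per logical CPU")
```

A ratio below 1.0 suggests the CPUs are not saturated. One caveat: on Linux the load average also counts tasks in uninterruptible sleep (state D, like mdrecoveryd in the top output), so with 90.6% idle and 8.3% wa the load here most likely reflects tasks waiting on disk rather than on CPU, which would fit the reiserfs explanation. On a live box, `os.getloadavg()` and `os.cpu_count()` from the Python standard library could supply the inputs.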