High Load on Server / Unresponsive

July 16, 2016

I posted this a few months ago and got no response. I've since upgraded to Unraid 6 and I'm still seeing it. Load on the server will spike for 15 or 30 minutes at a time. The server basically becomes unresponsive: shares disconnect, and running "hdparm -tT" on any drive leads to a hung session. There is nothing in the logs at all to indicate a problem. The top output below was taken while the server was doing a parity check. It's been running for about 6 hours with 6 more hours to go. During a spike I can still run top and the CLI is responsive, but hdparm -tT hangs, I cannot copy files to the server, and the share is unresponsive. Notice the very high load.

top - 10:22:37 up 7:13, 3 users, load average: 14.47, 9.39, 5.03
Tasks: 302 total, 2 running, 300 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.4%us, 0.6%sy, 0.0%ni, 90.6%id, 8.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 66037640k total, 55752540k used, 10285100k free, 612692k buffers
Swap: 0k total, 0k used, 0k free, 54185592k cached

  PID USER PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 4603 root 20   0     0    0    0 S    9  0.0 138:39.53 unraidd
 4550 root 20   0     0    0    0 D    2  0.0  34:48.37 mdrecoveryd
 4530 root 20   0 89120 3452 2960 S    1  0.0   1:00.03 emhttp
    8 root 20   0     0    0    0 S    0  0.0   0:48.83 rcu_preempt
 1085 root 20   0     0    0    0 S    0  0.0   0:11.12 kworker/8:1
 1166 root  0 -20     0    0    0 S    0  0.0   0:51.37 kworker/10:1H
 4545 root  0 -20     0    0    0 S    0  0.0   0:50.38 kworker/11:1H
 4693 root 20   0  619m  25m  552 S    0  0.0  80:30.11 shfs
 7722 root 20   0     0    0    0 S    0  0.0   0:10.40 kworker/11:2
17944 root 20   0 26624 4772 4316 S    0  0.0   0:00.02 sshd
    1 root 20   0  4368 1400 1300 S    0  0.0   0:14.69 init
    2 root 20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root 20   0     0    0    0 S    0  0.0   0:18.13 ksoftirqd/0
    5 root  0 -20     0    0    0 S    0  0.0   0:00.00 kworker/0:0H
    6 root 20   0     0    0    0 S    0  0.0   0:00.01 kworker/u48:0
    9 root 20   0     0    0    0 S    0  0.0   0:00.00 rcu_sched
   10 root 20   0     0    0    0 S    0  0.0   0:00.00 rcu_bh

It's a fairly powerful box. I have dual hex-core Xeons, 64GB of RAM, and an M1015 controller in IT mode. Even under high load, I would still expect Samba to respond in a timely fashion, within a few seconds rather than several minutes. Right now, I can't even do an "hdparm -tT /dev/xxx". It just sits and waits. When not in the degraded condition, I can copy files out of the server at wire speed, normally around 110 megabytes per second. The degraded condition tends to come and go. It seems to happen most when running preclear or forced parity checks, but it also happens at other times randomly. No issues in the SMART report.

Thanks for any help.
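A shell that stays responsive while hdparm hangs is consistent with the stalled commands sitting in uninterruptible sleep (state D, the state mdrecoveryd shows in the top output above). The following is a minimal sketch for listing such tasks, assuming a Linux /proc filesystem; the function names are illustrative and not part of Unraid, and only each process's main thread is inspected.

```python
import glob


def parse_stat_state(stat_text):
    """Return (pid, comm, state) from the contents of a /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' instead of whitespace.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep ('D')."""
    blocked = []
    for path in glob.glob(f"{proc_root}/[0-9]*/stat"):
        try:
            with open(path) as handle:
                pid, comm, state = parse_stat_state(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the file was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)
```

Calling `blocked_tasks()` a few times during a spike would show whether the same commands stay stuck in state D, which points at storage I/O rather than CPU.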

July 16, 2016

Diagnostics zip?

Without that to go on, my blind guess is that you are using ReiserFS disks, and are running into the extended housekeeping that a large fragmented nearly full ReiserFS disk experiences when you try to write to it.

July 16, 2016

Author

I think you are on to something. Even though I have nearly 1TB left, the filesystem is very full.

root@Tower:/var/log# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           128M  1.9M  127M   2% /var/log
/dev/sda1       7.5G   62M  7.5G   1% /boot
/dev/md1        3.7T  3.6T  115G  97% /mnt/disk1
/dev/md2        3.7T  3.5T  147G  97% /mnt/disk2
/dev/md3        3.7T  3.6T  112G  97% /mnt/disk3
/dev/md4        2.8T  2.6T  149G  95% /mnt/disk4
/dev/md5        3.7T  3.5T  230G  94% /mnt/disk5
/dev/md6        3.7T  3.5T  230G  94% /mnt/disk6
shfs             21T   20T  981G  96% /mnt/user
/dev/loop0      1.8M   80K  1.6M   5% /etc/libvirt

Does that kind of utilization cause these issues in ReiserFS? Is there anything I can do about it, short of adding more disks? Would converting to XFS help (I'd rather not do that)? Can ReiserFS be optimized?

I can't post the diagnostic.zip due to my company's security rules (serial numbers, IP address, etc). Thanks!
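To make the df listing above easier to scan, here is a small sketch that flags nearly full filesystems from `df -h` text. The function name `nearly_full` and the 95% threshold are assumptions chosen for illustration; the thread does not state a specific fill level at which ReiserFS starts to struggle.

```python
SAMPLE_DF = """\
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           128M  1.9M  127M   2% /var/log
/dev/sda1       7.5G   62M  7.5G   1% /boot
/dev/md1        3.7T  3.6T  115G  97% /mnt/disk1
/dev/md2        3.7T  3.5T  147G  97% /mnt/disk2
/dev/md3        3.7T  3.6T  112G  97% /mnt/disk3
/dev/md4        2.8T  2.6T  149G  95% /mnt/disk4
/dev/md5        3.7T  3.5T  230G  94% /mnt/disk5
/dev/md6        3.7T  3.5T  230G  94% /mnt/disk6
shfs             21T   20T  981G  96% /mnt/user
/dev/loop0      1.8M   80K  1.6M   5% /etc/libvirt
"""


def nearly_full(df_output, threshold=95):
    """Return [(mount_point, use_percent)] for rows at or above threshold.

    The threshold is an illustrative assumption, not a ReiserFS limit.
    """
    flagged = []
    for line in df_output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # prompt lines, blank lines, wrapped rows
        try:
            use_percent = int(fields[4].rstrip("%"))
        except ValueError:
            continue  # header row ("Use%") or other non-numeric field
        if use_percent >= threshold:
            # Re-join in case the mount point contains spaces.
            flagged.append((" ".join(fields[5:]), use_percent))
    return flagged
```

With the sample data, `nearly_full(SAMPLE_DF)` reports disk1 through disk4 and the /mnt/user share, while disk5 and disk6 at 94% fall just under the assumed threshold.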

July 16, 2016

With dual hex-core CPUs, a load average of 14 is actually nothing.

You've got a total of 12 cores + 12 hyperthreaded cores, and load averages are related to total cores. A load average of 24 means that all cores are fully utilized. Anything less means you still have room to spare.

On the other hand, if you only had a single core, then 14 would be rather extreme.

http://www.howtogeek.com/194642/understanding-the-load-average-on-linux-and-other-unix-like-systems/
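The per-core arithmetic above can be sketched as follows, using the figures from the thread (load 14.47, 24 logical CPUs). The function names are illustrative. One caveat worth keeping in mind: on Linux the load average also counts tasks in uninterruptible sleep (typically waiting on disk I/O), so a high figure can coexist with mostly idle CPUs, as in the posted top output (90.6%id, 8.3%wa).

```python
import re


def parse_load_averages(top_header):
    """Extract the 1, 5 and 15 minute load averages from top's first line."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header)
    if match is None:
        raise ValueError("no load average found in input")
    return tuple(float(value) for value in match.groups())


def load_per_cpu(load_average, logical_cpus):
    """Normalize a load average by the number of logical CPUs."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return load_average / logical_cpus


header = "top - 10:22:37 up 7:13, 3 users, load average: 14.47, 9.39, 5.03"
one_minute, five_minute, fifteen_minute = parse_load_averages(header)
# 12 physical cores with hyperthreading, as described in the thread.
per_cpu = load_per_cpu(one_minute, 24)
```

Here `per_cpu` comes out at roughly 0.6 runnable-or-blocked tasks per logical CPU, whereas the same load on a single core would be about 14.5.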

As johnathanm stated, it's probably ReiserFS.
