
High Load on Server / Unresponsive



I posted this a few months ago and got no response. I've since upgraded to Unraid 6 and I'm still seeing it. Load on the server spikes for 15 to 30 minutes at a time. During a spike the server becomes basically unresponsive: shares disconnect, I cannot copy files to the server, and running "hdparm -tT" against any drive hangs the session, although top still runs and the CLI stays responsive. There is nothing in the logs to indicate a problem. The top output below was taken while the server was running a parity check, which had been going for about 6 hours with 6 more to go. Notice the very high load.

 

top - 10:22:37 up  7:13,  3 users,  load average: 14.47, 9.39, 5.03
Tasks: 302 total,   2 running, 300 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.4%us,  0.6%sy,  0.0%ni, 90.6%id,  8.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  66037640k total, 55752540k used, 10285100k free,   612692k buffers
Swap:        0k total,        0k used,        0k free, 54185592k cached

  PID USER      PR  NI   VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
 4603 root      20   0      0     0     0 S    9  0.0 138:39.53 unraidd
 4550 root      20   0      0     0     0 D    2  0.0  34:48.37 mdrecoveryd
 4530 root      20   0  89120  3452  2960 S    1  0.0   1:00.03 emhttp
    8 root      20   0      0     0     0 S    0  0.0   0:48.83 rcu_preempt
 1085 root      20   0      0     0     0 S    0  0.0   0:11.12 kworker/8:1
 1166 root       0 -20      0     0     0 S    0  0.0   0:51.37 kworker/10:1H
 4545 root       0 -20      0     0     0 S    0  0.0   0:50.38 kworker/11:1H
 4693 root      20   0   619m   25m   552 S    0  0.0  80:30.11 shfs
 7722 root      20   0      0     0     0 S    0  0.0   0:10.40 kworker/11:2
17944 root      20   0  26624  4772  4316 S    0  0.0   0:00.02 sshd
    1 root      20   0   4368  1400  1300 S    0  0.0   0:14.69 init
    2 root      20   0      0     0     0 S    0  0.0   0:00.00 kthreadd
    3 root      20   0      0     0     0 S    0  0.0   0:18.13 ksoftirqd/0
    5 root       0 -20      0     0     0 S    0  0.0   0:00.00 kworker/0:0H
    6 root      20   0      0     0     0 S    0  0.0   0:00.01 kworker/u48:0
    9 root      20   0      0     0     0 S    0  0.0   0:00.00 rcu_sched
   10 root      20   0      0     0     0 S    0  0.0   0:00.00 rcu_bh

 

It's a fairly powerful box: dual hex-core Xeons, 64GB of RAM, and an M1015 controller in IT mode. Even under high load, I would expect Samba to respond within a few seconds rather than several minutes. Right now, I can't even do an "hdparm -tT /dev/xxx". It just sits and waits. When the server is not in the degraded condition, I can copy files off it at wire speed, normally around 110 megabytes per second. The degraded condition comes and goes. It seems to happen most during preclears or forced parity checks, but it also happens randomly at other times. The SMART reports show no issues.
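Since the top output shows mdrecoveryd in state D (uninterruptible sleep, usually a task waiting on disk I/O), it may help to see which tasks are blocked during a stall. The following is a minimal sketch in Python, assuming a Linux /proc filesystem; the function names `parse_stat_line` and `blocked_tasks` are made up for illustration and are not part of Unraid.

```python
import os

def parse_stat_line(line):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx].strip())
    comm = line[open_idx + 1:close_idx]
    state = line[close_idx + 1:].split()[0]
    return pid, comm, state

def blocked_tasks(proc_root="/proc"):
    """List (pid, command) for tasks currently in uninterruptible sleep (D)."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux system, or /proc is not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were reading it
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `blocked_tasks()` a few times while the shares are hung and comparing the results would show whether the same tasks (for example hdparm or smbd) stay stuck in D state for the whole spike.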

 

Thanks for any help.

Link to comment

I think you are on to something.  Even though I have nearly 1TB left, the filesystem is very full.

 

root@Tower:/var/log# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           128M  1.9M  127M   2% /var/log
/dev/sda1       7.5G   62M  7.5G   1% /boot
/dev/md1        3.7T  3.6T  115G  97% /mnt/disk1
/dev/md2        3.7T  3.5T  147G  97% /mnt/disk2
/dev/md3        3.7T  3.6T  112G  97% /mnt/disk3
/dev/md4        2.8T  2.6T  149G  95% /mnt/disk4
/dev/md5        3.7T  3.5T  230G  94% /mnt/disk5
/dev/md6        3.7T  3.5T  230G  94% /mnt/disk6
shfs             21T   20T  981G  96% /mnt/user
/dev/loop0      1.8M   80K  1.6M   5% /etc/libvirt
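For anyone who wants to keep an eye on this, here is a rough Python sketch that picks the fullest filesystems out of `df -h` style output like the listing above. The function name `full_filesystems` and the 90% default threshold are assumptions chosen for illustration; the threshold is not a documented ReiserFS limit.

```python
def full_filesystems(df_output, threshold=90):
    """Return (mount_point, use_percent) for rows at or above threshold.

    Expects the usual df columns:
    Filesystem  Size  Used  Avail  Use%  Mounted on
    """
    rows = []
    for line in df_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue  # blank line, shell prompt, or unexpected layout
        try:
            pct = int(fields[4].rstrip("%"))
        except ValueError:
            continue  # header row, where the column reads "Use%"
        if pct >= threshold:
            # Re-join the tail so mount points containing spaces survive.
            rows.append((" ".join(fields[5:]), pct))
    return rows

# A few rows copied from the listing in this post.
SAMPLE = """Filesystem      Size  Used Avail Use% Mounted on
tmpfs           128M  1.9M  127M   2% /var/log
/dev/md1        3.7T  3.6T  115G  97% /mnt/disk1
/dev/md5        3.7T  3.5T  230G  94% /mnt/disk5
shfs             21T   20T  981G  96% /mnt/user
"""

for mount, pct in full_filesystems(SAMPLE, threshold=95):
    print(mount, pct)
```

In practice the text could come from `subprocess.run(["df", "-h"], capture_output=True, text=True).stdout` instead of a pasted sample.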

 

Does that kind of utilization cause these issues in ReiserFS? Is there anything I can do about it, short of adding more disks? Would converting to XFS help (I would rather not do that)? Can ReiserFS be optimized?

 

I can't post the diagnostic.zip because of my company's security rules (it contains serial numbers, IP addresses, etc.). Thanks!

 

Link to comment


With dual hex-core CPUs, a load average of 14 is actually nothing.

 

You've got a total of 12 physical cores plus 12 hyperthreaded cores, so 24 logical CPUs, and load averages are read relative to that total. A load average of 24 means that all cores are fully utilized. Anything less means you still have room to spare.

 

On the other hand, if you only had a single core, then 14 would be rather extreme.

 

http://www.howtogeek.com/194642/understanding-the-load-average-on-linux-and-other-unix-like-systems/
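As a rough illustration of that arithmetic, the Python sketch below pulls the load averages out of the top header quoted in this thread and divides by the number of logical CPUs. The helper names are hypothetical, and 24 is simply the count given above. One caveat: on Linux the load average also counts tasks in uninterruptible sleep (state D, typically waiting on disk I/O), so with the CPUs 90.6% idle and 8.3% in wa, a load of 14 here may reflect tasks blocked on I/O rather than CPU demand.

```python
import re

def parse_load_average(top_header):
    """Extract the 1, 5 and 15 minute load averages from a top header line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header
    )
    if match is None:
        raise ValueError("no load average found in: %r" % top_header)
    return tuple(float(value) for value in match.groups())

def load_per_cpu(load, logical_cpus):
    """Normalize a load average by the number of logical CPUs."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return load / logical_cpus

# Header line copied from the top output earlier in this thread.
HEADER = "top - 10:22:37 up  7:13,  3 users,  load average: 14.47, 9.39, 5.03"
LOGICAL_CPUS = 24  # 12 physical cores + 12 hyperthreads, as stated above

one_min, five_min, fifteen_min = parse_load_average(HEADER)
print("1-minute load per logical CPU:", round(load_per_cpu(one_min, LOGICAL_CPUS), 2))
```

On a live system, `os.getloadavg()` and `os.cpu_count()` from the standard library give the same two inputs without parsing top.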

 

As johnathanm stated, it's probably ReiserFS.

Link to comment
