Locked up server when disk is accessed


I've had this issue for a while and I'm about at my wits' end. I've tried a ton of different things and the problem persists. I first saw issues as my array started to fill up: when NZBGet unpacked a download, the load average would spike as a ton of processes entered state "D" (uninterruptible sleep; see picture). My CPU is sitting there doing nothing and memory is fine, but the load average will climb to 20, 30, 40+, the web UI becomes unresponsive, "docker ps" won't complete, and all Docker containers and VMs just lock up and won't do anything.
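For anyone wanting to capture the same symptom without a screenshot, here is a minimal Python sketch that lists processes currently in state "D" by reading `/proc/<pid>/stat`. It assumes a Linux `/proc` filesystem; the function names are just illustrative, not part of any Unraid tooling.

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state

def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # process exited while we were scanning
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling `uninterruptible_processes()` a few times during a lockup should show which processes are stuck waiting on IO. On Linux those processes count toward the load average, which is consistent with a high load on an otherwise idle CPU.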

 

Looking at the Docker logs for the containers, I see messages like "database locked", "query took over 1 seconds", etc. (SQLite DBs).

 

I've tried limiting NZBGet's resource allotment, but it appears it was just the canary, as the system can lock up even without it running.

 

I have 10x5TB drives in the array and 1x256GB SSD cache drive. The array drives were all sitting at about 95% full, but I just deleted about 1TB of content to get all the drives under 90%. My previous hunch was that nearly-full drives were causing slow reads/writes and that clearing out some space would alleviate the problem. It hasn't. I've done this (or moved data) before and thought it helped fix the issue, but it may have just been my imagination. Pretty much my entire setup is automated, so I don't pay much attention to it unless my VM locks up or one of my Docker containers stops responding.

 

I will do ANYTHING to try to fix this as it's becoming a time sink. I have 3 Unraid boxes at the highest paid tier, but this is the only box I run multiple Docker containers on, as it's my "powerhouse". The other two are glorified NFS servers that are mounted on the main box for Plex to serve up.

 

Here are some stats and I've also attached the diagnostic logs:

 

Model: Custom
M/B: ASRock - Z97 Extreme6
CPU: Intel® Core™ i7-4770K CPU @ 3.50GHz
HVM: Enabled
IOMMU: Disabled
Cache: 256 kB, 1024 kB, 8192 kB
Memory: 32 GB (max. installable capacity 32 GB)
Network: eth0: 1000 Mb/s, full duplex, mtu 1500; eth1: not connected
Kernel: Linux 4.9.10-unRAID x86_64
OpenSSL: 1.0.2k

Screenshot 2017-08-06 12.57.39.png

tower-diagnostics-20170806-1315.zip

I assume you are referencing this:

 

Aug  6 13:31:14 Tower sshd[7408]: Received disconnect from 116.31.116.43 port 35126:11:  [preauth]
Aug  6 13:31:14 Tower sshd[7408]: Disconnected from 116.31.116.43 port 35126 [preauth]
Aug  6 13:32:24 Tower sshd[8082]: Received disconnect from 116.31.116.43 port 59440:11:  [preauth]
Aug  6 13:32:24 Tower sshd[8082]: Disconnected from 116.31.116.43 port 59440 [preauth]
Aug  6 13:32:54 Tower sshd[6366]: Received disconnect from 96.29.187.74 port 40130:11: disconnected by user
Aug  6 13:32:54 Tower sshd[6366]: Disconnected from 96.29.187.74 port 40130
Aug  6 13:32:55 Tower sshd[6275]: Received disconnect from 96.29.187.74 port 33723:11: disconnected by user
Aug  6 13:32:55 Tower sshd[6275]: Disconnected from 96.29.187.74 port 33723
Aug  6 13:33:34 Tower sshd[10060]: Received disconnect from 116.31.116.43 port 25303:11:  [preauth]
Aug  6 13:33:34 Tower sshd[10060]: Disconnected from 116.31.116.43 port 25303 [preauth]
Aug  6 13:34:42 Tower sshd[11499]: Received disconnect from 59.45.175.11 port 50098:11:  [preauth]

I have pubkey-only auth enabled, but yes, this is annoying as hell in my syslogs. A while back I looked into something like fail2ban for Unraid but didn't find anything.

 

Edit: Also, my server is not in a DMZ; only ports 22, 80, and 443 are exposed through the router (which yes, you could argue is close enough to being in a DMZ).

 

Edit 2: I just set up DenyHosts again. I had it before but ran into some issue with it (I can't remember what the problem was). Hopefully that will block the attacker.
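To see which addresses account for the `[preauth]` disconnects in the syslog excerpt above, something like the following Python sketch could tally them. The regular expression is written against the line format shown in that excerpt only, and the function name is made up for illustration. Other sshd message formats would need their own patterns.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... sshd[7408]: Received disconnect from 116.31.116.43 port 35126:11:  [preauth]
# Lines ending in "disconnected by user" (normal logouts) do not match.
PREAUTH_RE = re.compile(
    r"sshd\[\d+\]: Received disconnect from (\S+) port \d+:\d+:\s*\[preauth\]"
)

def count_preauth_disconnects(lines):
    """Count pre-authentication disconnects per source address."""
    counts = Counter()
    for line in lines:
        match = PREAUTH_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of a syslog file and printing `counts.most_common()` gives a quick ranking of the noisiest addresses, which can be compared against what DenyHosts ends up blocking.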

Edited by joshstrange
Added info
@trurl I have DenyHosts set up and I have blocked the attempts. I need those ports open for the following reasons:

 

22: This is my jump server into my home network, it is key-only login

 

80: To redirect to 443

 

443: To serve up my SSL apps running in docker containers

 

That said, this is slightly divergent from my main issue, which is that writing to the array takes a long time, causes IO blocking, and causes the load average to jump to 40+.
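One rough way to put a number on the slow writes is to time a small write followed by an `fsync` on different mount points. The sketch below is only an illustration; the helper name and the default 4 MiB size are arbitrary choices, and which directories to test (for example a path on the cache drive versus a path on an individual array disk) depends on this server's layout.

```python
import os
import tempfile
import time

def timed_write(directory, size_bytes=4 * 1024 * 1024):
    """Write size_bytes to a temp file in directory, fsync it, return seconds taken."""
    payload = os.urandom(size_bytes)
    fd, path = tempfile.mkstemp(dir=directory, prefix="iotest-")
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # force the data to disk, not just the page cache
        return time.monotonic() - start
    finally:
        os.remove(path)  # always clean up the temporary file
```

If the same call takes milliseconds on the cache drive but many seconds on one particular array disk, that would point at that disk (or the parity write path) rather than at Docker or NZBGet.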
