unRAID became slow after removing data disk

Ziggy · May 12, 2016

Hello all

After replacing my data disk because of issues explained here, my unRAID server has become extremely unresponsive. After booting, everything is fine for several minutes; then CPU usage climbs, the web UI takes ages to load a page, and the array rebuild speed drops from over 100 MB/s to a couple of KB/s. I can confirm this happens regardless of whether Docker is enabled.

I'm running unRAID in an ESXi VM with 16 GB of RAM assigned, on an AMD FX-6300 (6 x 3.5 GHz), with 7 data drives connected as raw-mapped LUNs (5 of them attached to the motherboard, the rest to a PCIe SATA controller). This issue only started after I removed the old faulty disk and restarted the array unprotected (I figured it'd be fine for the duration of the new drive's pre-clear). I did manage to get through 2 pre-clear cycles last night without the system becoming unstable, but only because the array wasn't mounted. So the issue only appears after mounting the array.

I managed to download the diagnostics report; please find it attached.

Thank you in advance

-Ziggy

EDIT: I definitely think all those odd authentication attempts have something to do with it... I tried blocking the IP, which magically solved the problem, but the attempts came back from another origin... Suggestions?

ziggy_unraid-diagnostics-20160512-2222.zip

RobJ · May 13, 2016

It looks like you have more exposed to the Internet than the FTP port? Try locking it down to just the minimum you need, and no DMZ!

I don't see a reason, but you are right, just about everything at the end is using ridiculous amounts of CPU. It's not just one thing hogging the CPU. It doesn't look usable.

Ziggy · May 14, 2016

Thank you for your reply. Yeah, I had SSH and unRAID forwarded as well. I don't really need those since I'm passing everything through Nginx, so I got rid of them, and the flooding seems to have stopped.

I'm still having performance issues, though: where it used to take about 3 minutes to load a page, it now takes over a minute. This time Docker is enabled, with multiple containers running.

I attached another diagnostic archive. Can anyone shed some light on this bizarre situation?

ziggy_unraid-diagnostics-20160514-1554.zip

Ziggy · May 17, 2016

Bump, I would really appreciate some insights. Thank you!

Ziggy · May 23, 2016

Bump... Anyone?

RobJ · June 15, 2016

My apologies, I like to help, but I'm terrible about follow-up!

I've looked at the last diagnostics (old, May 14!), and it looks mostly OK. What's still striking is how much CPU is being used by the various apps and tools. Deluge is the busiest at 93, with other apps ranging from 84 to 53 to 15. What's remarkable, though, is how much ordinary tools are using: Diagnostics is using 45, and todos is using 63! It makes you wonder if your setup is not 'governing' correctly, perhaps being 'governed' down to its lowest CPU speed and never speeding back up. I noticed that in both syslogs, the CPU is determined to be a 3.5GHz processor, but barely nanoseconds later in one syslog, it was 3.2GHz (still 3.5GHz in the other)! That's a second clue that controlling/maintaining CPU speed may be an issue. I don't have any ideas to help.
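
One way to test the "stuck at low speed" theory is to compare the speed each core currently reports against the nominal 3.5GHz. The following is only a minimal sketch: it assumes a Linux guest where /proc/cpuinfo exposes a "cpu MHz" line per core, the function names and the 50% threshold are made up for illustration, and under a hypervisor such as ESXi the guest's reported value may not reflect what the host is actually granting.

```python
import re

def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value reported for each logical CPU."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]

def stuck_low(mhz_values, nominal_mhz, threshold=0.5):
    """True if every core reports less than threshold * nominal speed.

    The 0.5 threshold is an arbitrary illustrative choice, not a known limit.
    """
    return bool(mhz_values) and all(v < nominal_mhz * threshold for v in mhz_values)

def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the current per-core speeds from a Linux system."""
    with open(path) as f:
        return parse_cpu_mhz(f.read())
```

Calling `stuck_low(read_cpu_mhz(), nominal_mhz=3500)` a few times while the system is under load would show whether the cores ever speed back up.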

One other possible issue: in the last syslog you were getting probable SYN floods on port 49153, a listener set up by a Docker container, don't know which. You should probably check that out.
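
To see which ports are affected, the syslog from the diagnostics archive can be scanned for the kernel's SYN flood warning. This is a rough sketch, assuming the warnings contain the phrase "Possible SYN flooding on port N" as Linux kernels normally log it; the function name and the sample lines are illustrative, not taken from the actual diagnostics.

```python
import re
from collections import Counter

# Phrase the Linux kernel logs when it suspects a SYN flood on a listener.
SYN_FLOOD = re.compile(r"Possible SYN flooding on port (\d+)")

def syn_flood_ports(syslog_lines):
    """Count SYN flood warnings per destination port."""
    counts = Counter()
    for line in syslog_lines:
        match = SYN_FLOOD.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

# Illustrative lines only; real syslog entries will differ in prefix and timing.
example = [
    "May 14 15:40:01 Tower kernel: TCP: Possible SYN flooding on port 49153. Sending cookies.",
    "May 14 15:41:12 Tower kernel: TCP: Possible SYN flooding on port 49153. Sending cookies.",
    "May 14 15:42:00 Tower kernel: some unrelated message",
]
print(syn_flood_ports(example))
```

Once a port shows up, `docker ps` lists the published ports of each running container, which should identify the one that owns the listener.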

Ziggy · June 21, 2016

Hi Rob

Thank you for following up all the same. I didn't expect to receive a reply, and I appreciate your time and effort.

I wasn't able to figure out what the problem was, so I went ahead and got rid of ESXi. The hypervisor was kind of obsolete since I had everything set up in Docker, and ditching it for a full-blown unRAID installation was a better idea anyway.

Having done that, everything seems to be running smoothly again. It was probably ESXi that was not throttling the CPU resources correctly, like you said.
