[6.5.2] Server is sticky like marmalade. Logs attached.



Hello,

I have had my server running for a while, but it seems that my SSD cache pool is eating SSDs alive. There is a lot of wear, if I am reading the SMART data correctly.

My issue is that when doing IO-intensive work (downloading, unpacking, etc.), the whole server is slow as syrup. All of this happens in the cache pool, and I wonder if it has something to do with my cache SSDs being ready for replacement.

 

I'm thinking about moving a lot of my IO work (for example, all the SABnzbd operations) to my RAM drive. Moving all my transcoding to the RAM drive worked out great, so I will consider doing the same with the SABnzbd work.

 

Some of the symptoms:

  • Slow response from dockers and web-ui of Unraid
  • Copy to and from cache-pool at very slow speeds
  • Low CPU-usage observed
  • Low RAM-usage observed

 

I have an Intel Xeon E5-2630 v4 (2.20 GHz), so the CPU should not be a bottleneck for this type of work.

 

Hopefully some of you can read through the logs; to be honest, I don't know what to look for. The diagnostics were taken while the server was slow, so hopefully they can shed some light on this.

 

Thanks,

 

Thomas

 

 

 

vault-diagnostics-20180714-2054.zip

Link to comment

One thing to remember is that if your SSD doesn't have overprovisioning, and you don't regularly trim it, then it suffers a lot of additional wear. It isn't uncommon to see a write amplification of 10-20 or even more, which means that when you write 1 GB to the SSD, the actual wear on the SSD may be 10-20 GB. Some SSDs can show the actual write amplification in the SMART data.
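To make that arithmetic concrete, here is a minimal Python sketch. The function names and the sample figures are made up for illustration and are not taken from the attached diagnostics; on a real drive, the host and flash write totals would have to come from whatever SMART attributes the vendor exposes, if any.

```python
def write_amplification(nand_bytes_written: float, host_bytes_written: float) -> float:
    """Ratio of data physically written to flash to data the host asked to write."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written


def flash_wear_gb(host_gb: float, amplification: float) -> float:
    """Flash actually consumed when the host writes host_gb at a given amplification."""
    return host_gb * amplification


# Hypothetical figures: 1 GB written by the host at amplification 10 and 20.
print(flash_wear_gb(1, 10))
print(flash_wear_gb(1, 20))

# Hypothetical totals: 15 TB written to flash for 1 TB written by the host.
print(write_amplification(nand_bytes_written=15e12, host_bytes_written=1e12))
```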

 

Another thing is that an SSD that doesn't have overprovisioning and isn't regularly trimmed will become very slow, because it never has any erased flash blocks to use. So on every write, the SSD must copy the content of an in-use flash block into RAM, then erase the flash block, then restore from RAM the parts of the flash block that aren't being overwritten, and finally copy in your new data. After that, the flash block is fully used again, so the next block you need to write forces the SSD to repeat the same process.
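The cost of that cycle can be sketched with a toy model in Python. This is a deliberate simplification: the block geometry (`PAGES_PER_BLOCK`) and the function name are assumptions for illustration, and real controllers use garbage collection and wear levelling that this does not model.

```python
PAGES_PER_BLOCK = 64  # assumed geometry; the real value varies by drive


def pages_programmed(pages_to_write: int, erased_block_available: bool) -> int:
    """Flash pages programmed to service one small host write inside a single block."""
    if not 0 < pages_to_write <= PAGES_PER_BLOCK:
        raise ValueError("pages_to_write must fit inside one block")
    if erased_block_available:
        # The new data can go straight into already-erased pages.
        return pages_to_write
    # No erased block: read the whole block into RAM, erase it,
    # then program every page again with the new data merged in.
    return PAGES_PER_BLOCK


for available in (True, False):
    programmed = pages_programmed(1, erased_block_available=available)
    print(f"erased block available={available}: {programmed} page(s) programmed")
```

In this toy model, a one-page write to a drive with no erased blocks programs a full block, which is where both the extra wear and the slowdown come from.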

Link to comment

 

What @pwm stated.  The diagnostics show that you've got a Windows VM running (probably not doing much, although Windows constantly accesses the hard drives), Plex is currently transcoding one file and serving another, sab is unrar'ing a file, and CrashPlan may (or may not) be uploading.  The unrar, however, is using a minuscule amount of CPU.

 

But consider this: if you're downloading to the cache drive (odds on yes), unrar'ing to the cache drive (odds on yes), transcoding and saving the temporary transcode files onto the cache drive (possibly yes), and the two Plex files may or may not also be on the cache drive (not moved to the array yet), then maybe you're running into a perfect storm of bandwidth contention on the drives.

 

What is really awesome for diagnosis in these cases is to install the Netdata application (you can get it via Apps).  Then you'll be able to see graphically everything going on with the drives / CPU / memory in real time and figure out where the problem actually lies.

Link to comment

Thanks for the fast replies, @pwm and @Squid. I think I found the most troubling issue while awaiting answers here.

My Plex transcode goes directly to the RAM drive, so that's not a concern (as far as I understand). What I have found is that sabnzbd is downloading to the array, and that disk6, which sabnzbd is using for both downloading and unraring, has a whopping load cycle count of over 1 million, so there might be something affecting this disk. None of the other, older disks is even close; 22,000 is the closest.

 

I will move all my IO work to the SSDs and get another 32 GB of RAM so that I can do all my unraring and downloading directly in RAM before moving to the array (via cache).

Thanks for your help!!

Link to comment

Look at the wdidle3 utility.  Your drive is parking its head every 2 minutes on average, and possibly also attempting a spin down on its own.  Many, many controllers will pause all transfers to and from drives (all drives attached to the controller) when a drive has to spin up.
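A rough way to check the parking rate yourself is to divide the power-on time by the load cycle count. The Python sketch below parses the attribute table as printed by `smartctl -A`. The sample table is made up for illustration and is not taken from the attached diagnostics, and the parser assumes the raw value is a plain integer in the tenth column, which does not hold for every drive.

```python
def parse_smart_raw_values(smartctl_output: str) -> dict:
    """Map attribute name -> integer raw value from `smartctl -A` style text."""
    values = {}
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                values[parts[1]] = int(parts[9])
            except ValueError:
                continue  # raw value is not a plain integer; skip it
    return values


def minutes_per_load_cycle(power_on_hours: int, load_cycles: int) -> float:
    """Average number of minutes between head load/unload cycles."""
    if load_cycles <= 0:
        raise ValueError("load_cycles must be positive")
    return power_on_hours * 60 / load_cycles


# Hypothetical sample, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   055   055   000    Old_age   Always       -       33000
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       1000000
"""

attrs = parse_smart_raw_values(SAMPLE)
interval = minutes_per_load_cycle(attrs["Power_On_Hours"], attrs["Load_Cycle_Count"])
print(f"about {interval:.2f} minutes between head parks")
```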

Link to comment
19 minutes ago, thomas.lone said:

and that disk6, which sabnzbd is using for both downloading and unraring, has a whopping load cycle count of over 1 million

 

That's a huge amount.


What you need to do is reconfigure the drive. I have used hdparm on a number of drives to make their power management less aggressive.

 

This is a WD Red. I wasn't aware that any Red had this issue. Lots of WD Blue and Green drives come with a default setting that makes them self-destruct when used in Linux machines.

Link to comment

Thanks again! I downloaded the wd5741 tool for Linux and it worked superbly. It scanned all my disks, and the only one that needed an update was the affected disk6, so that's now been taken care of.  Hopefully this will make my server run more smoothly! Now I will look into the Netdata tool. Looks promising!

 

@pwm The Reds also had these issues. I have already trashed 1 disk because of it :(

 

For reference, I have attached the tool in case anyone needs it. It works with WD Red only, not Green.

 

Thanks,

Thomas

 

 

wd5741x64

Link to comment
