[6.5.2] Server is sticky like marmelade. Logs attached.

July 14, 2018

Hello,

I've had my server running for a while, but it seems to me that my SSD cache pool is eating SSDs alive. There's a lot of wear, if I'm reading the SMART data correctly.

I have an issue: when doing IO-intensive work like downloading, unpacking, etc., my whole server is slow as syrup. All this happens in the cache pool, and I do wonder if this has something to do with my cache SSDs being ready for replacement.

I'm thinking about moving a lot of my IO work (for example, all the SABnzbd operations) to my ramdrive. I had a great experience moving all my transcoding to the ramdrive, so I'm considering doing the same with my SABnzbd stuff.

Some of the symptoms:

Slow response from dockers and web-ui of Unraid
Copy to and from cache-pool at very slow speeds
Low CPU-usage observed
Low RAM-usage observed

I have an Intel Xeon E5-2630 v4 2.20 GHz as CPU, so the CPU should not be a bottleneck for this type of work.

Hopefully some of you can read through the logs; tbh, I don't know what to look for. The diagnostics were taken while the server was slow, so hopefully they can shed some light on this.

Thanks,

Thomas

vault-diagnostics-20180714-2054.zip

July 14, 2018

One thing to remember is that if your SSD doesn't have overprovisioning, and you don't regularly trim it, it suffers a lot of additional wear. It isn't uncommon to have a write amplification of 10-20 or even more, which means that when you write 1 GB to the SSD, the actual wear on the SSD may be 10-20 GB. Some SSDs can show the actual write amplification in the SMART data.
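To make the arithmetic concrete, here is a minimal Python sketch. The function names are made up for illustration, and the numbers are only the 10-20 range mentioned above, not values measured on any particular drive.

```python
def write_amplification(host_gb: float, flash_gb: float) -> float:
    """Ratio of data physically written to flash vs. data the host asked to write."""
    if host_gb <= 0:
        raise ValueError("host_gb must be positive")
    return flash_gb / host_gb


def flash_wear_gb(host_gb: float, amplification: float) -> float:
    """Flash actually written (GB) for a given host write volume and amplification."""
    return host_gb * amplification


if __name__ == "__main__":
    # Writing 1 GB at the amplification factors mentioned in the post.
    for factor in (10, 20):
        print(f"1 GB host write at WA {factor}: {flash_wear_gb(1, factor):.0f} GB of flash wear")
```

If a drive reports both host writes and flash (NAND) writes in SMART, dividing the second by the first gives the same ratio. Which attributes carry those counters varies by vendor.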

Another thing is that an SSD that doesn't have overprovisioning and isn't regularly trimmed will become very slow, because it never has any erased flash blocks to use. So on every write, the SSD must copy the content of an in-use flash block into RAM, then erase the flash block, then restore from RAM the parts of the flash block that aren't overwritten, and finally copy in your new data. That flash block is then fully used again, so the next block you need to write forces the SSD to repeat the same process.
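The read-erase-rewrite cycle described above can be shown with a toy model in Python. This is a deliberate simplification, not how any real controller works: the block geometry (256 pages per block) is an assumption, and real drives add garbage collection, caching and wear leveling on top.

```python
PAGES_PER_BLOCK = 256  # assumed geometry; real drives differ


def pages_programmed(new_pages: int, erased_block_available: bool,
                     pages_per_block: int = PAGES_PER_BLOCK) -> int:
    """Flash pages that must be programmed to store `new_pages` pages in one block."""
    if not 0 < new_pages <= pages_per_block:
        raise ValueError("new_pages must be between 1 and pages_per_block")
    if erased_block_available:
        # A pre-erased block only needs the new data written into it.
        return new_pages
    # No erased block: the whole in-use block is read into RAM, erased,
    # and written back in full (the untouched pages plus the new data).
    return pages_per_block


if __name__ == "__main__":
    new = 16
    for erased in (True, False):
        written = pages_programmed(new, erased)
        print(f"erased block available={erased}: {written} pages programmed, "
              f"amplification {written / new:.0f}x")
```

In this model a small write into a full block costs a whole block of flash writes, which is where both the extra wear and the slowdown come from.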

July 14, 2018

What @pwm stated. The diagnostics show that you've got a Windows VM running (probably not doing much, although Windows constantly accesses the hard drives), Plex is currently transcoding one file and serving another, sab is unRar'ing a file, and CrashPlan may (or may not) be uploading. The unRar, however, is taking minuscule amounts of actual CPU.

But consider that if you're downloading to the cache drive (odds on yes), unrar'ing to the cache drive (odds on yes), transcoding and saving the temporary transcode files onto the cache drive (possibly yes), and the two Plex files may or may not also be on the cache drive (not moved to the array yet), maybe you're running into a perfect storm of bandwidth demand on the drives.

What is really awesome for diagnosis in these cases is to install the Netdata application (you can get it via Apps). Then you'll be able to see graphically everything going on with the drives / CPU / memory in real time and figure out where the problem actually lies.

July 14, 2018

Author

Thanks for the fast replies @pwm and @Squid. I think I found some of the most troubling issues while awaiting answers here.

My Plex transcoding goes directly to the RAM drive, so that's not a concern (as far as I understand). What I have found is that sabnzbd is downloading to the array, and that disk6 which sabnzbd is using for both downloading and unraring, is having whoppingly over 1 million load cycle counts, so there might be something affecting this disk. None of the other, older disks is even close; 22,000 is the closest.

I will move all my IO stuff to the SSDs and will get another 32 GB of RAM so that I can do all my unraring and downloading directly to RAM before moving to the array (via cache).

Thanks for your help!!

July 14, 2018

Look at the wdidle3 utility. Your drive is parking its head every 2 minutes on average, and possibly also attempting a spin down on its own. Many, many controllers will pause all transfers to and from drives (all drives attached to the controller) when a drive has to spin up.
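An average like this can be estimated by dividing the drive's power-on time by its load cycle count, both of which are standard SMART attributes (Power_On_Hours and Load_Cycle_Count). A minimal Python sketch follows. The 33,000 power-on hours in it is a hypothetical figure chosen for illustration, since the real value is in the diagnostics and not quoted in this thread.

```python
def minutes_between_head_parks(power_on_hours: float, load_cycle_count: int) -> float:
    """Average minutes between head load/unload cycles over the drive's life."""
    if load_cycle_count <= 0:
        raise ValueError("load_cycle_count must be positive")
    return power_on_hours * 60 / load_cycle_count


if __name__ == "__main__":
    # Hypothetical example: 1,000,000 load cycles over 33,000 power-on hours.
    interval = minutes_between_head_parks(power_on_hours=33_000,
                                          load_cycle_count=1_000_000)
    print(f"Head parked roughly every {interval:.2f} minutes on average")
```

Because this is a lifetime average, a drive that idles for long stretches will be parking even more often than this during the periods when it is in use.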

July 14, 2018

19 minutes ago, thomas.lone said:

and that disk6 which sabnzbd is using for both downloading and unraring, is having whoppingly over 1 million load cycle counts

That's a huge amount.

What you need to do is to reconfigure the drive. I have been using hdparm on a number of drives to reduce the power management level.
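As a rough sketch of what that can look like, the Python helper below only builds the hdparm command line for querying or setting the Advanced Power Management level with the `-B` option. It does not run anything. The device path `/dev/sdX` is a placeholder and the helper name is made up. Whether a given drive honors APM at all is an assumption to verify, and the WD idle timer is a separate setting (hence wdidle3).

```python
import shlex
from typing import List, Optional


def hdparm_apm_command(device: str, level: Optional[int] = None) -> List[str]:
    """Build the argv for `hdparm -B`: query APM if level is None, else set it.

    Per the hdparm man page, 1-127 permit spin-down, 128-254 do not,
    and 255 disables APM altogether.
    """
    if level is None:
        return ["hdparm", "-B", device]
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    return ["hdparm", "-B", str(level), device]


if __name__ == "__main__":
    # /dev/sdX is a placeholder; substitute the real device before running anything.
    print(shlex.join(hdparm_apm_command("/dev/sdX")))       # query current level
    print(shlex.join(hdparm_apm_command("/dev/sdX", 254)))  # least aggressive APM
```

Check the load cycle count in SMART again after a few hours to see whether the change actually took effect.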

This is a WD Red. I wasn't aware that any Red had this issue. Lots of WD Blue and Green drives come with a default setting that makes them self-destruct when used in Linux machines.

July 14, 2018

Author

Thanks again! I downloaded the wd5741 tool for Linux and it worked superbly. It scanned all my disks, and the only one that needed an update was the affected disk6, so that's been taken care of. Hopefully this will make my server run more smoothly! Now I will look into the Netdata tool! Looks promising!

@pwm The Reds also had these issues. I have already trashed one disk because of it.

For reference, I have attached the tool in case anyone needs it. Works with WD Red only. Not Green.

Thanks,

Thomas

wd5741x64

July 15, 2018

Community Expert

12 hours ago, tjo099 said:

Works with WD Red only. Not Green.

Wdidle3 works on Reds, Greens and Blues (except the older 7,200 rpm Blues).

July 15, 2018

Author

Wdidle3 yeah, but not the wd5741-tool.
