Containers slowing to a crawl

hermy65 · December 30, 2018

Over the last few days I've been running into an issue where the containers I run become ridiculously slow and sometimes 100% unresponsive. Downloads drop to around 300kb/s and stay there. If I reboot the server everything returns to normal and downloads are back in the 40-60Mb/s range, then after an hour or two everything goes to hell again.

Today I thought maybe it was an issue with my docker image, so I removed it and started to rebuild my containers, but that doesn't seem to be helping either.

My machine isn't underpowered, so that shouldn't be the issue, but I'm running out of ideas.

Edit: I've been adding containers for ~1.5 hours and I've only been able to add maybe 10. Something is definitely not right here.

Diagnostics are attached

storage-diagnostics-20181229-2350.zip

Edited December 30, 2018 by hermy65

JorgeB · December 30, 2018

There are read/write errors on two of your cache devices, mostly cache2:

Dec 29 22:22:04 Storage kernel: BTRFS info (device sdd1): bdev /dev/sdd1 errs: wr 0, rd 20, flush 0, corrupt 0, gen 0
Dec 29 22:22:04 Storage kernel: BTRFS info (device sdd1): bdev /dev/sdc1 errs: wr 567551, rd 153668, flush 7072, corrupt 0, gen 0

This will cause corruption on NOCOW shares, which the system share is by default. See here for more info:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582
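The per-device counters in the two log lines above can be pulled out with a short script when there are many such lines to go through. This is only a minimal sketch: the regular expression is written against the line format shown in this post, and the names `parse_btrfs_errs` and `devices_with_errors` are made up for illustration, not part of any Unraid or btrfs tool.

```python
import re

# Pattern written against the "bdev ... errs:" lines quoted in this thread.
# Assumption: other kernel versions may format these lines differently.
BDEV_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (device, counters) for a BTRFS error line, or None if no match."""
    match = BDEV_RE.search(line)
    if match is None:
        return None
    counters = {
        name: int(value)
        for name, value in match.groupdict().items()
        if name != "dev"
    }
    return match.group("dev"), counters


def devices_with_errors(lines):
    """Map each device to its counters, keeping only devices with a non-zero counter."""
    result = {}
    for line in lines:
        parsed = parse_btrfs_errs(line)
        if parsed is not None and any(parsed[1].values()):
            result[parsed[0]] = parsed[1]
    return result


# The two lines from the diagnostics quoted above.
SAMPLE = [
    "Dec 29 22:22:04 Storage kernel: BTRFS info (device sdd1): bdev /dev/sdd1 "
    "errs: wr 0, rd 20, flush 0, corrupt 0, gen 0",
    "Dec 29 22:22:04 Storage kernel: BTRFS info (device sdd1): bdev /dev/sdc1 "
    "errs: wr 567551, rd 153668, flush 7072, corrupt 0, gen 0",
]

if __name__ == "__main__":
    for device, counters in devices_with_errors(SAMPLE).items():
        print(device, counters)
```

Both devices are reported here because each has at least one non-zero counter; a device whose counters are all 0 would be left out.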

hermy65 · January 6, 2019

@johnnie.black I replaced sdc from the message you posted above, but now when I run the btrfs dev stats /mnt/cache command it shows 0 for all drives. Is that normal that the errors on sdd would go away after replacing sdc?

Also, I'm still seeing a lot of sluggishness with most things running on my unraid box even after the replacement. Any other suggestions? Running the diagnostics took ~20 minutes.

For reference, my machine is running dual Xeon E5-2630 v4s with 64GB of RAM.

storage-diagnostics-20190105-2034.zip

Edited January 6, 2019 by hermy65
Added diagnostics

JorgeB · January 6, 2019

8 hours ago, hermy65 said:

Is that normal that the errors on sdd would go away after replacing sdc?

Likely they are reset when a device is replaced.

Nothing jumps out in the syslog, though I might have missed something since it's spammed with lines similar to these:

Jan  5 20:32:47 Storage root: #012/dev/sdaa:#012 drive state is:  active/idle
Jan  5 20:32:47 Storage root: #012/dev/sdl:#012 drive state is:  unknown
Jan  5 20:32:47 Storage root: #012/dev/sdg:#012 drive state is:  unknown
Jan  5 20:32:47 Storage root: #012/dev/sdac:#012 drive state is:  unknown
Jan  5 20:32:47 Storage root: #012/dev/sdab:#012 drive state is:  active/idle
Jan  5 20:32:47 Storage root: #012/dev/sdr:#012 drive state is:  standby
Jan  5 20:32:47 Storage root: #012/dev/sdv:#012 drive state is:  standby
Jan  5 20:32:47 Storage root: #012/dev/sdo:#012 drive state is:  active/idle
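When a syslog is flooded like this, one way to make it readable again is to filter out the repeated lines before going through it. The following is a rough sketch, assuming the spam always looks like the "drive state is:" lines above; the pattern and the function name `drop_drive_state_lines` are illustrative only.

```python
import re

# Assumption: every spam line has the "#012/dev/sdX:#012 drive state is:" shape
# seen in the excerpt above.
DRIVE_STATE_RE = re.compile(r"root: #012/dev/sd[a-z]+:#012 drive state is:")


def drop_drive_state_lines(lines):
    """Return the log lines that are not drive-state spam, in original order."""
    return [line for line in lines if not DRIVE_STATE_RE.search(line)]


# Two spam lines from this post plus one kernel line from earlier in the thread.
SAMPLE = [
    "Jan  5 20:32:47 Storage root: #012/dev/sdaa:#012 drive state is:  active/idle",
    "Dec 29 22:22:04 Storage kernel: BTRFS info (device sdd1): bdev /dev/sdd1 "
    "errs: wr 0, rd 20, flush 0, corrupt 0, gen 0",
    "Jan  5 20:32:47 Storage root: #012/dev/sdr:#012 drive state is:  standby",
]

if __name__ == "__main__":
    for line in drop_drive_state_lines(SAMPLE):
        print(line)
```

To run it against a real log, read the file into a list of lines and pass that list to `drop_drive_state_lines`.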

hermy65 · January 6, 2019

@johnnie.black are those lines not normal? If not, is there something I need to do to make them stop?

Squid · January 6, 2019

The lines are coming from either the S3 plugin (disable its logging) or the Auto Turbo plugin (disable its debugging)

hermy65 · January 6, 2019

@Squid I had the TurboWrite plugin installed and debugging was enabled, so I assume turning that off should fix it. Thanks!
