High Loop2 Reads


Andiroo2


I’m having a new issue where my Loop2 process is reading insane amounts of data (350-400MB/s) from my cache (2x 500GB WD Blue SSDs in BTRFS RAID1) when I’m watching 4K videos in the Plex Docker container. The bursts last for 5-10 minutes or so. I know it’s Loop2 because it shows in iotop as the top process when it’s happening, and the read rate matches the Unraid interface. In this case, the Plex movies are on the cache, and my Docker image and all system/domains files are on the cache too. I have an E5-2699v4 CPU (42 threads) with 50% dedicated to Plex. All Plex metadata is on the cache as well.

 

Now, I’m direct playing the videos in Plex Docker, so I’m not transcoding there.  I don’t know that it’s related to the 4K movies playing, but that’s when I notice it (it eventually chokes the docker service and Plex stops for everyone watching).

 

So, to troubleshoot this from the Unraid perspective, how can I see into the Loop2 process to understand what those reads really are?

 

EDIT:  This doesn’t happen constantly, only in these bursts, which is why I’m not yet sure that it’s related to the Plex activity. 
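One way to watch the bursts without iotop is to sample the kernel's per-device counters in /sys/block/loop2/stat, where the third field is the number of 512-byte sectors read so far. A minimal Python sketch, assuming the device of interest is loop2 (the device name and the 5-second window are just example choices):

```python
import time
from pathlib import Path

SECTOR_BYTES = 512  # /sys/block/<dev>/stat counts sectors in 512-byte units


def read_sectors(stat_line: str) -> int:
    """Return the 'sectors read' counter (3rd field) of a /sys/block/<dev>/stat line."""
    return int(stat_line.split()[2])


def read_rate_mb_s(before: str, after: str, seconds: float) -> float:
    """Read throughput in MB/s between two stat samples taken `seconds` apart."""
    delta = read_sectors(after) - read_sectors(before)
    return delta * SECTOR_BYTES / seconds / 1_000_000


def sample(dev: str = "loop2", seconds: float = 5.0) -> float:
    """Take two samples of the device counters and return the read rate in MB/s."""
    stat = Path(f"/sys/block/{dev}/stat")
    before = stat.read_text()
    time.sleep(seconds)
    after = stat.read_text()
    return read_rate_mb_s(before, after, seconds)
```

Calling `print(sample("loop2"))` during a burst should show a figure comparable to iotop's. This only confirms the rate at the device level; it does not say which process inside the image is requesting the data.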

 

Thanks!

Edited by Andiroo2
  • 4 weeks later...
  • 3 weeks later...

I'm having the same issue as well.  Was researching and posted to the below linked topic.  I'm going to remove that docker and try a different Plex Docker Image.

I added the repository, which I think is directly from Plex (https://github.com/plexinc/pms-docker).

 

I originally thought it was a btrfs issue with the cache, but that doesn't seem to be the problem...?

 

Any solutions on your end?
https://forums.unraid.net/topic/98114-new-cache-pool-ssd-default-format-to-btrfs-i-want-xfs/?tab=comments#comment-929089


I'm pretty certain Plex isn't the cause in my issue linked above. I re-created my docker.img btrfs with no effect.

 

/dev/loop2 is the docker image that lives on the cache drive (usually). So it's not just used for Plex - but for any/all dockers you may have running.
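To confirm which file a given loop device is backed by (the same information `losetup -l` shows in its BACK-FILE column), the kernel exposes it in sysfs as /sys/block/loopN/loop/backing_file. A small sketch that lists every attached loop device; the `sys_block` parameter exists only so the function can be pointed at a test directory:

```python
from pathlib import Path


def loop_backing_files(sys_block: Path = Path("/sys/block")) -> dict[str, str]:
    """Map each attached loop device (e.g. 'loop2') to the file backing it."""
    mapping = {}
    for dev in sorted(sys_block.glob("loop*")):
        backing = dev / "loop" / "backing_file"
        if backing.exists():  # unattached loop devices have no backing_file entry
            mapping[dev.name] = backing.read_text().strip()
    return mapping
```

On a live system, `print(loop_backing_files())` should show loop2 pointing at the docker.img file.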

 

Note that the issue we are seeing is lots of *reads* (this thread and mine linked above), not writes on the /dev/loop2 device.

  • 2 months later...

I'm seeing the same behaviour: the docker loop device is showing a very high amount of reads. System load climbs to a ridiculous level, CPU usage shows as 100%, and the server becomes unresponsive/sluggish while things start crashing/misbehaving.

I've moved 'docker.img' to a separate SSD from 'app_data', but that hasn't helped.

Does anyone have a resolution?


My original issue was related to Plex RAM transcoding set up incorrectly. I was transcoding to /tmp, which allowed the Plex docker to eventually use all the available RAM in the system without properly freeing it up when it ran out. 
 

I changed to a fixed RAM disk instead, set to 12GB, and the system has worked perfectly ever since. When it approaches 12GB used for transcoding, it properly frees up RAM on the RAM disk and transcoding keeps on moving without impacting the rest of the server. 
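With a size-capped RAM disk, checking how close it gets to the cap is a matter of reading the filesystem's usage, which works the same for tmpfs as for a disk. A minimal sketch; `/tmp/PlexRamScratch` in the usage line is a hypothetical mount point, so substitute wherever your RAM disk is actually mounted:

```python
import shutil


def usage_report(path: str) -> str:
    """Summarise how full the filesystem holding `path` is (tmpfs RAM disks included)."""
    du = shutil.disk_usage(path)
    pct = 100 * du.used / du.total if du.total else 0.0
    return f"{path}: {du.used / 2**30:.1f} GiB used of {du.total / 2**30:.1f} GiB ({pct:.0f}%)"
```

For example, `print(usage_report("/tmp/PlexRamScratch"))`, where the path is hypothetical. A total of roughly 12 GiB would confirm the size cap is in effect.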

15 hours ago, soabwahott said:

6.8.3

I am still running 6.8.3 and have this issue, so I don't think this is related to 6.9.x.

 

I do run Plex but it's not a transcoding issue. 

 

Are you running Crashplan? I don't believe Crashplan is specifically the problem, but perhaps somehow causes it more often. At the moment my system is doing it within a few hours of starting the Crashplan docker.

  • 1 month later...

I think I'm having the same issue.  I posted previously about it (link to thread with multiple diagnostic files below).  Here's what I've found...

  • iowait causes all 4 CPUs to peg at 100%, system becomes mostly unresponsive
  • iotop -a shows large amount of accumulating READS from the cache disk (at >300MB/S), specifically, loop2
  • restarting a docker container via command line will fix the problem (for example, docker restart plex, or docker restart netdata)
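`iotop -a` shows accumulated I/O per process; the kernel keeps a similar cumulative counter in the `read_bytes` field of /proc/<pid>/io. A rough sketch that ranks processes by that counter, assuming it runs as root (other processes' io files are otherwise unreadable). The `proc` parameter is only there for testing:

```python
from pathlib import Path


def parse_proc_io(text: str) -> dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    out = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            out[key.strip()] = int(value)
    return out


def top_readers(n: int = 5, proc: Path = Path("/proc")) -> list[tuple[int, str, int]]:
    """Return (pid, command, read_bytes) for the n processes with the most storage reads."""
    rows = []
    for pid_dir in proc.iterdir():
        if not pid_dir.name.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            io = parse_proc_io((pid_dir / "io").read_text())
            comm = (pid_dir / "comm").read_text().strip()
        except (OSError, ValueError):  # process exited or access denied
            continue
        rows.append((int(pid_dir.name), comm, io.get("read_bytes", 0)))
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]
```

Because `read_bytes` counts from each process's start, long-running processes naturally rank high. Taking two snapshots a few seconds apart and diffing them gives a rate instead.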

I cannot figure out a pattern to when this happens. Mover and TRIM are not running, and no one is watching a Plex movie.

 

I'm on 6.8.3, all drives formatted to XFS

As per others in this thread my issue turned out to be related to filling up the RAM on my server.

 

I use Borg for backups and was running Ubuntu Server previously. Borg by default creates cache files in the home directory. On Ubuntu Server this was fine because they sat on my main SSD, whereas on Unraid the home directory is created on the RAM disk. Once I told Borg to put the cache files on my array, the problem went away and I've had no issues since.
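A quick way to check whether a directory (for example a tool's cache directory under the home directory) lives on the RAM-backed root filesystem or on a real disk is to find its longest matching mount point in /proc/mounts. A sketch; what Unraid's root filesystem type is reported as (e.g. rootfs or tmpfs) is an assumption to verify on your own system:

```python
import os


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the longest mount point containing `path`.

    `mounts_text` is the content of /proc/mounts: device, mount point, type, ...
    """
    path = os.path.abspath(path)
    best_mp, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mp, fstype = fields[1], fields[2]
        # Match whole path components so /mnt/disk1 does not claim /mnt/disk10.
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        if inside and len(mp) > len(best_mp):
            best_mp, best_type = mp, fstype
    return best_type
```

On a live system, pass `open("/proc/mounts").read()` as `mounts_text`. For Borg specifically, the `BORG_CACHE_DIR` environment variable is one documented way to move the cache off the home directory.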

 

If anyone else sees similar behaviour with high reads on loop2, please check your RAM usage. If it shows as very high when the issue occurs, something is likely filling it up, e.g. Plex transcoding, or something else writing to /tmp/ or running outside of dockers/VMs.
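When checking RAM, /proc/meminfo is more telling than the free column: `MemAvailable` is the kernel's own estimate of memory usable without swapping, and `Shmem` includes files held in tmpfs (RAM disks). Those files are counted under buff/cache in top and free, yet cannot simply be dropped. A minimal parsing sketch:

```python
def meminfo_mib(text: str) -> dict[str, float]:
    """Parse /proc/meminfo lines ('Key:   12345 kB') into MiB; unit-less counters stay as-is."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            number = int(parts[0])
            values[key] = number / 1024 if parts[1:] == ["kB"] else number
    return values


def memory_summary(text: str) -> str:
    """Report available memory alongside shared memory (which includes tmpfs contents)."""
    m = meminfo_mib(text)
    return (f"available {m['MemAvailable']:.0f} of {m['MemTotal']:.0f} MiB; "
            f"shmem (tmpfs etc.) {m.get('Shmem', 0):.0f} MiB")
```

Usage: `print(memory_summary(open("/proc/meminfo").read()))`. A large Shmem figure alongside low MemAvailable would point at something filling a RAM disk.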


I am fairly certain it wasn't RAM related for me. I have 10GB and whilst I run quite a few dockers I don't think I'm hitting any memory issues.

 

 

My screenshot of top here shows ~2GB cached, so I'm not running out of memory. If that were the issue, you should start seeing OOM messages in the logs.

 

In the general process of things I have replaced my cache drive (so a new filesystem on it, and then I re-created the docker.img, which is loop2, so a new filesystem there too). I also moved it from the SATA II-only internal port to a spare SATA III port I had on an add-in card. It does appear to be happening less, but I had another instance the other day, so that certainly hasn't resolved it.

 

My CPUs (4C/8T Xeon E3) weren't getting pegged, but there was a lot of iowaiting happening.


Another thing I noticed during my last incident, when I ran 'docker stats', I could see that the netdata container was marked 'unhealthy'

 


 

I never had netdata running previously when this happened, I just turned it on recently to try and figure this issue out. So I don't think this specific docker is the cause.

 

Also, here are the memory lines from the top output in the 5 diagnostics files I have captured while this was occurring:

  1. MiB Mem :   7667.6 total,    117.7 free,   6593.1 used,    956.7 buff/cache
  2. MiB Mem :   7667.6 total,    131.4 free,   6260.4 used,   1275.8 buff/cache
  3. MiB Mem :   7667.6 total,    121.9 free,   6270.2 used,   1275.5 buff/cache
  4. MiB Mem :   7667.6 total,    121.2 free,   6304.1 used,   1242.3 buff/cache
  5. MiB Mem :   7667.6 total,    117.0 free,   6156.9 used,   1393.7 buff/cache

So, similar to @Shonky, RAM doesn't seem to be the cause for me.
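For reference, adding free and buff/cache in each of the five samples above gives roughly 1,070 to 1,510 MiB of headroom out of 7,667.6 MiB. That sum is only an upper bound, because any tmpfs (RAM disk) content is reported inside buff/cache but cannot be reclaimed. The arithmetic as a small sketch:

```python
import re

# "MiB Mem" lines copied from the five diagnostics top files above.
SAMPLES = [
    "MiB Mem :   7667.6 total,    117.7 free,   6593.1 used,    956.7 buff/cache",
    "MiB Mem :   7667.6 total,    131.4 free,   6260.4 used,   1275.8 buff/cache",
    "MiB Mem :   7667.6 total,    121.9 free,   6270.2 used,   1275.5 buff/cache",
    "MiB Mem :   7667.6 total,    121.2 free,   6304.1 used,   1242.3 buff/cache",
    "MiB Mem :   7667.6 total,    117.0 free,   6156.9 used,   1393.7 buff/cache",
]

MEM_LINE = re.compile(
    r"MiB Mem :\s*([\d.]+) total,\s*([\d.]+) free,\s*([\d.]+) used,\s*([\d.]+) buff/cache"
)


def headroom_mib(line: str) -> float:
    """free + buff/cache from a top 'MiB Mem' line: an upper bound on memory that could be handed out."""
    _total, free, _used, cache = map(float, MEM_LINE.search(line).groups())
    return round(free + cache, 1)


for sample in SAMPLES:
    print(headroom_mib(sample))
```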

 

Curious how to figure out what is causing the loop2 read or if there is a workaround to restart a docker if it's marked as unhealthy?
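As a possible workaround for the unhealthy-container case, `docker ps` can filter on health status (`--filter health=unhealthy`), and each match can then be restarted. This only applies to images that define a HEALTHCHECK, as netdata's evidently does. A sketch; the `run` parameter exists so the commands can be faked in tests, and the 60-second timeout is an arbitrary guard against docker commands that hang while the system is blocked:

```python
import subprocess


def unhealthy_containers(run=subprocess.run) -> list[str]:
    """Names of containers whose Docker health check currently reports 'unhealthy'."""
    result = run(
        ["docker", "ps", "--filter", "health=unhealthy", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True, timeout=60,
    )
    return [name for name in result.stdout.splitlines() if name.strip()]


def restart_unhealthy(run=subprocess.run) -> list[str]:
    """Restart every unhealthy container and return the names that were restarted."""
    names = unhealthy_containers(run)
    for name in names:
        run(["docker", "restart", name], check=True, timeout=60)
    return names
```

If docker itself is stuck, `subprocess.run` raises `TimeoutExpired` instead of hanging forever, which at least lets a cron-style script log the failure.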

 

 

 

2 minutes ago, DingHo said:

Curious how to figure out what is causing the loop2 read or if there is a workaround to restart a docker if it's marked as unhealthy?

I don't think it's your unhealthy container, but have nothing to back that.

 

If I catch mine in the early stages, things like docker ps, docker stop/start etc work ok. Usually by the time I catch it though, it's because something like Plex or Pi-Hole dockers are being blocked from doing anything. It's too late then and docker commands just get blocked until it comes back. So I can't properly shut down a docker and have to kill some processes that docker is running.

 

As a workaround, I've toyed with the idea of something that monitors system load: any time the 1-minute load average goes over, say, 20 or 30, it restarts a docker, perhaps after capturing some logs first. Whilst CrashPlanPRO seems to be involved, I have had occasions where restarting other very lightweight dockers was enough.
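A load-average watchdog along those lines could look like the sketch below. On Linux, the load average counts tasks stuck in uninterruptible sleep (typically waiting on disk I/O) as well as runnable ones, which is why heavy iowait can push it to 20 or 30 without the CPUs being busy. `on_high_load` is a hypothetical callback (e.g. one that saves diagnostics and restarts a container), and the threshold, poll interval and cooldown are example values:

```python
import os
import time

LOAD_THRESHOLD = 20.0  # the "say 20 or 30" 1-minute figure from the post


def should_act(load_1min: float, threshold: float = LOAD_THRESHOLD) -> bool:
    """True when the 1-minute load average has reached the threshold."""
    return load_1min >= threshold


def watch(on_high_load, interval_s: float = 30.0, cooldown_s: float = 600.0) -> None:
    """Poll the 1-minute load average and call on_high_load(load) when it is too high.

    The cooldown stops the callback from firing again while the system recovers.
    """
    last_action = float("-inf")
    while True:
        load_1min = os.getloadavg()[0]
        now = time.monotonic()
        if should_act(load_1min) and now - last_action >= cooldown_s:
            on_high_load(load_1min)
            last_action = now
        time.sleep(interval_s)
```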

 

Unfortunately mine does it even less than it used to now.


I also run 'netdata' and saw similar behaviour, either my 'netdata' docker would stop collecting data or crash completely.

 

The RAM stats posted above are similar to what my own were. I thought I didn't have a problem because I had a fair chunk of memory shown as buff/cache. However, I think that because Unraid runs from a RAM disk, this isn't a true reflection of what's going on compared to a conventional Linux OS.


buff/cache is RAM allocated to buffers by the OS. It is not allocated RAM and is available if something tries to allocate RAM. I doubt there is any significant difference in the buff/cache handling in unRAID. It's still pretty close to a regular Linux kernel.

 

If unRAID were filling its RAM disk (and you didn't have enough RAM), you'd start seeing OOM errors. My system shows / as 4.8GB total with about 900MB used.
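The OOM check is easy to automate: when the kernel has to kill something, it logs lines such as "... invoked oom-killer" and "Out of memory: Killed process ...". A minimal filter, assuming you feed it kernel log text (for example the output of `dmesg`):

```python
import re

# Matches the kernel's OOM-killer announcements and kill reports.
OOM_PATTERN = re.compile(r"Out of memory|oom-kill|invoked oom-killer")


def oom_events(log_text: str) -> list[str]:
    """Return the log lines that mention the kernel OOM killer."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]
```

An empty result across several incidents supports the view that the RAM disk is not actually running out.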

 

I don't have a netdata docker and haven't tried it either.

Edited by Shonky
  • 2 months later...
  • 2 weeks later...
On 7/24/2021 at 7:36 PM, DingHo said:

Curious if anyone has made any progress on this issue.  I'm still encountering it on a fairly regular basis, even after updating to 6.9.2.

I've not made any definite change that fixed it, but I haven't had a recurrence for a while now. I'm running 6.9.2, though I'm pretty certain I was still getting it on 6.9.2 as well. Not that 6.9.2 included any changes expected to fix this.

 

I think the most significant thing I did change was I moved my cache drive SSD SATA connection from the HP Microserver ( Gen 8 ) motherboard and used the PCIe SATA card I had installed. The main reason I did was that it was limited to SATA II speeds. I can't say 100% but at least since then I've not had an issue.

 

The other thing I had done was that I changed the SSD itself which meant recreating the whole docker.img filesystem. No improvement from that change.


@Shonky  Thanks for the update.  I too attempted rebuilding the docker image, with no effect.

I think I was able to chase down a cause for my problem; however, I'm not sure it applies to anyone but my particular case, and probably not to unRAID in general.

 

When Plex was running its scheduled tasks, it was getting 'stuck' on some music files in one of my libraries. While stuck, it would read like crazy on the cache drive, even though the songs are on the array. I could reproduce the issue several times by setting the scheduled task and then watching it get stuck. My temporary solution was to remove that particular music library; I'll have to investigate further to see which file(s) in particular were causing the issue. Just thought I'd update in case this helps anyone else.


@DingHo

Now that you say that, I also run Plex and had some issues with the scheduled scan failing/timing out/something on a couple of random files. I think they were video files, and I ended up removing them. If it was those files specifically, it wasn't 100% repeatable, because otherwise it would have happened every night.

 

I don't remember if I was looking at the Plex issue because of the loop2 reads or independently.

 

Sorry, that's not really helpful. If you turn on debug logging in Plex, it's pretty easy to find the specific files causing the problem, and you can trigger the full scan manually. From memory, the scan runs but is killed after an hour if it's taking too long.

Edited by Shonky

Jinxed it. Caught it happening again tonight so none of the above things fixed anything.

 

I only touched two dockers to bring it back around. Gitea and Photoprism. The latter did seem to be scanning. I'm only playing with it so killed it pretty quickly. Previously it did seem like CrashPlanPRO was a cause, but not always.

 

It seems like dockers that generate a lot of IO (Plex scanning, Photoprism scanning, CrashPlanPRO indexing/scanning, etc.) tend to trigger it. Stopping one or two of them, or sometimes even others, brings it back around. It's almost like disk thrashing on a mechanical disk.
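To see which containers are generating the most IO, `docker stats --no-stream` reports cumulative block IO per container as "READ / WRITE". A parsing sketch that ranks containers by bytes read, assuming output produced with a Name-then-BlockIO format separated by a tab, and assuming Docker's decimal unit suffixes (B, kB, MB, GB, TB):

```python
import re

_UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
_SIZE = re.compile(r"([\d.]+)\s*([kMGT]?B)")


def to_bytes(size: str) -> float:
    """Convert a size such as '12.3GB' or '0B' into bytes."""
    match = _SIZE.fullmatch(size.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def rank_by_block_read(stats_output: str) -> list[tuple[str, float]]:
    """Rank containers by cumulative block-device reads, highest first.

    Expects one 'NAME<TAB>READ / WRITE' line per container.
    """
    rows = []
    for line in stats_output.splitlines():
        if "\t" not in line:
            continue
        name, block_io = line.split("\t", 1)
        read, _, _write = block_io.partition("/")
        rows.append((name, to_bytes(read)))
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

The input could come from `subprocess.run(["docker", "stats", "--no-stream", "--format", "{{.Name}}\t{{.BlockIO}}"], capture_output=True, text=True).stdout`, where the `\t` is a real tab in the Python string. The counters run from container start, so comparing two snapshots is more informative than a single one.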
