Parity check causes VM crash and playback issues



I've had this issue for a while, but I haven't had time to look into it, and it's been low priority since it only happens every 2 months when I run a parity check. When the parity check is running, my VM starts to lag and eventually crashes, reboots, and repeats. My Emby users also report lag/stops in playback whenever I run a parity check, and I think I've run into this with local playback too. The thing is, this never used to happen; it only started maybe 3-4 parity checks ago, which would make it about 6-8 months ago.

 

I started a parity check at 10:05 today, and the issue showed up soon after. 12 hours in, I paused the parity check (thank you so much for this function, btw!) because I wanted to use my Windows VM. I then looked at the log and saw call traces, which I assume have something to do with my issue since they never show up otherwise.

So if someone could take a look at my diags and maybe point out what could be wrong, I would really appreciate it.


PS: Please ignore the FCP message about my VM share being cache-only while files exist on the array. That's from a long time ago, and those are only old vdisks that are not in use right now. The VM I use now exists only on the cache drive, so that's not the problem.

 

Edit: I just looked at my chat logs with one of my Emby users and I was right: the issue first started happening in October 2018, which makes it about 8 months ago.

 

tower-diagnostics-20190609-2012.zip

Edited by strike
Link to comment

A quick look at your diagnostics file showed that you're using an older processor and that you have dual parity. The parity2 calculation is a complex matrix arithmetic operation that requires a lot of CPU cycles to complete UNLESS your CPU has the AVX2 extensions. See here for more details:

 

   

 

Link to comment
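
To illustrate the point above about parity2 costing more than plain XOR parity, here is a minimal sketch. It assumes parity2 resembles the RAID-6 "Q" syndrome over GF(2^8), which is an assumption and not something confirmed in this thread; the function names are made up for illustration and are not taken from the Unraid source.

```python
# Sketch only: assumes parity2 behaves like the RAID-6 "Q" syndrome.
# All function names are hypothetical, not Unraid's own.

def gf_mul2(x):
    """Multiply one byte by 2 in GF(2^8), reducing by the polynomial 0x11D."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF

def parity_p(data):
    """First parity: XOR of the byte at the same offset on every data disk."""
    p = 0
    for b in data:
        p ^= b
    return p

def parity_q(data):
    """Second parity: XOR of (2**i * d_i) in GF(2^8), via Horner's rule."""
    q = 0
    for b in reversed(data):
        # One extra field multiplication per disk on top of the XOR.
        q = gf_mul2(q) ^ b
    return q
```

Here `data` holds one byte per data disk, for example `bytes([1, 2, 3])`. The P parity is a single XOR per disk, while Q adds a field multiplication per disk for every byte, which is the kind of work that wide SIMD instructions such as AVX2 can speed up.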

That must be it. If I'm not mistaken, I upgraded to dual parity about 8-10 months ago. But those call traces, are they related to the CPU having too much to do, as in it almost can't handle it? I must admit, I can't remember the last time I looked at the CPU usage during a parity check. The CPU has always been powerful enough for my use. But yeah, it isn't exactly new.

Link to comment

If I remember correctly, you are using an older Xeon CPU. It has eight cores. That is why it is a reasonably powerful processor even today, but you have to realize that each of those cores has only one eighth of the total capability of the whole. You have probably dedicated several of them to the VM. I am not sure quite how LimeTech has implemented the parity2 calculation. Does it only run on one core, or can it be spread across two or more cores?

Link to comment

Well, after resuming the parity check and keeping an eye on the CPU usage, I'm not entirely convinced by your theory. The total CPU usage never goes over 55%. At idle, when the parity check is not running, usage is about 20%.

 

I don't have any Emby users on right now, but my VM started to lag/freeze almost the second I resumed the parity check. The first 4 cores are barely being used (10-30%); occasionally some of the threads spike to about 70% for a second. The last 4 cores, which my VM uses, are another story though. 3-4 of their 8 threads are almost constantly at 100%; they drop for a few seconds occasionally but go right back to 100%. Which threads are affected varies, but for the most part it seems to be one full core and one or two HT threads. The others spike as well, but they never stay at 100%.
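
For anyone who wants to log per-core numbers like the ones above instead of eyeballing them, here is a rough sketch that computes per-core utilisation from two snapshots of `/proc/stat` on Linux. The function names are made up for this example, and it treats iowait as idle time, which is an assumption you may want to change when looking at disk-heavy work like a parity check.

```python
# Sketch: per-core utilisation from two /proc/stat snapshots (Linux only).
# Function names are hypothetical; iowait is counted as idle here.

def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} for each 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and every non-CPU line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(v) for v in fields[1:]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks[:8])  # guest time is already counted in user/nice
        cores[fields[0]] = (total - idle, total)
    return cores

def core_usage(before, after):
    """Percentage of time each core was busy between two snapshots."""
    usage = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        elapsed = total1 - total0
        usage[name] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return usage
```

To use it, read `/proc/stat` twice a second or so apart, pass each text through `parse_proc_stat`, and hand the two results to `core_usage`. Comparing the output for the cores pinned to the VM against the others should show whether the load really is concentrated there.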

 

My Emby container only uses the first 4 cores, and the VM uses, as I said, the last 4. Every other container uses, for the most part, the first core, as they don't do any heavy lifting anyway. I've never heard anything from my Emby users about lag/stops when I'm not running a parity check, so I'm assuming 4 cores (even when transcoding) is enough. I use high priority for transcoding, so the 4 cores get a workout, but only for about 1-2 min until the whole movie is done transcoding. I can see that maybe being a problem for the Emby users while the parity check is running, but it should only last, like I said, 1-2 min. My Emby users are reporting stops/lag very frequently, though, and long after transcoding had finished too. So I don't think it has anything to do with the transcoding either. I even think they reported lag when there was no transcoding involved, IIRC, but I might be wrong about that.

 

6 hours ago, Frank1940 said:

I am not sure quite how LimeTech has implemented the parity2 calculation.  Does it only run in one core or can it be spread across two or more cores? 

I have no idea how this works either, but from what I'm seeing it can use all cores. And if it had only been one, I would have thought it would favor the first core?

But yeah, as I said, the first 4 cores are barely being used. For the Emby problem, I guess I could change the transcode priority to low to check if that solves it. It would just take longer to transcode the whole movie.

 

But I have no idea how to solve the VM issue. Any thoughts? I could maybe reverse the workload (put the VM on the first 4 cores and Emby on the last 4) and see if that helps. But I can't see any logical reason why that would work either, unless the parity check heavily favors the last 4 cores.

 

I could work around it all by using the parity check tuning plugin, of course, and I most certainly will, but I would like to know the root cause so that, if I have to, I can leave the parity check running regardless of what the server is doing (well, not any gaming or heavy use, of course). What happens when I need to rebuild a disk, for example? Is my server unusable for like 20 hours?

Link to comment

Thanks, love this community, I'm learning something new almost every day! I will try to lower it even further to test. Can't really say I remember changing the value, so the current value must be the default (on my system anyway).

 

Any other thoughts if my tests fail? Why does the performance hit land (for the most part) on the last 4 cores? Or does it just have to do with the fact that there is a VM running on them, and somehow VMs get a massive performance hit during a dual-parity check? I mean, if I take the workload on the first 4 cores into consideration and "add it" to the last 4 cores, it should not have impacted the VM that heavily.

Link to comment
  • 3 years later...
On 6/11/2019 at 2:06 PM, strike said:

Thanks, love this community, I'm learning something new almost every day! I will try to lower it even further to test. Can't really say I remember changing the value, so the current value must be the default (on my system anyway).

 

Any other thoughts if my tests fail? Why does the performance hit land (for the most part) on the last 4 cores? Or does it just have to do with the fact that there is a VM running on them, and somehow VMs get a massive performance hit during a dual-parity check? I mean, if I take the workload on the first 4 cores into consideration and "add it" to the last 4 cores, it should not have impacted the VM that heavily.

Hi @strike

 

Did you manage to solve the issue? I too have massive VM issues (Windows 10, now Windows 11) when I am running a parity check. I have an AMD Ryzen 5 5600X 6-core, and the VM uses only the last 5 cores (the first one is not used by the VM, so it is used only by the Unraid server), and the overall load doesn't exceed 20-25%. I get lag every 10-15 seconds, which makes watching YouTube or anything else impossible during the parity check.

Maybe you have a working solution :). Did you eventually modify "md_sync_thresh"? I don't even know what it is at this point, and I don't think I'm going to change anything if it doesn't help in the first place :)

 

Edited by Doublemyst
Link to comment
