6.9.2 Excessive Cache Writes Still Persisting After Re-Alignment to 1MiB (Solved) (Probably)



Hi all,

 

I've taken so many different actions to try and resolve this issue that it's hard to collate everything and to know where to start, but here goes...

 

(Diagnostics attached)

 

...so essentially, since 6.8.3 I've been struggling with the excessive cache writes issue. Due to personal circumstances I was not able to do any meaningful troubleshooting until around January, after moving to 6.9.0.

 

It should also be noted that I'm not sure whether I was experiencing excessive writes prior to December 2020, when my SSD cache pool was 4x 250GB SATA SSDs in parity (BTRFS) mode, but I have since upgraded to 2x WD Blue SN550 1TB NVMe drives (WDS100T2B0C), which is when the issue first came to my attention, or possibly even started. Since they were installed they have (to this date) 114,252,802 data units written [58.4 TB] each.

 

I've pored over every forum/reddit post about this issue and tried everything in them except for moving my cache over to XFS/unencrypted, as I strictly require the encryption and redundancy. I know I'm making life hard on myself with this...but I need it.

 

I'll have more specifics on writes below.

 

My Specs are:

  • Ryzen 2700X
  • TUF B450-PLUS GAMING
  • 32GB RAM (4x Team Group Vulcan Z T-Force 8GB DDR4 3000MHz)
  • Silverstone Essential 550W 80 Plus Gold
  • nVidia Geforce GT710 1GB

 

My Array Configuration:

[Screenshot: unRAID_ArrayConfig.png]

 

Share Settings:

[Screenshot: unRAID_Shares.png]

 

After upgrading to 6.9.0 I've:

  • Reformatted the cache pool to the 1MiB partition layout as described by limetech HERE.
  • Switched from the official Plex docker to the Binhex Plex Pass container.
  • Tried toggling various dockers/VMs on and off to find a culprit, no joy.

 

After upgrading to 6.9.1 I've:

  • Switched Docker to use a directory instead of an image.
  • Moved Docker back to a fresh btrfs image after the above didn't work.
  • Tried toggling various dockers/VMs on and off to find a culprit, no joy.

 

After upgrading to 6.9.2 on 18/04 I've:

  • Moved Docker back to using a directory instead of an image again.
  • Disabled my Duplicati docker and anything else I don't utilise often (Bazarr, Krusader, etc.).
  • Disabled all my W10 VMs; only a pfSense VM is running now.
  • Tried toggling various dockers/VMs on and off to find a culprit, no joy.

 

Following my upgrade to 6.9.1 and the subsequent actions, I let the server run for a month without interfering, and in that time the cache racked up over 60 million writes...and the USB key failed, which is actually what precipitated my upgrade to 6.9.2 this past Sunday and another round of troubleshooting this issue.

 

TBW according to the CLI:

cat /etc/unraid-version; /usr/sbin/smartctl -A /dev/sdb | awk '$0~/Power_On_Hours/{ printf "Days: %.1f\n", $10 / 24} $0~/LBAs/{ printf "TBW: %.1f\n", $10 * 512 / 1024^4 }'

version="6.9.2"
Days: 2261.2
TBW: 81.6
TBW: 217.8
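 

Side note: that one-liner matches both the Total_LBAs_Written and Total_LBAs_Read attributes (presumably why two "TBW" lines appear), and it queries /dev/sdb, a SATA device, rather than the NVMe cache drives. A rough sketch of the equivalent check for the NVMe pool members, assuming they show up as /dev/nvme0n1 and /dev/nvme1n1 and that one NVMe data unit is 512,000 bytes:

# hypothetical device names - adjust to match the actual pool members
for dev in /dev/nvme0n1 /dev/nvme1n1; do
  /usr/sbin/smartctl -A "$dev" | awk -v d="$dev" '
    /Data Units Written/ { gsub(",", "", $4); printf "%s TBW: %.1f TB\n", d, $4 * 512000 / 1e12 }'
done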

 

Cache Settings showing alignment:

[Screenshot: unRAID_Cache1_14hrsClearStats.png]

[Screenshot: unRAID_Cache2_14hrsClearStats.png]
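 

For anyone who wants to double-check the 1MiB alignment from the CLI rather than the GUI, a minimal sketch (hypothetical device name; with 512-byte logical sectors, a start sector of 2048 corresponds to a 1MiB offset):

fdisk -l /dev/nvme0n1                      # partition 1 should start at sector 2048
cat /sys/block/nvme0n1/nvme0n1p1/start     # prints the start sector directly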

 

Current Docker Settings:

[Screenshot: unRAID_DockerSettings_14hrsClearStats.png]

 

Currently Running Dockers and their mappings:

[Screenshot: unRAID_Dockers_14hrsClearStats.png]

 

Screenshot of Main 14 hours after a stats reset (after moving Docker back to directory mode from the image):

[Screenshot: unRAID_Main_14hrsAfterClearStats.png]

 

As you can see from the above screenshots, the writes are excessive. This is despite the cache re-alignment, the move back to running Docker in directory mode, and disabling the only docker that should be doing large writes because of how I have it set up (Duplicati).

 

The only VM I'm running is pfSense, and I've disabled any intensive logging I had running there.

 

14 hours before the time of the screenshots (after moving Docker back to directory mode) I cleared the stats on Main and left an iotop -ao running on my laptop. Unfortunately the laptop decided to reboot for updates (damn) during this time, so I don't have the iotop output, but as you can see, in that period the cache had over 2.3 million writes with not a whole lot going on/running.
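 

In hindsight, a better approach would have been to run iotop in accumulated batch mode on the server itself and log to a file on the array, so the output survives a client disconnect; a sketch, with the log path only an example:

nohup iotop -bao -t -d 60 > /mnt/user/system/iotop-overnight.log 2>&1 &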

 

I've run another iotop -ao for the last 3 hours before writing this post (without changing anything else) to give some sort of idea of what that overnight output would look like if I had it:

[Screenshot: unRAID_3hrIOTOP_20_04_2021.png]

 

I should also mention that yesterday I ran each docker one by one (with all others disabled) and ran iotop -ao for 10 minutes each, but the results were inconclusive.

 

As I've mentioned, I have to have the cache encrypted and redundant. I do understand there is some write amplification expected from the encryption etc...but it shouldn't be this high.

 

I've tried to be thorough and include as much current information and background as possible, but if I've missed anything please don't hesitate to ask!

 

I can't help but feel like I'm missing something...any help/different perspectives would be very very welcome at this point.

ibstorage-diagnostics-20210420-1106.zip

1 hour ago, DaveDoesStuff said:

I've run another iotop -ao for the last 3 hours before writing this post (without changing anything else) to give some sort of idea of what that overnight output would look like if I had it:

That's from 3 hours? It shows a total of less than 2GiB. That's very little; something else must be doing most of the writing.

7 minutes ago, JorgeB said:

That's from 3 hours? It shows a total of less than 2GiB. That's very little; something else must be doing most of the writing.

 

That's what I've been thinking...but I'm not sure how to narrow it down further, or possibly go beyond whatever bounds iotop has.

 

Here is a screenshot from 4 hours; not much difference.

[Screenshot: unRAID_4hrIOTOP_20.04.2021.png]

47 minutes ago, JorgeB said:

Any continuous writes should be in there, at least they were for all users with that issue, including myself. Do you do any downloading/unpacking to the cache pool?

 

Nothing beyond the standard Radarr/Sonarr > qBittorrent > Plex setup used by many. Here is a screenshot of qBittorrent at present along with a fresh iotop; the qBittorrent screenshot effectively represents my last 72 hours of torrenting. As you can see, not a ton. From what I've seen on the forums here, I'd say it's a standard amount even.

 

 

[Screenshot: unRAID_Qbt_20.04.2021.png]

[Screenshot: unRAID_6hrIOTOP_20.04.2021.png]

 

EDIT: I've just noticed that iotop, despite being the same window running for the last 6 hours, seems to be giving inconsistent stats...I haven't used it outside of troubleshooting this issue so I'm not sure if that's normal. For example, in the screenshot in this post, taken at roughly 6 hours of run time, the entries at the top for highest write from the previous 4-hour one are missing.

 

EDIT2: Added share settings to the OP.

7 minutes ago, JorgeB said:

No, but much more than the continuous writes reported by iotop would generate. Check the SSDs' TBW, then let it run for 24h without any downloading, then check again.

 

Yeah, for sure it's not adding up. I've got qBittorrent disabled now; at the time of writing this, the TBW is:

[Screenshot: unRAID_TBW_QB-Disabled_20.04@14_45.png]

 

Let's see where it sits tomorrow morning.

 

Thank you for the assist by the way!

21 minutes ago, JorgeB said:

Those are not correct, you can see them on the SMART report:

 


Data Units Written:                 114,252,166 [58.4 TB]

and


Data Units Written:                 114,251,764 [58.4 TB]

 

 

 

I believe what I posted is actually for the total time this "server" has been in operation/powered on. That's probably a bit useless, come to think of it; I picked it up in a bug report :D

 

Currently the 2 cache drives report:

  • Cache 1: 114,276,540 [58.5 TB]
  • Cache 2: 114,276,147 [58.5 TB]

So after running overnight...

 

[Screenshot: unRAID_TBW_QB-Disabled_Overnight.png]

  • Cache 1: 114,313,326 [58.5 TB]
  • Cache 2: 114,312,943 [58.5 TB]

That's a delta of roughly 36,800 on each drive. Now, that doesn't seem like a lot...but unRAID's Main page reporting 3.9 million writes, when it was around 2.6 million yesterday evening, is still pretty bad for a server that was essentially only running a handful of low-IO dockers and a pfSense VM.
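 

For reference, converting that delta into actual data (assuming the NVMe convention of 512,000 bytes per data unit) works out to just under 19 GB overnight:

awk 'BEGIN { printf "%.1f GB\n", (114313326 - 114276540) * 512000 / 1e9 }'   # -> 18.8 GB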

 

34 minutes ago, JorgeB said:

The number of writes in the GUI is basically meaningless, it varies with the device; what matters is the TBW, and that is fine: only a few GB were written.

 

Hmm, didn't know that. To be fair, it would be counter-intuitive for me to assume it's a load of crap lol, but thank you for the info.

 

So obviously there *was* an issue that led to these two devices racking up 58.5 TB in writes in just 4 months since I began using them...and it seems like something I did, either earlier or in the last 72 hours, has possibly actually fixed it, but I was focusing on metrics that don't actually represent the TBW, or possibly even misrepresent it.

 

Now I'll just have to re-enable qBittorrent and monitor it closely, then, if that checks out, Duplicati (which I feel is possibly the culprit), and take some additional actions depending on the outcome.
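 

In case it's useful to anyone following along, this is the sort of daily check I have in mind while re-enabling things one at a time; a sketch only, assuming it gets run on a schedule (e.g. via the User Scripts plugin), with hypothetical device names and log path:

#!/bin/bash
# append a dated Data Units Written reading for each cache drive to a log
LOG=/mnt/user/system/cache-tbw.log
{
  date '+%Y-%m-%d %H:%M'
  for dev in /dev/nvme0n1 /dev/nvme1n1; do
    printf '%s: ' "$dev"
    /usr/sbin/smartctl -A "$dev" | grep 'Data Units Written'
  done
} >> "$LOG"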

 

Thank you for the fresh set of eyes on this Jorge! I had complete tunnel vision until you came along lol!

 

Do you use any particular tools or commands on your own setup(s) to monitor this kind of thing? Or anything you would recommend over what I've utilised in this thread?
