
6.9.2 Excessive Cache Writes Still Persisting After Re-Alignment to 1MiB (Solved) (Probably)



Hi all,

 

I've taken so many different actions to try to resolve this issue that it's hard to collate everything and know where to start, but here goes...

 

(Diagnostics attached)

 

...so essentially, since 6.8.3 I've been struggling with the excessive cache writes issue. Due to personal circumstances I was not able to do any meaningful troubleshooting until January or so, after moving to 6.9.0.

 

It should also be noted that I'm not sure whether I was experiencing excessive writes prior to December 2020, when my SSD cache pool was 4x 250GB SATA SSDs in parity (BTRFS) mode, but I have since upgraded to 2x WD Blue SN550 1TB NVMe (WDS100T2B0C), which is when the issue first came to my attention or possibly even started. Since they were installed (to this date) they have 114,252,802 data units written [58.4 TB] each.

 

I've pored over every forum/Reddit post about this issue and tried everything in them except for moving my cache over to XFS/unencrypted, as I strictly require the encryption and redundancy. I know I'm making life hard on myself with this...but I need it.

 

I'll have more specifics on writes below.

 

My Specs are:

  • Ryzen 2700X
  • TUF B450-PLUS GAMING
  • 32GB RAM (4x Team Group Vulcan Z T-Force 8GB DDR4 3000MHz)
  • Silverstone Essential 550W 80 Plus Gold
  • Nvidia GeForce GT 710 1GB

 

My Array Configuration:

unRAID_ArrayConfig.png

 

Share Settings:

unRAID_Shares.png

 

After upgrading to 6.9.0 I've:

  • Reformatted the cache pool to the 1MiB partition layout as described by limetech HERE (a quick way to verify the alignment is sketched below).
  • Switched from the official Plex docker to the Binhex Plex Pass container.
  • Tried toggling various dockers/VMs on and off to find a culprit; no joy.
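
A quick way to verify the alignment, purely as a sketch (the device name is an assumption, adjust to your cache drives): on the 1MiB-aligned layout the first partition starts at sector 2048 (2048 x 512 B = 1 MiB), whereas the old layout started at sector 64.

# Print the partition table; the "Start" column for partition 1 should read 2048.
fdisk -l /dev/nvme0n1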

 

After upgrading to 6.9.1 I've:

  • Switched Docker to use a directory instead of an image.
  • Moved Docker back to a fresh btrfs image after the above didn't work.
  • Tried toggling various dockers/VMs on and off to find a culprit; no joy.

 

After upgrading to 6.9.2 on 18/04 I've:

  • Moved Docker back to using a directory instead of an image again.
  • Disabled my Duplicati docker and anything else I don't utilise often (Bazarr, Krusader, etc.).
  • Disabled all my W10 VMs; only a pfSense VM is running now.
  • Tried toggling various dockers/VMs on and off to find a culprit; no joy.

 

Following my upgrade to 6.9.1 and the subsequent actions, I let it run for a month without interfering, and in that time the cache had over 60 million writes...and the USB key failed, which is actually what precipitated my upgrade to 6.9.2 this past Sunday and another round of troubleshooting this issue.

 

TBW according to the CLI:

cat /etc/unraid-version; /usr/sbin/smartctl -A /dev/sdb | awk '$0~/Power_On_Hours/{ printf "Days: %.1f\n", $10 / 24} $0~/LBAs/{ printf "TBW: %.1f\n", $10 * 512 / 1024^4 }'

version="6.9.2"
Days: 2261.2
TBW: 81.6
TBW: 217.8
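
Side note on that command: the /LBAs/ pattern likely matches both Total_LBAs_Written and Total_LBAs_Read (hence the two TBW lines), and /dev/sdb is a SATA device rather than one of the NVMe cache drives. A rough NVMe equivalent, purely as a sketch (device names are assumptions, adjust to your pool), using the fact that NVMe "Data Units Written" are 512,000-byte units:

# Prints the TB written per NVMe device from the SMART "Data Units Written" counter.
for d in /dev/nvme0n1 /dev/nvme1n1; do
  /usr/sbin/smartctl -A "$d" | awk -v dev="$d" '/Data Units Written/{ gsub(",","",$4); printf "%s TBW: %.1f TB\n", dev, $4 * 512000 / 1e12 }'
done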

 

Cache Settings showing alignment:

unRAID_Cache1_14hrsClearStats.png

unRAID_Cache2_14hrsClearStats.png

 

Current Docker Settings:

unRAID_DockerSettings_14hrsClearStats.png

 

Currently Running Dockers and their mappings:

unRAID_Dockers_14hrsClearStats.png

 

Screenshot of Main 14hrs after a stats reset (after moving Docker back to directory mode from an image):

unRAID_Main_14hrsAfterClearStats.png

 

As you can see from the above screenshots, the writes are excessive. This is despite the cache re-alignment, the move back to running Docker in directory mode, and the disabling of the only docker that should be doing large writes because of how I have it set up (Duplicati).

 

The only VM I'm running is my pfSense VM, and I've disabled any intensive logging I had running there.

 

14 hours before the time of the screenshots (after moving Docker back to directory mode) I cleared the stats on Main and left an iotop -ao running on my laptop. Unfortunately the laptop decided to reboot for updates (damn) during this time, so I don't have the output from iotop, but as you can see, in that period the cache had over 2.3 million writes with not a whole lot going on/running.
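
For next time: running iotop in batch mode on the server itself and redirecting it to a file would survive the laptop rebooting. A minimal sketch (the interval, iteration count and log path are assumptions):

# -b batch (non-interactive), -a accumulated totals, -o only show processes doing I/O;
# 60 s x 840 iterations is roughly 14 hours. nohup + & keeps it running after the session closes.
nohup iotop -b -a -o -d 60 -n 840 > /mnt/user/appdata/iotop_14h.log 2>&1 &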

 

I've run another iotop -ao for the last 3 hours before writing this post (without changing anything else) to give some sort of idea of what that overnight output would look like if I had it:

unRAID_3hrIOTOP_20_04_2021.png

 

I should also mention that yesterday I ran each docker one by one (with all others disabled) and ran iotop -ao for 10 minutes each, but the results were inconclusive.

 

As I've mentioned, I have to have the cache encrypted and redundant. I do understand there is some write amplification expected from the encryption etc...but it shouldn't be this high.
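
Since iotop only accounts for process-level I/O, another angle (just a sketch, nothing unRAID-specific assumed) is to look at what the block layer itself reports: how much each device/layer (loop, dm-crypt, nvme) has written since boot, which gives a rough feel for where any amplification happens.

# /proc/diskstats field 10 is sectors written (512 B each); list every block device
# with writes, largest first.
awk '$10 > 0 { printf "%-12s %10.1f GiB written\n", $3, $10 * 512 / 1024^3 }' /proc/diskstats | sort -k2 -rn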

 

I've tried to be thorough and include as much current information and background as possible, but if I've missed anything please don't hesitate to ask!

 

I can't help but feel like I'm missing something...any help/different perspectives would be very very welcome at this point.

ibstorage-diagnostics-20210420-1106.zip

1 hour ago, DaveDoesStuff said:

I've run another iotop -ao for the last 3 hours before writing this post (without changing anything else) to give some sort of idea of what that overnight output would look like if I had it:

That's from 3 hours? It shows a total of less than 2GiB. That's very little; something else must be doing most of the writing.

7 minutes ago, JorgeB said:

That's from 3 hours? It shows a total of less than 2GiB. That's very little; something else must be doing most of the writing.

 

That's what I've been thinking...but I'm not sure how to narrow it down further, or possibly go beyond whatever bounds iotop has.

 

Here is a screenshot from 4 hours; not much difference.

unRAID_4hrIOTOP_20.04.2021.png

47 minutes ago, JorgeB said:

Any continuous writes should be in there; at least they were for all users with that issue, including myself. Do you do any downloading/unpacking to the cache pool?

 

Nothing beyond the standard Radarr/Sonarr > qBittorrent > Plex setup used by many. Here is a screenshot of qBittorrent at present along with a fresh iotop; the qBittorrent screenshot effectively represents my last 72 hours of torrenting. As you can see, not a ton. From what I've seen on the forums here I'd say it's even a standard amount.

 

 

unRAID_Qbt_20.04.2021.png

unRAID_6hrIOTOP_20.04.2021.png

 

EDIT: I've just noticed that iotop, despite being the same window running for the last 6 hours, seems to be giving inconsistent stats...I haven't used it outside of troubleshooting this issue, so I'm not sure if that's normal. For example, in the screenshot in this post, taken at roughly 6 hours of run time, the entries at the top for highest write from the previous 4-hour one are missing.

 

EDIT2: Added share settings to the OP.

7 minutes ago, JorgeB said:

No, but much more than the continuous writes reported by iotop would generate. Check the SSDs' TBW, then let it run for 24H without any downloading, then check again.

 

Yeah, for sure it's not adding up. I've got qBittorrent disabled now; at the time of writing this the TBW is:

unRAID_TBW_QB-Disabled_20.04@14_45.png

 

Let's see where it sits tomorrow morning.
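
To make tomorrow's comparison easier, one option (just a sketch; device names and log path are assumptions) is to append a timestamped reading now and again in the morning, then subtract:

# Each NVMe "data unit" is 512,000 bytes, so (tomorrow - today) * 512000 / 1e9 = GB written in between.
for d in /dev/nvme0n1 /dev/nvme1n1; do
  printf '%s %s %s\n' "$(date '+%F %T')" "$d" \
    "$(/usr/sbin/smartctl -A "$d" | awk '/Data Units Written/{ gsub(",","",$4); print $4 }')"
done >> /mnt/user/appdata/tbw_log.txt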

 

Thank you for the assist, by the way!

21 minutes ago, JorgeB said:

Those are not correct; you can see them on the SMART report:

 


Data Units Written:                 114,252,166 [58.4 TB]

and


Data Units Written:                 114,251,764 [58.4 TB]

 

 

 

I believe what I posted is actually for the total time this "server" has been in operation/powered on. That's probably a bit useless, come to think of it; I picked it up in a bug report :D

 

Currently the 2 cache drives report:

  • Cache 1: 114,276,540 [58.5 TB]
  • Cache 2: 114,276,147 [58.5 TB]

So after running overnight...

 

unRAID_TBW_QB-Disabled_Overnight.png

  • Cache 1: 114,313,326 [58.5 TB]
  • Cache 2: 114,312,943 [58.5 TB]

That's a delta of roughly 35,000 on each drive. No, this does not seem like a lot...but unRAID Main reporting 3.9 million writes when it was around 2.6 million yesterday evening is still pretty bad for a server that was essentially only running a handful of low-IO dockers and a pfSense VM.
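
For context (assuming the standard NVMe data unit of 512,000 bytes), that delta works out to something in the region of 18 GB on each drive overnight:

# 35,000 data units x 512,000 B per unit, expressed in GB
awk 'BEGIN{ printf "%.1f GB\n", 35000 * 512000 / 1e9 }'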

 

34 minutes ago, JorgeB said:

The number of writes in the GUI is basically meaningless; it varies with the device. What matters is the TBW, and that is fine: only a few GB were written.

 

Hmm, I didn't know that. To be fair, it would be counter-intuitive for me to assume it's a load of crap lol, but thank you for the info.

 

So obviously there *was* an issue that led to these two devices racking up 58.5 TB in writes in just the 4 months since I began using them...and it seems like something I did either earlier or in the last 72 hours has possibly fixed it, but I was focusing on metrics that don't actually represent the TBW, or possibly even misrepresent it.

 

Now I'll just have to re-enable qBittorrent and monitor it closely, then, if that checks out, Duplicati (which I feel is possibly the culprit), and take some additional actions depending on the outcome.

 

Thank you for the fresh set of eyes on this Jorge! I had complete tunnel vision until you came along lol!

 

Do you use any particular tools or commands on your own setup(s) to monitor this kind of thing? Or anything you would recommend over what I've utilised in this thread?

