
Poor Write To Cache Network Performance / BTRFS Errors


Solved by JorgeB


Hello, sincere appreciation in advance for any guidance and thoughts from the community on this topic.

 

Over the last several weeks, I have noticed that write speeds to my cache drive sometimes drop significantly. I use TeraCopy to write files to my server, and during these periods of poor cache write performance I frequently get an error that the hashes don't match for the files written to the server. Each time, rebooting the server has resolved the issue for a while. An example of the change in write speed is as follows:

 

Average normal write speed: 85-90 MB/s

Average degraded write speed: 25-40 MB/s

 

Here are the current write speeds to my Unraid server, writing to the SSD cache drive:

2057606053_WritetoUnraid.png.2e9611c643ba00a9f077bcb0a5263371.png

 

As a point of comparison, here are the write speeds to my Synology server, no cache SSD. The same source computer, same network, same router, same switch, same file are being used for both servers:

1468798642_WritetoSynology.png.238961b37e59480cc081e8f3977dbe6e.png

 

It is worth noting that during this period I updated to Unraid 6.12.6, and I understand there are potential networking issues with Realtek adapters. Last week I downloaded the suggested Realtek driver from the Fix Common Problems plug-in, but the same issue popped up. I removed the plug-in yesterday.

 

Here is the information being shown in my system devices for the network adapter:

 

Quote

[10ec:8168]07:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

 

I have also noticed the following errors in my system log from this morning:

 

Quote

Jan 20 08:03:48 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

Jan 20 08:04:14 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

Jan 20 08:06:45 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

Jan 20 08:07:07 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

Jan 20 08:07:25 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

Jan 20 08:07:54 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

Jan 20 08:08:01 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

Jan 20 08:08:33 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

Jan 20 08:12:15 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

Jan 20 08:12:24 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

Jan 20 08:12:27 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

Jan 20 08:12:52 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
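
To get a feel for how often these occur, the log lines above can be tallied with a short script. This is only a minimal sketch: the function name `count_rx_errors` and the `sample` list are illustrative assumptions, and the pattern is based solely on the line format quoted above.

```python
import re
from collections import Counter

# Pattern derived from the syslog lines quoted above, e.g.
# "Jan 20 08:03:48 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040"
RX_ERROR = re.compile(
    r"kernel: r8169 (?P<pci>\S+) (?P<iface>\w+): "
    r"Rx ERROR\. status = (?P<status>[0-9a-fA-F]+)"
)


def count_rx_errors(lines):
    """Count r8169 Rx errors, keyed by (interface, status code)."""
    counts = Counter()
    for line in lines:
        match = RX_ERROR.search(line)
        if match:
            counts[(match.group("iface"), match.group("status"))] += 1
    return counts


# Illustrative input: two lines from the log above plus one unrelated line.
sample = [
    "Jan 20 08:03:48 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040",
    "Jan 20 08:04:14 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040",
    "Jan 20 08:05:00 Tower kernel: some unrelated message",
]

for (iface, status), n in count_rx_errors(sample).items():
    print(f"{iface} status={status}: {n} error(s)")
```

To run it against a real log, pass an open file object instead of `sample`, since iterating over a file yields its lines.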

 

I also had BTRFS errors in my logs over the past several days.

 

Quote

Jan 18 13:27:45 Tower kernel: BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0

Jan 18 13:27:45 Tower kernel: BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0

Jan 18 13:27:45 Tower kernel: BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0

Jan 19 04:56:57 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
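
The counters in these lines are cumulative per device, so the last line seen for each device carries its current totals. A minimal sketch for pulling them out of a syslog follows; the function name `latest_counters` and the `sample` list are illustrative assumptions, and the pattern is based only on the lines quoted above.

```python
import re

# Pattern derived from the kernel lines quoted above, e.g.
# "BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0"
BTRFS_ERRS = re.compile(
    r"BTRFS error \(device (?P<dev>[^)]+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

FIELDS = ("wr", "rd", "flush", "corrupt", "gen")


def latest_counters(lines):
    """Return the most recently logged error counters for each block device."""
    latest = {}
    for line in lines:
        match = BTRFS_ERRS.search(line)
        if match:
            # Later lines overwrite earlier ones for the same device.
            latest[match.group("bdev")] = {f: int(match.group(f)) for f in FIELDS}
    return latest


# Illustrative input: lines copied from the log excerpt above.
sample = [
    "Jan 18 13:27:45 Tower kernel: BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0",
    "Jan 18 13:27:45 Tower kernel: BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0",
    "Jan 19 04:56:57 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0",
]

for bdev, counters in latest_counters(sample).items():
    print(bdev, counters)
```

A non-zero `corrupt` count with `wr`, `rd`, and `flush` all at zero, as in the lines above, means btrfs found checksum mismatches without the device itself reporting I/O errors.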

 

I suspect these are related, but I can't be certain. Apologies if I am conflating two separate issues, but I wanted to share as much as possible.

 

I have attached diagnostics, as well as my system logs from these periods. My apologies for the noise in the logs: when I rebooted my server, a new IP address was assigned and the server lost access to the UPS. I didn't notice the resulting log noise until today. I copied and pasted the relevant errors above.

 

Please let me know if I can provide any other useful information. Thanks so much!

tower-syslog-20240120-1710.zip tower-diagnostics-20240120-0909.zip tower-syslog-192.168.1.2-20240120-1711.zip

On 1/21/2024 at 2:42 AM, JorgeB said:

Since btrfs is detecting data corruption I would recommend starting by running memtest.

I believe the recommendation is to run Memtest for a full 24 hours, but given that I am 12 hours in and have already logged 462 errors, I suspect this is enough data to call my DIMMs bad. Is that the general consensus? I assume even one error is above a comfortable threshold.

 

If so, I need to track down some RAM compatible with my motherboard and CPU. I am using older hardware, and I am not clear on whether compatible RAM is still manufactured or available. Are there current recommendations around RAM? Is it better or worse to use all four slots on the motherboard? If I have been comfortably running 16GB for years, is there any reason to bump to 32GB? If I move from 4 DIMMs to 2 DIMMs, is that going to present any sort of issue? I assume the answer to all of these is no, but better to check than be surprised.

 

Last question, would failing memory explain the drop in cache write speeds I have been observing?

 

Appreciate everyone's time and thoughts.

 

MemTest.thumb.jpg.d4c3c4b36bb3de6cc30f5a1735ae3edd.jpg

1 minute ago, JorgeB said:

That's correct.

 

Using all 4 slots is usually fine, though 2 can be more stable, especially if the board has an underlying issue.

Much appreciated. I will start to track down some new RAM and report back.

 

Sorry to repeat this, but would failing memory explain the drop in cache write speeds I have been observing? Or is that likely a separate problem that I will need to troubleshoot after getting new RAM?

 

Thanks again.

1 minute ago, JPilla415 said:

Sorry to repeat this, but would failing memory explain the drop in cache write speeds I have been observing?

Difficult to say for sure, but bad RAM can cause all sorts of issues, especially with btrfs, so fix that and then retest.

On 1/22/2024 at 8:35 AM, JorgeB said:

Difficult to say for sure, but bad RAM can cause all sorts of issues, especially with btrfs, so fix that and then retest.

 

12 hours in and error-free with the new RAM. I'm going to let this run the rest of the day to play it safe.

 

Assuming I encounter no errors in the next 12-24 hours, what is the recommended next step to troubleshoot my slow cache write performance outlined earlier in the thread? Just go back to normal day-to-day use of the server and see if the issue pops up again?

 

Thank you again for all of the help!

 

MemTestNewRam12hr.thumb.jpg.292b24f3f2216d430063497f57090c2e.jpg

Edited by JPilla415
Just now, JorgeB said:

Yep.

Thanks very much for the time and help on this! I'll go ahead and mark the MemTest post from you as the solution. If other issues crop back up, I'll come back to the thread and add more details. Fingers crossed this is the end of it.

 

Is there a current recommendation for my Realtek adapter? It's unclear to me whether I should be using the Realtek plug-in or not:

 

Quote

[10ec:8168]07:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

 
