January 20, 2024 Hello, sincere appreciation in advance for any guidance and thoughts from the community on this topic.

Over the last several weeks, I have noticed that the speed at which I can write to my cache drive sometimes drops significantly. I use TeraCopy to write files to my server, and during these periods of poor cache write performance I frequently get an error that the hashes don't match for the files written to the server. Each time, I have rebooted my server while troubleshooting, and that has resolved the issue for some amount of time.

An example of the change in write speed is as follows:
Average normal write speed: 85-90 MB/s
Average degraded write speed: 25-40 MB/s

Here are the current write speeds to my Unraid server, writing to the SSD cache drive: [screenshot]

As a point of comparison, here are the write speeds to my Synology server, which has no cache SSD. The same source computer, network, router, switch, and file are being used for both servers: [screenshot]

It is worth noting that in this window I updated to Unraid 6.12.6, and I understand that there are potential networking issues with Realtek adapters. Last week I downloaded the suggested Realtek driver from the Fix Common Problems Plug-in, but the same issue popped up. I removed the plug-in yesterday. Here is the information shown in my system devices for the network adapter:

Quote
[10ec:8168]07:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

I have also noticed the following errors in my system log from this morning:

Quote
Jan 20 08:03:48 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
Jan 20 08:04:14 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
Jan 20 08:06:45 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
Jan 20 08:07:07 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
Jan 20 08:07:25 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
Jan 20 08:07:54 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
Jan 20 08:08:01 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
Jan 20 08:08:33 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
Jan 20 08:12:15 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
Jan 20 08:12:24 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
Jan 20 08:12:27 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040
Jan 20 08:12:52 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040

I also had BTRFS errors in my logs over the past several days.

Quote
Jan 18 13:27:45 Tower kernel: BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
Jan 18 13:27:45 Tower kernel: BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
Jan 18 13:27:45 Tower kernel: BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
Jan 19 04:56:57 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0

I suspect these are related, but I can't be certain. Apologies if I am conflating two separate issues, but I wanted to share as much as possible.

I have attached Diagnostics, as well as my system logs from these periods. My apologies for the noise in the logs: when I rebooted my server, a new IP address was assigned and the server lost access to the UPS. I didn't notice the resulting noise in the logs until today. I copied and pasted the errors above.

Please let me know if I can provide any other useful information. Thanks so much!

tower-syslog-20240120-1710.zip
tower-diagnostics-20240120-0909.zip
tower-syslog-192.168.1.2-20240120-1711.zip
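To see at a glance how often each message appears, the log lines quoted in this post can be tallied with a short script. This is a minimal sketch, not an Unraid tool: the function name `summarize` and the regular expressions are assumptions written to match only the two message formats quoted above, and the sample is a subset of those quoted lines.

```python
import re
from collections import Counter

# Assumed patterns, written to match only the kernel messages quoted in the post.
RX_ERROR = re.compile(r"kernel: r8169 \S+ (\w+): Rx ERROR\. status = ([0-9a-f]+)")
BTRFS_ERR = re.compile(
    r"BTRFS error \(device (\S+)\): bdev \S+ errs: "
    r"wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)


def summarize(lines):
    """Count Rx errors per (interface, status) and track the highest
    'corrupt' counter reported for each btrfs device."""
    rx_errors = Counter()
    corrupt = {}
    for line in lines:
        m = RX_ERROR.search(line)
        if m:
            rx_errors[(m.group(1), m.group(2))] += 1
            continue
        m = BTRFS_ERR.search(line)
        if m:
            device = m.group(1)
            # The counter is cumulative in the log, so keep the maximum seen.
            corrupt[device] = max(corrupt.get(device, 0), int(m.group(5)))
    return rx_errors, corrupt


# A subset of the lines quoted in the post.
sample = [
    "Jan 20 08:03:48 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040",
    "Jan 20 08:04:14 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040",
    "Jan 20 08:06:45 Tower kernel: r8169 0000:07:00.0 eth0: Rx ERROR. status = 3921c040",
    "Jan 18 13:27:45 Tower kernel: BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0",
    "Jan 18 13:27:45 Tower kernel: BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0",
    "Jan 18 13:27:45 Tower kernel: BTRFS error (device sdo1): bdev /dev/sdo1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0",
    "Jan 19 04:56:57 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0",
]

rx_errors, corrupt = summarize(sample)
print("Rx errors:", dict(rx_errors))
print("Highest btrfs corrupt counter per device:", corrupt)
```

To run it against a full syslog, pass the opened file object instead of `sample`, since any iterable of lines works. A non-zero `corrupt` counter is the detail the first reply below picks up on.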
January 20, 2024 Author I've made no changes to my network, Unraid, or PC. I tried another copy to the Unraid cache, and this is what I am seeing now. These are the speeds I am used to seeing. [screenshot]
January 21, 2024 Community Expert Solution Since btrfs is detecting data corruption, I would recommend starting by running memtest.
January 22, 2024 Author Understood, thank you very much for the suggestion. Running a memtest now. I will report back tomorrow.
January 22, 2024 Author On 1/21/2024 at 2:42 AM, JorgeB said: Since btrfs is detecting data corruption I would recommend starting by running memtest. I believe the recommendation is to run Memtest for a full 24 hours, but given that I am 12 hours in and have already logged 462 errors, I suspect this is enough data to call my DIMMs bad. Is that the general consensus? I assume even one error is above a comfortable threshold. If so, I need to work on tracking down some RAM compatible with my motherboard and CPU. I am using older hardware, and I am not clear on whether that RAM is still manufactured or available. Are there current recommendations around RAM? Is it better or worse to use all four slots on the motherboard? If I have been comfortably running 16GB for years, is there any reason to bump to 32GB? If I move from 4 DIMMs to 2 DIMMs, is that going to present any sort of issue? I assume the answer to all of these is no, but better to check than be surprised. Last question: would failing memory explain the drop in cache write speeds I have been observing? Appreciate everyone's time and thoughts.
January 22, 2024 Community Expert 21 minutes ago, JPilla415 said: I assume even one error is above a comfortable threshold. That's correct. Using all 4 slots is usually fine, though 2 can be more stable, especially if the board has an underlying issue.
January 22, 2024 Author 1 minute ago, JorgeB said: That's correct. Using all 4 slots is usually fine, though 2 can be more stable, especially if the board has an underlying issue. Much appreciated. I will start to track down some new RAM and report back. Sorry to repeat this, but would failing memory explain the drop in cache write speeds I have been observing? Or is that likely another scenario which I will need to troubleshoot after getting new RAM? Thanks again.
January 22, 2024 Community Expert 1 minute ago, JPilla415 said: Sorry to repeat this: but would failing memory explain the drop in cache write speeds I have been observing? Difficult to say for sure, but bad RAM can cause all sorts of issues, especially with btrfs, so fix that and then retest.
January 22, 2024 Author 4 minutes ago, JorgeB said: Difficult to say for sure, but bad RAM can cause all sorts of issues, especially with btrfs, so fix that and then retest. Understood, will do. Thanks again!
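For the retest suggested above, one rough way to repeat the two symptoms from the first post (write speed and hash mismatches) is to time a write and then compare SHA-256 digests of what was written and what is read back. This is a minimal sketch under stated assumptions: `write_and_verify` is a hypothetical helper, not part of Unraid or TeraCopy, and the demo writes to a temporary local file rather than to the server share. Reading back immediately may be served from the operating system's cache, so a matching hash here is weaker evidence than a clean memtest.

```python
import hashlib
import os
import tempfile
import time


def write_and_verify(path, size_mb=64, chunk_mb=1):
    """Write random data to `path`, then read it back.

    Returns (write speed in MB/s, True if the SHA-256 digests match).
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    count = size_mb // chunk_mb
    written = hashlib.sha256()

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(chunk)
            written.update(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    elapsed = time.perf_counter() - start

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            read_back.update(block)

    total_mb = count * chunk_mb
    speed = total_mb / elapsed if elapsed > 0 else float("inf")
    return speed, written.hexdigest() == read_back.hexdigest()


# Demo against a temporary local file; point `path` at a mounted share
# (location is up to you) to exercise the network and cache drive instead.
with tempfile.TemporaryDirectory() as tmp:
    speed, ok = write_and_verify(os.path.join(tmp, "retest.bin"), size_mb=8)
    print(f"write speed: {speed:.1f} MB/s, hashes match: {ok}")
```

Running it a few times over several days and noting the speed gives a baseline to compare against the 85-90 MB/s and 25-40 MB/s figures from the first post.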
January 24, 2024 Author On 1/22/2024 at 8:35 AM, JorgeB said: Difficult to say for sure, but bad RAM can cause all sorts of issues, especially with btrfs, so fix that and then retest. 12 hours and error free with the new RAM. I'm going to let this run the rest of the day to play it safe. Assuming I encounter no errors in the next 12-24 hours, what is the recommended next step to troubleshoot the slow cache write performance outlined earlier in the thread? Just go back to normal day-to-day use of the server and see if the issue pops up again? Thank you again for all of the help! Edited January 24, 2024 by JPilla415
January 24, 2024 Community Expert 30 minutes ago, JPilla415 said: Just go back to normal day-to-day use of the server and see if the issue pops up again? Yep.
January 24, 2024 Author Just now, JorgeB said: Yep. Thanks very much for the time and help on this! I'll go ahead and mark the MemTest post from you as the solution. If other issues crop back up, I'll come back to the thread and add more details. Fingers crossed this is the end of it. Is there a current recommendation for my Realtek adapter? It's unclear to me whether I should be using the Realtek plug-in or not: Quote [10ec:8168]07:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
January 25, 2024 Author 28-hour MemTest with clean results! Booting the server back up after several days of downtime. Fingers crossed! Thanks again for all the help.