marc0777 Posted May 5, 2022
I've recently replaced some disks in my NAS and took the chance to switch from OpenMediaVault to UNRAID 6.9.2. However, as soon as I started copying files from the old disks to the new ones, I noticed that performance is very bad. While on OMV I could saturate the ethernet link via SMB, now all copies (SMB, NFS and even local) run at around 30 MB/s. Even when copying from SSD to SSD I'm not able to get past that limit. What I noticed is that while copying data the shfs process uses a lot of CPU.
While I expected a drop in write performance compared to my previous setup because of the parity, I was not expecting such a drop in reads as well. Is that to be expected with my hardware?
Moreover, I'm also getting a lot of "Stale file handle" errors with one of the NFS shares. In the client's dmesg I see many of these errors: "NFS: server [IP] error: fileid changed". I already tried tuning the fuse_remember parameter, but to no avail. Any suggestions for this?
The machine is a Thecus N5550 with the following specifications:
- Intel Atom D2550
- 8GB DDR3 RAM
- 2 × Toshiba MG07ACA14TE (CMR) 14TB
- 2 × WD RED (CMR) 6TB
- Kingston A400 120GB
- UNRAID 6.9.2, VMs and Docker disabled, no plugins
I attach the diagnostics to this post. Thanks, Marco
unraid-diagnostics-20220506-0106.zip
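For anyone wanting to reproduce the local read test described above, a minimal sketch run from the Unraid console (the file path is a placeholder — any large existing file on a user share will do):

# read a large file through the user-share (FUSE) path and watch the throughput
dd if=/mnt/user/someshare/bigfile of=/dev/null bs=1M status=progress

# in a second shell, check how much CPU the shfs process is using meanwhile
top -b -n 1 | grep -E 'shfs|%CPU'

Comparing that reading with the same dd run against /mnt/cache or /mnt/diskN (bypassing shfs) is one way to see how much of the slowdown is FUSE overhead versus the disks themselves.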
AndrewZ Posted May 13, 2022
Hi, the trouble with those diagnostics is that the logs are nothing but NFS errors, so we can't tell whether something else is at play. Can you try upgrading to 6.10 and see if that fixes your trouble, and then we'll go from there? Andrew
dlandon Posted May 13, 2022
You should upgrade to 6.10rc8. It uses NFSv4, which is much better than the NFSv3 in 6.9.2. It should also solve the stale file handle issues.
Edit: 6.10rc8 also fixes the rpcbind log spamming.
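On the Linux client side, a minimal sketch of requesting and verifying NFS v4.2 after such an upgrade (the server name, export path and mount point below are placeholders):

# request NFS v4.2 explicitly when mounting the export
mount -t nfs -o vers=4.2 tower:/mnt/user/media /mnt/media

# confirm which NFS version was actually negotiated for each mount
nfsstat -m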
marc0777 Posted May 13, 2022 Author
Just upgraded to 6.10rc8 and switched all my NFS shares to v4.2; the logs already look cleaner, at least on the client side. Performance, however, did not improve: in fact I'm seeing around 22 MB/s reading from cache or array, always with shfs at around 100% in top. New diagnostics attached.
unraid-diagnostics-20220513-1815.zip
dlandon Posted May 13, 2022
I would look at the following:
- Your server is an Intel Atom. That's not a tremendously powerful processor.
- Client-side setup. Is the slowdown in reading from the server or in writing on the client?
- Install the Tips and Tweaks plugin. See if some disk cache adjustments help. Also see if any processor scaling governor changes help.
- Make the recommended changes to the NICs: disable NIC Flow Control and NIC Offload (rough command-line equivalents are sketched below).
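A rough sketch of command-line equivalents for those suggestions, run from the Unraid console (eth0 is an assumption for the interface name; the Tips and Tweaks plugin exposes the same settings in the GUI):

# disable flow control (pause frames) on the interface
ethtool -A eth0 autoneg off rx off tx off

# disable the common offloads (TSO/GSO/GRO)
ethtool -K eth0 tso off gso off gro off

# check the current CPU scaling governor, if a scaling driver is loaded
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor 2>/dev/null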
marc0777 Posted May 15, 2022 Author
I know it's not exactly the best performer; indeed I don't plan to run Docker, VMs or anything beyond NFS and SMB on it. However I just need it to reach some decent read speeds, as it did with no problem before I started using Unraid on it. I thought the overhead would only affect write speed, not make such a big difference on read speeds as well.
The slowdown is on the server itself: copying a file from /mnt/user to /dev/null brings shfs usage to 100% and doesn't reach more than 30 MB/s. Copying from /mnt/cache to /dev/null doesn't seem to drive shfs to 100%, but the read speed is the same. The CPU governor is currently set to On Demand and the frequency seems to be scaling correctly, and I haven't changed any network settings yet because the problem also appears locally. I first noticed it when copying data from one disk to another within Unraid: importing the data from the old drives to the new ones took a very long time, because the speed never exceeded 30 MB/s despite the drives being capable of 200 MB/s and more. At first I assumed it was Unraid limiting write speeds to the array...
Regarding NFS, I'm still getting stale file handle errors, but the frequency went from once every few minutes to once a day.
Again, I'm attaching diagnostics... Should I give up on Unraid and switch back to a less resource-hungry OS?
unraid-diagnostics-20220515-1925.zip
Vr2Io Posted May 15, 2022
5 hours ago, marc0777 said: "/mnt/user to /dev/null brings shfs usage to 100% and doesn't reach more than 30 MB/s"
There's no reason you should only get 30 MB/s; you need to dig into why local reads are also that slow. Please test this with:
dd if=/dev/zero of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 2.02103 s, 5.2 GB/s
5 hours ago, marc0777 said: "However I just need it to reach some decent read speeds"
Try finding a large file, running 'md5sum *file*' and checking the read speed.
5 hours ago, marc0777 said: "Should I give up on Unraid and switch back to a less resource-hungry OS?"
In general, Unraid is not a resource-hungry OS.
marc0777 Posted May 16, 2022 Author
This is the result of the command you wrote:
dd if=/dev/zero of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 8.05482 s, 1.3 GB/s
Using md5sum from the array I reach the usual speed of around 30 MB/s, and the process using the most CPU is shfs. Doing the same from the cache (SSD) I get around 64 MB/s, which is the highest speed I've seen so far.
Vr2Io Posted May 16, 2022
If dd only gets 1.3 GB/s it looks like a CPU issue; my 5.2 GB/s result above was on a J1900, and the md5sum test also easily reaches 200 MB/s there. I suggest rechecking the CPU's running clock rate and making sure no process is pinned to the wrong CPU core.
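Two quick checks along those lines, assuming the read test is running in another shell (the PID below is a placeholder):

# watch the reported per-core clock while the test runs
watch -n1 'grep "cpu MHz" /proc/cpuinfo'

# confirm the test process is allowed to run on all cores (no odd pinning)
taskset -cp <PID>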
marc0777 Posted May 16, 2022 Author
Please notice that the D2550 I've got is slower than a J1900. However, md5sum is currently held back by read speed, since I can reach higher speeds when not reading from disk:
dd if=/dev/zero bs=1M count=1k | md5sum
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.21265 s, 117 MB/s
cd573cfaace07e7949bc0c46028904ff -
My CPU doesn't support frequency scaling, so it's always running at full frequency, 1.86 GHz. As you can see below, it is indeed running at that frequency.
Spoiler
cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 54
model name      : Intel(R) Atom(TM) CPU D2550 @ 1.86GHz
stepping        : 1
microcode       : 0x10d
cpu MHz         : 1861.886
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm arat
bugs            :
bogomips        : 3723.82
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
(the output for processors 1 to 3 is essentially identical, apart from the processor, core id and apicid numbers, and all report cpu MHz around 1861.9)
Vr2Io Posted May 16, 2022
4 minutes ago, marc0777 said: "since I can reach higher speeds when not reading from disk:"
Then it looks like a storage controller/subsystem issue. I have no further troubleshooting suggestions, since those controllers are onboard.
JorgeB Posted May 16, 2022
2 hours ago, marc0777 said: "since I can reach higher speeds when not reading from disk:"
That still seems a little on the low side for that CPU. This is with a D2700, which is faster, but not by much (Passmark score of 492 vs 423):
dd if=/dev/zero bs=1M count=1k | md5sum
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.70252 s, 228 MB/s
I would expect it to be 10 or 20% faster on this test, not twice as fast, so it might be worth checking the BIOS to make sure the CPU cache is enabled and/or resetting the CMOS to optimal defaults.
Some more comparison tests. As already mentioned, a low-end CPU can always limit your performance, especially when using user shares, but it also shouldn't be that slow. The source disk is a 500GB 2.5" HDD:
dd if=/mnt/disk2/test/1.mp4 bs=1M count=1k | md5sum
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.1803 s, 88.2 MB/s
dd if=/mnt/user/test/1.mp4 bs=1M count=1k | md5sum
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 16.5119 s, 65.0 MB/s
Reading from a client computer over SMB I get around 110 MB/s when using a disk share and around 65 MB/s with a user share.
marc0777 Posted May 16, 2022 Author
So, assuming there is something wrong with the CPU performance, what could be limiting it and what should I check? I tried setting the governor to performance but that didn't seem to change anything.
I don't know if this might be related, but the output of cpufreq-info shows:
cpufreq-info
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to [email protected], please.
analyzing CPU 0:
  no or unknown cpufreq driver is active on this CPU
  maximum transition latency: 4294.55 ms.
analyzing CPU 1:
  no or unknown cpufreq driver is active on this CPU
  maximum transition latency: 4294.55 ms.
analyzing CPU 2:
  no or unknown cpufreq driver is active on this CPU
  maximum transition latency: 4294.55 ms.
analyzing CPU 3:
  no or unknown cpufreq driver is active on this CPU
  maximum transition latency: 4294.55 ms.
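A way to double-check the same thing through sysfs, in case cpufreq-info is misreporting — a minimal sketch; on this Atom the directory may simply not exist, which would match the "no or unknown cpufreq driver" message above:

# if no frequency-scaling driver is bound, this directory is absent
ls /sys/devices/system/cpu/cpu0/cpufreq/ 2>/dev/null || echo "no cpufreq driver loaded"

# otherwise this shows which driver the kernel is using
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver 2>/dev/null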
JorgeB Posted May 16, 2022
30 minutes ago, marc0777 said: "I don't know if this might be related"
That should be unrelated, but other than checking/loading the BIOS optimal defaults as mentioned above, I don't have other ideas.
marc0777 Posted May 16, 2022 Author
After some more digging it looks like the CPU is stuck in the C1E state, which makes me pretty sure it's a BIOS problem. Unfortunately I'm currently far from where I keep the NAS, so I'll have to wait or find someone who can go and check it for me sooner. Anyway, I'll update you as soon as I have news on that.
Any suggestions on NFS instead? Despite the upgrade and the switch to NFS 4.2, I'm still getting stale file handles 2-3 times a day...
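If the kernel's cpuidle framework is active, the time spent in each idle state can be read from sysfs — a minimal sketch to see whether the cores really do sit in C1E (state names depend on the idle driver in use):

# list each idle state of core 0 and the total time (in microseconds) spent in it
for d in /sys/devices/system/cpu/cpu0/cpuidle/state*/; do
    echo "$(cat $d/name): $(cat $d/time) us"
done

Some users also work around misbehaving C-states by adding intel_idle.max_cstate=1 to the kernel append line in syslinux.cfg; whether that would help on this board is untested.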
JorgeB Posted May 16, 2022
34 minutes ago, marc0777 said: "After some more digging it looks like the CPU is stuck in the C1E state, which makes me pretty sure it's a BIOS problem."
That would make sense; hopefully it can be corrected in the BIOS.
35 minutes ago, marc0777 said: "Any suggestions on NFS instead?"
Sorry, I can't help with that, I've never used NFS.
marc0777 Posted May 26, 2022 Author
So I tried changing every option in the BIOS, but with no success; performance is still abysmal. Moreover, I had to disable the cache for all my NFS shares or they would continuously give stale file handle errors.
Since there doesn't seem to be a solution to these problems, I'll switch to another OS as soon as I have the time to move all the files around again.
marc0777 Posted July 30, 2022 Author (Solution)
Hello everyone! After a long time I was able to work on the NAS, and I managed to solve the problem!
The culprit was a faulty BIOS, made for a slightly different version of my NAS. I had installed it after reading online that it should fix some problems when running OSes other than the one the unit was sold with; however, its CPU management was completely broken. After finding and installing the original BIOS, performance is now what I would expect: I reach around 100 MB/s both reading and writing over NFS on a 1 Gbps link.
Thanks again to everyone for the support!