Bad R/W performance and high shfs CPU usage, NFS stale file handle



I've recently replaced some disks in my NAS and took the chance to switch from OpenMediaVault to Unraid 6.9.2. However, as soon as I started copying files from the old disks to the new ones I noticed that performance is very poor. While on OMV I could saturate the Ethernet link via SMB, now all copies (SMB, NFS, and even local) sit around 30 MB/s. Even when copying from SSD to SSD I'm not able to get past that limit.

 

What I noticed is that while copying data the shfs process uses a lot of CPU:

[Screenshot: shfs CPU usage while copying]

 

 

While I expected some drop in write performance compared to my previous setup because of parity, I was not expecting such a drop in reads as well. Is this to be expected with my hardware?

 

Moreover, I'm also getting a lot of "Stale file handle" errors with one of the NFS shares. In the client's dmesg I see many messages like: "NFS: server [IP] error: fileid changed". I've already tried tuning the fuse_remember parameter, but to no avail. Any suggestions for this?
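For context, the client side is just a plain Linux NFS mount along these lines (hostname, export and mount point below are placeholders, not my actual setup), and the errors show up in the kernel log:

mount -t nfs nas.local:/mnt/user/media /mnt/media   # placeholder host and paths
dmesg -w | grep -i nfs                              # this is where the "fileid changed" lines appear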

 

The machine is a Thecus N5550 with the following specifications:

- Intel Atom D2550

- 8GB DDR3 RAM

- 2 × Toshiba MG07ACA14TE (CMR) 14TB

- 2 × WD RED (CMR) 6TB

- Kingston A400 120GB

- UNRAID 6.9.2, VMs and Docker disabled and no plugins

 

I'm attaching the diagnostics to this post.

 

Thanks,

Marco

 

unraid-diagnostics-20220506-0106.zip


I would look at the following:

  • Your server is an Intel Atom.  That's not a tremendously powerful processor.
  • Client side setup.  Is the slowdown in reading from the server or writing on the client?
  • Install the Tips and Tweaks plugin.
    • See if some disk cache adjustments will help.
    • Also see if any processor scaling governor changes will help.
    • Make the recommended changes to the NICs.  Disable NIC Flow Control and NIC Offload.

I know it's not exactly the best performer; indeed I don't expect to run Docker, VMs, or anything else beyond NFS and SMB on it.

However, I just need it to reach some decent read speeds, which it did with no problem before I started using Unraid on it. I thought the overhead would only affect write speed, and wouldn't make such a big difference on read speeds as well.

 

The slowdown is on the server itself: copying a file from /mnt/user to /dev/null brings shfs usage to 100% and doesn't reach more than 30 MB/s. Copying from /mnt/cache to /dev/null doesn't seem to make shfs go to 100%, but the read speed is the same.
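For the record, the test is nothing more than a dd read to /dev/null, along these lines (share and file names are just examples):

dd if=/mnt/user/data/bigfile.bin of=/dev/null bs=1M status=progress    # user share path, goes through shfs
dd if=/mnt/cache/data/bigfile.bin of=/dev/null bs=1M status=progress   # same file read straight from the cache SSD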

 

The CPU governor is currently set to On Demand and the frequency seems to be scaling correctly, and I haven't changed any network settings yet because the problem also appears locally. I first noticed it when copying data from one disk to another within Unraid: importing the data from the old drives to the new ones took a very long time, because the speed never exceeded 30 MB/s despite the drives being capable of 200 MB/s and more. At first I assumed it was Unraid limiting write speeds to the array...
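For reference, this is how I'm checking the governor and the live clock (assuming the standard cpufreq sysfs interface is exposed; if those files are missing, that would itself be a hint that no frequency-scaling driver is loaded):

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor   # current governor per core
grep "cpu MHz" /proc/cpuinfo                                # live clock as reported by the kernel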

 

Regarding NFS, I'm still getting stale file handle errors, but the frequency went from once every few minutes to once a day. Again, I'm attaching diagnostics....

 

Should I give up on Unraid and switch back to a less resource hungry OS?

unraid-diagnostics-20220515-1925.zip

5 hours ago, marc0777 said:

/mnt/user to /dev/null brings shfs usage to 100% and doesn't reach more than 30MB/s

There's no reason you should only get 30 MB/s; you need to dig into why even local copies are that slow.

 

Please test this by running:

dd if=/dev/zero of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 2.02103 s, 5.2 GB/s

 

5 hours ago, marc0777 said:

However I just need it to reach some decent read speeds

Try finding a large file, run ' md5sum *file* ', and check the read speed.
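For example (the path is just an example, pick any large file on the array); divide the file size by the elapsed time to get MB/s:

time md5sum /mnt/user/data/bigfile.bin    # example path; file size / elapsed time gives the read speed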

 

 

5 hours ago, marc0777 said:

Should I give up on Unraid and switch back to a less resource hungry OS?

 

In general, Unraid is not a resource-hungry OS.


This is the result of the command you wrote:

dd if=/dev/zero of=/dev/null bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 8.05482 s, 1.3 GB/s

 

Using md5sum on the array I reach the usual speeds of around 30 MB/s, and the process using the most CPU is shfs. Doing the same from the cache (SSD) I get around 64 MB/s, which is the highest speed I've seen so far.


Please note that the D2550 I've got is slower than a J1900. However, md5sum is currently held back by read speed, since I can reach higher speeds when not reading from disk:

 

dd if=/dev/zero bs=1M count=1k | md5sum
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.21265 s, 117 MB/s
cd573cfaace07e7949bc0c46028904ff  -

 

My CPU doesn't support frequency scaling, so it's always running at its full frequency of 1.86 GHz. As you can see below, it's indeed running at that frequency.

 

cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 54
model name      : Intel(R) Atom(TM) CPU D2550   @ 1.86GHz
stepping        : 1
microcode       : 0x10d
cpu MHz         : 1861.886
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm arat
bugs            :
bogomips        : 3723.82
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 54
model name      : Intel(R) Atom(TM) CPU D2550   @ 1.86GHz
stepping        : 1
microcode       : 0x10d
cpu MHz         : 1861.910
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm arat
bugs            :
bogomips        : 3723.82
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 54
model name      : Intel(R) Atom(TM) CPU D2550   @ 1.86GHz
stepping        : 1
microcode       : 0x10d
cpu MHz         : 1861.910
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm arat
bugs            :
bogomips        : 3723.82
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 54
model name      : Intel(R) Atom(TM) CPU D2550   @ 1.86GHz
stepping        : 1
microcode       : 0x10d
cpu MHz         : 1861.913
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm arat
bugs            :
bogomips        : 3723.82
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

 

 

2 hours ago, marc0777 said:

since I can reach higher speeds when not reading from disk:

Still seems a little on the low side for that CPU. This is with a D2700, which is faster but not by much (PassMark of 492 vs 423):

 

dd if=/dev/zero bs=1M count=1k | md5sum
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.70252 s, 228 MB/s

 

I would expect it to be 10 or 20% faster on this test, not twice as fast; might be worth checking the BIOS to make sure the CPU cache is enabled and/or resetting the CMOS to optimal defaults.

 

Some more comparison tests. As already mentioned, a low-end CPU can always limit your performance, especially when using user shares, but it still shouldn't be that slow. The source disk is a 500GB 2.5" HDD:

 

dd if=/mnt/disk2/test/1.mp4 bs=1M count=1k | md5sum
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.1803 s, 88.2 MB/s

 

dd if=/mnt/user/test/1.mp4 bs=1M count=1k | md5sum
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 16.5119 s, 65.0 MB/s

 

Reading from a client computer over SMB I get around 110MB/s when using a disk share and around 65MB/s with a user share.

 

[Screenshots: SMB transfer speeds, disk share (~110 MB/s) vs user share (~65 MB/s)]


So, assuming there is something wrong with the CPU performance, what could be limiting it and what should I check? I tried setting the governor to performance, but this didn't seem to change anything.
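For what it's worth, this is how I tried to force it (cpufrequtils is installed, so cpufreq-set should work, assuming a scaling driver is actually loaded):

cpufreq-set -r -g performance    # set the performance governor on all related CPUs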

 

I don't know if this might be related, but the output of cpufreq-info shows:

 

cpufreq-info
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to [email protected], please.
analyzing CPU 0:
  no or unknown cpufreq driver is active on this CPU
  maximum transition latency: 4294.55 ms.
analyzing CPU 1:
  no or unknown cpufreq driver is active on this CPU
  maximum transition latency: 4294.55 ms.
analyzing CPU 2:
  no or unknown cpufreq driver is active on this CPU
  maximum transition latency: 4294.55 ms.
analyzing CPU 3:
  no or unknown cpufreq driver is active on this CPU
  maximum transition latency: 4294.55 ms.

 


After some more digging it looks like the CPU is stuck in the C1E state, which makes me pretty sure it's a BIOS problem. Unfortunately I'm currently far from where I keep the NAS, so I'll have to wait or find someone who can go and check for me sooner. Anyway, I'll update you as soon as I have news regarding that.
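For anyone wanting to check the same thing, the idle-state residency is exposed in sysfs (a rough check; assumes the cpuidle interface is available and state names may differ per system):

grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name    # list the available idle states
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/time    # microseconds spent in each state since boot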

 

Any suggestions on NFS instead? Despite the upgrade and switching to NFS 4.2, I'm still getting stale file handles 2-3 times a day...

34 minutes ago, marc0777 said:

After some more digging it looks like the CPU is stuck in the C1E state, which makes me pretty sure it's a BIOS problem.

That would make sense, hopefully it can be corrected in the BIOS.

 

35 minutes ago, marc0777 said:

Any suggestions instead on NFS?

Sorry, can't help with that, never used NFS.

 

  • 2 weeks later...

So I tried changing every option in the BIOS, but with no success; performance is still abysmal. Moreover, I had to disable the cache for all my NFS shares or they would continuously give stale file handle errors.

 

Since there doesn't seem to be a solution to these problems, I'll switch to another OS as soon as I have the time to move all the files around again.

  • 2 months later...
  • Solution

Hello everyone!
After a long time I was able to work on the NAS, and I managed to solve the problem!

 

The problem was a faulty BIOS, made for a slightly different version of my NAS. I had installed it after reading online that it should fix some problems when using OSes other than the one the NAS was sold with; however, its CPU power management was completely broken. After finding and installing the original BIOS, performance is now what I would expect.

 

Indeed, I reach around 100 MB/s both when reading and writing over NFS on a 1 Gbps link.

 

Thanks again to everyone for the support! 

