
Sudden very slow disk access on 6.10 RC2


Sarge


I've been running 6.10 RC2 for a couple months now, learning how Unraid works and getting a bunch of Docker containers up like Pihole, Nextcloud, Plex, etc. 

 

Today, while copying some mkv files to the server I noticed that transfer speeds suddenly tanked. After some troubleshooting I think I've narrowed it down to the drives themselves.  

 

Here's what I've done:

  • VM service is off, and has been for a while, as I have no need for VMs yet
  • Turned off the Docker service to rule out any of the containers putting strain on the system
  • Disabled RAM write caching by using Tips and Tweaks to set vm.dirty_background_ratio and vm.dirty_ratio to 0 (did this early on to figure out whether the RAM-to-disk flush was causing the slowdown)
  • Uninstalled a couple of the most recently installed plugins, one of which was Dynamix File Integrity. I had hoped that would fix it, then realized the cache pool is BTRFS and should not have been affected by that plugin at all. I did reboot afterwards, with no improvement.
  • Tested by SSHing into the Unraid box and running the following two commands.
    • dd if=/dev/zero of=/mnt/cache/temp/test1.img bs=1G count=1 oflag=dsync
    • dd if=/dev/zero of=/mnt/disk1/dtest/test1.img bs=1G count=1 oflag=dsync
    • Results are around 3 to 5 MB/s on both the cache drive and the spinning disks, where normal results are several GB/s for the cache and ~100 MB/s for the disks. (The full test sequence is consolidated in a sketch right after this list.)
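
For reference, here is the full sequence consolidated into one block. The temp and dtest directories are just scratch folders I created, and as far as I can tell the two sysctl lines are what the Tips and Tweaks setting translates to:

  # disable the RAM write cache (what Tips and Tweaks was doing for me)
  sysctl vm.dirty_background_ratio=0
  sysctl vm.dirty_ratio=0

  # write 1 GB to the NVMe cache pool and to one array disk
  dd if=/dev/zero of=/mnt/cache/temp/test1.img bs=1G count=1 oflag=dsync
  dd if=/dev/zero of=/mnt/disk1/dtest/test1.img bs=1G count=1 oflag=dsync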

 

System Specs:

  • Dell R720xd
  • Dell RAID card flashed to IT mode, working fine for a couple of months
  • 128 GB of RAM, fully tested with Memtest
  • 2 x Intel Xeon E5-2697 v2 @ 2.70 GHz (24 cores, 48 threads)
  • Dual 10 GbE Dell Ethernet ports, bonded
  • Nvidia Quadro P400
  • 7 x 10 TB Seagate Enterprise drives, two of which are parity
  • 2 x 2 TB Samsung SSD 970 EVO Plus NVMe drives for cache in RAID 1 using BTRFS

 

Please see diagnostics attached.

 

Any pointers anyone can give would be greatly appreciated.

Let me know if this should go in the 6.10 RC2 thread.

thor-diagnostics-20220224-0018.zip

1 hour ago, Sarge said:

Tested by SSHing into the Unraid box and running the following two commands.

  • dd if=/dev/zero of=/mnt/cache/temp/test1.img bs=1G count=1 oflag=dsync
  • dd if=/dev/zero of=/mnt/disk1/dtest/test1.img bs=1G count=1 oflag=dsync
  • Results are around 3 to 5 MB/s on both the cache drive and the spinning disks, where normal results are several GB/s for the cache and ~100 MB/s for the disks

 

The test count is only 1 with a 1G block size; that test is too small.

If local storage performance is normal, then you should check the network and the source storage performance.
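
For example, something like this writes a few gigabytes per run and only forces a single flush at the end (illustrative only; the path is the same scratch directory used above):

  # ~4 GB sequential write, flushed to disk once at the end
  dd if=/dev/zero of=/mnt/cache/temp/bigtest.img bs=1M count=4096 conv=fsync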

Edited by Vr2Io
8 hours ago, Vr2Io said:

The test count is only 1 with a 1G block size; that test is too small.

If local storage performance is normal, then you should check the network and the source storage performance.

I ran both of those commands multiple times over several reboots while I was working on the issue with the same results.

It's not the network; dd bypasses the network since it's run locally, and network performance to RAM is as expected.


Following is the list of plugins I had installed. 

Plugins with an x in front of them are ones I uninstalled. I ran a speed test after each uninstall, with no improvement. Once I got to this point, I decided a reboot was in order in case one of the plugins needed it.

After the reboot speeds were back to normal.

 

I'm now going to reinstall the critical plugins one at a time, running a speed test after each, and will probably reboot from time to time to make sure things are stable. Rebooting enterprise hardware really sucks; it takes forever.

 

x    CA Auto Turbo Write Mode
     CA Auto Update Applications
     CA Backup / Restore Appdata
     CA Cleanup Appdata
     CA Mover Tuning
     Community Applications
x    Dynamix Cache Directories
     Dynamix System Buttons
x    Dynamix System Information
x    Dynamix System Statistics
     Fix Common Problems
     GPU Statistics
x    My Servers
     Nerd Tools
     Nvidia Driver
x    Open Files
x    Preclear Disks
     rclone
x    Tips and Tweaks
x    Unassigned Devices
x    Unassigned Devices Plus
x    unBALANCE
     User Scripts


Ignore all of the above, including safe mode fixing it.

 

Safe mode / uninstalling Tips and Tweaks was resetting vm.dirty_background_ratio and vm.dirty_ratio back to their defaults of 10 and 20.  Setting both of these manually to 0 using sysctl causes the dd tests to slow way back down. Not as bad as before, around 50 MB/s for the NVMe cache and 7 MB/s for the spinning disks, but still too slow.  Again, this is in Safe Mode, so no plugins are running.
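
A quick way to confirm what the values actually are at any point (10 and 20 being the defaults mentioned above):

  # show the current writeback settings
  sysctl vm.dirty_background_ratio vm.dirty_ratio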

 

I really have no idea what is going on now.

 

I think my next test is going to be to boot to a USB Ubuntu instance and run the same tests on one of the spinning disks there.  This should take Unraid out of the equation and show whether it is a hardware issue or not.
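
Roughly the plan (the device name and mount point below are just placeholders; I'd identify the actual array disks with lsblk first):

  # find the disks, then mount one of the data drives
  lsblk -o NAME,SIZE,MODEL
  mkdir -p /mnt/test
  mount /dev/sdX1 /mnt/test   # sdX1 stands in for one array data partition

  # same dd test as on Unraid
  dd if=/dev/zero of=/mnt/test/test1.img bs=1G count=1 oflag=dsync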

 

I'm pretty sure things were working fine not that long ago.  I was copying massive amounts of data to the array as I ripped my Blu-ray collection.  I'd rip disks all day to my local system, make sure their directories and file names were correct, then copy all of them to Unraid over SMB.  The first 80 gigs or so was super fast because it was going to the RAM cache, but it would slow down some for the rest once it hit the dirty_ratio limit (I had it set at 50 at the time, so about half of 128 gigs).  But it certainly wasn't a few megs a second.
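
For what it's worth, with 128 GB of RAM a dirty_ratio of 50 works out to roughly 64 GB of writes that can sit in the page cache before writers get throttled. If I had wanted a smaller buffer instead of turning caching off entirely, I believe the byte-based knobs would have been the way to do it (the values below are just an example, not something I actually ran):

  # cap dirty pages at ~8 GB, start background flushing at ~2 GB (example values)
  sysctl vm.dirty_bytes=8589934592
  sysctl vm.dirty_background_bytes=2147483648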


Well, now I'm just stumped.  

Booting to an Ubuntu live USB and mounting the disks shows the same performance issues, worse in fact.

The NVMe BTRFS pool that serves as the cache drive shows write speeds of 5 MB/s.

One of the spinning disks is showing 511 KB/s using dd.

This is after setting 

  sysctl vm.dirty_background_ratio=0
  sysctl vm.dirty_ratio=0

Is this "normal" for Linux?  Or is something seriously broken in my server all of a sudden?

 

I was thinking that maybe it was the flashing of the Dell RAID card to IT mode, but the NVMe drives are attached via PCI Express cards.

 

I'm starting to run out of ideas.  I guess I can pull the PCI Express cards, install them in a spare desktop I have, boot with the Ubuntu USB, and try to replicate it there.

 

Any help would be greatly appreciated.

 


It won't be an HBA issue, as the same problem happens on both the NVMe and the spinning disks.

 


You have said a few times that safe mode has no problem; do you mean mounting an array disk with UD (Unassigned Devices)? If mounting with UD is normal, then it won't be a hardware issue, it is still a software issue.

 

Please try a fresh install of Unraid on another USB stick (trial license) and test again.

Edited by Vr2Io
1 hour ago, Vr2Io said:

Please use oflag=direct instead of dsync.

Ahh, thank you.  Yes, that fixed the dd testing; I'm now getting the speeds I expect from dd on both Ubuntu and Unraid.
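
For anyone else who hits this: as far as I can tell, oflag=dsync still goes through the page cache and then forces the data out, so it gets caught by the aggressive dirty-page throttling that setting the ratios to 0 turns on, while oflag=direct bypasses the page cache entirely and measures the device itself:

  # buffered write plus sync; crawls when the dirty ratios are set to 0
  dd if=/dev/zero of=/mnt/cache/temp/test1.img bs=1G count=1 oflag=dsync

  # direct I/O, no page cache; shows the real device speed
  dd if=/dev/zero of=/mnt/cache/temp/test1.img bs=1G count=1 oflag=direct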

 

Booting Unraid, even in safe mode, file transfers over SMB are still very slow, now about 2 MB/s or less.  This is for one large file with vm.dirty_ratio set to 0; when vm.dirty_ratio is at its default (20) I get 300 MB/s or more.  This tells me it's not an SMB or network config thing. Again, this is in Safe Mode, so no plugins.  I'll try a new Unraid install with a demo key on a different USB drive and report back.

[Screenshot: SMB transfer of a large file running at about 2 MB/s]


@Vr2Io Booted into a fresh Unraid 6.10 RC2 install and, long story short, same issue.  Here's the full rundown of what I did.

  1. Created a new USB key with 6.10 RC2
  2. Booted with new key
  3. Set root password and logged in.  
  4. Logged into Unraid.net to get demo key
  5. Assigned devices to array and started it
  6. Turned on public SMB sharing for a Temp share that is cache-only
  7. Logged into ssh and ran `sysctl vm.dirty_ratio=0` and `sysctl vm.dirty_background_ratio=0`
  8. Ran `dd if=/dev/zero of=/mnt/cache/temp/test1.img bs=1G count=1 oflag=direct` Got expected performance of about 1.5 GB/s
  9. Tried copying a 16.1 GB file to the Temp share on the cache drive and got about 3.08 MB/s; see the screenshot below. </sigh> (A sketch for watching the kernel writeback counters during a copy like this follows this list.)
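
Something I may try on the next run, to see what the kernel is actually doing while the copy crawls (assuming watch is available on the box; this just polls two counters from /proc/meminfo):

  # refresh the dirty/writeback page counters every second during the copy
  watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'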

I'm going to install 6.9 on a USB drive to see if the speed is the same.  Burning my whole day on this, really getting frustrated.

[Screenshot: the 16.1 GB file copying to the Temp share at about 3.08 MB/s]

16 hours ago, Vr2Io said:

I tried this and got the same slow SMB transfer result; once I went back to the defaults, it returned to normal.

 

Why do you want to turn off the cache? I haven't found any benefit to it.

I'm turning off the cache to replicate the issue I see with the cache on once the RAM runs out.  SMB transfers should not slow to 3 to 5 MB/s on a 10 gig network to NVMe storage; there's no excuse for this.  Either something is really, really broken in Unraid, or Linux needs RAM caches to make SMB work for some reason.  I'm going to try to replicate this on Ubuntu shortly, but I have a few meetings this morning.

22 hours ago, Vr2Io said:

I tried this and got the same slow SMB transfer result; once I went back to the defaults, it returned to normal.

 

Why do you want to turn off the cache? I haven't found any benefit to it.

OK, tried this with an Ubuntu live boot USB and got the same results, so it's NOT an Unraid bug, it's a "Linux thing".  Seems Linux needs those RAM caches for SMB to work worth a darn?  IDK, super weird.
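
For anyone who finds this thread later: putting the two values back to the stock defaults (10 and 20, as noted earlier) is what brought SMB speeds back to normal for me:

  # restore the stock writeback defaults
  sysctl vm.dirty_background_ratio=10
  sysctl vm.dirty_ratio=20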

