[SOLVED] - Unbearably slow disk to disk write speeds (1MB/s)



Hi all,

 

I have been poring over this forum and the Reddit sub, as I am having some major issues this evening writing to my disks using Midnight Commander.

 

My system:

  1. Ryzen 1600
  2. 16GB DDR4
  3. ASRock B450M Pro4-F

 

Order of events:

  1. Installed 2 new drives last week and have been reshuffling data since then to reduce spin-up time on the drives.
  2. Moved 2TB from Disk 1 to Disk 2 (avg 30-40MB/s); no issues there.
  3. Moved 1TB from Disk 3 to Disk 4 (as above, no issues there).
  4. Am currently trying to move about 100GB from Disk 2 to Disk 3 (avg speed 1MB/s), with constant freezing.
  5. I did reorder the disks using this: (I need a specific order... it's an OCD thing).
  6. Writing was first attempted directly from /mnt/disk3 to /mnt/disk4; I waited overnight and it averaged 0.5-1MB/s.
  7. Then moved from /mnt/disk3 to /mnt/disks/1tbNVME (unassigned); the read speed from Disk 2 was slower than expected (avg 60MB/s).
  8. Finally, now trying to copy from /mnt/disks/1tbNVME to ANY drive on the array, or between array drives, yields writes under 1MB/s.
  9. Attempted turning on turbo writes; same issue (see the sketch after this list).
  10. Attempted multiple restarts.
  11. When attempting a copy, Midnight Commander appears to freeze and needs to be manually killed before the array will spin down.
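For reference, this is roughly the kind of disk-to-disk move I'm running (the share paths here are just placeholders), plus the shell way of toggling turbo write; the mdcmd value is my assumption, and the GUI setting under Settings > Disk Settings is the authoritative place to change it:

rsync -avh --progress /mnt/disk2/SomeShare/ /mnt/disk3/SomeShare/   # disk-to-disk move with per-file progress (alternative to mc)
mdcmd set md_write_method 1                                         # assumed: 1 = turbo/reconstruct write, 0 = read/modify/write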

Note - I also adjusted the Disk Cache settings ('vm.dirty_background_ratio' / 'vm.dirty_ratio' %) from 1 / 2 to 2 / 4 using the Tips and Tweaks plugin.
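For anyone curious, this is roughly what that Tips and Tweaks change boils down to at the shell (I'm assuming vm.dirty_ratio is the second number of the pair):

sysctl vm.dirty_background_ratio vm.dirty_ratio   # show the current values
sysctl -w vm.dirty_background_ratio=2             # start background flushing at 2% of RAM
sysctl -w vm.dirty_ratio=4                        # block writers once dirty pages reach 4% of RAM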

 

Before you ask ;-):

  • 5 data drives (connected via IT-flashed H310)
    • 3 of the drives are SMR; I expect a write penalty, but this is well beyond that, and #7 above should have largely taken that variable out of play.
  • 1 parity (directly to mobo)
  • 1 cache (directly to mobo)
  • 3 unassigned drives (directly to motherboard)
  • I am ONLY writing from /mnt/disk to /mnt/disk, never to or from shares.

 

It's almost like my system is running out of cache and is unable to write any more to any drive?

Lastly, my cache is basically full; I have tried using the mover for a small set of files and did not experience this delay or write issue.

 

The diagnostics are attached; they were captured during a move after a fresh reboot, so let me know if there are more tests I should run for additional data.

adam-htpc-diagnostics-20210220-0102.zip


xfs_repair didn't pick anything up.

 

root@Adam-HTPC:~# xfs_repair -v /dev/md3
Phase 1 - find and verify superblock...
        - block cache size set to 736760 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 28945 tail block 28945
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 3
        - agno = 2
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Sat Feb 20 01:53:14 2021

Phase           Start           End             Duration
Phase 1:        02/20 01:52:47  02/20 01:52:48  1 second
Phase 2:        02/20 01:52:48  02/20 01:52:49  1 second
Phase 3:        02/20 01:52:49  02/20 01:52:50  1 second
Phase 4:        02/20 01:52:50  02/20 01:52:50
Phase 5:        02/20 01:52:50  02/20 01:52:50
Phase 6:        02/20 01:52:50  02/20 01:52:50
Phase 7:        02/20 01:52:50  02/20 01:52:50

Total run time: 3 seconds
done
root@Adam-HTPC:~# xfs_repair -v /dev/md1
Phase 1 - find and verify superblock...
        - block cache size set to 736760 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 66790 tail block 66790
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 3
        - agno = 1
        - agno = 0
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Sat Feb 20 01:54:12 2021

Phase           Start           End             Duration
Phase 1:        02/20 01:54:11  02/20 01:54:11
Phase 2:        02/20 01:54:11  02/20 01:54:11
Phase 3:        02/20 01:54:11  02/20 01:54:12  1 second
Phase 4:        02/20 01:54:12  02/20 01:54:12
Phase 5:        02/20 01:54:12  02/20 01:54:12
Phase 6:        02/20 01:54:12  02/20 01:54:12
Phase 7:        02/20 01:54:12  02/20 01:54:12

Total run time: 1 second
done
root@Adam-HTPC:~# xfs_repair -v /dev/md2
Phase 1 - find and verify superblock...
        - block cache size set to 736760 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1695868 tail block 1695868
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Sat Feb 20 01:54:22 2021

Phase           Start           End             Duration
Phase 1:        02/20 01:54:20  02/20 01:54:21  1 second
Phase 2:        02/20 01:54:21  02/20 01:54:21
Phase 3:        02/20 01:54:21  02/20 01:54:21
Phase 4:        02/20 01:54:21  02/20 01:54:21
Phase 5:        02/20 01:54:21  02/20 01:54:21
Phase 6:        02/20 01:54:21  02/20 01:54:21
Phase 7:        02/20 01:54:21  02/20 01:54:21

Total run time: 1 second
done
root@Adam-HTPC:~# xfs_repair -v /dev/md4
Phase 1 - find and verify superblock...
        - block cache size set to 751664 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 8920 tail block 8920
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Sat Feb 20 01:54:29 2021

Phase           Start           End             Duration
Phase 1:        02/20 01:54:27  02/20 01:54:28  1 second
Phase 2:        02/20 01:54:28  02/20 01:54:28
Phase 3:        02/20 01:54:28  02/20 01:54:29  1 second
Phase 4:        02/20 01:54:29  02/20 01:54:29
Phase 5:        02/20 01:54:29  02/20 01:54:29
Phase 6:        02/20 01:54:29  02/20 01:54:29
Phase 7:        02/20 01:54:29  02/20 01:54:29

Total run time: 2 seconds
done
root@Adam-HTPC:~# xfs_repair -v /dev/md5
Phase 1 - find and verify superblock...
        - block cache size set to 759112 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 6921 tail block 6921
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Sat Feb 20 01:54:38 2021

Phase           Start           End             Duration
Phase 1:        02/20 01:54:35  02/20 01:54:35
Phase 2:        02/20 01:54:35  02/20 01:54:36  1 second
Phase 3:        02/20 01:54:36  02/20 01:54:36
Phase 4:        02/20 01:54:36  02/20 01:54:36
Phase 5:        02/20 01:54:36  02/20 01:54:36
Phase 6:        02/20 01:54:36  02/20 01:54:38  2 seconds
Phase 7:        02/20 01:54:38  02/20 01:54:38

Total run time: 3 seconds
done


Does seem to be the case... writing to the EARS drive averages 40MB/s...

Now when I attempt to write to the other drives, including the other SMR drives, there doesn't seem to be a problem. Perhaps this drive is failing?

I'd already run a SMART test before posting, with no issues; perhaps I will rerun an extended one overnight.
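Something like this for the overnight run, with /dev/sdX standing in for the suspect EARS drive:

smartctl -t long /dev/sdX   # start the extended self-test
smartctl -a /dev/sdX        # check attributes and self-test results once it finishes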

 

I wonder if anyone else has seen a failure like this? I guess the important thing is just to get the data off it, given my whole array is in a state of flux.

 

EDIT: I did test moving to the non-SMR drives earlier with the same problem, so perhaps something else is going on.

Will retest in the morning.


Okay now it's a new day.

I have managed to test this again, this time excluding the slow drive and any SMR drives.

Basically, during a long copy (approx. 100GB), after about 30 minutes the speed drops to 1MB/s and the entire array becomes unresponsive. Any movement between disks involving the array (including the unassigned drives) is unbearably slow.
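This is roughly how I've been watching it stall (iostat needs the sysstat tools, e.g. from the Nerd Pack plugin, so treat it as a sketch):

iostat -x 5   # per-disk %util and await every 5 seconds; the stalled disk sits near 100% util
top           # the 'wa' figure on the CPU line shows IO wait climbing while throughput dies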

Updated diagnostics.

adam-htpc-diagnostics-20210220-1337.zip


Alright, after swapping drives around off the H310, my results are all over the place.

At the moment, I suspect the SMR drives' outer tracks are all full (from the long sustained writes), tanking read-modify-write performance.

I'm going to move the data off the affected drives, clear them, start my array again, and look for some replacements.

  • deanpelton changed the title to [SOLVED] - Unbearably slow disk to disk write speeds (1MB/s)

Figured this out.

 

In case anyone else has the same issue, this is what happened:

 

I am running 3 SMR drives in my system (ST4000DM004); these are Seagate Compute drives.

The drives were almost full, so I moved data off them to some CMR drives and then attempted to move data between the SMR drives.

This resulted in the following:

  • All surface tracks on the drives were full, so the drive needed to do its own read-modify-write AS WELL AS the parity read-modify-write.
  • 1MB/s transfer speeds, high IO wait, and Unraid and Midnight Commander locking up as IO throughput crashed to 0 while the drive shuffled data around.

This is because these drives have no TRIM support, so there is no way to inform the drive that data has been deleted; it keeps shuffling data around to make space until the old sectors are eventually overwritten.
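A quick way to confirm the lack of TRIM/UNMAP support, with /dev/sdX as a placeholder for one of the ST4000DM004s:

hdparm -I /dev/sdX | grep -i trim   # no output means no TRIM support, which is the case on these drives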

So how did I fix it? I needed to remove the drive(s) from the array, write 0s to every sector, write the data back onto them sequentially, and then rebuild parity (a rough sketch is below).
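In rough terms (destructive; the drive must already be out of the array with its data copied elsewhere, and /dev/sdX plus the paths below are placeholders):

dd if=/dev/zero of=/dev/sdX bs=1M status=progress    # zero every sector so the shingled zones start clean
# re-add and format the drive, then restore the data in one sequential pass:
rsync -avh --progress /mnt/disks/backup_copy/ /mnt/disk3/
# finally, rebuild parity from the Main page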

 

How to avoid this in the future? Don't use SMR drives for anything other than sequential writes, especially those without TRIM.

 

What an ordeal...

