[6.7.0-rc2] Reading all disks when writing to a single one

SimonF · February 23, 2019

Looking OK to me; the small reads no longer show up when the mover is running.

image.png.60ec849ded032b7212748a87d4b06933.png

I do see all the disks that the share is on spin up. Is that normal, or should it only spin up the disk it's writing to?

itimpi · February 23, 2019

24 minutes ago, SimonF said:

Looking OK to me; the small reads no longer show up when the mover is running.

I do see all the disks that the share is on spin up. Is that normal, or should it only spin up the disk it's writing to?

I think that is normal, since UnRAID has to check that the file does not already exist on another drive belonging to the share (although I could be wrong).

It is likely that using the Folder Caching plugin will help here. I know that some people have had problems with that plugin recently, but that could well have been due to the same kernel bug that was causing these small writes. I guess we will have to see what others find out.

hawihoney · February 23, 2019

Quote

Definitely something else going on. This is not normal behavior.

That's possible, indeed. Would be two different problems showing the same symptoms then. Interesting.

Here are the results - just in case:

Using MC, I copy a 52 GB file from the bare metal server to the VM server. The source is a folder created on an Unassigned Devices RAID1 (BTRFS). The target is a folder in the VM. The disk holding that remote folder is mounted on the bare metal server as follows. I use my own Mount and Unmount scripts in User Scripts because I need to wait for the VMs to start before mounting, and Unassigned Devices seems to run into a race condition when showing 48 SMB mount points on the Main page:

mkdir -p /mnt/hawi
mkdir -p /mnt/hawi/192.168.178.101_disk20
mount -t cifs -o rw,nounix,iocharset=utf8,_netdev,file_mode=0777,dir_mode=0777,vers=3.0,username=hawi,password=******** '//192.168.178.101/disk20' '/mnt/hawi/192.168.178.101_disk20'
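
The "wait for the VMs before mounting" step mentioned above could look roughly like the sketch below. `retry_until` is a hypothetical helper, not something from User Scripts or Unassigned Devices, and treating a ping reply as "the VM is ready for SMB" is an assumption that may be too optimistic.

```shell
#!/bin/sh
# retry_until ATTEMPTS DELAY CMD...
# Hypothetical helper: run CMD until it succeeds, sleeping DELAY seconds
# between tries. Returns 0 on the first success, 1 if every try fails.
retry_until() {
    ru_attempts=$1
    ru_delay=$2
    shift 2
    ru_i=1
    while [ "$ru_i" -le "$ru_attempts" ]; do
        if "$@"; then
            return 0
        fi
        ru_i=$((ru_i + 1))
        # Only sleep if another attempt is still to come.
        if [ "$ru_i" -le "$ru_attempts" ]; then
            sleep "$ru_delay"
        fi
    done
    return 1
}

# Example (commented out, not run here): wait up to about 5 minutes for the
# VM to answer a ping, then run the mkdir/mount lines from the post above.
# retry_until 60 5 ping -c 1 -W 1 192.168.178.101 || exit 1
```

The helper takes the probe command as arguments, so the ping could be swapped for any other readiness check without touching the loop.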

MC.jpg.0769f4842a20b28fb7af99e42aec81c4.jpg

The start looked promising. RAM usage on the target did not go beyond 12-13%. CPU was around 30%. At 32% of the copy process (15 GB copied, total RAM is 16 GB) the problem started. MC halted and those 8 threads showed 100% CPU on the source server (host of the VM). Same on the target server. The last I saw was 87% RAM usage on the target server. The small read requests on the other disks started. At first they came in chunks (disk1-disk5, disk6-disk10, ...). A short time later all disks were showing small reads. In one of the pictures you can see red marks and a blue mark. The blue mark shows the disk that is not part of the User Share; it shows 50% of the read amount of the other disks.

I know Unraid in a VM is not supported, but I think this shows a problem. My first guess (see two pages back) was SMBD, because that process is holding that amount of CPU and virtual RAM when the problems show up.

tower-diagnostics-20190223-0833.zip

towervm01-diagnostics-20190223-0833.zip

Edited February 23, 2019 by hawihoney

bonienl · February 23, 2019

20 minutes ago, hawihoney said:

Using MC, I copy a 52 GB file from the bare metal server to the VM server. The source is a folder created on an Unassigned Devices RAID1 (BTRFS). The target is a folder in the VM.

Have you tried a 'standard' test? Neither UD nor Unraid as a VM is part of a stock Unraid installation.

This is copying an 80 GB file to the encrypted array on my server:

image.png.950802cc4ed82f955a5e1d326b989f23.png

Copy speed is pretty constant and saturates the 1 Gbps link. During the whole process only parity and the designated disk show read/write activity. CPU usage is between 15% and 20%.

Edited February 23, 2019 by bonienl

bonienl · February 23, 2019

50 minutes ago, itimpi said:

I think that is normal, since UnRAID has to check that the file does not already exist on another drive belonging to the share (although I could be wrong).

You are not wrong. If a share exists on multiple disks, the mover needs to retrieve information from all those disks in order to determine where to place the new file(s).
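
To illustrate why every disk in the share gets touched, here is a rough sketch of that kind of lookup. It is not Unraid's actual code; `find_share_copies` and its arguments are made up for illustration, and the `/mnt/diskN/<share>` layout is assumed.

```shell
#!/bin/sh
# find_share_copies ROOT SHARE RELPATH
# Illustrative only: print every per-disk path under ROOT/disk* that already
# holds RELPATH inside SHARE. On a real server ROOT would be /mnt (assumed).
find_share_copies() {
    fsc_root=$1
    fsc_share=$2
    fsc_rel=$3
    for fsc_disk in "$fsc_root"/disk*; do
        # Skip the unexpanded glob or anything that is not a directory.
        [ -d "$fsc_disk" ] || continue
        # This existence check is what forces a read on each data disk.
        if [ -e "$fsc_disk/$fsc_share/$fsc_rel" ]; then
            printf '%s\n' "$fsc_disk/$fsc_share/$fsc_rel"
        fi
    done
    return 0
}

# Example (commented out): find_share_copies /mnt Media film.mkv
```

Each `-e` test needs directory metadata from that disk, so unless the metadata is already cached in RAM the disk has to spin up to answer it.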

Edited February 23, 2019 by bonienl

hawihoney · February 23, 2019

I can't go back to two bare metal servers. Over the last few weeks I rebuilt my systems into one bare metal server and several JBOD chassis. Each JBOD chassis is driven by its own 9300-8e in the bare metal server. All chassis are connected through SAS cables, not Ethernet.

So please close this thread. My problem is still there, but I can't retest it in an officially supported environment.

I decided to swap the contents. Big files on Main Server, small files in VMs on JBODs.

saarg · February 23, 2019

6 minutes ago, hawihoney said:

I can't go back to two bare metal servers. Over the last few weeks I rebuilt my systems into one bare metal server and several JBOD chassis. Each JBOD chassis is driven by its own 9300-8e in the bare metal server. All chassis are connected through SAS cables, not Ethernet.

So please close this thread. My problem is still there, but I can't retest it in an officially supported environment.

I decided to swap the contents. Big files on Main Server, small files in VMs on JBODs.

Wouldn't it be better to switch the servers and copy from the Unraid VM to the bare metal one? That way the writing happens on the supported Unraid.

hawihoney · February 23, 2019

Quote

Wouldn't it be better to switch the servers and copy from the Unraid VM to the bare metal one? That way the writing happens on the supported Unraid.

OK, I will run a test and copy a big file from the VM to bare metal. But I think that will not show the real problem. The bare metal server has 128 GB of RAM. IMHO part of the problem shows up here whenever the size of the copied file exceeds the available RAM of the target machine.

Perhaps copying over SAS cables to SMB mount points is too fast for the target server in a VM. Perhaps something caches too much. In one of my first posts here I pointed at SMBD. This process is eating up all available RAM and CPU on the target side. Perhaps that only happens within a VM. So many questions.
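
One way to check the "something caches too much" guess would be to watch the kernel's dirty page counters on the target while the copy runs. A minimal sketch; `dirty_kb` is a hypothetical helper name, and the optional file argument exists only so it can be tried against a saved copy of `/proc/meminfo`.

```shell
#!/bin/sh
# dirty_kb [MEMINFO_FILE]
# Print the Dirty and Writeback counters (in kB) from a /proc/meminfo-style
# file. Defaults to the live /proc/meminfo.
dirty_kb() {
    awk '
        $1 == "Dirty:"     { d = $2 }
        $1 == "Writeback:" { w = $2 }
        END { printf "Dirty=%d Writeback=%d\n", d, w }
    ' "${1:-/proc/meminfo}"
}

# Example (commented out): sample every 5 seconds during the copy.
# while sleep 5; do dirty_kb; done
```

If Dirty keeps climbing toward total RAM shortly before the stall, that would fit the theory; if it stays flat, the cause is probably elsewhere.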

Will test tomorrow and report back. But I bet this will work.

NewDisplayName · February 23, 2019

I had an Unraid server with 4 GB of RAM copying 10, 20, 30 GB of files via SMB and never had a problem; it was always around 100 MB/s. So it might not be a general problem.

Unraid SMB -> Windows 10 Client.

Edited February 23, 2019 by nuhll

limetech · February 23, 2019

Thanks everyone for retesting. This one was a doozy.

8 hours ago, hawihoney said:

The start looked promising. RAM usage on the target did not go beyond 12-13%. CPU was around 30%. At 32% of the copy process (15 GB copied, total RAM is 16 GB) the problem started. MC halted and those 8 threads showed 100% CPU on the source server (host of the VM). Same on the target server. The last I saw was 87% RAM usage on the target server. The small read requests on the other disks started. At first they came in chunks (disk1-disk5, disk6-disk10, ...). A short time later all disks were showing small reads. In one of the pictures you can see red marks and a blue mark. The blue mark shows the disk that is not part of the User Share; it shows 50% of the read amount of the other disks.

For your particular use case, perhaps consider experimenting with kernel virtual memory tuning:

https://discuss.aerospike.com/t/tuning-kernel-memory-for-performance/4195
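
As a starting point for that kind of experiment, the current writeback settings can be read like this. The four knobs are standard Linux `vm.*` sysctls; whether they are the right ones for this case, and the example values in the comments, are assumptions rather than recommendations. `show_vm_writeback` is a hypothetical helper name.

```shell
#!/bin/sh
# show_vm_writeback [DIR]
# Print the writeback-related vm sysctls found in DIR
# (defaults to /proc/sys/vm). Knobs that are missing are skipped.
show_vm_writeback() {
    svw_dir=${1:-/proc/sys/vm}
    for svw_knob in dirty_background_ratio dirty_ratio \
                    dirty_expire_centisecs dirty_writeback_centisecs; do
        if [ -r "$svw_dir/$svw_knob" ]; then
            printf 'vm.%s = %s\n' "$svw_knob" "$(cat "$svw_dir/$svw_knob")"
        fi
    done
    return 0
}

# Example change (commented out; illustrative values only, reverts on reboot):
# sysctl -w vm.dirty_background_ratio=5
# sysctl -w vm.dirty_ratio=10
```

Lower ratios make the kernel start flushing dirty pages earlier, which trades some burst speed for a smaller backlog in RAM.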

Hoopster · February 25, 2019

I cannot comment on other cases mentioned in this report; however, the behavior I noticed with rc4 seems to have been resolved with rc5.

When doing large file writes/DVR records/backups, I see only the parity disk and the target data disk spun up if they were not previously active. I no longer see (at least in my limited testing) all disks spun up.

TRusselo · December 26, 2020

I just noticed this problem on my machine with 6.9.0 RC2
