Comments posted by s.Oliver

  1. 8 hours ago, simalex said:

    I think it's more that when a sector is written on a data drive then for parity to be consistent the same sector needs to be updated near real time as well on the parity drive. The individual drives don't understand this concept and allowing a drive to update in which ever order it chooses would increase the chance of the parity drive being out of sync with the actual data drives, especially when you have updates in multiple data drives simultaneously.

    Imagine having to update sector 13456 on drive 3 and sector 25789 on drive 4, in that order, and then the parity drive deciding that it should update sector 25789 first and then 13456, and at the same time having a power failure in between those writes. Then you would end up having 2 sectors with invalid parity data, even though your data drives both have the correct information.

     

    I didn't want to go too deep into the concept of unRAID's parity algorithm. So you're right, unRAID needs to be strict about writing the same sector to the data and parity drive(s) at (more or less) the same time (given how fast the different drives complete the request). So the slowest drive in the mix during that write cycle (no matter whether it is a parity or a data drive) determines how long the write cycle takes (or how fast it completes).

     

    But unRAID is not immune to data loss from unfinished write operations (whatever the reason) and has no concept of a journal (to my knowledge). So a file whose write was aborted mid-way is damaged/incomplete; parity can't change anything about that and probably isn't in sync anyway. That's why unRAID usually forces a parity sync on the next start of the array (and rebuilds the parity information completely, based only on the values on the data drive(s)).

     

    unRAID would need some concept of journaling to replay the writes and find the missing part; it doesn't have one (again, to my knowledge). ZFS is one file system that has a mechanism to prevent exactly this.

     

    My observation is that it is pretty much a synchronous write operation (all drives that need to write data do write the sectors in the same order/at the same time; otherwise I imagine I would hear much more 'noise' from my drives, especially during a rebuild).

     

    But I do confess: that is only my understanding of unRAID's way of writing data into the array (a small sketch of the XOR parity idea follows below).
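
    To illustrate what "the same sector on data and parity" means in practice, here is a minimal Python sketch of the read-modify-write idea behind single (XOR) parity. It is only my illustration of the concept, not Limetech's actual implementation; the function name and the toy sector values are made up.

    ```python
    def update_parity_block(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
        """Recompute a parity block after one data block changes.

        With single (P) parity, the parity block is the XOR of the blocks at
        the same position on all data drives, so it can be updated from just
        the old data, the new data and the old parity:
            new_parity = old_parity XOR old_data XOR new_data
        """
        assert len(old_data) == len(new_data) == len(old_parity)
        return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

    # Toy 4-byte "sectors" (hypothetical values, just to show the math).
    old_data   = bytes([0x00, 0xFF, 0x10, 0x20])
    new_data   = bytes([0x01, 0xFE, 0x10, 0x20])
    old_parity = bytes([0xAA, 0x55, 0xAA, 0x55])

    new_parity = update_parity_block(old_data, new_data, old_parity)
    print(new_parity.hex())

    # The data write and this parity write have to land as a pair; if power is
    # lost between them, that stripe's parity is stale until the next parity
    # check/sync, which is exactly why the write order matters.
    ```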

  2. 13 hours ago, Marshalleq said:

    That said, the link below actually outlines areas where performance is decreased and specifically mentions RAID. So perhaps Limetech's testing found it works better switched off.

     

    https://en.wikipedia.org/wiki/Native_Command_Queuing

    On normal SATA SSDs (at least on the one machine where I've seen it used as a cache drive) it is set to "32". But those are fast enough to handle it, and they are not part of that special "RAID" operation like the data/parity drives are.

    Because of the nature of unRAID's "RAID" mode, I guess the drives are "faster" if they work on small chunks of data in 'sequential' order.

  3. 10 hours ago, rclifton said:

    I think what he was saying is that now that he is back on 6.6.7, he checked and the queue depth is 1 on 6.6.7 as well. Which means the speculation that NCQ in 6.7 might be part of the problem with that release would be incorrect, since for him the queue depth was 1 on both 6.6.7 and 6.7 but he has no issues with 6.6.7. Or at least that's how I read what he said anyway.

    Nearly perfect ;-)

    I hadn't checked my own QD settings on 6.7.x before I left it (no one had brought up the QD as a possible reason), but I looked at a friend's unRAID system: a fresh setup (just a few weeks old), and there all spinners are also at QD=1.

  4. My 2 cents here (I've been back on 6.6.7 for 12 days and everything is as good as it ever was):

     

    Disk Settings: Tunable (md_write_method): Auto (I have never touched it)

    cat /sys/block/sdX/device/queue_depth returns "1" for all rotational HDDs

     

    The QD for the cache NVMe drive is unknown (it doesn't expose the same sysfs path to print the value)

     

    Wouldn't this contradict the opinion that the 6.6.x series performs better because of a higher QD value? (A small script to check this across all spinners is sketched below.)
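
    For reference, a small Python sketch that does the same check across all spinners. This is only a helper I would write for it, nothing built into unRAID; it just loops over the sysfs files mentioned above, and device names will of course differ per system.

    ```python
    #!/usr/bin/env python3
    """Print the queue depth of every rotational (spinning) block device.

    Equivalent to running `cat /sys/block/sdX/device/queue_depth` by hand,
    just looped over all drives that report themselves as rotational.
    """
    from pathlib import Path

    for dev in sorted(Path("/sys/block").iterdir()):
        rotational = dev / "queue" / "rotational"
        queue_depth = dev / "device" / "queue_depth"
        if not (rotational.is_file() and queue_depth.is_file()):
            continue  # e.g. NVMe devices don't expose device/queue_depth here
        if rotational.read_text().strip() != "1":
            continue  # skip SSDs, we only care about spinners
        print(f"{dev.name}: queue_depth = {queue_depth.read_text().strip()}")
    ```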

  5. 5 hours ago, sirkuz said:

    I bit the bullet and decided to revert one system as well; as reported many times, things seem to be back to "normal". Not sure how long I will be able to hold off on reverting the secondary one as well, since it was quite simple.

    Maybe you won't have to, if Limetech can identify the problem and fix it.

    • Upvote 1
  6. Funny thing: another problem has now disappeared (after going back to 6.6.7) which had caused some serious head-scratching:

     

    Plex (Docker) has some background tasks running (usually at night); one is the media scanning job. It regularly crashed, and a lot of people had this problem too and tried to find a solution. Now, after some days of uptime on 6.6.7, I haven't seen a single crash. YEAH!

     

    At night I have some big backup jobs running which write into the array. So I would guess that Plex timed out while accessing data in the array (even though it only reads files).

    • Upvote 1
  7. I can add to this, and it's a major drawback for unRAID from 6.7 onward.

     

    Before, I was reluctant to post about it because I hadn't done enough tests to be 100% sure that no setting somewhere had been changed…

     

    But now I'm sure. Today I upgraded one more unRAID server from 6.6.x to 6.7.2 and see the exact same behavior! So I have 2 machines here which haven't had a single change except being upgraded to 6.7.x (by now both on 6.7.2).

     

    In my book, it doesn't matter how you access the data: over the network or locally on the server, using different machines to connect to the server… When a write into the array is ongoing, any reads (even from cache SSDs/NVMe, and even from data or cache devices that aren't being written to) are super slow. Also, whenever a rebuild is happening, you'd better not want to read any file...

     

    The amount of RAM doesn't change anything either, nor do the controllers used or the CPU (with mitigations enabled or disabled). And while I can't back it up with data, it seems that rebuilds are slower too.

     

    This can become severe in scenarios where some service is continuously writing data into the array (like video surveillance, for example).

     

    Hopefully we can find a fix for this quickly, because going back to 6.6.x isn't a good option anymore.

     

    @limetech what can we do to help debug this? (A rough read-while-write test is sketched below.)
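
    To make the symptom measurable, here is a rough Python sketch of a read-while-write test. The paths are placeholders for shares/disks on your own array, and it's only meant to show the effect in numbers, not to be a proper benchmark. Drop the page cache between the two reads (sync; echo 3 > /proc/sys/vm/drop_caches) or point each read at a different large file, so the second read isn't served from RAM.

    ```python
    #!/usr/bin/env python3
    """Time a sequential read of a file on the array, first with the array
    idle and then while a second process is writing into the array."""
    import os
    import time
    from multiprocessing import Process

    READ_FILE  = "/mnt/disk1/path/to/large_existing_file"  # placeholder
    WRITE_FILE = "/mnt/disk2/scratch/write_test.bin"        # placeholder
    CHUNK = 1 << 20  # 1 MiB

    def writer(total_mib: int = 4096) -> None:
        """Keep the array busy with synced writes (triggers parity updates)."""
        block = os.urandom(CHUNK)
        with open(WRITE_FILE, "wb") as f:
            for _ in range(total_mib):
                f.write(block)
                f.flush()
                os.fsync(f.fileno())

    def timed_read() -> None:
        """Read READ_FILE sequentially and report the throughput."""
        start, done = time.time(), 0
        with open(READ_FILE, "rb", buffering=0) as f:
            while chunk := f.read(CHUNK):
                done += len(chunk)
        secs = time.time() - start
        print(f"read {done / CHUNK:.0f} MiB in {secs:.1f} s "
              f"-> {done / CHUNK / secs:.1f} MiB/s")

    if __name__ == "__main__":
        timed_read()                 # baseline: array idle
        w = Process(target=writer)
        w.start()
        time.sleep(5)                # let the write ramp up
        timed_read()                 # same read with a concurrent array write
        w.terminate()
        w.join()
    ```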

    • Upvote 2
  8. 12 minutes ago, SpaceInvaderOne said:

    Well, I just changed my motherboard for an ASRock board and swapped it out today. Then I read this post and saw there was a new bios for the Gigabyte boards! Typical.

     

    I am seriously thinking of just using air cooling and going with a Noctua NH U14s TR4 then selling the Enermax when I get the replacement.

     

    Hey SpaceInvaderOne,

     

    Would you consider telling us the names/types of the mainboards you use for your Threadripper CPUs?

     

    Well, I can recommend the Noctua CPU air coolers; I've used several of them here, and all are excellent!
