Comments posted by s.Oliver

  1. 8 hours ago, simalex said:

    I think it's more that when a sector is written on a data drive then for parity to be consistent the same sector needs to be updated near real time as well on the parity drive. The individual drives don't understand this concept and allowing a drive to update in which ever order it chooses would increase the chance of the parity drive being out of sync with the actual data drives, especially when you have updates in multiple data drives simultaneously.

    Imagine having to update sector 13456 on drive 3 and sector 25789 on drive 4, in that order, and then the parity drive deciding that it should update sector 25789 first and then 13456, and at the same time having a power failure in between those writes. Then you would end up having 2 sectors with invalid parity data, even though your data drives both have the correct information.

     

    I didn't want to go too deep into the concept of unRAID's parity algorithm. So you're right, unRAID needs to be strict about writing the same sector to the data and parity drive(s) at (more or less) the same time (given how fast the different drives complete the request). So the slowest drive in the mix during that write cycle (no matter whether it is a parity or a data drive) determines how long the write cycle takes (or how fast it completes).

     

    But unRAID is not immune to data loss from unfinished write operations (whatever the reason) and has no concept of a journal (to my knowledge). So a file whose write was aborted mid-way is damaged/incomplete; parity can't change anything about that and probably isn't in sync anyway. That's why unRAID usually forces a parity sync on the next start of the array (and rebuilds the parity information completely, based only on the values on the data drive(s)).

     

    unRAID would need some concept of journaling to replay the writes and find the missing part; it doesn't have one (again, to my knowledge). ZFS is one file system that has a mechanism to prevent exactly this.

     

    My observation is that it is pretty much a synchronous write operation (all drives that need to write data do write the sectors in the same order/at the same time; otherwise I imagine I would hear much more 'noise' from my drives, especially during a rebuild).

     

    But I do confess: that is only my understanding of unRAID's way of writing data into the array (a small sketch of the XOR parity idea follows below).
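
    To illustrate what "the same sector on data and parity" means in practice, here is a minimal Python sketch of the read-modify-write idea behind single (XOR) parity. It is only my illustration of the concept, not Limetech's actual implementation; the function name and the toy sector values are made up.

    ```python
    def update_parity_block(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
        """Recompute a parity block after one data block changes.

        With single (P) parity, the parity block is the XOR of the blocks at
        the same position on all data drives, so it can be updated from just
        the old data, the new data and the old parity:
            new_parity = old_parity XOR old_data XOR new_data
        """
        assert len(old_data) == len(new_data) == len(old_parity)
        return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

    # Toy 4-byte "sectors" (hypothetical values, just to show the math).
    old_data   = bytes([0x00, 0xFF, 0x10, 0x20])
    new_data   = bytes([0x01, 0xFE, 0x10, 0x20])
    old_parity = bytes([0xAA, 0x55, 0xAA, 0x55])

    new_parity = update_parity_block(old_data, new_data, old_parity)
    print(new_parity.hex())

    # The data write and this parity write have to land as a pair; if power is
    # lost between them, that stripe's parity is stale until the next parity
    # check/sync, which is exactly why the write order matters.
    ```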

  2. 13 hours ago, Marshalleq said:

    That said, the link below actually outlines areas where performance is decreased and specifically mentions RAID. So perhaps Limetech's testing found it works better switched off.

     

    https://en.wikipedia.org/wiki/Native_Command_Queuing

    On normal SATA SSDs (at least on the one machine where I've seen it used as a cache drive) it is set to "32". But those are fast enough to handle it, and they are not part of that special "RAID" operation like the data/parity drives are.

    Because of the nature of unRAID's "RAID" mode, I guess the drives are "faster" if they work on small chunks of data in 'sequential' order.

  3. 10 hours ago, rclifton said:

    I think what he was saying is that now that he is back on 6.6.7, he checked and the queue depth is 1 on 6.6.7 as well. Which means the speculation that NCQ in 6.7 might be part of the problem with that release would be incorrect, since for him the queue depth was 1 on both 6.6.7 and 6.7 but he has no issues with 6.6.7. Or at least that's how I read what he said anyway.

    Nearly perfect ;-)

    I hadn't checked my own QD settings on 6.7.x before I left it (no one had brought up the QD as a possible reason), but I looked at a friend's unRAID system: a fresh setup (just a few weeks old), and there all spinners are also at QD=1.

  4. My 2 cents here (I've been back on 6.6.7 for 12 days and everything is as good as it ever was):

     

    Disk Settings: Tunable (md_write_method): Auto (I have never touched it)

    cat /sys/block/sdX/device/queue_depth returns "1" for all rotational HDDs

     

    The QD for the cache NVMe drive is unknown (it doesn't expose the same sysfs path to print the value)

     

    Wouldn't this contradict the opinion that the 6.6.x series performs better because of a higher QD value? (A small script to check this across all spinners is sketched below.)
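
    For reference, a small Python sketch that does the same check across all spinners. This is only a helper I would write for it, nothing built into unRAID; it just loops over the sysfs files mentioned above, and device names will of course differ per system.

    ```python
    #!/usr/bin/env python3
    """Print the queue depth of every rotational (spinning) block device.

    Equivalent to running `cat /sys/block/sdX/device/queue_depth` by hand,
    just looped over all drives that report themselves as rotational.
    """
    from pathlib import Path

    for dev in sorted(Path("/sys/block").iterdir()):
        rotational = dev / "queue" / "rotational"
        queue_depth = dev / "device" / "queue_depth"
        if not (rotational.is_file() and queue_depth.is_file()):
            continue  # e.g. NVMe devices don't expose device/queue_depth here
        if rotational.read_text().strip() != "1":
            continue  # skip SSDs, we only care about spinners
        print(f"{dev.name}: queue_depth = {queue_depth.read_text().strip()}")
    ```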

  5. 5 hours ago, sirkuz said:

    I bit the bullet and decided to revert one system as well; as reported many times, things seem to be back to "normal". Not sure how long I will be able to hold off on reverting the secondary one as well, since it was quite simple.

    Maybe you won't have to, if Limetech can identify the problem and fix it.

    • Upvote 1
  6. Funny thing: another problem has now disappeared (after going back to 6.6.7) which had caused some serious head-scratching:

     

    Plex (Docker) has some background tasks running (usually at night); one is the media scanning job. It regularly crashed, and a lot of people had this problem too and tried to find a solution. Now, after some days of uptime on 6.6.7, I haven't seen a single crash. YEAH!

     

    At night I have some big backup jobs running which write into the array. So I would guess that Plex timed out while accessing data in the array (even though it only reads files).

    • Upvote 1
  7. I can add to this, and it's a major drawback for unRAID from 6.7 onward.

     

    Before, I was reluctant to post about it because I hadn't done enough tests to be 100% sure that no setting somewhere had been changed…

     

    But now I'm sure. Today I upgraded one more unRAID server from 6.6.x to 6.7.2 and see the exact same behavior! So I have 2 machines here which haven't had a single change except being upgraded to 6.7.x (by now both on 6.7.2).

     

    In my book, it doesn't matter how you access the data: over the network or locally on the server, using different machines to connect to the server… When a write into the array is ongoing, any reads (even from cache SSDs/NVMe, and even from data or cache devices that aren't being written to) are super slow. Also, whenever a rebuild is happening, you'd better not want to read any file...

     

    The amount of RAM doesn't change anything either, nor do the controllers used or the CPU (with mitigations enabled or disabled). And while I can't back it up with data, it seems that rebuilds are slower too.

     

    This can become severe in scenarios where some service is continuously writing data into the array (like video surveillance, for example).

     

    Hopefully we can find a fix for this quickly, because going back to 6.6.x isn't a good option anymore.

     

    @limetech what can we do to help debug this? (A rough read-while-write test is sketched below.)
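
    To make the symptom measurable, here is a rough Python sketch of a read-while-write test. The paths are placeholders for shares/disks on your own array, and it's only meant to show the effect in numbers, not to be a proper benchmark. Drop the page cache between the two reads (sync; echo 3 > /proc/sys/vm/drop_caches) or point each read at a different large file, so the second read isn't served from RAM.

    ```python
    #!/usr/bin/env python3
    """Time a sequential read of a file on the array, first with the array
    idle and then while a second process is writing into the array."""
    import os
    import time
    from multiprocessing import Process

    READ_FILE  = "/mnt/disk1/path/to/large_existing_file"  # placeholder
    WRITE_FILE = "/mnt/disk2/scratch/write_test.bin"        # placeholder
    CHUNK = 1 << 20  # 1 MiB

    def writer(total_mib: int = 4096) -> None:
        """Keep the array busy with synced writes (triggers parity updates)."""
        block = os.urandom(CHUNK)
        with open(WRITE_FILE, "wb") as f:
            for _ in range(total_mib):
                f.write(block)
                f.flush()
                os.fsync(f.fileno())

    def timed_read() -> None:
        """Read READ_FILE sequentially and report the throughput."""
        start, done = time.time(), 0
        with open(READ_FILE, "rb", buffering=0) as f:
            while chunk := f.read(CHUNK):
                done += len(chunk)
        secs = time.time() - start
        print(f"read {done / CHUNK:.0f} MiB in {secs:.1f} s "
              f"-> {done / CHUNK / secs:.1f} MiB/s")

    if __name__ == "__main__":
        timed_read()                 # baseline: array idle
        w = Process(target=writer)
        w.start()
        time.sleep(5)                # let the write ramp up
        timed_read()                 # same read with a concurrent array write
        w.terminate()
        w.join()
    ```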

    • Upvote 2
  8. 12 minutes ago, SpaceInvaderOne said:

    Well, I just changed my motherboard for an ASRock board and swapped it out today. Then I read this post and saw there was a new bios for the Gigabyte boards! Typical.

     

    I am seriously thinking of just using air cooling and going with a Noctua NH U14s TR4 then selling the Enermax when I get the replacement.

     

    Hey SpaceInvaderOne,

     

    Would you consider telling us the names/types of the mainboards you use for your Threadripper CPUs?

     

    Well, I can recommend the Noctua CPU air coolers; I've used several of them here, and all are excellent!
