• [6.7.x] Very slow array concurrent performance


    JorgeB
    • Solved Urgent

    For as long as I can remember, Unraid has never been great at simultaneous array disk performance, but it was acceptable. Since v6.7, various users have been complaining of, for example, very poor performance when running the mover while trying to stream a movie.

     

    I noticed this myself yesterday when I couldn't even start watching an SD video in Kodi just because there were writes going on to a different array disk, and this server doesn't even have a parity drive. I did a quick test on my test server: the problem is easily reproducible and started with the first v6.7 release candidate, rc1.

     

    How to reproduce:

     

    -Server just needs 2 assigned array data devices (no parity needed, but the same happens with parity) and one cache device, no encryption, all devices btrfs formatted

    -Used cp to copy a few video files from cache to disk2

    -While cp was running, tried to stream a movie from disk1; it took a long time to start and kept stalling/buffering
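
    One way to put numbers on the stalling is to read a large file from disk1 while the cp to disk2 is running and log the throughput once per second. The following is only a minimal sketch, not the method used for the screenshots in this report: the function name `sample_read_throughput` and the example path are hypothetical, and a file that was read recently may be served from RAM rather than from the disk, which would hide the problem.

```python
import time


def sample_read_throughput(path, chunk_size=1024 * 1024, interval=1.0):
    """Read `path` sequentially and return (total_bytes, samples).

    `samples` holds one MB/s figure per `interval` seconds of reading.
    Hypothetical helper for illustration only; a recently read file may
    come from the page cache instead of the disk.
    """
    samples = []
    total_bytes = 0
    window_bytes = 0
    window_start = time.monotonic()

    # buffering=0 avoids an extra Python-level buffer on top of the OS cache.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
            window_bytes += len(chunk)
            elapsed = time.monotonic() - window_start
            if elapsed >= interval and elapsed > 0:
                samples.append(window_bytes / elapsed / 1e6)
                window_bytes = 0
                window_start = time.monotonic()

    # Account for the last, partial window.
    elapsed = time.monotonic() - window_start
    if window_bytes and elapsed > 0:
        samples.append(window_bytes / elapsed / 1e6)
    return total_bytes, samples


# Example (path is an assumption; run while cp is writing to disk2):
# total, samples = sample_read_throughput("/mnt/disk1/Movies/test.mkv")
# print(["%.1f MB/s" % s for s in samples])
```

    Running it once with the array idle and once during the cp gives two lists of samples that can be compared directly.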

     

    Tried to copy one file from disk1 (while cp was still running on disk2), with v6.6.7:

     

    [Screenshot, 2019-08-05 11:58:06]

     

    with v6.7rc1:

     

    [Screenshot, 2019-08-05 11:54:15]

     

    Occasionally the transfer speed goes higher for a couple of seconds, but most of the time it sits at a few KB/s or is completely stalled.

     

    Also tried with all devices unencrypted and xfs formatted, and it was the same:

     

    [Screenshot, 2019-08-05 12:21:37]

     

    The server where the problem was detected and the test server have no hardware in common: one is based on a Supermicro X11 board and uses HDDs, the test server is X9 series and uses SSDs, so this is very unlikely to be hardware related.




    User Feedback

    Recommended Comments



    6 minutes ago, bonienl said:

    In the Main page, click on the blue icon at top right to toggle between display of disk counters or disk throughput

    Cheers!

    Link to comment
    2 minutes ago, Ancan said:

    Thanks, that improved a bit. But everything else still slows down drastically whenever I'm transferring files. Often can't reach Plex web interface, listing dockers in Unraid just gives me the bouncy wave animation, sometimes the CPU load bars just sit at 0% all of them. On a Ryzen 2600 with 32GB RAM and still far from saturating 1Gb network.

    That's not normal with v6.6; you might want to start a support thread and post diagnostics.

    Link to comment

    I seem to have a similar issue. I only started using Unraid at 6.7.2 and I don't want to downgrade to an older version, but I don't think hardware is my issue: the server performed fine with Windows Storage Spaces on Server19 and with StableBit DrivePool on Win7 (the two setups I used before switching to Unraid). That said, my server is an i3-2105 with 4GB RAM. I'm using 14 devices, 6 connected to the onboard Intel SATA controller and 8 connected to two 4-port Marvell 88SE9230 controllers.

     

    When I'm doing heavy writes to the array, reads are interrupted. If I'm adding episodes to the Unraid NAS, even a single Plex stream from the same NAS (even if it's coming from a separate drive) becomes problematic and buffers excessively. As long as I'm writing to my SSD cache it's fine, but once the cache is full the read performance goes bye-bye.

     

    My Plex server is a separate physical server; it's not running on my Unraid NAS. My Unraid NAS has no dockers or anything.

    Edited by shovenose
    Link to comment

    Is there any ETA for the next version? It's getting out of control; my server is simply unusable.

    This happens simply by running the mover or copying a file. PLEASE RELEASE AN EMERGENCY FIX.

    And no, I will not downgrade and risk other side effects; I'm in a production environment.

    [Attached image]

    Link to comment
    On 9/12/2019 at 11:56 AM, yendi said:

    Is there any ETA for the next version? It's getting out of control; my server is simply unusable.

    This happens simply by running the mover or copying a file. PLEASE RELEASE AN EMERGENCY FIX.

    And no, I will not downgrade and risk other side effects; I'm in a production environment.


    Yeah, I'm wondering when an update will be available to resolve this issue.

    Link to comment
    13 hours ago, shovenose said:

    Yeah, I'm wondering when an update will be available to resolve this issue.

    LT is trying to fix this issue for v6.8rc1, which will be released soon™

    Link to comment

    That sounds good. Have you been told that officially by anyone? They're still remarkably quiet. I don't remember the database corruption issue taking this long or being so 'still', and I'd say this is at least as bad.

    Link to comment

    Fantastic.  However you squeezed that information out of him, I'm grateful.  Personally, I didn't think it would be that hard to just post that here.  Either way it's hard to be annoyed at a guy whose profile only brings memories of island holidays and Hawaiian shirts lol.

    Link to comment
    1 hour ago, Marshalleq said:

    Fantastic.  However you squeezed that information out of him, I'm grateful.  Personally, I didn't think it would be that hard to just post that here.  Either way it's hard to be annoyed at a guy whose profile only brings memories of island holidays and Hawaiian shirts lol.

    Agreed, but this issue has made me review Unraid as the central server that runs most of the house's needs.

     

    This is a pretty big issue: if a copy happens while streaming video, the movie just stops. Media streaming, with the associated media harvesting and organising dockers, is my main reason for using Unraid.

     

    To not have a single acknowledgement, other than now through a 3rd party, that they are aware of the issue and addressing it is poor form. It's a big issue, and those affected needed to be put at some ease.

     

    I've been around since the 4.x days so I'm acutely aware of what 'released soon' means. I hope we see something 'soon' in the normal-people sense so I can fend off the wife and kids.

    Link to comment

    Well said.  Though you get what you pay for.  Unraid is pretty cheap really.  I run servers and things on mine, but really Unraid is a long way off from being a mission critical type of setup.  It's most definitely aimed at home installations.

    Link to comment

    Just to add that while Tom mentioned they are trying to fix this issue for v6.8-rc1, I suppose that if it takes longer than expected, or if for example the SQL corruption issue is unrelated and fixed before this, I would expect they could release rc1 with this as a known issue.

    Link to comment
    35 minutes ago, Marshalleq said:

    Well said.  Though you get what you pay for.  Unraid is pretty cheap really.  I run servers and things on mine, but really Unraid is a long way off from being a mission critical type of setup.  It's most definitely aimed at home installations.

    Oh don't get me wrong. I love Unraid and nothing gets close to it. But unlike the old days, now there are demanding family members who just expect things to work. 

    12 minutes ago, johnnie.black said:

    Just to add that while Tom mentioned they are trying to fix this issue for v6.8-rc1, I suppose that if it takes longer than expected, or if for example the SQL corruption issue is unrelated and fixed before this, I would expect they could release rc1 with this as a known issue.

    That's a bit worrying if they still aren't sure what the issue is and are hoping it's related to another issue they are fixing. 

     

    The SQL corruption has an easy fix: use a cache drive. This issue has no easy fix.

    Link to comment
    3 minutes ago, dalben said:

    That's a bit worrying if they still aren't sure what the issue is and are hoping it's related to another issue they are fixing. 

    It's me who suspects both issues are related, don't know what Tom thinks about that.

    Link to comment
    1 minute ago, johnnie.black said:

    It's me who suspects both issues are related, don't know what Tom thinks about that.

    OK. Fair enough. I'll blame you 😉

    Link to comment

    Here's my take on the situation. The SQL thing has been an issue for a LONG time, but only under some very hard to pin down circumstances. The typical fix was just to be sure the SQL database file was on a direct disk mapping instead of the user share FUSE system. It seems to me that the SQL software is too sensitive to timing: it gives up and corrupts the database when a transaction takes too long.

     

    Fast forward to the 6.7.x release, and it's not just the FUSE system, it's the entire array that is having performance issues. Suddenly, what was a manageable issue with SQL corruption becomes an issue for anything but a direct cache mapping.

     

    So, I suspect fixing this concurrent access issue will help with the SQL issue for many people as well, but I think the SQL thing will ultimately require changes that are out of Unraid's direct control, possibly some major changes to the database engine. The SQL thing has been an issue in the background for years.
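
    To check which kind of mapping a database file is on, a small helper like the one below can classify its path. This is only a sketch under assumptions: it takes the usual Unraid mount layout for granted (`/mnt/user` and `/mnt/user0` for user shares through FUSE, `/mnt/diskN` for individual array disks, `/mnt/cache` for the cache), and the function name `classify_unraid_path` and the example paths are made up for illustration.

```python
from pathlib import PurePosixPath


def classify_unraid_path(path):
    """Return a rough label for where `path` lives on an Unraid server.

    Assumes the conventional mount points described above; anything
    else is reported as "other". Hypothetical helper, not an Unraid tool.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[0] != "/" or parts[1] != "mnt":
        return "other"

    top = parts[2]
    if top in ("user", "user0"):
        # User shares are presented through the FUSE layer.
        return "user share (FUSE)"
    if top == "cache":
        return "cache (direct)"
    if top.startswith("disk") and top[4:].isdigit():
        return "array disk (direct)"
    return "other"


# Example paths are assumptions:
# classify_unraid_path("/mnt/user/appdata/app/app.db")   -> "user share (FUSE)"
# classify_unraid_path("/mnt/cache/appdata/app/app.db")  -> "cache (direct)"
```

    Only paths that come back as a direct mapping match the "typical fix" described above; under 6.7.x a direct array disk mapping would presumably still be subject to the concurrent access slowdown.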

    Link to comment

    Man I had forgotten how unRAID is supposed to run.  Been back on 6.6.7 all of about 10 minutes and everything is sooooo much faster.  Even torrents are coming in 30% faster than top speed on 6.7.2.  Crazy.

     

    craigr

    Link to comment

    The scariest thing to me is I keep getting read errors on this version.  I've had several on a brand new SSD, which are permanently etched into its record with those parts disabled. I've had them on hard drives as well, which it seems to select at random.  Every now and then it kicks them out of the array, and so far this always coincides with a reboot.  I've also had two SSDs completely die on this version, which my gut tells me is caused by this, but I don't know how I could ever prove that. I just don't want to downgrade as the GPU passthrough is so much better in this version, but I'm thinking again I might do that today.  It's just too much pain.

    Link to comment

    Actually, now that you mention it, I've had 3 of my disks (spinners) throw read errors and end up disabled recently (within a 4 week period). I have no reason to believe it's related to this bug, but it's interesting you have seen the same.

     

    I've RMA'd 2 and managed to get one going again. I put it down to a bad batch and just one of those things, but I may downgrade to 6.6.7 for a while.

    Link to comment

    Multiple hard drives across both onboard and PCIe-based controllers, two completely dead SSDs and an additional brand new SSD on top of that with read errors, mostly occurring post reboot.  It may or may not be related to this bug, however it seems to me it is related to this version.  No idea why.  Yeah it could be hardware, but again it 'coincidentally' arrived with this version.

    Link to comment
    58 minutes ago, Marshalleq said:

    Multiple hard drives across both onboard and PCIe-based controllers, two completely dead SSDs and an additional brand new SSD on top of that with read errors, mostly occurring post reboot.  It may or may not be related to this bug, however it seems to me it is related to this version.  No idea why.  Yeah it could be hardware, but again it 'coincidentally' arrived with this version.

    I have been having the same problem, so add me to the list.

    Link to comment

    I too would love some sort of official acknowledgment of the issue.  While it's an inexpensive product, it's still good to hear that an issue is being worked on.  Customers complained about Flexraid and its lack of responsiveness to issues, and look at where that product ended up... and in all fairness I would still be running it if the company hadn't died (some of my machines are still running it, but in the process of being decommissioned now).  I was about to dive into a few more licenses for Unraid but am holding off for now due to this issue and, mainly, the lack of acknowledgement.

    Link to comment
    3 hours ago, bytchslappa said:

    I too would love some sort of official acknowledgment of the issue.  While it's an inexpensive product, it's still good to hear that an issue is being worked on.  Customers complained about Flexraid and its lack of responsiveness to issues, and look at where that product ended up... and in all fairness I would still be running it if the company hadn't died (some of my machines are still running it, but in the process of being decommissioned now).  I was about to dive into a few more licenses for Unraid but am holding off for now due to this issue and, mainly, the lack of acknowledgement.

    Yeah, I mean sure Unraid is cheap compared to enterprise software, but there are plenty of other great products that are free or cheaper. I don't think price is a worthy reason to ignore such a major, easily reproducible bug. I spent $130 on this thing and for it to perform so abysmally is frustrating. I never had ANY performance or any other issues with DrivePool and it's waaaay cheaper. I am glad I switched to Unraid because it delivers the additional features I needed (easy parity, web UI, etc) but DrivePool just worked. Perfectly. 100% of the time. And it's $30.

     

    Saying it's OK for Unraid to have this issue without any word from the developers on an ETA for resolution is like saying it's OK if my 2010 Ford Escape gets limited to 5mph while driving 65mph randomly just because it cost less than a brand new Escape. No, it's not, and in fact Ford had a massive recall related to prematurely failing electronic throttle bodies because they would go into limp mode on the highway which is a major safety and functionality issue.

    Edited by shovenose
    Link to comment
    10 hours ago, bytchslappa said:

    I too would love some sort of official acknowledgment of the issue.  While it's an inexpensive product, it's still good to hear that an issue is being worked on.  Customers complained about Flexraid and its lack of responsiveness to issues, and look at where that product ended up... and in all fairness I would still be running it if the company hadn't died (some of my machines are still running it, but in the process of being decommissioned now).  I was about to dive into a few more licenses for Unraid but am holding off for now due to this issue and, mainly, the lack of acknowledgement.

    I am reasonably certain that the issue HAS been acknowledged by Limetech!   We have definitely been told it is on the list of issues they are hoping to clear for the 6.8 rc1 release (which will hopefully be made available soon).

     

    I suspect that any solution is non-trivial and is somewhere at the Linux driver level which means it can be difficult to get resolved in a way that has no side-effects elsewhere.

    Link to comment




