• SQLite DB Corruption testers needed


    limetech
    • Closed

9/17/2019 Update: we may have gotten to the bottom of this.  Please try 6.7.3-rc3, available on the next branch.

    9/18/2019 Update: 6.7.3-rc4 is available to address Very Slow Array Concurrent Performance.

     


Trying to get to the bottom of this...  First, we have not been able to reproduce the problem, which is odd and implies there may be some kind of hardware/driver dependency to this issue.  Nevertheless, I want to start a series of tests, which I know will be painful for some, since every time DB corruption occurs you have to go through a lengthy rebuild process.  That said, we would really appreciate anyone's input during this time.
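
    If you want to confirm corruption yourself rather than waiting for an app to throw "database disk image is malformed", you can run SQLite's built-in integrity check against the container's database file.  Below is a minimal sketch using Python's bundled sqlite3 module; the path is only an example and will depend on your own appdata layout.

    import sqlite3
    import sys

    # Example path only -- point this at whichever container DB you want to check.
    db = sys.argv[1] if len(sys.argv) > 1 else "/mnt/cache/appdata/sonarr/sonarr.db"

    con = sqlite3.connect(db)
    try:
        # PRAGMA integrity_check returns a single row "ok" for a healthy database,
        # otherwise a list of problems (e.g. "database disk image is malformed").
        results = [row[0] for row in con.execute("PRAGMA integrity_check;")]
    finally:
        con.close()

    print("\n".join(results))
    sys.exit(0 if results == ["ok"] else 1)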

     

The idea is to change only one thing at a time.  We can either start with 6.6.7 and update components until it breaks, or start with 6.7.2 and revert components until it's fixed.  Since my best guess at this point is that the issue lies with the Linux kernel, Docker, or something we have misconfigured (rather than one of the hundred other packages we updated), we are going to start with the 6.7.2 code base and see if we can make it work.

     

But actually, the first stab at this is not to revert anything, but rather to update the Linux kernel to the latest 4.19 patch release, which is 4.19.60 (6.7.2 uses kernel 4.19.55).  Skimming the kernel change logs, nothing jumps out as a possible fix; however, I want to first try the easiest and least impactful change: updating to the latest 4.19 kernel.

     

If this does not solve the problem (and I expect it won't), then we have two choices:

     

1) Update to the latest stable Linux kernel (5.2.2).  We are using the 5.2 kernel in Unraid 6.8-beta and so far no one has reported any SQLite DB corruption, though the sample set is pretty small.  The downside is that not all out-of-tree drivers build with the 5.2 kernel yet, so some functionality would be lost.

     

2) Downgrade Docker from 18.09.06 (the version in 6.7.2) to 18.06.03-ce (the version in 6.6.7).

[BTW, the latest Docker release, 19.03.00, was just published today.  People gripe about our release numbers; try making sense of Docker release numbers, haha]

     

If neither of those steps succeeds then ... well, let's hope one of them does.

     

To get started, first make a backup of your flash via Main/Flash/Flash Backup, and then switch to the 'next' branch via the Tools/Upgrade OS page.  There you should see version 6.7.3-rc1.

     

As soon as a couple of people report corruption, I'll publish an -rc2, probably with Docker reverted.





    User Feedback

    Recommended Comments



Ok, finished setting up the new server yesterday.

     

Installed only two Docker containers: Sonarr (linuxserver version) and Plex (official Plex version).

     

Started copying a few series from my previous server to the download dir of the new one.

Once the first series finished copying, I started a manual import.  Everything completed fine.

When the second series finished copying, I attempted to start a manual import and immediately got a "database malformed" error in Sonarr.

At the same time, I was still copying other files to the new server.

     

This is from a clean install; the DBs for both Sonarr & Plex were empty.

     

    Attaching diagnostics.

     

    Will keep this server as a test bed for the next few weeks, to try anything new.

     

    alexandria-diagnostics-20191016-1053.zip


Running the 6.7.3-rc4 build here with:

    binhex-radarr

    binhex-sonarr

    Plex (Plex version)

     

In all instances of the RC I've seen random corruption, but it's been several weeks since the last rebuild, aside from my Radarr instance; I haven't rebuilt that one.  I have a pretty massive library.

     

Corruption seems to occur only when I have two scans in two different applications going at the same time.  For instance, I can cause corruption pretty easily just by scanning the Plex media folder via Plex while scanning the drive for new movies via Radarr.
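
    (Not the apps' real code, obviously, just a rough sketch of the access pattern I mean, with placeholder paths: two processes, each writing to its own SQLite database on the same mount while walking the media tree, the way two concurrent scans would.)

    import multiprocessing
    import os
    import sqlite3

    MEDIA_DIR = "/mnt/user/media"        # example: the share both apps scan
    DB_DIR = "/mnt/disk10/docker/test"   # example: same mount as my real config dirs

    def scan_worker(name):
        # Each worker has its OWN database, like Plex and Radarr do.
        con = sqlite3.connect(os.path.join(DB_DIR, name + ".db"))
        con.execute("CREATE TABLE IF NOT EXISTS files (path TEXT, size INTEGER)")
        count = 0
        for root, _dirs, files in os.walk(MEDIA_DIR):
            for f in files:
                path = os.path.join(root, f)
                try:
                    size = os.path.getsize(path)
                except OSError:
                    continue
                con.execute("INSERT INTO files VALUES (?, ?)", (path, size))
                count += 1
                if count % 100 == 0:
                    con.commit()  # commit in small batches, like a library scan
        con.commit()
        con.close()

    if __name__ == "__main__":
        os.makedirs(DB_DIR, exist_ok=True)
        workers = [multiprocessing.Process(target=scan_worker, args=(n,))
                   for n in ("fake-radarr", "fake-plex")]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # Afterwards, run PRAGMA integrity_check on both .db files to see if anything broke.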

     

I've disabled automatic updates of my Plex library and only scan when I'm getting ready to use it.  This has eliminated the corruption issues I've been seeing, even with Sonarr doing its thing.

     

I haven't chimed in on this thread before.  I understand that you'll need diagnostics logs, but I'm in the middle of updating to 6.8.0-rc1 and will post my diagnostics files if I see corruption on 6.8.0-rc1.

     

Also of note for my configuration: I took several steps to try to avoid things that might be causing the issue.

     

I am running either disk10 (NVMe) or cache (a different NVMe) for the config directories.  Interestingly enough, the Radarr instance is the one that got corrupted the last time, and it operates on the same mount point as Plex:

    /mnt/disk10/docker/binhex-radarr2/config

    /mnt/cache/docker/binhex-sonarr2/config

    /mnt/disk10/docker/pms-docker3/config

     

Another test scenario could be to use one of my other disks as the mount and see if I can cause the corruption again.  Like I said, I can reproduce pretty easily at this point just by kicking off all three of them to do scans and imports.

     

Also, lastly, here is my current diagnostics file (sorry, I already have 6.8.0-rc1 staged).

     

     

    base-diagnostics-20191017-0128.zip

    36 minutes ago, doubleshot said:

    /mnt/disk10/docker/binhex-radarr2/config

    /mnt/cache/docker/binhex-sonarr2/config

    /mnt/disk10/docker/pms-docker3/config

     

    36 minutes ago, doubleshot said:

    use one of my other disks

Why not just use the cache drive for all of them?  That will probably solve the problem once and for all.




