• SQLite DB Corruption testers needed


    limetech
    • Closed

    9/17/2019 Update: we may have gotten to the bottom of this.  Please try 6.7.3-rc3, available on the next branch.

    9/18/2019 Update: 6.7.3-rc4 is available to address the Very Slow Array Concurrent Performance issue.

     


     

    Trying to get to the bottom of this...  First, we have not been able to reproduce it, which is odd because it implies there may be some kind of hardware/driver dependency in this issue.  Nevertheless, I want to start a series of tests, which I know will be painful for some, since every time DB corruption occurs you have to go through a lengthy rebuild process.  That said, we would really appreciate anyone's input during this time.

     

    The idea is that we are only going to change one thing at a time.  We can either start with 6.6.7 and update things until it breaks, or start with 6.7.2 and revert things until it's fixed.  Since my best guess at this point is that the issue lies with the Linux kernel, Docker, or something we have misconfigured (not one of the hundred other packages we updated), we are going to start with the 6.7.2 code base and see if we can make it work.

     

    But actually, the first stab at this is not reverting anything, but rather updating the Linux kernel to the latest 4.19 patch release, which is 4.19.60 (6.7.2 uses kernel 4.19.55).  Skimming the kernel change logs, nothing jumps out as a possible fix; however, I want to try the easiest and least impactful change first: updating to the latest 4.19 kernel.

     

    If this does not solve the problem (which I expect it won't), then we have two choices:

     

    1) Update to the latest stable Linux kernel (5.2.2) - we are using the 5.2 kernel in Unraid 6.8-beta and so far no one has reported any SQLite DB corruption, though the sample set is pretty small.  The downside is that not all out-of-tree drivers build with the 5.2 kernel yet, so some functionality would be lost.

     

    2) Downgrade Docker from 18.09.06 (the version in 6.7.2) to 18.06.03-ce (the version in 6.6.7).

    [BTW the latest Docker release 19.03.00 was just published today - people gripe about our release numbers, try making sense of Docker release numbers haha]

     

    If neither of those steps succeeds then ... well, let's hope one of them does succeed.

     

    To get started, first make a backup of your flash via Main/Flash/Flash Backup, then switch to the 'next' branch via the Tools/Upgrade OS page.  There you should see version 6.7.3-rc1.
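    
    Once you are on the rc, it is worth confirming from a console exactly what you are running before testing starts. A minimal sketch (the --format template is just one convenient way to print the Docker engine version):
    
    # Kernel version -- per the plan above, this should report 4.19.60 on -rc1
    uname -r
    
    # Docker engine version -- should still be 18.09.06 on -rc1, since nothing has been reverted yet
    docker version --format '{{.Server.Version}}'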

     

    As soon as a couple of people report corruption, I'll publish an -rc2, probably with Docker reverted.
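    
    When you do report corruption, it helps to confirm it with SQLite itself rather than only the application's log. A minimal sketch, assuming the sqlite3 command-line tool is available on the host or inside the container; the container name and database path are examples only -- point it at whichever database the app complained about:
    
    # Stop the app first so the database file is not being written to
    docker stop plex
    
    # SQLite's built-in consistency check: prints "ok" if the file is intact,
    # otherwise it reports what is damaged
    sqlite3 /mnt/user/appdata/plex/library.db "PRAGMA integrity_check;"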

    Edited by limetech

    • Upvote 5



    User Feedback

    Recommended Comments



    7 hours ago, trott said:

    I found MakeMKV failed to remux some movies. I thought it might be an issue with the movies themselves, but I did a force recheck on those torrents today and it turned out they were not 100% complete.


    I ran across this problem too and found that it was related to the already-reported "[6.7.x] Very slow array concurrent performance" issue by Johnnie.Black.

    I don't run Plex but do have several other dockers that use SQLite and have never had a problem. Appdata is set to "cache only".

    Link to comment

    I think my dad is suffering from the same problem.

     

    I have never had this issue, but his server will do it every two weeks.

     

    I've even moved him to the beta in the hope it would stop.

     

    It happens with both the linuxserver and binhex Emby releases.

     

    The only main difference is that he has an AMD CPU and I'm on Intel.

     

    Version: 4.2.1.0
    
    Command line: /app/emby/EmbyServer.dll -programdata /config -ffdetect /app/emby/ffdetect -ffmpeg /app/emby/ffmpeg -ffprobe /app/emby/ffprobe
    Operating system: Unix 4.19.56.0
    64-Bit OS: True
    64-Bit Process: True
    User Interactive: True
    Runtime: file:///app/emby/System.Private.CoreLib.dll
    Processor count: 2
    Program data path: /config
    Application directory: /app/emby
    System.Net.HttpListenerException: System.Net.HttpListenerException (13): Permission denied
    at SocketHttpListener.Net.HttpEndPointManager.GetEPListener(ILogger logger, String host, Int32 port, HttpListener listener, Boolean secure)
    at SocketHttpListener.Net.HttpEndPointManager.AddPrefixInternal(ILogger logger, String p, HttpListener listener)
    at SocketHttpListener.Net.HttpEndPointManager.AddListener(ILogger logger, HttpListener listener)
    at SocketHttpListener.Net.HttpListener.Start()
    at Emby.Server.Implementations.ApplicationHost.StartServer()
    Source: SocketHttpListener
    TargetSite: SocketHttpListener.Net.HttpEndPointListener GetEPListener(MediaBrowser.Model.Logging.ILogger, System.String, Int32, SocketHttpListener.Net.HttpListener, Boolean)
    
    Info HttpServer: Adding HttpListener prefix http://+:8096/
    Info HttpServer: Adding HttpListener prefix https://+:8920/
    Info SqliteItemRepository: Default journal_mode for /config/data/library.db is wal
    Error Main: Error in appHost.Init
    
    *** Error Report ***
    
    Version: 4.2.1.0
    
    Command line: /app/emby/EmbyServer.dll -programdata /config -ffdetect /app/emby/ffdetect -ffmpeg /app/emby/ffmpeg -ffprobe /app/emby/ffprobe
    Operating system: Unix 4.19.56.0
    64-Bit OS: True
    64-Bit Process: True
    User Interactive: True
    Runtime: file:///app/emby/System.Private.CoreLib.dll
    Processor count: 2
    Program data path: /config
    Application directory: /app/emby
    SQLitePCL.pretty.SQLiteException: Corrupt: database disk image is malformed
    SQLitePCL.pretty.SQLiteException: Exception of type 'SQLitePCL.pretty.SQLiteException' was thrown.
    at SQLitePCL.pretty.SQLiteException.CheckOk(sqlite3 db, Int32 rc)
    at SQLitePCL.pretty.StatementImpl.MoveNext()
    at SQLitePCL.pretty.DatabaseConnection.ExecuteAll(IDatabaseConnection This, String sql)
    at Emby.Server.Implementations.Data.SqliteItemRepository.Initialize(SqliteUserDataRepository userDataRepo, IUserManager userManager)
    at Emby.Server.Implementations.ApplicationHost.InitDatabases()
    at Emby.Server.Implementations.ApplicationHost.Init()
    at EmbyServer.HostedService.StartAsync(CancellationToken cancellationToken)
    Source: SQLitePCL.pretty
    TargetSite: Void CheckOk(SQLitePCL.sqlite3, Int32)
    
    Info Main: Shutdown complete
    Info Main: Application path: /app/emby/EmbyServer.dll
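    
    (For what it's worth, when a database reaches that "database disk image is malformed" state, one common salvage approach is to dump whatever SQLite can still read into a fresh file and swap it in. A sketch only -- run from inside the container, since /config is the container-side path from the log above, and how much it recovers depends on the damage.)
    
    # Example paths; the rebuilt file then replaces the original while the app is stopped
    sqlite3 /config/data/library.db ".dump" | sqlite3 /config/data/library-rebuilt.db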

     


     

     

    Edited by gareth_iowc
    Link to comment
    14 hours ago, TheBuz said:

    +1

    +1. No corruption on my server, running 6.7.2 with appdata on the cache NVMe.

     

    On my Hetzner-hosted server with no cache disks, I had corruption within 2 days of the 6.7.2 upgrade; I went back to 6.6.7 and have had no corruption since.

    Link to comment

    Same here. It corrupted any time I hit it with an I/O load (i.e. rebuilding Plex from scratch); less corruption in Sonarr and Radarr, but still present.

     

    No corruption when I moved back to 6.6.7. Hardware forced me back to 6.7.2 and I got corruption again. Nothing since I moved appdata to an unassigned NVMe; going on a few weeks now.

    Link to comment

    It's only Emby he has the problem with. He has currently moved to 6.7.3-rc2 and it's still a problem.

     

    It takes about a week for it to happen.

    Link to comment

    I tried having appdata on disk1 using 6.7.2... when that corrupted, I rebuilt with appdata on the cache drive while still running 6.7.2 and binhex-plexpass.  I rebuilt again on cache and rolled back to 6.6.7 about 4 weeks or so ago, still using binhex-plexpass.  No corruption problems since.  I'm afraid to try 6.7.3-rc2 since everything is going so well with 6.6.7.  I do have a backup tower running 6.7.2 on which I could run 6.7.3-rc3 and then load Plex to see what happens; it won't matter on the backup if the DB corrupts.  I'll consider it and let you know the outcome if I do!

    Edited by isrdude
    Link to comment

    FYI, I have been experiencing the same issue with Plex, Sonarr and Radarr. I didn't actually realize this was a known issue; I am only just updating to 6.7.2 today because I'm lazy. I'll get caught up on this thread and help out with testing where I can.

    Link to comment
    On 9/5/2019 at 6:33 AM, GHunter said:


    I ran across this problem too and found that it was related to the already-reported "[6.7.x] Very slow array concurrent performance" issue by Johnnie.Black.

    I don't run Plex but do have several other dockers that use SQLite and have never had a problem. Appdata is set to "cache only".

    I also think these two are related. I'm experiencing both on 6.7.2.

    Link to comment

    6.7.2 on an Unraid machine I built over this past weekend.  Ombi went corrupt at some point yesterday.  Frustrating after I just set it all up.

    /appdata is on the array and does NOT use the cache. 

     

    Changing whether /appdata is on the cache drive or not briefly fixes the issue, but it then goes right back to being corrupt.

    Edited by technologiq
    Link to comment
    2 hours ago, technologiq said:

    Changing whether /appdata is on the cache drive or not briefly fixes the issue, but it then goes right back to being corrupt.

    Are you saying you see new corruption even with all SQLite databases located on 'cache'?

    Link to comment

    I've been fighting with this for a good 3 months after having Unraid run briskly for a couple of years; I had no idea it was a widespread issue. I have restored Radarr, Sonarr and Plex from backups and started them fresh, only to see database corruption again and again. Are there any diagnostics or tests you'd like me to run that could help with troubleshooting?

    Edited by gugahoi
    Link to comment

    Just wanted to chime in.  I've always had my Docker image and configs on the cache drive and never experienced this issue.  I've been on 6.7.2 since its release and have used most if not all of the affected versions.

    Link to comment
    11 hours ago, gugahoi said:

    I've been fighting with this for a good 3 months after having Unraid run briskly for a couple of years; I had no idea it was a widespread issue. I have restored Radarr, Sonarr and Plex from backups and started them fresh, only to see database corruption again and again. Are there any diagnostics or tests you'd like me to run that could help with troubleshooting?

    I would imagine that Tom would like to know your answer to this question:

    On 9/13/2019 at 1:58 PM, limetech said:

    Are you saying you see new corruption even with all SQLite databases located on 'cache'?

     

    Link to comment

    I'm not sure what the state of things is, so I'll just write out what my setup is. Could someone maybe summarize some things to test?

     

    I am still seeing regular corruption. My Plex DB has been corrupted twice in the last 12 hours.

     

    I do not have a cache disk.

    I have a share called data, and all of my docker containers map directories from there. For example, I am using the linuxserver/plex image and it maps /config to /mnt/user/data/docker/plex/config.

    EDIT: I am running 6.7.2

     

    Should I move things around?

     

    Thanks!

    Edited by cnicholls
    Link to comment
    17 minutes ago, cnicholls said:

    Should I move things around?

    If by that you mean add a cache disk and use it for the config folders, yes.

     

    Without a cache disk, I don't think there is anything you can do right now except wait for a new release or downgrade to 6.6.7.

    Link to comment

    Actually, even for appdata on cache, there are two possible setups:

    1. map directly to /mnt/cache/xxx

    2. map to the user share /mnt/user/xxx with the share's "use cache disk" setting on "only"

     

    By testing both setups, we might be able to isolate whether or not the issue is FUSE-related.
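    
    As a concrete illustration of the two mappings (a sketch only -- the image name, container names and paths are examples, and on Unraid you would normally set this in the container's /config path mapping in the GUI rather than with docker run):
    
    # Setup 1: bind the config path straight to the cache device, bypassing the /mnt/user FUSE layer
    docker run -d --name plex-direct -v /mnt/cache/appdata/plex:/config linuxserver/plex
    
    # Setup 2: bind through the user share (FUSE), with the appdata share set to use the cache disk only
    docker run -d --name plex-fuse -v /mnt/user/appdata/plex:/config linuxserver/plex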

    Edited by trott
    • Like 1
    Link to comment
    54 minutes ago, trott said:

    Actually, even for appdata on cache, there are two possible setups:

    1. map directly to /mnt/cache/xxx

    2. map to the user share /mnt/user/xxx with the share's "use cache disk" setting on "only"

     

    By testing both setups, we might be able to isolate whether or not the issue is FUSE-related.

     

    FWIW I'm following #2 and am not seeing corruption issues.

    Link to comment
    5 minutes ago, JamieK said:

     

    FWIW I'm following #2 and am not seeing corruption issues.

    Another factor would be SSD vs. HD cache drive(s).

    Link to comment

    Frankly speaking, I have tested both with the Plex docker; both had no issue finishing a scan of my 400 movies.

    Link to comment
    4 hours ago, trott said:

    Actually, even for appdata on cache, there are two possible setups:

    1. map directly to /mnt/cache/xxx

    2. map to the user share /mnt/user/xxx with the share's "use cache disk" setting on "only"

     

    By testing both setups, we might be able to isolate whether or not the issue is FUSE-related.

    Thanks. I'll work on this and report back. I guess it'll be a couple of days.

    Link to comment

    I was using 6.7.2 with lsio/plex mapped to /mnt/user/appdata, with the appdata share set to a single disk in the array (#5).

    This setup really decreased the occurrence of SQLite DB corruption for me, but didn't eliminate it.

    Along with the Slow Array Performance problem, I was finally fed up with all of this, and yesterday downgraded to 6.6.7.

     

    I had been thinking about converting another system to Unraid for a while now, but since these issues are showstoppers for me, I have almost given up hope completely... It is now just a habit of mine to check the forums (these 2 topics) every once in a while...

    Link to comment

    Just to rule these out as causes...

     

    Those with Plex corruption, are you using Graphics Hardware Acceleration for Transcoding? Anyone getting the corruption and not using GPU acceleration?

     

    Also, has anyone with the corruption tried running with CPU Security Mitigations disabled via the additional plugin and still gotten corruption?

    Link to comment

    I have Plex in a docker with no hardware acceleration and I was having corruption before moving appdata to an unassigned drive.

    Link to comment



    This is now closed for further comments

  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.