• SQLite Data Corruption testing


    limetech

    tldr: Starting with 6.8.0-rc2 please visit Settings/Disk Settings and change the 'Tunable (scheduler)' to 'none'.  Then run with SQLite DB files located on array disk shares and report whether your databases still become corrupted.

     

    When we first started looking into this issue one of the first things I ran across was this monster topic:
    https://bugzilla.kernel.org/show_bug.cgi?id=201685

    and related patch discussion:
    https://patchwork.kernel.org/patch/10712695/


    This bug is very similar to what we're seeing.  In addition, Unraid 6.6.7 is on the last of the 4.18 kernels (4.18.20), Unraid 6.7 is on a 4.19 kernel, and 6.8 is currently on 5.3.  The SQLite DB corruption bug also only started happening with 4.19, so I don't think this is a coincidence.

    Looking at the 5.3 source, the patch above is not present; however, I ran across a later commit that reverted that patch and solved the bug a different way:
    https://www.spinics.net/lists/linux-block/msg34445.html

    That set of changes is in 5.3 code.

    I'm thinking perhaps their "fix" is not properly handling some I/O pattern that SQLite via md/unraid is generating.

     

    Before I go off and revert the kernel to 4.18.20, please test if setting the scheduler to 'none' makes any difference in whether databases become corrupted.
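To confirm the setting took effect, the active scheduler can be read back from sysfs. The following is a minimal sketch, assuming the usual Linux layout where `/sys/block/<device>/queue/scheduler` lists the available schedulers with the active one in square brackets; the function names are made up for illustration and are not part of Unraid.

```python
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler marked with [brackets], e.g. 'mq-deadline [none]'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Some devices expose a bare value (for example just "none").
    return sysfs_text.strip()


def report_schedulers(sysfs_root: str = "/sys/block") -> dict:
    """Map each block device name to its active scheduler."""
    result = {}
    for sched_file in sorted(Path(sysfs_root).glob("*/queue/scheduler")):
        device = sched_file.parent.parent.name
        try:
            result[device] = active_scheduler(sched_file.read_text())
        except OSError:
            # Device disappeared or file is unreadable; skip it.
            continue
    return result


if __name__ == "__main__":
    for device, scheduler in report_schedulers().items():
        print(f"{device}: {scheduler}")
```

After changing 'Tunable (scheduler)' to 'none', each array device would be expected to report `none` here.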




    User Feedback

    Recommended Comments



    1 hour ago, Scorpionhl said:

    Could you elaborate on the long term fix? should I be adding this md command to my go file?

    No, 6.8.0-rc5 will have a permanent fix.

    Link to comment

    A little over a month ago I was having daily SQL errors with my Plex / Sonarr setup. I had to completely stop adding any new media and have been following this and many other threads looking for a fix. Last night I updated to 6.8.0-rc5 and have been really stressing the database today with TV and Movie updates over the past few months. So far no errors, thank you!!!

    Edited by bbolinger
    Link to comment

    Just installed 6.8.0-rc5 and I have to say all of the major show-stopper issues I had with 6.7 have been resolved (minus a few small bugs). Performance is really good and no corruption yet. Thank you guys for all the hard work you put into figuring this out!

    Link to comment
    On 10/31/2019 at 8:48 AM, limetech said:

    To not ever fail read aheads.

    So in the end, what was the actual bug here, and how did it manifest? I'm mostly wondering if there's anything libsqlite's doing that relies on particular implementation-defined kernel behavior that isn't actually guaranteed, in which case I'd want to report that to the sqlite devs with a description of a repro case.

    Link to comment
    16 hours ago, 11rcombs said:

    So in the end, what was the actual bug here, and how did it manifest? I'm mostly wondering if there's anything libsqlite's doing that relies on particular implementation-defined kernel behavior that isn't actually guaranteed, in which case I'd want to report that to the sqlite devs with a description of a repro case.

    The corruption occurred as a result of failing a read-ahead I/O operation with "BLK_STS_IOERR" status.

     

    In the Linux block layer each READ or WRITE can have various modifier bits set.  In the case of a read-ahead you get READ|REQ_RAHEAD, which tells the I/O driver this is a read-ahead.  If there are insufficient resources at the time such a request is received, the driver is permitted to terminate the operation with BLK_STS_IOERR status.  The Linux md/raid5 driver is one example of this.

     

    In the case of Unraid, it can definitely happen under heavy load that a read-ahead comes along and there are no 'stripe buffers' immediately available.  In that case, instead of making the calling process wait, the driver terminated the I/O.  It has worked this way for years.
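The decision described above can be illustrated with a toy model. This is only a sketch of the logic as described in this post, not the md/unraid driver code: the flag value, the `WAIT_FOR_BUFFER` result, and the `fail_readahead` switch are all invented for illustration.

```python
# Toy model of the read-ahead decision; names and values are illustrative only.
READ = 0x0
REQ_RAHEAD = 0x1  # stand-in bit, NOT the kernel's real flag value

BLK_STS_OK = "BLK_STS_OK"
BLK_STS_IOERR = "BLK_STS_IOERR"
WAIT_FOR_BUFFER = "WAIT_FOR_BUFFER"  # hypothetical: caller sleeps until a buffer frees up


def handle_read(flags: int, free_stripe_buffers: int, fail_readahead: bool = True) -> str:
    """Decide what happens to a read request in this simplified model."""
    if free_stripe_buffers > 0:
        # Resources are available, so the read proceeds normally.
        return BLK_STS_OK
    if (flags & REQ_RAHEAD) and fail_readahead:
        # Old behaviour: a read-ahead is optional work, so give up immediately.
        return BLK_STS_IOERR
    # Ordinary reads (and read-aheads once failing is disabled) wait instead.
    return WAIT_FOR_BUFFER


if __name__ == "__main__":
    print(handle_read(READ | REQ_RAHEAD, free_stripe_buffers=0))
    print(handle_read(READ | REQ_RAHEAD, free_stripe_buffers=0, fail_readahead=False))
```

With `fail_readahead=False` the model never returns an error for a read-ahead, which corresponds to the "to not ever fail read aheads" change quoted earlier in the thread.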

     

    When this problem first appeared there were conflicting reports about the configurations in which it happened.  My first thought was an issue in the user share file system.  I eventually ruled that out, and my next thought was cache vs. array.  Some reports seemed to indicate it happened with all databases on cache, but I think those reports were mistaken for various reasons.  Ultimately I decided the issue had to be in the md/unraid driver.  Our big problem was that we could not reproduce the issue, while others seemed able to reproduce it with ease.

     

    Honestly, thinking that failing read-aheads could be the issue was a "hunch": it was either that or some logic in the scheduler that merged I/Os incorrectly (there were kernel bugs related to this with some pretty extensive patches, and I thought maybe the developer missed a corner case; this is why I added the config setting for which scheduler to use).  This resulted in a release with those 'md_restrict' flags to determine whether one of them was the culprit, and what-do-you-know, not failing read-aheads makes the issue go away.

     

    What I suspect is that this is a bug in SQLite: I think SQLite is using direct I/O (bypassing the page cache) and issuing its own read-aheads, and its logic to handle a failing read-ahead is broken.  But I did not follow that rabbit hole; too many other problems to work on :/

    Link to comment

    I have just updated to 6.8 RC5 and am in the process of reinstalling the Plex docker.  I've read that RC5 includes the fix, but are there any specific config items I need to set?

    Link to comment
    1 hour ago, Duniac said:

    Many have written that they have backed up their Plex database, can someone please point me to the location of this?

    Use the terminal tool built into Unraid.  Once running in a new window, you will need to know the appdata location for your system.  Mine is /mnt/disk1/appdata.  

     

    So I can cd to /mnt/disk1/appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Plug-in Support/Databases. 

     

    com.plexapp.plugins.library.db is the main database.  You can make a copy of this by just doing a cp of this file to another file name.  Plex will make backup copies also...I believe every 3 days.  They should have the date appended to the name. 

     

    If you need to fall back to one of these, you have to stop the Plex docker, and then copy the file with date appended back to the name of the main database. 
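The manual copy described above can also be scripted. This is a minimal sketch, assuming the Plex container is stopped first so the file is not being written; the function name and the `.manual-<timestamp>` suffix are made up here and are not Plex's own backup naming.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_database(db_path, backup_dir=None, now=None):
    """Copy a database file to a timestamped sibling and return the new path."""
    src = Path(db_path)
    # Default to the same folder as the database, like a plain `cp` would.
    dest_dir = Path(backup_dir) if backup_dir else src.parent
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
    dest = dest_dir / f"{src.name}.manual-{stamp}"
    shutil.copy2(src, dest)  # copy2 also preserves timestamps where possible
    return dest


if __name__ == "__main__":
    # Example path from the post above; adjust to your own appdata location.
    db = ("/mnt/disk1/appdata/PlexMediaServer/Library/Application Support/"
          "Plex Media Server/Plug-in Support/Databases/"
          "com.plexapp.plugins.library.db")
    print(backup_database(db))
```

Restoring is the reverse: with the container stopped, copy the chosen backup back over `com.plexapp.plugins.library.db`.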

     

     

    Link to comment
    2 minutes ago, Rich Minear said:

    Use the terminal tool built into Unraid.  Once running in a new window, you will need to know the appdata location for your system.  Mine is /mnt/disk1/appdata.  

     

    So I can cd to /mnt/disk1/appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Plug-in Support/Databases. 

     

    com.plexapp.plugins.library.db is the main database.  You can make a copy of this by just doing a cp of this file to another file name.  Plex will make backup copies also...I believe every 3 days.  They should have the date appended to the name. 

     

    If you need to fall back to one of these, you have to stop the Plex docker, and then copy the file with date appended back to the name of the main database. 

     

     

    I use CA Backup / Restore Appdata; it can keep older versions as well.

    Link to comment
    5 hours ago, Duniac said:

    For previous Plex dockers I have spread it across all disks, should I limit it to only one disk?

    So that is a good question.  When I was on 6.6.7, I had it spread across disks also.  I was asked to move it to one disk as part of the testing.  I've had a couple of people tell me why, but I never really understood the reasoning. 

     

    My guess would be that if rc5 is stable like 6.6.7, it would not make any difference.  But someone else may have to chime in to school us on this.  🙂

     

    Link to comment

    Sounds like it was to eliminate disk spin up delays. If they're backup copies and not your live version, it doesn't matter how you store it or where.

    Link to comment
    1 minute ago, jbartlett said:

    Sounds like it was to eliminate disk spin up delays. If they're backup copies and not your live version, it doesn't matter how you store it or where.

    This is the appdata area.  All of the databases are there.  As for disk spin up delays...you have that if you are on a single disk also and the disk is quiet, unless it is an SSD.

    Link to comment

    Just an FYI for this forum:  It has been 16 days since I have had any corruption with Plex.  Before this, I only saw that kind of stability on 6.6.7; with anything newer, it would corrupt in less than a day.  6.8.0-rc4 and rc5 have been rock stable with the changes that were made.  I'm glad that I stuck with the testing, and was able to work so closely with the Unraid team.  🙂

    Link to comment

    Thanks so much @Rich Minear for sticking it out and doing all that testing for the rest of us. So glad we've got a resolution and I can confidently start thinking about moving off 6.6.7

    Link to comment

    So...for someone late to the party...I've seen SQLite errors in my Plex logs somewhere, but it's been a while and I can't remember how to look this up.  Where do I check, and...if I'm getting them...should I go to rc6?  Is it stable enough?  Once there, how can I correct my possibly corrupt DBs?  Does anything else use SQLite that I should be aware of that's had this issue?  Sorry if this is all covered, but I'm still trying to get through all of this post and this info might make a good sticky of the tl;dr type.

    Link to comment

    Hey,

    I upgraded to 6.8.0-rc9 from 6.7.2 because I started seeing the SQLite exceptions today. I made the jump, but for some reason I'm still seeing these exceptions, along with lots of "database disk image is malformed" errors in Sonarr.

     

    Also, my Plex is not letting me see anything. I just see this in my dashboard: [screenshot]

     

    Please help, I'm not sure what I'm doing wrong!

    wakanda-diagnostics-20191210-1637.zip

    Link to comment

    You need to start with a known valid database without corruption, so if you don't have a backup you will have to start fresh and let Plex build the database from scratch.
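One way to tell whether a database file or backup is a "known valid" one is SQLite's built-in `PRAGMA integrity_check`, which returns a single row `ok` when no problems are found. Below is a minimal sketch using Python's standard `sqlite3` module, assuming the application is stopped and that the stock SQLite build can open the file (Plex ships its own SQLite build, so treat results on its databases with some caution). The function names are made up for illustration.

```python
import sqlite3
from pathlib import Path


def check_database(db_path):
    """Return the rows reported by PRAGMA integrity_check (['ok'] when clean)."""
    # Open read-only so the check can never modify the file being inspected.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # Covers errors such as "database disk image is malformed".
        return [str(exc)]
    return [row[0] for row in rows]


def is_intact(db_path):
    """True only when SQLite reports no problems at all."""
    return check_database(db_path) == ["ok"]


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        print(path, "->", "; ".join(check_database(path)))
```

Running it against each dated backup shows which copies are safe to restore from; any result other than `ok` means that file should not be used as the starting point.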

    Link to comment
    27 minutes ago, mrtech213 said:

    I assume that I'll need to reinstall basically all the dockers?

    Reinstalling them won't do anything. You need to shut them down and delete the appdata.

     

    Or restore a known good backup.

    Edited by Rick Gillyon
    Link to comment




