• SQLite DB Corruption testers needed


    limetech
    • Closed

    9/17/2019 Update: may have got to the bottom of this.  Please try 6.7.3-rc3 available on the next branch.

    9/18/2019 Update: 6.7.3-rc4 is available to address Very Slow Array Concurrent Performance.

     

    re:

     

    Trying to get to the bottom of this...  First, we have not been able to reproduce the problem, which is odd because it implies there may be some kind of hardware/driver dependency with this issue.  Nevertheless I want to start a series of tests, which I know will be painful for some since every time DB corruption occurs, you have to go through a lengthy rebuild process.  That said, we would really appreciate anyone's input during this time.

     

    The idea is that we are only going to change one thing at a time.  We can either start with 6.6.7 and start updating stuff until it breaks, or we can start with 6.7.2 and revert stuff until it's fixed.  Since my best guess at this point is that the issue is with the Linux kernel, Docker, or something we have misconfigured (not one of a hundred other packages we updated), we are going to start with the 6.7.2 code base and see if we can make it work.

     

    But actually, the first stab at this does not revert anything; it updates the Linux kernel to the latest 4.19 patch release, which is 4.19.60 (6.7.2 uses kernel 4.19.55).  In skimming the kernel change logs, nothing jumps out as a possible fix, however I want to first try the easiest and least impactful change: update to the latest 4.19 kernel.

     

    If this does not solve the problem (which I expect it won't), then we have two choices:

     

    1) update to the latest Linux stable kernel (5.2.2) - we are using the 5.2 kernel in Unraid 6.8-beta and so far no one has reported any sqlite DB corruption, though the sample set is pretty small.  The downside is that not all out-of-tree drivers build with the 5.2 kernel yet, so some functionality would be lost.

     

    2) downgrade docker from 18.09.06 (version in 6.7.2) to 18.06.03-ce (version in 6.6.7).

    [BTW the latest Docker release 19.03.00 was just published today - people gripe about our release numbers, try making sense of Docker release numbers haha]

     

    If neither of those steps succeeds then ... well let's hope one of them does succeed.

     

    To get started, first make a backup of your flash via Main/Flash/Flash Backup, and then switch to the 'next' branch via the Tools/Upgrade OS page.  There you should see version 6.7.3-rc1.

     

    As soon as a couple of people report corruption I'll publish an -rc2, probably with reverted Docker.

    Edited by limetech




    User Feedback

    Recommended Comments



    I bit the bullet and upgraded again from 6.6.7 (stable as a rock for three weeks!) to 6.7.3-rc4.  I am kicking off some updates and downloads with Sonarr and seeing what happens.....  🤞

     

     

    Link to comment
    On 9/21/2019 at 4:23 PM, limetech said:

    Unraid OS 6.8 uses FUSE 3.6.2 however I don't think FUSE has anything to do with this issue.

    I get the feeling you might be ditching 6.7.2, and moving straight to 6.8, given all the changes you've talked about are for the latter and not the former.

     

    Probably not something you like answering, but are we days/weeks/months away from RCs starting for 6.8?

    Link to comment
    10 hours ago, dustinr said:

    /mnt/usr/specialappdata/

    1 hour ago, jonathanm said:

    typo?

    And why do you have 100G allocated to docker image? 20G should be more than enough, and when a user makes it as large as you have then I have to wonder if they have been filling it up. Have you?

     

    Link to comment
    10 hours ago, dustinr said:

    just noticed corruption tonight,

    Also, your Unassigned Device has filesystem corruption. Are your dockers using it?

     

    Link to comment

    I believe I had my first instance of Plex DB corruption last night on unRAID 6.7.2. I got the spinning wheel of death in the app and posted in the Plex forums before I found this thread. I restored from a backup this morning, and after I ran an optimize and a clean bundles I could only access some files, seemingly at random.

     

    I followed their article (https://support.plex.tv/articles/201100678-repair-a-corrupt-database/) to check/repair corruption and got an OK. Now I seem to be able to access everything again. 
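    For anyone who wants a quick check alongside the article above, here is a minimal Python sketch that runs SQLite's built-in PRAGMA integrity_check against a database file. The function name check_db is hypothetical, and this is not Plex's own procedure: Plex ships its own SQLite build, so the linked article remains the authoritative route. It is safest to run this on a copy of the database with the container stopped.

```python
import sqlite3
from pathlib import Path

def check_db(path):
    """Return the messages from PRAGMA integrity_check for the SQLite file at `path`.

    A healthy database yields the single message "ok"; anything else
    describes a problem SQLite found (or the error raised while opening).
    """
    # Build a file: URI and open read-only so the check cannot modify the file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # Badly damaged files can fail before the pragma returns any rows.
        return [str(exc)]
    finally:
        conn.close()
    return [row[0] for row in rows]
```

    Calling check_db on a copy of the Plex library database and comparing the result with ["ok"] gives roughly the same yes/no answer as the "OK" mentioned above, though it says nothing about why a file became corrupt.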

     

    Last night I dumped the logs from Plex, but I don't see any SQLite3 errors in them. Do I need to turn on verbose logging in Plex?

     

    My mapping for Plex is as follows:

     

    /transcode -> /mnt/user/appdata/PlexMediaServer/transcode
    /data -> /mnt/user/
    /config -> /mnt/user/appdata/PlexMediaServer

     

    And /mnt/user/appdata is mapped to cache (SSD)

     

    Diagnostics are attached. Is it safe to try RC4 or no?

     

    tower-diagnostics-20190923-1353.zip

    Edited by acurcione
    Link to comment

    I honestly can't tell if the problem I'm seeing is corruption or not. I've restored the DB several times from several backups and I get the same results with some files working and some that don't. The Plex logs show something funny going on, but nothing to do with the DB.

    Link to comment
    8 hours ago, trurl said:

    And why do you have 100G allocated to docker image? 20G should be more than enough, and when a user makes it as large as you have then I have to wonder if they have been filling it up. Have you?

     

    The /mnt/usr/specialappdata is not a typo. Previously people talked about isolating the appdata to a single disk, so I created a share just for that purpose, and it only lives on DISK1.

     

    The 100G docker image was from a misconfigured container install years ago that I just haven't got around to resizing.

    Link to comment
    8 hours ago, trurl said:

    Also, your Unassigned Device has filesystem corruption. Are your dockers using it?

     

    Nah, that's just a USB backup drive.

    Link to comment
    17 minutes ago, acurcione said:

    I honestly can't tell if the problem I'm seeing is corruption or not. I've restored the DB several times from several backups and I get the same results with some files working and some that don't. The Plex logs show something funny going on, but nothing to do with the DB.

    The Plex logs would show if you had a db problem.  You would see lines reporting "malformed" database entries.
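    A rough way to look for those lines is to search the dumped logs for the word "malformed", since SQLite's standard corruption message reads "database disk image is malformed". The sketch below is only an illustration: the helper names are made up, and the log directory and the *.log file pattern are assumptions to adjust for your own setup.

```python
from pathlib import Path

def find_malformed(log_text):
    """Return (line_number, line) pairs whose text mentions 'malformed'."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if "malformed" in line.lower():
            hits.append((number, line.strip()))
    return hits

def scan_log_dir(log_dir):
    """Scan every *.log file directly under log_dir (a caller-supplied path).

    Returns a dict mapping file name to its list of matching lines;
    files with no matches are left out.
    """
    results = {}
    for log_file in sorted(Path(log_dir).glob("*.log")):
        hits = find_malformed(log_file.read_text(errors="replace"))
        if hits:
            results[log_file.name] = hits
    return results
```

    An empty result from scan_log_dir only means the word did not appear in the files scanned; it does not prove the database is healthy.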

    Link to comment
    5 minutes ago, Rich Minear said:

    The Plex logs would show if you had a db problem.  You would see lines reporting "malformed" database entries.

    Thanks for that. I don't see any of that in any of the logs I've dumped. So now I have no freaking clue what's going on.

     

    At least it isn't this I guess.

    Link to comment
    37 minutes ago, dustinr said:

    /mnt/usr/specialappdata is not a typo

    In that case, I must warn you that the path you list is not on the array at all, but rather in RAM, and will not survive a reboot.

    Link to comment
    1 hour ago, dustinr said:

    The /mnt/usr/specialappdata is not a typo. Previously people talked about isolating the appdata to a single disk, so I created a share just for that purpose, and it only lives on DISK1.

    I think what you want to do is create a share specialappdata and assign disk1 to it. Then use the path /mnt/user/specialappdata. /mnt/usr is in RAM and you’ll lose it every time you reboot your server. Or better yet, use /mnt/disk1/specialappdata.
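    To make the distinction concrete, here is a small Python sketch that classifies a host path using only the conventions mentioned in this thread: /mnt/user/... for user shares and /mnt/diskN/... for a specific array disk. The function name and the returned labels are made up for illustration, and anything else under /mnt (including /mnt/usr) is simply reported as unrecognized rather than guessed at.

```python
import re
from pathlib import PurePosixPath

def classify_host_path(path):
    """Classify a host path as 'user share', 'array disk', or 'unrecognized'."""
    # e.g. '/mnt/user/specialappdata' -> ('/', 'mnt', 'user', 'specialappdata')
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[0] != "/" or parts[1] != "mnt":
        return "unrecognized"
    if parts[2] == "user":
        return "user share"
    if re.fullmatch(r"disk\d+", parts[2]):
        return "array disk"
    return "unrecognized"

for candidate in ("/mnt/user/specialappdata",
                  "/mnt/disk1/specialappdata",
                  "/mnt/usr/specialappdata/"):
    print(candidate, "->", classify_host_path(candidate))
```

    A container mapping that comes back as unrecognized is worth a second look before trusting it to survive a reboot.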

    Link to comment

    I went through and rebuilt my databases from scratch for Plex, sonarr, and radarr the other day and I am now experiencing some weird behavior with the sonarr database. It is corrupting with similar messages in the logs, but the container can be restarted, unlike before rc3. The app is generally responsive, but I have noticed that it won't reliably interact with the download client. After a while, the sqlite errors return to the logs. I'm not really sure what to do at this point but I can provide my diagnostics again.

    serenity-diagnostics-20190924-0250.zip

    Link to comment

    I've just tested on 6.7.3-rc4 and confirmed that I can still reproduce the mover issue I described here:

    Is it possible that this issue is actually just a manifestation of that one? That would be consistent with what I've seen with that issue in the past.

     

    Has anyone seen this issue on a setup with no cache drive, or a share set not to use it (or to use it exclusively)?

    Link to comment
    4 hours ago, 11rcombs said:

    I've just tested on 6.7.3-rc4 and confirmed that I can still reproduce the mover issue I described here:

    Is it possible that this issue is actually just a manifestation of that one? That would be consistent with what I've seen with that issue in the past.

     

    Has anyone seen this issue on a setup with no cache drive, or a share set not to use it (or to use it exclusively)?

    I do not have a cache drive. I have seen the corruption issues for a couple of months, and originally started the former thread about having to downgrade to 6.6.7 because of the corruption in the databases for Plex and Sonarr.

    Link to comment
    15 hours ago, rzeeman711 said:

    I went through and rebuilt my databases from scratch for Plex, sonarr, and radarr the other day and I am now experiencing some weird behavior with the sonarr database. It is corrupting with similar messages in the logs, but the container can be restarted, unlike before rc3. The app is generally responsive, but I have noticed that it won't reliably interact with the download client. After a while, the sqlite errors return to the logs. I'm not really sure what to do at this point but I can provide my diagnostics again.

    serenity-diagnostics-20190924-0250.zip

    Interesting update here. Because sonarr was still partially functional, I actually just ignored the corruption messages in the logs and kept it running. Surprisingly, when I checked sonarr this morning it was 100% functional with no additional errors in the logs. I tried searching for media, downloading, adding to Plex, etc. and it all worked fine. Going to just let it ride and assume it's all good for now.

    Link to comment

    Good morning.  So I fixed the corrupt databases last night right after I posted, and then went to bed.  There was no corruption when I turned off my computer (not the NAS...obviously).  Within an hour, there was corruption in my Plex database.

     

    Unless someone has something meaningful to try based on my diagnostics, I'm going back to 6.6.7 tonight.  The entire system was stable for almost a MONTH after going back....and it corrupted in the first day when trying this new release candidate. 

    Link to comment

    After reverting to 6.6.7 more than a month ago, things are rock solid, handling a big database and many downloads without a single corruption.

    I am sticking to this version until a true fix comes out.

    It is frustrating to not benefit from any update of a product I paid for, but I am confident the team is hard at work trying to make us all happy again.

    Edited by Deazo
    Link to comment



