• SQLite DB Corruption testers needed


    limetech
    • Closed

    9/17/2019 Update: may have got to the bottom of this.  Please try 6.7.3-rc3 available on the next branch.

    9/18/2019 Update: 6.7.3-rc4 is available to address Very Slow Array Concurrent Performance.

     

    re:

     

    Trying to get to the bottom of this...  First, we have not been able to reproduce the problem, which is odd because it implies there may be some kind of hardware/driver dependency.  Nevertheless I want to start a series of tests, which I know will be painful for some, since every time DB corruption occurs you have to go through a lengthy rebuild process.  That said, we would really appreciate anyone's input during this time.

     

    The idea is that we are only going to change one thing at a time.  We can either start with 6.6.7 and start updating stuff until it breaks, or we can start with 6.7.2 and revert stuff until it's fixed.  Since my best guess at this point is that the issue is with the Linux kernel, Docker, or something we have misconfigured (not one of a hundred other packages we updated), we are going to start with the 6.7.2 code base and see if we can make it work.

     

    But actually, the first stab at this is not reverting anything, but rather first updating the Linux kernel to the latest 4.19 patch release which is 4.19.60 (6.7.2 uses kernel 4.19.55).  In skimming the kernel change logs, nothing jumps out as a possible fix, however I want to first try the easiest and least impactful change: update to latest 4.19 kernel.

     

    If this does not solve the problem (which I expect it won't), then we have two choices:

     

    1) update to latest Linux stable kernel (5.2.2) - we are using 5.2 kernel in Unraid 6.8-beta and so far no one has reported any sqlite DB corruption, though the sample set is pretty small.  The downside with this is, not all out-of-tree drivers yet build with 5.2 kernel and so some functionality would be lost.

     

    2) downgrade docker from 18.09.06 (version in 6.7.2) to 18.06.03-ce (version in 6.6.7).

    [BTW the latest Docker release 19.03.00 was just published today - people gripe about our release numbers, try making sense of Docker release numbers haha]

     

    If neither of those steps succeeds then ... well let's hope one of them does succeed.

     

    To get started, first make a backup of your flash via Main/Flash/Flash Backup, and then switch to the 'next' branch via the Tools/Upgrade OS page.  There you should see version 6.7.3-rc1.

     

    As soon as a couple people report corruption I'll publish an -rc2, probably with reverted Docker.

    Edited by limetech




    User Feedback

    Recommended Comments



    9 hours ago, Deazo said:

    After reverting to 6.6.7 more than a month ago, things are rock solid, handling a big database and many downloads, not a single corruption.

    I am sticking to this version until a true fix comes out.

    It is frustrating to not benefit from any update of a product I paid for, but I am confident the team is hard at work trying to make us all happy again.

    I just spent two hours fixing databases on my system, after downgrading to 6.6.7.  I want to help...but I can't keep rebuilding all the time.  And I'm heading out on vacation...I need it to be solid. 

    Link to comment

    I can 100% confirm I never had any database corruption while my appdata was on a cache drive (and running the latest stable Unraid 6.7.x). During this time, I was running both Plex and Sonarr concurrently. Two months ago I upgraded my array to use just NVMe drives without a cache, and after the upgrade I was only using Plex and not Sonarr. Granted, I had Sonarr running, but it was not doing anything since I did not have my indexers set up. Everything was working perfectly with my appdata on /mnt/disk1. A few days ago, I decided to get Sonarr back up and running. Within an hour (while performing tasks in both Sonarr and Plex), I had database corruption in both my Plex and Sonarr databases. Every time I restored a backup, it would inevitably become corrupted again. Since then, I have had to disable Sonarr to prevent whatever is causing SQLite to misbehave. I have not tried any of the latest RC releases but I'll give it a shot this weekend.

    Link to comment

    Hello Guys. 

     

    I wasn't having daily corruption but some sporadic ones. I've updated to 6.7.3-rc4 and restored everything. So far no issues.

     

    I'll keep you updated here with diag logs if anything happens.


    Regards

    Link to comment

    I just had my Plex database corrupt irreversibly with no backup (my own fault I know) on rc4. Echoing what others have said, I tried to stick it out for as long as possible to help contribute my reports in order to get a resolution, but it's taking up too much time. I will be reverting back to the last stable version to get a break from this issue.

    Link to comment

    I posted here: 

     


    What can I do to help fix the problem? This docker's SQLite database gets corrupted within hours...


    Never had problems with radarr, sonarr, plex or any other docker so far.

    Edited by nuhll
    Link to comment

    I've been running rc4 since it came out and things were stable but unfortunately I got corruption this evening in both my Radarr and Sonarr containers.

    Link to comment

    I am on RC4 and yesterday I saw some strange behaviour from my sonarr docker.

     

    I have the share set up correctly for media however sonarr kept trying to write to a full drive.

     

    The drive had about a gigabyte free. Sonarr would delete about 2 GB, then write more than that until the drive was 100 percent full, then delete again and repeat, maxing out the reads and writes on the drive and bringing the whole system to a halt.

     

    There is plenty of space on other drives but the sonarr docker was fixated with this one full drive.

     

    I had to use unBALANCE to free up some space, and then Sonarr finished whatever it was writing in about 10 seconds. Very strange.

    Edited by TheBuz
    Link to comment
    On 9/23/2019 at 4:24 PM, jonathanm said:

    In that case, I must warn you that the path you list is not on the array at all, but rather in RAM, and will not survive a reboot.

    Oh yeah, that was a typo; it is /mnt/user/. Please disregard.

    Link to comment

    I had the same problem: corruption in both Radarr and Sonarr on 6.7.2.

    I tried to move my appdata dir to the cache but the corruption would happen nevertheless, so I rolled back to 6.6.7.

     

    Then I also moved my system share to the cache (containing docker and libvirt). My docker img was damaged so I deleted it and created a new one. Then I updated to 6.7.2.

    That was 3 weeks ago and I haven't had any DB corruption since, so the problem seems fixed for me.

     

    I hope that helps.

    Link to comment
    On 9/27/2019 at 4:11 PM, rzeeman711 said:

    I just had my Plex database corrupt irreversibly with no backup (my own fault I know) on rc4. Echoing what others have said, I tried to stick it out for as long as possible to help contribute my reports in order to get a resolution, but it's taking up too much time. I will be reverting back to the last stable version to get a break from this issue.

    I had my Plex DB corrupt as well, on a cache drive, on rc4, which I don't believe should ever happen. I think it may have happened during the upgrade of the container overnight though.

    Link to comment

    I wonder why no docker (Plex, Radarr, Sonarr, and many others) has this problem, for me at least, except the new storjv3 docker...

    Link to comment

    I recently updated to 6.7.3-rc4 (and docker data is on a single disk). What I have noticed is that Plex is running without problems at the moment. Sonarr is a weird beast though; it's still getting corrupted. I also noticed that after restoring a backup, Sonarr would come back up but still complain about database corruption in its logs.

     

    I wonder if my Sonarr problems on 6.7.3-rc4 could be that I should just start from 0 with that database. Restoring a backup might not give me a fully working database, because I don't know where the smallest corruption started and I have tons of backups, but most of them are from the 6.7.2 install and earlier (where docker data was still on multiple disks).

    Link to comment

    I had a week of much-needed vacation, and my dockers/db's were rock solid on 6.6.7 while I was gone. 

     

    Has there been any movement on this?  Any new ideas or changes in code?

    Link to comment

    @Rich Minear When you downgraded, did you need to do anything else to the dockers to make them work again? I downgraded to 6.6.7 recently when I became aware of this issue (a little late to the party), but I am still seeing problems in my Plex libraries and Sonarr. I don't think it's a full corruption, since people seem to need to rebuild after that, but some libraries won't play media or load metadata from the web (like poster art), and Sonarr won't go past the first loading screen. Just trying to figure out all the right steps to take here. 

     

    Thanks

    Link to comment

    There were a couple of things that I did.  Usually....I did not have to do a full blown rebuild of Plex.  I would have backup databases that I could fall back on.  And I also found the information on how to rebuild the database without losing all of my plex data. 

     

    I have a shell script that does the following:  (must be done with the docker image stopped, and you have to go to the location of the Plex database files on the file system)

     

    cp com.plexapp.plugins.library.db com.plexapp.plugins.library.db.original   # keep a copy of the damaged file
    sqlite3 com.plexapp.plugins.library.db "DROP INDEX 'index_title_sort_naturalsort'"   # Plex-specific
    sqlite3 com.plexapp.plugins.library.db "DELETE FROM schema_migrations WHERE version='20180501000000'"   # Plex-specific
    sqlite3 com.plexapp.plugins.library.db "PRAGMA integrity_check"   # report detectable corruption

    sqlite3 com.plexapp.plugins.library.db .dump > dump.sql   # write everything readable out as SQL
    rm com.plexapp.plugins.library.db   # remove the damaged file
    sqlite3 com.plexapp.plugins.library.db < dump.sql   # rebuild a fresh database from the dump
    chown nobody com.plexapp.plugins.library.db   # restore owner
    chgrp users com.plexapp.plugins.library.db   # restore group

     

    Occasionally, when a database really has issues, you will end up with a dump.sql file that has issues too.  When you pull that dump.sql file back into an empty database, you will see errors, and the db will end up with a zero-byte file size.  Those error messages point to line numbers in the dump.sql file.  You can go in with a text editor and delete those lines.  The last line in the file may also be a ROLLBACK; if so, change it to COMMIT.  Then you can re-import the file, and it gets your db back at least to a point where the Plex server can take over and get things back to normal. 
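A minimal sketch of the last-line fix, assuming the dump came from the sqlite3 shell's .dump command and that its final line begins with `ROLLBACK;` (which is what that command writes when it hit errors while dumping). The function name `fix_dump_tail` is made up for illustration, and the individual bad lines reported during import still have to be removed by hand.

```shell
# Hypothetical helper: if the last line of a dump starts with ROLLBACK;,
# replace that line with COMMIT; so the imported data is not discarded.
fix_dump_tail() {
    dump="$1"   # path to the dump file, e.g. dump.sql
    # '$' restricts the substitution to the final line of the file.
    sed '$ s/^ROLLBACK;.*$/COMMIT;/' "$dump" > "$dump.tmp" && mv "$dump.tmp" "$dump"
}
```

Usage would be `fix_dump_tail dump.sql`, followed by the same re-import as in the script (`sqlite3 com.plexapp.plugins.library.db < dump.sql`). A dump that already ends in COMMIT is left unchanged.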

     

    There is also a database tool built into Plex.  It's under SETTINGS, TROUBLESHOOTING.  There is a button that says "Optimize Database".

     

    When I was having the corruption, I would run that shell script (if I didn't have a backup to fall back on) and bring Plex back up.  I would let it catch up with any new TV shows or movies that it found on the disk that might not have been in the database.  And then I would run that Optimize action.  Once it was done....I would stop Plex, and make a "known good" backup of the database (copying it at the shell level) so I had a place to start again if it failed.  
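The "known good" copy could be scripted along these lines. This is only a sketch: the function name `backup_known_good` and the timestamped file naming are illustrative choices, not anything Plex or Unraid provides, and it assumes the container has already been stopped.

```shell
# Hypothetical helper: copy a database file into a backup directory,
# appending a timestamp so earlier known-good copies are not overwritten.
backup_known_good() {
    db="$1"      # path to the database file
    dest="$2"    # directory that holds the backups
    mkdir -p "$dest" || return 1
    # -p preserves the mode and timestamps of the original on the copy.
    cp -p "$db" "$dest/$(basename "$db").$(date +%Y%m%d-%H%M%S)"
}
```

For example, `backup_known_good com.plexapp.plugins.library.db /mnt/user/backups/plex`, where the destination path is a placeholder for wherever you keep backups.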

     

    The script would also work on Sonarr (or Radarr).  It just has to be tweaked to run on its database instead of Plex's.  You can lose the DROP INDEX and DELETE statements, which are specific to the Plex DB schema. 
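A sketch of what that tweak might look like, with the Plex-specific statements removed and the database path passed as an argument. The function name `rebuild_sqlite_db` is illustrative. Unlike the script above, it loads the dump into a temporary file and only replaces the original if the result is non-empty; that is a design choice of this sketch, not part of the original procedure. Stop the container first, and check the actual database filename in its appdata folder.

```shell
# Hypothetical generic version of the dump-and-reload procedure.
# Usage: rebuild_sqlite_db /path/to/database.db   (container must be stopped)
rebuild_sqlite_db() {
    db="$1"
    cp -p "$db" "$db.original" || return 1     # keep a copy of the damaged file
    sqlite3 "$db" "PRAGMA integrity_check"      # informational report only
    sqlite3 "$db" .dump > "$db.dump.sql"        # write readable content out as SQL
    rm -f "$db.rebuilt"
    sqlite3 "$db.rebuilt" < "$db.dump.sql"      # load the dump into a new file
    [ -s "$db.rebuilt" ] || return 1            # missing or zero-byte file: import failed
    mv "$db.rebuilt" "$db"
    # Same owner and group as the script above; this needs root, so a failure is only reported.
    chown nobody:users "$db" 2>/dev/null || echo "could not change owner of $db" >&2
}
```

If the function returns non-zero, the untouched copy is still at the `.original` path and the dump at the `.dump.sql` path, so the dump can be edited by hand and re-imported.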

     

    Since I have been back on 6.6.7, and have a clean database....I have had ZERO corruption:  no matter how busy my Plex and Sonarr get. 

     

    I know it is a hard thing to troubleshoot, and they are working to find it...but it is something to do with the later versions of Unraid, or the utilities that it contains.  6.6.7 is rock solid, no matter what version of Plex and Sonarr I am running. 

     

     

     

     

    Link to comment

     

    On 10/1/2019 at 1:18 PM, Rick Gillyon said:

    Regress to 6.6.7 or make sure all appdata is on a cache drive.

    Was this an answer to me?

     

    Does the corruption only happen if appdata is not on a cache drive?

    Link to comment

    You may have to read through the previous pages.  I'm pretty sure that I saw that some people were having issues even with appdata on a cache drive. 

    Link to comment
    9 minutes ago, nuhll said:

    Was this an answer to me?

     

    Does the corruption only happen if appdata is not on a cache drive?

    Yes, to you. There have been one or two reporting corruption on cache, but very rare. It's possible those were caused by moving corrupt databases to cache. Cache certainly seems safest.

    Link to comment

    Hi all

    Has anyone tried Unraid 6.8.0-rc1 yet, to see if the corruption issue is fixed?

    Release notes don't mention anything about fixing this problem, but one can only hope

     

     

    Link to comment
    9 hours ago, simalex said:

    Release notes don't mention anything about fixing this problem, but one can only hope

    It should be tried by anyone having the issue. It could be fixed by the md driver changes introduced to deal with the bad v6.7 array performance; there's also a much newer kernel and many other changes.

    Link to comment



