• SQLite DB Corruption testers needed


    limetech
    • Closed

9/17/2019 Update: we may have gotten to the bottom of this.  Please try 6.7.3-rc3, available on the 'next' branch.

    9/18/2019 Update: 6.7.3-rc4 is available to address Very Slow Array Concurrent Performance.

     


Trying to get to the bottom of this...  First, we have not been able to reproduce the issue, which is odd because it implies there may be some kind of hardware/driver dependency.  Nevertheless, I want to start a series of tests, which I know will be painful for some, since every time DB corruption occurs you have to go through a lengthy rebuild process.  That said, we would really appreciate everyone's input during this time.

     

The idea is that we are only going to change one thing at a time.  We can either start with 6.6.7 and update things until it breaks, or start with 6.7.2 and revert things until it's fixed.  Since my best guess at this point is that the issue lies either with the Linux kernel, Docker, or something we have misconfigured (not one of the hundred other packages we updated), we are going to start with the 6.7.2 code base and see if we can make it work.

     

But actually, the first stab at this is not reverting anything, but rather updating the Linux kernel to the latest 4.19 patch release, which is 4.19.60 (6.7.2 uses kernel 4.19.55).  Skimming the kernel change logs, nothing jumps out as a possible fix; however, I want to try the easiest and least impactful change first: updating to the latest 4.19 kernel.

     

    If this does not solve the problem (which I expect it won't), then we have two choices:

     

1) Update to the latest Linux stable kernel (5.2.2). We are using the 5.2 kernel in Unraid 6.8-beta, and so far no one has reported any sqlite DB corruption there, though the sample set is pretty small.  The downside is that not all out-of-tree drivers build with the 5.2 kernel yet, so some functionality would be lost.

     

2) Downgrade Docker from 18.09.6 (the version in 6.7.2) to 18.06.3-ce (the version in 6.6.7).

[BTW, the latest Docker release, 19.03.0, was just published today. People gripe about our release numbers; try making sense of Docker release numbers, haha]

     

If neither of those steps succeeds, then ... well, let's hope one of them does succeed.

     

To get started, first make a backup of your flash via Main/Flash/Flash Backup, then switch to the 'next' branch via the Tools/Update OS page.  There you should see version 6.7.3-rc1.

     

As soon as a couple of people report corruption, I'll publish an -rc2, probably with Docker reverted.

    Edited by limetech




    User Feedback

    Recommended Comments



Despite having downgraded to 6.6.7, the upgrade to -rc1 was painless. All containers/VMs started successfully. I'll throw some stress tests at the system tomorrow and otherwise keep my fingers crossed.


Hey, if it helps: I noticed it on my Plex. It happened when I optimized the database and deleted bundles. I'd assume that any time the scheduled tasks do either or both of those, it corrupts the DB.


    I upgraded to -rc1 last night and already have plex database corruption. After the upgrade, I added a bunch of media through sonarr/radarr/plex and noticed the corruption today. Not sure what else I need to report here.

    16 minutes ago, rzeeman711 said:

    I upgraded to -rc1 last night and already have plex database corruption. After the upgrade, I added a bunch of media through sonarr/radarr/plex and noticed the corruption today. Not sure what else I need to report here.

Thank you for the report.  Are you fairly sure this is new corruption, and not something that was already there and you're only seeing now?


Pretty sure. I didn't check the actual databases for signs of corruption with sqlite before updating, but I did restart my Plex/Sonarr/Radarr dockers and check the logs for corruption errors. I also did the same check after upgrading to -rc1.
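For anyone who does want to check the databases directly, SQLite has a built-in integrity check that can be run before and after an upgrade. A minimal sketch (the database path in the comment is a hypothetical example; adjust it to your appdata layout):

```python
import sqlite3

def check_db(path: str) -> str:
    """Run SQLite's built-in integrity check; returns 'ok' when the file is sound."""
    con = sqlite3.connect(path)
    try:
        return con.execute("PRAGMA integrity_check;").fetchone()[0]
    finally:
        con.close()

# Hypothetical Plex DB path -- adjust to your own setup:
# print(check_db("/mnt/user/appdata/plex/com.plexapp.plugins.library.db"))
```

Anything other than the single row "ok" means the file is damaged; note that opening a badly corrupted file can also raise a DatabaseError outright.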


I have been deleting the current DB, copying the most recent backup in as the new DB, and restarting the container. This method has worked best for me. I have never been able to repair the DB successfully, and rebuilding from scratch hasn't seemed to improve the time between corruptions.
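A minimal sketch of that restore workflow, assuming hypothetical paths and that the container using the DB is stopped first. It also deletes any stale -wal/-shm sidecar files, which SQLite would otherwise try to replay against the restored copy:

```python
import os
import shutil

def restore_db(db_path: str, backup_path: str) -> None:
    """Replace a live SQLite DB with a backup copy.
    Stop the container that uses the DB before calling this."""
    # Delete the damaged DB plus any stale -wal/-shm sidecar files;
    # leftover WAL data paired with a swapped-in file can cause new corruption.
    for suffix in ("", "-wal", "-shm"):
        try:
            os.remove(db_path + suffix)
        except FileNotFoundError:
            pass
    shutil.copy2(backup_path, db_path)

# Hypothetical paths -- adjust to your container's appdata layout:
# restore_db("/mnt/user/appdata/plex/library.db",
#            "/mnt/user/appdata/plex/library.db.backup")
```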


Maybe an entirely silly question, and possibly not related at all, but for everyone who has the SQLite corruption: what filesystem are you using on the drives where the DB resides? Is everyone with corruption using BTRFS? Is anyone experiencing corruption using XFS on their drives? Maybe this is like the pesky issue dealing with COW or RFS years ago?
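To answer that for your own setup, you can check which filesystem actually backs the path where the DB lives. A minimal Linux-only sketch that reads /proc/mounts (the appdata path in the usage comment is just an example):

```python
import os

def fs_type(path: str) -> str:
    """Return the filesystem type (xfs, btrfs, ...) backing a path,
    by finding the longest matching mount point in /proc/mounts (Linux only)."""
    path = os.path.realpath(path)
    best_mnt, best_fs = "", "unknown"
    try:
        with open("/proc/mounts") as mounts:
            for line in mounts:
                _dev, mnt, fstype = line.split()[:3]
                # Longest mount point that is a prefix of the path wins.
                if (path == mnt or path.startswith(mnt.rstrip("/") + "/")) \
                        and len(mnt) > len(best_mnt):
                    best_mnt, best_fs = mnt, fstype
    except OSError:
        pass  # not Linux, or /proc unavailable
    return best_fs

# e.g. print(fs_type("/mnt/user/appdata"))  # hypothetical appdata path
```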


Another path for investigation: all of the apps I've heard mentioned as experiencing corruption are running mono/dotnetcore.

    14 hours ago, BRiT said:

    Maybe this is like the pesky issue dealing with COW or RFS years ago?

I considered that, but the latest reports indicate it was solved; still, it remains a possibility.  The next test is going to be reverting Docker.


I just had my first-ever instance of sabnzbd corruption today. My appdata is /mnt/user/appdata/sabnzbd, and my data disks are all formatted as xfs.

    2019-07-24 12:20:52,064::ERROR::[database:142] Damaged History database, created empty replacement
    2019-07-24 12:22:12,507::ERROR::[database:142] Damaged History database, created empty replacement
    2019-07-24 12:23:17,987::ERROR::[database:142] Damaged History database, created empty replacement
    2019-07-24 13:00:17,803::ERROR::[database:142] Damaged History database, created empty replacement
    2019-07-24 13:00:17,823::ERROR::[database:142] Damaged History database, created empty replacement
    2019-07-24 13:00:18,162::ERROR::[database:154] SQL Command Failed, see log

     


6.7.3-rc2 is published now.  This reverts Docker to version 18.06.3-ce.

    Tools/Update OS/Check For Updates


So after installing a cache drive and pointing everything to /mnt/cache, it seems the corruption has stopped.

I even downloaded multiple shows (around 50) with no issues. Normally, if I process about 10 shows in say 10 minutes, my Plex is guaranteed to be corrupted.

I guess this doesn't really help with the root cause; it's just another confirmation that using a cache can/may resolve it for some people.

Will update if I do get a corruption in the near future.


Installed 6.7.3-rc1 yesterday. No problems, but I installed 6.7.3-rc2 this morning just to be safe. I didn't have a problem with the last RC either, when we had all the other reports of sqlite corruption. I am using a cache drive (NVMe).

    Edited by T0rqueWr3nch

    What I'm looking for is this:

    1. Someone who experienced DB corruption after upgrading from 6.6.7 to 6.7.2 and then doing nothing more than revert back to 6.6.7 (without changing any disk/share config or appdata mapping)
    2. That person or persons now installing 6.7.3-rc2 to see if corruption happens again.

    I fit these criteria. I will attempt to install rc2 tonight when I get home and report back.

     

Update 2019-07-26: Installing 6.7.3-rc2 as we speak; will run some tests on my Plex docker.

Update 2019-07-26: First impressions indicate that things overall are running smoother than they were in 6.7.2. I initiated a Plex library scan to see if it triggers corruption and am also adding new media. Will report back tomorrow.

    Edited by TXLZONE

    Just a hint:

I moved my appdata directories (the ones that use sqlite) to an external drive using the "Unassigned Devices" plugin. Normally appdata is on /mnt/disk1/.

Upgraded from 6.6.7 to 6.7.2 and have had no database corruption since. I do not have a cache drive. I think the database corruption must have something to do with array functionality.

I will downgrade tonight, move the appdata directories back to /mnt/disk1, and then upgrade to 6.7.3-rc2.

     

    Update 2019-07-28

After one day, I got a database corruption with 6.7.3-rc2.

    Let me know what I can provide to help the troubleshooting.

    Edited by hpaar
    Update Status

Good morning. I just experienced corruption on Plex running on 6.7.3-rc2. It seems to have happened as soon as the Media Scanner finished running. Here are the Plex logs from when it corrupted at 2am:

    Jul 27, 2019 02:04:35.851 [0x147abd509700] DEBUG - Completed: [127.0.0.1:59984] 200 GET /identity (8 live) 0ms 386 bytes (pipelined: 1)
    Jul 27, 2019 02:04:43.709 [0x147abdb0c700] DEBUG - Jobs: '/usr/lib/plexmediaserver/Plex Media Scanner' exit code for process 2382 is 0 (success)
    Jul 27, 2019 02:04:43.721 [0x147a3fbfd700] DEBUG - Request: [127.0.0.1:59998 (Loopback)] GET /identity (8 live) Signed-in
    Jul 27, 2019 02:04:43.722 [0x147abd70a700] DEBUG - Completed: [127.0.0.1:59998] 200 GET /identity (8 live) 0ms 386 bytes (pipelined: 1)
    Jul 27, 2019 02:04:43.733 [0x147a8fbfd700] ERROR - SQLITE3:(nil), 11, database corruption at line 79998 of [bf8c1b2b7a]
    Jul 27, 2019 02:04:43.733 [0x147a8fbfd700] ERROR - SQLITE3:(nil), 11, database corruption at line 68430 of [bf8c1b2b7a]
    Jul 27, 2019 02:04:43.734 [0x147a8fbfd700] ERROR - SQLITE3:(nil), 11, statement aborts at 20: [select media_items.id as 'media_items_id', media_items.library_section_id as 'media_items_library_section_id', media_items.section_location_id as 'media_items_section_location_id', med
    Jul 27, 2019 02:04:46.171 [0x147a8fbfd700] ERROR - Thread: Uncaught exception running async task which was spawned by thread 0x147abd90b700: sqlite3_statement_backend::loadRS: database disk image is malformed
    Jul 27, 2019 02:04:51.783 [0x147a8f9fc700] DEBUG - Request: [127.0.0.1:60020 (Loopback)] GET /identity (8 live) Signed-in
    Jul 27, 2019 02:04:51.783 [0x147abd70a700] DEBUG - Completed: [127.0.0.1:60020] 200 GET /identity (8 live) 0ms 386 bytes (pipelined: 1)

The other issues I was seeing with 6.7.2 are also back, including higher CPU usage and SMB transfers that periodically pause at 0KB/s, then shoot up to 40MB/s, then drop again. Let me know what I can provide to help the troubleshooting.
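For anyone watching their own containers for this, the corruption signatures in logs like the above can be scanned for automatically. A minimal sketch; the marker list is taken from the messages quoted in this thread, and the log path in the usage comment is hypothetical:

```python
# Corruption signatures seen in the Plex and sabnzbd logs quoted in this thread.
CORRUPTION_MARKERS = (
    "database corruption",
    "database disk image is malformed",
    "Damaged History database",
)

def find_corruption(log_text: str) -> list[str]:
    """Return the log lines that contain a known SQLite-corruption signature."""
    return [
        line for line in log_text.splitlines()
        if any(marker in line for marker in CORRUPTION_MARKERS)
    ]

# Usage (hypothetical path):
# with open("/mnt/user/appdata/plex/Logs/Plex Media Server.log") as f:
#     for hit in find_corruption(f.read()):
#         print(hit)
```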

    Quote

    Let me know what I can provide to help the troubleshooting.

Please post your diagnostics.zip (Tools/Diagnostics).


r410-diagnostics-20190730-0703.zip

Loaded 6.7.3-rc2 today around 5pm.  By 11pm, the Plex database was corrupted.

     

I am a new Unraid user; I installed 6.7.2 on fresh hardware and saw corruption from the beginning.  I changed from /mnt/user to /mnt/diskX/ and corruption still occurred on 6.7.2.

    Edited by mi5key
    adding diagnostics

Updated to 6.7.3-rc2 yesterday (29th July) at around 20:00 GMT+0 and completely rebuilt my Plex databases from scratch. As of today (30th July) at about 13:00 GMT+0, I noticed that my Plex database was corrupt. Earlier this morning it was still rebuilding the TV library, but now it doesn't work. The film side works, but not the dashboard or the TV library.

     

    This is the fastest I've seen the corruption occur even when building from scratch rather than restoring from a backup database.

     

Before upgrading to 6.7.3-rc2 I had changed from using /mnt/user to /mnt/diskX, and that worked for a while, but I think it corrupted in the end (sorry, it was a while ago).

     

    Have only seen Sonarr corrupt once.

     

     tower-diagnostics-20190730-1448.zip


    Thank you for the reports.  We are monitoring and trying to come up with another plan to isolate the issue.




    This is now closed for further comments

  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.