SQLite DB Corruption testers needed

dustinr · August 12, 2019

Sonarr decided to shit the bed now too... restoring from backup now. What further testing can I do? I'm tempted to roll back, but I would love to help fix this bug.

Also be advised, I'm having the same issues as described in this thread as well:

let me know how to proceed.

TheBuz · August 12, 2019

Have you guys found any clues yet? This is very bizarre

dustinr · August 13, 2019

Just a quick update: I rolled back to Unraid 6.6.7 and turned my parity drive into a data drive (my thought process is that I have been getting such terrible IO speeds that maybe the parity is my issue...). Gonna reload all these Docker images one by one, rebuild the DB, and let you guys know how it goes.

-DR

Rich Minear · August 13, 2019

I'm about ready to roll back to it also. So far, it doesn't seem that we are making any headway with the Sqlite issue, and I'm tired of explaining to the family why I have to spend some time rebuilding the database so they can watch things.

TheBuz · August 13, 2019

I have had my dockers on the cache since the 9th (of August) and no corruption yet. It used to happen at least once a day. I have CA Backup and Restore running every 12 hours, so if it corrupts it's a 2-minute fix to restore.

FYI, SABnzbd has never corrupted in any configuration for me, ever.
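For setups without a backup plugin, a scheduled copy that also verifies itself is one way to get a similar quick restore point. Below is a minimal sketch in Python, assuming Python 3.7+ and the standard `sqlite3` online backup call. This is not how CA Backup and Restore works, and the function name `backup_sqlite` and any paths passed to it are placeholders.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(src_path: str, dest_path: str) -> str:
    """Copy a SQLite database with the online backup API and verify the copy.

    Returns the first row of PRAGMA integrity_check on the copy ("ok" if fine).
    """
    # Make sure the backup folder exists (placeholder location, not a real default).
    Path(dest_path).parent.mkdir(parents=True, exist_ok=True)
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup() copies page by page through SQLite itself,
        # so the copy is a consistent snapshot rather than a raw file copy.
        src.backup(dest)
        # Check the copy before trusting it as a restore point.
        result = dest.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        dest.close()
        src.close()
    return result
```

Called from cron or a user script every 12 hours, a return value other than `ok` would mean the copy itself should not be trusted as a restore point.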

Edited August 13, 2019 by TheBuz

Rich Minear · August 13, 2019

Not everyone has a cache drive...or even the need for one.

TheBuz · August 13, 2019

Just now, Rich Minear said:

Not everyone has a cache drive...or even the need for one.

True, I don't really need a cache drive either, but I had a couple of 240GB SSDs gathering dust from an old project.

But if there is a difference in the way data is handled on cache vs. array, there might be some clues in there as to why this is happening.

dustinr · August 13, 2019

Yea, I rolled back, AND removed my parity drive (for better performance / more space...), so I'm not really sure how much help I will be to you guys. But I'm going to keep a REAL close eye on my SQLite DBs for Radarr/Sonarr and will let you guys know if I see corruption on the 6.6.x branch as well.

thanks

-DCR
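For anyone wanting to keep a similarly close eye on their Radarr/Sonarr databases, SQLite's own `PRAGMA integrity_check` can be run on a schedule. Here is a minimal sketch in Python, assuming the database file can be opened read-only while the container is running. The function name `check_database` is made up for illustration, and the path to each app's DB file is something you would have to fill in yourself.

```python
import sqlite3
from pathlib import Path
from typing import List


def check_database(db_path: str) -> List[str]:
    """Return [] if the database passes PRAGMA integrity_check, else the problems found."""
    # Open through a read-only URI so that checking never modifies the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        con = sqlite3.connect(uri, uri=True)
        try:
            rows = [row[0] for row in con.execute("PRAGMA integrity_check")]
        finally:
            con.close()
    except sqlite3.DatabaseError as exc:
        # Badly damaged files can fail before the check even runs,
        # for example with "database disk image is malformed".
        return [str(exc)]
    # A healthy database yields exactly one row containing "ok".
    return [] if rows == ["ok"] else rows
```

An empty list means the check passed at that moment. Logging the timestamp of the first non-empty result would help correlate corruption with whatever the server was doing at the time.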

brainbone · August 14, 2019

Yikes! I'm behind the curve here. I updated to 6.7.2 from 6.6.6 a few days ago, before seeing this information.

I've not yet had any corruption, but I'm a little worried. Glancing over this and related threads, so far it seems like only those storing SQLite DBs directly on disk rather than on SSD/Cache are seeing this issue. Is that correct, or are there reports of people storing appdata on cache with this issue?

Having all my dockers in appdata, and having appdata set to cache only, am I immune to this issue, or should I roll back to 6.6.7?

Badams · August 14, 2019

Alright, so I did have 6.7.3rc2 installed and still had the same issues... So, I've rolled back to 6.6.7

How can I help? What would you like me to do?

Rich Minear · August 15, 2019

Since I've had 6.7 installed, and now 6.7.3rc2, I don't have the option in my gui to roll back to 6.6.7.

How do I do it manually?

wgstarks · August 15, 2019

16 minutes ago, Rich Minear said:

How do I do it manually?

Back up your flash (just in case). Download 6.6.7. Unpack the zip. Replace the bz files and syslinux on your flash with the ones in the downloaded folder. Reboot.
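The manual rollback described above (with 6.6.7 as the intended version) can be sketched as a small Python script. This is only an illustration under assumptions: it requires Python 3.8+, the function name `replace_boot_files` and all three folder arguments are hypothetical, it has not been tested against a real flash drive, and it only backs up the items it overwrites, so it does not replace a full flash backup.

```python
import shutil
from pathlib import Path
from typing import List


def replace_boot_files(extracted: str, flash: str, backup: str) -> List[str]:
    """Back up, then overwrite, the bz* files and syslinux folder on the flash.

    Returns the names of the items that were copied from the extracted zip.
    """
    src, dst, bak = Path(extracted), Path(flash), Path(backup)
    bak.mkdir(parents=True, exist_ok=True)
    replaced = []
    # "bz*" stands for the bz files from the post; "syslinux" is the folder.
    items = sorted(src.glob("bz*")) + [src / "syslinux"]
    for item in items:
        if not item.exists():
            continue
        target = dst / item.name
        # Keep a copy of whatever is currently on the flash first.
        if target.is_dir():
            shutil.copytree(target, bak / item.name, dirs_exist_ok=True)
        elif target.exists():
            shutil.copy2(target, bak / item.name)
        # Overwrite with the downloaded version. For folders this overwrites
        # files with the same name and leaves any extra files alone.
        if item.is_dir():
            shutil.copytree(item, target, dirs_exist_ok=True)
        else:
            shutil.copy2(item, target)
        replaced.append(item.name)
    return replaced
```

Nothing else in the extracted folder is touched, and the reboot is still a manual step.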

Edited August 15, 2019 by wgstarks

gellux · August 15, 2019

so i hadn't actually had a problem whatsoever with 6.7.1 nor 6.7.2 until the other day, but i'm unsure whether my problem is the same as everyone else's. i was just doing a little "aesthetic maintenance" on plex - literally just adding a couple album covers to those missing it - and maybe 10 minutes after i went to put on a movie and the tower was unavailable. went to laptop and it confirmed it was unavailable. went to my unraid tab and the docker tab said Plex was "unhealthy" so i restarted it and it never started again. in the log it just keeps repeating Starting Plex Media Server over and over and over.

are those the same symptoms others have had?

my appdata is in /mnt/user/appdata if that helps. i haven't been able to do anything else since, so i haven't reverted to 6.6.7 or anything, and i don't know when or if i will have the time to do so.

TheBuz · August 15, 2019

On 8/13/2019 at 8:21 PM, dustinr said:

Yea, I rolled back, AND removed my parity drive (for better performance / more space...), so I'm not really sure how much help I will be to you guys. But I'm going to keep a REAL close eye on my SQLite DBs for Radarr/Sonarr and will let you guys know if I see corruption on the 6.6.x branch as well.

thanks

-DCR

Did you see a performance increase in read speeds after rolling back?

Some people have reported much faster read/write speeds after rolling back, and I would probably do it for this reason alone.

Are Dockers, VMs and Community Apps affected by downgrading?

isrdude · August 15, 2019

Where I noticed the performance gain was in the metadata matching! I switched to mounting dockers on the cache drive first, before I rolled back. It took almost a full day for my movie collection to fetch all the metadata. After another crash, I kept dockers on the cache drive and rolled back to UNRAID 6.6.7. When I started rebuilding the libraries, my metadata downloading was a lot faster. I had the same config as always in PLEX, but I was then able to tag my movies, music, and TV shows all in under 12 hours. NEVER had this happen before. We're talking a 16TB library here. I know the cache drive helped quite a bit, but until the rollback, I never saw rebuilds that were this fast.

simalex · August 16, 2019

OK

So I switched to

binhex docker for Sonarr (instead of linuxserver)

and plexinc docker for Plex (instead of limetech which was anyway deprecated)

Then I upgraded again to unRaid 6.7.2

I did not rebuild the databases, instead backed up appdata and pointed the new dockers to the old paths.

The system was running for almost 3 days straight without any SQLite corruption. However, for those first 3 days I did not do any heavy lifting. That is, only a few new TV episodes were added, and those sporadically.

Then I decided to force a heavy load on both Sonarr & Plex by manually importing a full season.

So I imported 10 episodes of 3.3GB each through Sonarr. What this effectively does is:

i. Sonarr creates a local copy of the file to be imported, named .backup, in the source dir

ii. Sonarr copies the file to the destination directory

iii. Once finished, Sonarr deletes both the original and the .backup from the source dir (my setup was to move the files)

iv. Sonarr notifies Plex of the change

v. Plex will start its own analysis of the new media file, and process it in order to create thumbnails etc.

In order to further load the system, at the same time I forced Sonarr to do a Series Refresh, which, since my library is huge, will trigger reads on at least three 8TB disks at the same time.
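For testers who want a repeatable version of this kind of load without touching a real library, the pattern (large file copy plus frequent small SQLite commits on the same disk) can be approximated with a short Python script. This is a rough sketch only: it does not reproduce what Sonarr or Plex actually do, the name `stress_once`, the table name, and the default sizes are invented for illustration, and a passing run proves nothing about the underlying bug.

```python
import os
import shutil
import sqlite3
import threading
from pathlib import Path


def stress_once(work_dir: str, file_mb: int = 64, inserts: int = 500) -> str:
    """Copy a large file while committing to a SQLite DB in the same directory.

    Returns the first row of PRAGMA integrity_check ("ok" when nothing is wrong).
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    db_path = work / "stress.db"
    big_src = work / "big.src"
    big_dst = work / "big.dst"

    # Create a dummy "media file" of file_mb megabytes.
    with open(big_src, "wb") as f:
        for _ in range(file_mb):
            f.write(os.urandom(1024 * 1024))

    def copy_big_file() -> None:
        shutil.copyfile(big_src, big_dst)

    # Start the copy in the background, like an import running in Sonarr.
    copier = threading.Thread(target=copy_big_file)
    copier.start()

    # Commit row by row so that every insert is its own transaction.
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS episodes (id INTEGER PRIMARY KEY, title TEXT)"
    )
    for i in range(inserts):
        con.execute("INSERT INTO episodes (title) VALUES (?)", (f"episode {i}",))
        con.commit()
    copier.join()

    result = con.execute("PRAGMA integrity_check").fetchone()[0]
    con.close()
    # Remove the dummy files; the test database is left in place for inspection.
    big_src.unlink()
    big_dst.unlink()
    return result
```

Pointing `work_dir` at a scratch folder on the same disk as appdata, and raising `file_mb` to a few thousand, would get closer to the 3.3GB imports described above.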

Results

binhex Sonarr docker (instead of linuxserver) is still ok, no corruption

Plex database (plex inc docker) was corrupted at some point when Plex detected a change in a directory time stamp and started re-scanning the library and at the same time analyzing the files for generating new thumbnails.

It is apparent that the corruption issue will only manifest when unRaid or the dockers are under load.

All my dockers have been set for more than a month with the appdata directory in /mnt/disk1 as initially suggested (and that by itself resulted in a significant performance increase of the containers).

One other thing to note is that since I don't have a cache drive, all my media is first placed on disk1 (the same disk where the SQLite DBs reside) and from there is transferred to the target locations, which are in various user shares under /mnt/user. This puts additional stress on disk1 during the import, as it is also used a) as storage during downloads and b) in some of the user shares.

I will now downgrade to 6.6.x and then upgrade to 6.7.3rc2 (so that I have an easy fallback point) and try the above again.

simalex · August 17, 2019

Same results with 6.7.3-rc2.

After the upgrade everything worked properly for a while, no SQLite corruptions.

Almost one hour after starting the manual import in Sonarr, again using a set of 3-4GB media files, the corruption issue appeared. The only difference is that this time both Plex and Sonarr have database corruption. First the Sonarr DB was corrupted, and several minutes later so was Plex.

As far as I understand the corruption is happening when there is a heavy load on the unRaid server e.g. copying large files from one disk of the array to another.

As I mentioned in my previous post, I am moving media files from disk1 to other disks in the array using Sonarr's manual import process. My TV Shows library is in a user share that spans several disks, including disk1, and uses the high-water allocation method. Media files are currently getting copied to disk8, as that one has 3TB free. So effectively I have heavy file copying from disk1 to disk8, and at the same time Plex and Sonarr are updating their databases on disk1.

I am inclined to think that this is putting a strain on the parity drive because the heads are forced to do a lot of flying around for all the updates to be processed correctly.

For Sonarr, when you start a manual import, it seems to run in the background, probably on a different thread. At the same time, other scheduled tasks (e.g. RSS scans, Series refresh etc.) will still start in the background at the predefined times.

Similarly for Plex: Sonarr will notify Plex that a new episode was uploaded and Plex will start a library re-scan. At the same time it will still run any other scheduled tasks (e.g. create thumbnails etc.)

So if there are performance issues, it is possible that some kind of timeouts are mishandled by SQLite, and the result is that a) the threads have a different "image" of the DB, so any successful write after that could corrupt the actual DB file, or b) the on-disk copy of the DB is inconsistent with the in-memory cached parts of the DB, so again any write after that could end up corrupting the DB.

As for why this problem only manifests in the latest version of unRaid, I can only speculate that even a slight change in a threshold value that got missed might increase the sensitivity of SQLite to any type of timeout.

If that is the case then people that are using a cache drive for storing appdata should not have a similar strain on the parity drive as writes to the cache drive don't update the parity, so the DB that is in the cache drive will be much more robust to this type of failure.

Do any of you who have the SQLite corruption issue have appdata on a cache drive?

TheBuz · August 17, 2019

42 minutes ago, simalex said:

Do any of you who have the SQLite corruption issue have appdata on a cache drive?

Dockers have been on 24/7. I moved appdata to the cache on the 9th of August. No corruption yet.

Sonarr, Radarr, Plex, SAB

Edited August 17, 2019 by TheBuz

dustinr · August 17, 2019

On 8/15/2019 at 1:59 AM, TheBuz said:

Did you see a performance increase in read speeds after rolling back?

Some people have reported much faster read/write speeds after rolling back, and I would probably do it for this reason alone.

Are Dockers, VMs and Community Apps affected by downgrading?

WELL, I did see increased IO, but I think that's PRIMARILY because I deleted my PARITY drive and just made it part of my storage array. unRaid is performing GREAT NOW, and I haven't had any corruption in three days. I think all of my issues stem from the parity drive. Is there anything on the road map for snapraid (or something similar)? I think my big bottleneck is writing parity drive data synchronously on OLD HARDWARE / OLD HARDDRIVES (2010-2018).

EDIT: In the interest of SCIENCE. I am upgrading my unraid from 6.6.7 to the new 6.7.3 rc2 and continuing to run without parity and without cache.

Edited August 17, 2019 by dustinr

Squid · August 17, 2019

2 hours ago, dustinr said:

Is there anything on the road map for snapraid?

I truly hope not. It would remove a key feature of unRaid vs. snapraid: unRaid can emulate missing / dead drives seamlessly. snapraid cannot do that at all.

mdeabreu · August 18, 2019

Got corruption again, this time in Sonarr, Radarr, and OpenVPN-AS.

As has been mentioned it seems to occur during periods of high disk activity. I believe Sonarr was importing some media while Plex was streaming/transcoding.

At this point I'm very tempted to revert to 6.6.7 as it was rock stable. Are there any other tests we can do to help resolve this?

tower-diagnostics-20190818-1657.zip

Edited August 18, 2019 by mdeabreu

Rich Minear · August 18, 2019

1 hour ago, mdeabreu said:

Got corruption again, this time in Sonarr, Radarr, and OpenVPN-AS.

As has been mentioned it seems to occur during periods of high disk activity. I believe Sonarr was importing some media while Plex was streaming/transcoding.

At this point I'm very tempted to revert to 6.6.7 as it was rock stable. Are there any other tests we can do to help resolve this?

tower-diagnostics-20190818-1657.zip

I had to do the same thing. 6.6.7. I've been fighting corruption since mid May on the new platform, and nothing seems to work. I tried all the things they asked...but nothing seemed to make any difference. And I couldn't keep rebuilding the Plex database every day. 😞

principis · August 18, 2019

In the previous thread I said it seemed to be fixed by changing to /mnt/disk1. Well, it's not.

I'm still on 6.7.2, please let me know if there's an rc3 and I'll help test. Maybe downgrade the kernel?

dustinr · August 19, 2019

7 hours ago, Rich Minear said:

I had to do the same thing. 6.6.7. I've been fighting corruption since mid May on the new platform, and nothing seems to work. I tried all the things they asked...but nothing seemed to make any difference. And I couldn't keep rebuilding the Plex database every day. 😞

_IF_ you're feeling adventurous, remove your parity drive from your array and see if corruption occurs on RC2. I've been running perfectly since I deleted my parity drive. If nothing else, it would be a good test to correlate the issue.

phbigred · August 19, 2019

Maybe related, maybe not. Had Plex DB issues as well on the cache drive. Flipped from btrfs to XFS for cache, as I wasn't using the pool feature, and haven't had issues since. Noticed cache corruption with my VMs too, which became unable to back up. Just my 2 cents.
