Troubleshooting help required



Hi

I'm in a bit of a bind and I think I need help to perform proper diagnostics to finally resolve my issues. 

 

My UnRaid has been.. temperamental.. of late, and I noticed a lot of BTRFS errors in the log. While they were correctable, they kept cropping up. So I replaced both cache drives, going from 2 x 256GB to 2 x 1TB, and rebuilt the cache, the Docker file system and everything else from scratch.

So far, so good. 

Now I am, however, getting infrequent crashes which mainly seem to affect Docker, and Docker is handling all my operations. It just happened again today, and now my Plex instance won't start, claiming database corruption:

 

Starting Plex Media Server.
Error: Unable to set up server: sqlite3_statement_backend::prepare: malformed database schema (113616) for SQL: PRAGMA cache_size=2000 (N4soci10soci_errorE)
Stopping Plex Media Server.

 

This is GOD-AWFUL considering the amount of media I have in my library and how painstakingly I've been managing it inside Plex, but then again, I've had Plex decide to be an *@*!*#* before, so it's not exactly the first time. 
It is, however, entirely too unstable and I really want it fixed. I don't think it's purely Docker-related, as it also happened last week: suddenly everything was inaccessible while I was on holiday, and even "My Servers" in UnRaid said that the server was offline. 

So there's *something* going on.. but I am having a hard time locking down what it may be. 

I am also using a Marvell-based controller at the moment, since my non-Marvell one doesn't want to address all the disks for whatever reason, so please disregard that. 

 

I'm starting to think running a BTRFS cache is just cursed, as it's not the first time this configuration has given me extreme issues. I am half tempted to just yolo it and run with a single cache drive and rely on backups of application data instead. 

If anyone has any insight I'd greatly appreciate it. 

fortytwo-syslog-127.0.0.1-20210527-1338.zip fortytwo-syslog-20210527-1342.zip

Link to comment
2 hours ago, ChatNoir said:

Did you run a memtest? This would be the first thing I would try in your situation.

 

Your diagnostics could provide more information about your system than the syslog can.

Go to Tools, Diagnostics and attach the complete zip file to your next post.

 

Thanks for taking the time. Syslog attached. 

System has been stable for... six years or so and is using ECC memory, I hadn't really considered memory being an issue considering it's not being manipulated at all. 

The issues I'm having also seem to be entirely Docker-container related; the array seems fine. I'm assuming the 5 errors that keep popping up during parity checks are down to the Marvell controller, which is being addressed. 

fortytwo-diagnostics-20210527-2140.zip

Link to comment

Oh, the reason I suspect Docker is because when Plex broke earlier in the day I logged on to My Servers from work and checked the server remotely. Everything seemed fine, but I couldn't load the docker page - at all. 

Then when I tried to reboot it via the interface it would just hang and not do anything, maybe from failing to halt the docker service. Log didn't really illuminate me. 

Ended up phoning the wife for a hard-reset which resolved it, but left Plex with the corrupt DB. Which I still haven't addressed as yet in case troubleshooting illuminates something that might fix it. 

Link to comment
7 minutes ago, Froberg said:

System has been stable for... six years or so and is using ECC memory, I hadn't really considered memory being an issue considering it's not being manipulated at all. 

I have the same reaction at times, but that's kind of the thing with failures: it works until it does not. :) 

Memory might not be the issue. If you have ECC memory, you might have a log of corrected errors? In the BIOS maybe (I'm not an expert on ECC ^^).

 

If you want to check, the memtest that ships with Unraid is not useful for ECC, you would have to make a boot drive from https://www.memtest86.com/

 

In the log I see incorrect parity sectors on the array, probably because of the unclean shutdown:

May 27 16:18:18 FortyTwo kernel: md: recovery thread: P incorrect, sector=1574961048
May 27 16:18:18 FortyTwo kernel: md: recovery thread: P incorrect, sector=1574961056
May 27 16:18:18 FortyTwo kernel: md: recovery thread: P incorrect, sector=1574961064
May 27 16:18:18 FortyTwo kernel: md: recovery thread: P incorrect, sector=1574961072
May 27 16:18:18 FortyTwo kernel: md: recovery thread: P incorrect, sector=1574961080

 

Also some corruption on your cache:

May 27 15:00:20 FortyTwo kernel: BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
May 27 15:00:20 FortyTwo kernel: BTRFS info (device sdb1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0

JorgeB should be able to provide better advice on this. Could this be the cause of your issues?
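
For anyone who wants to pull those BTRFS error counters out of a saved syslog instead of eyeballing them, here is a minimal Python sketch. It assumes the kernel prints the counters in the same shape as the two lines quoted above; the function name `parse_btrfs_errs` and the regular expression are made up for this example and are not part of Unraid.

```python
import re

# Pattern modelled on the two syslog lines quoted above; other kernel
# versions may format these messages differently (assumption).
BDEV_ERRS = re.compile(
    r"BTRFS info \(device (?P<pool>\S+)\): bdev (?P<dev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_btrfs_errs(log_text):
    """Return {device: {counter: value}}, keeping the last line seen per device."""
    result = {}
    for line in log_text.splitlines():
        match = BDEV_ERRS.search(line)
        if match:
            result[match["dev"]] = {name: int(match[name]) for name in COUNTERS}
    return result


# The two lines from the syslog excerpt above.
SAMPLE = """\
May 27 15:00:20 FortyTwo kernel: BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
May 27 15:00:20 FortyTwo kernel: BTRFS info (device sdb1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
"""

for dev, counts in parse_btrfs_errs(SAMPLE).items():
    # Show only the counters that are above zero for each device.
    nonzero = {name: value for name, value in counts.items() if value}
    print(dev, nonzero or "clean")
```

Run against the excerpt, this reports a `corrupt` count of 8 for /dev/sdb1 and 9 for /dev/sdc1, with all other counters at zero.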

Link to comment

Yeah that cache is irksome to say the least. I rebuilt it because it kept throwing errors and causing problems - ON NEW DRIVES - and still it persists. 

That's just.. hurtful at this stage. 

But yes, I strongly suspect BTRFS and the cache/Docker to be at the root of much of this. 

 

Side note: Plex is now fixed by recovering an earlier backup DB. 

Edited by Froberg
Link to comment
4 minutes ago, JorgeB said:

Cache is detecting data corruption, so you should run a scrub. If there are uncorrectable errors, look at the syslog for the names of the affected files; those need to be deleted or restored from backups.

 

Did a scrub yesterday when ChatNoir mentioned it, no errors fixed or detected. 

Link to comment
6 minutes ago, JorgeB said:

Then they are old errors and can be reset, see here to reset and to better monitor the pool for the future:

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582

 

It's not normal for data corruption to happen; the next time it does, act right away.

 

The cache is barely a month into operation with these new drives, so it's extremely odd to me. 

I hadn't noticed any corruption warnings though. 

 

I'll follow your guide and set up the notifications to be alerted. So you reckon this is the cause of my problems? 
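
As a rough illustration of the pool monitoring discussed above (this is not the script from the linked FAQ), the Python sketch below parses text in the shape that `btrfs device stats <mountpoint>` prints and lists every counter above zero. The sample output is made up for the example rather than captured from this server, and the helper name `nonzero_counters` is hypothetical.

```python
import re

# One counter per line, e.g. "[/dev/sdb1].corruption_errs  8".
STAT_LINE = re.compile(
    r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)\s*$"
)


def nonzero_counters(stats_text):
    """Return a list of (device, counter, value) for every counter above zero."""
    problems = []
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match["value"]) > 0:
            problems.append((match["dev"], match["counter"], int(match["value"])))
    return problems


# Hypothetical sample output, not captured from the poster's server.
SAMPLE_STATS = """\
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  8
[/dev/sdb1].generation_errs  0
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  9
[/dev/sdc1].generation_errs  0
"""

for dev, counter, value in nonzero_counters(SAMPLE_STATS):
    print(f"{dev}: {counter} = {value}")
```

On a live system the text would come from running the command against the pool, for example `btrfs device stats /mnt/cache`, assuming the cache is mounted at `/mnt/cache`. A scheduled check that raises a notification whenever the list is non-empty would flag new corruption as soon as it shows up rather than weeks later.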

Link to comment
