Server crashing again


So my server had been crashing recently, and we tracked it to a corrupt cache drive, which I cleared and started over with. Today my server has been crashing again. I'm not very good at reading logs, but from what I can tell it might have something to do with BTRFS, which leads me to believe it's the cache drive again, but I'm not sure what would be corrupting it. Here are all my logs and a screenshot of the monitor from when I plugged it in. Any help would be greatly appreciated.

IMG_0104.JPG.2837ef9bd7a1cdc671f9d1138b4835ec.JPG

FCPsyslog_tail.zip

server-diagnostics-20170215-1939.zip

The Fix Common Problems syslog tail shows a segfault following a BTRFS checksum error on the filesystem contained in your docker.img file and then an Out Of Memory error:

 

Feb 14 20:33:55 Server kernel: BTRFS warning (device loop0): loop0 checksum verify failed on 191053824 wanted 850A780D found C12EEF04 level 0

Feb 14 20:33:58 Server kernel: notify[10708]: segfault at 0 ip 00000000005f42ad sp 00007ffda29924e0 error 4 in php[400000+724000]

Feb 14 20:34:02 Server kernel: Plex Media Serv invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0

 

Later there's BTRFS corruption found on your cache disk itself:

 

Feb 15 03:03:11 Server kernel: BTRFS critical (device sde1): corrupt leaf, bad key order: block=173359104, root=1, slot=151

 

repeated many times, plus a call trace. Later there's more BTRFS trouble:

 

Feb 15 19:07:57 Server kernel: kernel BUG at fs/btrfs/ctree.c:3168!

Feb 15 19:07:57 Server kernel: invalid opcode: 0000 [#1] PREEMPT SMP

 

then a kernel oops and finally:

 

Feb 15 19:07:57 Server kernel: Fixing recursive fault but reboot is needed!
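
If you want to check a syslog for these messages yourself, here is a minimal Python sketch that pulls out lines matching the kernel messages quoted above. It is not part of unRAID or Fix Common Problems: the pattern names, the `scan` function and the command-line usage are hypothetical, and the patterns only cover the exact wording seen in this thread, so other kinds of errors will not be caught.

```python
import re
import sys

# Hypothetical patterns, based only on the kernel messages quoted in this thread.
PATTERNS = {
    "btrfs_checksum": re.compile(r"BTRFS warning .*checksum verify failed"),
    "btrfs_corrupt_leaf": re.compile(r"BTRFS critical .*corrupt leaf"),
    "segfault": re.compile(r"segfault at"),
    "oom": re.compile(r"invoked oom-killer"),
    "kernel_bug": re.compile(r"kernel BUG at"),
}


def scan(lines):
    """Return a dict mapping each pattern name to its matching lines, in log order."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


# Only runs when a syslog path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        for name, matched in scan(f).items():
            print(f"{name}: {len(matched)} line(s)")
            for line in matched[:3]:  # show the first few examples of each
                print("    " + line)
```

Usage would be something like `python3 scan_syslog.py syslog.txt`, pointed at the plain-text syslog extracted from the diagnostics zip (the script name is an assumption, and the file's location inside the zip may differ).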

 

Since it seems to have begun with a segfault, you should run Memtest for 48 hours, as a memory problem is relatively easy to detect and fix. I don't know whether the BTRFS errors are a symptom or a cause, but once the RAM has been either eliminated or replaced, you need to address both the file system errors on the cache and those in the docker.img file.

 

It's probably going to be easier to delete the docker.img file, then re-create it and download the containers again than to try to repair it - but fix the cache disk problem first. Regarding the cache disk itself, you say you have had problems with it before. I don't know anything about that and you haven't provided a link so I'll just say that you might want to consider replacing the cache disk.

 

Your magnetic disks look OK but they all show evidence of having had cable problems in the past. You may be aware of this already - if not, you'll want to make sure the SATA cables are all seated securely at both ends.

 

I'll run Memtest for the next 48 hours and see what that turns up. I've got another SSD lying around, so I may throw that in and take out the current drive. I was using old cables and actually replaced all of them over the weekend with new ones, so I'll double-check the seating. I'll let you know what Memtest turns up.

 

Thanks!

 

After running Memtest I got some memory errors, so I pulled the DIMMs I suspected were bad and I'm running a new Memtest. So far, so good, but I'm letting it run another 12 hours to complete the full 48 again.

I'll rebuild my cache drive afterwards and hopefully the problem will be solved. Since I'm going to be using just one cache drive and don't plan to add a second in the immediate future, would I be better off running XFS on the cache drive?

I do have a question about the RAM. I had 24GB: two 8GB DIMMs and two 4GB DIMMs. I took out the two 4GB DIMMs, which leaves me with 16GB. I was reading in this forum (I think) that it's better to run two sticks than four. Will 16GB be adequate for running Plex, CouchPotato, SickRage, Duck DNS, Deluge, and any future dockers, or would I be better served adding the 8GB back in once I get replacements from Kingston?

If you have a UPS you might as well stick with BTRFS, though some people would recommend switching to XFS, which seems to recover better from unclean shutdowns.

Four sticks of unbuffered RAM can present too big a load for some motherboards, so unless yours is on the manufacturer's qualified RAM list it's safer to stick with two. That said, many people get away with it, though to stand a fighting chance I'd try to use the same type. Of those dockers you mention, I only run Plex. The best way to find out is to try it and see how you get on.
