Random crashes - Maybe something to do with CPU?



In the past week my server has begun to crash randomly. I can't seem to figure out exactly what task or process is triggering this and I'm hoping someone here can help me shed some light on it.

 

I've attached my most recent log. Last crash was on 7/29 and I didn't turn it back on until today. Crashed 3 or 4 times prior to that, but I have no logs for them.

 

Many of the errors at the beginning of the file are because it's trying to write logs to the syslog share (which is on cache), but the cache drive was full from a backup operation. I have since changed the backup operation to write straight to the array so it doesn't fill the cache while I'm trying to gather logs. (Other suggestions there are appreciated. I tried to set up a syslog server on a Windows machine, but could not get it to work.)
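For anyone gathering logs the same way, a quick free-space check on the cache can warn before it fills up. This is only a minimal sketch: the function name `cache_has_room` and the 10% threshold are made up for illustration, and the `/mnt/cache` path in the usage note is an assumption about where the cache pool is mounted.

```python
import shutil
import sys

def cache_has_room(path, min_free_fraction=0.10):
    """Return True if the filesystem holding `path` has at least
    `min_free_fraction` of its capacity free.

    The 10% default is an arbitrary illustrative threshold.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return False
    return usage.free / usage.total >= min_free_fraction

if __name__ == "__main__":
    # Path is taken from the command line; "." is only a safe default.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    status = "ok" if cache_has_room(target) else "LOW ON SPACE"
    print(f"{target}: {status}")
```

Run it with the cache mount as the argument (for example `/mnt/cache`, if that is where the pool lives) before kicking off a backup.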

 

You can see in the logs that after Jul 27 17:39:16 I got some BTRFS warnings about files that the Mover apparently couldn't move. I think this is because a crash occurred while they were being written. I deleted the files from the cache and let the backup run again, which got those files onto the array. Not too worried about this.

 

I think the problem is at Jul 27 22:09:05

Jul 27 22:09:05 TyreeMedia kernel: general protection fault: 0000 [#1] SMP PTI
Jul 27 22:09:05 TyreeMedia kernel: CPU: 3 PID: 32449 Comm: find Tainted: P           O      4.19.107-Unraid #1
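In case it helps others dig through a long syslog, here is a rough sketch of pulling out kernel fault lines like the one above. The pattern list and the function name `find_kernel_faults` are illustrative assumptions, not a complete catalog of what the kernel can log.

```python
import re

# Phrases that commonly mark kernel-level faults in a Linux syslog.
# Illustrative assumption only; extend as needed.
FAULT_PATTERNS = re.compile(
    r"general protection fault|kernel BUG|Oops|Machine Check|Call Trace",
    re.IGNORECASE,
)

def find_kernel_faults(lines):
    """Return (line_number, text) pairs for kernel messages that match
    one of the fault patterns. Line numbers start at 1."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if " kernel: " in line and FAULT_PATTERNS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits

if __name__ == "__main__":
    sample = [
        "Jul 27 22:09:05 TyreeMedia kernel: general protection fault: 0000 [#1] SMP PTI",
        "Jul 27 22:09:05 TyreeMedia kernel: CPU: 3 PID: 32449 Comm: find Tainted: P           O      4.19.107-Unraid #1",
    ]
    for number, text in find_kernel_faults(sample):
        print(number, text)
```

To scan a real file, pass an open file object instead of the sample list, e.g. `find_kernel_faults(open("syslog-192.168.75.12.log", errors="replace"))`.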

 

Is this a CPU issue???

 

I've also attached my diagnostics output.

 

Thanks!

syslog-192.168.75.12.log tyreemedia-diagnostics-20200729-1756.zip

Edited by Mattaton

Okay. Looking on the forums for similar errors, I saw that MemTest was something to try. See the attached photo.

 

Ummmm.... I'm gonna say this isn't good, but is it even possible for 4 sticks to COMPLETELY FAIL??? I know pretty much nothing about this stuff, but this seems like too many errors to be reasonable.

 

Since taking the photo, the current pass has reached 16% with over 1500 errors, and the completed pass count is still 0.

 

Am I looking at buying new RAM or could this be a mobo issue?

 

20200729_183115.jpg

35 minutes ago, Mattaton said:

Shuffled all RAM sticks in same slots. Still lighting up like a Christmas tree. How do I know which stick is throwing the errors?

Test the RAM sticks one at a time in the same (known working) slot, then you will see which one is faulty.

Also check that the RAM is not running overclocked: deactivate XMP!

Also possible: a defective RAM slot on the mainboard... not funny, but possible...

Edited by Zonediver
1 hour ago, Michael_P said:

Bent pin in the CPU socket could knock out a bank, too - ask me how I know :)

Yeeeesshhh...fun!

I'm hoping that's not the case since it's been working fine for a long time and the CPU hasn't been removed for the pins to be exposed.

 

I'm toying with the idea of just replacing the mobo, CPU, & RAM. This PC was aging when I put it into service as an unRAID server. I think with the extra duties I'm throwing at it with backups from my Windows PCs, I should actually look at some new hardware for it and not start slapping band-aids on this build.

 

Time to start researching how well unRAID gets along with Ryzen 3000/X570.
