July 29, 2020

In the past week my server has begun to crash randomly. I can't figure out exactly which task or process is triggering this, and I'm hoping someone here can help shed some light on it.

I've attached my most recent log. The last crash was on 7/29 and I didn't turn the server back on until today. It crashed 3 or 4 times before that, but I have no logs for those crashes.

Many of the errors at the beginning of the file are because it was trying to write logs to the syslog share (which is on cache), but the cache drive was full from a backup operation. I have since moved the backup operation to write straight to the array so it doesn't fill the cache while I'm trying to gather logs. (Other suggestions there are appreciated. I tried to set up a syslog server on a Windows machine, but couldn't get it to work.)

You can see in the logs that after Jul 27 17:39:16 I got some BTRFS warnings about files that it appears the Mover couldn't move. I think this is because a crash occurred while they were being written. I deleted the files from the cache and let the backup run again, which got those files onto the array. Not too worried about this.

I think the problem is at Jul 27 22:09:05:

Jul 27 22:09:05 TyreeMedia kernel: general protection fault: 0000 [#1] SMP PTI
Jul 27 22:09:05 TyreeMedia kernel: CPU: 3 PID: 32449 Comm: find Tainted: P O 4.19.107-Unraid #1

Is this a CPU issue???

I've also attached my diagnostics output. Thanks!

syslog-192.168.75.12.log
tyreemedia-diagnostics-20200729-1756.zip

Edited July 29, 2020 by Mattaton
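For anyone combing through a similar syslog, here is a minimal Python sketch that pulls out lines containing common kernel crash markers. The marker list and the function names (`find_fault_lines`, `scan_file`) are assumptions made for illustration, not an official or exhaustive set, and the sample lines are the two quoted in the post above.

```python
import re

# Markers commonly seen in Linux kernel logs around crashes. This list is an
# assumption for illustration, not an exhaustive or official set.
FAULT_PATTERN = re.compile(
    r"general protection fault|kernel BUG at|Oops:|Call Trace:|Hardware Error",
    re.IGNORECASE,
)


def find_fault_lines(lines):
    """Return (line_number, text) pairs for lines matching a fault marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if FAULT_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan a syslog file on disk; undecodable bytes are replaced."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return find_fault_lines(handle)


# The two log lines quoted in the post, used as sample input.
SAMPLE = [
    "Jul 27 22:09:05 TyreeMedia kernel: general protection fault: 0000 [#1] SMP PTI\n",
    "Jul 27 22:09:05 TyreeMedia kernel: CPU: 3 PID: 32449 Comm: find Tainted: P O 4.19.107-Unraid #1\n",
]

if __name__ == "__main__":
    for number, text in find_fault_lines(SAMPLE):
        print(f"{number}: {text}")
```

To run it against a real file, call `scan_file("syslog-192.168.75.12.log")`. A hit only shows where the kernel reported a fault; it does not by itself say whether the cause is the CPU, the RAM, or something else.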
July 29, 2020 Author

Okay. Looking on the forums for similar errors, I saw that MemTest was something to try. See the attached photo. Ummmm.... I'm gonna say this isn't good, but is it even possible for 4 sticks to COMPLETELY FAIL??? I know pretty much nothing about this stuff, but this seems like too many errors to be reasonable. Since I took the photo, the test has reached 16% of the current pass with over 1500 errors, and the completed pass count is still 0. Am I looking at buying new RAM, or could this be a mobo issue?
July 29, 2020 Author

Shuffled all the RAM sticks around among the same slots. Still lighting up like a Christmas tree. How do I know which stick is throwing the errors?
July 29, 2020

35 minutes ago, Mattaton said: Shuffled all RAM sticks in same slots. Still lighting up like a Christmas tree. How do I know which stick is throwing the errors?

Test the RAM sticks one by one in the same (working) slot; then you will see which one is faulty. Also check that none of the RAM is running overclocked: deactivate XMP! Also possible: a defective RAM slot on the mainboard. Not funny, but possible.

Edited July 29, 2020 by Zonediver
July 30, 2020

49 minutes ago, Zonediver said: Also possible: A defective RAM-Slot on the Mainboard...

Which you should also test for by trying that single stick in another slot if the test fails.
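The stick-by-stick, slot-by-slot approach suggested above can be written down as a small bookkeeping sketch in Python. The function name `diagnose`, the stick and slot labels, and the rule of needing at least two runs before flagging anything are all assumptions for illustration; the results themselves have to come from actual MemTest runs.

```python
from collections import defaultdict


def diagnose(results):
    """Classify sticks and slots from single-stick MemTest runs.

    results maps (stick, slot) -> True if that run showed errors.
    Returns (suspect_sticks, suspect_slots) as sorted lists.
    """
    by_stick = defaultdict(list)
    by_slot = defaultdict(list)
    for (stick, slot), failed in results.items():
        by_stick[stick].append(failed)
        by_slot[slot].append(failed)

    # A stick is suspect if it fails in every slot it was tried in
    # (assumption: require at least two slots before drawing a conclusion).
    bad_sticks = sorted(
        s for s, runs in by_stick.items() if len(runs) >= 2 and all(runs)
    )
    # A slot is suspect if every stick tried in it fails
    # (assumption: require at least two sticks).
    bad_slots = sorted(
        s for s, runs in by_slot.items() if len(runs) >= 2 and all(runs)
    )
    return bad_sticks, bad_slots


if __name__ == "__main__":
    # Hypothetical results: stick A fails everywhere, stick B passes everywhere.
    runs = {
        ("A", "DIMM1"): True,
        ("A", "DIMM2"): True,
        ("B", "DIMM1"): False,
        ("B", "DIMM2"): False,
    }
    print(diagnose(runs))
```

If every stick fails in every slot, this flags all of them, which is not a verdict on the RAM. It more likely points at something the runs have in common, such as memory settings, the board, or the CPU socket.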
July 30, 2020 Author

1 hour ago, Michael_P said: Bent pin in the CPU socket could knock out a bank, too - ask me how I know

Yeeeesshhh...fun! I'm hoping that's not the case, since it's been working fine for a long time and the CPU hasn't been removed, so the pins have never been exposed. I'm toying with the idea of just replacing the mobo, CPU, and RAM. This PC was already aging when I put it into service as an unRAID server. With the extra duties I'm throwing at it, like backups from my Windows PCs, I think I should look at new hardware for it rather than start slapping band-aids on this build. Time to start researching how much unRAID likes Ryzen 3000/X570.
This topic is now archived and is closed to further replies.