jamesj2 Posted March 20, 2017
This is the fourth week in a row that I have woken up to an unresponsive server. I cannot connect to the web GUI, SSH, or the console, and I have to hard reboot. The issue always occurs on Sunday morning, and I am not sure what is causing it. I have attached my FCP log file and am hoping someone can shed some light on what is going on.
Attachment: FCPsyslog_tail.zip
thebaldconvict Posted March 20, 2017
Funnily enough, mine was down on Sunday morning too! What time did yours fail? I run Observium as a Docker container and it all goes black at about 2am. My backups happen on Tuesday for Docker containers and Thursday for VMs, so it shouldn't have been doing anything then.
gubbgnutten Posted March 20, 2017
How about attaching the last diagnostics file FCP captured as well?
jamesj2 (Author) Posted March 20, 2017
Sorry, I grabbed the wrong file. Here's the last one FCP captured.
Attachment: rock-diagnostics-20170319-0403.zip
jamesj2 (Author) Posted March 20, 2017
5 hours ago, thebaldconvict said: "Funnily enough mine was down on Sunday morning too! What time did yours fail? I run observium as a docker and it all goes black at about 2am. My backups happen on a Tuesday for dockers and Thursday for Vms so it shouldn't have been doing anything then."
Mine fails at 4am, after the CA backups have completed. I tried running the backup by hand last week to see if it was the culprit, and it ran fine.
thebaldconvict Posted March 20, 2017
9 hours ago, gubbgnutten said: "How about attaching the last diagnostics file FCP captured as well?"
OK, stupid question time: I see jamesj2 has managed it, but how do I get the last diagnostics file? Ah, OK, so I put FCP into troubleshooting mode, and when the server locks up again I just grab the latest diagnostics file.
Edited March 20, 2017 by thebaldconvict: Used the search better
trurl Posted March 20, 2017
11 minutes ago, thebaldconvict said: "OK stupid question time, I see jamesj2 has managed it but how do I get the last diags file?"
The current diagnostics are at Tools - Diagnostics. In addition, the Fix Common Problems plugin has a Troubleshooting mode that periodically saves diagnostics and the tail of the syslog to your flash drive, so you can retrieve them after a crash.
thebaldconvict Posted March 20, 2017
Well, this is the current output. I will also turn on troubleshooting mode so I can get those files when (if) it does it again.
Attachment: tower-diagnostics-20170320-1933.zip
bigsing Posted March 20, 2017
I have the same problem after CA Backup runs on Monday morning. The system is unresponsive and I have to hard reboot. I'll turn on troubleshooting mode this weekend to see if I can capture anything.
thebaldconvict Posted March 21, 2017
Have any of you got a keyboard and screen plugged into yours? When mine locked up, it was sitting at the login screen with no new messages, and the keyboard was completely unresponsive too. Mine runs its backups at 2am on Tuesday morning, so that isn't the cause here; in fact, they completed successfully this morning.
Edited March 21, 2017 by thebaldconvict
jamesj2 (Author) Posted March 21, 2017
I have a USB keyboard and monitor plugged in. When it hangs I can wake up the console and switch terminals, but I can't log in.
kizer Posted March 21, 2017
I'd run tail on your log so it keeps printing to the screen while you're asleep. Maybe you'll catch something on the screen when it freezes up.
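For anyone who wants to script kizer's suggestion rather than run `tail -f` directly, here is a minimal Python sketch of the same idea. The function name `follow`, its `max_idle_polls` and `from_start` parameters, and the log path mentioned below are all illustrative assumptions, not something from this thread. It does not handle log rotation, and a line that is only partly written when it is read may come out in two pieces.

```python
import os
import time
from typing import Iterator, Optional


def follow(path: str, poll_seconds: float = 1.0,
           max_idle_polls: Optional[int] = None,
           from_start: bool = False) -> Iterator[str]:
    """Yield lines as they are appended to `path`, roughly like `tail -f`.

    max_idle_polls and from_start are hypothetical conveniences so the
    loop can be stopped or replayed; with the defaults it follows forever.
    """
    with open(path, "r", errors="replace") as handle:
        if not from_start:
            # Skip whatever is already in the file and wait for new lines.
            handle.seek(0, os.SEEK_END)
        idle_polls = 0
        while max_idle_polls is None or idle_polls < max_idle_polls:
            line = handle.readline()
            if line:
                idle_polls = 0
                yield line.rstrip("\n")
            else:
                # Nothing new yet: count the empty poll and wait a little.
                idle_polls += 1
                time.sleep(poll_seconds)
```

Looping over `follow("/var/log/syslog")` and printing each entry with `flush=True` on the attached console should leave the last messages on screen if the box hard locks. The path is an assumption; adjust it to wherever your syslog lives.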
thebaldconvict Posted March 26, 2017
OK, so mine is still alive (just). It is sending loads of call traces to the syslog at the moment; I have attached my latest diagnostics. Is there a safe way to downgrade Unraid? I suspect it might be to do with the latest version(s).
Attachment: tower-diagnostics-20170326-1152.zip
jamesj2 (Author) Posted March 26, 2017
My server is still working this Sunday. No hard reboot needed. The only difference is that during the last week I updated Community Applications, CA Update Applications, User Scripts, Dynamix System Stats, VirtualBox, and Fix Common Problems.
thebaldconvict Posted April 2, 2017
Mine went again at 08:59 yesterday morning. It spread a load of stuff across the screen that is plugged in and hard locked, with only this in the syslog:
kernel:page:ffffea000d092b00 count:0 mapcount:0 mapping: (null) index:0x1
Weird, huh?
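The syslog line above looks like a kernel page-state dump. As a rough aid for spotting similar entries in a saved syslog, here is a small Python sketch that pulls the fields out of such a line. The regular expression is inferred from that single line only, so treat the format as an assumption, and `parse_page_dump` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the one line quoted in this thread (assumption):
# page:<hex> count:<int> mapcount:<int> mapping:<(null) or hex> index:<0xhex>
PAGE_DUMP = re.compile(
    r"page:(?P<page>[0-9a-f]+) "
    r"count:(?P<count>-?\d+) "
    r"mapcount:(?P<mapcount>-?\d+) "
    r"mapping:\s*(?P<mapping>\(null\)|[0-9a-f]+) "
    r"index:(?P<index>0x[0-9a-f]+)"
)


def parse_page_dump(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a page dump line, or None if the line differs."""
    match = PAGE_DUMP.search(line)
    return match.groupdict() if match else None
```

Running `parse_page_dump` over every line of a captured syslog and keeping the results that are not `None` gives a quick list of page dumps to attach to a post, alongside the full diagnostics.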
jamesj2 (Author) Posted April 2, 2017
Midweek I noticed that the Windows 10 VM I use to run Jackett wasn't running. I've been running Jackett inside a VM because I was getting a lot of Mono errors in Docker. It's the only VM I have running. I do have VirtualBox installed, with no active machines. And this morning my server locked up again. I'll try turning off the VM next Saturday and see what happens on Sunday.
thebaldconvict Posted April 2, 2017
Actually, I might have figured it out: flaky RAM.
Around 2 years ago my server went into a boot loop where the fans would come on, it would power off after about 2 seconds, and then it would cycle round again without ever showing POST. I fixed that with a BIOS reset, and all was well until this all started.
I tried to reboot after the message above and got the same boot loop as before, but this time it kept doing it even after I stripped everything from the PC (all drives and cards) and reset the BIOS. If I take one of the RAM DIMMs out it POSTs fine; swap the other DIMM in and it loops again. The fault follows the stick regardless of the socket: whenever that stick is in the board, it won't POST.
It is now limping along with only 8GB of RAM and showing 86% used on the dashboard. I'm not sure that is enough, but it is on and working for now. I'll monitor it like this and see how it goes; whether it runs out of RAM or stays stable, I'll replace the stick. The only problem is that I can't get the same spec, so I will have to replace the pair.
thebaldconvict Posted April 12, 2017
I ran my server with only 8GB for a couple of days while I found a matching stick of RAM, and all seemed well. I have since added a new 8GB stick, taking it back to 16GB, and have an uptime of 8 days so far. Not huge, but much better, and it's still climbing. Fingers crossed that was it! Better still, the matching stick cost a grand total of £20; a new set would have been over £100.
Edited April 12, 2017 by thebaldconvict: Added price
jamesj2 (Author) Posted April 12, 2017
Disabling the Windows 10 VM didn't help. Now that I've updated to Unraid 6.3.3 and CA 2017.04.09, we'll see what happens next Sunday.
thebaldconvict Posted April 12, 2017
Is it possible it could be a similar thing? Is it possible that a newer kernel or KVM version is driving the RAM harder? It could be worth running memtest86 from the startup menu for a few passes.
jamesj2 (Author) Posted April 12, 2017
IMHO, I don't think it would drive the RAM harder. If it passes memtest, your memory should be good. How do the capacitors look on your motherboard? If they're leaking or bulging, you can get random issues.
thebaldconvict Posted April 12, 2017
No, sorry, I meant that mine has been behaving for just over a week now (still early days, I know), but it was behaving just like yours before the RAM died, so I was wondering whether yours could be a similar fault. Hence the suggestion to run memtest for a couple of passes. ☺ It isn't unheard of in the Windows world for later versions of Windows to show up RAM issues, so I didn't know whether the Linux kernel could have the same effect.
jamesj2 (Author) Posted April 12, 2017
I may run memtest when I get a chance, just to be sure. The server was pretty solid until I upgraded to 6.3.2. I can't remember when I installed Community Applications, though. Since it locks up after the backup finishes on Sunday morning, I'll try removing CA next.
thebaldconvict Posted April 12, 2017
Mine was as well. It would stay up for months quite happily, and it seemed to lock up mostly on the backup days too. It has run through two CA backups since I removed that stick. I also have a script that stops the VMs, backs them up to the array, starts them again, and then zips the disk images, which must be pretty taxing since they are about 120GB each. It has done that twice as well, and people have been streaming off it via Plex pretty much constantly this week with it being the school holidays. One thing I never tried was downgrading Unraid.