Unraid unresponsive every Sunday

jamesj2 · March 20, 2017

This is the fourth week in a row that I have woken up to an unresponsive server. I cannot connect to the web GUI, SSH, or console, and have to hard reboot.

The issue always occurs on Sunday morning. I am not sure what is causing it. I have attached my FCP log file and am hoping someone might be able to shed some light as to what is going on.

FCPsyslog_tail.zip

thebaldconvict · March 20, 2017

Funnily enough mine was down on Sunday morning too! What time did yours fail?

I run observium as a docker and it all goes black at about 2am.

My backups happen on a Tuesday for dockers and Thursday for VMs, so it shouldn't have been doing anything then.

gubbgnutten · March 20, 2017

How about attaching the last diagnostics file FCP captured as well?

jamesj2 · March 20, 2017

Sorry, I grabbed the wrong file. Here's the last one that FCP captured.

rock-diagnostics-20170319-0403.zip

jamesj2 · March 20, 2017

5 hours ago, thebaldconvict said:

Funnily enough mine was down on Sunday morning too! What time did yours fail?

I run observium as a docker and it all goes black at about 2am.

My backups happen on a Tuesday for dockers and Thursday for VMs, so it shouldn't have been doing anything then.

Mine fails @ 4am after the CA backups have completed. I did try running the backup by hand last week to see if it was the culprit and it ran fine.

thebaldconvict · March 20, 2017

9 hours ago, gubbgnutten said:

How about attaching the last diagnostics file FCP captured as well?

OK stupid question time, I see jamesj2 has managed it but how do I get the last diags file?

Ah ok so I throw FCP into diagnostics mode and then when it does it again I just grab the latest diagnostics file....

Edited March 20, 2017 by thebaldconvict
Used the search better

trurl · March 20, 2017

11 minutes ago, thebaldconvict said:

OK stupid question time, I see jamesj2 has managed it but how do I get the last diags file?

The current diagnostics are at Tools - Diagnostics.

In addition to this, the Fix Common Problems plugin has a Troubleshooting mode that will periodically save Diagnostics and the tailing syslog to your flash drive so you can get them in the event of a crash.
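Once several of these archives have piled up, the most recent one can be picked from the filename alone, since the names seen in this thread embed a YYYYMMDD-HHMM timestamp. A minimal Python sketch, assuming that naming pattern holds for every archive; the helper names `diagnostics_time` and `latest_diagnostics` are made up for illustration, and where the files are stored is deliberately not assumed here.

```python
import re
from datetime import datetime

# Matches names such as "tower-diagnostics-20170320-1933.zip".
PATTERN = re.compile(r"-diagnostics-(\d{8})-(\d{4})\.zip$")


def diagnostics_time(name):
    """Parse the timestamp embedded in a diagnostics filename, or None."""
    match = PATTERN.search(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M")


def latest_diagnostics(names):
    """Return the filename with the newest embedded timestamp, or None."""
    dated = [(diagnostics_time(n), n) for n in names]
    dated = [(t, n) for t, n in dated if t is not None]
    if not dated:
        return None
    return max(dated)[1]


names = [
    "tower-diagnostics-20170320-1933.zip",
    "tower-diagnostics-20170326-1152.zip",
    "FCPsyslog_tail.zip",  # no timestamp in the name, so it is ignored
]
print(latest_diagnostics(names))  # tower-diagnostics-20170326-1152.zip
```

In practice the list of names would come from something like `os.listdir()` on whichever folder holds the saved archives.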

thebaldconvict · March 20, 2017

Well this is the current output, I will turn on troubleshooting mode also so I can get those when (if) it does it again..

tower-diagnostics-20170320-1933.zip

bigsing · March 20, 2017

I have the same problem after CA Backup runs on Monday morning. The system is unresponsive and I have to hard reboot. I'll turn on troubleshooting mode this weekend to see if I can capture anything.

thebaldconvict · March 21, 2017

Have any of you got a keyboard and screen plugged into yours?

When mine did it, it was sat at the login screen with no new messages or anything, but the keyboard was completely unresponsive as well.

But it runs the backups at 2am Tuesday morning so it isn't that on mine, in fact it completed them successfully this morning...

Edited March 21, 2017 by thebaldconvict

jamesj2 · March 21, 2017

I have a USB keyboard and monitor plugged in. When it hangs I can wake up the console and switch terminals, but I can't log in.

kizer · March 21, 2017

I'd run tail on your log so it outputs while you're asleep. Maybe you'll catch something on the screen when it freezes up.
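A variation on the same idea that does not depend on the screen is to mirror the log to storage that survives a hard reboot. This is only a rough Python sketch: it assumes Python is available on the server, that the log lives at /var/log/syslog, and that a file on the flash drive such as /boot/logs/syslog-mirror.txt is an acceptable destination. Both paths and the helper names `copy_new_lines` and `mirror_forever` are assumptions for illustration.

```python
import os
import time


def copy_new_lines(src_path, dst_path, offset):
    """Append bytes written to src_path since `offset` onto dst_path.

    Returns the new offset. The copy is fsync'd so the data is on disk
    even if the machine locks up immediately afterwards.
    """
    size = os.path.getsize(src_path)
    if size < offset:
        # The source shrank (for example it was rotated); start from the top.
        offset = 0
    with open(src_path, "rb") as src:
        src.seek(offset)
        chunk = src.read()
    if chunk:
        with open(dst_path, "ab") as dst:
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())
    return offset + len(chunk)


def mirror_forever(src_path, dst_path, interval=5.0):
    """Poll src_path every `interval` seconds and mirror anything new."""
    offset = 0
    while True:
        offset = copy_new_lines(src_path, dst_path, offset)
        time.sleep(interval)
```

Calling `mirror_forever("/var/log/syslog", "/boot/logs/syslog-mirror.txt")` would start the loop. Frequent writes wear out a flash drive, so this is something to leave running only while chasing a crash.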

thebaldconvict · March 26, 2017

OK so mine is still alive (just)

It is sending loads of call traces to the syslog at the moment. I have attached my latest diagnostics.

Is there a safe way to downgrade Unraid as I suspect it might be to do with the latest version(s)?...

tower-diagnostics-20170326-1152.zip
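To get a rough idea of how many call traces there are and where they start, the syslog inside a diagnostics zip can be scanned for the `Call Trace:` marker that the Linux kernel prints. A minimal Python sketch; the helper name `find_call_traces` is hypothetical, and the sample lines are placeholders made up for illustration, not taken from the attached file.

```python
def find_call_traces(lines):
    """Return (line_number, text) for each line containing 'Call Trace:'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if "Call Trace:" in line:
            hits.append((number, line.rstrip("\n")))
    return hits


# Placeholder lines for illustration only, not real log output.
sample = [
    "(timestamp) kernel: Call Trace:\n",
    "(timestamp) kernel: some other message\n",
    "(timestamp) kernel: Call Trace:\n",
]

traces = find_call_traces(sample)
print(len(traces), "call traces found")
for number, text in traces:
    print(number, text)
```

For a real log, an open file object can be passed in place of the list, for example `find_call_traces(open(path, errors="replace"))`, since the function only iterates over lines.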

jamesj2 · March 26, 2017

My server is still working this Sunday. No hard reboot needed. Only thing different is I updated community applications, ca update applications, user scripts, dynamix system stats, virtualbox, and fix common problems during the last week.

thebaldconvict · April 2, 2017

Mine went again at 08:59 yesterday morning. It spread a load of stuff across the screen that is plugged in and hard locked with only this in the syslog:

kernel:page:ffffea000d092b00 count:0 mapcount:0 mapping: (null) index:0x1

Weird, huh?

jamesj2 · April 2, 2017

I noticed midweek that a Windows 10 VM I use to run Jackett wasn't running. I've been running Jackett inside a VM since I was getting a lot of mono errors in Docker. It's the only VM I have running. I do have VirtualBox installed with no active machines. And this morning my server locked up again. I'll have to try turning off the VM next Saturday and see what happens on Sunday.

thebaldconvict · April 2, 2017

Actually, I might have figured it out.... Flakey RAM. Ages ago (around 2 years ago) my server had gone into a boot loop situation where the fans would come on, then it would power off after about 2 seconds and cycle round in a loop, it would never show POST.

I fixed that by doing a BIOS reset and all was well until this all started.

Well, I tried to reboot after the message above and got the same boot loop as before, but after stripping everything from the PC (all drives and cards) and a BIOS reset it was still doing it. If I take one of the RAM DIMMs out it POSTs fine; swap the other DIMM in and it loops again. It follows the stick regardless of the socket: whenever that stick is in the board it won't POST.

It is now limping along with only 8GB RAM and showing 86% used on the dashboard, not sure it is enough but it is on and working for now...

I'll monitor it like this and see how it goes, if it runs out of RAM or is stable I'll replace the stick, only problem is I can't get the same spec so will have to replace the pair.

thebaldconvict · April 12, 2017

Ran my server with only 8GB for a couple of days while I found a matching stick of RAM, and all seemed well.

I have since put a new 8GB stick in, taking it back to 16GB, and have an uptime of 8 days so far. Not huge, but much better, and it's still climbing...

Fingers crossed that was it!

Better still the matching stick cost a grand total of £20, a new set would have been over £100.

Edited April 12, 2017 by thebaldconvict
Added price

jamesj2 · April 12, 2017

Disabling the Windows 10 VM didn't help. Now that I've updated to Unraid 6.3.3 and CA 2017.04.09, we'll see what happens next Sunday...

thebaldconvict · April 12, 2017

Is it possible it could be a similar thing? Is it possible that a newer kernel or KVM version is driving the RAM harder? It could be worth running the memtest86 on the startup menu for a few passes...

jamesj2 · April 12, 2017

IMHO I don't think it would drive the RAM harder. If it passes memtest your memory should be good. How do the capacitors look on your motherboard? If they're leaking or popped up then you can have random issues.

thebaldconvict · April 12, 2017

No sorry, I meant mine has been behaving for just over a week now (early days still I know) but was behaving just like yours before the ram died so I was wondering if maybe yours could be a similar fault?

Hence maybe run memtest for a couple of passes. ☺

It isn't unheard of in the Windows world for later versions of Windows to show up ram issues so didn't know if maybe the Linux kernel could have the same effect.

jamesj2 · April 12, 2017

I may run memtest when I get a chance just to be sure. The server was pretty solid until I upgraded to 6.3.2. I can't remember when I installed Community Applications though. Since it locks up after the backup finishes on Sunday morning, I'll try removing CA next.

thebaldconvict · April 12, 2017

Mine was as well, would stay up for months quite happily and it seemed to lock on the backup days mostly too.

It's run through two CA backups since removing that stick. I also have a script that stops the VMs, backs them up to the array, starts them and then zips the disk images, which must be pretty taxing since they are about 120GB each.

It's done that twice also and people have been streaming off it via plex pretty much constantly this week with it being school holidays.

One thing I never tried was downgrading unraid.