Unraid unresponsive every Sunday


This is the 4th week in a row that I have woken up to an unresponsive server. I cannot connect to the web GUI, SSH, or the console, and I have to hard reboot.

The issue always occurs on Sunday morning, and I am not sure what is causing it. I have attached my FCP (Fix Common Problems) log file and am hoping someone can shed some light on what is going on.

FCPsyslog_tail.zip

Link to comment
5 hours ago, thebaldconvict said:

Funnily enough, mine was down on Sunday morning too! What time did yours fail?

I run Observium in a Docker container and it all goes black at about 2am.

My backups run on Tuesday for Docker containers and Thursday for VMs, so it shouldn't have been doing anything then.

 

Mine fails at 4am, after the CA backups have completed. I ran the backup by hand last week to see if it was the culprit, and it ran fine.

 

Link to comment
9 hours ago, gubbgnutten said:

How about attaching the last diagnostics file FCP captured as well? :)

 

OK, stupid question time: I see jamesj2 has managed it, but how do I get the last diags file?

Ah OK, so I put FCP into Troubleshooting mode, and then when it does it again I just grab the latest diagnostics file....

Edited by thebaldconvict
Used the search better
Link to comment
11 minutes ago, thebaldconvict said:

 

OK, stupid question time: I see jamesj2 has managed it, but how do I get the last diags file?

The current diagnostics are at Tools - Diagnostics.

In addition, the Fix Common Problems plugin has a Troubleshooting mode that periodically saves diagnostics and a tail of the syslog to your flash drive, so you can retrieve them after a crash.

Link to comment

Have any of you got a keyboard and screen plugged into yours?

When mine did it, it was sitting at the login screen with no new messages or anything, but the keyboard was completely unresponsive as well.

But it runs the backups at 2am on Tuesday morning, so it isn't that on mine; in fact, it completed them successfully this morning...

Edited by thebaldconvict
Link to comment

I noticed midweek that a Windows 10 VM I use to run Jackett wasn't running. I've been running Jackett inside a VM because I was getting a lot of Mono errors in Docker. It's the only VM I have running. I do have VirtualBox installed, with no active machines. This morning my server locked up again. I'll try turning off the VM next Saturday and see what happens on Sunday.

Link to comment

Actually, I might have figured it out.... Flaky RAM. Ages ago (around 2 years ago) my server went into a boot loop: the fans would come on, it would power off after about 2 seconds, and the cycle would repeat without ever showing POST.

 

I fixed that by doing a BIOS reset and all was well until this all started.

 

Well, I tried to reboot after the message above and got the same boot loop as before, but this time it kept doing it even after I stripped everything from the PC (all drives and cards) and did a BIOS reset. If I take one of the RAM DIMMs out, it POSTs fine; if I swap the other DIMM in, it loops again. The fault follows the stick regardless of the socket: whenever that stick is in the board, it won't POST.

 

It is now limping along with only 8GB of RAM and showing 86% used on the dashboard. I'm not sure that is enough, but it is on and working for now...

I'll monitor it like this and see how it goes. Whether it runs out of RAM or stays stable, I'll replace the stick; the only problem is that I can't get the same spec, so I will have to replace the pair.

Link to comment
  • 2 weeks later...

I ran my server with only 8GB for a couple of days while I found a matching stick of RAM, and all seemed well.

I have since put in a new 8GB stick, taking it back to 16GB, and have an uptime of 8 days so far. Not huge, but much better, and it's still climbing...

Fingers crossed that was it!

Better still, the matching stick cost a grand total of £20; a new set would have been over £100.

Edited by thebaldconvict
Added price
Link to comment

No, sorry, I meant mine has been behaving for just over a week now (early days still, I know), but it was behaving just like yours before the RAM died, so I was wondering if yours could be a similar fault?

Hence maybe run memtest for a couple of passes. ☺

It isn't unheard of in the Windows world for later versions of Windows to show up RAM issues, so I didn't know if maybe the Linux kernel could have the same effect.

Link to comment

I may run memtest when I get a chance, just to be sure. The server was pretty solid until I upgraded to 6.3.2. I can't remember when I installed Community Applications, though. Since it locks up after the backup finishes on Sunday morning, I'll try removing CA next.

Link to comment

Mine was as well; it would stay up for months quite happily, and it seemed to lock up mostly on the backup days too.

It's run through two CA backups since removing that stick, and I also have a script that stops the VMs, backs them up to the array, starts them, and then zips the disk images, which must be pretty taxing since they are about 120GB each.
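
The VM backup sequence described above (stop the VM, copy its disk image to the array, start it again, then compress the copy) could be sketched roughly as follows. This is not the poster's actual script: the VM name, the paths, and the use of `gzip` for the compression step are assumptions for illustration, and the sketch only builds and prints the commands instead of running them.

```python
import shlex
from pathlib import Path


def build_backup_plan(vm_name, disk_image, backup_dir):
    """Return the ordered commands for one VM backup as argv lists.

    All arguments are hypothetical examples supplied by the caller.
    """
    src = Path(disk_image)
    copy = Path(backup_dir) / src.name
    return [
        # Ask the guest to shut down cleanly. Note: `virsh shutdown` returns
        # before the guest is actually off, so a real script would have to
        # wait (for example by polling `virsh domstate`) before copying.
        ["virsh", "shutdown", vm_name],
        # Copy the disk image to the backup location on the array.
        ["cp", "--sparse=always", str(src), str(copy)],
        # Bring the VM back up before the slow compression step.
        ["virsh", "start", vm_name],
        # Compress the copy, never the live image (gzip is an assumption;
        # the post only says the images are "zipped").
        ["gzip", "--force", str(copy)],
    ]


if __name__ == "__main__":
    # Hypothetical VM name and paths, printed as a dry run.
    plan = build_backup_plan(
        "Windows10",
        "/mnt/user/domains/Windows10/vdisk1.img",
        "/mnt/user/backups/vms",
    )
    for argv in plan:
        print(shlex.join(argv))
```

Starting the VM before compressing keeps its downtime to the length of the copy, while the compression of a roughly 120GB image is the part that keeps CPU, RAM, and disks busy for a long stretch.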

It's done that twice as well, and people have been streaming off it via Plex pretty much constantly this week, with it being the school holidays.

One thing I never tried was downgrading Unraid.

Link to comment
