Server hang after upgrading from 6.3.0



Not wanting to thread jack, but I too have been having issues since upgrading to 6.3.2.   I found the Fix Common Problems plugin and have it running in Troubleshooting mode.  unRAID died again during the night (happening every other night between 3-5 am) and shares are dead, webui unresponsive (sometimes is responsive but shares are gone and only come back after a full reboot), ssh is down.   Will be combing through the logs sometime this evening or tomorrow, but wanted to let you know that you are not alone, rippernz.

 

I had no issues on 6.2.x builds, and I don't recall any issues on 6.3.0 (possibly none with 6.3.1), but every second night (early morning) since 6.3.2, my server goes tits up.   Wanted to share my experience in case our problems are related.  I have NOT opened my own thread yet, still gathering information, but can if needed.

Link to comment

I think I've been having a similar problem. It started after the latest update here as well. The whole webUI goes unresponsive and iowait times go high as hell. When I go to the command line to check some system details, most of the time the commands just hang and I can't even CTRL-C out. The system did that just a moment ago; I had to force a shutdown and I'm now doing a parity check.

 

I was thinking it could be a bottleneck from slow HDDs, with things like finished downloads overloading the transfer capacity? I started to look at the Docker containers, but could not find any clues as to whether they are the root of this problem. What containers do you have enabled and in use? Does anyone have any good tips on where to look when trying to find the cause of high iowait?
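One place to start with the iowait question is to put a number on it. The following is a minimal sketch, assuming a Linux system that exposes the aggregate `cpu` line in `/proc/stat`; the function names are hypothetical and the 5-second interval is an arbitrary choice. It takes two samples and reports what share of the elapsed CPU time was spent in iowait.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples."""
    # Fields are: user, nice, system, idle, iowait, irq, softirq, steal, ...
    # Only the first 8 are summed; guest time is already counted in user/nice.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample(path="/proc/stat"):
    """Read the current aggregate CPU counters."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__":
    first = sample()
    time.sleep(5)  # assumed sampling interval
    second = sample()
    print(f"iowait over 5s: {iowait_percent(first, second):.1f}%")
```

This only shows that the system is waiting on I/O, not which process or disk is responsible. Commands that hang and ignore CTRL-C are often stuck in uninterruptible sleep (state `D` in `ps` output), so listing those processes may help narrow it down.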

 

I found this discussion on the Docker GitHub; could it be related to this one? They seem to have concluded that it is a kernel bug.

Link to comment
41 minutes ago, johnnie.black said:

 

Any reiserfs disks?

 

Not mgladwin, but I know in my case, I have 0 reiserfs.  All xfs on my end.    And my Mover script runs once an hour.  

 

My time frame is oddly consistent. It is happening every other night (well, the wee hours of the morning), which seems too predictable to be a coincidence. I am still travelling, so I have not had a chance to review the logs saved off from the FCP plugin in troubleshooting mode. Hopefully I will get to that tomorrow, and tomorrow morning is a predicted failure time based on previous observations.

Edited by MisterLas
Link to comment
 
Any reiserfs disks?


Yes 3 reiser disks and 1 xfs. My mover doesn't actually move anything because I don't use cache for my shares, only for appdata and vdisks.
My troubles started after adding the xfs disk, not necessarily after an OS update. From memory, I lasted about a week before the first crash/lock-up. Mine seems to lock up in the early hours of the morning as well. Are there any system cron jobs that run in the early morning apart from mover? My first crash was at about 3:32am, or at least that was the end of the syslog. Does that line up anywhere near others?

Sent from my SM-G930F using Tapatalk

Link to comment
8 hours ago, mgladwin said:

Yes 3 reiser disks and 1 xfs

 

There are various reports of users having non-responsive servers with v6 and reiserfs disks. Since you already have an xfs disk, you could temporarily limit all writes to that disk (by changing your share(s) to only include that one) and test for a few days to see if it helps. If it does, convert all remaining disks to xfs. (IMO you should convert anyway: there have been several reiserfs-related issues lately, besides the fact that reiserfs disks can have terrible performance in certain situations.)

Link to comment
 
There are various reports of users having non-responsive servers with v6 and reiserfs disks. Since you already have an xfs disk, you could temporarily limit all writes to that disk (by changing your share(s) to only include that one) and test for a few days to see if it helps. If it does, convert all remaining disks to xfs. (IMO you should convert anyway: there have been several reiserfs-related issues lately, besides the fact that reiserfs disks can have terrible performance in certain situations.)


Thanks johnnie, I do plan on changing all disks over to xfs. I will try what you said in the meantime and see how it goes.

Sent from my SM-G930F using Tapatalk

Link to comment
2 hours ago, mgladwin said:

 


Thanks johnnie, I do plan on changing all disks over to xfs. I will try what you said in the meantime and see how it goes.

Sent from my SM-G930F using Tapatalk
 

 

 

While I won't argue against converting to xfs, I sincerely don't think this is the issue.   I have 0 reiser filesystems, and have repeatable failures.   

 

Like clockwork, my 6.3.2 died again during the night.  I'm back from travel today and have 2 good sets of logs from FCP plugin in troubleshooting mode that I can hopefully analyze today and see if anything is reported.  Could possibly share out too after I review what is actually captured. 

Edited by MisterLas
Link to comment
5 minutes ago, MisterLas said:

 

While I won't argue against converting to xfs, I sincerely don't think this is the issue.   I have 0 reiser filesystems, and have repeatable failures.   

 

Like clockwork, my 6.3.2 died again during the night.  I'm back from travel today and have 2 good sets of logs from FCP plugin in troubleshooting mode that I can hopefully analyze today and see if anything is reported.  Could possibly share out too after I review what is actually captured. 

 

You should start your own thread; similar symptoms do not always mean the same problem, and it's difficult to support more than one user in the same thread.

Link to comment
1 minute ago, johnnie.black said:

 

You should start your own thread; similar symptoms do not always mean the same problem, and it's difficult to support more than one user in the same thread.

 

Fair point, was not meaning to thread jack, and definitely will open my own thread if I go to share logs, but was simply pasting my general experience in what I felt was a related situation.

Link to comment

So, this morning at about 12.20 I re-enabled CA Backup & Restore to back up my appdata and dockers and enabled Fix Common Problems, and I woke up to a hung server.

It had been running about 9 days before I had to restart it because the log was filling up and I was getting blasted with emails every 10 min telling me so. After that restart it was up about 1 day 15 hrs before I re-enabled CA Backup & Restore.

Is this the problem? Does anyone else have this running?

The backup time is 3am; I'm not sure if this is the default or if I set it.

I have attached the logs.

 

 

FCPsyslog_tail.txt

lion-diagnostics-20170331-0733.zip

Link to comment
12 hours ago, rippernz said:

It had been running about 9 days before I had to restart it because the log was filling up and I was getting blasted with emails every 10 min telling me so. After that restart it was up about 1 day 15 hrs before I re-enabled CA Backup & Restore.

Is this the problem? Does anyone else have this running?

 

 I do not have CA Backup and Restore configured to run, but do have it installed.  

 

I was still seeing the same issue on 6.3.0, but I downgraded to 6.2.4 and have been running for 3.5 days with no issues thus far... I'll see if I can make it through the weekend.

Link to comment
9 minutes ago, MisterLas said:

 

 I do not have CA Backup and Restore configured to run, but do have it installed.  

 

I was still seeing the same issue on 6.3.0, but I downgraded to 6.2.4 and have been running for 3.5 days with no issues thus far... I'll see if I can make it through the weekend.

I also had it installed, not set to run. I have now uninstalled it. I also downgraded to 6.2.4 and thought things were OK, but then got a hang after 6 days. Better than every two days.

Hope we're isolating the problem.

 

Link to comment
