rippernz

Server hang after upgrading from 6.3.0


Not wanting to thread jack, but I too have been having issues since upgrading to 6.3.2. I found the Fix Common Problems plugin and have it running in Troubleshooting mode. unRAID died again during the night (it's happening every other night between 3 and 5 am): shares are dead, the webUI is unresponsive (sometimes it responds, but the shares are gone and only come back after a full reboot), and SSH is down. I'll be combing through the logs sometime this evening or tomorrow, but wanted to let you know that you are not alone, rippernz.

 

I had no issues on the 6.2.x builds, and I don't recall any on 6.3.0 (possibly none on 6.3.1 either), but every second night (early morning) since 6.3.2, my server goes tits up. Wanted to share my experience in case our problems are related. I have NOT opened my own thread yet (still gathering information), but can if needed.

13 minutes ago, MisterLas said:

happening every other night between 3 and 5 am

 

Maybe that's when the mover runs? Did you try temporarily disabling it?

My mover is set to run every hour, I believe. Will double-check tonight.


I disabled my mover thinking it was the cause. Still having the same issue.

Sent from my SM-G930F using Tapatalk


I think I've been having a similar problem. It started after the latest update here as well. The whole webUI goes unresponsive and iowait times go sky high. When I go to the command line to check some system details, most of the time the commands just hang and I can't even Ctrl-C out. The system did it again just a moment ago; I had to force a shutdown and I'm now running a parity check.

 

I was thinking it could be a bottleneck from slow HDDs, with things like finished downloads overloading the transfer capacity. I started to look at the Docker containers, but could not find any clues as to whether they are the root of the problem. What containers do you have enabled and in use? Does anyone have any good tips on where to look when trying to find the cause of high iowait?
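For what it's worth, a couple of generic commands I'd try while the server is wedged (nothing unRAID-specific here; `ps` and `awk` should be on any Linux box):

```shell
# Processes stuck in uninterruptible sleep (state "D") are what drive iowait up.
# If the same process or mount point shows up every time it hangs, that's a suspect.
ps -eo pid,stat,wchan:30,cmd | awk 'NR==1 || $2 ~ /^D/'

# Kernel hung-task warnings usually land in the syslog when I/O stalls hard.
grep -i "blocked for more than" /var/log/syslog 2>/dev/null | tail
```

If SSH is already dead by the time it hangs, running the `ps` line from a local console (or on a loop into a file on the flash drive) might still catch it.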

 

I found this discussion on the Docker GitHub; could it be related to this one? They seem to have concluded it was a kernel bug.

41 minutes ago, johnnie.black said:

 

Any reiserfs disks?

 

Not mgladwin, but in my case I have zero ReiserFS disks; all XFS on my end. And my mover script runs once an hour.

 

My time frame is oddly consistent. It's happening every other night (well, the wee hours of the morning); it seems too predictable to be a coincidence. I am still on travel, so I have not had a chance to review the logs saved off by the FCP plugin in troubleshooting mode. Hopefully I'll get to that tomorrow, and tomorrow morning is a predicted failure time based on previous observations.

Edited by MisterLas


Well, after reverting back to 6.3.0 I have been up for 3 days 8 hrs; I never saw that when I was on 6.3.1 or 6.3.2. I'll try updating to 6.3.2 later and see how long it lasts.

 
Any reiserfs disks?


Yes, 3 ReiserFS disks and 1 XFS. My mover doesn't actually move anything because I don't use the cache for my shares, only appdata and vdisks.
My troubles started after adding the XFS disk, not necessarily after an OS update. From memory, I lasted about a week before the first crash/lock-up. Mine seems to lock up in the early hours of the morning as well. Are there any system cron jobs that run in the early morning apart from the mover? My first crash was at about 3:32 am, or at least that was the end of the syslog. Does that line up anywhere near the others?
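On the cron question, you can list everything cron is scheduled to run and eyeball it for anything around 03:00 (a rough sketch; on unRAID, plugins typically drop their schedules into /etc/cron.d):

```shell
echo "== root crontab =="
crontab -l 2>/dev/null || true       # the main user crontab
echo "== /etc/cron.d =="
ls /etc/cron.d 2>/dev/null || true   # per-plugin schedule files, if any
```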

Sent from my SM-G930F using Tapatalk

8 hours ago, mgladwin said:

Yes, 3 ReiserFS disks and 1 XFS

 

There are various reports of users having non-responsive servers with v6 and ReiserFS disks. Since you already have an XFS disk, you could temporarily limit all writes to that disk (by changing your share(s) to include only that one) and test for a few days to see if it helps. If it does, convert all remaining disks to XFS. (IMO you should convert anyway; there have been several ReiserFS-related issues lately, besides the fact that it can have terrible performance in certain situations.)
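For anyone unsure which disks are still ReiserFS, /proc/mounts shows the filesystem per mount point (unRAID mounts array disks at /mnt/disk*). The lines below are hypothetical sample data just to show the output shape:

```shell
# Hypothetical /proc/mounts lines for illustration:
printf '%s\n' \
  '/dev/md1 /mnt/disk1 reiserfs rw 0 0' \
  '/dev/md2 /mnt/disk2 xfs rw 0 0' |
awk '$2 ~ /^\/mnt\/disk/ {print $2, $3}'
# prints:
#   /mnt/disk1 reiserfs
#   /mnt/disk2 xfs

# On the server itself you would read /proc/mounts directly:
#   awk '$2 ~ /^\/mnt\/disk/ {print $2, $3}' /proc/mounts
```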

 
There are various reports of users having non-responsive servers with v6 and ReiserFS disks. Since you already have an XFS disk, you could temporarily limit all writes to that disk (by changing your share(s) to include only that one) and test for a few days to see if it helps. If it does, convert all remaining disks to XFS. (IMO you should convert anyway; there have been several ReiserFS-related issues lately, besides the fact that it can have terrible performance in certain situations.)


Thanks johnnie, I do plan on changing all disks over to XFS. I will try what you said in the meantime and see how it goes.

Sent from my SM-G930F using Tapatalk

2 hours ago, mgladwin said:

 


Thanks johnnie, I do plan on changing all disks over to XFS. I will try what you said in the meantime and see how it goes.

Sent from my SM-G930F using Tapatalk
While I won't argue against converting to XFS, I sincerely don't think that is the issue. I have zero ReiserFS filesystems, and I have repeatable failures.

 

Like clockwork, my 6.3.2 server died again during the night. I'm back from travel today and have two good sets of logs from the FCP plugin in troubleshooting mode that I can hopefully analyze today to see if anything is reported. I could possibly share them too, after I review what was actually captured.

Edited by MisterLas

5 minutes ago, MisterLas said:

 

While I won't argue against converting to XFS, I sincerely don't think that is the issue. I have zero ReiserFS filesystems, and I have repeatable failures.

 

Like clockwork, my 6.3.2 server died again during the night. I'm back from travel today and have two good sets of logs from the FCP plugin in troubleshooting mode that I can hopefully analyze today to see if anything is reported. I could possibly share them too, after I review what was actually captured.

 

You should start your own thread; a similar symptom does not always mean the same problem, and it's difficult to support more than one user in the same thread.

1 minute ago, johnnie.black said:

 

You should start your own thread; a similar symptom does not always mean the same problem, and it's difficult to support more than one user in the same thread.

 

Fair point. I was not meaning to thread jack, and I will definitely open my own thread if I go to share logs; I was simply posting my general experience in what I felt was a related situation.


So I'm just on 6 days back on 6.3.0 with no issues. I have just re-downloaded 6.3.2 and am about to restart for it to take effect; let's see how long it goes before it freaks out.


I did as johnnie.black suggested and limited writes away from all my ReiserFS disks, and I haven't had an issue since. I will update the thread I started with this info as well.


Well, it's been 4 days 17 hrs on 6.3.2 with no issues; Fix Common Problems is still running in troubleshooting mode.

Nothing on the system has changed from when it was hanging, so I'm a little stumped.


I reverted back to 6.3.0 but woke up to a locked system after about 36 hours. Glad you're doing well; not sure why I keep freezing.

 


I have updated my own thread but thought I would update here as well. I have converted all disks to XFS and am still having the same issue. It still seems to happen around 3 am for me. Frustrating, to say the least!


So, this morning at about 12:20 I re-enabled CA Backup & Restore to back up my appdata and Dockers, and enabled Fix Common Problems, and I woke up to a hung server.

 

It had been running about 9 days before I had to restart it due to the log filling up (I was getting blasted with emails every 10 minutes telling me it was filling up). After that restart it was up about 1 day 15 hrs before I re-enabled CA Backup & Restore.
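As an aside, when the log starts filling like that it's worth checking what's eating the space before rebooting (generic commands; unRAID keeps /var/log on a small RAM disk, so it fills quickly):

```shell
echo "== log space =="
df -h /var/log 2>/dev/null || true                          # how full the log filesystem is
echo "== biggest logs =="
du -ah /var/log 2>/dev/null | sort -rh | head -n 5 || true  # top offenders by size
```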

 

Is this the problem? Does anyone else have this running?

 

The backup time is 3 am; I'm not sure if that is the default or if I set it.

 

I have attached the logs.

 

 

FCPsyslog_tail.txt

lion-diagnostics-20170331-0733.zip


Mine is locking up at 3 am each morning as well, and the last thing in the log is the appdata backup. We might be onto something! Can you help, Squid?

Sent from my SM-G930F using Tapatalk


I also have the appdata backup running, and had a lockup this morning as well!

Just disabled it; will see if it's more stable.

 

Edited by grither


I still have mine set up to run tomorrow morning, so I expect to see a freeze tomorrow or the next day; then I shall disable it again and see what happens.

12 hours ago, rippernz said:

It had been running about 9 days before I had to restart it due to the log filling up (I was getting blasted with emails every 10 minutes telling me it was filling up). After that restart it was up about 1 day 15 hrs before I re-enabled CA Backup & Restore.

 

Is this the problem? Does anyone else have this running?

 

I do not have CA Backup and Restore configured to run, but I do have it installed.

 

I was still seeing the same issue on 6.3.0, but I downgraded to 6.2.4 and have been running for 3.5 days with no issues thus far... I'll see if I can make it through the weekend.

9 minutes ago, MisterLas said:

 

I do not have CA Backup and Restore configured to run, but I do have it installed.

 

I was still seeing the same issue on 6.3.0, but I downgraded to 6.2.4 and have been running for 3.5 days with no issues thus far... I'll see if I can make it through the weekend.

I also had it installed, but not set to run; I have now uninstalled it. I also downgraded to 6.2.4 and thought things were OK, but then got a hang after 6 days. Better than every two days.

 

Hopefully we're isolating the problem.

 

