Unraid Unresponsive v6.4.1


damonkey


Hello, I hope someone can help me. I have been using UnRaid since 2011 without issue. In the last 2 months the system has started becoming unresponsive.

I am running 6.4.1 Pro. I have tried to troubleshoot this on my own but can't seem to pin it down. I have tried reinstalling from scratch, reinstalling all apps one at a time, and running without apps, and I still do not have an answer. The system just stops responding entirely: no web, no SSH, and a plugged-in keyboard and mouse won't respond. I have run Fix Common Problems in troubleshooting mode and attached the logs. If someone could look them over I would appreciate it.

FCPsyslog_tail.txt

unraid-diagnostics-20180225-1819.zip

  • 2 weeks later...

Well, I am doing some more testing. I tried a second Lexar thumb drive with a fresh UnRaid install and the system halted. I started UnRaid in maintenance mode and ran a file system check on all drives: no corruption. I created a Slax USB stick and booted to it, and the system halted again. So it is not UnRaid :D. I ran a 10-minute PSU test with no issues. I have started to pull memory DIMMs one at a time and boot to Slax, even though I have run memtest overnight on two separate occasions and it passed both times. I have also ordered a new Supermicro 8-port PCI-E x4 controller to replace the Silicon Image SIL3132 used for the cache drive. BONUS: I get to add more drives once I fix this 9_9. After testing memory I will unplug the HDDs and test them. Last will be the motherboard. I hope it is not that, because I would have to track down the same board or a good replacement for my current Supermicro X9SCM-F.


Unfortunately no. I have tried running Fix Common Problems in maint mode and looked through the logs, and found no errors. I have run htop on the console with no notable issues. Right now I have pulled the PCI SIL3132 card out, which leaves me with no cache, but the system has been up for 3.5 hrs with no issues. I don't have Docker running because of this, but until I see it go past 24 hrs I won't conclude the card was the cause. I did try running with the SIL3132 card and cache but no Docker, and it froze then too. My fingers are crossed that it is the SIL card, since I have the AOC-SASLP-MV8 on order now. I will post either way, whether it freezes or works.

 

 


Well, I spoke too soon. The system froze with an uptime of 03:45:16. I have attached a screenshot. Since it is frozen I can't run diagnostics. I did not have the SIL card installed, so no cache, so no Docker, so no Fix Common Problems with maint mode. I have shut down and pulled 2x2GB DIMMs out, so I am running 8GB now. Will see what happens.

IMG_2911.thumb.JPG.3525b69090980147f1444ed7189d8df3.JPG


Thanks trurl, I found that after the boot. I am running troubleshooting mode now while doing an mprime large test. The next step is to flash the BIOS again, then pull all drives and cables, boot to UnRaid, and let it sit for a while. I have never booted UnRaid with no drives, but I assume I can.

12 hours ago, damonkey said:

Have never booted to unraid with no drives but I assume I can.

Of course you won't be able to start the array, but it should boot. It would even be possible to test with only some disks. But if you mount any array disks separately, or start the array with any changes to the disk assignments, you will invalidate parity.


This has happened to me four times. Cannot ping the server, console not responding, no web access. I *now* have FCP troubleshooting mode enabled. Just had my last lockup a few hours ago. When it happens there is nothing on the console apart from the "Login: " prompt. Looks like the same symptoms, so I'm watching your thread with much interest. Parity check is currently running as I had to hard reset.

Edited by PeteB

Hey Pete, my system has been up for 1 1/2 days. I came home today and shut down the server to start adding hardware back in one piece at a time. I had pulled 2x2GB DIMMs, my cache drive, and the controller card it attaches to. I have now installed the controller card without the cache attached and powered back on. I will add the cache back tomorrow. I will continue to post daily until I determine the cause. Check out a post from WillDouglas. He is having similar issues and has not determined the root cause yet.


The interesting thing is that I haven't seen any of these sorts of posts prior to version 6.4. My system was completely stable prior to the upgrade to 6.4.

 

I'm not saying it's a software problem (maybe 6.4 picks up errors that 6.3 didn't), but there have been lots of these types of issues reported since 6.4. I wonder whether it would be helpful to factor that into the investigations. I'm considering going back to 6.3.5 to see if the problem goes away for me. That might contribute something to the investigations.

Edited by PeteB

I did roll back to 6.3 and still had lockups. In my case I am leaning more toward a hardware cause, maybe memory. I know it is easy to blame the OS or an app a lot of the time, and for some people that may be the cause. I would say see if you can trace it back to any change that was made just before the issue started. I know I blamed an app at first, then moved on to the OS. But after pulling out any relatively new hardware and then slowly moving forward, I am seeing that it could be a bad DIMM, one that I have had for 3~4 years. So things can go bad. I am not saying I have determined that it is. I still have a lot of hardware and apps to add back, and any of them could be the culprit. I would say post your diagnostics logs and let some of the guys here have a look at them. There are plenty of very helpful and knowledgeable members. And if it is an app, let the builder know; they want to know and will work to fix it. Same for the OS.


The system has been stable for 24 hrs. I added the Supermicro AOC-SASLP-MV8 controller today, and the one SSD cache drive is on it. I will wait another 24 hrs before adding back Radarr, then the memory over the weekend. Fingers crossed. If that goes well I might convert the SSD from XFS to Btrfs and add my second SSD back.


Well, so far so good. The system has been stable now for over 48 hrs. I added Radarr back Friday morning and it is still going. I went ahead and re-enabled my NIC bond (alb). I will wait another 24~48 hrs before adding the memory back. Or I might just order 2 new 4GB DIMMs to use. I have been wanting to get more memory.
