damonkey Posted February 27, 2018 Hello, I hope someone can help me. I have been using UnRaid since 2011 without issue. In the last 2 months the system has started becoming unresponsive. I am running 6.4.1 Pro. I have tried to troubleshoot this on my own but can't seem to pin it down. I have tried reinstalling from scratch, reinstalling all apps one at a time, and running without apps. I do not have an answer. It just stops responding entirely: no web, no ssh, and a plugged-in keyboard and mouse won't respond. I have run Fix Common Problems in troubleshooting mode and attached the logs. If someone can look them over I would appreciate it. FCPsyslog_tail.txt unraid-diagnostics-20180225-1819.zip
trurl Posted February 27, 2018 Sounds like you have done some troubleshooting, but you don't mention having tried a memtest. Have you?
damonkey Posted February 27, 2018 Author Hey trurl, thanks. Yes, I have run a memtest overnight with no errors. I also changed the NIC setup from bond rr to bond alb to no bond with just 1 NIC, and swapped out the router. I tried different variations of docker apps (plex, sab, sonarr, radarr/couchpotato) running or not. Deleted all VMs.
JorgeB Posted February 27, 2018 Might be worth converting all disks to xfs. For some users there's a known issue with reiserfs on newer releases.
damonkey Posted February 27, 2018 Author Thanks johnnie, I figure my next step is to move my cache to xfs and test. Last step would be to move data to xfs.
JorgeB Posted February 27, 2018 40 minutes ago, damonkey said: "I figure my next step is to move my cache to xfs and test." Yes, as long as you limit all writes to the cache for some time, it's a good way to test if it helps.
damonkey Posted March 10, 2018 Author Well, I am doing some more testing. I tried a second Lexar thumb drive with a fresh UnRaid install and the system halted. I started UnRaid in maintenance mode and ran a file system check on all drives: no corruption. I created a SLAX boot USB and booted to it, and the system halted. So it is not UnRaid. I ran a 10 minute PSU test with no issues. I have started to pull memory DIMMs one at a time and boot to Slax, even though I have run memtest overnight two separate times with a pass. I have also ordered a new Supermicro 8-Port PCI-E x4 controller to replace the Silicon Image SIL3132 used for the cache drive. BONUS: I get to add more drives once I fix this. After testing memory I will unplug the HDDs and test them. Last will be the MB. I hope it is not that, because I would have to track down the same board or a good replacement for the current Supermicro x9scm-f.
willdouglas Posted March 10, 2018 Have you observed any output on the console when the system locks up? I'm seeing a similar issue and am also elbow deep in hardware tests that seem to pass with flying colors.
damonkey Posted March 10, 2018 Author Unfortunately no. I have tried running Fix Common Problems in maint mode and looked through the logs with no errors. I have run htop on the console with no notable issues. Right now I have pulled the PCI sil3132 card out, which leaves me with no cache, but it has been up for 3.5 hrs with no issues. I don't have docker running because of this, but until I see it go past 24 hrs I won't call it. I did try running with the sil3132 card and cache but no docker, and it froze then too. My fingers are crossed that it is the sil card, since I have the AOC-SASLP-MV8 on order now. I will post one way or another whether it freezes or works.
damonkey Posted March 10, 2018 Author Well, I spoke too soon. The system froze with an uptime of 03:45:16. I have attached a screenshot. Since it is frozen I can't run diagnostics. I did not have the sil card installed, so no cache, so no docker, so no Fix Common Problems in maint mode. I have shut down and pulled 2x2GB DIMMs out, so I am running 8GB now. Will see what happens.
trurl Posted March 11, 2018 1 hour ago, damonkey said: "so no cache, so no docker, so no Fix Common Problems in maint mode." Fix Common Problems is a plugin, not a docker, and will work just fine without a cache drive.
damonkey Posted March 11, 2018 Author Thanks trurl, I found that out after the boot. I am running the troubleshooting mode now while doing an mprime large test. Next step is to flash the BIOS again, then pull all drives and cables, boot to unraid, and let it sit for a while. I have never booted to unraid with no drives but I assume I can.
damonkey Posted March 11, 2018 Author So I let mprime run for 2:30 hrs with no issues, so the CPU should be good. I also have an i3 that I used before getting the E3; I put that in and it froze on that too. I then re-flashed the BIOS to the latest version and powered back on. Next will be to pull the drives.
trurl Posted March 11, 2018 12 hours ago, damonkey said: "Have never booted to unraid with no drives but I assume I can." Of course you won't be able to start the array, but it should boot. It would even be possible to test with only some disks. But if you mount any array disks separately, or start the array with any changes to the disk assignments, you will invalidate parity.
PeteB Posted March 12, 2018 This has happened to me four times. I cannot ping the server, the console does not respond, and there is no web access. I *now* have FCP troubleshooting mode enabled. I had my latest lockup just a few hours ago. When it happens there is nothing on the console apart from the "Login: " prompt. Looks like the same symptoms, so I'm watching your thread with much interest. A parity check is currently running as I had to hard reset. Edited March 12, 2018 by PeteB
damonkey Posted March 12, 2018 Author Hey Pete, my system has been up for 1 1/2 days. I came home today and shut down the server to start adding hardware back in one piece at a time. I had pulled 2x2GB DIMMs, my cache, and the controller card the cache is attached to. I have installed the controller card without the cache attached and powered back on. I will add the cache back tomorrow. I will continue to post daily until I determine the cause. Check out a post from WillDouglas. He is having similar issues and has not determined the root cause yet.
PeteB Posted March 13, 2018 The interesting thing is that I hadn't seen any of these sorts of posts prior to version 6.4. My system was completely stable prior to the upgrade to 6.4. I'm not saying it's a software problem (maybe 6.4 picks up errors that 6.3 didn't pick up), but there have been lots of these types of issues reported since 6.4. I wonder whether it would be helpful to factor that into the investigations. I'm considering going back to 6.3.5 to see if the problem goes away for me. That might contribute something to the investigations. Edited March 13, 2018 by PeteB
damonkey Posted March 13, 2018 Author I did roll back to 6.3 and still had lockups. I am leaning more toward hardware in my case. Maybe memory. I know that it is easy to blame the OS or an app a lot of the time, and for some it may be that. I would say to see if you can trace it back to any change that was made just before the issue started. I know I jumped to apps at first, then moved to the OS. But after pulling out any relatively new hardware and then slowly moving forward, I am seeing it could be a bad DIMM, one that I have had for 3~4 years. So things can go bad. I am not saying I have determined that it is. I have a lot of hardware and apps to add back, and any of them could be the culprit. I would say post your diag logs and let some of the guys here have a look at them. There are plenty of very helpful and knowledgeable members. And if it is an app, let the builder know. They want to know and will work to fix it. Same for the OS.
WarDave Posted March 13, 2018 I just had this same issue myself and I'm on 6.4.0. It started behaving really sluggishly, so I went to reboot it and it went into a parity check upon reboot. It also took like 7 to 10 mins to mount all the drives on boot.
trurl Posted March 13, 2018 If anybody other than damonkey wants support, they should post their diagnostics in their own thread.
damonkey Posted March 13, 2018 Author Thanks Trurl, I did not mean for others to post their logs here, just that they should make a post of their own.
PeteB Posted March 13, 2018 Hi Trurl, No problems. Not after support just yet, just thought my experience may add value to this investigation.
damonkey Posted March 14, 2018 Author Well, I updated unraid to 6.5.0 and rebooted, but did not start radarr yet or add my memory back. I am waiting for my new controller card. Once it arrives I will add it and test for a day or two. If that works, I will start radarr and test again, then add the memory last.
damonkey Posted March 15, 2018 Author System has been stable for 24 hrs. I added the SuperMicro AOC-SASLP-MV8 controller today. The one SSD cache drive is on it. I will wait another 24 hrs before adding radarr back, then the memory over the weekend. Fingers crossed. If that goes well I might convert the SSD from xfs to btrfs and add my second SSD back.
damonkey Posted March 17, 2018 Author Well, so far so good. The system has been stable now for over 48 hrs. I added radarr back Friday morning and it is still going. I went ahead and re-enabled my NIC bond (alb). I will wait another 24~48 hrs before adding the memory back. Or I might just order 2 new 4GB DIMMs to use. Been wanting to get more memory.