hmnd Posted May 18, 2019
I've scheduled parity checks to run monthly, but a parity check keeps starting every few hours, which causes all my Docker containers to stop working. I've attached diagnostics in case that helps (bigmac-diagnostics-20190518-0156.zip). Thanks in advance!
trurl Posted May 18, 2019
Your diagnostics are from just after a reboot, and you are getting a parity check due to an unclean shutdown. Are you sure you aren't losing power?
hmnd (Author) Posted May 18, 2019
Oh! You're right. I think I may need to replace my UPS. Can you tell what caused the shutdown, or is it just speculation?
Frank1940 Posted May 18, 2019
23 minutes ago, hmnd said: "I think I may need to replace my UPS."
It might be just the battery. You can check this with a couple of lights with incandescent lamps in them. (You want about 150W of load.) Shut the server down and plug in the lamps. Now pull the UPS power plug from the mains. Make sure it runs for about ten minutes on the battery. If it runs for less than five minutes, you need to replace the battery. Batteries for home-market UPSs are fairly standardized in size and capacity. There seem to be only two or three different ones, and you can measure them with a ruler to get the size you need.
hmnd (Author) Posted May 18, 2019
21 minutes ago, Frank1940 said: "It might be just the battery. [...]"
I actually have a bunch of UPSs I can try, and a second one I tried just now had the same issue. I lowered the sensitivity on the UPS, as it was set to high. Hopefully that fixes it.
Frank1940 Posted May 18, 2019
There are a couple of things you should try. First, run memtest for 24 hours from the boot menu. (The built-in memtest will not check ECC memory.) Second, test the power supply. This is best done by replacement. Many folks have one in the junk box, can borrow one from a friend, or can even 'borrow' one from a vendor with a liberal return policy.
8 hours ago, hmnd said: "Lowered the sensitivity on the UPS, as it was on high. Hopefully that fixes it."
This should not really fix it. This is what a UPS is supposed to protect against. The only thing that lowering the sensitivity should do is reduce the number of transfers from mains to battery. You could take a screenshot of your UPS shutdown settings and post it. (Many folks haven't really thought things through when setting up a UPS.)
hmnd (Author) Posted May 18, 2019
11 hours ago, Frank1940 said: "First, run memtest for 24 hours from the boot menu. (The built-in memtest will not check ECC memory.)"
I did a bunch of memtests a week ago and all passed.
11 hours ago, Frank1940 said: "Second, test the power supply. This is best done by replacement."
I think this is it. I checked, and it turns out the only component I neglected to replace in my rebuild last year was the PSU. I've ordered a new one that arrives tomorrow, so hopefully that'll fix it. I'll update here with what happens. Thanks!
hmnd (Author) Posted May 19, 2019
@Frank1940 Got the new PSU in and still no dice. Any other ideas?
Squid Posted May 19, 2019
Have you checked the Ryzen threads? In particular, something about C-states?
Frank1940 Posted May 19, 2019
OK, you are running 6.7.0. That version contains a new troubleshooting tool. This is a (slightly) edited tip from the Tips and Tweaks plugin:
"There is a very cool feature in Unraid 6.7 that can be used to capture your logs and help in troubleshooting a situation where your server locks up or becomes unresponsive. Go to Settings -> Syslog Server and enable the Syslog Server. Put the IP address of your server in the 'Remote syslog server' field. This will log all your server log entries to the Syslog Server and save the log to an array share. After a reboot from a lockup or hung server, you can view the saved log using the Syslog viewer."
('Help' will provide you with some tips on how to get this set up if you still have questions.) Hopefully, this new tool will capture things that the old 'Troubleshooting mode' in the Fix Common Problems plugin would miss.
bonienl Posted May 19, 2019 (edited)
For troubleshooting purposes, the setting "Mirror syslog to flash" has been introduced. This copies everything logged to the flash device (even when the system is rebooted). With this setting enabled, it is not necessary to set up a syslog server explicitly.
Frank1940 Posted May 19, 2019
2 minutes ago, bonienl said: "This will copy everything logged to the flash device (even when the system is rebooted)"
The reason I am hesitant to recommend using the flash drive is the wear and tear on it. (I would not be worried about my own flash drive, but having other folks do it is a bit of a concern...)
But I have a question about how this logging function works. Is there a problem with looping back to the same server? Put another way: can I save the syslog to a share on the server that the syslog is for?
Second question: is it 'sticky' through a reboot? That is, if it is running when the server is rebooted (deliberately or otherwise), will the syslog server be running and logging as the server is starting up?
itimpi Posted May 19, 2019
You can save the syslog to a share on the Unraid server. I have it set to save mine to my cache drive. Doing it that way survives a reboot.
bonienl Posted May 19, 2019
31 minutes ago, Frank1940 said: "The reason, I am hesitant about recommending using the Flash drive is because of the wear-and-tear on the flash drive."
Yes, that's why it is recommended to use it only while troubleshooting (as explained in the Help). This function ensures everything is captured from the start of the system, unlike the local syslog server, which only starts capturing after the network and shares are available.
Frank1940 Posted May 19, 2019
17 minutes ago, bonienl said: "...unlike the local syslog server which only starts capturing after the network and shares are available."
That makes sense. I never thought about the fact that those services don't really start until fairly late in the boot cycle. I further gather from this comment that it is sticky across reboots. That is great, as it will allow checking times and other details without having to look at two separate files. Hopefully, this will turn out to be a really useful tool for troubleshooting the type of problem that the OP is experiencing.
hmnd (Author) Posted May 19, 2019
3 hours ago, bonienl said: "For troubleshooting purposes the setting "Mirror syslog to flash" is introduced. This will copy everything logged to the flash device (even when the system is rebooted) With this setting enabled, it is not necessary to set up a syslog server explicitly."
Is this a good config for troubleshooting?
hmnd (Author) Posted May 20, 2019
4 hours ago, Squid said: "Have you checked the Ryzen threads? In particular something about C-States?"
I hadn't, but I've disabled Global C-state Control (I think that's what it was called in the BIOS) and I've got 3 hours of uptime so far, so that may be it! Hopefully I haven't jinxed myself...
hmnd (Author) Posted May 20, 2019
@Frank1940 @itimpi @bonienl Just crashed again. Syslog did save through the crash, but there's nothing logged prior to it booting up again.
bonienl Posted May 20, 2019
1 hour ago, hmnd said: "Just crashed again. Syslog did save through the crash but there's nothing logged prior to it booting up again."
With "Mirror syslog to flash" enabled, everything is copied to the same file on the flash device, including the log after booting.
hmnd (Author) Posted May 20, 2019
7 minutes ago, bonienl said: "With "Mirror syslog to flash" enabled, everything is copied to the same file on the flash device, including the log after booting."
I mean that nothing was logged pertaining to the crash. There are just entries from a few hours before it and the boot-up after it.
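To see exactly where a saved log goes quiet before a crash, one option is to print the last few entries that precede each boot. The following is a minimal sketch, not a confirmed procedure from this thread. It assumes the mirrored log lives at /boot/logs/syslog and that every boot begins with a "kernel: Linux version" line; both are assumptions, and lines_before_boot is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: show the last log entries written before each boot.
# Assumptions (not confirmed in this thread): the mirrored log path below,
# and that each boot starts with a "kernel: Linux version" line.

# Print up to COUNT lines that precede every boot banner in LOG.
lines_before_boot() {
    local log="$1" count="${2:-20}"
    grep -B "$count" -- 'kernel: Linux version' "$log"
}

# Default to the assumed flash mirror location; pass another path as $1.
log_file="${1:-/boot/logs/syslog}"

if [ -r "$log_file" ]; then
    lines_before_boot "$log_file" 20
else
    echo "Log file not found or not readable: $log_file"
fi
```

If the entries just before a boot banner are hours old and show no errors, as described above, the machine most likely stopped without the kernel getting a chance to log anything.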
bonienl Posted May 20, 2019
This sounds like you are having some hardware-related issue.
hmnd (Author) Posted May 20, 2019
Just now, bonienl said: "This sounds like you are having some hardware related issue."
Yeah, and I'm unsure how to proceed at this point... So far I've replaced the PSU, memtested the RAM, and disabled Global C-state Control and Cool'n'Quiet on the CPU. I also installed the latest mobo firmware, which was released a couple of weeks ago. Any other ideas?
Frank1940 Posted May 20, 2019
Do a bit of research on the BIOS updates for your MB. I had a look at your diagnostics file and it looks like you might have one of the early Ryzen systems. As I recall, there were a lot of issues with them. I seem to recall that AMD actually replaced some CPUs that were having issues with Linux. Google is your friend in doing these types of searches.
hmnd (Author) Posted May 22, 2019 (edited)
On 5/20/2019 at 7:22 AM, Frank1940 said: "I seem to recall that AMD actually replaced some CPU's that were having issues with Linux."
I did quite a bit of searching and, at the risk of jinxing myself again, I think I've finally found the solution to this.
The first thing I did was add the following kernel boot parameter: rcu_nocbs=0-11, thanks to the post below. It is added by clicking the flash device in the Main section and putting it directly after append in each of the Syslinux config boxes that start with Unraid OS.
Next, I added /usr/local/sbin/zenstates --c6-disable to my /boot/config/go file, per the post below. That disables C6, and I had already disabled Global C-state Control in the BIOS.
My server has now been running for 18 hours straight without issue, and hopefully it'll survive through the night. I figured I'd document what I did here, in case anyone encounters the same issue in the future.
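For anyone adapting the boot parameter to a different CPU, here is a minimal sketch that derives the range from the number of logical CPUs and checks whether the running kernel was booted with the parameter. The thread only gives the value 0-11; treating the range as 0 through (threads minus 1) is an assumption based on how that value is commonly chosen, and rcu_nocbs_range and has_rcu_nocbs are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: suggest an rcu_nocbs range and check the running kernel for it.
# Assumption: the range should cover all logical CPUs, 0 through N-1.

# Print the CPU range "0-(N-1)" for N logical CPUs (threads).
rcu_nocbs_range() {
    local threads="$1"
    echo "0-$(( threads - 1 ))"
}

# Succeed if the given kernel command line contains an rcu_nocbs= parameter.
has_rcu_nocbs() {
    case " $1 " in
        *" rcu_nocbs="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Number of online logical CPUs; fall back to 1 if getconf is unavailable.
threads="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
echo "Suggested parameter: rcu_nocbs=$(rcu_nocbs_range "$threads")"

# /proc/cmdline holds the parameters the running kernel was booted with.
if [ -r /proc/cmdline ] && has_rcu_nocbs "$(cat /proc/cmdline)"; then
    echo "rcu_nocbs is present on the running kernel command line"
else
    echo "rcu_nocbs is not set (or /proc/cmdline is unavailable)"
fi
```

With 12 logical CPUs the helper yields 0-11, the value used in the post above. The parameter only takes effect after a reboot, so the /proc/cmdline check is a quick way to confirm that the Syslinux edit was saved and picked up.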
magicman32 Posted January 30, 2020
On 5/18/2019 at 11:59 AM, hmnd said: "I've scheduled parity checks to run monthly, but I keep getting a parity check started every few hours, which is causing all my docker containers to stop working. I've attached diagnostics if that helps (bigmac-diagnostics-20190518-0156.zip). Thanks in advance!"
Which CPU is giving you problems? I currently have a Ryzen 5 1600.