
Ongoing, Intermittent Unraid (6.9.2) Issues


I have been having ongoing issues with my Unraid server. Every few days all the shares disappear and all my Docker containers stop responding. I haven't been able to complete a parity check for a couple of months, and every time I run one it finds exactly 5 sync errors before the other issues show up. A reboot restores the shares and services, but I am wondering whether there are logs I should pull before rebooting that would help me identify the issue(s). Interestingly enough, the parity check continues after the other issues show up, but I always stop it and reboot the machine in order to restore services.

I have not yet rebooted the box this morning, and I am willing to leave it offline for a while to see if anyone gets back to me.

I am running Unraid 6.9.2.

Thanks in advance,
ecnal

Edited by ecnal.magnus

With every container except Plex and Tautulli disabled, the parity check finally completed, although it again found 5 sync errors, which were written as corrections. Towards the end of the parity check, though, my machine was running low on RAM. I have 64GB, the maximum my motherboard supports. Now that I think of it, I saw memory allocation errors the other times the shares crashed. I used top in the console to see what was using the RAM, and it was shfs. I stopped the array and RAM utilization went back to normal levels. When I started the array again, shfs wasn't using much RAM. Now that I have been running for a few days with all the containers operational, RAM utilization is again creeping up, with top reporting that most of it is due to shfs. My guess is that within a day or two, without intervention on my part, my shares will all crash again and the memory allocation errors will show up.

Does anyone know why shfs would be hogging all my memory? Are there particular logs that I should be looking at for this issue?
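
For anyone who wants to record shfs memory use over time rather than eyeballing top, here is a minimal sketch. It assumes Python 3 is available on the server and that the process name reported in `/proc/<pid>/comm` is `shfs`. The function names are made up for illustration, and the script only reads from `/proc`.

```python
import os
import re


def rss_kib_from_status(status_text):
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def find_pids_by_name(name, proc_root="/proc"):
    """Yield PIDs whose /proc/<pid>/comm equals `name` (assumed here to be 'shfs')."""
    if not os.path.isdir(proc_root):
        return  # not a Linux-style /proc; nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    yield int(entry)
        except OSError:
            continue  # process exited while we were scanning


def report(name="shfs", proc_root="/proc"):
    """Return a list of (pid, rss_kib) for every process called `name`."""
    results = []
    for pid in find_pids_by_name(name, proc_root):
        try:
            with open(os.path.join(proc_root, str(pid), "status")) as fh:
                results.append((pid, rss_kib_from_status(fh.read())))
        except OSError:
            continue
    return results


if __name__ == "__main__":
    for pid, rss in report():
        print(f"shfs pid={pid} rss_kib={rss}")
```

Run on a schedule with the output appended to a file, this would give a timeline of resident memory to line up against the syslog when the shares drop.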

Thanks in advance,

ecnal

Screen Shot 2022-04-07 at 7.09.08 PM.png

Screen Shot 2022-04-07 at 7.09.23 PM.png


I have not. I bought the RAM brand new less than a year ago. I realize that doesn't necessarily mean it has no issues, but it is relatively new and high quality. Do parity errors usually indicate bad RAM? I also found the attached post talking about shfs and memory leaks. I can carve out some time to run memtest if that is what it takes. I'm just wondering whether there are other approaches, or other potential issues to check, that don't take the server offline for such an extended period.

ecnal

11 hours ago, ecnal.magnus said:

Do parity errors usually indicate bad RAM?

Parity errors don't usually indicate bad RAM. Recurring parity errors may indicate bad RAM, which seemed to be what you were saying.

A correcting parity check followed by a non-correcting parity check, with no reboots between, would be required to get a better idea. If after correcting parity errors, there are still parity errors, then examining the syslog for both parity checks could tell if the same sectors are in error, which might indicate a disk problem, or if different sectors are in error, which would probably indicate bad RAM.

So either way, you need to get to the bottom of your parity errors. The only acceptable result is exactly zero sync errors.

And you don't even want to try to run a computer unless the RAM is trustworthy. Everything goes through RAM: your data, the executable code for the OS, everything.
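
The same-sectors versus different-sectors comparison described above can be sketched in a few lines of Python. This assumes each sync error appears in the syslog on a line carrying a `sector=<number>` field; that format is an assumption here, so check a real line from your own log and adjust the pattern if it differs. The function names are hypothetical.

```python
import re

# Assumed format: each parity error line contains "sector=<number>".
SECTOR_RE = re.compile(r"sector=(\d+)")


def error_sectors(syslog_lines):
    """Collect the sector numbers from every line that carries a sector= field."""
    sectors = set()
    for line in syslog_lines:
        match = SECTOR_RE.search(line)
        if match:
            sectors.add(int(match.group(1)))
    return sectors


def compare_checks(first_check_lines, second_check_lines):
    """Split the reported sectors into repeated ones and ones unique to each check."""
    first = error_sectors(first_check_lines)
    second = error_sectors(second_check_lines)
    return {
        "repeated": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }
```

Feed it the syslog lines from the correcting check and from the non-correcting check that follows. A non-empty `repeated` list corresponds to the same sectors being in error in both checks, and entries only under `only_second` correspond to different sectors turning up. Searching the log for `sector=` may also find the relevant lines when a search for "parity" does not.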


Last night RAM usage was pushing 100%, so in an effort to stave off a crash I shut all the containers down and tried to stop the array. The array wouldn't stop; it just sat there attempting to stop. I went in and killed the shfs process, and the array stopped immediately. It seems that shfs is my problem, at least as far as these crashes are concerned. I am going to run memtest starting Monday morning, but I genuinely don't expect to find anything. Whatever is causing shfs to use so much RAM, and to hold onto it when I try to stop the array, is likely my issue. I suppose I have to look into what shfs actually does, and perhaps move to the beta channel of Unraid to see if that corrects it. In the other thread I read that people experiencing a similar issue with an older release corrected it by downgrading. I don't know exactly when this issue started, but I did upgrade to the current stable version a couple of months ago, and it seems to me that I have been seeing these RAM allocation errors since then.

ecnal


I awoke this morning to a crashed server. I was running a parity check, and it was at 80% when I went to bed. It also had the telltale 5 sync errors that have shown up every time I have run it for at least 2 months now. It usually crashes as the check nears completion. The server has completely new RAM in it, and I ran memtest on the old RAM, which passed with no errors. Are there specific logs I should be looking at to see what happened, and if so, what should I search for in those logs?

ecnal

On 4/8/2022 at 8:49 AM, trurl said:

A correcting parity check followed by a non-correcting parity check, with no reboots between, would be required to get a better idea. If after correcting parity errors, there are still parity errors, then examining the syslog for both parity checks could tell if the same sectors are in error, which might indicate a disk problem, or if different sectors are in error, which would probably indicate bad RAM.


I searched through the syslog for the term "parity" and found nothing. I am not certain what else to look for. Are there specific search terms that would take me to the relevant sections of the log?

Also, I have only gotten 1 parity check to complete in the last 2 months, and the follow-up check failed last night.

ecnal


At this point what I know is that the shfs process is grabbing hold of RAM and not letting it go. This eventually leads to memory exhaustion, and whatever process or service manages the shares crashes. That takes down anything that needs access to the shares, primarily Docker. It is a relatively slow process, with shfs using progressively more RAM as time goes on. It usually takes anywhere from 48 to 120 hours before the shares crash. After the shares crash the server is still accessible; I just can't get any of the shares to show up. Once shfs has started to accumulate RAM, if I try to stop the array the stop process hangs and the notification area at the bottom left says "array stopping -- RETRY UNMOUNTING USER SHARE(S)." It never stops until I either power down the server forcefully or get into the terminal and kill the shfs process manually.
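
Since the growth is slow and roughly steady, two memory readings taken some hours apart are enough for a rough estimate of how long is left before RAM runs out. The sketch below is a simple two-point projection in Python, not a proper fit. It assumes linear growth and ignores every other memory consumer on the box, so treat the result as a ballpark only. The function name and the figures in the usage note are illustrative, not measurements from this server.

```python
def hours_until_exhaustion(samples, total_kib):
    """Estimate hours until rss reaches total_kib, assuming linear growth.

    samples: list of (hours_since_start, rss_kib) tuples ordered by time.
    Returns None if memory use is flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    rate = (r1 - r0) / (t1 - t0)  # KiB gained per hour
    if rate <= 0:
        return None
    return (total_kib - r1) / rate
```

For example, with made-up readings of 1,000,000 KiB at hour 0 and 6,000,000 KiB at hour 10, the growth rate is 500,000 KiB per hour, so a machine with 66,000,000 KiB in total would have about 120 hours left from the second reading.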

Shfs is at least part of the problem. I don't know if it is the primary problem, but whatever is causing it to grab and not release RAM is what eventually causes the memory exhaustion and crashes the shares. I currently have all my Docker containers shut down, and in this state top indicates that shfs is using 0.1% of RAM. I only just restarted the server, so I don't know what shfs will do in the current state. I will watch it and see if it starts to accrue RAM with all the containers shut down. Basically the server is up, the array is started, and I have a parity check running, but literally nothing else is running: no VMs or containers.

I have an external syslog server collecting logs. I have exported them and will attach them to this response. I don't know which logs to look at or what to look for in them; if someone could give me an idea along these lines I would really appreciate it. I did a quick search for "parity" in them, but I didn't see anything. I'm not certain that the parity issues aren't ancillary to the shfs issue, happening only because shfs is using all the memory. I know that shfs is causing problems; I just don't know why.
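
One way to narrow down a pile of exported logs is to pull out only the lines that mention memory pressure or shfs. The Python sketch below does that. The patterns are assumptions based on common Linux kernel message wording ("Out of memory", "oom-killer", "page allocation failure"); the exact text can vary by kernel version, so widen or change them after looking at a few real lines. The function and label names are hypothetical.

```python
import re

# Assumed search terms; adjust after inspecting real log lines.
PATTERNS = {
    "oom": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
    "alloc_failure": re.compile(r"page allocation failure", re.IGNORECASE),
    "shfs": re.compile(r"\bshfs\b"),
}


def classify_lines(lines):
    """Return {label: [matching lines]} for each pattern in PATTERNS."""
    hits = {label: [] for label in PATTERNS}
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[label].append(line.rstrip("\n"))
    return hits


def scan_file(path):
    """Classify every line of one log file, tolerating odd bytes in the log."""
    with open(path, errors="replace") as fh:
        return classify_lines(fh)
```

Calling `scan_file` on the exported kernel log and the syslog, then reading the timestamps of the first hits, would show roughly when memory trouble began relative to the shares disappearing.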

Honestly, I am pretty frustrated at this point and I don't know where to go from here.

Thanks,

ecnal

I edited this and attached more log files I found. There are two IP addresses on the server, and the new logs are from the second IP address.

wsdd.log winbindd.log unassigned.devices.log sudo.log sshd.log smbd.log sSMTP.log rsyslogd.log root.log rc.inet1.log rc.docker.log quit#015.log proftpd.log ool.log ntpd.log nmbd.log nginx.log kernel.log exit#015.log emhttpd.log ecnal#015.log dnsmasq.log dnsmasq-dhcp.log dhcpcd.log #015.log

winbindd.log webGUI.log unassigned.devices.log sudo.log sshd.log smbd.log sm-notify.log shutdown.log shfs.log sSMTP.log rsyslogd.log rpcbind.log rpc.statd.log rpc.mountd.log root.log rc.inet1.log rc.docker.log ntpd.log nmbd.log nerdpack.log kernel.log init.log haveged.log emhttpd.log dhcpcd.log crond.log avahi-dnsconfd.log avahi-daemon.log acpid.log ProFTPd.log


I discovered the attached posts from 2020 and they put me on the right path. I currently have SMB disabled and everything is working fine (other than not having SMB access, that is). In the process of going through my shares and disabling SMB on all of them, I discovered a share whose "Share settings" page won't fully populate. It is the only one that behaves this way (screenshot attached). Only the top portion of the page loads; the loading indicator on the browser tab spins until it eventually stops, and that top portion is all that is ever displayed. I truly don't care about the data on this share. Does anyone know the implications if I just delete this share from the config file? Is there a better way to get access to this share's settings page? Is there a config file where these share settings can be edited by hand in a text editor?

1164644423_ScreenShot2022-04-17at9_59_52PM.thumb.png.da8944e18b86820054e4a84ab43254c3.png
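
On the config file question: as an assumption, not something confirmed in this thread, per-share settings are usually kept as plain `key="value"` text files under `/boot/config/shares/` on the flash drive, one `.cfg` file per share. If that holds here, a read-only look at the affected share's file might show what the page is choking on. Here is a minimal Python sketch of a parser for that kind of file. The function name is hypothetical, and nothing is written back to disk.

```python
def parse_cfg(text):
    """Parse key="value" lines into a dict, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        settings[key.strip()] = value
    return settings
```

Comparing the parsed settings of the broken share against those of a share whose page loads normally would be a low-risk first step before deleting or hand-editing anything. Copy the original file somewhere safe first.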

Thanks,

ecnal
