Extremely strange behavior...


tessierp


So, my NAS seems to be in limbo again. For some reason the web interface can't be accessed and I can't log in remotely through SSH, yet all mounts are still available through CIFS / SMB and NFS. The motherboard is OK: I can log in to the IPMI and it shows nothing unusual and no errors. But UNRAID is partially unresponsive. This really has me worried about the stability of the system. Not sure if anyone has any suggestions, but my evaluation time is running out, and having these issues is making me wonder if UNRAID is going to be the right fit for me. I'm not dismissing the product or saying it is bad. It could be a combination of hardware, drivers, etc. But I really don't have any way to diagnose the issue right now, as I can't log in to see the logs.

 

If anyone has any suggestions as to what I could do, please let me know. I'll wait another 30 minutes to an hour before I forcefully reboot.

Link to comment

I also connected a monitor to the system.. No output... 

 

Super strange...

 

For reference, I'm using a 3900X processor on an X570D4U-2L2T with 32 GB of Kingston ECC memory (just in case others have had similar issues with the same configuration).

Edited by tessierp
Link to comment
39 minutes ago, trurl said:

Have you setup syslog server?

I just enabled it again. Hopefully this time it doesn't disable itself. Basically, I added a share called syslogs, and the server is active and pointing to that share. Last time it disabled itself on its own.

 

Just out of curiosity, could the fact that I went with BTRFS for my array explain the issues I am having? Doesn't seem logical but just thought I'd ask just in case. Most seem to use XFS. That is why I'm asking really.

Edited by tessierp
Link to comment

Alright, this is the second time this has happened... This time 1 core is at 100%. I can't stop my Plex docker container and can't download Diagnostics... I was able to download the syslog though.

 

Really not sure what is going on except that my UNRAID system has been extremely unstable. 

 

Did a MEMTEST86 and no errors found... IPMI reports no issues with the motherboard.

 

Not sure if the devs are able to look into this because it does not seem like a hardware issue at this point.

 

Of course I can't reboot or shut down; it hangs when I try to, just like when this type of issue happened before. This time 1 core is stuck at 100% (last time it was 3). At least that is what the webUI reports. HTOP reports the same core at 100% and core 7 at 50%. It's weird because the process list shows CPU usage around 2%... It is like some of the cores have been locked by something...
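A core can show as fully busy while no process appears to use CPU when the time is being spent outside ordinary process CPU time, for example in I/O wait. Whether that applies here is a guess; nothing in this thread confirms it. A minimal Python sketch for checking it, assuming a Linux system with the standard /proc/stat layout (the function names are made up for this example):

```python
def parse_proc_stat(text):
    """Map 'cpuN' -> list of integer tick counters from /proc/stat text."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-core lines look like 'cpu0 ...'; the aggregate 'cpu' line is skipped.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            cores[parts[0]] = [int(v) for v in parts[1:]]
    return cores


def core_usage(before, after):
    """Return {core: (busy_pct, iowait_pct)} between two /proc/stat snapshots."""
    start_counts = parse_proc_stat(before)
    end_counts = parse_proc_stat(after)
    usage = {}
    for core, end in end_counts.items():
        start = start_counts.get(core)
        if start is None:
            continue
        # Fields: user nice system idle iowait irq softirq steal.
        # Guest time is already included in user/nice, so only the
        # first eight counters are summed.
        delta = [e - s for s, e in zip(start[:8], end[:8])]
        total = sum(delta)
        if total <= 0:
            continue
        idle, iowait = delta[3], delta[4]
        busy = total - idle - iowait
        usage[core] = (100.0 * busy / total, 100.0 * iowait / total)
    return usage
```

On a live system the two snapshots would come from reading /proc/stat twice, a few seconds apart, and passing both strings to core_usage. A core with a high iowait share and a low busy share would point towards storage or network I/O rather than a runaway process.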

 

Eventually the WebUI became unresponsive. I had to connect a monitor and keyboard to log in and request a reboot. It waited 90 seconds for a graceful shutdown and then had to force a shutdown. Now it is collecting diagnostics and hanging there.

 

I guess my options now are to back up all my data, wipe the array and try XFS to see if BTRFS is the issue. But I am running out of trial time, and unless I can find an explanation I'll have to move on to something else like TrueNAS. Really sucks, as I prefer the flexibility UNRAID brings, but the system is just too unpredictable and unstable.

 


ptr1-nas-1-syslog-20211122-1252.zip

Edited by tessierp
Link to comment

Why does your log have these entries every few seconds?

Nov 22 00:02:00 PTR1-NAS-1 rpcbind[6252]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:03 PTR1-NAS-1 rpcbind[6320]: connect from 192.168.20.32 to getport/addr(mountd)
Nov 22 00:02:10 PTR1-NAS-1 rpcbind[6415]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:13 PTR1-NAS-1 rpcbind[6422]: connect from 192.168.20.32 to getport/addr(mountd)
Nov 22 00:02:20 PTR1-NAS-1 rpcbind[6478]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:23 PTR1-NAS-1 rpcbind[6485]: connect from 192.168.20.32 to getport/addr(mountd)
Nov 22 00:02:30 PTR1-NAS-1 rpcbind[6538]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:33 PTR1-NAS-1 rpcbind[6547]: connect from 192.168.20.32 to getport/addr(mountd)
Nov 22 00:02:40 PTR1-NAS-1 rpcbind[6580]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:44 PTR1-NAS-1 rpcbind[6651]: connect from 192.168.20.32 to getport/addr(mountd)
Nov 22 00:02:50 PTR1-NAS-1 rpcbind[6687]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:53 PTR1-NAS-1 rpcbind[6710]: connect from 192.168.20.32 to getport/addr(mountd)
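
To see how regular these connections are, the syslog can be summarized per client. This is only a rough Python sketch based on the line format shown above; the function name and the default year (taken from the date in the attached syslog's file name) are assumptions for this example:

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
# Nov 22 00:02:00 PTR1-NAS-1 rpcbind[6252]: connect from 192.168.20.30 to getport/addr(mountd)
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+rpcbind\[\d+\]:\s+"
    r"connect from (?P<ip>\S+) to (?P<call>\S+)"
)


def summarize_rpcbind(lines, year=2021):
    """Return {client_ip: (count, average_gap_seconds)} for rpcbind connect lines."""
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore anything that is not an rpcbind connect line
        # Syslog timestamps carry no year, so one is supplied by the caller.
        stamp = datetime.strptime(f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S")
        times[match.group("ip")].append(stamp)

    summary = {}
    for ip, stamps in times.items():
        stamps.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        average = sum(gaps) / len(gaps) if gaps else None
        summary[ip] = (len(stamps), average)
    return summary


sample = [
    "Nov 22 00:02:00 PTR1-NAS-1 rpcbind[6252]: connect from 192.168.20.30 to getport/addr(mountd)",
    "Nov 22 00:02:03 PTR1-NAS-1 rpcbind[6320]: connect from 192.168.20.32 to getport/addr(mountd)",
    "Nov 22 00:02:10 PTR1-NAS-1 rpcbind[6415]: connect from 192.168.20.30 to getport/addr(mountd)",
    "Nov 22 00:02:13 PTR1-NAS-1 rpcbind[6422]: connect from 192.168.20.32 to getport/addr(mountd)",
]

for ip, (count, average_gap) in summarize_rpcbind(sample).items():
    print(ip, count, average_gap)
```

A steady gap per client would suggest a fixed polling or retry interval on the client side rather than random reconnects.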

 

Link to comment

Those are coming from my Proxmox Servers. Like I said I've been having issues perhaps with the switch (MAYBE). I had the same switch with my QNAP TS-563 and I never had any CIFS / NFS issues before. 

 

So what you see there is probably the Proxmox servers losing connection and retrying to connect. But that wouldn't be a reason to crash UNRAID and if it is then, it should be looked into.

 

Actually, glad you noticed this. For the time being I shut down all my connections to UNRAID and had to shut down a few Docker services, but I completely forgot about my Proxmox boxes, which have an active share with UNRAID to automate VM backups.

Edited by tessierp
Link to comment

Seeing that no one really knows, I decided to reformat my array with XFS to see if BTRFS, somehow, could be behind my NFS and CIFS issues. As for my CPU having some cores locked at 100% at times, preventing me from doing any reboot or shutdown via SSH or the WebUI (I basically have to forcefully power off), someone who is running Unraid with Docker told me it has happened to him a few times, so I am not alone in experiencing these issues.

 

I'm giving this one more chance with my 12-day trial to see if somehow things hold up better. I'm more than willing to work with the developers if they want to get to the bottom of this, but I need a running system and can't justify paying if I can't have a somewhat stable system.

Link to comment

Ended up striking another bug.

 

So yesterday I did my backup, then stopped the array, changed from BTRFS to XFS, started the array back up and formatted. Then I proceeded to copy everything back. Everything was going fine, and once I saw that all my Proxmox files were there I proceeded to mount the NFS shares. Not too sure what happened, but somehow the permissions on that folder were changed on me to "NOBODY". I did try to reassign permissions but it just won't work. I also tried rebooting and releasing all NFS shares, but nothing can be done.

Link to comment
16 minutes ago, trurl said:

group users: user nobody is the default for all files and should work since network access is controlled by the share settings.

I understand that, but it does not explain why the users don't update when I try to override and change them. Other folders reflect the exact users that should have access. This one, due to the action I performed, is now locked as NOBODY, which is incorrect since I have set different permissions.

 

I did try to turn off permissions and set them again, but it just won't work.

 

At this point this would force me to offload the content of this share, some 150 GB, somewhere else and recreate.
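
Before offloading the share, it may be worth checking which entries on disk actually differ from the default owner and group trurl mentions (user nobody, group users). A minimal Python sketch, assuming it is run on the server itself; the function names are made up for this example and the share path in the note below is only a placeholder:

```python
import grp
import os
import pwd


def ids_for(user, group):
    """Look up the numeric uid and gid for a user name and a group name."""
    return pwd.getpwnam(user).pw_uid, grp.getgrnam(group).gr_gid


def find_unexpected_owners(root, expected_uid, expected_gid):
    """Return sorted paths under root whose uid or gid differs from the expected pair."""
    mismatches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat reports a symlink itself instead of following it.
            info = os.lstat(path)
            if info.st_uid != expected_uid or info.st_gid != expected_gid:
                mismatches.append(path)
    return sorted(mismatches)
```

For example, `uid, gid = ids_for("nobody", "users")` followed by `find_unexpected_owners("/mnt/user/<share>", uid, gid)` would list anything under that share not owned by nobody:users. An empty list would suggest the on-disk ownership is already the default and the problem lies in the share or export settings instead.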

Link to comment
2 hours ago, Squid said:

Yeah. Setting up a share means that's the share where another server will send its syslogs. You wanted mirror to flash.

If you do not want to mirror to flash then you can also set the IP of the Unraid server as the target (so that UnRaid is acting as both client and server).    This is what I prefer to do.
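
One way to confirm the syslog server is actually receiving is to send it a test message from another machine. A small Python sketch using the standard library's SysLogHandler; the function name is made up for this example, and UDP port 514 is assumed as the usual syslog default, so adjust it if the server is configured differently:

```python
import logging
import logging.handlers


def make_syslog_logger(host, port=514, name="syslog-test"):
    """Return a logger that forwards records to a remote syslog server over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep test messages out of the root logger
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `make_syslog_logger("192.168.1.10").info("test message")` (the address is a placeholder for the Unraid server's IP) should make the message show up in the log file on the syslog share if the server is listening.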

Link to comment

Thanks for all your help everyone. I appreciate it. I wish I had more time to troubleshoot the various problems I was juggling with but I really needed to get my NAS up and running and for now I looked for an alternative. I really think it is a combination of hardware I'm using. Granted those flat ethernet cables are junk and the switch is flaky at times but until I have more time I have to move on. UNRAID is an amazing product which is meant to be very flexible and I'm sure I'll be back when I have more time to dedicate to this but for now I need a NAS up and running.

Link to comment
  • 2 weeks later...
On 11/22/2021 at 9:24 AM, trurl said:

Why does your log have these entries every few seconds?

Nov 22 00:02:00 PTR1-NAS-1 rpcbind[6252]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:03 PTR1-NAS-1 rpcbind[6320]: connect from 192.168.20.32 to getport/addr(mountd)
Nov 22 00:02:10 PTR1-NAS-1 rpcbind[6415]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:13 PTR1-NAS-1 rpcbind[6422]: connect from 192.168.20.32 to getport/addr(mountd)
Nov 22 00:02:20 PTR1-NAS-1 rpcbind[6478]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:23 PTR1-NAS-1 rpcbind[6485]: connect from 192.168.20.32 to getport/addr(mountd)
Nov 22 00:02:30 PTR1-NAS-1 rpcbind[6538]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:33 PTR1-NAS-1 rpcbind[6547]: connect from 192.168.20.32 to getport/addr(mountd)
Nov 22 00:02:40 PTR1-NAS-1 rpcbind[6580]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:44 PTR1-NAS-1 rpcbind[6651]: connect from 192.168.20.32 to getport/addr(mountd)
Nov 22 00:02:50 PTR1-NAS-1 rpcbind[6687]: connect from 192.168.20.30 to getport/addr(mountd)
Nov 22 00:02:53 PTR1-NAS-1 rpcbind[6710]: connect from 192.168.20.32 to getport/addr(mountd)

 

This is a bug from long ago. I still get this, even though it was supposed to have been fixed a while back.

 

 

Edited by pixitha
Link to comment
