Server Lockups unRAID 6.11.5

Let me start by saying I'm planning on upgrading to 7.0.0 after reading all the great upgrade stories on this forum, but I don't want to perform an upgrade while I have lockups happening.  I stuck with 6.11.5 after reading about the 6.12 problems.  If you think it would be advisable to upgrade, or that the upgrade will help, I will be happy to do it...I've been looking forward to seeing the new unRAID.  I usually update 1 to 6 months after an update is available.  This is the longest I have ever gone without updating.  Almost bit the bullet many times.

 

Every 4 to 5 days after the automated reboot, the server loses connection with the GUI, Plex, and Cloudflare incoming to Dockers, but SSH and all other Dockers remain accessible on the internal network.  Diagnostics hangs and never completes.  I was able to copy the log to the Cache Drive and then ran a manual shutdown of the array using umount.  One disk was busy; the fuser command that was supposed to tell me the process(es) holding it hung, and the command that was supposed to blanket kill those processes also hung.  I have unRAID Home Assistant installed, so my log is spammed with SSH commands, and I have been unsuccessful at suppressing the spam. Annoying. So I will copy the logs from after the lockup here.  I was actively watching something on Plex when the unresponsiveness occurred.  Just to mention it: powerdown, shutdown -r now, shutdown now, reboot, and pressing the power button did nothing.  The shutdown command broadcast the shutdown-for-maintenance message but never shut down the server. I had a pressing matter, so the server had 3 hours to process these commands before I started the manual shutdown procedure.
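When fuser and a blanket kill both hang like this, one possible explanation is that the processes holding the disk are stuck in uninterruptible sleep (state D), blocked inside the kernel on I/O. A minimal sketch for listing them, assuming the standard procps `ps` is available; the helper name `list_d_state` is made up for illustration:

```shell
# Filter `ps` output down to processes in uninterruptible sleep (state D).
# Input columns: PID STAT WCHAN COMMAND (the header line is skipped).
# Only the first word of the command name is printed.
list_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $4 }'
}

# On the server this could be fed from ps, for example:
#   ps -eo pid,stat,wchan:32,comm | list_d_state
```

Processes in this state do not respond to kill signals until the I/O they are waiting on completes, so a non-empty list here would at least be consistent with umount reporting the disk as busy.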

 

Server:

Supermicro X9DRi-LN4+/X9DR3-LN4+ , Version REV:1.20A

Dual Xeon E5-2690 V2

Quadro RTX 4000 for Docker acceleration / transcode using the official plugin.

Asus STRIX GTX 960 passed through to a Windows 10 gaming VM, shut down since I got my Steam Deck

LSI 9207-8i attached to some mysterious expander card that I got somewhere and can't find much info on; it's been hooked up for years without a problem. I have 15 drives and 2 parity drives right now and am getting a 120 MB/s parity check average.

Asus 4-port NVMe card with 2 Samsung NVMe drives in it: the first as the AppData drive and the other as the VM vDisk and ISO drive.

 

Long-time user of unRAID, and its stability is better than any other OS I've owned.  I'm pulling my hair out with this one.  Spent a couple of hours today working on the server through SSH, trying to get all the info and shut down gracefully.

 

Thank you to anyone who looks at this for me.  I'm not sure, but it looks like either my AppData NVMe takes a dive because of a kernel panic, or the kernel panics because of a problem with the AppData NVMe drive.

Odd Stuff in Logs after the Crash.txt

  • Community Expert
4 hours ago, Rudder2 said:

Every 4 to 5 days after the automated reboot, the server loses connection with the GUI

What do you mean by automated reboot?

  • Author
5 hours ago, JorgeB said:
What do you mean by automated reboot?

Every Wednesday at 0400 the server gracefully reboots using a User Script. 
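For anyone wanting to reproduce that schedule, here is a rough sketch of what such a User Script could look like. This is not the actual script from the server. The cron expression is standard; the `powerdown -r` path and flag are assumptions about Unraid 6, and `weekly_reboot` and `DRY_RUN` are names invented for this example.

```shell
# Custom cron schedule to enter in the User Scripts plugin: 0 4 * * 3
# Fields: minute=0 hour=4 day-of-month=* month=* day-of-week=3 (Wednesday)

# Prints the reboot command instead of running it unless DRY_RUN=0 is set,
# so the script can be tried safely before it is scheduled.
weekly_reboot() {
    cmd="/usr/local/sbin/powerdown -r"   # assumed location of Unraid's powerdown
    if [ "${DRY_RUN:-1}" = "0" ]; then
        $cmd
    else
        echo "would run: $cmd"
    fi
}

weekly_reboot
```

Running it as-is only prints the command; setting `DRY_RUN=0` in the script's environment would let it actually reboot.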

 

That was my answer to keeping it running well many years ago.  I can't remember why I needed the server to reboot every week, but it probably had to do with performance, and it's been doing it, even on my old hardware, for about 10 years.  I had a forum post about why and how to make it happen many years ago.

 

This server has been rock solid with me leaving it alone for 6+ months at a time.  Been running 24/7 across 2 hardware builds for ~14 years on the same USB.  Wouldn't surprise me if there are some cobwebs left over from years ago.

  • Community Expert

The log you posted shows a btrfs filesystem issue with the pool. I recommend backing it up and reformatting to see if that helps.

  • Author

I copied everything to the cache drive and everything I collected is gone.  It was there when I made this post and now it's not.  I'm feeling sick.  This is the first time I got data.  The folder I created and copied it to is there, but all the partial diagnostics and the /var/log info I copied is gone.

  • Author
Just now, JorgeB said:

The log you posted shows a btrfs filesystem issue with the pool, recommend backing up and reformatting it to see if it helps.

I will try that. Thank You.

  • Author
28 minutes ago, JorgeB said:

The log you posted shows a btrfs filesystem issue with the pool, recommend backing up and reformatting it to see if it helps.

Performed a Scrub with the Repair flag and got some errors.  I'm starting to think you are right, something is wrong with the BTRFS on that drive.
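For reference, the same check can be run from SSH. The sketch below assumes the pool is mounted at /mnt/cache, which may not match this server. `btrfs scrub start -B` and `btrfs device stats` are standard btrfs-progs commands; the helper `total_btrfs_errors` is made up here to add up the per-device error counters.

```shell
# Sum every error counter printed by `btrfs device stats`.
# Each input line looks like: [/dev/nvme1n1p1].corruption_errs   0
total_btrfs_errors() {
    awk '{ sum += $NF } END { print sum + 0 }'
}

# Typical use on the server (the mount point is an assumption):
#   btrfs scrub start -B /mnt/cache    # run the scrub in the foreground
#   btrfs scrub status /mnt/cache      # summary of the last scrub
#   btrfs device stats /mnt/cache | total_btrfs_errors
```

A total above zero means at least one read, write, flush, corruption, or generation error has been recorded for the pool since the counters were last reset.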

 

 

Odd Stuff in Logs after the BTRFS Scrub on nvme1n1p1.txt

  • Community Expert

With btrfs I recommend 

55 minutes ago, JorgeB said:

backing up and reformatting

 

  • Author

@JorgeB I really don't use the benefits of BTRFS.  Would you recommend going to a different file system?

  • Community Expert

For a single-device pool you can use xfs.

  • Community Expert
8 hours ago, Rudder2 said:

I really don't use the benefits of BTRFS.

Historically the benefits of btrfs for a pool were:

  • It included built-in checks for data corruption.
  • It allowed easy expansion of a pool by adding drives, which did not need to be the same size.

 

  • Author

I have ECC RAM and have no idea why I would have started to have BTRFS issues.  I've read that RAM issues are a big contributor.

 

Can BTRFS issues cause the kernel to panic?  Or did the kernel panic cause the BTRFS issues?  Can the AppData NVMe BTRFS issue cause the WebUI to be unreachable while SSH and most Dockers were still reachable on the internal network?  My Docker file also lives on this NVMe.  I would like to find the root cause if I can.

 

The NVMe passes SMART, but could this 2021 drive be starting to fail?  I'm going to format it when I have the time, but I'm debating whether to just call the 4-year-old Samsung drive, which SMART says is 78% used, no longer good for AppData.
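The 78% figure presumably comes from the "Percentage Used" field of the NVMe SMART/health log, which is the drive's own estimate of how much of its rated endurance has been consumed. A small sketch for pulling that number out of `smartctl -a` output; the device name is taken from the attachment name above and may differ, and `nvme_percentage_used` is a helper invented for this example:

```shell
# Extract the "Percentage Used" value from `smartctl -a` output for an NVMe drive.
# The matching line looks like: Percentage Used:                    78%
nvme_percentage_used() {
    awk -F: '/^Percentage Used/ { gsub(/[ %]/, "", $2); print $2 }'
}

# Typical use on the server (the device name is an assumption):
#   smartctl -a /dev/nvme1n1 | nvme_percentage_used
```

The same `smartctl -a` output also lists "Media and Data Integrity Errors", which may be a more direct sign of trouble than the wear estimate alone.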

 

I love DATA and losing it is like losing a child...LOL...Data hoarder here.

 

I thank you both for the input.  I never use more than one drive in my cache pools, so data corruption checks are the only benefit I can see that would be nice. So the philosophical debate commences...BTRFS vs XFS for my cache pools, since I have to reformat anyway.

 

I understand if some of this can't be determined with the limited data I have left.  Thank you again.
