Server Lockups unRAID 6.11.5

February 18, 2025

Let me start by saying I'm planning on upgrading to 7.0.0 after reading all the great upgrade stories on this forum, but I don't want to perform an upgrade while I have lockups happening. I stuck with 6.11.5 after reading about the 6.12 problems. If you think it would be advisable to upgrade, or that the upgrade will help, I will be happy to do it... I've been looking forward to seeing the new unRAID. I usually update 1 to 6 months after an update is available. This is the longest I have ever gone without updating. Almost bit the bullet many times.

Every 4 to 5 days after the automated reboot, the server loses connection with the GUI, Plex, and Cloudflare incoming to Dockers, but SSH and all other Dockers are still accessible on the internal network. Diagnostics hangs and never completes. I was able to copy the log to the cache drive, then started a manual shutdown of the array using umount. One disk was busy; the fuser command that was supposed to tell me the process(es) hung, and the command that was supposed to blanket kill those processes also hung. I have unRAID Home Assistant installed, so my log is spammed with SSH commands, and I have been unsuccessful at suppressing the spam. Annoying, so I will copy the logs from after the lockup here. I was actively watching something on Plex when the unresponsiveness occurred. Just to mention it: powerdown, shutdown -r now, shutdown now, reboot, and pressing the power button did nothing. The shutdown command broadcast the shutdown-for-maintenance message but never shut down the server. I had a pressing matter, so the server had 3 hours to process these commands before I started the manual shutdown procedure.
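For anyone hitting the same "disk busy" situation where fuser itself hangs, here is a minimal Python sketch of another way to list which processes hold files open under a mount point, by reading the /proc/<pid>/fd symlinks directly. The function name find_holders and the /mnt/disk1 path are assumptions for illustration (the busy disk was not named above), and it only covers open file descriptors, not working directories or memory maps. Whether it would avoid the hang seen here is unknown.

```python
import os


def find_holders(mount_point, proc_root="/proc"):
    """Return {pid: [open paths]} for processes with files open under mount_point."""
    mount_point = os.path.normpath(mount_point)
    prefix = mount_point + os.sep
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed between listdir and readlink
            if target == mount_point or target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders


# Hypothetical usage (run as root so other users' descriptors are visible):
#   for pid, paths in sorted(find_holders("/mnt/disk1").items()):
#       print(pid, paths[0])
```

The prefix check uses the mount point plus a path separator, so /mnt/disk1 does not accidentally match /mnt/disk10.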

Server:

Supermicro X9DRi-LN4+/X9DR3-LN4+ , Version REV:1.20A

Dual Xeon E5-2690 V2

Quadro RTX 4000 for Docker acceleration / transcode using the official plugin.

Asus STRIX GTX 960 passed through to a Windows 10 gaming VM, shut down since I got my Steam Deck

LSI 9207-8i attached to some mysterious expander card that I got somewhere and can't find much info on; it's been hooked up for years without a problem. I have 15 drives and 2 parity right now and am getting a 120 MB/s parity check average.

Asus 4-port NVMe card with 2 Samsung NVMe drives in it: the first as the AppData drive and the other as the VM vDisk and ISO drive.

Long-time user of unRAID, and its stability is better than any other OS I've owned. I'm pulling my hair out with this one. Spent a couple of hours today working on the server through SSH trying to get all the info and gracefully shut down.

Thank you to anyone who looks at this for me. I'm not sure but it looks like my AppData NVMe takes a dive from a kernel panic or the kernel panics because of the AppData NVMe drive problem.

Odd Stuff in Logs after the Crash.txt

February 18, 2025

Community Expert

4 hours ago, Rudder2 said:

Every 4 to 5 days after the automated reboot, the server loses connection with the GUI

What do you mean by automated reboot?

February 18, 2025

Author

5 hours ago, JorgeB said:

10 hours ago, Rudder2 said:

What do you mean by automated reboot?

Every Wednesday at 0400 the server gracefully reboots using a User Script.
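To make the timing concrete, here is a small Python sketch. It assumes the User Script runs on a custom cron schedule equivalent to "0 4 * * 3" (the actual script and schedule were not posted, so that string is an assumption), and the helper names are made up for illustration. It works out the next scheduled reboot and the window in which a lockup "4 to 5 days after" a reboot would land.

```python
from datetime import datetime, timedelta

# Assumed cron fields: minute hour day-of-month month day-of-week (3 = Wednesday)
WEEKLY_REBOOT_CRON = "0 4 * * 3"


def next_weekly_run(now, weekday=2, hour=4):
    """Next weekday at hour:00 strictly after now (Python weekdays: Monday=0, Wednesday=2)."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's slot has already passed
    return candidate


def lockup_window(last_reboot, min_days=4, max_days=5):
    """Earliest and latest expected lockup time after a given reboot."""
    return (last_reboot + timedelta(days=min_days),
            last_reboot + timedelta(days=max_days))
```

With a Wednesday 04:00 reboot, lockup_window puts the expected lockup between Sunday 04:00 and Monday 04:00, which may help when matching log timestamps against anything else that runs on a weekly schedule.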

It was my answer to keeping it running well many years ago. I can't remember why I needed the server to reboot every week, but it probably had to do with performance, and it's been doing it, even on my old hardware, for about 10 years. I had a forum post about why and how to make it happen many years ago.

This server has been rock solid with me leaving it alone for 6+ months at a time. Been running 24/7 across 2 hardware builds for ~14 years on the same USB. Wouldn't surprise me if there are some cobwebs left over from years ago.

February 18, 2025

Community Expert

The log you posted shows a btrfs filesystem issue with the pool, recommend backing up and reformatting it to see if it helps.
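As a rough illustration of the "backing up" step, here is a minimal Python sketch that copies a directory tree and then re-reads both sides to compare SHA-256 hashes. The function names and any paths you pass in are assumptions, not an unRAID tool, and it assumes the containers using the pool are stopped first so files are not changing mid-copy.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, destination):
    """Copy source to destination (which must not exist yet) and return
    the relative paths of files whose copy is missing or differs."""
    shutil.copytree(source, destination, symlinks=True)
    mismatches = []
    for folder, _dirs, files in os.walk(source):
        for name in files:
            src = os.path.join(folder, name)
            if os.path.islink(src):
                continue  # symlinks are copied as links, nothing to hash
            rel = os.path.relpath(src, source)
            dst = os.path.join(destination, rel)
            if not os.path.isfile(dst) or sha256_of(src) != sha256_of(dst):
                mismatches.append(rel)
    return mismatches
```

An empty list back means every regular file was copied and re-read identically. If the copy itself raises an error for some files, those are the ones to restore from an older backup rather than trust.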

February 18, 2025

Author

I copied everything to the cache drive and everything I collected is gone. It was there when I made this post and now it's not. I'm feeling sick. This is the first time I got data. The folder I created and copied it to is there, but all the partial diagnostics and the /var/log info I copied is gone.

February 18, 2025

Author

Just now, JorgeB said:

The log you posted shows a btrfs filesystem issue with the pool, recommend backing up and reformatting it to see if it helps.

I will try that. Thank You.

February 18, 2025

Author

28 minutes ago, JorgeB said:

The log you posted shows a btrfs filesystem issue with the pool, recommend backing up and reformatting it to see if it helps.

Performed a Scrub with Repair flag and got some errors. I'm starting to think you are right, something wrong with the BTRFS on that drive.
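Alongside the scrub result, the per-device error counters from `btrfs device stats` are worth a look. Below is a minimal Python sketch that parses that command's text output and returns only the non-zero counters. The sample text uses made-up values purely to show the format; it is not taken from my logs, and the function name is just for illustration.

```python
import re

# Matches lines such as "[/dev/nvme1n1p1].corruption_errs  7"
STAT_LINE = re.compile(r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")

# Illustrative sample only (values invented for this example)
SAMPLE = """\
[/dev/nvme1n1p1].write_io_errs    0
[/dev/nvme1n1p1].read_io_errs     0
[/dev/nvme1n1p1].flush_io_errs    0
[/dev/nvme1n1p1].corruption_errs  7
[/dev/nvme1n1p1].generation_errs  0
"""


def nonzero_counters(stats_text):
    """Return {(device, counter): value} for every counter above zero."""
    problems = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match["value"]) > 0:
            problems[(match["device"], match["counter"])] = int(match["value"])
    return problems
```

To feed it real data, capture the output of `btrfs device stats` for the pool's mount point (for example with subprocess.run) and pass the text in. An empty result means all counters are zero.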

Odd Stuff in Logs after the BTRFS Scrub on nvme1n1p1.txt

February 18, 2025

Community Expert

With btrfs I recommend

55 minutes ago, JorgeB said:

backing up and reformatting

February 19, 2025

Author

@JorgeB I really don't use the benefits of BTRFS. Would you recommend going to a different file system?

February 19, 2025

Community Expert

For a single-device pool you can use xfs.

February 19, 2025

Community Expert

8 hours ago, Rudder2 said:

I really don't use the benefits of BTRFS.

Historically the benefits of btrfs for a pool were:

It included built-in checks for data corruption.
It allowed easy expansion of a pool by adding drives, and they did not need to be the same size.
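To illustrate the idea behind the first point, here is a minimal Python sketch of per-block checksumming: store a checksum for each block when data is written, then recompute on read and flag any block that no longer matches. It uses zlib.crc32 as a stand-in (btrfs defaults to crc32c, which is a different polynomial), and the 4096-byte block size and function names are assumptions for the example, not the btrfs implementation.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size for the illustration


def checksum_blocks(data, block_size=BLOCK_SIZE):
    """Checksum each fixed-size block of data, as would be stored at write time."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def corrupted_blocks(data, stored_checksums, block_size=BLOCK_SIZE):
    """Return indexes of blocks whose current checksum differs from the stored one."""
    current = checksum_blocks(data, block_size)
    return [index for index, (now, then) in enumerate(zip(current, stored_checksums))
            if now != then]
```

On a single-device pool a mismatch can only be detected and reported, since there is no second copy of the data to repair from.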

February 20, 2025

Author

I have ECC RAM and have no idea why I would have started to have BTRFS issues. I've read RAM issues are a big contributor.

Can BTRFS issues cause the kernel to panic? Or did the kernel panic cause the BTRFS issues? Can the AppData NVMe BTRFS issue cause the WebUI to not be contactable while SSH and most Dockers were still reachable on the internal network? My Docker file also lives on this NVMe. Would like to find the root cause if I could.

The NVMe passes SMART, but could this 2021 drive be starting to fail? I'm going to format it when I have the time, but I'm debating whether to just call the 4-year-old Samsung drive, which SMART says is 78% used, no longer good for AppData.
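For tracking that wear figure over time, here is a minimal Python sketch that pulls a few health fields out of the text that smartctl prints for an NVMe drive. "Percentage Used" is the drive's own estimate of rated endurance consumed, not a failure prediction. The sample text uses illustrative values (only the 78% matches what I reported above), and the function names are made up for the example.

```python
WANTED_FIELDS = (
    "Critical Warning",
    "Available Spare",
    "Percentage Used",
    "Media and Data Integrity Errors",
)

# Illustrative sample only; real output has many more lines
SAMPLE = """\
Critical Warning:                   0x00
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    78%
Media and Data Integrity Errors:    0
"""


def parse_nvme_health(smartctl_text):
    """Return the wanted 'Key: value' health fields as a dict of strings."""
    health = {}
    for line in smartctl_text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() in WANTED_FIELDS:
            health[key.strip()] = value.strip()
    return health


def percentage_used(health):
    """Convert the 'Percentage Used' field, e.g. '78%', to an integer."""
    return int(health["Percentage Used"].rstrip("%"))
```

A non-zero "Critical Warning" or a rising "Media and Data Integrity Errors" count would be a stronger sign of trouble than the wear percentage alone.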

I love DATA and losing it is like losing a child...LOL...Data hoarder here.

I thank you both for the input. I never use more than one drive in my cache pools, so data corruption checks are the only benefit I can see that would be nice. So the philosophical debate commences... BTRFS vs XFS for my cache pools, since I have to reformat anyway.

I understand if some of this stuff can't be determined with the limited Data I have left. Thank you again.
