Please help. My unRAID is constantly locking up, and has become unusable.

relink · April 28, 2021

I have been having this particular issue for a couple of days now. My server will become mostly unresponsive. Sometimes it recovers, and sometimes a hard reboot is the only option. I'll do my best to lay out everything I know, but I'm beginning to feel overwhelmed.

Now, what I mean by mostly unresponsive is that some of my Docker containers will still work, just slowly (such as NGINX, Wallabag, PiHole, etc.), while others become unresponsive, most notably Plex. Sometimes the unRAID UI becomes inaccessible. Sometimes it can be accessed, but I can't get diagnostics, I can't start or stop any containers or the array, and I can't reboot or shut down. It's like the UI is visible, but I can't actually DO anything. When this happens I can usually still SSH in, but I can't control anything. I tried to reboot over SSH by running "shutdown -r now" and it came back with the usual "system is going down" message, but it never actually did anything.

I do have a separate syslog server, so I have been monitoring the errors, and the only thing that catches my attention is kernel errors pertaining to "aacraid", which is the driver for my Adaptec 71605 HBA. I did find a page discussing solutions for my card, but it's using Ubuntu, so I don't know how, or even whether, I should apply that to unRAID. The page is here. But I'm not even sure if this is the issue or just a coincidence.
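For anyone who wants to pull just the driver messages out of a plain-text syslog export, here is a minimal sketch in Python. It assumes the relevant lines contain the literal string "aacraid"; the function name `aacraid_lines` and the script name in the usage comment are made up for illustration and are not part of unRAID or any syslog tool.

```python
import re
import sys
from typing import Iterable, List

# Assumption: messages from the Adaptec driver mention "aacraid" somewhere
# in the line. Adjust the pattern if your syslog server formats them differently.
AACRAID = re.compile(r"aacraid", re.IGNORECASE)


def aacraid_lines(lines: Iterable[str]) -> List[str]:
    """Return only the log lines that mention the aacraid driver."""
    return [line.rstrip("\n") for line in lines if AACRAID.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python filter_aacraid.py /path/to/syslog.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for line in aacraid_lines(fh):
            print(line)
```

This only narrows the log down. It does not say whether the aacraid messages are the cause of the lockups or just a coincidence.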

So far what I have done:

-Ran Memtest (passed)

-Checked SATA cables

-Ran chkdsk on flash (passed)

-Updated mobo to latest BIOS

-Updated RAID card to latest BIOS

-Tried RAID card in another PCIe slot

-unRAID is fully up to date

-All plugins are fully up to date

-Disabled write caching on HBA disks (this was mostly to protect me from all the hard shutdowns I'm doing)

System Specs:

Ryzen 5 2600

ROG Strix B450-F Gaming

32GB Corsair Dominator

Adaptec 71605 RAID Controller

I attached:

-Full Syslog

-Syslog showing only Kernel errors

-Diagnostics, although I'm unsure if it will really show much.

EDIT: Attached a copy of the logs as .txt files in a .zip archive.

All_2021-4-28-10_7_11.csv Kernel-Errors-Only_All_2021-4-28-10_8_4.csv serverus-diagnostics-20210428-1052.zip

Logs.zip

Edited April 28, 2021 by relink

trurl · April 28, 2021

Instead of .csv, please attach syslogs as plain text files or even better, zipped plain text files.

relink · April 28, 2021

10 minutes ago, trurl said:

Instead of .csv, please attach syslogs as plain text files or even better, zipped plain text files.

Sorry about that, just went with what my Syslog server gives me.

New copy is attached. I'll also add it to the original post too.

Logs.zip

trurl · April 28, 2021

Those are maybe even more difficult to make sense of. They are in descending timestamp order just like the .csv, but can't be easily sorted since they are text files. Maybe the .csv would be better but I don't have a licensed copy of Excel on this computer.

Sorry, I am just more familiar with looking at syslogs in the usual way they appear in Diagnostics. Maybe someone else will take a look.

Or, you could try to get Diagnostics from the command line when the problem occurs.

relink · April 28, 2021

31 minutes ago, trurl said:

Those are maybe even more difficult to make sense of.

Is the attached file any better?

31 minutes ago, trurl said:

Or, you could try to get Diagnostics from the command line when the problem occurs.

Unfortunately I can't. It just freezes when I try to get diagnostics.

All_2021-4-28-12_39_59.html

rodan5150 · April 28, 2021

I looked over it, I'm no expert, but you do have quite a few errors and warnings. Not sure what is critical or would cause hangups/crashes.

Anyway, I took your CSV file, sorted it in descending order by date and time stamp, then exported it as a tab-delimited txt file. Maybe this will help others to interpret it better.

All_2021-4-28-10_7_11_tab delimited.zip
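For anyone without a spreadsheet program, the same sort-and-export step can be sketched in Python. This is only a sketch under assumptions: the column names `Date` and `Time` and the timestamp format are guesses, since the real header of the syslog server's CSV export isn't shown in this thread, so adjust them to match the file. The sort direction is a parameter.

```python
import csv
from datetime import datetime
from typing import Dict, List


def sort_rows(rows: List[Dict[str, str]],
              date_col: str = "Date",          # assumed column name
              time_col: str = "Time",          # assumed column name
              fmt: str = "%Y-%m-%d %H:%M:%S",  # assumed timestamp format
              descending: bool = False) -> List[Dict[str, str]]:
    """Sort log rows by their combined date and time columns."""
    def key(row: Dict[str, str]) -> datetime:
        return datetime.strptime(f"{row[date_col]} {row[time_col]}", fmt)
    return sorted(rows, key=key, reverse=descending)


def csv_to_sorted_tsv(src: str, dst: str, **sort_options) -> None:
    """Read a CSV log export, sort it, and write it as tab-delimited text."""
    with open(src, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        rows = list(reader)
    with open(dst, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(sort_rows(rows, **sort_options))
```

Calling `csv_to_sorted_tsv("export.csv", "export_sorted.txt")` (both file names hypothetical) writes the rows oldest first. Pass `descending=True` for newest first.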

Edited April 28, 2021 by rodan5150

relink · April 28, 2021

41 minutes ago, rodan5150 said:

I took your CSV file, sorted it in descending order by date and time stamp, then exported it as a tab-delimited txt file.

Thanks, I was trying to figure out a quick way to do that but couldn't think of anything.

Vr2Io · April 28, 2021

Please run the UEFI version of MemTest86 and set it to test all CPUs in parallel.

Edited April 28, 2021 by Vr2Io

relink · April 28, 2021

37 minutes ago, Vr2Io said:

Please run the UEFI version of MemTest86 and set it to test all CPUs in parallel.

Wow, I just read that entire thread. I really hope it's not a RAM or CPU issue; neither is exactly affordable right now. But I'll take your advice and run the test as soon as I get home from work in about an hour.

Vr2Io · April 28, 2021

11 minutes ago, relink said:

Wow, I just read that entire thread. I really hope it's not a RAM or CPU issue; neither is exactly affordable right now. But I'll take your advice and run the test as soon as I get home from work in about an hour.

Great that you have read the entire thread. From personal experience, RAM issues are quite common and easy to fix. The CPU is very seldom the cause; at least I have never run into that.

I'll just say that I wouldn't test with the legacy memtest86.

Edited April 28, 2021 by Vr2Io

relink · April 28, 2021

So far so good. I’m going to let it run overnight and I’ll check it again in the morning.

trurl · April 29, 2021

10 hours ago, relink said:

Ryzen 5 2600

Don't know if there is anything here for you or not:

relink · April 29, 2021

12 hours ago, trurl said:

Don't know if there is anything here for you or not

I read over that thread numerous times. It really only seems to pertain to 1st gen Ryzen, but I tried it anyway, and it's made no difference.

relink · April 29, 2021

@Vr2Io Alright, the memtest ran overnight and passed.

MemTest86-Report-20210428-211336.html

Vr2Io · April 29, 2021

6 minutes ago, relink said:

@Vr2Io Alright, the memtest ran overnight and passed.

Then the main hardware should be healthy. Next I would look at the HBA. Are the containers / Docker image stored on the array or on a separate SSD?

Edited April 29, 2021 by Vr2Io

relink · April 29, 2021

2 minutes ago, Vr2Io said:

Then the main hardware should be healthy. Next I would look at the HBA. Are the containers / Docker image stored on the array or on a separate SSD?

Appdata, System, and Domains are all on an NVMe cache pool.

Also, I have the Maxview utility installed as a container. I don't know if you're familiar, but it's a tool to monitor and manage Adaptec cards. The screenshot below is my HBA's status, which all looks good to me.

Vr2Io · April 29, 2021

I don't have an ASR71605. Next I would suggest unplugging the HBA and all of the array disks, but keeping the disks powered on.

Then do a New Config and add a dummy disk connected to onboard SATA, so you can start the array, Docker, and the containers to check whether the system crashes again.

Please back up the whole boot flash drive and record all disk assignments.

relink · April 29, 2021

56 minutes ago, Vr2Io said:

I don't have an ASR71605. Next I would suggest unplugging the HBA and all of the array disks, but keeping the disks powered on.

Then do a New Config and add a dummy disk connected to onboard SATA, so you can start the array, Docker, and the containers to check whether the system crashes again.

Please back up the whole boot flash drive and record all disk assignments.

What would I need to do to ensure that I will be able to restore my current config when I'm done?

Also, I'm not sure anything would happen. The troubleshooting I have done so far seems to point to issues when there is a high number of writes going to the array. I kept my server online all day yesterday until I ran the memtest, and it ran fine. What I did differently was disable things like Radarr and Sabnzbd, anything that would do heavy writes.

In fact, I'm about to run a test and see if I can make it crash by doing a large transfer from my desktop to unRAID.

Vr2Io · April 29, 2021

9 minutes ago, relink said:

What I did differently was disable things like Radarr and Sabnzbd, anything that would do heavy writes.

I always do heavy reads and writes on the array, but mainly sequential I/O. I don't run Radarr or Sabnzbd.

9 minutes ago, relink said:

I need to do to ensure that I will be able to restore my current config

You can use Unraid's built-in flash backup feature.

Was this same system running stable before? With which version of Unraid? Or did the problem start after some change?

Please see the similar case below. Hold off on removing the HBA and array disks for now, and keep track of whether the crashes stop when there is no heavy random I/O.

Edited April 29, 2021 by Vr2Io

Vr2Io · April 29, 2021

Does a parity check (no correction) also crash?

relink · April 29, 2021

Just now, Vr2Io said:

Does a parity check (no correction) also crash?

No, parity checks run just fine, over 100MB/s the entire time.

I'm also currently running a simple test: I'm transferring several 100GB of data from one of my computers to a no-cache share on my unRAID server... and so far I'm transferring at about 36MB/s, with not a single error in my log, and everything is still working... I don't get it...
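A write test like this can also be run locally on the server, without going over the network. The following is a minimal sketch in Python that writes a file sequentially and reports the throughput. The function name `write_test` and the target path in the usage note are hypothetical, and this is a sequential write only, so it does not reproduce the mixed write pattern of apps like Radarr or Sabnzbd.

```python
import os
import time


def write_test(path: str, total_bytes: int, chunk_bytes: int = 1024 * 1024) -> float:
    """Write total_bytes of random data to path sequentially; return MiB/s."""
    chunk = os.urandom(chunk_bytes)  # one random buffer, reused for every write
    written = 0
    start = time.monotonic()
    with open(path, "wb") as fh:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            fh.write(chunk[:n])
            written += n
        fh.flush()
        os.fsync(fh.fileno())  # make sure the data has actually been pushed to disk
    elapsed = time.monotonic() - start
    return written / (1024 * 1024) / max(elapsed, 1e-9)
```

For example, `write_test("/mnt/user/some_share/testfile.bin", 10 * 1024**3)` would write 10 GiB, where the share path is an assumption and should point at a no-cache share. Delete the test file afterwards.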

Vr2Io · April 29, 2021

Then I assume it's not a hardware-related issue.

As almost no random I/O happens on my Unraid server, I really don't have many suggestions or ideas for this case.

relink · April 29, 2021

Just to spice things up, since just writing data doesn't seem to be enough, I decided to start a non-correcting parity check while it's still transferring. I'll probably start a large read transfer now too.

relink · April 29, 2021

Ok, so I have several 100GB going into a no cache share, I have several 100GB being read from the array, and I'm running a parity check.

The writes have slowed to about 1MB/s or less, the reads are between 34-40MB/s, and the parity check is running at about 45MB/s. Also, CPU usage is around 50% on average, and RAM is around 50%.

Ok, things changed before I even posted this. The parity check is now running at around 80MB/s and the writes have practically stalled.

Vr2Io · April 29, 2021

Slowness is expected, but crashing is abnormal. Just run more tests to narrow down the symptom.

Edited April 29, 2021 by Vr2Io
