Unraid

Back again with more crashing issues!

Over the last few years I've had issues off and on with my Unraid server crashing. At first it was a C-state issue that I fixed. Then a year later, my motherboard somehow re-enabled all that stuff, and I changed the C-state settings again, along with how the machine delivers power at idle. Both times I got stability for a few months.

A couple of weeks ago, the crashes started speeding up again. I found some time to check the BIOS, assuming the changes had magically reverted again, but they hadn't. I've run memtest for about 15 hours and no errors were found. What else should I check to find the root cause of the crashes? I disconnected my UPS because I had heard it could be buggy and cause random shutdowns, but this is a hard reboot. I'm at the point now where it happens every 3-5 hours, give or take. It happens regardless of whether a parity check is running (last time, it was consistently after parity finished). SMART data is all good except for a couple of one-off command errors (one of the attribute 188 codes), but I checked with Seagate and they are not the issue.

My thoughts at this point are, in order:

PSU

CPU

MOBO

Not sure what else it could be. The setup is:

MOBO: Gigabyte Technology Co., Ltd. B550M DS3H, Version Default string
BIOS: American Megatrends International, LLC., Version F19, dated Fri 22 Mar 2024 12:00:00 AM CDT

CPU: AMD Ryzen 7 5700G with Radeon Graphics @ 3800 MHz

RAM: 64 GB G.Skill

PSU: Corsair RM650x

Diagnostics from yesterday attached.

tower-diagnostics-20241028-0821.zip

  • Author
4 hours ago, JorgeB said:

Enable the syslog server and post that after a crash.

I should note that the syslog has been next to useless, but here is the file. The first entry after every reboot is always "Tower root: Delaying execution of fix common problems scan for 10 minutes", so you can see from those timestamps how often it crashes now.

syslog-192.168.1.124.log
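Since that marker line always opens a fresh boot, the gaps between successive occurrences give a rough crash/uptime history. A minimal Python sketch, assuming the remote syslog file uses the classic "Mon DD HH:MM:SS" prefix seen in the log lines in this thread; syslog records no year, so the year is a parameter you supply (an assumption, defaulting to 2024):

```python
import re
import sys
from datetime import datetime

# The poster reports this is always the first entry after a reboot.
MARKER = "Delaying execution of fix common problems scan"

# Assumed BSD-syslog prefix, e.g. "Dec  4 06:50:07" (month, day, time; no year).
TS_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2})")


def boot_times(lines, year=2024):
    """Return a datetime for every line that contains MARKER.

    Syslog omits the year, so `year` is an assumption. If a timestamp goes
    backwards (e.g. Dec -> Jan), it is treated as rolling into the next year.
    """
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        m = TS_RE.match(line)
        if m is None:
            continue
        month, day, clock = m.groups()
        ts = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        if times and ts < times[-1]:
            year += 1
            ts = ts.replace(year=year)
        times.append(ts)
    return times


def intervals(times):
    """Gaps between consecutive boots: an approximate uptime before each crash."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        boots = boot_times(fh)
    for boot, gap in zip(boots[1:], intervals(boots)):
        print(f"rebooted {boot:%b %d %H:%M:%S} after roughly {gap}")
```

Each gap also includes the time the box spent rebooting, so treat the numbers as approximate uptimes. It can be run on any machine with Python 3, passing the downloaded syslog file as the only argument.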

  • Community Expert

There are multiple segfaults from this, but I don't know what "P" is:

Dec  4 06:50:07 TOWER kernel: P[proc][21012]: segfault at 100000002 ip 0000149a8c174d59 sp 0000149a79ccc1b8 error 4 in libc.so.6[149a8c045000+155000] likely on CPU 13 (core 5, socket 0)

One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one, including the Docker containers.
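For anyone reading that segfault line: the kernel reports the faulting address, the instruction pointer (ip), an x86 page-fault error code, and the mapping the ip fell in (libc.so.6 loaded at base 149a8c045000, size 155000). A small Python sketch that decodes those fields, assuming the message format matches the line quoted above; the flag meanings are the x86 page-fault error-code bits (bit 0 protection violation vs. page not present, bit 1 write vs. read, bit 2 user mode, bit 4 instruction fetch):

```python
import re

# Assumed to match the kernel segfault message format quoted in this thread.
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# x86 page-fault error-code bits.
PF_PROT, PF_WRITE, PF_USER, PF_INSTR = 0x1, 0x2, 0x4, 0x10


def decode_segfault(line):
    """Return the decoded fields of a segfault line, or None if it doesn't match."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    ip, base, size = (int(m[k], 16) for k in ("ip", "base", "size"))
    return {
        "fault_addr": int(m["addr"], 16),
        "object": m["obj"],
        "offset_in_object": ip - base,  # where in the library the crash happened
        "ip_inside_mapping": base <= ip < base + size,
        "protection_violation": bool(err & PF_PROT),  # False = page not present
        "write": bool(err & PF_WRITE),                 # False = read
        "user_mode": bool(err & PF_USER),
        "instruction_fetch": bool(err & PF_INSTR),
    }
```

For the quoted line this works out to a user-mode read of the unmapped address 0x100000002 at offset 0x12fd59 inside libc, which most likely means a userspace process followed a bad pointer. On its own it does not identify which process "P" is.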

  • Author

The segfault hasn't happened in the last calendar year, though, and there have been dozens of crashes in the last few weeks. There also aren't many crashes around those segfault messages.

Some of the crashes seem to happen after several attempts of this:

Tower flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update

Are there any processes for hardware troubleshooting that don't involve rip-and-replace and being stuck with used parts I can't return? Or could that flash_backup be an issue?

  • Community Expert

If you have multiple RAM sticks, try using the server with just one; if it still crashes, try a different one. That will basically rule out bad RAM.

  • Author

I've run extensive testing there and am about 95% sure it isn't RAM. Interestingly, I saw that the writes to my USB flash drive were kind of high, sometimes dozens of writes at a time.

[Screenshot: USB flash drive write count after about an hour of uptime]

For example, that screenshot was taken after about an hour of runtime. As I said in my previous reply, I'm thinking there may be something to that, based on the syslog around the crashes. This led me to another thread about a high number of writes, which suggested uninstalling the Unassigned Devices Preclear and My Servers plugins. I can't find the My Servers plugin, though (even though that log line references it). I removed Preclear, since I had it installed, and now my USB activity looks like this after 8-9 hours:

[Screenshot: USB flash drive write count after 8-9 hours of uptime]

That also means I have doubled my average uptime. There was one overnight crash, but after around 9 or 10 hours instead of the 3-5 it had been, so we're getting somewhere, I think.

Any other thoughts on the root cause based on that information?

Take this with a grain of salt, but this kind of feels like a PSU issue. Do you have a spare one you could use to test?

Running the system without the Docker containers/services does reduce power draw (CPU usage, HDD reads/writes), making it easier on the PSU.

IMO that's where I would start.

  • Author

I have been leaning more and more in that direction. I don't have a spare, unfortunately, but I'm watching over the next couple of days to see if some of the changes I've made help. Based on the other things I've tested, I'm at the point where it's either the PSU or Unraid OS. You might have solidified my choice to grab a PSU to keep around as a spare anyway...

You could also try booting into a live OS (Linux Mint or whatever) and running a stress test on it (I'm sure there's something out there that could do this; YouTube on loop?). If it crashes there, then you know. If it doesn't, that can't really rule out the PSU, but it's something.

To eliminate the OS side of the house, you could try a fresh install of Unraid, but I have found the OS to be good and stable.

Just something to consider/try.

  • Author
17 minutes ago, mathomas3 said:

youtube on loop?

lol, in like 20 Chrome tabs!

 

I do have a bootable drive I could try, and a PSU is on its way. It's something to try while I wait, for sure, although as we speak I'm on my longest uptime in more than two weeks, at 12 hours even.

Having random issues like this is always frustrating to work out... If anything, get the PSU, test it, and send it back if that doesn't solve it.

  • Author

 

I'm thinking more and more that it's power related: it just crashed randomly after 16 hours of uptime. I thought I should cancel the parity check, since it keeps trying to run one on reboot, and then it immediately crashed again, which I missed. I just cancelled that parity check and am waiting to see. I tried to raise the idle load via the BIOS, but yeah. The PSU gets here Friday and I'm off that day, so I know what I'll be working on...

  • Author

New update: I'm noticing a lot of writes to the flash drive again. I checked which files are being updated the most, and they are under the .git/objects folder. It's adding 5-10 every few minutes; I had 300 writes an hour ago and am at 600 now. I can't find anything else changing. Not sure why git is updating like crazy...
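One way to see which git objects are being written, and how fast, is to look at file modification times under .git/objects: git stores each new loose object as a new file inside a two-hex-digit subdirectory, so recently modified files there roughly correspond to recently added objects. A minimal Python sketch, assuming the flash drive is mounted at /boot (so the repo would be /boot/.git); the 10-minute window is an arbitrary choice:

```python
import os
import time
from collections import Counter

# ASSUMPTION: the flash drive is mounted at /boot, so the repo lives at /boot/.git.
OBJECTS_DIR = "/boot/.git/objects"


def recent_objects(root=OBJECTS_DIR, window_s=600, now=None):
    """List files under `root` modified within the last `window_s` seconds."""
    now = time.time() if now is None else now
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file disappeared between listing and stat (e.g. git gc)
            if now - mtime <= window_s:
                hits.append(path)
    return hits


def per_subdir(paths, root=OBJECTS_DIR):
    """Count hits per top-level subdirectory (the two-hex-digit fan-out, or 'pack')."""
    return Counter(os.path.relpath(p, root).split(os.sep)[0] for p in paths)


if __name__ == "__main__":
    hits = recent_objects()
    print(f"{len(hits)} object files written in the last 10 minutes")
    for subdir, count in per_subdir(hits).most_common(5):
        print(f"  {subdir}: {count}")
```

Running it a few times, several minutes apart, and comparing the counts against the flash drive's write counter would show whether these object files account for the writes.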

  • Author
Welp. I got the new PSU installed yesterday, and the crashing actually got slightly worse, if anything, so I'm ruling that out. I'm noticing again that the writes to flash are ticking up, but I can't figure out what is causing that, as the git logs aren't showing anything being updated...

  • Author

Can anyone give me a sanity check on this pre-boot stuff?

 

[Photo: pre-boot screen, taken 2024-11-03]

  • Community Expert

Looks normal to me.

  • Author

Cool. Well, here is the latest:

Replaced PSU: no dice

RAM: no dice

Docker turned off: finally made it through a full parity check, but crashed after parity was done.

Checked BIOS: no issues to report, all config as it should be.

Disks are fine, minus a random error on my original disks that just won't clear; the error ID translates to 1 error in 65538 operations (it's the 188 error that is ignored).

I'm in the danger zone at this point: I can't finish parity checks most of the time, and because it crashes when parity finishes or is cancelled, I cannot invoke the mover, and the disks are starting to get into an undesirable state.

Please advise.

EDIT: The crash occurs within roughly 7 minutes of parity finishing or being cancelled.

Edited by seecs2011
Added context

  • Community Expert

Then 

34 minutes ago, JorgeB said:

Board or CPU would be the other main suspects

 

  • Author

Welp, I've now replaced all components, and it's now rebooting within a minute of the web portal becoming available.

 

  • Author

Worth noting: safe mode also crashes. The only stability I've had for more than a minute or two has been the memtest I've been running, just to make sure the system can stay up apart from Unraid OS for a bit...

  • Author

Something broke in this process: I reinstalled the old hardware and am getting the same behavior.

 

  • Author

Now it crashed with an Ubuntu live disk... what the hell? I can't figure out what is wrong with this thing.
