
Array stopping for unknown reason & system restarting for unknown reason



Please help, this is driving me insane.

 

Please note I used to have a lot more frequent crashes when I was running lots of containers ... I clicked scrub in anger and wiped out all my appdata, so I've got to rebuild everything :''(

I have done the following:

 

  • Attached both the persistent syslog and the anonymised diagnostics.
  • Ran memtest for 3 passes and it came back clean. All disks pass the SMART test.
  • Noticed the system clock is off (will sort) and also noticed .
  • Also made the change from macvlan to ipvlan.

 

The only things I can think of are that 3 passes of memtest is not enough, or that the networking setup is causing issues. Just as an FYI, my Unraid box is hooked up to pfSense over a single 1Gb port. I have 3 VLANs to manage different subnets. All the subnets, outbound NAT, routing, etc. are handled by pfSense.

tower-smart-20240708-0148 (2).zip tower-smart-20240708-0148 (1).zip tower-smart-20240708-0148.zip tower-smart-20240708-0147.zip tower-smart-20240708-0146.zip tower-diagnostics-20240706-1135.zip syslog-1720306751

Link to comment
1 hour ago, JorgeB said:

If you mean that the server is rebooting by itself, that is almost always a hardware problem.

I'm assuming then that the most usual culprit is RAM, then drives?

 

I've not had a hardware issue before and am struggling to diagnose this one. Got any pointers on common tests to run, in order of probability?

 

Thank you 

Link to comment
  • 2 weeks later...

Hey @JorgeB

 

How would you test the board/CPU?

  1. I have run memtest for over 24h with no errors.
  2. I have checked my PSU with a power supply tester: no issues.
  3. I have run a SMART report on all drives: no issues.

The crashes are not tied to anything in particular, like heavy file moves or stressing the GPU. They have occurred even with no Docker containers running and the box just idling. The frequency varies: sometimes once in a 24-48h window, other times 5 times in a row, then again 6 hours later.

 

I have only had these issues since upgrading from Unraid 5.X -> Unraid 6.X

Link to comment
On 7/16/2024 at 9:12 AM, JorgeB said:

You'd need to swap with a different one.

Okay so I am still experiencing crashes. I have done:

 

  • 36 hour memtest -> 0 errors
  • PSU test -> nothing wrong with PSU
  • Swapped CPU for new one
  • Swapped Mobo for new one

All that expense for naught - as it crashed again 1 hour after booting up the new motherboard and cpu...

Link to comment

Should I run it in safe mode for a while, without turning on VMs/Docker, to see if the restarts occur?

Then incrementally bring services back online until crashes occur?

 

  1. Safe mode
  2. Not safe mode (no Docker)
  3. Not safe mode w/ Docker

Link to comment
On 7/24/2024 at 11:09 AM, JorgeB said:

It's worth a try.

So I did the following:

  • Safe Mode with no GUI = 3 days with no crashes, so I proceeded to the next step
  • GUI Mode with no array = 2 days with no crashes, so I proceeded to the next step
  • GUI Mode with array, but no Docker or VMs = 1 day, but then a crash

 

All disks are healthy.

What do you think this could be?

Link to comment
Posted (edited)
5 hours ago, JorgeB said:

Not sure I follow, I understood it only crashed with VMs running, not idling.

Originally it was crashing with Docker running. I had one VM but it was barely anything and a new addition to the stack.

 

This has been ongoing for quite some time. I ensured the array started automatically and just let it restart & do its thing...

 

However, the crashes seem to have become more frequent lately, to the point where they were blocking me from doing my stuff. That's when I decided to make this thread and get really serious about diagnosing the issue.

 


So after doing the checks mentioned before:

 

1) RAM memtest for 36 hours

2) PSU test

3) SMART reporting for all drives

 

 

You then said it may be a CPU/MoBo problem, since a system that just halts and restarts is, more often than not, a hardware issue.

 

 

So, I went on eBay and spent a couple hundred on those new bits. However, the issue persists.

That's why I then ran this series of tests:

 

- Running both [Safe Mode] & [GUI w/o Array] didn't cause any crashes.

- It seems turning on the array is causing crashes...and the crashes happen more frequently if I run all my Docker containers.

 

 

My gut is telling me two things:

 

1) The array, once started, cannot be stopped - it always says "retry unmounting disk shares"

2) Maybe there is a driver / storage issue somewhere (given crashes happen with the array turned on)

 

However, nothing shows in the Persistent Syslog.
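One way to double-check that is to pull out the last few lines logged before each boot, since that is where anything useful would be. This is only a rough sketch, assuming each boot starts with the kernel's "Linux version" line (the `BOOT_MARKER` constant and the function name are my own, not anything Unraid ships):

```python
import sys
from typing import Iterable, List

# Assumption: the first kernel message of every boot contains this text.
# Change the marker if your syslog starts a boot with something else.
BOOT_MARKER = "Linux version"


def tails_before_boots(lines: Iterable[str], context: int = 5) -> List[List[str]]:
    """For every boot that follows earlier log output, return the last
    `context` lines that were logged before that boot started."""
    tails: List[List[str]] = []
    session: List[str] = []  # lines seen since the previous boot marker
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER in line:
            if session:
                tails.append(session[-context:])
            session = []
        session.append(line)
    return tails


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 boot_tails.py <path to the persistent syslog file>
    with open(sys.argv[1], errors="replace") as fh:
        for number, tail in enumerate(tails_before_boots(fh), start=1):
            print(f"--- last lines before reboot #{number} ---")
            print("\n".join(tail))
```

If every tail is just routine chatter with no kernel error, panic or shutdown message in it, that fits a hard reset where the box never got the chance to log anything.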

 

Edited by toughiv
Link to comment

OK, I think I misunderstood your previous post, as VMs were running when it crashed. IMHO, if it crashes with just the array running, without Docker and VMs, it's almost certainly hardware. You should still try to figure out what is preventing the array from stopping, even though that is most likely unrelated to the crashing.
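For the array not stopping, it may help to see which processes still have something open on the array when the "retry unmounting disk shares" message appears. Below is a minimal sketch that walks `/proc` and reports processes whose working directory or open files sit under a path. It assumes the array is mounted under `/mnt` and that it is run as root; the `holders` function name is just for illustration. It will not catch everything, for example memory-mapped files or loop-mounted images.

```python
import os
from typing import Dict, List


def holders(prefix: str = "/mnt", proc_root: str = "/proc") -> Dict[int, List[str]]:
    """Map PID -> paths under `prefix` that the process has open or uses as cwd."""
    prefix = prefix.rstrip("/")
    found: Dict[int, List[str]] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        base = os.path.join(proc_root, entry)
        links = [os.path.join(base, "cwd")]
        try:
            fd_dir = os.path.join(base, "fd")
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited, or we are not allowed to look
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if target == prefix or target.startswith(prefix + "/"):
                found.setdefault(int(entry), []).append(target)
    return found


if __name__ == "__main__" and os.path.isdir("/proc"):
    # Assumption: disks and shares live under /mnt; change the prefix if not.
    for pid, paths in sorted(holders("/mnt").items()):
        print(pid, sorted(set(paths)))
```

Any PID it prints can then be looked up with `ps` to see what is keeping the disks busy.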

Link to comment
2 minutes ago, JorgeB said:

OK, I think I misunderstood your previous post, as VMs were running when it crashed. IMHO, if it crashes with just the array running, without Docker and VMs, it's almost certainly hardware. You should still try to figure out what is preventing the array from stopping, even though that is most likely unrelated to the crashing.

It has just crashed again, and even while booting it'll crash a couple more times until it manages to stay up and running.

 

Given that it'll crash before the array has started in those instances, what do you think that could be?

 

In fact it just happened while I was watching the screen: it crashed when the third party drivers (nvidia plugin) were being installed on boot ... maybe a coincidence. It is booting back up now...let's see if it happens there again. Yep, it happened there twice!!

Maybe it is the nvidia plugin?

 

 

 

Link to comment
2 minutes ago, toughiv said:

It has just crashed again, and even while booting it'll crash a couple more times until it manages to stay up and running.

 

Given that it'll crash before the array has started in those instances, what do you think that could be?

 

In fact it just happened while I was watching the screen: it crashed when the third party drivers (nvidia plugin) were being installed on boot ... maybe a coincidence. It is booting back up now...let's see if it happens there again. Yep, it happened there twice!!

Maybe it is the nvidia plugin?

 

 

 

I just tried to boot into Safe Mode - Gui - No plugins and it still crashed...

 

Maybe the nvidia plugin is a red herring.

 

Link to comment
Just now, toughiv said:

I just tried to boot into Safe Mode - Gui - No plugins and it still crashed...

 

Maybe the nvidia plugin is a red herring.

 

It actually still tries to install the nvidia third party plugin even when that safe mode option is selected.

Link to comment
44 minutes ago, toughiv said:

what do you think that could be?

I think it still points to hardware, but it's difficult to say which component exactly, since there are several that can cause that. If you have already swapped the board and CPU, RAM and PSU would be the main remaining suspects. If you have multiple sticks of RAM, try just one; if it still crashes, try the other one. That will basically rule out the RAM.

Link to comment
22 hours ago, JorgeB said:

I think it still points to hardware, but it's difficult to say which component exactly, since there are several that can cause that. If you have already swapped the board and CPU, RAM and PSU would be the main remaining suspects. If you have multiple sticks of RAM, try just one; if it still crashes, try the other one. That will basically rule out the RAM.

So I just ran the box without the GPU plugged in and it still crashed.

Even though I ran memtest for 36 hours, could it still be the RAM? If so, I'll buy a little 16GB of RAM and give it a go...

Feels like such an odd issue

Link to comment
  • 4 weeks later...

@JorgeB - I have replaced the RAM and UnRaid still randomly restarted.

So far I have:

- Tested the PSU (it is fine)

- Replaced the MoBo

- Replaced the CPU

- Replaced the RAM

- Run it without the GPU

It still crashes after all of that.

 

Surely this is an UnRaid issue?

Link to comment
