
Frequent system lockups/crashes in last 6 weeks


Solved by vw-kombi


Re the link to the Ryzen stuff - I have had a stable system for many years, ever since it was first 'upgraded' to Ryzen in 2019. So I can't see that being relevant after all this time.

I have had no BIOS updates or anything - the system is the same as it has always been - so why would the Ryzen matter now? It can only be Unraid OS related.

Had two more system lockups this week.  I was overseas for the other four lockups.

I am home now, so this time I was able to see the monitor screen - and it's blank.

I have no idea what has made this so unstable lately - since the OS update to 6.12.5, it seems.

Memtest is all clear.

Maybe another hardware issue - mobo/CPU?

I had 6.9.2 running for a record 6 months straight I think at one stage - only planned outages.

Went to the mid 6.10's also with no lockups.

Went to mid 6.11's also with no lockups - and I never had the macvlan issues that people kept on about.

Then I upgraded to 6.12.4 and followed the special macvlan instructions, as I really need that for my UniFi controller to report IP addresses per MAC. It seemed stable on that release too.

I suspect the 6.12.6 upgrade I did in January is when this instability started for me.

I'm on 6.12.8 now - as of yesterday - and just had the first crash on that release also.

 

At this stage I'm considering getting a Fingerbot for Home Assistant to push the power button remotely!

Did that - writing to a share called syslog now - just woke up to another crash.

The monitor was attached in GUI mode and showed the logon screen, but after entering the username and password, nothing happened.

Had to give it the finger again. The parity check is running again, and I expect it to fail while I am asleep again.

The crashes seem to happen around the point where I would expect the parity check to complete, based on the timeline - but so far I have always been either out of the country or asleep when it happened.

Diags attached.

tower-diagnostics-20240306-0611.zip


I noticed a number of ffmpeg out-of-memory (OOM) errors. Not sure if they are related, but I have stopped the Frigate docker because of them.

I only have Emby, cloudflared, Home Assistant, TVHeadend, Mosquitto and the UniFi controller running now.

RAM usage is 27% of the 32GB now. Normally it sits around 49%.

The parity check says it will complete in about 13 hours - maybe I will be awake for this one......

'Sync errors corrected' shows 1 so far!!!!

Edited by vw-kombi
spelling
3 hours ago, vw-kombi said:

Ran to the server and I could see it rebooting

If the system reboots itself (as opposed to freezing) you almost certainly have a hardware error.

 

The likely suspects tend to be the PSU struggling to handle the load, or the CPU overheating and causing thermal shutdowns.

That's the first ever auto restart like that.

One thing that is new (only since yesterday) is the connection of the Unraid server to my solar inverter via a USB cable, so I could add monitoring of it to the Home Assistant docker. That's the one change since I moved Home Assistant from a VM to a docker.

 

I am monitoring temps and there is nothing suspect on the CPU, or on the disks.

 

The power supply is also fairly new (as of January), so I can't rule it out as part or all of the cause at this stage. It is a Corsair 750W 80+ Bronze power supply. It's supplying an AMD Ryzen 7 2700 Eight-Core @ 3200 MHz and 6 drives (4x10TB, 1x8TB and 1x4TB). There is only a basic graphics card so the system can POST. With Tips and Tweaks, Turbo Boost is off and the system is set to power save. UPS load with the parity check running (all disks) currently shows 126W, but that includes a router, three switches, a Raspberry Pi, the ISP gateway and two UniFi access points, so I suspect the server itself is drawing under 80W.

 

Woke up to another crash overnight. I got the emails from the nightly ZFS sync, so this crash happened after 4:30.

I am running another memtest to see if anything is different (the kids ran the last one for me remotely).

 

I see so many reports on Reddit and here about crashes on these later 6.12.x releases. Could it be an OS compatibility issue, and I am wasting all my time on hardware for nothing?

As I said, I have had many years of stability, and for me all these issues started on 6.12.4 in January.

 

My 6.11.5 stayed up from the moment it was upgraded until the moment it went to 6.12.4 - surely that is not a coincidence, given all the other posts about hardware lockups. Would I be wasting my time and effort going back to 6.11.5?

 

I don't have the money for a new CPU and motherboard, so I will have to dust off my backup Emby server that was decommissioned in 2019, get it all updated to the latest release, and slowly move the drives over to it one by one (as they are all old ones).

 

 

 

Shut it down, reset the BIOS (the BIOS is up to date for my CPU), disabled C-states, no overclocking or anything, and gave it a big clean out - got the dust off the CPU fan, etc.

Disconnected the UPS, solar and Current Cost USB cables, and also stopped the Home Assistant docker.

A new parity check is running now. I hope to get through one on 6.12.x......


So I have been really babysitting this while the parity check is running.

Strangely, I noticed disk 4 was spun down, which I thought should never happen during a parity check. I clicked to spin it up, and also changed the disk setting to never spin down.

Note the reads on the disks: almost all the same except that one, and even after spinning it up, its reads are not changing. Is this some sort of bug/issue?

Is it because that disk is so much smaller than the others ?

 

[Screenshot: disk reads during the parity check]

 

 

 


Great News!  I got a clean parity check for the first time in what seems like ages.......

So the question is, which of these things was the cause:

1 - reset BIOS, disable C-states, clean the PC and CPU fans (note: C-states had always been enabled, for years).

2 - removal of the USB connection to the solar inverter, and removal of Home Assistant (which gets random flooding on that port)

3 - combination of the above

 

I will plan for another parity check with full dockers running as normal.

I am going to move Home Assistant and the USB connectors back to the Raspberry Pi until I get a small mini PC to run that, and maybe a few Frigate cameras on it.

 

No more issues, so I have started putting things back. I don't plan on going back to a docker for Home Assistant, as that is something else that changed in January, along with the two USB connections from the inverter and the Current Cost monitor. They are no longer connected to the server and are moving back to the Raspberry Pi.

 

So - I have something very strange to report........

I was adding my stuff back, starting with the APC UPS cable, so I plugged that in and re-enabled the UPS in Unraid.

The strange thing - the UPS load is now showing just 63W.

This would normally be between 93W and 120W.

There are 5 disks running also.........

I checked the other things connected, and they are still running. I can see 12W being delivered to the PoE devices, and there are a few other low-power devices that would add a small bit extra - maybe another 15W.

 

So - on paper it would seem my server is now using under 40W - which I doubt.........
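Putting rough numbers on that: the 63W and 12W are the readings above, while the 15W for the other low-power gear is only my estimate, not a measurement.

```python
# Rough power budget from the UPS reading. These are approximate figures,
# not a direct measurement of the server itself.
ups_load_w = 63          # total UPS load after re-enabling the UPS in Unraid
poe_devices_w = 12       # power shown as delivered to the PoE devices
other_low_power_w = 15   # my estimate for the remaining low-power devices (assumption)

# Whatever is left over is what the server would be drawing on paper.
server_estimate_w = ups_load_w - poe_devices_w - other_low_power_w
print(f"Implied server draw: about {server_estimate_w} W")  # prints: Implied server draw: about 36 W
```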

I will get a meter on it later today and update here.

 


It had a good run - but today I noticed a parity check running - checked logs and it has rebooted itself (rather than a lockup).

Nothing in the logs - whatever is happening restarts the server without giving it time to log anything.

 


I did wonder if I somehow caused that as I was messing with smart home plugs around that time - I really hope I did.

Either way, I have resigned myself to needing a new motherboard, CPU, RAM and power supply at some stage in the future, replacing all the parts at the same time.

Those costs were not on the radar so I have to plan for it a bit.

 
