Cron script to reboot the server and parity checks. (Reason for cron script is crashes)


Hey all,

 

Buckle up because this is going to be a long one....

 

TL;DR: If I have a cron script that reboots the server on a weekly basis, what happens if a parity check is running at the time the script is due to start?

 

The reason for the TL;DR is a Reddit post I made earlier today. I got some decent feedback on what to look for regarding the crashing, but I haven't gotten an answer about the cron script yet. So I figured now might be a good time to really dive in and see if anyone has ideas about the crashing, or if I should just keep doing what I have been doing.

 

I have been using Unraid for about 2 years, and somewhere along the line I did something, or something happened, that makes the server randomly lock up. I have tried everything I know of to troubleshoot it, short of starting completely from scratch. Given the hours I have put into all the Docker containers, plugins, and overall customization to make everything work, redoing it all just isn't worth it...

 

At one point I thought one of my running VMs might be causing the crashes, so I stopped leaving my VMs running unless I needed them (which is fine with me). This didn't help, other than lowering my electricity usage slightly.

 

I have scoured my log files from before and after each crash and nothing stands out. I can't find any correlation as to when, where, or why this is happening.

 

What i DO know:

  • Crashes happen about every 4-6 weeks, at different times of day and on different days
  • Stopping my VMs doesn't resolve this
  • C-states are disabled both in the BIOS and in config/go
  • This has happened with two different mobo/CPU combinations (the first one was an R7 1700X on an A320 chipset)
  • From what I can see in my TIG stack, there is no obvious change in resource usage or activity before a crash

 

What i DON'T know:

  • How to troubleshoot further without starting from scratch
  • What is causing these crashes/freezes

 

The fix: About a month ago I put in a user script to reboot the server once a week, at a time when the server should be mostly idle. So far this is working fine. I haven't had any sort of freeze or lock-up since.

 

With this fix, I'm concerned about the additional wear and tear on the HDDs from all of them spinning up at the same time every week. From what I know about HDDs, there is a limit to how many can sit near each other for vibration reasons (I think I read this somewhere). I'm also concerned about other activities that might be running on the server when it reboots, such as parity checks, the mover, and so on.
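One way to address the parity check concern is to have the reboot script look for a running check and skip that week's reboot if it finds one. The following is only a sketch under a loudly stated assumption: that `/proc/mdstat` on Unraid contains a line of the form `mdResyncPos=<number>` whose value is non-zero while a parity check or sync is in progress. Verify this on your own server before relying on it. The function name `parity_check_running` and the `MDSTAT` variable are made up for illustration.

```shell
#!/bin/bash
# Sketch: detect a running parity check before rebooting.
# ASSUMPTION: /proc/mdstat has a line "mdResyncPos=<n>" and n is non-zero
# while a parity check or sync is running. Confirm this on your system.

MDSTAT="${MDSTAT:-/proc/mdstat}"

# Succeeds (exit status 0) if the status file reports a resync position > 0.
# Takes the file to inspect as $1; defaults to $MDSTAT.
parity_check_running() {
    local file="${1:-$MDSTAT}"
    local pos
    # A missing or unreadable file is treated as "no check detected".
    [ -r "$file" ] || return 1
    pos="$(sed -n 's/^mdResyncPos=\([0-9][0-9]*\).*/\1/p' "$file" | head -n 1)"
    [ -n "$pos" ] && [ "$pos" -gt 0 ]
}

if parity_check_running; then
    echo "parity check in progress: skipping reboot"
else
    echo "no parity check detected: continuing to the reboot step"
fi
```

The script only prints its decision. The actual reboot command from the existing user script would go in the `else` branch.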

 

In the long run, if I can't fix the crashing, I would rather just reboot the server on a schedule to make sure it doesn't lock up in the middle of something important. It would be even better if I could stagger the spin-up of the drives so they cascade instead of all starting at once. It would also be better if the cron script could run every other week, but I can't seem to find a way to do that.
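On the every-other-week question: cron has no native "every two weeks" field, so a common workaround is to keep the weekly schedule and let the script itself decide whether this is an "on" week. A minimal sketch, assuming the job still fires once a week at a fixed day and time. The names `week_parity`, `DRY_RUN`, and `REBOOT_CMD` are hypothetical, and `REBOOT_CMD` should be set to whatever command the existing user script already runs.

```shell
#!/bin/bash
# Sketch: gate a weekly cron job so it only acts every other week.

SECONDS_PER_WEEK=604800

# Print 0 or 1: the parity of the number of whole weeks since the Unix epoch.
# Takes an epoch timestamp as $1; defaults to the current time.
week_parity() {
    local now="${1:-$(date +%s)}"
    echo $(( (now / SECONDS_PER_WEEK) % 2 ))
}

# DRY_RUN defaults to 1, so this sketch only prints what it would do.
DRY_RUN="${DRY_RUN:-1}"
# ASSUMPTION: placeholder command; replace with your existing reboot command.
REBOOT_CMD="${REBOOT_CMD:-reboot}"

if [ "$(week_parity)" -eq 0 ]; then
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "even week: would run '$REBOOT_CMD'"
    else
        $REBOOT_CMD
    fi
else
    echo "odd week: skipping reboot"
fi
```

Because the count is based on seconds since the epoch rather than the calendar week number, the on/off pattern does not hiccup at year boundaries. Flip the comparison to `-eq 1` to shift the schedule by one week.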

 

Short spec list:

2x Node 804, one as a DAS

Ryzen TR1900X

AsRock X399M Taichi

64GB DDR4

Parity - 2x 10TB

Array - 8x 8TB, 5x 10TB, 1x 3TB, 1x 4TB

Cache - 2TB SSD

UD - 120GB SSD (Plex Transcode), 240GB SSD (VMs), 500GB SSD (Appdata, dockers, etc)

thevault-diagnostics-20210511-1638.zip

3 hours ago, cwrivers said:

C-States are disabled in bios and config/go

 

That is the wrong approach, and doing it twice doesn't make it work any better. C-states in general are a good thing. It's just the very lowest-power one that can be problematic with first-generation Zen processors, such as your 1700X and 1900X, which enter a low-power state when idle and sometimes fail to come back out of it. Try this and, if it works, you can forget about rebooting by cron and its associated problems:

  • Remove the command from the go file
  • Re-enable Global C states in the BIOS
  • Find the BIOS setting for Power Supply Idle Control - it can be difficult to locate but you might find it near to the C states control option; if not it will be under the extensive AMD CBS section. Change it from the default value of "Low Current Idle" to "Typical Current Idle".

If that is indeed your problem, it will hopefully now be fixed. If it isn't fixed then, quite honestly, I would much rather give the processor some light task (such as leaving your VMs running) to keep it from falling asleep than scheduling a regular reboot.

 
