Unclean shutdown - hardware error - post your diagnostics


I have been trying intensively to solve this problem for a week, but no luck.

Before I throw out the motherboard, I have one question:

Is there an Unraid setting which restarts the PC when it reaches a certain temperature?

Link to comment

Have you notified whoever asked you to post diagnostics that they are up here? Maybe describe the issues you're having in more detail and someone else may be able to take a look.

 

Most modern CPUs will throttle back if they get too hot, and will probably shut the computer down if temps continue to go up. You'd probably need to look at the docs for your motherboard to determine whether it has that feature and where in the BIOS settings it may be.

 

The Parity Check Tuning plugin can be set to pause a parity check or disk rebuild if disk temps get too hot, but it won't shut down the whole server.

Link to comment
6 hours ago, FreeMan said:

The Parity Check Tuning plugin can be set to pause a parity check or disk rebuild if disk temps get too hot, but it won't shut down the whole server.

 

The more recent versions of the plugin DO have an option to shut down the server if disks overheat :) There is, though, no option to do this for something like the CPU overheating.

 

Link to comment

Thanks for the feedback.

The diagnostics are in the first post, but they only cover the period since the restart.

The server is liquid cooled. The max temp I have seen on the CPUs is 60C. The max working temp is +90C according to Intel. Asus does not list a max temp for the motherboard; the reboot happens at about 55C.
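
Since the diagnostics only start at the restart, one way to see what the sensors read just before a reboot might be to log them to storage that survives it. Below is a minimal sketch, not a tested Unraid tool. It assumes Python 3 is available on the server, that the board's sensors are exposed through the standard Linux hwmon sysfs files (`temp*_input`, in millidegrees Celsius), and that `/boot/logs/temps.log` is an acceptable place to write. The function names and the log path are made up for illustration.

```python
import glob
import os
import time


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {label: degrees_C} for every temp*_input file found under root."""
    temps = {}
    pattern = os.path.join(root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        chip_dir = os.path.dirname(path)
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = "unknown"
        sensor = os.path.basename(path)[: -len("_input")]
        # Include the hwmon directory so two chips with the same name
        # (for example two CPU packages) do not overwrite each other.
        label = f"{os.path.basename(chip_dir)}:{chip}/{sensor}"
        try:
            with open(path) as f:
                # sysfs reports millidegrees Celsius
                temps[label] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor not readable right now, skip it
    return temps


def format_line(temps, now=None):
    """Build one timestamped log line from a {label: degrees_C} dict."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    readings = " ".join(f"{k}={v:.1f}C" for k, v in sorted(temps.items()))
    return f"{stamp} {readings}\n"


def log_forever(log_path, interval_s=10):
    """Append a reading every interval_s seconds and force it to disk."""
    while True:
        with open(log_path, "a") as f:
            f.write(format_line(read_hwmon_temps()))
            f.flush()
            os.fsync(f.fileno())  # so the last line survives a sudden reboot
        time.sleep(interval_s)


# Example call (the path is an assumption, adjust to taste):
# log_forever("/boot/logs/temps.log")
```

After a reboot, the last few lines of the log would show which sensor, if any, was climbing. Writing every few seconds to the flash drive adds wear, so this is probably something to run only while hunting the problem.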

 

// Frode

Link to comment

I removed the Parity Check Tuning plugin just to be on the safe side. No change, it rebooted. I have tested the same load with the chassis open and the fans at full speed, and then there was no reboot. So heat, not load, is causing the reboot. I suspected the power supply, a Corsair AX1600i, and tested a new PSU of the same type: no change. This PSU has a temperature overload feature, but it triggers at 120C, so it should not be the cause.

 

IPMI/BMC log says:

[Image: screenshot of the IPMI/BMC log, 2021-07-04]

 

which to me may imply that the cause of the reboot is outside the IPMI/motherboard, maybe the OS or the PSU.

 

It is possible I need a new motherboard. What I really want to avoid is ending up with the same reboot on a new mb.

 

 

Link to comment
