
Regular crashes without warning


Solved by DesertCookie


My system started crashing at random three days ago: it runs for 5-30 minutes, then crashes. Updating my UEFI gave me one night of continuous operation before it went back to the old behavior. Two weeks ago I switched from an AMD X399 1st-gen system to an Intel Z590 11th-gen system and upgraded from 6.9 to 6.10 at the same time; it had been running without issues up to this point.

 

In the instances where unRAID would still start the array, the log was saved to the flash drive. I have attached the last log from when the server was still running fine and a few from the past days (some of them are 385MB in size!). I also managed to export diagnostics the one time I actually made it to the web UI.
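
A log that grows to several hundred megabytes usually means one message is being repeated over and over. As a rough sketch (not an unRAID tool), the script below counts the most frequent messages in such a file. It assumes a Python 3 interpreter is available and that lines use the classic syslog prefix of month, day, time, and host name; the function name `top_messages` is made up for this example.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumption: lines start with a classic syslog prefix such as
# "Jun 20 09:22:01 Tower " (month, day, time, host name).
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")


def top_messages(lines: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent messages, ignoring timestamp and host."""
    counts: Counter = Counter()
    for line in lines:
        message = PREFIX.sub("", line.rstrip("\n"))
        # Replace runs of digits with "N" so that messages differing only
        # in a PID, address, or counter are grouped together.
        message = re.sub(r"\d+", "N", message)
        if message:
            counts[message] += 1
    return counts.most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Stream the file line by line so a 385MB log is never held in memory.
    with open(sys.argv[1], errors="replace") as fh:
        for message, count in top_messages(fh):
            print(f"{count:>8}  {message}")
```

Run it with the path to one of the saved syslog files as the only argument. Whatever dominates the top of the list is a reasonable first thing to search for, though a flood of messages can be a symptom rather than the cause.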

 

Current system:

  • Gigabyte Z590 Aorus Master
  • i5 11600K
  • GTX 1650
  • 2x16GB DDR4-3200
  • 10GbE SFP+ Card
  • 3x12TB HDD, 2TB HDD, 4TB HDD, 1TB M.2 SSD
  • Enermax PSU (unlikely the culprit)
  • APC UPS

 

I already tried:

  • UEFI update
  • unRAID update (6.10.2 to 6.10.3)
  • removing GPU, drives (except M.2), and SFP+-card
  • different RAM
  • reseating all power and data connections
  • unRAID safe-mode
  • fresh unRAID boot stick (wasn't recognized by the UEFI until I copied all original data to it again)
  • running without UPS

 

I already suspect a hardware fault somewhere but cannot believe this new system would fail outright after two weeks (though that would be exactly my luck with tech).

 

20.06.22-1_syslog-1655701321 20.06.22-2_syslog-1655727397 06.06.22_syslog-20220603-181331.txt tower-diagnostics-20220620-0922.zip

Edited by DesertCookie
added more info

I enabled it as otherwise I couldn't see /dev/dri for use in Jellyfin; the iGPU was the whole reason for my upgrade. I will try disabling it for now to see if that fixes the issue.

 

Edit: Blacklisting the driver did not change my situation. I am now investigating the possibility of a hardware failure further.

  • Solution

It seems to have been a temperature issue. The 11600K (60-70°C) runs a lot hotter than the previous 1900X (50-60°C) while consuming the same amount of power. I had left my Noctua NF-A14s at the same 600RPM I used in the old Threadripper system and had removed the second NH-D15 fan. Adding the second fan back in and increasing the fan speed to 700RPM seems to have fixed the issue.
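
For anyone chasing a similar crash, logging temperatures to persistent storage makes it possible to see what the system was reading just before it went down. The following is a minimal sketch, assuming a Python 3 interpreter is available and that the kernel exposes sensors under `/sys/class/thermal/thermal_zone*/temp` (values in millidegrees Celsius); which zones exist depends on the board and drivers. The function names and the log path are hypothetical.

```python
import os
import time
from pathlib import Path
from typing import Optional


def read_zone_temps(base: str = "/sys/class/thermal") -> dict[str, float]:
    """Read every thermal zone under base and return degrees Celsius."""
    temps: dict[str, float] = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            temps[zone.name] = int(raw) / 1000.0  # sysfs reports millidegrees
        except (OSError, ValueError):
            continue  # skip zones that are unreadable or not numeric
    return temps


def log_temps(log_path: str, interval: float = 5.0,
              samples: Optional[int] = None,
              base: str = "/sys/class/thermal") -> None:
    """Append one timestamped line per sample; run forever if samples is None."""
    taken = 0
    with open(log_path, "a") as fh:
        while samples is None or taken < samples:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            readings = " ".join(
                f"{name}={value:.1f}C"
                for name, value in read_zone_temps(base).items()
            )
            fh.write(f"{stamp} {readings}\n")
            # Flush and sync so the last line survives a hard crash.
            fh.flush()
            os.fsync(fh.fileno())
            taken += 1
            time.sleep(interval)
```

Calling `log_temps("/path/to/persistent/temps.log")` (the path is a placeholder) and leaving it running until the next crash gives a record of the final readings. Syncing on every sample costs some write wear, so a longer interval is sensible on a flash drive.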
