DesertCookie Posted June 20, 2022 (edited)

My system started randomly crashing three days ago. It crashes after 5-30 minutes of uptime. Updating the UEFI gave me one night of continuous operation before the old behavior returned. I switched from an AMD X399 1st-gen to an Intel Z590 11th-gen system two weeks ago and upgraded from 6.9 to 6.10 at the same time; it ran without issues until three days ago.

In the instances where unRAID would still start the array, the log was saved to the flash drive. I attached the last log I have from when the server was still running fine and a few from the past days (some of them are 385MB in size!). I also managed to export diagnostics the one time I actually made it to the web UI.

Current system:
- Gigabyte Z590 Aorus Master
- i5 11600K
- GTX 1650
- 2x16GB DDR4-3200
- 10GbE SFP+ card
- 3x12TB HDD, 2TB HDD, 4TB HDD, 1TB M.2 SSD
- Enermax PSU (unlikely the culprit)
- APC UPS

I already tried:
- UEFI update
- unRAID update (6.10.2 to 6.10.3)
- removing GPU, drives (except M.2), and SFP+ card
- different RAM
- reseating all power and data connections
- unRAID safe mode
- fresh unRAID boot stick (wasn't recognized by the UEFI until I copied all original data to it again)
- running without UPS

I already suspect a hardware error somewhere but cannot believe this new system would outright fail after two weeks (though that would be exactly my luck with tech).

Attachments:
- 20.06.22-1_syslog-1655701321
- 20.06.22-2_syslog-1655727397
- 06.06.22_syslog-20220603-181331.txt
- tower-diagnostics-20220620-0922.zip

Edited June 20, 2022 by DesertCookie: added more info
JorgeB Posted June 20, 2022

Nothing obvious in the logs. I suggest blacklisting the i915 driver if you don't need it; it's been known to cause issues for some.
DesertCookie (Author) Posted June 20, 2022 (edited)

I enabled it because otherwise I couldn't see /dev/dri for use in Jellyfin; the iGPU was the whole reason for my upgrade. I will try disabling it for now to see if that fixes the issue.

Edit: Blacklisting the driver did not change my situation. I am now further investigating the possibility of a hardware failure.

Edited June 22, 2022 by DesertCookie
Solution

DesertCookie (Author) Posted June 26, 2022

It seems to have been a temperature issue. The 11600K (60-70°C) runs a lot hotter than the previous 1900X (50-60°C) while consuming the same amount of power. I had left my Noctua NF-A14s at the same 600 RPM I used in the old Threadripper system and had removed the second NH-D15 fan. Adding the second fan back in and increasing the fan speed to 700 RPM seems to have fixed the issue.
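
For anyone chasing similar random crashes, it can help to log temperatures over time and check whether the readings climb before each crash. The following is only a minimal sketch, not something used in this thread: it assumes the kernel exposes sensors through the standard Linux hwmon sysfs files (`/sys/class/hwmon/hwmon*/temp*_input`, values in millidegrees Celsius), which depends on the sensor drivers loaded on your system. The function names, the 80 °C limit, and the 10-second interval are illustrative assumptions, not recommendations for any particular CPU.

```python
import glob
import os
import time


def read_temps(base="/sys/class/hwmon"):
    """Return {label: degrees Celsius} for every readable temp*_input file.

    Assumes the standard hwmon sysfs layout; which sensors appear
    depends on the drivers loaded on the machine.
    """
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        hwmon_id = os.path.basename(chip_dir)
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = "unknown"
        sensor = os.path.basename(path)[: -len("_input")]
        try:
            with open(path) as f:
                # hwmon reports millidegrees Celsius
                temps[f"{hwmon_id}:{chip}/{sensor}"] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor not readable right now; skip it
    return temps


def hot_sensors(temps, limit_c):
    """Return only the sensors at or above limit_c."""
    return {label: t for label, t in temps.items() if t >= limit_c}


def log_temps(samples, interval_s=10.0, limit_c=80.0):
    """Print one timestamped line per sample; limit_c is an assumed threshold."""
    for i in range(samples):
        temps = read_temps()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        readings = " ".join(f"{k}={v:.1f}C" for k, v in temps.items())
        flag = " HOT" if hot_sensors(temps, limit_c) else ""
        print(f"{stamp} {readings}{flag}", flush=True)
        if i < samples - 1:
            time.sleep(interval_s)
```

Calling `log_temps(samples=360)` would cover roughly an hour at the default interval. Redirecting the output to a file on persistent storage keeps the last readings available after a crash.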