August 7, 2024
Hi all, hoping someone can help. I was on 6.10 and the server started crashing daily about half a month ago. I updated to 6.12.10 and noticed that 2 drives had become disabled, so I rebuilt them (1 parity drive and 1 other drive). After the rebuild the server ran for 15 days straight, but it is now back to constant crashing. Would bad drives cause the server to restart like this, or is something else going on? I am updating to 6.12.11 now. This is what I usually find in the log:
Aug 6 09:49:51 Tower emhttpd: unclean shutdown detected
tower-diagnostics-20240806-2203.zip
Edited August 7, 2024 by Tyronious: Added unclean message
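To see how often the message quoted above turns up, a saved syslog can be scanned for it. This is only a minimal sketch: the function name `find_unclean_shutdowns` is made up for illustration, and the second sample line is a hypothetical placeholder, not taken from the real log.

```python
def find_unclean_shutdowns(lines):
    """Return every log line that reports an unclean shutdown."""
    marker = "unclean shutdown detected"
    return [line.rstrip("\n") for line in lines if marker in line]


# First line is quoted from the post; the second is a made-up filler line.
sample = [
    "Aug 6 09:49:51 Tower emhttpd: unclean shutdown detected\n",
    "Aug 6 09:49:52 Tower example: hypothetical unrelated entry\n",
]

for hit in find_unclean_shutdowns(sample):
    print(hit)
```

Any iterable of lines works, so an open file object for a syslog copy can be passed in place of `sample`.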
August 8, 2024 Author
Please see attached. Hopefully this is the correct log; I mirrored the syslog to the flash drive and pulled it from there. I did notice the following, which sounds like I may need to do a BIOS update. I will try that. Let me know if you notice anything else. Thanks!
Aug 8 14:00:23 Tower kernel: Performance Events: PEBS fmt1+, SandyBridge events, 16-deep LBR, full-width counters, Broken BIOS detected, complain to your hardware vendor.
Aug 8 14:00:23 Tower kernel: [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
Aug 8 14:00:23 Tower kernel: Intel PMU driver.
Aug 8 14:00:23 Tower kernel: ... version: 3
Aug 8 14:00:23 Tower kernel: ... bit width: 48
Aug 8 14:00:23 Tower kernel: ... generic registers: 4
Aug 8 14:00:23 Tower kernel: ... value mask: 0000ffffffffffff
Aug 8 14:00:23 Tower kernel: ... max period: 00007fffffffffff
Aug 8 14:00:23 Tower kernel: ... fixed-purpose events: 3
Aug 8 14:00:23 Tower kernel: ... event mask: 000000070000000f
Aug 8 14:00:23 Tower kernel: signal: max sigframe size: 1776
Aug 8 14:00:23 Tower kernel: Estimated ratio of average max frequency by base frequency (times 1024): 1202
Aug 8 14:00:23 Tower kernel: rcu: Hierarchical SRCU implementation.
Aug 8 14:00:23 Tower kernel: rcu: Max phase no-delay instances is 400.
Aug 8 14:00:23 Tower kernel: smp: Bringing up secondary CPUs ...
Aug 8 14:00:23 Tower kernel: x86: Booting SMP configuration:
Aug 8 14:00:23 Tower kernel: .... node #0, CPUs: #1 #2 #3 #4 #5
Aug 8 14:00:23 Tower kernel: .... node #1, CPUs: #6
Aug 8 14:00:23 Tower kernel: smpboot: CPU 6 Converting physical 0 to logical die 1
Aug 8 14:00:23 Tower kernel: #7 #8 #9 #10 #11
Aug 8 14:00:23 Tower kernel: .... node #0, CPUs: #12
Aug 8 14:00:23 Tower kernel: MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
syslog.rar
Edited August 8, 2024 by Tyronious: Added log
August 8, 2024 Author
I did also see this:
hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
I believe this comes from Dynamix System Temp. I tried unloading and reloading the drivers to see if that clears the message. Also, the crashing does occur even when the array is not started.
August 9, 2024 Community Expert
There's nothing in the syslog-previous, but it only covers one minute of uptime. Wrong log?
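One way to check how much uptime a syslog actually covers before attaching it is to compare its earliest and latest timestamps. The sketch below makes several assumptions: the names `parse_stamp` and `log_span` are invented for illustration, the year is supplied by hand because these timestamps do not carry one, English month abbreviations are expected, and a log that crosses a year boundary is not handled.

```python
import re
from datetime import datetime

# Matches the leading "Aug 8 14:00:23 " style timestamp of a syslog line.
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s")


def parse_stamp(line, year=2024):
    """Return the line's timestamp as a datetime, or None if it has none."""
    match = STAMP.match(line)
    if match is None:
        return None
    month, day, clock = match.groups()
    # The year is an assumption passed in by the caller.
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def log_span(lines, year=2024):
    """Return the time between the earliest and latest stamped lines."""
    stamps = [t for t in (parse_stamp(line, year) for line in lines) if t is not None]
    if not stamps:
        return None
    return max(stamps) - min(stamps)


# First line is from the posted log; the second is a hypothetical later entry.
sample = [
    "Aug 8 14:00:23 Tower kernel: Intel PMU driver.",
    "Aug 8 14:01:23 Tower example: hypothetical later entry",
]
print(log_span(sample))
```

A span of only a minute or so suggests the file was captured right after a reboot rather than in the run-up to a crash.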
August 19, 2024 Author
After being away for the week, I came back, powered on the system, and found it stuck in a boot loop. I suspect a hardware fault or an incompatibility. I see similar threads (https://forums.unraid.net/topic/140536-612-stable-update-stuck-in-boot-loop/), so I am going to try a new USB drive to rule out a config issue on the current one. If that fails, I will have to investigate the hardware. The boot gets as far as "Loading bzroot...." and then the server restarts.
August 20, 2024 Author
The boot loop is resolved. It pointed to bad RAM: after I removed 2 sticks, the boot loop went away and Unraid started back up. I will continue to monitor for the initial issue and add a cache drive. I will also make sure proper logs are captured. For now we can close this thread; I will reopen it should crashing occur now that the RAM issue has been dealt with.