razorweb Posted August 25, 2021 (edited)
I have been using Unraid for about three years without many issues until recently, and I'm at a total loss as to what is causing this. The crashing is totally random and I can't find anything in the logs that indicates what's causing it. I only know it started after upgrading to 6.9.2. For now I've shut off everything but Plex, Sonarr, Radarr, and sabz. That worked for about two weeks, so I recently tried adding DuckDNS and had another crash today. Diagnostics are linked here; any help would be greatly appreciated. With these intermittent crashes it is so hard to figure out the cause!
Tristankin Posted August 25, 2021
Yep, had the same here. Either roll back or try 6.10-RC1.
JorgeB Posted August 25, 2021
Enable syslog mirror to flash, then post that log after a crash together with the diagnostics. Please attach the files directly to the forum; don't use external sites.
razorweb Posted August 29, 2021 (Author)
Diagnostics, including syslog, attached.
tower-diagnostics-20210829-1121.zip
trurl Posted August 29, 2021
1 hour ago, razorweb said: Diagnostics, including syslog
Diagnostics always include the current syslog, which is in RAM like the rest of the OS and so doesn't survive a reboot. Now we need the syslog saved from before the crash.
On 8/25/2021 at 4:58 AM, JorgeB said: Enable syslog mirror to flash then post that log after a crash
nowhere99 Posted August 31, 2021
It's probably a long shot, but I just replaced a failing drive. I added a new PCIe SATA card, attached a new drive to it, and tried to rebuild the failed data disk. The rebuild would cause the system to hang at a seemingly random point between 50% and 90% completion. The physical monitor showed a blank screen, the keyboard Num Lock LED went out, and there was no web interface and no SSH. I ended up pulling all the SATA cables and SATA power cables so I could label all the drives with their serial numbers. When I reseated everything, the system rebuilt the data drive, then rebuilt a bigger parity drive, then cleared the old parity disk for use as a data drive, formatted it, and fired up with no problem. I've had problems with the SATA cables that have clips on them to snap them in place. I think the clips actually prevent a good connection sometimes, and I've had three computers fixed this year by reseating hard drives...
trurl Posted August 31, 2021
9 hours ago, nowhere99 said: problems with the SATA cables with the clips on them to snap them in place. I think they actually prevent good connection sometimes
https://support-en.wd.com/app/answers/detail/a_id/15954
razorweb Posted January 13, 2022 (Author)
So I spent the last few months mirroring my syslog to flash. The result is attached. Can anyone decipher what may be causing the crashes (which have continued even after downgrading to 6.9.1, albeit less frequently)?
syslog
JorgeB Posted January 13, 2022
Jan 7 18:01:35 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 7 18:01:35 Tower kernel: [Hardware Error]: Corrected error, no action required.
Jan 7 18:01:35 Tower kernel: [Hardware Error]: CPU:7 (17:8:2) MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000000010859
Jan 7 18:01:35 Tower kernel: [Hardware Error]: Error Addr: 0x0000000a2a4f1080
Jan 7 18:01:35 Tower kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a020300
Jan 7 18:01:35 Tower kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 1, IC Microtag or Full Tag Multi-hit Error.
Jan 7 18:01:35 Tower kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)
Jan 7 18:28:54 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 7 18:28:54 Tower kernel: [Hardware Error]: Corrected error, no action required.
Jan 7 18:28:54 Tower kernel: [Hardware Error]: CPU:9 (17:8:2) MC1_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0x9c20000000010859
Jan 7 18:28:54 Tower kernel: [Hardware Error]: Error Addr: 0x0000000001150800
Jan 7 18:28:54 Tower kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a020300
Jan 7 18:28:54 Tower kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 1, IC Microtag or Full Tag Multi-hit Error.
Jan 7 18:28:54 Tower kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)
Hardware errors being detected, looks like CPU related.
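For anyone wanting to tally machine-check lines like these from a mirrored syslog, here is a minimal Python sketch, not an official Unraid or kernel tool. It assumes the line layout shown in the excerpt above. The function names `decode_status` and `summarize` are made up for illustration, and the flag bit positions are taken from the Linux kernel's machine-check status definitions as I understand them, so treat the decoded flags as a cross-check against the bracketed text the kernel already prints rather than as authoritative.

```python
import re
from collections import Counter

# Flag bits of the machine-check status register.
# Assumption: positions follow the Linux kernel's MCI_STATUS_* definitions.
STATUS_BITS = {
    "Val": 63,    # register contents are valid
    "Over": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error (clear means corrected)
    "En": 60,     # error reporting enabled
    "MiscV": 59,  # MISC register valid
    "AddrV": 58,  # error address valid
    "PCC": 57,    # processor context corrupt
    "SyndV": 53,  # syndrome register valid (AMD)
}

# Matches e.g. "CPU:7 (17:8:2) MC1_STATUS[Over|CE|...]: 0xdc20000000010859"
STATUS_LINE = re.compile(
    r"CPU:(\d+) .*?MC(\d+)_STATUS\[[^\]]*\]: (0x[0-9a-fA-F]+)"
)


def decode_status(value):
    """Return the names of the flag bits that are set in a status value."""
    return [name for name, bit in STATUS_BITS.items() if (value >> bit) & 1]


def summarize(lines):
    """Count status lines per CPU and decode each status value."""
    per_cpu = Counter()
    events = []
    for line in lines:
        match = STATUS_LINE.search(line)
        if not match:
            continue
        cpu = int(match.group(1))
        bank = int(match.group(2))
        status = int(match.group(3), 16)
        per_cpu[cpu] += 1
        events.append((cpu, bank, decode_status(status)))
    return per_cpu, events


if __name__ == "__main__":
    sample = [
        "Jan 7 18:01:35 Tower kernel: [Hardware Error]: CPU:7 (17:8:2) "
        "MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000000010859",
        "Jan 7 18:28:54 Tower kernel: [Hardware Error]: CPU:9 (17:8:2) "
        "MC1_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0x9c20000000010859",
    ]
    counts, decoded = summarize(sample)
    for cpu, bank, flags in decoded:
        print(f"CPU {cpu}, bank {bank}: {', '.join(flags)}")
    print(dict(counts))
```

To run it over a whole log, pass the lines of the saved syslog file to `summarize` instead of the two sample lines. If the errors spread across many cores rather than clustering on one, that may point more toward platform settings or power than toward a single bad core, though the log alone can't confirm either.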