Frequent crashing after upgrading to 6.9.2.



I've been using Unraid for about three years without many issues until recently, and I'm at a total loss as to what is causing this. The crashing is totally random, and I can't find anything in the logs that indicates the cause. I only know it started after upgrading to 6.9.2. For now I've shut off everything but Plex, Sonarr, Radarr, and sabz. That worked for about two weeks, so I recently tried adding DuckDNS and had another crash today.
 

Diagnostics linked here

 

Any help would be greatly appreciated. With these intermittent crashes it is so hard to figure out the cause!

Edited by razorweb

It's probably a long shot, but I just replaced a failing drive. I added a new PCIe SATA card, attached a new drive to it, and tried to rebuild the failed data disk. The system would hang at a more or less random point between 50% and 90% completion: the physical monitor went blank, the keyboard Num Lock LED went out, and there was no web interface and no SSH. I ended up pulling all the SATA data and power cables so I could label the drives with their serial numbers. After I reseated everything, the system rebuilt the data drive, then rebuilt a bigger parity drive, then cleared the old parity disk for use as a data drive, formatted it, and fired up with no problem. I've had problems with SATA cables that have clips to snap them in place. I think the clips sometimes prevent a good connection, and I've had three computers fixed this year by reseating hard drives...

  • 4 months later...
Jan  7 18:01:35 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan  7 18:01:35 Tower kernel: [Hardware Error]: Corrected error, no action required.
Jan  7 18:01:35 Tower kernel: [Hardware Error]: CPU:7 (17:8:2) MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000000010859
Jan  7 18:01:35 Tower kernel: [Hardware Error]: Error Addr: 0x0000000a2a4f1080
Jan  7 18:01:35 Tower kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a020300
Jan  7 18:01:35 Tower kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 1, IC Microtag or Full Tag Multi-hit Error.
Jan  7 18:01:35 Tower kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)
Jan  7 18:28:54 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan  7 18:28:54 Tower kernel: [Hardware Error]: Corrected error, no action required.
Jan  7 18:28:54 Tower kernel: [Hardware Error]: CPU:9 (17:8:2) MC1_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0x9c20000000010859
Jan  7 18:28:54 Tower kernel: [Hardware Error]: Error Addr: 0x0000000001150800
Jan  7 18:28:54 Tower kernel: [Hardware Error]: IPID: 0x000100b000000000, Syndrome: 0x000000005a020300
Jan  7 18:28:54 Tower kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 1, IC Microtag or Full Tag Multi-hit Error.
Jan  7 18:28:54 Tower kernel: [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)

 

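Hardware errors are being detected, and they look CPU-related: these are corrected machine check errors reported by the Instruction Fetch Unit (L1 cache) on CPU 7 and CPU 9.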
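If it helps to see whether the errors keep hitting the same cores, here is a rough Python sketch that tallies the `MCx_STATUS` lines from a syslog like the one above. The function name `summarize_mce` and the regular expression are my own and only match the line layout shown in this thread. It also assumes that `CE` in the flag list means corrected and `UE` means uncorrected, so treat it as a starting point rather than a proper MCE decoder.

```python
import re
from collections import Counter

# Matches syslog lines shaped like the ones quoted in this thread, e.g.
# "... [Hardware Error]: CPU:7 (17:8:2) MC1_STATUS[Over|CE|...]: 0xdc20000000010859"
# (pattern written for this layout only; other kernels may format it differently)
MCE_LINE = re.compile(
    r"\[Hardware Error\]: CPU:(?P<cpu>\d+) \([0-9A-Fa-f:]+\) "
    r"(?P<bank>MC\d+)_STATUS\[(?P<flags>[^\]]*)\]: 0x[0-9A-Fa-f]+"
)


def summarize_mce(lines):
    """Count machine check reports per (cpu, bank, kind)."""
    counts = Counter()
    for line in lines:
        match = MCE_LINE.search(line)
        if match is None:
            continue  # not a status line, skip it
        flags = match.group("flags").split("|")
        # Assumption: "CE" marks a corrected error, "UE" an uncorrected one.
        if "CE" in flags:
            kind = "corrected"
        elif "UE" in flags:
            kind = "uncorrected"
        else:
            kind = "unknown"
        counts[(int(match.group("cpu")), match.group("bank"), kind)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jan  7 18:01:35 Tower kernel: [Hardware Error]: CPU:7 (17:8:2) "
        "MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000000010859",
        "Jan  7 18:28:54 Tower kernel: [Hardware Error]: CPU:9 (17:8:2) "
        "MC1_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0x9c20000000010859",
    ]
    for (cpu, bank, kind), n in sorted(summarize_mce(sample).items()):
        print(f"CPU {cpu} {bank}: {n} {kind}")
```

To run it against a real log, pass the lines of the syslog file from your diagnostics zip to `summarize_mce` instead of the two sample lines.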

