
UNRAID Kernel panic - not syncing: Fatal exception in interrupt



My server has crashed 3 times now, with the crashes getting more frequent each time. I thought the issue might be a couple of new drives I added for parity, since they reported parity errors during the latest check. I removed one of them and started a parity check, only for the server to crash after about an hour.

I did some due diligence and enabled syslog, but the forum isn't letting me upload it here for some reason, so here's a link to it.

Also, I have no idea whether it's related or just a coincidence, but my docker.img randomly tripled in size, so I had to go through the whole process of fixing it. I don't know if that's helpful; I just thought I'd mention it in case it is.


OK, after dealing with some other things, I was finally able to start another parity check (syslog), and it crashed about an hour and a half in.

Looking at the file, I can see there is data from after the crash, so I should mention that I restarted the server before remembering about the syslog. Sorry if that makes it harder.

Edited by ItsNotNick

The only thing of note I see is multiple instances of these:

 

Apr 24 00:11:12 Andromeda kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Apr 24 00:11:12 Andromeda kernel: caller _nv000651rm+0x1ad/0x200 [nvidia] mapping multiple BARs

 

There's one just before the crash. I can't say it's related, but it's worth trying to run the server without the Nvidia GPU, or installing the GPU in a different PCIe slot if one is available.
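To see how often that warning pair shows up before a crash, the syslog can be filtered for it. This is a minimal sketch, assuming a plain-text syslog with one entry per line; the function name `warnings_before_panic` and the `Kernel panic` marker are hypothetical choices, not anything provided by Unraid. It stops at the first panic line so entries logged after a reboot are ignored; if the panic never made it into the log, it scans the whole file and any post-reboot lines have to be trimmed by hand.

```python
from typing import Iterable, List

# Fragments copied from the warning pair quoted above; matching on these
# literal substrings is an assumption about how they appear in your syslog.
WARNING_FRAGMENTS = ("resource sanity check", "mapping multiple BARs")
# Assumed crash marker, taken from the panic text in the thread title.
PANIC_MARKER = "Kernel panic"


def warnings_before_panic(lines: Iterable[str]) -> List[str]:
    """Return the BAR-related warnings logged before the first kernel panic.

    Scanning stops at the first line containing PANIC_MARKER. If no such
    line exists, every matching line in the input is returned.
    """
    found: List[str] = []
    for line in lines:
        if PANIC_MARKER in line:
            break  # ignore anything logged after the panic line
        if any(fragment in line for fragment in WARNING_FRAGMENTS):
            found.append(line.rstrip("\n"))
    return found
```

Usage, with a hypothetical file name: open the log with `open("syslog.txt", errors="replace")`, pass the file object to `warnings_before_panic`, then look at `len(hits)` and `hits[-1]` to see how many warnings there were and which one came last before the panic.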


I'm planning to swap the motherboard for one with a couple more PCIe slots. I could do that now, but I feel it would be better not to change too much at once.

It's also worth mentioning that this only happens during a parity check. I ran the server all week without a single crash, then started a parity check, and, like I said, it crashed within a couple of hours.

3 weeks later...
On 4/24/2022 at 2:46 AM, JorgeB said:

It's likely hardware-related. If it's not the GPU, it could be the PSU, board, etc. You can also try v6.10.0-rc to rule out any kernel compatibility issue.

So, after some time, I went from connecting every drive directly to the motherboard to using an HBA card, and after doing that, my parity check went from 17 errors to 16. The system hasn't crashed at all since, which is good (?), although I don't know exactly what I did to fix it.

I have 2 questions:

1: Does the decrease in parity errors (assuming it's not a fluke) suggest it's the cables? I did reconnect every single cable at least once.

2: Is 16 parity checks a real issue or am I overreacting and it is something I could just ignore?

46 minutes ago, ItsNotNick said:

Is 16 parity checks a real issue or am I overreacting and it is something I could just ignore?

Assuming you mean errors reported during a parity check, then anything other than 0 is an issue: if a disk fails and has to be rebuilt onto a replacement, those 16 sectors will be corrupt after the rebuild.
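A toy model of why a parity mismatch matters, assuming simple XOR parity as used by a single parity disk (a second parity disk, if present, is computed differently and is not modeled here). Each "sector" is just two bytes and the helper names are hypothetical. A failed disk is rebuilt by XOR-ing parity with every surviving disk, so if the stored parity for a position is wrong, the rebuilt data for that position is wrong too.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length sectors."""
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(disks: List[bytes]) -> bytes:
    """XOR parity for one sector position across all data disks."""
    return reduce(xor_bytes, disks)


def rebuild(surviving: List[bytes], parity: bytes) -> bytes:
    """Reconstruct the failed disk's sector: parity XOR every surviving disk."""
    return reduce(xor_bytes, surviving, parity)


# Three hypothetical data disks, one 2-byte "sector" each.
disks = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
good_parity = compute_parity(disks)

# Simulate disk 0 failing and being rebuilt from disks 1 and 2.
assert rebuild(disks[1:], good_parity) == disks[0]

# Parity that disagrees with the data (what a parity check counts as an error).
bad_parity = bytes([good_parity[0] ^ 0xFF]) + good_parity[1:]
print(rebuild(disks[1:], bad_parity).hex())  # prints "fe02": first byte is wrong
```

With correct parity the rebuild returns the original `0102`; with one bad parity byte, exactly that byte of the rebuilt disk comes out corrupted.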

 

Are the checks reporting this correcting or non-correcting?
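For context on that question, here is a toy model in which each stripe's parity is the XOR of its data sectors. A non-correcting check only counts the stripes where stored parity disagrees with the data, while a correcting check also rewrites the stored parity from the data disks. The names are hypothetical and this is not Unraid's implementation; it only illustrates why the two modes differ, and that in this model a correcting check treats the data disks, not parity, as the ones that are right.

```python
from functools import reduce
from typing import List, Tuple


def xor_parity(sectors: List[bytes]) -> bytes:
    """XOR parity of one stripe (same sector position on every data disk)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), sectors)


def parity_check(stripes: List[List[bytes]], parity: List[bytes],
                 correct: bool) -> Tuple[int, List[bytes]]:
    """Count stripes whose stored parity mismatches the data.

    With correct=False the stored parity is returned unchanged; with
    correct=True each mismatching parity sector is rewritten from the data.
    """
    errors = 0
    checked = list(parity)  # copy so the caller's list is never modified
    for i, data in enumerate(stripes):
        expected = xor_parity(data)
        if checked[i] != expected:
            errors += 1
            if correct:
                checked[i] = expected
    return errors, checked


# Two data disks, three stripes; stripe 1 has stale parity.
stripes = [[b"\x01", b"\x02"], [b"\x04", b"\x08"], [b"\x10", b"\x20"]]
stored = [b"\x03", b"\x00", b"\x30"]

n, after = parity_check(stripes, stored, correct=False)
print(n, after == stored)  # 1 True: error reported, parity untouched
n, after = parity_check(stripes, stored, correct=True)
print(n, after[1].hex())  # 1 0c: error reported and parity rewritten
print(parity_check(stripes, after, correct=False)[0])  # 0 on the next check
```

In this model, repeated non-correcting checks keep reporting the same count, while a correcting check reports the errors once and the next check comes back clean.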
