BEEN CRASHING CONSTANTLY FOR WEEKS EVERY DAY OR SO


Hi everyone!

 

I haven't posted here in a long time, so please let me know if I am not providing all the proper information, but I really need your help. For weeks now, every day or two, I have been losing all access to my server and its containers (even a local IP sniffer shows that the server's local IP cannot be pinged). I then have to do a complete power shutdown of the server and reboot, after which the server works again for a day or so before the issue arises again. At first I thought my Deluge container was the issue, since it seemed to be the first container to error, but I have since removed that container and I still have the issue. I have attached the diagnostics zip file and the syslog file, hoping someone can point me to the source of the issue so I can fix it. THANK YOU SO MUCH FOR YOUR HELP!!

syslog

tower-diagnostics-20230323-1717.zip

Edited by Pjrezai
  • 2 weeks later...

Call traces are different now, but there are multiple ones. Unfortunately these don't give me any clue as to what the problem could be. It could be hardware. One thing you can try is to boot the server in safe mode with all Docker/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.
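
For anyone who wants a quick overview of how often call traces show up in their own syslog, here is a rough Python sketch. It assumes the traditional syslog timestamp at the start of each line (for example `Apr  7 02:11:00`) and looks for the kernel's `Call Trace:` marker. The function names `call_trace_times` and `traces_per_day` are made up for this example, and the script only counts traces, it does not diagnose them.

```python
import re
from collections import Counter

# Assumption: lines start with a traditional syslog timestamp such as "Apr  7 02:11:00".
TS_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")


def call_trace_times(lines):
    """Return the timestamp of every line containing the kernel's 'Call Trace:' marker."""
    times = []
    for line in lines:
        if "Call Trace:" in line:
            match = TS_RE.match(line)
            # Lines without a recognizable timestamp are still counted, as "unknown".
            times.append(match.group(1) if match else "unknown")
    return times


def traces_per_day(times):
    """Count call traces per calendar day, keyed like 'Apr 7'."""
    days = Counter()
    for stamp in times:
        parts = stamp.split()
        day = " ".join(parts[:2]) if len(parts) >= 2 else stamp
        days[day] += 1
    return days
```

To use it, open the syslog file and pass the file object straight in, e.g. `traces_per_day(call_trace_times(open("syslog", errors="replace")))`. A count that stays high in safe mode with Docker and VMs disabled would fit the hardware theory, but that is an interpretation, not something the script can confirm.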

On 4/4/2023 at 11:51 AM, JorgeB said:

RAM or board/CPU would be the main suspects.

Hi! So it completely crashed again without the torrent client open, so I decided to look into the hardware. Thanks for pointing that out to me. I ran a memtest and didn't get any errors, though. Then I read somewhere about C-states, so I disabled them and am running the server again to see whether it crashes in the next few days. Do you think it may be the C-states? Also, I have attached the newest syslog for the newest crash. Hopefully you can see something in there. In the syslog, the crash seems to have occurred around 2:11am this morning, if you can take a look around that time.

syslog-192.168.1.146.log
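
If it helps anyone narrow a syslog down to the minutes around a crash, here is a minimal Python sketch. It assumes traditional syslog timestamps without a year (e.g. `Apr  7 02:11:00`), English month abbreviations, and that the year of the crash is known; the helper name `lines_near` is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: lines start with a traditional syslog timestamp such as "Apr  7 02:11:00".
TS_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")


def lines_near(lines, target, minutes=5):
    """Return the log lines whose timestamp is within `minutes` of `target`.

    Syslog timestamps carry no year, so the year is borrowed from `target`.
    """
    window = timedelta(minutes=minutes)
    selected = []
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # skip lines without a recognizable timestamp
        text = " ".join(match.group(1).split())  # collapse the padded day field
        try:
            stamp = datetime.strptime(f"{target.year} {text}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip timestamps that do not parse
        if abs(stamp - target) <= window:
            selected.append(line.rstrip("\n"))
    return selected
```

Call it with the opened syslog file and a `datetime` for the crash (the date of the crash plus 2:11 as the time), and widen `minutes` if nothing relevant turns up. Keep in mind that a hard lockup often leaves nothing in the log at the moment it happens, so the lines just before the gap are usually the interesting ones.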


Ah sorry, thanks! Thought it might be more than one problem. So I basically moved back to my old hardware and it seems like everything is running smoothly so far. I will wait a few days to see, but this could point to it being a hardware problem - which is unfortunate because the new hardware is brand new - an i9-13900K, an ASUS PRIME Z-690A and 32GB of G.Skill DDR5 5200. I did a memtest and it didn't come up with any errors, but I still feel the odds are that it is the memory, since I bought it used (the CPU and motherboard are new).


@Pjrezai if you have 2 memory slots in use, you could try running with just one stick and see if it crashes. If it does, swap it out and use only the other one. If it's the memory, I would assume the chance that both sticks are defective at once is low.

As you have moved back to the old hardware, you could use the new hardware to run a trial version of Unraid so that your main system can stay stable for now.

Edited by Anon
On 4/8/2023 at 5:49 AM, Anon said:

@Pjrezai if you have 2 memory slots used then you could try to just use one and see if it crashes. Then if it does switch out and only use the other. If its the memory I would assume the chances to be low that both are defective at once.

As you moved to old hardware you could use the new hardware to run a trial version of unraid so that your main system can stay stable for now.

Thanks for that option! That would be very convenient. So, just an update: I went and purchased new RAM (same make and model), thinking maybe the previous RAM was defective, and still got a crash. I have attached the logs below. I am at a loss as to why now, but maybe you can see something in the logs! Thanks for all your help guys!

syslog-192.168.1.146 (1).log tower-diagnostics-20230412-0217.zip

  • 2 weeks later...

I installed a fresh trial version of Unraid on the new hardware and it crashed again. I have done a memtest in the past (via the one built into my motherboard; the one on the Unraid USB doesn't work), and I even bought new memory just in case. It still crashes. I have attached the syslog. It happened on Aug18. I don't know if there is something in the BIOS that is the issue? The motherboard? The CPU? I'm really lost. ANY help would be very much appreciated!

syslog test-diagnostics-20230422-1932.zip

6 hours ago, JorgeB said:

There are constant call traces logged. Unfortunately it's not always easy to tell what is causing them, but it looks more like a hardware problem. If you already tried safe mode without Docker and VMs like suggested above, and the RAM was already replaced, board or CPU would be the main suspects.

Well, yeah, it definitely is a hardware problem, because I am running a whole new instance of Unraid. And yes, I replaced the RAM with the same make and model. I do have another new-in-box unit of the same motherboard. The only thing I don't have a spare of is the CPU. I really hope it isn't that because it is the most difficult to deal with, but with my luck I assume it will be. Do you think it is just that this particular make and model of motherboard and RAM don't play well with Unraid? I doubt it, but I have the ASUS PRIME Z-690A and G.Skill Ripjaw DDR5 5200 RAM.

