
(6.12.9, 6.12.10) Random Crashing on AMD and Intel builds with no info in Unraid syslog


Solved by JorgeB


Hello! I have been using Unraid for about a month and a half now, and have been dealing with random crashes essentially the entire time. I've lurked on this forum and the subreddit, looking to solve my own issues, but have had no luck. My original system was an old gaming PC that I had from around 2017, but now I am running a system with 100% new parts and it is still crashing.

My server crashes at random intervals: sometimes it stays up for 6 hours, sometimes 30 hours, sometimes 2 hours. I have set up syslog to write to a local share and mirrored it to my flash drive, but nothing in the syslog points to a cause. I found a thread describing an issue with 1st-gen Ryzen CPUs, so I replaced the CPU with an Intel chip, along with most other parts, as a full upgrade. I am still encountering crashes.

 

I have set it up today with just two of my Docker containers running and we are at 9 hours runtime; I am going to leave all my other containers off and see if I can make it more than three days without a crash. If so, I'm going to add a few more containers and see how things go from there.

 

I've also attached my diags as well as my Syslog from the last three days, in case anyone can help me find a reason before my testing ends. Syslog is already being forwarded to the internal syslog server.

 

In this testing, I'd like to also forward the logs from all of my Docker containers to the same share as my Unraid syslog, but I am not familiar enough with Docker yet to know how to do that. I'm doing my own research into that, but if anyone here knows of Unraid-specific commands I would be very grateful.

Thanks!

EDIT: I have enabled log forwarding from my Docker containers to the syslog server by adding the following options to each container:
--log-driver syslog --log-opt syslog-address=udp://x.x.x.x:514 --log-opt tag={{.Name}}
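
For anyone applying the same options, here is a minimal sketch that prints them for a given syslog server. The helper name `syslog_log_opts` and the address `192.0.2.10` are placeholders, not part of the original post; the flags themselves are the ones quoted above.

```shell
#!/bin/bash
# Hypothetical helper: prints the Docker logging flags quoted above
# for a given syslog server address and an optional UDP port (default 514).
syslog_log_opts() {
    local host="$1"
    local port="${2:-514}"
    # {{.Name}} is a Docker log-tag template that expands to the container name.
    printf -- '--log-driver syslog --log-opt syslog-address=udp://%s:%s --log-opt tag={{.Name}}\n' "$host" "$port"
}

# Example: print the flags for a placeholder server address.
syslog_log_opts 192.0.2.10
```

On Unraid the printed string would typically be pasted into the container template's Extra Parameters field (an assumption about the setup, since the post does not say where the options were entered). With plain `docker run`, the same flags go before the image name.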

I haven't had a crash since my original post, so I am going to let it ride for a couple of days before enabling other containers.

kirane-diagnostics-20240615-2300.zip Syslog_Since_Rebuild.log

Edited by calabriel

Multiple apps are segfaulting. Start by running memtest, but keep in mind that memtest is only definitive if it finds errors. If you have multiple sticks, try running the server with just one; if it still crashes, try a different one. That will basically rule out bad RAM.
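
To see which programs are segfaulting in a saved syslog, something like the following sketch may help. It assumes the usual x86 kernel message format (`kernel: NAME[PID]: segfault at ...`); the function name `segfault_summary` is hypothetical.

```shell
#!/bin/bash
# Count kernel "segfault" lines per program name in a syslog file.
# Assumes lines of the form: "... kernel: NAME[PID]: segfault at ADDR ip ..."
segfault_summary() {
    local logfile="$1"
    grep 'segfault at' "$logfile" \
        | sed -E 's/.*kernel: ([^[]+)\[[0-9]+\]: segfault at.*/\1/' \
        | sort | uniq -c | sort -rn
}
```

For example, `segfault_summary Syslog_Since_Rebuild.log` would list each program with its segfault count, most frequent first. Segfaults spread across many unrelated programs are one reason to suspect hardware such as RAM rather than a single misbehaving app.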


I ran Memtest for three passes on the original used RAM in the AMD build, and then for one pass on the brand new RAM I put into the Intel build. There were no errors found. I can run it again on the new RAM to really bake it in.

 

I do have multiple sticks as well and pulling those is easy enough. Having the same/similar issue on two different sets of RAM bought years apart would be an interesting coincidence. Thanks, I'll be back with results!


Good to know. I'll have to check in on this after work. I did run a ZFS scrub once: before the scrub the pool showed no errors, but afterwards it showed one checksum error. I'll update in about 12-13 hours, when I get back home and can run Memtest and check on the cache error.


"zpool status -v" showed one error in a Tdarr file. Tdarr is currently off and I don't need that file, so I deleted it and re-ran the scrub. The pool currently shows as clean, my syslog shows normal activity from Unraid and all active containers, and we have the longest uptime in a week or so.
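
A small sketch for checking the result of a scrub from a script: it reads `zpool status` output on stdin and succeeds only if the `errors:` line reports no known data errors. The function name `zpool_is_clean` and the pool name `cache` in the usage note are assumptions for illustration.

```shell
#!/bin/bash
# Reads `zpool status` text on stdin. Exits 0 only if at least one "errors:"
# line is present and every such line says "No known data errors".
zpool_is_clean() {
    awk '
        /^errors:/ { seen = 1; if ($0 != "errors: No known data errors") bad = 1 }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}
```

Usage would look like `zpool status -v cache | zpool_is_clean && echo "pool clean"`, run after `zpool scrub cache` has finished. A clean result only means ZFS currently knows of no damaged files; it does not rule out the underlying hardware fault that caused the earlier checksum error.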


I'm still going to run Memtest, probably overnight tonight.


Shut down the server to run Memtest, and we have major errors (561 of them and counting, in pass 2). I'm refunding this set of RAM since I got it less than a week ago and I'm getting a new set. Before shutdown, I had 48 hours of uptime and I think everything was stable without Tdarr. I think I'm clear once this RAM is replaced and I find an alternative to Tdarr.

Thanks Jorge!
 

