Please help, getting multiple errors: server seems to crash every night, Docker service fails to start, can't start the array again after stopping it, etc.


unrais
Solved by JorgeB

Posted

Hi, I'm getting a bunch of errors and crashes lately. I've not changed any hardware in the system. The only changes have been upgrading to 6.12.13 (though of course this is less likely to be the problem) and adding a UPS.

What's happening?

  • The server seems to crash every night. I can still SSH into it and enter the powerdown command, and it looks like it begins the shutdown, but it never completes. I've suspected that Docker is involved in some way, but I haven't been able to diagnose it.
  • Since yesterday I've also started to get 'docker service failed to start' after rebooting, with 'SQUASHFS error: xz decompression failed, data probably corrupt'. I deleted and rebuilt the Docker image folder, which fixed it until I rebooted again and got 'docker service failed to start' once more. I then tried another approach: stopping the array, erasing the mirror of the cache drive and starting the array again, which fixed it (without having to rebuild).
  • It crashed again last night, so I had to do a hard reboot, and once again I was greeted with 'docker service failed to start'. I tried stopping and starting the array again, but this time the array wouldn't start and the system froze completely.
  • I've also been seeing a lot of segfault (exit code 139) errors, both inside Docker containers and directly in the Unraid CLI (when testing something), which I never used to get before.
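
For reference, an exit status of 139 is how a shell (and Docker) reports a process that was killed by signal 11, SIGSEGV: statuses above 128 conventionally mean 128 plus the signal number. A minimal sketch of that decoding follows; `describe_exit_status` is a hypothetical helper name, not an Unraid or Docker command.

```shell
#!/bin/sh
# Hypothetical helper: explain a shell exit status.
# By convention, a status above 128 means "killed by signal (status - 128)".
describe_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the name of that signal
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited normally with status $status"
    fi
}

# 139 - 128 = 11, which is SIGSEGV on Linux
describe_exit_status 139
```

Seeing this status from many unrelated programs, rather than from one container, is what makes it worth looking beyond any single app.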

 

What I've tried doing to mitigate:

  • Thinking a clash of schedules was causing the crashes, I disabled every schedule that wasn't strictly necessary (Docker auto update etc.) and made sure the remaining schedules didn't overlap at all.
  • I ran Memtest, which didn't report any errors.
  • Thinking it might be some other kind of hardware issue, I booted directly into Windows and tested a few things there without issue (including one of the things that causes a lot of segfault 139 errors in Unraid).
  • I considered replacing the USB drive, but after I got things working yesterday I figured it must have been something else. Now, of course, I do need to actually try replacing it.

 

What I've not been able to try yet (because it keeps crashing and I'm now at work):

  • Adding '--no-healthcheck' to the Plex container (it could already be there, I can't remember), just as a test.
  • Entering 'echo "options i915 enable_dc=0" >> /boot/config/modprobe.d/i915.conf' in the CLI. Again, just as a test.
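
One caveat with the echo command above: `>>` appends every time it is run, so repeating the test would leave duplicate lines in the file. A minimal sketch of an idempotent variant follows; `append_once` is a hypothetical helper name, and the path and option are the ones quoted in the post.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a config file only if that exact
# line is not already present, creating the file and its directory if needed.
append_once() {
    file=$1
    line=$2
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF -e "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage on the server (path and option taken from the post):
# append_once /boot/config/modprobe.d/i915.conf "options i915 enable_dc=0"
```

A modprobe option like this would normally only take effect the next time the i915 module is loaded, which in practice usually means after a reboot.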
     


I've attached my Unraid syslog that my Synology NAS captured. It'd be great if someone smarter than me (not difficult!) could take a look at the log to see if there's a better indication as to what the problem is?

All_2024-9-10-10_3_11.html

  • Solution
Posted

Multiple apps are segfaulting. Memtest is only definitive if it finds errors, so a clean pass doesn't rule out bad RAM. If you have multiple sticks, try running the server with just one; if the problem persists, try a different one. That will basically rule out bad RAM, and the CPU would be the next suspect.
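
To check how widely the segfaults are spread across programs, a rough sketch like the following counts kernel segfault messages per process name. It assumes a plain-text syslog in which the kernel logs lines of the form `kernel: NAME[PID]: segfault at ...`; `segfaults_by_process` is a hypothetical helper name, and the syslog path in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: count kernel "segfault at" messages per process name.
# Assumes lines shaped like:  ... kernel: NAME[PID]: segfault at ...
segfaults_by_process() {
    # Capture the process name between "kernel: " and "[PID]: segfault at",
    # then count identical names and list the most frequent first.
    sed -n 's/.*kernel: \(.*\)\[[0-9]*]: segfault at.*/\1/p' "$1" \
        | sort | uniq -c | sort -rn
}

# Usage (path is an assumption; point it at wherever the syslog is kept):
# segfaults_by_process /var/log/syslog
```

Many different process names in the output would be consistent with a hardware fault such as RAM, while a single name would point more towards that one application.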

Posted
26 minutes ago, JorgeB said:

Multiple apps are segfaulting. Memtest is only definitive if it finds errors, so a clean pass doesn't rule out bad RAM. If you have multiple sticks, try running the server with just one; if the problem persists, try a different one. That will basically rule out bad RAM, and the CPU would be the next suspect.

Is it also possible that the USB drive could be causing it? If not, I'll jump straight to the RAM; otherwise I'll try replacing the USB drive first.

Posted (edited)
On 9/10/2024 at 12:42 PM, JorgeB said:

Won't say it's impossible, but very unlikely.


Hey, so after testing the RAM (running with just one stick at a time) and still having issues, I bit the bullet and bought some new RAM. It's running like a charm now. Thank you! However, I have noticed this error appearing in the syslog:

Unrais kernel: traps: light-locker[29184] trap int3 ip:149b64f85ca7 sp:7ffd806a5550 error:0 in libglib-2.0.so.0.6600.8[149b64f49000+89000]


Is there any way to fix that?
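
To make a line like that easier to read, here is a minimal sketch that splits a kernel `traps:` message into process name, PID, trap type and the library the fault occurred in. `parse_trap_line` is a hypothetical helper name, and the pattern only assumes the layout of the single line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and summarise kernel
# "traps:" messages shaped like the one quoted in the post:
#   ... traps: NAME[PID] trap TYPE ip:... sp:... error:N in LIB[BASE+SIZE]
parse_trap_line() {
    sed -n 's/.*traps: \(.*\)\[\([0-9]*\)] trap \([^ ]*\) .* in \([^[]*\)\[.*/process=\1 pid=\2 trap=\3 module=\4/p'
}

# Usage with the line from the syslog:
echo 'Unrais kernel: traps: light-locker[29184] trap int3 ip:149b64f85ca7 sp:7ffd806a5550 error:0 in libglib-2.0.so.0.6600.8[149b64f49000+89000]' \
    | parse_trap_line
```

The process name is the useful part here, since it is what you would search for among plugins and containers.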

Edited by unrais
Posted
13 minutes ago, JorgeB said:

Not sure, they may be from a specific app.


I don't think I've installed any out-of-the-ordinary plugins, and as far as I know it's unlikely for Docker containers to cause this kind of issue outside of themselves. Is that right? I've attached a diagnostics export I've just made, in case you can shed some light on it.

unrais-diagnostics-20240913-0946.zip

Posted
5 minutes ago, JorgeB said:

I don't know what light-locker is, but don't think it's a stock app.


The only plugins I have installed are:

  • Appdata backup beta
  • Community applications
  • Compose manager
  • Dynamix file manager
  • Folder view
  • Intel GPU TOP
  • Nvidia Driver
  • RTL8125 Drivers
  • Unraid Connect
  • User Scripts

 

Besides that, everything else runs inside Docker containers, and the Unraid OS itself is a fresh install (I only carried over my license and appdata from the old USB drive).
