Nightly crashes

March 16, 2025

I have an issue that I now think can only be down to Unraid.

I have an i9-13900K, an ASUS ROG Strix Z690-E motherboard, a 1.3 kW PSU and an RTX 4090 Suprim X GPU.

Every night the server fails and I can't access it. However, the manner in which it fails changes all the time. Sometimes it goes slowly, i.e. I can't SSH in but still have KVM access; other times the web GUI itself just goes; other times it's a complete system crash; and sometimes it looks like a complete crash but it's still running, and a few emails will still be sent by some services.

I have trawled the forums since December, constantly making minor adjustments to try to rectify it. So far I have changed the USB drive because one of my warnings was a squashfs error. I have started and stopped different Docker containers trying to find a culprit, because I get some segfault/node errors as well as errors involving different Python versions, which I've managed to stop by removing different containers, but I don't know if this is chicken or egg. I have upgraded the PSU and GPU, put just one RAM stick in and then swapped it, started with a fresh Unraid install on a new USB drive, formatted the NVMe drives, and set up the Docker containers again from scratch in case of file corruption, amongst hundreds of other small changes to files.

The error logs seem to be full of issues with Plex. However, I think these MAY be separate from the main issue and just exacerbate it, because starting or stopping Plex doesn't seem to affect the crash outcome.

I'm at the end of my tether now but love Unraid, and just wondered if posting my logs here may help someone help me! I have tens to hundreds of diagnostic files I could post, but I'll start with my latest, from a run that only crashed after 48 hours, which is a recent record. This was after leaving it running with only one RAM stick.

Thank you for looking

Sam

scarif-diagnostics-20250315-0933.zip

March 16, 2025

Community Expert

8 minutes ago, samabsalom said:

I've put just one ram stick in and then swapped

Have you actually done memtest? It's on the boot menu.

March 16, 2025

Community Expert

Enable the syslog server and post that after a crash, but since you have a 13900K could also be the Intel 13/14 gen issue.

March 16, 2025

Author

1 hour ago, trurl said:

Have you actually done memtest? It's on the boot menu.

Yes, I ran the BIOS memtest and the Unraid memtest for the full four passes, and both were always fine.

I have also run a local syslog and an external syslog, but haven't really caught much of use, or much that I understand!

I'll post some excerpts from the past few months that I've sent to ChatGPT.

March 16, 2025

Community Expert

8 hours ago, samabsalom said:

syslog

is text, which compresses very well. Zip and attach.

March 19, 2025

Author

On 3/16/2025 at 11:16 PM, trurl said:

Zip and attach.

scarif-syslog-20250319-0701.zip Here is the syslog. Thank you for looking! I tried to run diagnostics but it failed.

Typical timing, but my Unraid has actually been running for the last three days now with lots of errors but no problems. You'll see the errors in the logs I've attached, and I suppose I don't know if they were always there. I was also away for the weekend, so it has had less use.

Thanks again

Sam

March 19, 2025

Author

Sorry to post twice, but this is another set of logs, where I've started without Docker and started the array in maintenance mode. It was something I read in another forum post a while back that helped someone else determine an issue.

scarif-syslog-20250319-0738.zip

March 19, 2025

Community Expert

7 hours ago, samabsalom said:

started without docker and started array in maintenance mode. It was something I read on another forum post

Are you sure it didn't say safe mode instead of maintenance mode? That would make more sense.

March 19, 2025

Author

Ah, OK. I think it was to do with the XFS filesystem and checking that, but you're right, safe mode does make more sense. Would it help to do this and submit logs?

Did you get a chance to look at the logs? If so, was there anything that struck you?

More logs from today
scarif-syslog-20250319-1913.zip

March 19, 2025

Community Expert

51 minutes ago, samabsalom said:

it was to do with the xfs filesystem and checking that

All disks seem to be mounting, so you probably don't need a filesystem check.

51 minutes ago, samabsalom said:

safe mode does make more sense. Would this help to do and submit logs?

Disable Docker and VM Manager in Settings and reboot in SAFE mode and see if it will run through the night.

March 25, 2025

Author

So I've replaced the RAM with entirely new RAM. This PC is becoming a little bit like Trigger's broom now! I'm running out of parts to replace.

It's started surviving some of the errors for 2-3 days and I can't work out why, so I've been trying to undo changes to induce the problem again.

I'm still getting the squashfs errors on startup and shutdown.

I've just booted into safe mode now to see how that goes.

scarif-syslog-20250325-0709.zip scarif-diagnostics-20250325-0631.zip

March 25, 2025

Community Expert

37 minutes ago, samabsalom said:

I'm still getting the squashfs errors

These are typically caused by a bad flash drive, but can also be caused by other hardware. Is the CPU still the same?

March 25, 2025

Author

Thanks for replying. Yes, the CPU is still the i9-13900K, but I have changed the flash drive for a new one from Amazon: https://amzn.eu/d/3zHaz3J. Do you think it's worth trying a third USB drive?

March 25, 2025

Community Expert

If the flash drive was already replaced and the squashfs errors continue, it's likely not the problem. As mentioned,

On 3/16/2025 at 1:00 PM, JorgeB said:

since you have a 13900K could also be the Intel 13/14 gen issue.

But since the flash drive is the easiest thing to replace, it may be worth doing once more.

March 28, 2025

Author

So I have a new flash drive and still the same issues. I'm focusing on these SQUASHFS errors, and they seem to come right after loading docker.img:

Mar 28 11:17:11 Scarif emhttpd: Starting services...
Mar 28 11:17:11 Scarif emhttpd: shcmd (346): /etc/rc.d/rc.avahidaemon reload
Mar 28 11:17:11 Scarif emhttpd: shcmd (356): /usr/local/sbin/mount_image '/mnt/user/system/dockernew/docker.img' /var/lib/docker 100
Mar 28 11:17:11 Scarif kernel: loop2: detected capacity change from 0 to 209715200
Mar 28 11:17:12 Scarif kernel: BTRFS: device fsid 45e7b221-7392-413e-ad5f-ffcf3cd76b04 devid 1 transid 249 /dev/loop2 scanned by mount (15465)
Mar 28 11:17:12 Scarif kernel: BTRFS info (device loop2): first mount of filesystem 45e7b221-7392-413e-ad5f-ffcf3cd76b04
Mar 28 11:17:12 Scarif kernel: BTRFS info (device loop2): using crc32c (crc32c-intel) checksum algorithm
Mar 28 11:17:12 Scarif kernel: BTRFS info (device loop2): using free space tree
Mar 28 11:17:12 Scarif kernel: BTRFS info (device loop2): auto enabling async discard
Mar 28 11:17:12 Scarif root: Resize device id 1 (/dev/loop2) from 100.00GiB to max
Mar 28 11:17:12 Scarif emhttpd: shcmd (358): /etc/rc.d/rc.docker start
Mar 28 11:17:12 Scarif rc.docker: Starting Docker daemon...
Mar 28 11:17:12 Scarif rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="15309" x-info="https://www.rsyslog.com"] start
Mar 28 11:17:13 Scarif kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Mar 28 11:17:13 Scarif kernel: SQUASHFS error: Failed to read block 0x1c254cf8: -5

Just to see, I made a new btrfs docker.img on a different drive and I still get the same errors. At the moment I'm booting into safe mode and starting the array manually. Do you think it's coincidence that the errors always come after this, or could this be the root of my issue?
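One rough way to check whether the timing is coincidence is to tally, for each SQUASHFS error in a syslog, the last emhttpd shcmd that ran before it. The Python sketch below does that. The function name, the regular expressions and the /var/log/syslog path in the usage note are assumptions based on the log lines above, not anything Unraid ships.

```python
import re
from collections import Counter

# Patterns modelled on the syslog lines quoted in this thread (assumed format).
SHCMD = re.compile(r"emhttpd: shcmd \(\d+\): (?P<cmd>.+)$")
SQUASHFS = re.compile(r"kernel: SQUASHFS error: ")


def squashfs_errors_by_last_command(lines):
    """Count SQUASHFS errors, keyed by the most recent emhttpd shcmd seen."""
    counts = Counter()
    last_cmd = None  # stays None for errors logged before any shcmd line
    for line in lines:
        line = line.rstrip("\n")
        match = SHCMD.search(line)
        if match:
            last_cmd = match.group("cmd")
        elif SQUASHFS.search(line):
            counts[last_cmd] += 1
    return counts


# Lines copied from the snippet above, used as a small self-check.
SAMPLE = [
    "Mar 28 11:17:12 Scarif emhttpd: shcmd (358): /etc/rc.d/rc.docker start",
    "Mar 28 11:17:12 Scarif rc.docker: Starting Docker daemon...",
    "Mar 28 11:17:13 Scarif kernel: SQUASHFS error: xz decompression failed, data probably corrupt",
    "Mar 28 11:17:13 Scarif kernel: SQUASHFS error: Failed to read block 0x1c254cf8: -5",
]

if __name__ == "__main__":
    for cmd, n in squashfs_errors_by_last_command(SAMPLE).most_common():
        print(f"{n:4d}  {cmd}")
```

To run it on a real log, replace SAMPLE with something like open("/var/log/syslog", errors="replace"). If the errors cluster after several unrelated commands across boots, the Docker start is probably just the first heavy read from the squashfs image rather than the cause.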

Thanks again

March 28, 2025

Author

Mar 28 11:37:39 Scarif emhttpd: Starting services...
Mar 28 11:37:39 Scarif emhttpd: shcmd (187): /etc/rc.d/rc.avahidaemon reload
Mar 28 11:37:39 Scarif emhttpd: shcmd (196): /usr/local/sbin/mount_image '/mnt/disk6/system/dockerdir/' /var/lib/docker 100
Mar 28 11:37:39 Scarif emhttpd: shcmd (198): /etc/rc.d/rc.docker start
Mar 28 11:37:39 Scarif rc.docker: Starting Docker daemon...
Mar 28 11:37:40 Scarif rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="9371" x-info="https://www.rsyslog.com"] start
Mar 28 11:37:41 Scarif kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Mar 28 11:37:41 Scarif kernel: SQUASHFS error: Failed to read block 0x1bca2bac: -5
Mar 28 11:37:41 Scarif kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Mar 28 11:37:41 Scarif kernel: SQUASHFS error: Failed to read block 0x1bc5e830: -5

The snippet above is what I get if I load Docker from a directory instead of an img.

March 28, 2025

Community Expert

If it's not the flash drive, RAM or CPU would be the next suspects.

March 28, 2025

Author

Thanks again for replying. So if I've replaced the RAM already, that leaves my CPU. How best would I go about testing this, do you think? I'd like to avoid buying a new CPU just to test.

March 28, 2025

Community Expert

You can try an RMA with Intel. There's a known issue with those CPUs, and it's been the confirmed problem for many users in the forum.
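Alongside an RMA, a rough software check is possible, since the SQUASHFS errors earlier in the thread are xz decompression failures. The Python sketch below repeatedly compresses and decompresses data with xz on several threads and counts round trips that fail or come back changed. It is only an illustration: the function names, buffer size and round count are made up here, it assumes CPython's lzma module does its heavy work without holding the interpreter lock so the threads load more than one core, a clean run does not prove the CPU is good, and a failure could equally point at RAM.

```python
import hashlib
import lzma
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def xz_roundtrip_ok(size: int) -> bool:
    """Compress and decompress one buffer with xz; True if it survives intact."""
    # Half random, half zeros, so the compressor sees both kinds of input.
    payload = os.urandom(size // 2) + bytes(size - size // 2)
    expected = hashlib.sha256(payload).digest()
    try:
        restored = lzma.decompress(lzma.compress(payload))
    except lzma.LZMAError:
        return False  # same class of failure as "xz decompression failed"
    return hashlib.sha256(restored).digest() == expected


def count_xz_failures(rounds: int = 200, size: int = 4 * 1024 * 1024,
                      workers: Optional[int] = None) -> int:
    """Run `rounds` round trips across `workers` threads; return how many failed."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(xz_roundtrip_ok, [size] * rounds)
        return sum(1 for ok in results if not ok)


if __name__ == "__main__":
    print(f"{count_xz_failures()} failed round trips")
```

On a healthy machine the count should stay at zero however long it runs. Any non-zero result on stock BIOS settings would be useful evidence to include with an RMA request.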

April 9, 2025

Author

Reigniting this because I'm getting weirder and weirder symptoms. The latest is that the Docker page only shows 0% CPU usage for everything, and all my graphs and tables have disappeared.

April 10, 2025

Community Expert

This is typically just a display problem; try a different browser, or just reboot.
