March 16, 2025
I have an issue that I now think can only be down to Unraid. I have an i9-13900K, an ASUS ROG Strix Z690-E motherboard, a 1.3 kW PSU and an RTX 4090 Suprim X GPU.

Every night the server fails and I can't access it. However, the manner in which it fails seems to change all the time. Sometimes it goes slowly, i.e. I can't SSH in but can still get KVM access; other times the web GUI itself just goes; other times it's a complete system crash; and sometimes it looks like a complete crash but it's still running, and a few emails will still be sent by some services.

I have trawled the forums since December, constantly making minor adjustments to try to rectify it. So far I have:
- changed the USB drive, because one of my warnings was a squashfs error;
- started and stopped different Docker containers trying to find a culprit, because I get some segfault/node errors as well as errors from different Python versions. I've managed to stop these by removing different containers, but I don't know if this is chicken or egg;
- upgraded the PSU and GPU;
- run with just one RAM stick, then swapped it for the other;
- started with a fresh Unraid install on a new USB drive, formatted the NVMe drives, and set up the Docker containers again from scratch in case of file corruption;
- made hundreds of other small changes to files.

The error logs seem to be full of issues with Plex. However, I think these MAY be separate from the main issue and just exacerbate it, because starting or stopping Plex doesn't seem to affect the crash outcome.

I'm at the end of my tether now but love Unraid, and just wondered if posting my logs here might help someone help me! I have tens to hundreds of diagnostics files I could post, but I'll start with my latest, from a run that only crashed after 48 hours, which is a recent record. This was after leaving it running with only one RAM stick.

Thank you for looking
Sam
scarif-diagnostics-20250315-0933.zip
March 16, 2025 Community Expert
8 minutes ago, samabsalom said: I've put just one ram stick in and then swapped
Have you actually done memtest? It's on the boot menu.
March 16, 2025 Community Expert
Enable the syslog server and post the resulting log after a crash. Since you have a 13900K, it could also be the Intel 13th/14th gen issue.
March 16, 2025 Author
1 hour ago, trurl said: Have you actually done memtest? It's on the boot menu.
Yes, I ran both the BIOS memtest and the Unraid memtest for the full four passes and it was always fine. I have also run a local syslog and an external syslog, but haven't really caught much of use, or at least not much that I understand! I'll post some quotes from the past few months that I've sent to ChatGPT.
March 16, 2025 Community Expert
8 hours ago, samabsalom said:
Syslog is text, which compresses very well. Zip and attach.
March 19, 2025 Author
On 3/16/2025 at 11:16 PM, trurl said: Zip and attach.
scarif-syslog-20250319-0701.zip
Here is the syslog. Thank you for looking! I tried to run diagnostics but it failed.
Typical timing, but my Unraid has actually been running for the last three days now, with lots of errors but no problems. You'll see the errors in the logs I've attached, and I suppose I don't know if they were always there. I was also away for the weekend, so it has had less use.
Thanks again
Sam
March 19, 2025 Author
Sorry to post twice, but here is another set of logs from a boot where I started without Docker and started the array in maintenance mode. It was something I read in another forum post a while back that helped someone else pin down an issue.
scarif-syslog-20250319-0738.zip
March 19, 2025 Community Expert
7 hours ago, samabsalom said: started without docker and started array in maintenence mode. It was something I read on another forum post
Are you sure it didn't say safe mode instead of maintenance mode? That would make more sense.
March 19, 2025 Author
Ah OK. I think it was to do with the xfs filesystem and checking that, but you're right, safe mode does make more sense. Would it help if I did this and submitted logs?
Did you get a chance to look at the logs? If so, was there anything that struck you?
More logs from today:
scarif-syslog-20250319-1913.zip
March 19, 2025 Community Expert
51 minutes ago, samabsalom said: it was to do with the xfs filesystem and checking that
All disks seem to be mounting, so you probably don't need a check.
51 minutes ago, samabsalom said: safe mode does make more sense. Would this help to do and submit logs?
Disable Docker and VM Manager in Settings, reboot in SAFE mode, and see if it will run through the night.
March 25, 2025 Author
So I've replaced the RAM with entirely new RAM. This PC is becoming a little bit like Trigger's broom now! I'm running out of parts to replace.
It has started surviving some of the errors for 2-3 days and I can't work out why, so I've been trying to undo changes to induce the problem again.
I'm still getting the squashfs errors on startup and shutdown.
I've just booted into safe mode to see how that goes.
scarif-syslog-20250325-0709.zip
scarif-diagnostics-20250325-0631.zip
March 25, 2025 Community Expert
37 minutes ago, samabsalom said: I'm still getting the squashfs errors
These are typically caused by a bad flash drive, but other hardware can also cause them. Is the CPU still the same?
March 25, 2025 Author
Thanks for replying. Yes, the CPU is still the i9-13900K, but I have changed the flash drive for a new one from Amazon (https://amzn.eu/d/3zHaz3J). Do you think it's worth trying a third USB drive?
March 25, 2025 Community Expert
If the flash drive was already replaced and the squashfs errors continue, it's likely not the problem. As mentioned:
On 3/16/2025 at 1:00 PM, JorgeB said: since you have a 13900K could also be the Intel 13/14 gen issue.
But since the flash drive is the easiest thing to replace, it may be worth doing once more.
March 28, 2025 Author
So I have a new flash drive and still the same issues. I'm focusing on these SQUASHFS errors, and they seem to appear after docker.img is loaded:

Mar 28 11:17:11 Scarif emhttpd: Starting services...
Mar 28 11:17:11 Scarif emhttpd: shcmd (346): /etc/rc.d/rc.avahidaemon reload
Mar 28 11:17:11 Scarif emhttpd: shcmd (356): /usr/local/sbin/mount_image '/mnt/user/system/dockernew/docker.img' /var/lib/docker 100
Mar 28 11:17:11 Scarif kernel: loop2: detected capacity change from 0 to 209715200
Mar 28 11:17:12 Scarif kernel: BTRFS: device fsid 45e7b221-7392-413e-ad5f-ffcf3cd76b04 devid 1 transid 249 /dev/loop2 scanned by mount (15465)
Mar 28 11:17:12 Scarif kernel: BTRFS info (device loop2): first mount of filesystem 45e7b221-7392-413e-ad5f-ffcf3cd76b04
Mar 28 11:17:12 Scarif kernel: BTRFS info (device loop2): using crc32c (crc32c-intel) checksum algorithm
Mar 28 11:17:12 Scarif kernel: BTRFS info (device loop2): using free space tree
Mar 28 11:17:12 Scarif kernel: BTRFS info (device loop2): auto enabling async discard
Mar 28 11:17:12 Scarif root: Resize device id 1 (/dev/loop2) from 100.00GiB to max
Mar 28 11:17:12 Scarif emhttpd: shcmd (358): /etc/rc.d/rc.docker start
Mar 28 11:17:12 Scarif rc.docker: Starting Docker daemon...
Mar 28 11:17:12 Scarif rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="15309" x-info="https://www.rsyslog.com"] start
Mar 28 11:17:13 Scarif kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Mar 28 11:17:13 Scarif kernel: SQUASHFS error: Failed to read block 0x1c254cf8: -5

Just to see, I made a new btrfs docker.img on a different drive and I still get the same errors. At the moment I'm booting into safe mode and starting the array manually.
Do you think it's a coincidence that the errors always come after this, or could this be the root of my issue?
Thanks again
March 28, 2025 Author

Mar 28 11:37:39 Scarif emhttpd: Starting services...
Mar 28 11:37:39 Scarif emhttpd: shcmd (187): /etc/rc.d/rc.avahidaemon reload
Mar 28 11:37:39 Scarif emhttpd: shcmd (196): /usr/local/sbin/mount_image '/mnt/disk6/system/dockerdir/' /var/lib/docker 100
Mar 28 11:37:39 Scarif emhttpd: shcmd (198): /etc/rc.d/rc.docker start
Mar 28 11:37:39 Scarif rc.docker: Starting Docker daemon...
Mar 28 11:37:40 Scarif rsyslogd: [origin software="rsyslogd" swVersion="8.2102.0" x-pid="9371" x-info="https://www.rsyslog.com"] start
Mar 28 11:37:41 Scarif kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Mar 28 11:37:41 Scarif kernel: SQUASHFS error: Failed to read block 0x1bca2bac: -5
Mar 28 11:37:41 Scarif kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Mar 28 11:37:41 Scarif kernel: SQUASHFS error: Failed to read block 0x1bc5e830: -5

And that is a snippet from when I load Docker from a directory instead of an img.
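Whether the SQUASHFS errors really always follow the Docker start is easier to judge across many saved syslogs than by eye. The following is only a minimal Python sketch, not anything from Unraid itself: the function name `squashfs_bursts` and the `SAMPLE` list are made up for illustration, and the only assumption about the log format is what the snippets above show (a `kernel: SQUASHFS error:` prefix and `Failed to read block 0x...`). It groups consecutive SQUASHFS error lines into bursts and reports the last non-SQUASHFS line logged before each burst, plus the failing block addresses.

```python
import re
from typing import Iterable, List, Tuple

# Patterns taken from the log lines quoted in this thread.
SQUASHFS_RE = re.compile(r"kernel: SQUASHFS error: (?P<msg>.*)$")
BLOCK_RE = re.compile(r"Failed to read block (0x[0-9a-fA-F]+)")


def squashfs_bursts(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Group consecutive SQUASHFS error lines into bursts.

    Returns one (preceding_line, block_addresses) tuple per burst, where
    preceding_line is the last non-SQUASHFS line seen before the burst.
    """
    bursts: List[Tuple[str, List[str]]] = []
    previous = ""
    in_burst = False
    for raw in lines:
        line = raw.rstrip("\n")
        match = SQUASHFS_RE.search(line)
        if match is None:
            # Ordinary line: remember it and close any open burst.
            previous = line
            in_burst = False
            continue
        if not in_burst:
            bursts.append((previous, []))
            in_burst = True
        block = BLOCK_RE.search(match.group("msg"))
        if block:
            bursts[-1][1].append(block.group(1))
    return bursts


# Lines copied from the first snippet above, used here as sample input.
SAMPLE = [
    "Mar 28 11:17:12 Scarif rc.docker: Starting Docker daemon...",
    'Mar 28 11:17:12 Scarif rsyslogd: [origin software="rsyslogd" '
    'swVersion="8.2102.0" x-pid="15309" x-info="https://www.rsyslog.com"] start',
    "Mar 28 11:17:13 Scarif kernel: SQUASHFS error: xz decompression failed, "
    "data probably corrupt",
    "Mar 28 11:17:13 Scarif kernel: SQUASHFS error: Failed to read block 0x1c254cf8: -5",
]

for preceding, blocks in squashfs_bursts(SAMPLE):
    print("before burst:", preceding)
    print("failed blocks:", ", ".join(blocks) or "(none reported)")
```

To run it over a real log, pass an open file instead of `SAMPLE`, for example `squashfs_bursts(open("syslog.txt"))` with whatever file name the extracted syslog has. A line that happens to be logged just before a burst shows timing only, not cause, so this can help spot a pattern but cannot confirm that Docker is responsible.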
March 28, 2025 Community Expert
If it's not the flash drive, RAM or CPU would be the next suspects.
March 28, 2025 Author
Thanks again for replying. So if I've already replaced the RAM, that leaves my CPU. How would I best go about testing this, do you think? I'd like to avoid buying a new CPU just to test.
March 28, 2025 Community Expert
You can try an RMA with Intel. There's a known issue with those CPUs, and it has been confirmed as the problem for many users in the forum.
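One cheap, non-definitive check before going down the RMA route is to hash the same large file several times and see whether the digest ever changes. This is a rough Python sketch under stated assumptions: `distinct_digests` is a made-up helper name, and the idea that an unstable CPU or memory path would show up as differing digests is plausible but not guaranteed. After the first pass the file is usually served from the page cache, so repeated passes exercise RAM and CPU more than the flash drive.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Set


def distinct_digests(path: Path, passes: int = 5, chunk_size: int = 1 << 20) -> Set[str]:
    """Hash the same file `passes` times and return the set of SHA-256 digests.

    On healthy hardware the set has exactly one element. More than one
    means the same bytes hashed differently between passes.
    """
    digests: Set[str] = set()
    for _ in range(passes):
        hasher = hashlib.sha256()
        with path.open("rb") as fh:
            # Read in fixed-size chunks so large files do not fill memory.
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                hasher.update(chunk)
        digests.add(hasher.hexdigest())
    return digests


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"unraid" * 100_000)
    result = distinct_digests(demo)
    print("stable" if len(result) == 1 else f"UNSTABLE: {len(result)} different digests")
```

On the server itself the function could be pointed at a large file on the flash drive, for example `distinct_digests(Path("/boot/bzmodules"), passes=20)`; that path is an assumption about where the squashfs image lives and should be checked first. A stable result does not clear the CPU, since intermittent faults may need far more load or time to appear, but an unstable one would be strong evidence of a hardware problem.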
April 9, 2025 Author
Reigniting this because I'm getting weirder and weirder symptoms. The latest is that the Docker page only shows 0% CPU usage for everything, and all my graphs and tables have disappeared.
April 10, 2025 Community Expert
This is typically just a display problem. Try a different browser, or just reboot.