August 20, 2024 I'm having an issue where all Docker containers stop and fail to start again unless I restart the array. Before this I had frequent crashes caused by a faulty PCIe-to-SATA expansion card. After it was removed the system booted. tower-diagnostics-20240821-0714.zip
August 21, 2024 There are BTRFS errors:
Aug 19 14:06:31 Tower kernel: btrfs_validate_extent_buffer: 1850 callbacks suppressed
Aug 19 14:06:31 Tower kernel: BTRFS warning (device nvme1n1p1): checksum verify failed on logical 966566641664 mirror 1 wanted 0xc1a226c5 found 0x6bade2d5 level 0
Aug 19 14:06:31 Tower kernel: BTRFS warning (device nvme1n1p1): checksum verify failed on logical 966566641664 mirror 2 wanted 0xc1a226c5 found 0xee1c9cd7 level 0
Aug 19 14:06:31 Tower kernel: BTRFS warning (device nvme1n1p1): checksum verify failed on logical 966566641664 mirror 1 wanted 0xc1a226c5 found 0x6bade2d5 level 0
Aug 19 14:06:31 Tower kernel: BTRFS warning (device nvme1n1p1): checksum verify failed on logical 966566641664 mirror 2 wanted 0xc1a226c5 found 0xee1c9cd7 level 0
Aug 19 14:06:31 Tower kernel: I/O error, dev loop2, sector 20264160 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
Aug 19 14:06:31 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 86, rd 90, flush 0, corrupt 0, gen 0
Aug 19 14:06:31 Tower kernel: BTRFS warning (device nvme1n1p1): checksum verify failed on logical 966566641664 mirror 1 wanted 0xc1a226c5 found 0x6bade2d5 level 0
Aug 19 14:06:31 Tower kernel: BTRFS warning (device nvme1n1p1): checksum verify failed on logical 966566641664 mirror 2 wanted 0xc1a226c5 found 0xee1c9cd7 level 0
Aug 19 14:06:31 Tower kernel: loop: Write error at byte offset 10375249920, length 4096.
Aug 19 14:06:31 Tower kernel: I/O error, dev loop2, sector 20264160 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Aug 19 14:06:31 Tower kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 87, rd 90, flush 0, corrupt 0, gen 0
The damage is probably to your docker image. You should delete and recreate it. If the errors continue, you should run a memcheck.
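If the syslog is long, it can help to tally the checksum failures instead of reading them one by one. The following is only a rough sketch, assuming the warnings keep the exact wording shown in the excerpt above; the function name `summarize_checksum_failures` is made up for this example and is not part of Unraid or btrfs.

```python
import re
from collections import Counter

# Pattern based on the warning lines quoted in this thread (an assumption:
# other kernel versions may word the message differently).
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>[^)]+)\): checksum verify failed "
    r"on logical (?P<logical>\d+) mirror (?P<mirror>\d+)"
)


def summarize_checksum_failures(lines):
    """Count checksum-verify warnings per (device, logical address, mirror)."""
    counts = Counter()
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            key = (match["dev"], int(match["logical"]), int(match["mirror"]))
            counts[key] += 1
    return counts
```

To use it, pass the lines of a saved syslog file, for example `summarize_checksum_failures(open("syslog.txt"))`, where `syslog.txt` is a placeholder file name. Failures reported on every mirror of the same logical address, as in the excerpt above, mean btrfs had no good copy to repair from.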
August 21, 2024 Author The SSD with docker on it gave errors:
UID: 80bf3011-b153-4b87-9e8c-711b01222bd3
Scrub started: Wed Aug 21 14:40:36 2024
Status: finished
Duration: 0:00:28
Total to scrub: 69.41GiB
Rate: 2.48GiB/s
Error summary: verify=8 csum=1
Corrected: 0
Uncorrectable: 9
Unverified: 0
tower-diagnostics-20240822-0741.zip
August 22, 2024 Look at the syslog for the list of corrupt files, delete or restore them from a backup, then run another scrub to confirm 0 errors.
August 22, 2024 Author I deleted the docker image. The errors are still present on the one SSD (the one that contained the docker image).
August 22, 2024 There are other corrupt files, they are listed in the syslog, e.g.: path: appdata/binhex-plexpass/Plex Media Server/Media/localhost/3/70320c9411d2e1173ca6a8e17276181f3667518.bundle/Contents/Thumbnails/thumb1.jpg)
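To collect every affected file in one pass, something like the sketch below could work. It assumes each relevant BTRFS line ends with a `(path: ...)` suffix, as the example above suggests; the helper name `corrupt_paths` is hypothetical.

```python
import re

# Assumes the affected file is reported at the end of the line as "(path: ...)".
# The greedy match keeps any parentheses that are part of the file name itself.
PATH_RE = re.compile(r"\(path: (?P<path>.+)\)\s*$")


def corrupt_paths(lines):
    """Return the unique file paths named in BTRFS lines, in first-seen order."""
    seen = []
    for line in lines:
        if "BTRFS" not in line:
            continue
        match = PATH_RE.search(line)
        if match and match["path"] not in seen:
            seen.append(match["path"])
    return seen
```

The resulting list is what needs to be deleted or restored from backup before running the next scrub.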
August 23, 2024 Author The SSD then failed with a "no file system" error, so I reformatted it and restored from backup. Scrub is no longer giving an error. However, the system went offline last night, and an external monitor plugged into it doesn't detect a signal until reboot. So it seems the PCIe expansion card was not the issue. I ran memtest86 and it passed. I can't run the Unraid memcheck because it's disabled without a graphics card. Attached is the diagnostic taken after booting again following the crash. I have the 13th-gen i3 with the latest BIOS. It's not meant to be affected by the Intel voltage issue, so that shouldn't be causing the crash...
August 23, 2024 Enable the syslog server and post the log after a crash, but if it's a hardware problem there likely won't be anything relevant logged.
August 23, 2024
6 hours ago, Azura said: I can't run the unraid memcheck because it's disabled without a graphics card.
It is worth pointing out that there is now the "Live Memory Tester" plugin that can be run while Unraid is running. Passing its tests does not necessarily mean you have no RAM issue, but failing any test definitely does.
August 25, 2024 Author Just had a reset crash, not the usual type of crash. tower-diagnostics-20240825-1958.zip syslog
August 25, 2024 Author Just had a reset crash, and after downloading the logs the system went down again. If there's nothing in the logs I'll start swapping out hardware. tower-diagnostics-20240825-1958.zip syslog
August 26, 2024 Author How do I find the persistent one? I ticked mirror to flash and the guide says it's in "/boot/logs". There is no boot folder, so I'm guessing it's the root folder of the USB. Guide: Capturing diagnostic information | Unraid Docs
August 26, 2024 To download to the share you need to set the local server IP in the Remote syslog server.
August 28, 2024 Author Thanks. It crashed again after 3 days, logs attached. Seems odd for it to be hardware related if it's every 3 days. syslog-192.168.0.121.log
August 28, 2024 Author Seems like this is the crash point:
Aug 29 05:09:59 Tower autofan: Highest disk temp is 41C, adjusting fan speed from: 175 (68% @ 1340rpm) to: 150 (58% @ 1171rpm)
Aug 29 05:20:01 Tower crond[1707]: failed parsing crontab for user root: Invalid frequency setting of /usr/local/emhttp/plugins/ca.update.applications/scripts/updateApplications.php >/dev/null 2>&1
Aug 29 05:29:16 Tower emhttpd: spinning down /dev/sde
Aug 29 05:30:25 Tower emhttpd: spinning down /dev/sdc
Aug 29 05:32:42 Tower emhttpd: spinning down /dev/sdd
Aug 29 05:32:45 Tower emhttpd: spinning down /dev/sdf
Aug 29 05:33:39 Tower kernel: mdcmd (57): set md_write_method 0
Aug 29 05:33:39 Tower kernel:
Aug 29 05:35:05 Tower autofan: Highest disk temp is 0C, adjusting fan speed from: 150 (58% @ 1167rpm) to: OFF (0% @ 0rpm)
Aug 29 06:05:01 Tower crond[1707]: failed parsing crontab for user root: Invalid frequency setting of /usr/local/emhttp/plugins/ca.update.applications/scripts/updateApplications.php >/dev/null 2>&1
Aug 29 06:55:01 Tower crond[1707]: failed parsing crontab for user root: Invalid frequency setting of /usr/local/emhttp/plugins/ca.update.applications/scripts/updateApplications.php >/dev/null 2>&1
Aug 29 07:00:25 Tower kernel: veth193667f: renamed from eth0
Aug 29 07:00:37 Tower kernel: eth0: renamed from vetha3fcec2
Aug 29 07:00:40 Tower kernel: veth0cba01d: renamed from eth0
Aug 29 07:00:41 Tower kernel: eth0: renamed from veth0273903
Aug 29 07:00:45 Tower kernel: vethd666a3b: renamed from eth0
Aug 29 07:00:47 Tower kernel: eth0: renamed from veth7589674
Aug 29 07:00:50 Tower kernel: docker0: port 2(vethaad0a38) entered disabled state
Aug 29 07:00:50 Tower kernel: vethade13f7: renamed from eth0
Aug 29 07:00:50 Tower kernel: docker0: port 2(vethaad0a38) entered disabled state
Aug 29 07:00:50 Tower kernel: vethaad0a38 (unregistering): left allmulticast mode
Aug 29 07:00:50 Tower kernel: vethaad0a38 (unregistering): left promiscuous mode
Aug 29 07:00:50 Tower kernel: docker0: port 2(vethaad0a38) entered disabled state
Aug 29 07:01:04 Tower emhttpd: read SMART /dev/sdd
Aug 29 07:01:05 Tower kernel: docker0: port 2(veth11fffe6) entered blocking state
Aug 29 07:01:05 Tower kernel: docker0: port 2(veth11fffe6) entered disabled state
Aug 29 07:01:05 Tower kernel: veth11fffe6: entered allmulticast mode
Aug 29 07:01:05 Tower kernel: veth11fffe6: entered promiscuous mode
Aug 29 07:01:05 Tower emhttpd: read SMART /dev/sdf
Aug 29 07:01:05 Tower kernel: eth0: renamed from veth4dabbdb
Aug 29 07:01:05 Tower kernel: docker0: port 2(veth11fffe6) entered blocking state
Aug 29 07:01:05 Tower kernel: docker0: port 2(veth11fffe6) entered forwarding state
Aug 29 07:01:09 Tower kernel: veth4d63e1d: renamed from eth0
Aug 29 07:01:09 Tower kernel: docker0: port 1(vethee93c1c) entered disabled state
Aug 29 07:01:09 Tower kernel: docker0: port 1(vethee93c1c) entered disabled state
Aug 29 07:01:09 Tower kernel: vethee93c1c (unregistering): left allmulticast mode
Aug 29 07:01:09 Tower kernel: vethee93c1c (unregistering): left promiscuous mode
Aug 29 07:01:09 Tower kernel: docker0: port 1(vethee93c1c) entered disabled state
Aug 29 07:01:31 Tower kernel: docker0: port 1(vethe9bcb46) entered blocking state
Aug 29 07:01:31 Tower kernel: docker0: port 1(vethe9bcb46) entered disabled state
Aug 29 07:01:31 Tower kernel: vethe9bcb46: entered allmulticast mode
Aug 29 07:01:31 Tower kernel: vethe9bcb46: entered promiscuous mode
Aug 29 07:01:32 Tower kernel: eth0: renamed from vethf6a58a6
Aug 29 07:01:32 Tower kernel: docker0: port 1(vethe9bcb46) entered blocking state
Aug 29 07:01:32 Tower kernel: docker0: port 1(vethe9bcb46) entered forwarding state
Aug 29 07:50:01 Tower rc.rsyslogd: Syslog server daemon... Started.
Aug 29 07:50:01 Tower file.activity: Starting File Activity
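One way to narrow down where a crash falls in a log like this is to find the last entry written before the syslog daemon starts again. The sketch below is only an illustration: it assumes the `rc.rsyslogd: Syslog server daemon... Started` line is the first thing logged after a reboot (true in this excerpt, not guaranteed elsewhere), and it assumes the year 2024 because these timestamps carry none. The function names are made up for the example.

```python
from datetime import datetime

# Assumption: this line marks the first entry after a reboot, as in the excerpt above.
RESTART_MARKER = "rc.rsyslogd: Syslog server daemon... Started"


def parse_stamp(line, year=2024):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is assumed, not logged."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def last_entries_before_restart(lines):
    """Return (last line before each restart marker, silent gap before the restart)."""
    results = []
    previous = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if RESTART_MARKER in line and previous is not None:
            gap = parse_stamp(line) - parse_stamp(previous)
            results.append((previous, gap))
        previous = line
    return results
```

Run over the excerpt above, this would point at the 07:01:32 docker0 entry as the last thing logged, followed by roughly 48 minutes of silence before the 07:50:01 restart. That only shows when logging stopped, not what caused the crash.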
August 29, 2024 Unfortunately there's nothing relevant logged, which can point to a hardware issue. One thing you can try is to boot the server in safe mode with all docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
September 4, 2024 Author Thanks, I ended up deleting all plugins except the basic ones you need and the crashing stopped. It's run longer than usual. The last plugin I installed was the fan speed plugin, so I have a feeling it was that one.
September 12, 2024 Author Scratch that, the issue suddenly returned and the system went into an on/off reboot loop. I swapped the gen 13 CPU with one from a gen 12 computer and it's running again. It will take a while to work out the cause, but I'm considering that it may be instability with gen 13 Intel on older motherboards. I'm reading they're not as stable.