wug

Members

  • Posts
    20


  1. It crashed again after upgrading the mobo firmware. I'm officially out of ideas. Interestingly, according to the hourly array health notifications, it seems to crash within 4-5 hours of starting a parity check, but then I'll log in the next day to find that the system has restarted itself and has had 12+ hours of uptime (with the array offline). I'll note that the last few parity checks were in maintenance mode, so it's unlikely to be a filesystem-related issue. I think this evening I'll run a live Debian environment on this system instead of Unraid, so that I can make sure all of my disks are backed up to tape. I'll report back on system stability when it's just running Debian, because if it's still crashing, then it's presumably hardware. If it's not crashing, then we can assume that Unraid is incompatible with this particular hardware for some reason.
  2. It crashed again with the mobo reset to default settings. I think one interesting clue is that after I rolled back to 6.11.5, when the server crashes, it actually seems to do a full system reset: I keep checking on it to find that it has crashed but is sitting there, ready to go once again. This is in contrast to before, when it would require a long press on the power button to forcibly shut it down because it was TOTALLY unresponsive. I'm going to try the graphics settings someone else described above, and if it crashes again, I'll try upgrading the BIOS.
  3. OKAY. I got all disks to be available again. I'm now running with a mobo reset to default settings (and it hasn't been updated away from the version that was stable for a year) and a nearly fresh install of Unraid 6.11.5. I'll start a new parity check and report back on whether things have been resolved or further debugging is needed.
  4. It's crashed three more times since yesterday. Each time, it crashed not too long after starting a parity check. As part of my next debugging step, I reset the motherboard to default settings, and I now have an interesting clue: the system booted up just fine, but the parity disk was missing. It remained missing after another reboot. All other disks are present and available, so it's not super likely to be a bad cable, because it's a SAS-to-SATA cable and the other three connected drives are available. I pulled the parity disk out, stuck it in the external USB drive caddy, and it shows up just fine. But now another disk is missing: baffling. When I switched Disk 7 to the SATA connector that parity was connected to, it was still missing. I wonder if some BIOS setting was causing the system to miss that certain disks are having issues... somehow? But that barely makes sense. I'm gonna do some more fiddling with it to see if I can get it behaving.
  5. I rolled back to 6.12.2, and it's already crashed. If I recall correctly, I actually upgraded to 6.12.3 from a lower version (6.12.0 or 6.12.1), so just to be thorough, I've now rolled back all the way to 6.11.5. I have a parity check in progress now, so let's see if I can successfully complete one of these for the first time in almost three months!
  6. I have been having an inexplicable issue where the server hangs and requires a hard reboot to return to normal. This has occurred since upgrading to 6.12.3 (and now 6.12.4). When I say it's random, I mean it: sometimes the server has lasted 4+ days after a reboot, sometimes it's become unresponsive within minutes of booting. Sometimes it has crashed while disks are active, sometimes while they are idle. It's crashed with all of these combinations:
       • Docker disabled, VMs disabled
       • Docker enabled, VMs disabled
       • Docker disabled, VMs enabled
       • Docker enabled, VMs enabled
     It's crashed when connected to the usual network and when on its own dedicated subnet. It's crashed with each ethernet port on the motherboard used as the sole network connection (and that's a 1G and a 2.5G port, so they aren't even the same hardware or drivers!). It's crashed with the configuration that I've built up over the last eight years of running Unraid, and it's crashed with a fresh configuration on a fresh flash drive. There is not a single condition that is actually correlated with the crashes. I believe this is related to these issues, but I'm creating a new post because one was marked as closed, and I also went to some pretty significant lengths to try to debug this:

     Debugging Process
     This has been going on since August, and I've done absolutely everything possible to eliminate defective hardware as a possibility. That includes:
       • swapping out all PCIe cards with spares
       • a run of the system with each individual drive disconnected, one at a time (i.e. I remove one disk, see if it still crashes, and if it does, put the disk back in and pull the next one)
       • every single non-destructive stress test I can think of
       • fsck on each disk and pool individually
       • running every maintenance operation I can think of
       • testing various configurations of power and sleep settings in the motherboard BIOS

     Logging Process
     Here's the wildly frustrating part: I created a syslog configuration that logs basically every single message it can (including marks) to a log file on an ext4-formatted flash drive mounted as an unassigned device. I have at least a dozen log files that don't contain a single error, and before each of those files ends, there is an unbroken sequence of --MARK-- lines going back for hours before the system locked up (see the log-scanning sketch after this post list). I've also tried using various notification methods to try to receive messages that the system is dying, and I've also tried setting up remote logging. None of them ever surfaced an issue anywhere near the time of the crash, so the crash is definitely also killing outbound networking. There is one tiny hint of what might be going on: for a period of time, when I rebooted after a crash, I would get a "udma crc error count returned to normal value" for a drive (but it never seemed to be consistent). However, all the components have since been removed and added back to the server, and I haven't seen an issue like that in a while. I'll also add that rebooting requires holding down the power button until the computer shuts off. If I just do a quick press once, nothing happens: the server keeps running, the monitor doesn't wake up, nothing indicates that anything was actually able to capture that ACPI signal.

     Fresh Install
     Last night, as my last step, I used the USB creator tool to create a brand new boot disk with 6.12.4 (on a factory-sealed flash drive) and copied over only the bare minimum configuration files (like the array config). Again, it crashed.

     Unraid 6.12.4 is Fundamentally Broken on Some Systems?
     I'm just rolling back to 6.12.2 at this point, because there is nothing abnormal in the diagnostics or logs that would indicate an actual problem. I've attached the diagnostics file from the fresh install, taken before the array was even started, because this locking-up problem happens even when nothing is mounted and the system is just idling. But it also happens when a parity check is running, so it's not just a high-load or low-load issue.

     tl;dr: There is some issue occurring with Unraid ≥6.12.3 that cannot be detected through any normal logging methods, and it has made my local installation totally unusable since August. And I'm apparently not the only one.

     mediatower-diagnostics-20231027-1338.zip
  7. Does that mean that if I have a file at /mnt/cache/share-a and a hard link to it from /mnt/cache/share-b, and Mover wants to move the file on B but not A, it will just copy the file? What if shares A and B are both set to move to the array? Would it still make a copy, or would it move them to the same disk, preserving the hard link? Also, now that there's mover support for hard links, does that mean there's user share support for hard links? And you mention you use symlinks often: do those only work on disk shares, or do they also work on user shares?
  8. I know it's not possible to create a hard link on a user share, so I'm wondering if there's a viable workaround. Let's say I have a file on /mnt/cache/share-a and I create a hard link to it at /mnt/cache/share-b. In this setup, share-a has its cache preference set to "Prefer" while share-b has cache set to "Yes." When the mover runs, will it move just the link from the cache disk to an array disk, or will it copy the whole file? (There's a sketch of the underlying hard-link behavior after this post list.) Alternatively, do symlinks work on user shares? Can I move a file to /mnt/user/share-b and symlink to it from /mnt/user/share-a?
  9. I've been seeing this exact issue as well for a few weeks now. Currently, I'm running 6.3.0-rc6.
  10. Wouldn't that just be Unassigned Devices? Can user shares on the array also be on disks managed through Unassigned Devices? Although I guess it would make more sense to add a second folder pointing to, for example, an "optimizedmedia" share in each of my Plex libraries and have optimized versions be saved there. That's kind of what I did when I was running Plex off of my laptop with my media on external hard drives; I had a second set of media folders directly on my laptop where it would store "optimized" original-quality copies of shows that were On Deck in Plex.
  11. Quote:
      As you've already noted, the transfer speeds you can get with a modern hard drive with platter densities over 1TB/platter are excellent. There's certainly no compelling reason to use an SSD as a cache for writes to the array, or for the usage you noted vis-à-vis Plex shows. However, as a VM and Docker store, an SSD would have significant performance advantages with the internal usage, NOT because of the much higher write speeds, but because of the dramatically better access times for every disk I/O. Whether or not that's "compelling" depends on your specific usage ... but in most cases I'd tend to think it doesn't really matter. Note that you could also mix the choices -- use a large rotating platter drive together with an SSD, with the VM and Docker stores assigned to the SSD.

      Can you assign shares to use specific cache disks like you can with shares on the array? If that's the case, then an SSD/HDD pair is definitely the best option. I would like the performance bump I'd get from an SSD, but I'd also like to have a high-capacity off-array store that improves performance in other ways. For example, I've been converting as much media as possible to x265 because the space savings are huge, but until I upgrade my other internals, it would be nice to also keep optimized versions of things like what's On Deck in Plex. That way I could have tiny x265 files for long-term storage, but I wouldn't have to violently assault my processor every time I want to watch the latest episode of a TV show on my Chromecast.
  12. That's only slightly faster than the sequential read/write speed of the 6TB WD Black! I don't use any VMs that require high performance nor do I plan to, but I would like to use the cache to store things like optimized Plex versions of recent shows, and other things of that nature. My priority is leaning toward capacity over performance, so I'm trying to gauge if there is any very compelling reason to lean one way or another.
  13. I've been looking at drives for my cache pool, and I can't decide whether I want to buy bigger, high-performance HDDs or smaller mid-range SSDs. My thinking is that WD Black drives and other drives in the same tier show write speeds in the ballpark of 150MB/s. Thus, on a Gigabit LAN, I could saturate my connection with 125MB/s of data to the server, and the drive would be able to keep up (see the back-of-envelope sketch after this post list). I could further increase the maximum write speed for writes generated on the server itself if each cache drive were actually two smallish drives in a hardware RAID-0 configuration. Does anyone have any strong opinions on the matter? Would I be better off with an SSD or with two high-performance HDDs in a RAID-0 configuration? For the same amount of money, the SSD option would give much higher performance, while the HDD option would give me much more space.
  14. Do you have the 6.2 beta? The line in the preclear script that triggers the "Device X is busy" message calls sfdisk -R, but the version of sfdisk in 6.2 doesn't have the -R option.
  15. In general, I agree with the "don't waste electricity" argument. However, the reason for time-of-use pricing in my area is that there are a ton of different electricity sources. During the off-peak hours (when the disks would be most likely to be spinning without anyone using them), something like 90% of the electricity comes from hydroelectric and nuclear power. The higher usage tiers are when they have to turn on gas and wind power, but during those times the server is going to be in use anyway. During the off-peak hours, the demand is so low that they export the hydroelectric power to places that are far, far away, and they turn off the wind turbines because they create a surplus of energy that the grid can't handle. And I use so little energy already that the electricity-cost portion of my bill (not including the flat distribution fee) has never gone over $11/month. Plus, this is the worst-case consumption scenario: 50 Watts 24/7 would be 438kWh per year (the math is sketched after this post list). The more likely scenario (spinning disks not under load) would be half of that. But you definitely do have a valid argument, and I will consider it alongside any other pros/cons that others may mention!
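
A minimal sketch of the log check referenced in the syslog/--MARK-- post above: it just reports the last heartbeat line in a syslog file, which puts a rough lower bound on when the box died (the crash happened at most one mark interval later). It assumes classic syslog-style lines with "-- MARK --" heartbeat entries; the default filename is only a placeholder.

    #!/usr/bin/env python3
    # Report the last "-- MARK --" heartbeat in a syslog file. The filename
    # default below is a placeholder, not an actual path from the diagnostics.
    import sys

    def last_mark(path):
        last = None
        with open(path, errors="replace") as f:
            for line in f:
                if "MARK" in line:   # matches "-- MARK --" heartbeat lines
                    last = line.rstrip()
        return last

    if __name__ == "__main__":
        path = sys.argv[1] if len(sys.argv) > 1 else "syslog-remote.log"
        mark = last_mark(path)
        print(mark if mark else f"no MARK lines found in {path}")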
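For the hard-link questions above, here is a small sketch of the underlying filesystem behavior the mover has to work within. It only demonstrates generic Linux semantics (same inode, shared link count, links cannot span filesystems); it does not claim anything about what Unraid's mover itself decides to do, and the paths are throwaway temp files.

    #!/usr/bin/env python3
    # Demonstrates generic hard-link semantics; nothing here touches
    # /mnt/cache or /mnt/user.
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "share-a-file")
        b = os.path.join(d, "share-b-link")

        with open(a, "w") as f:
            f.write("payload")

        os.link(a, b)                                  # second name, same inode

        sa, sb = os.stat(a), os.stat(b)
        print("same inode:", sa.st_ino == sb.st_ino)   # True
        print("link count:", sa.st_nlink)              # 2

        # Hard links cannot cross filesystems, so anything that relocates one
        # of the names to a different disk (e.g. cache -> array) has to write
        # an independent copy there; keeping the link requires both names to
        # end up on the same filesystem. A symlink can point anywhere, but it
        # dangles if the target path stops resolving after a move.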
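Back-of-envelope numbers for the cache-drive question above. The HDD figure is the ~150MB/s ballpark from the post; the RAID-0 scaling is idealized (~2x sequential), and the SATA SSD figure is a generic assumption rather than a quoted spec.

    #!/usr/bin/env python3
    # Rough throughput comparison; all figures are ballpark MB/s.
    gigabit_lan = 1000 / 8        # ~125 MB/s of payload over a Gigabit link
    hdd_seq     = 150             # WD Black-class sequential write (from the post)
    raid0_seq   = 2 * hdd_seq     # ideal two-drive stripe, ~300 MB/s
    ssd_seq     = 500             # generic SATA SSD sequential write (assumed)

    print(f"LAN ceiling:   {gigabit_lan:.0f} MB/s")
    print(f"single HDD:    {hdd_seq} MB/s  (already matches the LAN)")
    print(f"2x HDD RAID-0: {raid0_seq} MB/s (only helps server-local writes)")
    print(f"SATA SSD:      {ssd_seq} MB/s  (same story, plus far better random I/O)")

In other words, for writes arriving over the Gigabit LAN a single ~150MB/s drive already keeps up; the RAID-0 or SSD options would mainly matter for work generated on the server itself.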
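And the worst-case power math from the last post, written out. The 50W draw and the roughly-halved idle estimate come from the post; the per-kWh rate is a placeholder, not the actual tariff.

    #!/usr/bin/env python3
    # Annual energy for a constant draw; the rate below is a placeholder.
    HOURS_PER_YEAR = 24 * 365

    def annual_kwh(watts):
        return watts * HOURS_PER_YEAR / 1000

    worst_case = annual_kwh(50)    # disks spinning 24/7 -> 438 kWh/yr
    likely     = annual_kwh(25)    # roughly half when idle/spun down

    rate = 0.10                    # $/kWh, placeholder
    print(f"worst case: {worst_case:.0f} kWh/yr (~${worst_case * rate:.0f}/yr at ${rate}/kWh)")
    print(f"likely:     {likely:.0f} kWh/yr (~${likely * rate:.0f}/yr)")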