March 12, 2020 This started a couple of weeks ago when I saw that the Unraid server was partially locked up. I had no access to dockers, and one CPU core was pinned at 100%. There were many BTRFS errors in the log. I had a pair of 2TB hard drives in a RAID 0 pool for cache. My docker image was on an SSD mounted as an unassigned drive. One of the cache drives had UDMA CRC errors. I tried replacing cables and connecting it to different adapter/mobo ports. I couldn't run Unraid with docker enabled for more than a few hours before it would lock up again. One of the array drives also went into a disabled state. At this point I broke up my cache pool, formatted one of the drives as XFS, and used it as the cache drive. It still locked up after a few hours. I deleted the docker image on the SSD and created a new one, with no change. I also managed to lose some files from my flash drive, including the key file. I was able to reformat the flash drive and apply my license key. I unmounted the SSD and created the docker image on the cache drive. At this point it seemed stable. I disabled docker, used xfs_repair to fix the array drive, rebuilt the drive, and did a parity check. I installed a few dockers and turned on docker, and it ran for 2.5 days before I encountered the problem again. I can't get into any dockers or the docker settings, and one CPU core is pinned. I captured the diagnostics and attached them. Help would be much appreciated at this point. godzilla-diagnostics-20200312-1226.zip
March 12, 2020 Multiple filesystem corruptions suggest a hardware problem, like bad RAM. I would start with memtest.
March 12, 2020 Author I forgot to mention that I did run memtest a couple of times, with no errors either time. The one on the Unraid USB wouldn't run, but I had another bootable USB with memtest on it.
March 12, 2020 Recreating the docker image should fix it for now, but it will likely happen again.
March 12, 2020 Author I'll run memtest again tonight. If the memory is OK, am I looking at a failing motherboard/CPU? I have an i5 and motherboard available, but the board doesn't have enough SATA ports/PCIe slots to handle all my drives.
March 13, 2020 If memtest finds nothing, try a btrfs pool again. Btrfs usually gets corrupted faster when there's a hardware problem, and the type of corruption might give a clue as to what the issue is. Just be sure to save the diags before rebooting.
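If it helps when looking through the saved diags, here is a rough Python sketch for tallying btrfs kernel messages in a syslog file. It assumes the messages contain the token "BTRFS" followed by a severity word such as "error", "warning" or "critical"; the script name and the path passed on the command line are hypothetical, and the pattern may need adjusting for your log.

```python
import re
import sys
from collections import Counter

# Assumption: btrfs kernel messages look like "BTRFS <severity> ..." somewhere
# in the line. Adjust the pattern if your syslog uses a different format.
PATTERN = re.compile(r"BTRFS\s+(error|warning|critical)", re.IGNORECASE)


def tally_btrfs_messages(lines):
    """Count lines matching PATTERN, grouped by lower-cased severity word."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tally_btrfs.py path/to/syslog.txt
    with open(sys.argv[1], errors="replace") as handle:
        for severity, count in tally_btrfs_messages(handle).most_common():
            print(f"{severity}: {count}")
```

This only counts messages; it does not interpret them, so the actual error text still needs to be read to see what kind of corruption is being reported.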
March 14, 2020 Author I was able to run memtest again, and it found plenty of errors this time. I'll swap out the RAM and retest.
March 16, 2020 Author Just to close this off: I had an extra pair of RAM sticks. I ran memtest on them together and individually, and all passed. The Unraid server became unusable whether I used the original RAM or the replacement RAM. It would boot up, but HTTP would not be available. I could SSH in, but as soon as I typed a command like "ls" or "cat", SSH hung. I've replaced the motherboard, CPU, and RAM, and everything is working again. It was an 8-year-old mobo/CPU that had been running my Unraid server for 4 years, but it looks like it has died.