Drago69 Posted March 12, 2020
This started a couple of weeks ago when I saw that the Unraid server was partially locked up. I had no access to dockers, and one CPU core was pinned at 100%. There were many BTRFS errors in the log. I had a pair of 2TB hard drives in a RAID 0 pool for cache. My docker image was on an SSD mounted in unassigned drives. One of the cache drives had UDMA CRC errors. I tried replacing cables and connecting it to different adapter/mobo ports. I couldn't run Unraid with docker enabled for more than a few hours before it would lock up again. One of the array drives also went into a disabled state.
At this point I broke up my cache pool, formatted one of the drives as XFS and used it as the cache drive. It still locked up after a few hours. I deleted the docker image on the SSD and created a new one, with no change. I also managed to lose some files from my flash drive, including the key file. I was able to reformat the flash drive and apply my license key. I unmounted the SSD and created the docker image on the cache drive. At this point it seemed stable. I disabled docker, used xfs_repair to fix the array drive, rebuilt the drive and did a parity check.
I installed a few dockers, turned on docker, and it ran for 2.5 days before I encountered the problem again. I can't get into any dockers or the docker settings, and one CPU core is pinned. I captured the diagnostics and attached them. Help would be much appreciated at this point.
Attachment: godzilla-diagnostics-20200312-1226.zip
JorgeB Posted March 12, 2020
Multiple filesystem corruptions suggest a hardware problem, like bad RAM. I would start with memtest.
Drago69 (Author) Posted March 12, 2020
I forgot to mention that I did run memtest a couple of times, with no errors either time. The one on the Unraid USB wouldn't run, but I had another bootable USB with memtest on it.
JorgeB Posted March 12, 2020
Recreating the docker image should fix it for now, but it will likely happen again.
Drago69 (Author) Posted March 12, 2020
I'll run memtest again tonight. If the memory is OK, am I looking at a failing motherboard/CPU? I have an i5 and motherboard available, but the board doesn't have enough SATA ports/PCIe slots to handle all my drives.
JorgeB Posted March 13, 2020
If memtest finds nothing, try with a btrfs pool again. Btrfs usually gets corrupted faster when there's a hardware problem, and the type of corruption might give a clue as to what the issue is. Just be sure to save the diags before rebooting.
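For anyone who wants to look through saved diagnostics for the kind of clue mentioned above, here is a minimal Python sketch that tallies BTRFS kernel messages by device and severity. It assumes the syslog has been saved as plain text and that messages follow the usual kernel form `BTRFS <severity> (device <name>): ...`. The names `tally_btrfs_messages` and `tally_file` are made up for this example, and nothing here comes from Unraid itself.

```python
import re
from collections import Counter

# Kernel btrfs messages normally look like:
#   "BTRFS error (device sdb1): ..."
# Only the first token after "device" is captured as the device name.
BTRFS_LINE = re.compile(
    r"BTRFS (info|warning|error|critical) \(device ([^\s)]+)"
)


def tally_btrfs_messages(lines):
    """Count BTRFS kernel messages per (device, severity) pair."""
    counts = Counter()
    for line in lines:
        match = BTRFS_LINE.search(line)
        if match:
            severity, device = match.groups()
            counts[(device, severity)] += 1
    return counts


def tally_file(path):
    """Tally BTRFS messages in a saved syslog file (path is up to you)."""
    with open(path, errors="replace") as handle:
        return tally_btrfs_messages(handle)
```

Calling `tally_file` on the syslog extracted from a diagnostics zip and printing the result shows which device accumulates errors. That is only a starting point for reading the actual messages, not a diagnosis.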
Drago69 (Author) Posted March 14, 2020
I was able to run memtest again, and it got plenty of errors this time. I'll swap the RAM out and retest.
Drago69 (Author) Posted March 16, 2020
Just to close this off: I had an extra pair of RAM sticks. I ran memtest on them together and individually, and all passed. The Unraid server became unusable whether I used the original RAM or the replacement RAM. It would boot up, but HTTP would not be available. I could ssh in, but as soon as I typed a command like "ls" or "cat", ssh hung. I've replaced the motherboard, CPU, and RAM. Everything is working again. It was an 8-year-old mobo/CPU that had been running my Unraid server for 4 years, but it looks like it has died.