July 29, 2025

My system has been down for a few months while I tried to troubleshoot a number of problems. It looks like the old motherboard went bad, and given the age of the Intel 10980XE, finding a replacement motherboard was tough. I finally found one, then figured out the memory wasn't compatible. I also upgraded the parity drives and the SATA controller to a 9500-16i, etc.

So I'm not super surprised I had a file system error given the problems that were occurring, but I can't get the system to a place where I can start fixing the fundamental problems (it seemed to be working until I tried to scrub the ZFS cache).

Last night, after the scrub, I checked zpool status -v, saw the bad files, and tried to delete them. The affected files were Plex cache/library files. That process ran for an hour before I decided to shut down Docker.

Then I tried to scrub again from the GUI. The system seemed to get stuck with 8 or so CPUs maxed out at 100%.

I tried to shut down, and nothing changed, so I let it run overnight for 12 hours, and there was no change. I still couldn't shut down. I finally pulled the power this morning.

I tried to start the array, and after 3 hours it is still starting, with 2 CPU cores pinned again. I tried to run fix common errors; it stalls at 36%.

What should I do from here? I'm guessing this is all related to the corruption of the ZFS cache (running 5 NVMe drives in a raidz1).

atlas-diagnostics-20250729-1055.zip

Edited July 29, 2025 by Christobol
July 29, 2025 Author Solution

I forced a power cycle. I rebooted into safe mode, started the array in maintenance mode, and it worked.

- Booted into safe mode again
- Started the array
- System is stuck here like earlier today. I've attached new logs (after running ~6 hours, the prior logs had no updates beyond what was already posted, other than a reboot command)
- 2 CPUs are pinned at 100% utilization again
- After ~32 minutes the array started, as did one cache pool; the ZFS pool is still mounting -> I do not see a log entry that the other drives were available in the GUI

zpool status -v output

I guess this is a ZFS issue? I'll look for next steps. My inclination is to give up on the data on the drive, kill the pool, format, and build a new one.

atlas-diagnostics-20250729-1626.zip

Edited July 29, 2025 by Christobol (clarification)
July 30, 2025

Pool has metadata corruption; see if it mounts read-only:

zpool import -o readonly=on cache_zfs_nvme
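For anyone landing here later, the read-only import suggested above could be wrapped in a small script along these lines. This is only a sketch under assumptions: the pool name is the one from this thread, the `/mnt/recovery` alternate root is a hypothetical path, and the pool is assumed not to be imported already. By default it only prints the commands; set `RUN=1` to actually execute them.

```shell
#!/bin/sh
# Sketch: import a damaged pool read-only so files can be copied off.
POOL="cache_zfs_nvme"      # pool name from this thread
ALTROOT="/mnt/recovery"    # hypothetical temporary mount root (assumption)

# Print each command unless RUN=1 is set, so the steps can be reviewed first.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Import without allowing writes, mounting datasets under the alternate root.
run zpool import -o readonly=on -R "$ALTROOT" "$POOL"

# List the pool state and any files with permanent errors.
run zpool status -v "$POOL"

# Show where each dataset ended up and whether it mounted.
run zfs list -r -o name,mountpoint,mounted "$POOL"
```

If the import succeeds, whatever is still readable can be copied to another disk before the pool is destroyed and rebuilt. If it fails, the error message from `zpool import` is worth posting along with the diagnostics.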
August 12, 2025 Author

I must have email updates turned off, since I didn't see this response. I ended up deleting the array and building everything from scratch. The appdata restore didn't work. So it was a good lesson: now I have a failure test case to see if I can get the restore to work in the future.