ajs Posted February 8

Hello, I woke up to a downed Unraid server.

Problem
- I logged into Unraid and saw all CPUs at 100%.
- QEMU was the process that seemed to be causing the problem.
- Shutting down via the GUI and the CLI did not work, so I performed a hard reboot.
- On restart, Unraid is stuck on "Array Starting - Mounting Disks" with half the CPUs at 100%, though the top command does not identify any process eating up those cycles.
- I pulled a diagnostics file (attached).
- Noticed in syslog: "status: One or more devices has experienced an error resulting in data corruption. Applications may be affected."
- Ran /usr/sbin/zpool status -PLv tier1-cache (adding -v), which identified a single corrupted file on one of the NVMe drives in the cache pool.
- The file is a qcow2 file, my Home Assistant VM. Note: I had received an error from Home Assistant about a corrupted file, but assumed it was benign.
- Currently I can boot into safe mode and mount the array in maintenance mode.

Other Info
- Luckily, I have much of the data from this cache backed up via VM and appdata backups, but I am unsure how to access those files and move them over.
- I ran short SMART tests on both NVMe drives and they came back with no errors.

mothership-diagnostics-20240207-2306.zip

Edited February 8 by ajs: moved to new thread, updated to reflect issue
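For reference, the list of damaged files can be pulled out of the zpool status -v output with a small filter. This is a minimal sketch, not an Unraid or ZFS tool: the function name list_corrupt_files is hypothetical, and it assumes the status output of a single pool, where the affected paths are printed as indented lines after the "Permanent errors have been detected" message.

```shell
# list_corrupt_files (hypothetical helper): reads `zpool status -v` output
# for ONE pool on stdin and prints the entries listed after the
# "Permanent errors have been detected" message, one per line.
list_corrupt_files() {
    awk '
        # The message line itself is not a path; start collecting after it.
        /Permanent errors have been detected/ { in_list = 1; next }
        # Every non-empty line after the message is treated as an entry.
        in_list && NF {
            sub(/^[ \t]+/, "")      # strip the leading indentation
            print
        }
    '
}
```

It could be used as /usr/sbin/zpool status -v tier1-cache | list_corrupt_files. Entries that ZFS reports as object IDs rather than file paths are printed unchanged.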
trurl Posted February 8

2 minutes ago, ajs said: Please advise if I should move this to a new post if it is not related.

It is usually better to start your own thread and let the other user have theirs, unless you are asking for support on a specific docker or plugin, in which case use the existing support thread. I have split your post into its own thread.
JorgeB Posted February 8 (Solution)

The pool is corrupt. Before doing anything else I would recommend running at least a couple of memtest passes. If nothing is found, see if the pool mounts read-only:

zpool import -o readonly=on tier1-cache

If it does, start the array. The pool will still show as unmountable in the GUI, but the data will be under /mnt/tier1-cache. Then back up what you can somewhere else; note that the corrupt file will fail to copy.
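The backup step can be sketched as a small shell function that copies file by file, so that one unreadable file (such as the corrupt qcow2) is logged and skipped instead of stopping the whole copy. This is a minimal sketch under stated assumptions: the pool is already imported read-only and visible under /mnt/tier1-cache, and the function name copy_tolerant, the destination directory and the log file are all hypothetical.

```shell
# copy_tolerant SRC DST LOG  (hypothetical helper, not an Unraid tool)
# Copies every regular file under SRC into DST, keeping the relative
# directory layout. A file that fails to copy is recorded in LOG and
# skipped. Pass SRC without a trailing slash. The body runs in a
# subshell, so its variables do not leak into the calling shell.
copy_tolerant() (
    src=$1 dst=$2 log=$3
    : > "$log"                          # start with an empty failure log
    find "$src" -type f -exec sh -c '
        src=$1 dst=$2 log=$3
        shift 3
        for file do
            rel=${file#"$src"/}         # path relative to SRC
            mkdir -p "$dst/$(dirname "$rel")"
            if ! cp -p -- "$file" "$dst/$rel" 2>/dev/null; then
                rm -f -- "$dst/$rel"    # drop any partial copy
                printf "%s\n" "$rel" >> "$log"
            fi
        done
    ' sh "$src" "$dst" "$log" {} +
)
```

For example, copy_tolerant /mnt/tier1-cache /mnt/disk1/cache-rescue /tmp/failed.log, where /mnt/disk1/cache-rescue is an assumed destination on the array. Afterwards the log lists the files that have to come from backups instead.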
ajs Posted February 8 (Author)

Thank you for the reply. I gave it 4 hours of memtest (passed). I was able to follow the steps above and get access to the files. As you mentioned, the corrupt file didn't copy. Thank you, backups!

Next steps: is it safe to assume I just need to format the disks, or should I recreate the entire cache?

Second, a few weeks ago my docker.img got corrupted, which was strange, and it was on the same disks. Since memory passed, are there any other reasons I would have seen this corruption error? Neutrinos?
ajs Posted February 9 (Author)

Got the Unraid server back up and running! Between being able to pull the files off the (corrupted) cache pool and the backups (VM Backup and AppData Backups), I was able to fully recover. Now if anyone can tell me how the file got corrupted! Thanks @JorgeB!
JorgeB Posted February 9

4 hours ago, ajs said: Now if anyone can tell me how the file got corrupted!

It could still be a hardware issue: memtest is only definitive if it finds errors. It could also be a device problem. Keep monitoring; if it happens again, there is likely an underlying issue.