6.12.6 Stuck on "Array Starting - Mounting disks..." and Corruption on Cache

ajs · February 8

Hello, I woke up to a downed Unraid server.

Problem
- I logged into Unraid to see all CPUs at 100%
- Qemu was the process that seemed to be having the problem
- Shutting down via GUI and CLI did not work, so I performed a hard reboot
- On restart, Unraid is stuck on "Array Starting - Mounting Disks" with half of the CPUs at 100%, though top does not show any process consuming those cycles
- I pulled a diagnostic file (attached)
- Noticed in syslog: "status: One or more devices has experienced an error resulting in data corruption. Applications may be affected."
- Ran /usr/sbin/zpool status -PLv tier1-cache (adding -v), which identified a single corrupted file on one of the NVMe drives in the cache pool
  - The file is a qcow2 file, my Home Assistant vm.
  - Note: I had received an error from Home Assistant regarding a file being corrupted, but assumed it was benign
- Currently I can boot into safe mode and mount the array in maintenance mode

Other Info
- Luckily, much of the data from this cache is backed up via VM and appdata backups; however, I'm unsure how to access those files and move them over
- I ran short SMART tests on both NVMe drives and they came back with no errors
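For anyone repeating the zpool status -v check above, here is a minimal sketch of pulling just the affected paths out of that output. It assumes the usual "errors: Permanent errors have been detected in the following files:" section; the function name list_corrupt_files is made up for illustration, and entries for damaged metadata may appear as dataset:<0x...> rather than as file paths.

```shell
#!/bin/sh
# Print the entries listed under "Permanent errors" in `zpool status -v` output.
# Reads the status text on stdin, so it also works on saved output.
# list_corrupt_files is a hypothetical helper name, not part of ZFS or Unraid.
list_corrupt_files() {
    awk '
        # A new pool section ends any error list we were reading.
        /^ *pool:/ { in_list = 0 }
        # The header line that precedes the list of damaged files.
        /^errors: Permanent errors have been detected/ { in_list = 1; next }
        # Every non-blank line after the header is one entry; trim the indent.
        in_list && NF > 0 { sub(/^[ \t]+/, ""); print }
    '
}
```

On the server this would be used as: zpool status -v tier1-cache | list_corrupt_files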

mothership-diagnostics-20240207-2306.zip

Edited February 8 by ajs
Moved to new thread, updated to reflect issue

trurl · February 8

2 minutes ago, ajs said:

Please advise if I should move this to a new post if it is not related.

Usually better to start your own thread and let the other user have their thread. Unless asking for support on specific dockers or plugins, then use the already created support thread.

I have split your post into its own thread.

JorgeB · February 8

The pool is corrupt. Before doing anything else, I would recommend running at least a couple of memtest passes. If nothing is found, see if the pool mounts read-only:

zpool import -o readonly=on tier1-cache

If it does, start the array. The pool will still show as unmountable in the GUI, but the data will be under /mnt/tier1-cache. Then back up what you can somewhere else; note that the corrupt file will fail to copy.
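A rough sketch of the "back up what you can" step, assuming the pool is mounted read-only at /mnt/tier1-cache. The function name copy_what_you_can, the log file, and the destination path are all hypothetical. It copies file by file so that a file that cannot be read is logged and skipped instead of stopping the whole backup. The destination must be outside the source tree, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Copy every regular file under SRC into DEST, keeping the directory layout.
# A file that fails to copy (for example, one with unrecoverable blocks) is
# recorded in FAILED_LOG and skipped.
# copy_what_you_can is a hypothetical helper name used only for this sketch.
copy_what_you_can() {
    src=$1 dest=$2 failed_log=$3
    : > "$failed_log"                      # start with an empty failure log
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        mkdir -p "$dest/$(dirname "$rel")"
        if ! cp -p "$src/$rel" "$dest/$rel" 2>/dev/null; then
            printf '%s\n' "$rel" >> "$failed_log"
            rm -f "$dest/$rel"             # drop any partial copy
        fi
    done
}
```

Example call, with a made-up destination on an array disk: copy_what_you_can /mnt/tier1-cache /mnt/disk1/cache-rescue /tmp/failed.log. Anything listed in /tmp/failed.log afterwards has to come from backups.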

ajs · February 8

Thank you for the reply. I gave it 4 hrs of memtest (passed).

I was able to follow the above and get access to the files. As you mentioned, the corrupt file didn't copy. Thank you, backups!

Next steps: is it safe to assume I just need to format the disks, or should I recreate the entire cache?

Second, a few weeks ago my docker.img got corrupted, which was strange, and it was on the same disks. Since memory passed, are there any other reasons I would have seen this corruption error? Neutrinos?

ajs · February 9

Got the Unraid server back up and running!

Between being able to pull the files off the (corrupted) cache pool and the backups (VM Backup and AppData Backups), I was able to fully recover.

Now if anyone can tell me how the file got corrupted!

Thanks @JorgeB!

JorgeB · February 9

4 hours ago, ajs said:

Now if anyone can tell me how the file got corrupted!

It could still be a hardware issue; memtest is only definitive if it finds errors. It could also be a device problem. Keep monitoring; if it happens again, there is likely an underlying issue.
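One possible way to do the "keep monitoring" part, as a sketch only: zpool status -x prints "all pools are healthy" (or "pool '<name>' is healthy" when given a pool name) if nothing is wrong, so a small check can key off that message. The function name pools_healthy is invented here, and how often to scrub is left to the reader.

```shell
#!/bin/sh
# Succeeds when `zpool status -x` output (read from stdin) reports no problems.
# pools_healthy is a hypothetical helper name used only for this sketch.
pools_healthy() {
    grep -Eq "^(all pools are healthy|pool '.*' is healthy)$"
}

# Possible periodic check on the server (pool name from this thread):
#   zpool scrub tier1-cache     # starts a scrub in the background
#   zpool status -x | pools_healthy || zpool status -v tier1-cache
```

A scrub rereads all data in the pool and verifies checksums, so new corruption shows up in zpool status without waiting for an application to hit the bad file.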
