
6.12.6 Stuck on "Array Starting - Mounting disks..." and Corruption on Cache


Solved by JorgeB

Hello, I woke up to a downed Unraid server.

 

  • Problem
    • I logged into Unraid to see all CPUs at 100%
    • Qemu was the process that seemed to be having the problem
    • Shutting down via GUI and CLI did not work, hard reboot was performed
    • On restart, Unraid is stuck on "Array Starting - Mounting Disks" with half the CPUs at 100%, though the top command does not identify any processes eating up those cycles
    • I pulled a diagnostic file (attached)
    • Noticed in syslog: "status: One or more devices has experienced an error resulting in data corruption.  Applications may be affected."
    • Executed command /usr/sbin/zpool status -PLv tier1-cache (adding -v), which identified a single corrupted file on one of the NVMe drives in the cache pool
      • The file is a qcow2 file, my Home Assistant VM.
      • Note: I had received an error from Home Assistant regarding a file being corrupted, but assumed it was benign
    • Currently I can boot into safe mode and mount the array in maintenance mode
  • Other Info
    • Luckily, I have much of the data from this cache backed up via VM and appdata backups; however, I am unsure how to proceed with accessing those files and moving them over
    • I ran short SMART scans on both NVMe drives and they came back with no errors
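
For anyone following along, here is a minimal sketch of pulling just the damaged file paths out of the `zpool status -v` output. The function name `list_corrupt_files` is hypothetical, and it assumes the usual OpenZFS layout where the paths are listed, indented, after the "errors: Permanent errors have been detected" line.

```shell
# Hypothetical helper (not part of ZFS or Unraid): reads `zpool status -v`
# output on stdin and prints only the file paths listed under the
# "Permanent errors" heading. Prints nothing if no such heading is present.
list_corrupt_files() {
    awk '
        # Start collecting once the permanent-errors heading is seen.
        /^errors: Permanent errors have been detected/ { in_list = 1; next }
        # Every non-blank line after the heading is treated as a path.
        in_list && NF { sub(/^[ \t]+/, ""); print }
    '
}

# Example usage (pool name taken from this thread):
#   /usr/sbin/zpool status -PLv tier1-cache | list_corrupt_files
```

This only filters text that `zpool status` already prints; it does not touch the pool.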

mothership-diagnostics-20240207-2306.zip

Edited by ajs
Moved to new thread, updated to reflect issue
2 minutes ago, ajs said:

Please advise if I should move this to a new post if it is not related.

It's usually better to start your own thread and let the other user have theirs, unless you are asking for support on a specific docker or plugin, in which case use the existing support thread.

 

I have split your post into its own thread.

  • ajs changed the title to 6.12.6 Stuck on "Array Starting - Mounting disks..." and Corruption on Cache
  • Solution

The pool is corrupt. Before doing anything else, I would recommend running at least a couple of memtest passes; if nothing is found, see if the pool mounts read-only:

 

zpool import -o readonly=on tier1-cache

 

If it does, start the array. The pool will still show as unmountable in the GUI, but the data will be under /mnt/tier1-cache. Then back up what you can somewhere else; note that the corrupt file will fail to copy.
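
Since one bad file can abort a plain recursive copy, a per-file loop that logs failures and keeps going may help here. The following is only a rough sketch: the function name `copy_what_you_can`, the destination path, and the log path are all assumptions for illustration, not anything Unraid provides.

```shell
# Hypothetical helper: copy every regular file under SRC into DST, keeping
# the directory layout. Files that fail to copy (for example the corrupt
# qcow2) are recorded in LOG instead of stopping the run.
# Limitation: file names containing newlines are not handled.
copy_what_you_can() {
    src=$1
    dst=$2
    log=$3
    : > "$log"                                  # start with an empty log
    find "$src" -type f | while IFS= read -r f; do
        rel=${f#"$src"/}                        # path relative to SRC
        mkdir -p "$dst/$(dirname "$rel")"
        # -p preserves timestamps and permissions where possible
        cp -p -- "$f" "$dst/$rel" 2>/dev/null || printf '%s\n' "$rel" >> "$log"
    done
}

# Example usage (destination and log paths are assumptions):
#   copy_what_you_can /mnt/tier1-cache /mnt/disk1/cache-rescue /tmp/failed-files.log
#   cat /tmp/failed-files.log    # lists the files that could not be read
```

Anything listed in the log would then need to come from a backup instead.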


Thank you for the reply. I gave it 4 hrs of memtest (passed).

I was able to follow the above and get access to the files. As you mentioned, the corrupt file didn't copy. Thank you, backups!

 

Next steps: is it safe to assume I just need to format the disks, or should I recreate the entire cache?

 

Second, a few weeks ago my docker.img got corrupted, which was strange, and it was on the same disks. Since the memory test passed, are there any other reasons I would have seen this corruption? Neutrinos? :P


Got the Unraid server back up and running!

 

Between being able to pull the files off the (corrupted) cache pool and the backups (VM Backup and AppData Backups), I was able to fully recover.

 

Now if anyone can tell me how the file got corrupted!

 

Thanks @JorgeB!

