
Hard Crash... disk error... looking for advice before proceeding


goinsnoopin


Unraid 6.6.7
Here is what I am experiencing:


VMs kept pausing themselves (showing only a "resume" option)


Attempted to shut down the VMs by hitting resume and then quickly hitting stop. One of the two VMs shut down, but when I did the same for the second VM, Unraid crashed: no webGUI, no SSH/Telnet access. The server is headless, so I don't have a keyboard or monitor attached (I will in the future), which means I was unable to capture diagnostics for this original crash.

That said, I know my cache drive was over-utilized; I had this happen once before (VMs going to resume) when a Docker log filled the cache drive. I also know a couple of the array disks had very little free space, and I understand that can sometimes be an issue.
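
(For what it's worth, cache utilization can be checked from the console with something like the following; a minimal sketch, assuming the btrfs cache pool is mounted at the standard /mnt/cache:

    # show btrfs allocation vs. free space (plain df can mislead on btrfs)
    btrfs filesystem usage /mnt/cache
    # quick overall view of the mount
    df -h /mnt/cache
)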


Due to the hard crash, I power cycled the server. Unraid came back up and a parity check started. The two VMs that were running during the crash were no longer listed in my VM tab (the two VMs that were not running at the time were the only ones listed). A handful of Dockers were running; the rest would not start.
I decided to reboot Unraid via the webGUI to see if they would return. It came back exactly the same way: only two VMs and a handful of Dockers. Also note that it now says the parity check was canceled, so I probably should not have rebooted while the parity check was still in progress.


Deleted docker.img and recreated it, then restored 10 or so of my key Dockers from templates. Everything worked with no issues. Went to bed.
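
For the record, the console equivalent of deleting the image would be something like this, assuming the default image location (yours may differ):

    # confirm the image path and size first
    ls -lh /mnt/user/system/docker/docker.img
    # delete it; Unraid recreates a fresh image when the Docker service is re-enabled
    rm /mnt/user/system/docker/docker.img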


Got up this morning, read some forums, and decided to run a btrfs balance on the cache drive. While that was running, I looked into the VM issue and realized I had an older libvirt.img in another location, so I switched to that image. Went to the VM tab and saw a bunch of older VMs from, let's say, a year ago. All VMs were stopped.
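
The balance I kicked off was along these lines; a sketch assuming the /mnt/cache mount point, with the -dusage threshold just an example value:

    # rewrite data chunks that are under 75% full to reclaim allocated space
    btrfs balance start -dusage=75 /mnt/cache
    # check progress from another shell
    btrfs balance status /mnt/cache
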
I then went to my VM XML backups, copied the XML for my main Windows 10 VM, switched to XML mode, pasted the backup XML, unchecked "start on creation", and created the new VM.
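
For what it's worth, the console equivalent of pasting the XML would be defining it with libvirt directly; the backup path here is just a hypothetical example:

    # register the VM with libvirt without starting it
    virsh define /boot/backup/windows10.xml
    # confirm it is listed and shut off
    virsh list --all
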
I immediately got a disk 3 read error, followed by a notification that disk 3 was disabled.
Went back to the btrfs balance, saw it was still running, and canceled it. I then stopped the array, went to Settings to turn off array auto-start, rebooted the server, and started the array in maintenance mode.
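
Canceling a running balance from the console looks like this, again assuming the /mnt/cache mount:

    # stop the in-progress balance; it waits for the current chunk to finish
    btrfs balance cancel /mnt/cache
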
After rebooting into maintenance mode, disk 3 shows up and has a SMART report, and looking at the SMART attributes I don't see any issues. I am currently running an extended SMART test on the disk and am waiting for the results.
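
For completeness, the extended test I'm running is equivalent to the following, with /dev/sdX as a placeholder for disk 3's device:

    # start the long/extended self-test (runs in the background on the drive)
    smartctl -t long /dev/sdX
    # afterwards, review the attributes and self-test log
    smartctl -a /dev/sdX
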
I know this is a lot... any advice on how to proceed?
 
