v6.11.5 Unable to start array after loss of cache pool [SOLVED]


AjaxMpls
Solved by JorgeB

Recommended Posts

I had a pair of Crucial MX500 SSDs in RAID1 for my cache pool and had noticed the logs filling up with IO errors. Thinking these errors were related to the previously reported BTRFS problems with the Crucial firmware, I planned to replace the cache drives with another brand.

 

In the interim, I applied the update from 6.11.3 to 6.11.5 and rebooted. Upon rebooting, the array refused to start, complaining the cache drives were unmountable. I foolishly tried removing and re-adding the drives to the pool, but assigned them to the wrong positions and lost the MBRs; the cache pool was a total loss.

 

In any case, I am now trying to rebuild with my replacement cache drive installed. I precleared it, formatted it, added it to the cache pool, and clicked Start array. The array does not start, though. The parity check begins but the array stays offline. I tried letting the parity check complete, thinking the array might start afterward, but the check just hangs at 29.1% complete. I let it sit there for about an hour with no further progress.

 

I've rebooted again and am still getting the same behavior: the parity check starts but the array does not. I am also getting a message about a stale configuration. I'm not sure where to go from here.

unraid-diagnostics-20230223-1858.zip


Last night I rebooted again and this time the parity check did finish. It still showed the array stopped but after rebooting once more it looks normal and the array is started and I was able to start the docker service.

 

I do still have the removed cache disks. I had tried mounting them with Unassigned Devices without luck. Any other suggestions to recover the data?
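For reference, a common first attempt on a removed btrfs pool member is a read-only degraded mount, falling back to `btrfs restore` if the filesystem won't mount at all. This is a hedged sketch, not a guaranteed fix; the device name and destination paths below are placeholders, not taken from this thread:

```shell
# Hedged sketch: read-only recovery attempt on a removed btrfs cache disk.
# /dev/sdX1 and the destination paths are placeholders -- substitute your own.
mkdir -p /mnt/recovery

# Try a read-only, degraded mount first (may work if one mirror is intact)
mount -o ro,degraded /dev/sdX1 /mnt/recovery

# If the mount fails, btrfs restore can often copy files off an
# unmountable filesystem without writing to the source device
btrfs restore -v /dev/sdX1 /mnt/disk1/recovered
```

Both steps leave the source disk untouched, so they are safe to try before anything more invasive.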


OK, that was really very impressive. All my docker containers appear to be in working order now, and one of my VMs is working properly. I do have a Windows 10 VM that will not boot now, though. Getting bluescreens repeatedly, so I'm guessing that one is a loss.

 

So, some lessons learned here. I'll get those sketchy cache disks replaced and keep the appdata folder backed up. What is the best practice for backing up VMs, since snapshots are not supported? I'm fine with shutting the VM down before backing up - just copy the vdisk plus the libvirt config?
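The "shut down, then copy" approach asked about above can be sketched roughly like this. The VM name and paths are assumptions for illustration; `virsh dumpxml` exports the VM definition so it can later be re-imported with `virsh define`:

```shell
#!/bin/bash
# Hedged sketch of an offline VM backup: stop the VM, save its libvirt
# definition, and copy the vdisk. VM name and paths are placeholders.
VM="Windows10"
SRC="/mnt/user/domains/$VM/vdisk1.img"
DEST="/mnt/user/backups/vms/$VM"

mkdir -p "$DEST"

virsh shutdown "$VM"                    # ask the guest to power off cleanly
virsh dumpxml "$VM" > "$DEST/$VM.xml"   # re-import later with: virsh define
cp --sparse=always "$SRC" "$DEST/vdisk1.img"  # keep the copy thin-provisioned
```

In practice you would poll `virsh domstate` until the guest reports "shut off" before copying, since `virsh shutdown` only sends the ACPI request and returns immediately.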

unraid-diagnostics-20230224-1353.zip

  • Solution
Feb 24 13:22:14 unraid kernel: BTRFS info (device loop3): bdev /dev/loop3 errs: wr 0, rd 0, flush 0, corrupt 25, gen 0

 

libvirt.img is also corrupt; with that one, if you re-create it you will need to reconfigure the VMs. Do you have a backup?

 

11 hours ago, AjaxMpls said:

since snapshots are not supported?

Snapshots are supported; it's how I back up my VMs. There's just no GUI support, but you can take them manually (or with a script), or by using the Snapshot plugin.
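As a concrete example of the manual route described above, a read-only btrfs snapshot of a VM's directory can look like the sketch below. It assumes the domains share lives on a btrfs pool and that the VM's directory is a btrfs subvolume; the VM name and paths are placeholders:

```shell
# Hedged sketch: manual read-only btrfs snapshot of a VM directory.
# Assumes /mnt/cache is btrfs and the VM directory is a subvolume.
VM="Windows10"
SRC="/mnt/cache/domains/$VM"
SNAP="/mnt/cache/domains/.snapshots/${VM}-$(date +%Y%m%d-%H%M%S)"

mkdir -p /mnt/cache/domains/.snapshots
btrfs subvolume snapshot -r "$SRC" "$SNAP"   # -r = read-only snapshot
```

If the VM directory is a plain directory rather than a subvolume, the snapshot command will fail; converting it to a subvolume first (or using the Snapshot plugin) is the usual workaround.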

  • AjaxMpls changed the title to v6.11.5 Unable to start array after loss of cache pool [SOLVED]
