AjaxMpls Posted February 24, 2023

I had a pair of Crucial MX500 SSDs in RAID1 for my cache pool and had noticed the logs filling up with IO errors. Thinking these errors were related to the previously reported BTRFS problems with the Crucial firmware, I planned to replace the cache drives with another brand. In the interim, I applied the update from 6.11.3 to 6.11.5 and rebooted. After the reboot, the array refused to start, complaining that the cache drives were unmountable. I foolishly tried removing and re-adding the drives to the pool, but assigned them to the wrong positions and lost the MBRs - the cache pool was a total loss.

In any case, I am now trying to rebuild with my replacement cache drive installed. I precleared it, formatted it, added it to the cache pool, and clicked Start Array. The array does not start, though: the parity check begins, but the array stays offline. I let the parity check run, thinking the array might start once it finished, but it hangs at 29.1% complete. I left it there for about an hour with no further progress. I've rebooted again and am still seeing the same behavior - the parity check starts but the array does not. I am also getting a message about a stale configuration. I'm not sure where to go from here.

unraid-diagnostics-20230223-1858.zip
JorgeB Posted February 24, 2023

If a parity check started, the array is started. Are you using Firefox? If so, reboot first and then use a different browser.

P.S. If there was important data there, the old cache might still be recoverable, provided the drives were only wiped by Unraid and you still have them untouched.
AjaxMpls Posted February 24, 2023

Last night I rebooted again, and this time the parity check did finish. It still showed the array stopped, but after rebooting once more everything looks normal: the array is started and I was able to start the Docker service.

I do still have the removed cache disks. I had tried mounting them with Unassigned Devices without luck. Any other suggestions to recover the data?
JorgeB Posted February 24, 2023

Post new diags with both cache disks connected, and identify which devices they are.
AjaxMpls Posted February 24, 2023

Thanks, attached. It's sdh & sdg.

unraid-diagnostics-20230224-0948.zip
JorgeB Posted February 24, 2023

Post the output of:

btrfs-select-super -s 1 /dev/sdg

and

btrfs-select-super -s 1 /dev/sdh
AjaxMpls Posted February 24, 2023

root@unraid:~# btrfs-select-super -s 1 /dev/sdg
No valid Btrfs found on /dev/sdg
ERROR: open ctree failed
root@unraid:~# btrfs-select-super -s 1 /dev/sdh
No valid Btrfs found on /dev/sdh
ERROR: open ctree failed
JorgeB Posted February 24, 2023

Oops, sorry, my bad - it should be:

btrfs-select-super -s 1 /dev/sdg1

and

btrfs-select-super -s 1 /dev/sdh1
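For context: btrfs-select-super overwrites the damaged primary superblock with the backup copy selected by -s; copy 1 sits 64MiB into the filesystem (byte offset 67108864), which is why the command has to target the partitions (sdg1/sdh1) rather than the whole disks. If you want a read-only sanity check before writing anything, something along these lines dumps that backup copy without modifying the device (exact options may vary with the btrfs-progs version):

btrfs inspect-internal dump-super -s 1 /dev/sdg1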
AjaxMpls Posted February 24, 2023

root@unraid:~# btrfs-select-super -s 1 /dev/sdg1
warning, device 5 is missing
using SB copy 1, bytenr 67108864
root@unraid:~# btrfs-select-super -s 1 /dev/sdh1
using SB copy 1, bytenr 67108864
root@unraid:~#
JorgeB Posted February 24, 2023

Now assign both devices to a pool. It must be a new pool, or a pool that was reset, i.e. it can't show a "data on this device will be deleted" warning for any of the pool devices. Then start the array and the old pool should import; if it doesn't, post new diags.
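Once the array is up, a quick way to confirm the pool imported with both members (assuming the pool is named cache and is mounted at /mnt/cache - adjust the path if yours differs):

btrfs filesystem show /mnt/cache
btrfs filesystem usage /mnt/cache

Both devices should be listed, with neither reported as missing.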
AjaxMpls Posted February 24, 2023

Nice! Yes, it did import as a new pool and I can see my shares again. I tried toggling the Docker service off and on, but it's still not starting up. Will I need to reboot again?
JorgeB Posted February 24, 2023

A reboot should not be needed; please post diags.
AjaxMpls Posted February 24, 2023

Posted. I'm having a hard time determining from the logs which drive is throwing the IO errors, as both are referenced.

unraid-diagnostics-20230224-1105.zip
JorgeB Posted February 24, 2023

One of the pool devices is out of sync; it likely dropped offline in the past. Run a correcting scrub and post the results.
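If you want to run it from the command line instead of the GUI, a correcting scrub is roughly (assuming the pool is mounted at /mnt/cache):

btrfs scrub start /mnt/cache
btrfs scrub status /mnt/cache

On a read-write mount, scrub repairs correctable errors by default. To see which of the two devices is actually accumulating errors, btrfs device stats /mnt/cache reports per-device counters.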
AjaxMpls Posted February 24, 2023

Scrub complete.

unraid-diagnostics-20230224-1133.zip
JorgeB Posted February 24, 2023

And the output of the scrub?
AjaxMpls Posted February 24, 2023

UUID:            40f53594-6f8d-42e0-afaa-a94807c15c5c
Scrub started:   Fri Feb 24 11:14:51 2023
Status:          finished
Duration:        0:17:43
Total to scrub:  778.31GiB
Rate:            749.75MiB/s
Error summary:   verify=13827 csum=1531847
  Corrected:      1545674
  Uncorrectable:  0
  Unverified:     0
JorgeB Posted February 24, 2023

OK, now reboot to clear the syslog and post new diags after the array starts (and after starting Docker, if it's not enabled).
AjaxMpls Posted February 24, 2023

Still getting a "Docker service failed to start" message after reboot.

unraid-diagnostics-20230224-1202.zip
JorgeB Posted February 24, 2023

Recreate the docker image.
AjaxMpls Posted February 24, 2023

Does this mean I need to configure all my containers anew after they are recreated?
JorgeB Posted February 24, 2023

Nope, see the link.
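(Roughly, from memory - the linked FAQ entry is the authoritative version: stop the Docker service under Settings → Docker, delete docker.img from that same page, re-enable the service, then use Apps → Previous Apps to reinstall the containers. The templates keep all of your port/path/variable settings, and the configuration inside appdata is untouched, so nothing needs to be set up from scratch.)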
AjaxMpls Posted February 24, 2023

OK, that was really very impressive. All my docker containers appear to be in working order now, and one of my VMs is working properly. I do have a Windows 10 VM that will not boot now, though - it bluescreens repeatedly, so I'm guessing that one is a loss.

So, some lessons learned on this one. I'll get those sketchy cache disks replaced and keep the appdata folder backed up. What is the best practice for backing up VMs, since snapshots are not supported? I'm fine with shutting the VM down before backing up - just copy the vdisk + libvirt?

unraid-diagnostics-20230224-1353.zip
JorgeB Posted February 25, 2023 (Solution)

Feb 24 13:22:14 unraid kernel: BTRFS info (device loop3): bdev /dev/loop3 errs: wr 0, rd 0, flush 0, corrupt 25, gen 0

libvirt.img is also corrupt, but with that one, if you re-create it you will need to reconfigure the VMs - do you have a backup?

11 hours ago, AjaxMpls said: "since snapshots are not supported?"

Snapshots are supported - it's how I back up my VMs. There's just no GUI support, but you can take them manually (or with a script) or by using the Snapshot plugin.
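A minimal manual sketch of that, assuming the VM vdisks live under /mnt/cache/domains and that domains is (or has been converted to) a btrfs subvolume rather than a plain directory:

# read-only snapshot of the domains subvolume, named with today's date
btrfs subvolume snapshot -r /mnt/cache/domains /mnt/cache/domains_snap_$(date +%Y%m%d)

Taken with the VM shut down it's a clean copy; taken live it's crash-consistent. A read-only snapshot like this can also be shipped to another btrfs pool with btrfs send / btrfs receive.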
AjaxMpls Posted February 26, 2023

Thanks for your help, Jorge. I did not have a backup, so I recreated that VM and have now configured backups using the https://github.com/danioj/unraid-autovmbackup script.