August 30, 2024
My cache pool is two 2 TB NVMe drives mirrored with BTRFS. About a week ago, I woke up to find the cache pool offline due to errors. This had happened once before, about nine months ago, and this time I was able to mostly recover. At the time I suspected I might have run out of memory (I now think that's wrong). I've been watching the logs for the last week, and all seemed fine until today, when I noticed a bunch of errors again, although the drive hadn't gone offline yet. I figured I'd stop the array, remove the problem drive, and continue on the remaining drive. After rebooting, the cache reports no file system, so now I'm not sure what to do. I think the data should still be there and that removing the other drive is what confused things. Can anyone help me get the remaining drive back online?
Attachment: unraid-diagnostics-20240830-1248 2.zip
Edited August 30, 2024 by WashingtonMatt (typo)
August 30, 2024 (Author)
I was able to use this post to mount my drive read-only, and I'm currently copying my data to the array; hopefully it isn't corrupted. Both drives are currently in Unassigned Devices. Oddly, the "bad" drive shows some reads while I copy the data, and the system log is filling up with errors. I could use some help determining what actually went wrong. The whole point of mirroring the drives was to survive a drive failure.
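For reference, a read-only recovery like the one described above might be sketched as follows. This is a minimal sketch, assuming a degraded two-device BTRFS mirror where the `degraded` and `ro` mount options are enough to get the surviving member mounted; the device path, mount point and destination are hypothetical placeholders, and the `run` and `recover_ro` helpers are illustrative, not taken from the linked post. If the file system trees are damaged, some recovery guides also add `usebackuproot` (spelled `rescue=usebackuproot` on newer kernels); check the BTRFS documentation for your kernel before relying on it.

```shell
#!/bin/sh
# Sketch: mount one surviving member of a BTRFS mirror read-only and
# degraded, then copy everything off it. All paths are placeholders.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

recover_ro() {
  dev=$1   # surviving pool member, e.g. /dev/nvme1n1p1 (hypothetical)
  mnt=$2   # temporary mount point, e.g. /temp (hypothetical)
  dest=$3  # destination on the array (hypothetical)
  run mkdir -p "$mnt" "$dest" &&
    run mount -o ro,degraded "$dev" "$mnt" &&
    run rsync -a "$mnt"/ "$dest"/
}
```

Running `DRY_RUN=1 recover_ro /dev/nvme1n1p1 /temp /mnt/disk1/cache_backup` first shows exactly what would be executed, which is worth doing before touching a pool that is already in trouble.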
August 31, 2024 (Community Expert)
The syslog doesn't show the start of the issue, but it looks like one of the NVMe devices dropped offline. Once the backup is done, power cycle the server and post new diagnostics after the array starts.
September 8, 2024 (Author)
I'm finally getting an opportunity to get back to this; these things never happen at a convenient time. To get working again, I just recreated the cache with a single drive, and things seem to have been running fine with no errors. I would still like to run the other drive in its own pool, but it seems to be locked up by Unraid, and I'm not sure of the proper way to proceed. The attached diagnostics were taken just after a cold boot and an attempt to mount the drive via Unassigned Devices.
Attachment: unraid-diagnostics-20240908-1349.zip
September 8, 2024 (Community Expert)
If you want to use it in a different pool, wipe the second device with:
blkdiscard -f /dev/nvmeXn1
Then add it to a new pool and format it.
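Because `blkdiscard` irreversibly discards every block on the device, a cautious wrapper might check the mount table before wiping. This is a hypothetical helper, assuming a mounted device appears in `/proc/mounts` under its `/dev/nvme...` path; encrypted or device-mapper-backed pools would not be caught by this check, and `MOUNTS` and `DRY_RUN` are illustrative variables, not Unraid settings.

```shell
#!/bin/sh
# Sketch: refuse to wipe a device if it (or one of its partitions)
# appears in the mount table. MOUNTS defaults to /proc/mounts and can be
# pointed at another file; DRY_RUN=1 prints the command instead of wiping.

safe_blkdiscard() {
  dev=$1
  mounts=${MOUNTS:-/proc/mounts}
  # Prefix match, so /dev/nvme1n1 also catches /dev/nvme1n1p1. It would
  # also match e.g. /dev/nvme1n10, which errs on the side of refusing.
  if grep -q "^$dev" "$mounts"; then
    echo "refusing: $dev (or a partition of it) is mounted" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "blkdiscard -f $dev"
  else
    blkdiscard -f "$dev"
  fi
}
```

Double-check the device name against the pool assignment in the GUI before running it for real; the guard only protects against wiping something that is currently mounted.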
September 8, 2024 (Author)
Yes, that worked, thank you. I'm still unclear about what caused the original corruption. I'm not convinced it's a hardware issue: both times this occurred, I think I was pushing the limits of available system memory with many VMs and Docker containers running, and when the overnight backup tasks started, the cache pool corrupted and went read-only. Does that seem like a possible cause? What's a good way to test this cache drive?
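Some commonly used checks, offered as suggestions rather than as the thread's answer: `btrfs scrub start -B /mnt/cache` re-reads all data and verifies checksums in the foreground, `btrfs device stats /mnt/cache` shows the per-device error counters BTRFS keeps, and `smartctl -a /dev/nvme0` prints the drive's own health log (the mount point and device names here are assumptions). The sketch below filters `btrfs device stats` output down to the non-zero counters; the `nonzero_btrfs_errors` helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: read `btrfs device stats` output on stdin, print every counter
# that is not zero, and return 1 if at least one was found (0 otherwise).
# Input lines are expected to look like:
#   [/dev/nvme0n1p1].write_io_errs    0
nonzero_btrfs_errors() {
  awk 'NF == 2 && $2 != 0 { print; found = 1 }
       END { exit (found ? 1 : 0) }'
}
```

Usage: `btrfs device stats /mnt/cache | nonzero_btrfs_errors`. The counters persist across reboots until reset with `btrfs device stats -z`, so record the values before clearing them; a counter that keeps climbing after a reset is more telling than an old total.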
September 9, 2024 (Community Expert)
NVMe devices dropping offline is usually not a problem with the device itself. If it happens again, post new diagnostics before rebooting.
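Before rebooting after another dropout, one quick way to see whether the kernel logged the device disappearing is to search the syslog. This is a sketch: the patterns match two messages the Linux `nvme` driver commonly prints when a controller stops responding or is removed, but other wording is possible, and `/var/log/syslog` as the default path is an assumption about the setup. The `nvme_dropouts` helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: print syslog lines that typically accompany an NVMe device
# dropping off the bus. Takes an optional log path; defaults to
# /var/log/syslog. grep returns 1 if nothing matched.
nvme_dropouts() {
  log=${1:-/var/log/syslog}
  grep -E 'nvme[0-9]+: (controller is down; will reset|Removing after probe failure)' "$log"
}
```

An empty result does not rule out a dropout, since the earliest messages may already have rotated out of the log; that is one more reason to grab the full diagnostics before the reboot clears them.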