extrobe Posted September 6, 2020

I have a 4x SSD cache pool on BTRFS (3x 480GB, 1x 500GB).

A few days ago, one of the drives (the 500GB) was flagged for replacement. The new one arrived, and I read through the FAQ post. The only bit where I differed was that I didn't have a spare port, so instead I did the following:

- Stopped the array
- Pulled out the faulty disk caddy
- Replaced the disk with the new one
- Selected the new disk in the pool (which is pretty much the same process I use on the main disks)

But whilst the offending disk shows as a 'new device', one of the other 3 disks is now showing as "Unmountable: No File System".

I've tried stopping the array again and removing/re-inserting the disk. I've also tried putting the old disk back in, but I don't seem to be able to progress from here.

Is this recoverable? I have partial backups, so not all is lost, but annoyingly I think my PLEX instances were on my exclude list, and they are probably my biggest 'loss'.
extrobe (Author) Posted September 6, 2020

Although, perhaps I'm just being a dunce - is the Unmountable message just reflecting that the pool as a whole can't be mounted? It's prompting me to format the 'lead' disk in the cache, but it was cache 4 that I swapped out.
extrobe (Author) Posted September 6, 2020

Ok... I did the following:

- Stopped the array
- Started the array in Maintenance Mode
- Ran:
  mkdir /x
  mount -o degraded,usebackuproot,ro /dev/sdh1 /x
- Realised I should probably not run that in Maintenance Mode
- Stopped the array
- Started the array (normal)

and the cache pool is seemingly back online. I'm not sure whether I've lost data or not, but I can't get Docker to start: "Docker Service failed to start".

EDIT: Did a restart, and it's back to being unmountable.

EDIT: Repeated the previous steps, this time copying to the array using Midnight Commander, but I'm getting a lot of copy errors (it keeps saying [stalled]), and I'm pretty sure some directories are missing. Would putting the old disk in again and using the above command be a sensible next step?

EDIT: Adding the old disk back just gives the warning 'all data will be overwritten when you start the array', so it doesn't feel like this will work.

Edited September 6, 2020 by extrobe
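For reference, the read-only rescue mount and copy described in the post above can be sketched roughly as below. This is only an illustration, not a recommended recovery procedure: it prints the commands rather than running them. `/dev/sdh1` and `/x` are the values from the post, while `print_rescue_steps` is a hypothetical helper name and `/mnt/disk1/cache_rescue` is an assumed destination on the array that would need adjusting.

```shell
#!/bin/bash
# Sketch only: prints the rescue commands instead of running them,
# so they can be reviewed before anything touches the pool.
# print_rescue_steps is a hypothetical helper name.
# /dev/sdh1 and /x come from the post; /mnt/disk1/cache_rescue is an
# assumed destination on the array and must be adjusted.

print_rescue_steps() {
    local device="${1:-/dev/sdh1}"
    local mountpoint="${2:-/x}"
    local dest="${3:-/mnt/disk1/cache_rescue}"

    # Create the temporary mount point and the destination directory.
    printf 'mkdir -p %s %s\n' "$mountpoint" "$dest"
    # degraded: allow mounting with a pool member missing
    # usebackuproot: try an older tree root if the current one is unreadable
    # ro: read-only, so the surviving devices are not modified
    printf 'mount -o degraded,usebackuproot,ro %s %s\n' "$device" "$mountpoint"
    # Copy everything off, preserving ownership, permissions and timestamps.
    printf 'cp -a %s/. %s/\n' "$mountpoint" "$dest"
}

print_rescue_steps
```

Keeping the mount read-only means the copy cannot make the state of the surviving devices any worse. It does not guarantee that every file is readable, which would be consistent with the copy errors reported above.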
JorgeB Posted September 6, 2020

2 hours ago, extrobe said:
"The only bit I was different on, was I didn't have a spare port, so instead did the following..."

The FAQ mentions using a different procedure if you don't have a spare port.

Please post the diagnostics: Tools -> Diagnostics
extrobe (Author) Posted September 6, 2020

demeter-diagnostics-20200906-1810.zip

Diagnostics attached.

I did follow that link for no spare port, but it went back to the single-disk procedure, and I wasn't sure if that was the right one to follow. As the multi-disk procedure was to just select a new disk (seemingly like for a standard disk), I thought I could just hot-swap them instead 😕

Edit: References to disks: Crucial_CT500MX200 = old cache 4, Crucial_CT500MX500 = new cache 4

Edited September 6, 2020 by extrobe
JorgeB Posted September 6, 2020

According to the syslog you're missing two devices, devid 3 and 4, and it's detecting 2 new devices:

Sep 6 17:31:02 DEMETER emhttpd: cache uuid: c8f42191-039e-41d8-894d-bdd878c15864
Sep 6 17:31:02 DEMETER emhttpd: cache TotDevices: 4
Sep 6 17:31:02 DEMETER emhttpd: cache NumDevices: 4
Sep 6 17:31:02 DEMETER emhttpd: cache NumFound: 2
Sep 6 17:31:02 DEMETER emhttpd: cache NumMissing: 1
Sep 6 17:31:02 DEMETER emhttpd: cache NumMisplaced: 0
Sep 6 17:31:02 DEMETER emhttpd: cache NumExtra: 2
Sep 6 17:31:02 DEMETER emhttpd: cache LuksState: 0
Sep 6 17:31:02 DEMETER emhttpd: shcmd (408): mount -t btrfs -o noatime,nodiratime,degraded -U c8f42191-039e-41d8-894d-bdd878c15864 /mnt/cache
Sep 6 17:31:02 DEMETER kernel: BTRFS info (device sdh1): allowing degraded mounts
Sep 6 17:31:02 DEMETER kernel: BTRFS info (device sdh1): disk space caching is enabled
Sep 6 17:31:02 DEMETER kernel: BTRFS info (device sdh1): has skinny extents
Sep 6 17:31:02 DEMETER kernel: BTRFS warning (device sdh1): devid 3 uuid 91c09af4-c319-4ef1-a89e-17b8b9080b28 is missing
Sep 6 17:31:02 DEMETER kernel: BTRFS warning (device sdh1): devid 4 uuid 20de7db3-546e-45a6-b08b-919ab79effeb is missing
Sep 6 17:31:02 DEMETER kernel: BTRFS warning (device sdh1): chunk 1009532010496 missing 2 devices, max tolerance is 1 for writeable mount

If you just replaced one, there's a problem with another one, which is not being detected as a pool member. Any idea why that would happen? Did you do anything else?
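The check made by eye in the post above (count the devids reported missing and compare that with the tolerance the kernel reports) can be sketched as a small script. This is a minimal illustration that assumes the message wording shown in the syslog excerpt. `count_missing_devids` and `max_tolerance` are hypothetical helper names, and the sample input is just the three warning lines quoted from the thread.

```shell
#!/bin/bash
# Sketch: summarise BTRFS "is missing" warnings from a syslog excerpt.
# Helper names are hypothetical; the patterns assume the exact kernel
# message wording quoted in the thread.

count_missing_devids() {
    # Reads syslog text on stdin, prints the number of distinct missing devids.
    grep -E 'BTRFS warning .*devid [0-9]+ uuid [0-9a-f-]+ is missing' \
        | sed -E 's/.*devid ([0-9]+) uuid.*/\1/' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}

max_tolerance() {
    # Reads syslog text on stdin, prints the first reported tolerance value.
    sed -nE 's/.*max tolerance is ([0-9]+) for writeable mount.*/\1/p' | head -n 1
}

# Sample input: the three warning lines from the diagnostics above.
log='Sep 6 17:31:02 DEMETER kernel: BTRFS warning (device sdh1): devid 3 uuid 91c09af4-c319-4ef1-a89e-17b8b9080b28 is missing
Sep 6 17:31:02 DEMETER kernel: BTRFS warning (device sdh1): devid 4 uuid 20de7db3-546e-45a6-b08b-919ab79effeb is missing
Sep 6 17:31:02 DEMETER kernel: BTRFS warning (device sdh1): chunk 1009532010496 missing 2 devices, max tolerance is 1 for writeable mount'

missing=$(printf '%s\n' "$log" | count_missing_devids)
tolerance=$(printf '%s\n' "$log" | max_tolerance)

if [ "$missing" -gt "$tolerance" ]; then
    echo "missing=$missing tolerance=$tolerance: writeable mount not possible"
else
    echo "missing=$missing tolerance=$tolerance: degraded writeable mount may be possible"
fi
```

On a live system the same functions could be fed from the syslog file instead of the sample string, for example `count_missing_devids < /var/log/syslog`, assuming that is where the log lives.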
extrobe (Author) Posted September 6, 2020

I did swap 2 of the disks around (the 2x Samsungs). That was because I thought the 'unmountable filesystem' message was specific to that disk, so I swapped the bays over to check it wasn't a connection issue. But I checked that the assignments still matched before moving on.
JorgeB Posted September 6, 2020

If by swapping you mean just changing slots, that wouldn't be a problem. Something else must have happened, or the pool was already missing a device. You can try the old cache device if you still have it untouched; if not, you can't recover the pool with 2 missing devices.
extrobe (Author) Posted September 6, 2020

Yes, just changing slots.

I'm trying to get it to read off the original disk (the disk itself should still have been physically ok, it was just approaching EoL), but I'm struggling to get it included in the pool. I'm working my way through some of the BTRFS troubleshooting steps, but it's starting to look like a lost cause.

EDIT: Looks like it's the other Crucial disk which is not showing up, but there was nothing to suggest it was an issue beforehand. In fact, I checked the SMART data before I started the replacement, as I wanted to see how much life that one had left. When I try to mount it, it says the special device doesn't exist. Are there any diagnostics I can run on this to work out why that might be, or to confirm it's damaged?

It looks like the data on the original disk has also already gone: when I try to mount it, it says wrong FS type.

Edited September 6, 2020 by extrobe