Dschijn Posted January 24, 2021

I switched to a rackmount 19" case with hot-swap trays and built my unRAID system into the new case. Only the HDDs went into the trays in the front; the SSDs (cache pool of 2 SSDs in RAID 1) were connected directly to the motherboard SATA ports. While trying to improve airflow in the case, I moved the cache SSDs into the front trays as well. Something must have gone wrong there: I got an error that the first cache drive was not found (the front LEDs on the case showed the SSD). I hot-swapped the SSD and it was found at some point. Starting the array behaved strangely, and it seemed like the SSD was dropped by unRAID. I set the first cache slot to "no device" and started with only the 2nd drive. That worked somehow, and I forced the system into a shutdown to troubleshoot the first SSD. I think this was a bad idea, because the shutdown had to be forced since something was still busy, but I had no clue what.

Now I have no way of getting anything on the cache to work again. The logs are missing the first cache disk in the BTRFS startup, the first disk is empty and not formatted, and the 2nd disk seems to be fine, but I don't know... I can't mount the 2nd cache disk. Is there any way to revive my cache without losing data?

I formatted the first cache drive (sdb) since it wouldn't mount otherwise. But the 2nd (sdc) is still "stuck" with this error:

Quote
Jan 24 23:38:42 unRAID kernel: BTRFS warning (device sdc1): devid 1 uuid 63723161-457c-4ed3-9cb3-d6455c944c64 is missing
Jan 24 23:38:42 unRAID kernel: BTRFS error (device sdc1): failed to read chunk root
Jan 24 23:38:42 unRAID root: mount: /mnt/cache: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
Jan 24 23:38:42 unRAID kernel: BTRFS error (device sdc1): open_ctree failed

Quote
root@unRAID:~# lsblk -f
NAME     FSTYPE   LABEL  UUID                                 FSAVAIL FSUSE% MOUNTPOINT
loop0    btrfs           f746b3f6-bc6e-4631-85f9-1ad9ae8b7e06   19.5G     0% /var/lib/docker
loop1    squashfs                                                    0   100% /lib/firmware
loop2    btrfs           c0a3af37-b31d-43d9-8d95-222f0053ba72  904.5M     2% /etc/libvirt
sda
└─sda1   vfat     UNRAID 2732-64F5                              28.1G     2% /boot
sdb
└─sdb1   btrfs           47b645fe-3b6b-4401-b050-9b9ddb8c3d70  446.1G     0% /mnt/cache
sdc
└─sdc1

unraid-diagnostics-20210124-2347.zip
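A degraded two-device btrfs pool will sometimes still mount read-only when a member is missing, and `btrfs restore` can copy files out without mounting or writing to the device. As a hedged sketch (the device name `/dev/sdc1` is from this thread, the mount point `/mnt/recovery` and rescue target are assumptions), the usual read-only escalation order can be expressed as a dry-run helper that only prints the commands to try, so nothing is touched until you run them yourself:

```shell
# Print (not execute) the usual read-only recovery attempts for a
# btrfs device that refuses to mount. Arguments: device, rescue target.
print_recovery_plan() {
    dev="$1"
    target="$2"
    # 1. Read-only degraded mount: works when only the data that lived
    #    on the missing pool member is unreachable.
    echo "mount -o ro,degraded $dev /mnt/recovery"
    # 2. Fall back to an older copy of the tree roots if the current
    #    one is damaged.
    echo "mount -o ro,usebackuproot $dev /mnt/recovery"
    # 3. Last resort: btrfs restore copies files out without mounting.
    #    -D is a dry run that only lists what would be restored.
    echo "btrfs restore -D $dev $target"
}

print_recovery_plan /dev/sdc1 /mnt/disk1/rescue
```

None of these steps write to the failing device, so trying them in this order does not make things worse.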
Herbiewalker Posted January 25, 2021

I guess you haven't had a chance, or aren't able, to do a backup of the cache to save that data and start over on your cache drives? I'm not sure how to fix that one drive, but if you could pull it off the server and use a 3rd-party program to save the data, you might be more comfortable starting over on your cache. I messed up my cache drive pool once, and the only fix that ended up working for me was to start over. Fortunately it was the day after my backup, so I ended up OK.
Dschijn Posted January 25, 2021

I assumed that my mirrored cache was good enough. I only used the Appdata Backup plugin, so I hope that at least the Docker containers are backed up. From what I can read in the log files, the 2nd SSD is not mounted because BTRFS is not "happy". So I think all the data is still on the 2nd SSD; I just need to be able to access it, or make it my main cache again.
JorgeB Posted January 25, 2021

Diags show that you formatted sdb:

Jan 24 23:41:08 unRAID emhttpd: shcmd (398): /sbin/wipefs -a /dev/sdb1
Jan 24 23:41:08 unRAID emhttpd: shcmd (399): mkfs.btrfs -f /dev/sdb1

Then added sdc to the pool, which also wiped that device:

Jan 24 23:41:08 unRAID emhttpd: shcmd (401): /sbin/wipefs -a /dev/sdc1
Jan 24 23:41:08 unRAID root: /dev/sdc1: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d

So there's no device still with old pool data to recover from.
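The 8 bytes erased at offset 0x00010040 are the btrfs superblock magic: the hex bytes 5f 42 48 52 66 53 5f 4d are ASCII "_BHRfS_M", which sits 0x40 bytes into the primary superblock at 0x10000. That is how `wipefs -a` invalidates the filesystem: it zeroes just the signature, which is why the device immediately stops being recognized as btrfs. A small sketch on a scratch file (plain coreutils, no real disk involved) showing the signature being written at that offset and then zeroed the way wipefs does:

```shell
# Demonstrate what "8 bytes were erased at offset 0x00010040 (btrfs)"
# means, using a scratch file instead of a real device.
img=$(mktemp)
off=65600   # = 0x10040: superblock at 0x10000 + magic at offset 0x40

# Write the btrfs magic "_BHRfS_M" at that absolute offset.
printf '_BHRfS_M' | dd of="$img" bs=1 seek="$off" conv=notrunc 2>/dev/null

# Read the signature back, the way blkid/wipefs would find it.
magic=$(dd if="$img" bs=1 skip="$off" count=8 2>/dev/null)
echo "before wipe: $magic"

# wipefs -a does essentially this: zero only those 8 bytes.
dd if=/dev/zero of="$img" bs=1 seek="$off" count=8 conv=notrunc 2>/dev/null
left=$(dd if="$img" bs=1 skip="$off" count=8 2>/dev/null | tr -d '\0')
echo "after wipe: '$left'"

rm -f "$img"
```

Note that only the signature is gone; the rest of the data is untouched, which is why recovery tools that scan for backup superblocks can sometimes help — but here mkfs.btrfs was run afterwards on sdb, and sdc was re-added to the new pool, so both members were overwritten.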
Dschijn Posted January 25, 2021

OK, lesson learned. How can I get rid of the BTRFS warning?

Quote
BTRFS warning (device sdc1): devid 1 uuid 63723161-457c-4ed3-9cb3-d6455c944c64 is missing
Dschijn Posted January 26, 2021

@JorgeB the exact same thing happened again today when I mounted the SSDs in trays and pushed them into the hot-swap bays (with the PC off); I used different bays than last time. The main cache SSD wasn't identified, which I only noticed once unRAID had booted, but the array was not started. I shut down and tested the trays by going directly into the BIOS instead of booting into unRAID. In the end I connected the problematic SSD directly to a SATA port, bypassing the backplane of the front hot-swap trays.

Now the main SSD was no longer listed as a cache drive; it showed up under Unassigned Devices, and only the 2nd SSD was still listed in the pool. After adding the unassigned SSD back as the 1st cache disk, it got the "blue" status with the remark "All existing data on this device will be OVERWRITTEN when array is Started". Is the cache/BTRFS unhappy about a missing drive and refusing it after it is added again? I am a bit lost about unRAID's behaviour... I will be patient and hope for a reply.

unraid-diagnostics-20210126-2125.zip
JorgeB Posted January 27, 2021

Do this:

1. With the array stopped, if the Docker/VM services are using the cache pool, disable them.
2. Unassign all cache devices, then start the array to make Unraid "forget" the current cache config.
3. Stop the array and reassign all cache devices. There must not be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any cache device.
4. Re-enable Docker/VMs if needed, then start the array.
Dschijn Posted January 27, 2021

That worked. Thanks @JorgeB