Cache pool btrfs missing issue/disappeared/now unassigned

Adam Kesher · December 22, 2021

hi everyone,

So I had a weird sequence of events happen here with my cache disk.

I got a cache pool btrfs missing warning for my cache disk. I then ran scrub with fix errors checked. Still showed the error. Then I stopped the array and restarted. It disappeared when I restarted the array.

Then I restarted my whole unraid system and now it is showing up in unassigned devices. Hoping it can be remounted or salvaged somehow. I have diagnostics here. I screenshotted how it appears. It's the dev3 device.

any help would be great. thank you

zigplex2-diagnostics-20211222-1430.zip

dlandon · December 22, 2021

You might try reassigning it back to the array, but I'd probably do a UD file system check first. You'd have to mount it then click on the check icon.

It looks mountable in UD if you'd just rather mount it and unload it. I'd suggest setting it to read only to stop any write activity. Click on the three gears and set 'Read Only' On.

Adam Kesher · December 22, 2021

so i did try this: ran the scrub and got this in the log. I changed the last digits of the local IPs, but the IP ending in XX is an old one that this server had before I gave it a dedicated IP on my router (the one ending in YY). Could this be part of the issue, or just an issue running the scrub?

Quote

Dec 22 12:09:31 Zigplex2 nginx: 2021/12/22 12:09:31 [error] 7434#7434: *7170 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.1.XX, server: , request: "GET /plugins/unassigned.devices/include/fsck.php?device=/dev/nvme0n1p1&fs=btrfs&luks=&serial=Samsung_SSD_970_EVO_1TB_S467NX0M827957N&check_type=ro&type=Done HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.YY", referrer: "http://192.168.1.YY/Main"

dlandon · December 22, 2021

Ok, looking at the message again I see where the UD fsck had a problem. I assume that the scrub didn't show any results?

Back out of any browsers you had been using to access the server and then start the browser over and try again.

Adam Kesher · December 22, 2021

ok i ran it again and the same upstream timeout error popped up, but this time i also got

Quote

Dec 22 15:53:30 Zigplex2 kernel: BTRFS info (device nvme0n1p1): scrub: finished on devid 1 with status: 0

the pop up window for the scrub only says this (though it did say 'transferring data from IP' for a bit)

Quote

FS: btrfs

Executing file system scrub: /sbin/btrfs scrub start -B -R -d -r /dev/nvme0n1p1 2>&1
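
The kernel line quoted earlier in this post is the place to look for the scrub result when the pop-up window stays empty. Below is a minimal sketch, not part of Unraid or the UD plugin, that pulls the fields out of such a line. The function name `parse_scrub_line` is made up for illustration. As far as I know the status value is the kernel's return code, so 0 should mean the scrub ran to completion; it does not by itself report how many errors were found.

```python
import re

# Matches the BTRFS scrub summary line as it appears in the syslog quoted above.
SCRUB_RE = re.compile(
    r"BTRFS info \(device (?P<device>[^)]+)\): scrub: "
    r"(?P<state>finished|not finished) on devid (?P<devid>\d+) "
    r"with status: (?P<status>-?\d+)"
)


def parse_scrub_line(line):
    """Return the scrub summary fields as a dict, or None if the line does not match."""
    match = SCRUB_RE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "devid": int(match.group("devid")),
        "finished": match.group("state") == "finished",
        "status": int(match.group("status")),
    }
```

Feeding it the line from Dec 22 15:53:30 should give device nvme0n1p1, devid 1 and status 0.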

attaching diagnostics after all this

zigplex2-diagnostics-20211222-1859.zip

Edited December 23, 2021 by Adam Kesher
missing info, adding diagnostics

JorgeB · December 23, 2021

If it was a single device cache you can just re-assign it.

Adam Kesher · December 23, 2021

it is a single cache.

when i stop the array to re-assign it this is how it appears in unassigned devices

if i click that blue disk name i get this

when i assign it to the cache pool, this historical devices entry shows up

is this something i need to be worried about before starting the array with the disk put back in the cache pool (formatting or anything like that?)

do I need to change the disk mount point first? or do I just assign the disk?

Thanks for all your help everyone I am just paranoid.

JorgeB · December 23, 2021

16 minutes ago, Adam Kesher said:

is this something i need to be worried about before starting the array with the disk put back in the cache pool

No, that's just about previous UD devices, and you can safely remove it.

JorgeB · December 23, 2021

16 minutes ago, Adam Kesher said:

or do I just assign the disk?

Pool is showing 3 slots. If there was just one device assigned that's still OK, but make sure you've started the array once before without any device assigned; there can't be an "all data on this device will be deleted" warning for the pool device.

Adam Kesher · December 24, 2021

thanks for this.

re-assigning it worked and i was up and running just fine yesterday but today i woke up to the same Cache pool BTRFS missing device(s) warning.

is there anything else I can do to prevent this from happening? when i ran the scrub yesterday there were no errors. very frustrating. all the help has been great.

zigplex2-diagnostics-20211224-1116.zip

Edited December 24, 2021 by Adam Kesher
added diagnostics

Adam Kesher · December 24, 2021

I stopped the array and the cache disappeared again, but on a full shutdown and then starting back up it re-appeared and i got this

it auto-mounted and was correctly assigned again. this is super confusing. when i had the issue from the previous post the 'missing devices' warning happened overnight with the machine running. no restart or shutdown. diagnostics from this boot included.

zigplex2-diagnostics-20211224-1218.zip

JorgeB · December 25, 2021

NVMe device dropped offline:

Dec 23 22:48:36 Zigplex2 kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Dec 23 22:48:36 Zigplex2 kernel: nvme 0000:08:00.0: enabling device (0000 -> 0002)
Dec 23 22:48:36 Zigplex2 kernel: nvme nvme0: Removing after probe failure status: -19
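
For anyone wanting to check their own syslog for the same drop-out, here is a rough sketch that filters for the two telltale messages above. It is only an illustration: the function name `find_nvme_dropouts` is hypothetical, and the patterns are taken from the lines in this thread, so other kernels may word the messages differently.

```python
import re

# Patterns copied from the kernel messages quoted in this thread.
NVME_DROP_PATTERNS = (
    re.compile(r"nvme nvme\d+: controller is down; will reset"),
    re.compile(r"nvme nvme\d+: Removing after probe failure status: -?\d+"),
)


def find_nvme_dropouts(lines):
    """Return the syslog lines that look like an NVMe controller dropping offline."""
    hits = []
    for line in lines:
        if any(pattern.search(line) for pattern in NVME_DROP_PATTERNS):
            hits.append(line.rstrip("\n"))
    return hits
```

It takes any iterable of lines, so an open file handle on the syslog extracted from a diagnostics zip can be passed straight in (the exact file name inside the zip is not shown in this thread).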

Look for a BIOS update; the below also helps sometimes. Failing that, try a different brand/model device or board.

Some NVMe devices have issues with power states on Linux. Try this: on the main GUI page click on flash, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0

Reboot and see if it makes a difference.
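
After the reboot it can be worth confirming that the option actually reached the kernel. The following is a small sketch under the assumption that the running kernel's command line is readable at /proc/cmdline, as on a typical Linux system; the helper names `has_kernel_param` and `read_cmdline` are invented for this example.

```python
def has_kernel_param(cmdline, name, value=None):
    """Check whether a kernel command line string contains name (optionally name=value)."""
    for token in cmdline.split():
        key, _, val = token.partition("=")
        if key == name and (value is None or val == value):
            return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (assumes a Linux /proc filesystem)."""
    with open(path) as handle:
        return handle.read().strip()
```

Calling has_kernel_param(read_cmdline(), "nvme_core.default_ps_max_latency_us", "0") on the server should return True once the syslinux change has taken effect.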
