nmills3 Posted June 27, 2022
Just came back to find my server with half of its docker containers stopped and btrfs errors in the log. I restarted the server and now one of my main array disks is disabled. I've attached its SMART report. What do I need to do to fix this? tower-smart-20220627-1817.zip
JorgeB Posted June 27, 2022
Please post the diagnostics, ideally from both before and after rebooting; if you don't have the ones from before, just post the current ones.
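Note that Unraid keeps the syslog in RAM, so it's lost on reboot. If this happens again, one way to keep a copy before restarting is to save it to the flash drive from a terminal, e.g.:

    # copy the current syslog to the flash drive so it survives a reboot
    cp /var/log/syslog /boot/syslog-$(date +%Y%m%d-%H%M).txt

You can also enable the syslog server under Settings and mirror the syslog to flash so it's captured automatically.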
nmills3 Posted June 27, 2022
Unfortunately I don't have the logs from before the reboot. Here's the log from after: tower-diagnostics-20220627-1824.zip
Edited June 27, 2022 by nmills3
JorgeB Posted June 27, 2022
Diags after array start please.
nmills3 Posted June 27, 2022
My bad, here you go: tower-diagnostics-20220627-2049.zip
trurl Posted June 27, 2022
Disabled/emulated disk4 mounts, but it doesn't have much data. Is that expected? Diagnostics show shares G-----------y and M---a with files on disk4. You should be able to see the contents of emulated disk4 by clicking on the icon in the View column for the disk on Main - Array Devices. The emulated contents are exactly what would be rebuilt.
trurl Posted June 27, 2022
The only btrfs filesystems are your 3-drive pool named cache, plus the docker and libvirt .img vdisks, which aren't on disk4. So btrfs has nothing to do with the disabled disk. I didn't see any problems with disk4 in the syslog. Did you reboot after it became disabled? I also didn't notice any btrfs errors in the syslog; maybe those were logged before the reboot. I did notice you have 50G allocated to docker.img. Have you had problems filling it? Unrelated, but your appdata and system shares have files on the array.
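If you want to check how full docker.img actually is, a quick sketch from the terminal; this assumes the image is loop-mounted at /var/lib/docker, which is where Unraid mounts it while the docker service is running:

    # show overall usage of the loop-mounted docker image
    df -h /var/lib/docker
    # break the space down by images, containers, and volumes
    docker system df

The second command is standard docker CLI, so it works regardless of where the image lives.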
trurl Posted June 27, 2022
SMART for disk4 looks OK, though it hasn't had an extended test run on it. Connection problems are much more common than bad disks. Check all connections, SATA and power, both ends, including splitters. If the emulated disk contents look OK you can rebuild to the same disk: https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
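You can start the extended test from the disk's page in the webGUI, or from a terminal with smartctl (/dev/sdX below is a placeholder for disk4's actual device; the test runs on the drive itself and can take several hours):

    # kick off an extended (long) SMART self-test
    smartctl -t long /dev/sdX
    # when it finishes, review the self-test log and attributes
    smartctl -a /dev/sdX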
nmills3 Posted June 27, 2022
OK, I'll answer your questions in order.
It's expected that it doesn't have much data: I have about 16TB of storage and only about 5TB used, so that's normal for it.
I'm unsure if the drive was disabled before the reboot. I just saw a bunch of my docker containers stopped and some btrfs errors in the log, and decided to reboot to try and fix it before posting; after that reboot I noticed the drive was disabled. (The drive didn't have any data being used by the docker containers, so I have no idea why they stopped.)
I had an issue with filling the docker image in the past because I have a lot of game servers running on my tower, so even with 50GB I end up using about 43% of it at all times.
I'll have to take a look at the connections. I'm using an HBA with a SAS-to-SATA splitter, so it's probably a bad connection there. After I redo the connections, would you say I'm probably safe to rebuild onto the same drive?
trurl Posted June 27, 2022
You can browse the emulated disk contents by clicking on the icon under View for the disk on Main - Array Devices.
trurl Posted June 28, 2022
2 hours ago, nmills3 said: "safe to rebuild onto the same drive?"
Yes.
JorgeB Posted June 28, 2022
Jun 27 20:45:08 Tower kernel: BTRFS info (device sdg1): bdev /dev/sdf1 errs: wr 1592711, rd 617759, flush 127116, corrupt 878662, gen 0
This suggests sdf dropped offline in the past; see here for how to handle that.
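In short, once the underlying connection is fixed, the usual approach is to scrub the pool and reset the error counters so any new errors stand out. A rough sketch, assuming the pool is mounted at /mnt/cache:

    # show the per-device error counters for the pool
    btrfs dev stats /mnt/cache
    # verify and repair data against the redundant copies
    btrfs scrub start /mnt/cache
    btrfs scrub status /mnt/cache
    # once the scrub comes back clean, zero the counters
    btrfs dev stats -z /mnt/cache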
nmills3 Posted July 10, 2022
OK, the same or a very similar issue just happened again. I've not restarted the server this time and I've got the diagnostics: tower-diagnostics-20220710-0904.zip
Edit: I tried to stop the array to prevent any further errors, and now it's failing to stop; it's stuck on unmounting disks.
Edit 2: Just rebooted the system to get things running again, and I have the same issue as last time with disk 4 disabled.
Edited July 10, 2022 by nmills3
JorgeB Posted July 10, 2022
Problem with the onboard SATA controller; this is quite common with some Ryzen servers. Look for a BIOS update, but your best bet is to use an add-on controller.
nmills3 Posted July 10, 2022
All the hard drives (sdb, sdd, sdc, sde, and whatever disk 4 was) are connected via an HBA, but all of the other drives (SSDs) are connected directly to the motherboard, because people said not to connect SSDs to an HBA. Should I just switch them all over to the HBA?
JorgeB Posted July 10, 2022
2 minutes ago, nmills3 said: "and whatever disk 4 was) are connected via an HBA"
No, disk4 is on the onboard SATA ports. Don't use those for anything, or, as mentioned, see if a BIOS update helps.
nmills3 Posted July 10, 2022
So basically that SATA controller pooped the bed, which broke disk 4, and I assume it also messed up the cache drives connected to it, causing the issues with my dockers dying? Is it safe for me to just connect everything via the HBA?
JorgeB Posted July 10, 2022
6 minutes ago, nmills3 said: "So basically that SATA controller pooped the bed, which broke disk 4, and I assume it also messed up the cache drives connected to it...?"
Yes.
9 minutes ago, nmills3 said: "Is it safe for me to just connect everything via the HBA?"
Yes, though you'll lose TRIM support on the SSDs.
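If you do move them, an easy way to check whether TRIM still works behind the HBA, assuming the pool is mounted at /mnt/cache:

    # reports the number of bytes trimmed on success,
    # or errors out if the controller doesn't pass discard through
    fstrim -v /mnt/cache

I believe most LSI HBAs in IT mode only pass TRIM through for SSDs that support deterministic read zeros after TRIM, which many consumer drives don't.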
nmills3 Posted July 10, 2022
Hmm. What I might do then is connect all the hard drives to the HBA and leave the SSDs on the motherboard. I've had the cache drives connected to that controller for up to 150 days working fine, but it seems to go wrong as soon as a hard drive is connected. If I continue to have these issues after that, then I guess my only option is to connect everything to the HBA.
JorgeB Posted July 10, 2022
You can try leaving just the SSDs there; I believe heavy I/O makes this issue worse, so the fewer devices, the less chance of seeing it.
nmills3 Posted July 10, 2022
Well, I've had three 860 EVOs connected for 100 days without issue. Thinking about it, I might have moved a drive about a month ago; I bet I moved disk 4 from one of the hot-swap bays connected to the HBA to one of the ones connected to the mobo without realising. I'll probably switch it so all the hot-swap ports are on the HBA, and I'll just move the SSDs somewhere else connected directly to the mobo.