nmills3 Posted June 27, 2022
Just came back to find my server with half of its docker containers stopped and btrfs errors in the log. I restarted the server and now one of my main array disks is disabled. I've attached its SMART report. What do I need to do to fix this? tower-smart-20220627-1817.zip
JorgeB Posted June 27, 2022
Please post the diagnostics, ideally from both before and after rebooting; if you don't have the ones from before, just post the current ones.
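Note that Unraid keeps the syslog in RAM, so it's lost on reboot. If this happens again, one way to keep a copy before restarting is to save it to the flash drive from a terminal, e.g.:

    # copy the current syslog to the flash drive so it survives a reboot
    cp /var/log/syslog /boot/syslog-$(date +%Y%m%d-%H%M).txt

You can also enable the syslog server under Settings and mirror the syslog to flash so it's captured automatically.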
nmills3 Posted June 27, 2022
Unfortunately I don't have the logs from before the reboot. Here's the log from after: tower-diagnostics-20220627-1824.zip
Edited June 27, 2022 by nmills3
JorgeB Posted June 27, 2022
Diags after array start please.
nmills3 Posted June 27, 2022
My bad, here you go: tower-diagnostics-20220627-2049.zip
trurl Posted June 27, 2022
Disabled/emulated disk4 mounts, but it doesn't have much data. Is that expected? Diagnostics show shares G-----------y and M---a with files on disk4. You should be able to see the contents of emulated disk4 by clicking on the icon in the View column for the disk on Main - Array Devices. The emulated contents are exactly what would be rebuilt.
trurl Posted June 27, 2022
The only btrfs filesystems are your 3-drive pool named cache, plus the docker and libvirt .img vdisks, which aren't on disk4. So btrfs has nothing to do with the disabled disk. I didn't see any problems with disk4 in the syslog. Did you reboot after it became disabled? I also didn't notice any btrfs errors in the syslog; maybe those were logged before the reboot. I did notice you have 50G allocated to docker.img. Have you had problems filling it? Unrelated, but your appdata and system shares have files on the array.
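If you want to check how full docker.img actually is, a quick sketch from the terminal; this assumes the image is loop-mounted at /var/lib/docker, which is where Unraid mounts it while the docker service is running:

    # show overall usage of the loop-mounted docker image
    df -h /var/lib/docker
    # break the space down by images, containers, and volumes
    docker system df

The second command is standard docker CLI, so it works regardless of where the image lives.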
trurl Posted June 27, 2022
SMART for disk4 looks OK, though it hasn't had an extended test run on it. Connection problems are much more common than bad disks. Check all connections, SATA and power, both ends, including splitters. If the emulated disk contents look OK you can rebuild to the same disk: https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
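You can start the extended test from the disk's page in the webGUI, or from a terminal with smartctl (/dev/sdX below is a placeholder for disk4's actual device; the test runs on the drive itself and can take several hours):

    # kick off an extended (long) SMART self-test
    smartctl -t long /dev/sdX
    # when it finishes, review the self-test log and attributes
    smartctl -a /dev/sdX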
nmills3 Posted June 27, 2022
OK, I'll answer your questions in order.
It's expected that it doesn't have much data: I have about 16TB of storage and only about 5TB used, so that's normal for it.
I'm unsure if the drive was disabled before the reboot. I just saw a bunch of my docker containers stopped and some btrfs errors in the log, and decided to reboot to try and fix it before posting; after that reboot I noticed the drive was disabled. (The drive didn't have any data being used by the docker containers, so I have no idea why they stopped.)
I had an issue with filling the docker image in the past because I have a lot of game servers running on my tower, so even with 50GB I end up using about 43% of it at all times.
I'll have to take a look at the connections. I'm using an HBA with a SAS-to-SATA splitter, so it's probably a bad connection there. After I redo the connections, would you say I'm probably safe to rebuild onto the same drive?
trurl Posted June 27, 2022
You can browse the emulated disk contents by clicking on the icon under View for the disk on Main - Array Devices.
trurl Posted June 28, 2022
2 hours ago, nmills3 said: "safe to rebuild onto the same drive?"
Yes.
JorgeB Posted June 28, 2022
Jun 27 20:45:08 Tower kernel: BTRFS info (device sdg1): bdev /dev/sdf1 errs: wr 1592711, rd 617759, flush 127116, corrupt 878662, gen 0
This suggests sdf dropped offline in the past; see here for how to handle that.
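In short, once the underlying connection is fixed, the usual approach is to scrub the pool and reset the error counters so any new errors stand out. A rough sketch, assuming the pool is mounted at /mnt/cache:

    # show the per-device error counters for the pool
    btrfs dev stats /mnt/cache
    # verify and repair data against the redundant copies
    btrfs scrub start /mnt/cache
    btrfs scrub status /mnt/cache
    # once the scrub comes back clean, zero the counters
    btrfs dev stats -z /mnt/cache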
nmills3 Posted July 10, 2022
OK, the same or a very similar issue just happened again. I've not restarted the server this time and I've got the diagnostics: tower-diagnostics-20220710-0904.zip
Edit: I tried to stop the array to prevent any further errors, and now it's failing to stop; it's stuck on unmounting disks.
Edit 2: Just rebooted the system to get things running again, and I have the same issue as last time with disk 4 disabled.
Edited July 10, 2022 by nmills3
JorgeB Posted July 10, 2022
Problem with the onboard SATA controller; this is quite common with some Ryzen servers. Look for a BIOS update, but your best bet is to use an add-on controller.
nmills3 Posted July 10, 2022
All the hard drives (sdb, sdd, sdc, sde, and whatever disk 4 was) are connected via an HBA, but all of the other drives (SSDs) are connected directly to the motherboard, because people said not to connect SSDs to an HBA. Should I just switch them all over to the HBA?
JorgeB Posted July 10, 2022
2 minutes ago, nmills3 said: "and whatever disk 4 was) are connected via an HBA"
No, disk4 is on the onboard SATA ports. Don't use those for anything, or, as mentioned, see if a BIOS update helps.
nmills3 Posted July 10, 2022
So basically that SATA controller pooped the bed, which broke disk 4, and I assume it also messed up the cache drives connected to it, causing the issues with my dockers dying? Is it safe for me to just connect everything via the HBA?
JorgeB Posted July 10, 2022
6 minutes ago, nmills3 said: "So basically that SATA controller pooped the bed, which broke disk 4, and I assume it also messed up the cache drives connected to it...?"
Yes.
9 minutes ago, nmills3 said: "Is it safe for me to just connect everything via the HBA?"
Yes, though you'll lose TRIM support on the SSDs.
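If you do move them, an easy way to check whether TRIM still works behind the HBA, assuming the pool is mounted at /mnt/cache:

    # reports the number of bytes trimmed on success,
    # or errors out if the controller doesn't pass discard through
    fstrim -v /mnt/cache

I believe most LSI HBAs in IT mode only pass TRIM through for SSDs that support deterministic read zeros after TRIM, which many consumer drives don't.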
nmills3 Posted July 10, 2022
Hmm. What I might do then is connect all the hard drives to the HBA and leave the SSDs on the motherboard. I've had the cache drives connected to that controller for up to 150 days working fine, but it seems to go wrong as soon as a hard drive is connected. If I continue to have these issues after that, then I guess my only option is to connect everything to the HBA.
JorgeB Posted July 10, 2022
You can try leaving just the SSDs there; I believe heavy I/O makes this issue worse, so the fewer devices, the less chance of seeing it.
nmills3 Posted July 10, 2022
Well, I've had three 860 EVOs connected for 100 days without issue. Thinking about it, I might have moved a drive about a month ago; I bet I moved disk 4 from one of the hot-swap bays connected to the HBA to one of the ones connected to the mobo without realising. I'll probably switch it so all the hot-swap ports are on the HBA, and I'll just move the SSDs somewhere else connected directly to the mobo.