thebedivere Posted July 17, 2022

Hi everyone! I've been running an Unraid server for about 3 years now, and overall it has been going great. A few months ago I started having issues with the stability of my server: frequent drive failures, and Docker containers acting up, unable to start or write. This past week the server has become unusable and unreliable. Most of my Docker containers won't start, and I have 2 drives marked as unmountable. I did the basic troubleshooting I've done in the past, such as deleting docker.img and rebuilding my containers.

In the past, when a drive failed (it has happened maybe 4-6 times over 3 years; 2 were actual bad drives I had to RMA), I would stop the array, unmount the bad drive, restart the array, stop the array, add the drive back to the array, and let it rebuild from parity. I decided not to do that this time with the 2 drives that died, since at this point it takes almost a week to rebuild a drive from parity. I did some digging and decided to do a "New Config", and I was hopeful at first. Parity rebuilt from data in just a few days, and overall the system felt snappier. But now nothing is working as expected.

Note: Anything important on the server is backed up on another NAS in my house and offsite to Backblaze. The unimportant stuff that is not backed up is TBs of movies for a Plex server. I would be sad if I had to repopulate this, but it's just not worth the cost to back up. The important non-data stuff on the server is my Docker containers. I run Home Assistant, Plex, Nextcloud, Foundry, and around 20 or so containers that get daily use. Most of these were set up 3 years ago when I started the server and haven't really been touched since; they just work. I don't want to just mess with things. I feel like something is off somewhere, and I am hoping to get some help from the community to solve my issues. I love tech stuff and have experience with Linux, and I am a software developer. I just don't know the particular issues with Unraid, and I am hoping to learn.

I attached my diagnostics zip, but let me know if I need to provide any other information. Thanks in advance.

orthanc-diagnostics-20220717-1043.zip
thebedivere Posted July 17, 2022

Another issue I just noticed: I cannot create any new shares.
trurl Posted July 17, 2022

Not a good idea to completely fill any of your disks. It can make it difficult or impossible to recover from filesystem corruption.
thebedivere Posted July 17, 2022

Just now, trurl said: "Not a good idea to completely fill all of your disks. Can make it difficult or impossible to recover from filesystem corruption."

Any tips on this? Do I just need more disks? I have used unBalance in the past to empty out older disks when adding new ones. Is there some setting I should change? Can disks somehow be marked as full once they have only a certain number of GBs left?
trurl Posted July 17, 2022

All of the syslogs included in the diagnostics are just the same problem over and over, probably due to your completely full disks. Are there any older syslogs in /var/log?
trurl Posted July 17, 2022

You should set Minimum Free for each user share to larger than the largest file you expect to write to the share. That will make Unraid choose a different disk when a drive has less than the Minimum. It won't help in the case where no disk has any space, of course.

Filesystems will typically perform worse the fuller a disk is. You need larger disks or more disks. It looks like a few of your disks could be replaced with larger ones.

9 minutes ago, thebedivere said: "I have used unBalance in the past to empty out older disks when adding new disks."

Not sure I understand. Usually if you have a disk you want to get rid of, you would rebuild it onto a larger disk instead of adding another disk.
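The Minimum Free behavior described above can be illustrated with a small shell check. This is only a sketch of the idea, not Unraid's actual allocator; the path and threshold here are placeholder examples:

```shell
#!/bin/sh
# Sketch of the Minimum Free idea: refuse to pick a disk whose free
# space is below a threshold. Path and threshold are examples only;
# on Unraid the real check is done per user share by the allocator.
DISK_PATH="/tmp"                      # stand-in for e.g. /mnt/disk1
MIN_FREE_KB=$((20 * 1024 * 1024))     # 20 GiB threshold, in KiB

avail_kb=$(df -Pk "$DISK_PATH" | awk 'NR==2 {print $4}')
if [ "$avail_kb" -lt "$MIN_FREE_KB" ]; then
    echo "below Minimum Free: would choose a different disk"
else
    echo "enough free space: ok to write here"
fi
```

The rule of thumb above is the point: set Minimum Free larger than the biggest file you expect to write, so Unraid moves on to another disk before this one fills.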
thebedivere Posted July 17, 2022

8 minutes ago, trurl said: "All of the syslogs included in diagnostics are just the same problem over and over, probably due to your completely full disks. Are there any older syslogs in /var/log?"

Syslogs go back about a week, all the same issue of not being able to write to disks. I am assuming the "New Config" wiped any older logs?
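For anyone checking the same thing on their own server: /var/log on Unraid is a RAM-backed tmpfs, which is why nothing there survives a reboot. A quick look:

```shell
# List current and rotated syslogs, newest first. On Unraid these sit
# in a RAM-backed tmpfs, so they are lost whenever the server reboots.
ls -lt /var/log/syslog* 2>/dev/null || echo "no syslog files found"

# The log filesystem is small and can itself fill up; check usage:
df -h /var/log
```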
trurl Posted July 17, 2022

Also note I corrected my first reply. Not only should you not fill all of your disks, you shouldn't even fill any of them.

13 minutes ago, trurl said: "Can make it difficult or impossible to recover from filesystem corruption."

1 minute ago, trurl said: "Filesystems will typically perform worse the fuller a disk is."
thebedivere Posted July 17, 2022

2 minutes ago, trurl said: "You need to have larger disks or more disks. Looks like a few of your disks could be replaced with larger disks. Not sure I understand. Usually if you have a disk you want to get rid of, you would rebuild it to a larger disk instead of adding another disk."

Let's say all my disks have about 20 GB free. I add a new disk to expand the array by 6-14 TB, then use unBalance to move files from the older, fuller disks onto the newer disk. It sounds like this isn't necessary if I follow your advice and just set up the shares with a Minimum Free. I can delete some of the movies on the array to free up some space, then set up the minimum free space.
thebedivere Posted July 17, 2022

3 minutes ago, trurl said: "Also note I corrected my first reply. Not only should you not fill all of your disks, you shouldn't even fill any of your disks."

Gotcha, thanks for the advice. I will see what I can do to empty some space and set up minimum free space. I'll restart the server after I have some free space and see what happens with Docker and such. How much free space should I have? 10 GB? 1 GB? A percentage of the drive size?
trurl Posted July 17, 2022

Just now, thebedivere said: ""new config" wiped any older logs?"

No, a reboot wipes older logs, since syslogs are in RAM just like the rest of the OS.

New Config isn't really the way to fix things when you get disabled or unmountable disks. For one thing, it won't fix an unmountable disk anyway. And if you New Config your way out of a disabled disk, you lose any writes that happened to the disk as it became disabled, plus any writes to the emulated disk afterward. All of those writes can be recovered from parity if you rebuild the data disk, but they are lost if you rebuild parity instead.
trurl Posted July 17, 2022

7 minutes ago, trurl said: "You should set Minimum Free for each user share to larger than the largest file you expect to write to the share."

That just keeps Unraid from choosing a disk that won't have enough space for the new file. In general I would leave even more free, for the other reasons I mentioned.

4 minutes ago, thebedivere said: "I add a new disk to expand the array by 6-14 TB. Then I use unBalance to move files from the older disks that are more full onto the newer disk."

Why not replace/rebuild that older disk to the larger disk? Disk5 could be replaced with 14TB, for example, since that is the size of your smallest parity. You can just keep adding more disks, of course. But you shouldn't get even close to running out of capacity before you do something about it.
thebedivere Posted July 17, 2022

2 hours ago, trurl said: "Why not replace/rebuild that older disk to the larger disk? Disk5 could be replaced with 14TB, for example since that is the size of your smallest parity. You can just keep adding more disks, of course. But you shouldn't get even close to running out of capacity before you do something about it."

I've been trying to expand the space over time. Instead of removing a disk, I'd rather add more disks. It's one of the reasons I wanted Unraid for the server, since I can just drop in disks of different sizes to grow when I need the space.
thebedivere Posted July 17, 2022

Cleaned up some space by deleting a bunch of movies, and unmounted disks 6 and 7 (they were flagged as unmountable, but had no SMART errors). Disk 7 is rebuilding from parity, and disk 6 is still marked as unmountable. I suspect disk 6 might actually need to be replaced. But now disk 3, which didn't have any issues before, is marked as unmountable. I've attached new diagnostics.

orthanc-diagnostics-20220717-1415.zip
trurl Posted July 17, 2022

8 minutes ago, thebedivere said: "disk 6 and 7 (they were flagged as unmountable"

Disk7 is not unmountable according to that screenshot. Perhaps you just meant they were both disabled. Both disks are rebuilding; disk6 is rebuilding an unmountable filesystem. We usually recommend repairing the filesystem before rebuilding on top of the same disk, to make sure it can be repaired before you overwrite the disk. Too late now; you will have to hope for the best when you repair the filesystem.

The disk3 filesystem will also have to be repaired, after the rebuild completes. So you are exactly in the position I was concerned about with your full disks.

3 hours ago, trurl said: "Can make it difficult or impossible to recover from filesystem corruption."

And... while rebuilding 6 and 7, it looks like you are having connection problems with disk5. You need to stop and fix that before attempting rebuild or repair.
thebedivere Posted July 17, 2022

How do I resolve the connection problems? Could that be SATA cables, or the SATA controller?
trurl Posted July 17, 2022

4 minutes ago, thebedivere said: "Could that be sata cables, or the sata controller?"

Yes, or maybe just the SATA/power connectors. Any splitters? I see you have a Marvell controller:

06:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215] (rev 11)
    Subsystem: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
    Kernel driver in use: ahci
    Kernel modules: ahci

Can you remove it, or just not connect drives to it?
thebedivere Posted July 17, 2022 Author Share Posted July 17, 2022 (edited) There are some power splitters. Edit: 4 pin to sata, not splitters The 4 port sata is on the motherboard, and I have a 6 port expansion card for the rest. I'm assuming I should power down, disconnect, and test out each drive connection. What's the best tool/approach for that? I'm more than happy to use CLI tools. Edited July 17, 2022 by thebedivere clarifying Quote Link to comment
trurl Posted July 17, 2022

Simplest test is to just try again after replugging or replacing cables.
thebedivere Posted July 17, 2022

OK, moved some cables around to see what happened, and some drives are not showing up at all. It's in a small server rack in the basement, so it'll take me some effort to pull the case fully open to get to the inner connectors. The case also has some quick-swap slots in the front, so I'll also try plugging directly into the drives. Thanks for the help so far; it might be a day or two before I get to the next step.
trurl Posted July 17, 2022

9 minutes ago, thebedivere said: "moved some cables around"

All cables should have some slack so connectors can sit squarely on the connection, with no tension that might cause them to move. Don't bundle data cables.

15 minutes ago, thebedivere said: "some drives are not showing up"

Multiple disks dropping at once suggests power or the controller. Already mentioned splitters. Already mentioned Marvell. If it has been working for you then it may be OK; it is more likely to be an issue if you enable IOMMU for hardware passthrough to VMs.
thebedivere Posted July 17, 2022

I was able to get some time to take the server out of the rack and up to my office (thank god the kids are napping, lol). I opened it up and disconnected all the SATA and power cables, then carefully reconnected everything and double-checked the connections. I still have multiple drives that show up as unmountable.

orthanc-diagnostics-20220717-1849.zip
trurl Posted July 17, 2022

Post a screenshot of Main - Array Operation.
trurl Posted July 17, 2022

Never mind, the syslog shows it is rebuilding disks 6 and 7. Is anything showing in the Errors column on Main - Array Devices since you took that screenshot?

Probably you are just going to have to repair the filesystems on all the unmountable disks and hope for the best.
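For reference, the usual Unraid repair flow (assuming the array disks are XFS) is: start the array in Maintenance mode, then run `xfs_repair` against the disk's `/dev/mdX` device so parity stays in sync. A cautious sketch, dry-run only by default; the disk number is an example:

```shell
#!/bin/sh
# Cautious XFS repair sketch for an Unraid array disk. Start the array
# in Maintenance mode first, and work on /dev/mdX (not /dev/sdX) so
# parity is updated along with the repair. md3 is an example number.
DEV=/dev/md3
if [ -b "$DEV" ]; then
    # -n = no-modify dry run: report problems, write nothing.
    xfs_repair -n "$DEV" || echo "dry run reported problems or could not run"
else
    echo "$DEV not present; is the array started in Maintenance mode?"
fi
# After reviewing the dry-run output, run the real repair:
#   xfs_repair /dev/md3
# If it refuses because of a dirty log, 'xfs_repair -L /dev/md3' zeroes
# the log; that can drop the most recent writes, so it is a last resort.
```

Repairing via the md device is what keeps parity valid; running the repair against the raw sdX device instead would invalidate parity.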
thebedivere Posted July 17, 2022

No errors are showing up. I'm freeing some more storage space from disk 4 before I stop the array and try some filesystem repair. Current view of the UI is attached.

Disk 7 did eventually show up after I swapped the SATA cable for one that connected better. I tried moving disks 3, 5, and 6 to other SATA ports on the motherboard instead of the expansion card, and I also tried different power connectors; nothing changed. I will attempt some filesystem repair and see how it goes.