jaspervanisterdael, posted March 24, 2020:
My Unraid server is really unstable. I've been using Unraid since the beginning of 2019 and know quite a bit about the OS, but I can't get it fully stable.
At first I was using a Ryzen CPU in my server. It was very unstable, because first-generation Ryzen and Linux weren't a good combination in my case. I decided to sell my Ryzen server and build a proper server with server-grade hardware, but it is still very unstable. I've already disabled C-states in the BIOS, and it's still not stable. The shares I use for Docker and the VMs are set to cache only. I only have one VM and a few Docker containers running, and my CPU load is 20% at most.
Specs:
- Intel Core i3-8100
- Supermicro X11SCH-LN4F
- 32 GiB DDR4, single-bit ECC
- 4x 6 TB ST6000VN0033 (one is the parity disk, the others are bulk storage)
- 2x 480 GB KINGSTON_SA400S37480G (both used as cache)
I was using Windows Server before, but because of its limitations for Linux-related use, I decided to switch to Unraid. When I was running Windows on this server, the problems I mentioned weren't present.
Attachment: gravity-diagnostics-20200324-1922.zip
Dissones4U, posted March 24, 2020:
Quoting jaspervanisterdael: "The shares I'm using for Docker and the VM's are Cache Only"
Something appears to be wrong with your cache. I'm not well versed in the cache functionality yet, so I could be way off base here, but if nobody else chimes in, check that your cables and cards are properly seated, and change cables if you have spares. Also, I can't really tell, but your cache looks like it's RAID 1 and maybe full; I'm not positive.
civic95man, posted March 24, 2020:
Looks like you're using the onboard SATA ports. Try changing cables, or at least reseating all of them as @Dissones4U suggested, and maybe check the power connections to the drives/SSDs. You should probably run a scrub on the cache afterwards.
jaspervanisterdael, posted March 24, 2020:
Quoting civic95man: "Looks like you're using the on-board sata ports. Try changing cables or at least re-seating all of them as @Dissones4U suggested and maybe check the power connection to the drives/ssd. Should probably run a scrub afterwards on the cache."
I'll try this tomorrow. I'm just wondering: how can the motherboard SATA ports the drives are connected to cause these problems? Is there an explanation for that?
civic95man, posted March 24, 2020:
Quoting jaspervanisterdael: "how can the SATA ports on my motherboard where I connected the drives to causing the problems? Is there any explanation for"
SATA ports are, by their very nature, a bad design and known to cause all kinds of reliability issues. Are you using the ones with clips? I hear those are marginally better. It may not be the motherboard port; it could be the drive's port too.
As an example, my Windows computer began randomly rebooting, and after a week of troubleshooting I narrowed it down to a faulty SATA cable going to the SSD boot drive. That computer had been unopened and unmoved for over a year and worked flawlessly until a few weeks ago. The cable or connection just gave up without notice, so I know it happens and have seen it myself.
On top of that, reseating the power and SATA cables is an easy, cost-free troubleshooting step, so it never hurts to try it first. The next step would be swapping the cables for new ones if you have them. Good luck.
JorgeB, posted March 25, 2020:
There have been similar errors on both cache devices:
Mar 14 19:45:59 Gravity kernel: BTRFS info (device sdd1): bdev /dev/sdd1 errs: wr 428, rd 482, flush 0, corrupt 0, gen 0
Mar 14 19:45:59 Gravity kernel: BTRFS info (device sdd1): bdev /dev/sdc1 errs: wr 295, rd 293, flush 0, corrupt 0, gen 0
Possibly there's a cable/connection problem on both, for example if they share a SATA splitter; that, or some compatibility issue with that model, but that would be strange. Also see here for better cache pool monitoring.
jaspervanisterdael, posted March 25, 2020:
@johnnie.black I replaced the SATA cables with quality cables. I'll let you know whether this solves the problem.
jaspervanisterdael, posted March 26, 2020:
After replacing the SATA cables, I still have the problem that my server hangs at 100% CPU usage.
civic95man, posted March 26, 2020:
When you say "hangs", does it become unresponsive? Does it eventually recover, or does it need to be rebooted?
jaspervanisterdael, posted March 26, 2020:
My server recovers after several minutes; it doesn't need a reboot. I'm noticing now that my Unraid server doesn't have any swap space allocated. Could that be the problem?
jaspervanisterdael, posted March 26, 2020:
@johnnie.black When I check the cache pool monitor, I see several errors:
root@Gravity:~# btrfs dev stats /mnt/cache
[/dev/sdd1].write_io_errs 473
[/dev/sdd1].read_io_errs 867
[/dev/sdd1].flush_io_errs 0
[/dev/sdd1].corruption_errs 0
[/dev/sdd1].generation_errs 0
[/dev/sdc1].write_io_errs 395
[/dev/sdc1].read_io_errs 771
[/dev/sdc1].flush_io_errs 0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0
JorgeB, posted March 26, 2020:
Did you reset the stats after replacing the cables? How to do that is in the link above.
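A minimal sketch for re-checking the pool after a reset, assuming the `[device].counter value` output format shown in the post above. `check_btrfs_errs` is a hypothetical helper (not part of Unraid or btrfs-progs, and not the monitoring script from the link): it reads `btrfs dev stats` output on stdin, prints every counter that is not zero, and returns exit status 1 if it found one.

```shell
#!/bin/sh
# check_btrfs_errs (hypothetical helper, not part of Unraid or btrfs-progs):
# reads `btrfs dev stats` output on stdin, one counter per line, e.g.
#   [/dev/sdd1].write_io_errs 473
# prints each line whose value (second field) is non-zero and exits with
# status 1 if at least one was found, 0 otherwise.
check_btrfs_errs() {
  awk '$2 != 0 { print; bad = 1 } END { exit bad }'
}
```

Typical use on the server would be `btrfs dev stats /mnt/cache | check_btrfs_errs`. After swapping cables, `btrfs dev stats -z /mnt/cache` prints the current counters and then resets them to zero, so any line the helper prints later points to new errors rather than old ones.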
jaspervanisterdael, posted March 26, 2020:
@johnnie.black I cleared the stats, so I'll let you know if new errors occur. Is it a good idea to create a new Docker image?
JorgeB, posted March 26, 2020:
Quoting jaspervanisterdael: "Is it a good idea to make a new docker image?"
It can't hurt, though I don't remember seeing any issues with the current one.
civic95man, posted March 26, 2020:
Quoting jaspervanisterdael: "I'm noticing now that my unraid server doesn't have any SWAP storage allocated. Is it possible that this could be the problem?"
Unraid is decompressed into and runs from memory anyway, so no, the missing swap isn't the problem.
jaspervanisterdael, posted March 27, 2020:
Sadly, the server is still hanging. It could be a hardware issue, but it's weird that both SSDs are giving errors...
JorgeB, posted March 27, 2020:
Quoting jaspervanisterdael: "but its weird that both of the SSD's are giving errors."
It is. If it still happens with new cables (assuming you also replaced the power cables), it could be a compatibility issue with the board, though it would be a strange one, since the Intel ports are usually problem-free.
jaspervanisterdael, posted March 28, 2020:
I ordered a new power cable (I'm currently using a Molex-to-SATA adapter). I'll post an update on the situation.
jaspervanisterdael, posted March 29, 2020:
I replaced both the SATA power and data cables, but the server still hangs. I also ran a SMART extended self-test and there were no errors, so I suppose the SSDs are good.
JorgeB, posted March 29, 2020:
If you can, try a couple of SSDs of a different brand/model.
jaspervanisterdael, posted April 6, 2020:
@johnnie.black After replacing the SSDs with Samsung 860 EVOs, the server is running fully stable.
JorgeB, posted April 7, 2020:
Good, thanks for reporting back.