jaspervanisterdael

Unraid hangs 100% CPU usage


My Unraid server is really unstable. I've been using Unraid since the beginning of 2019 and I know quite a bit about the OS, but I haven't been able to get it fully stable. At the beginning I was using a Ryzen CPU in my server. It was very unstable, because first-generation Ryzen and Linux weren't the best combination in my case. I decided to sell my Ryzen server and build a proper server with server-grade hardware. My server is still very unstable. I already disabled C-states in my BIOS, but it's still not stable.

 

The shares I'm using for Docker and the VMs are Cache Only. I only have 1 VM and a few Docker containers running, and my CPU load is 20% at most.

 

Specs:

- Intel Core i3 8100

- Supermicro X11SCH-LN4F

- 32 GiB DDR4 Single-bit ECC

- 4x TB ST6000VN0033 (1 of them is the parity disk, the others are bulk storage)

- 2x 480GB KINGSTON_SA400S37480G (both are cache devices)

 

I was using Windows Server before, but because of the limitations of the OS (Linux related), I decided to switch to Unraid. When I was using Windows on this server, the problems I mentioned weren't present.

gravity-diagnostics-20200324-1922.zip

2 hours ago, jaspervanisterdael said:

The shares I'm using for Docker and the VM's are Cache Only

Something appears to be wrong with your cache. I'm not well versed in the cache functionality yet, so I could be way off base here. If nobody else chimes in, check that your cables and cards are properly seated, and change cables if you have spares. Also, I can't really tell, but your cache looks like it's RAID 1 and maybe full. I'm not positive, though.


Looks like you're using the on-board SATA ports. Try changing cables or at least re-seating all of them as @Dissones4U suggested, and maybe check the power connection to the drives/SSDs. You should probably run a scrub on the cache afterwards.

8 minutes ago, civic95man said:

Looks like you're using the on-board sata ports. Try changing cables or at least re-seating all of them as @Dissones4U suggested and maybe check the power connection to the drives/ssd. Should probably run a scrub afterwards on the cache.

I'll try this tomorrow. I'm just wondering, how can the SATA ports on my motherboard that the drives are connected to be causing the problems? Is there any explanation for that?

1 minute ago, jaspervanisterdael said:

how can the SATA ports on my motherboard that the drives are connected to be causing the problems? Is there any explanation for that

SATA ports are by their very nature a poor design and are known to cause all kinds of reliability issues. Are you using the ones with clips? I hear those are marginally better. It may not be the motherboard port; it could be the port on the drive(s) too.

 

As an example, my Windows computer began randomly rebooting, and after a week of troubleshooting I narrowed it down to a faulty SATA cable going to the SSD boot drive. That computer had been unopened and unmoved for over a year and worked flawlessly up until a few weeks ago. The cable or connection just decided to give up without notice. So I know it happens and have witnessed it myself.

 

And on top of that, reseating the power and SATA cables is a relatively easy and cost-free troubleshooting step, so it never hurts to try it first. The next step would be swapping the cables with new ones if you have them.

 

Good luck


There have been similar errors on both cache devices:

Mar 14 19:45:59 Gravity kernel: BTRFS info (device sdd1): bdev /dev/sdd1 errs: wr 428, rd 482, flush 0, corrupt 0, gen 0
Mar 14 19:45:59 Gravity kernel: BTRFS info (device sdd1): bdev /dev/sdc1 errs: wr 295, rd 293, flush 0, corrupt 0, gen 0

 

Possibly there's a cable/connection problem on both, for example if they share a SATA splitter. Either that or some compatibility issue with that model, but that would be strange.
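
To see whether these counters keep growing after a cable swap, the kernel lines above can be pulled out of the syslog with a small script. This is only a minimal Python sketch, assuming the log format matches the two lines quoted above; `parse_btrfs_errs` is a made-up helper name, not part of any btrfs tool.

```python
import re

# Matches the "bdev <device> errs: wr N, rd N, flush N, corrupt N, gen N"
# part of a kernel BTRFS log line. The format is assumed from the two
# syslog lines quoted in this post.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (device, counters) for a BTRFS error line, or None if no match."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    # Everything except the device name is an integer counter.
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters


if __name__ == "__main__":
    sample = (
        "Mar 14 19:45:59 Gravity kernel: BTRFS info (device sdd1): "
        "bdev /dev/sdd1 errs: wr 428, rd 482, flush 0, corrupt 0, gen 0"
    )
    print(parse_btrfs_errs(sample))
```

Running this over the syslog from before and after a hardware change would show whether the write and read counters are still increasing.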

 

Also see here for better cache pool monitoring.


When you say "hangs", does it become unresponsive? Does it eventually recover, or does it need to be rebooted?


My server recovers after several minutes. It doesn't need a reboot. I'm noticing now that my Unraid server doesn't have any swap space allocated. Is it possible that this could be the problem?


@johnnie.black when I check the cache pool monitor, I see several errors:

 

root@Gravity:~# btrfs dev stats /mnt/cache
[/dev/sdd1].write_io_errs    473
[/dev/sdd1].read_io_errs     867
[/dev/sdd1].flush_io_errs    0
[/dev/sdd1].corruption_errs  0
[/dev/sdd1].generation_errs  0
[/dev/sdc1].write_io_errs    395
[/dev/sdc1].read_io_errs     771
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
root@Gravity:~# 
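
In the output above, only the write and read I/O error counters are nonzero, and they are nonzero on both devices. Here is a rough Python sketch for picking the nonzero counters out of that output automatically. It assumes the `[device].counter value` layout shown above, and the function names `parse_dev_stats` and `nonzero_counters` are made up for this example.

```python
import re
from collections import defaultdict

# One line of `btrfs dev stats` output, e.g.
# "[/dev/sdd1].write_io_errs    473" (layout assumed from the output above).
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Parse `btrfs dev stats` output into {device: {counter: value}}."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        m = STAT_RE.match(line.strip())
        if m:  # shell prompts and blank lines are skipped
            stats[m.group("dev")][m.group("name")] = int(m.group("value"))
    return dict(stats)


def nonzero_counters(stats):
    """Keep only devices and counters that report at least one error."""
    return {
        dev: {name: value for name, value in counters.items() if value}
        for dev, counters in stats.items()
        if any(counters.values())
    }


if __name__ == "__main__":
    sample = """\
[/dev/sdd1].write_io_errs    473
[/dev/sdd1].read_io_errs     867
[/dev/sdd1].flush_io_errs    0
[/dev/sdd1].corruption_errs  0
[/dev/sdd1].generation_errs  0
"""
    print(nonzero_counters(parse_dev_stats(sample)))
```

Saving the output of `btrfs dev stats /mnt/cache` to a file at intervals and running it through these helpers would make it easy to compare the counters over time.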

8 minutes ago, jaspervanisterdael said:

Is it a good idea to make a new docker image?

It can't hurt, though I don't remember seeing any issues with the current one.

1 hour ago, jaspervanisterdael said:

I'm noticing now that my unraid server doesn't have any SWAP storage allocated. Is it possible that this could be the problem?

No. Unraid gets decompressed to and runs from memory anyway, but the missing swap isn't the problem.

11 minutes ago, jaspervanisterdael said:

but its weird that both of the SSD's are giving errors.

It is. If it still happens with new cables (assuming you also replaced the power cables), it could be a compatibility issue with the board, though it would be a strange one, since the Intel ports are usually problem free.


I replaced both the SATA power and data cables, but the server is still hanging. I also ran a SMART extended self-test, but there were no errors, so I suppose that the SSDs are good.
