Unraid hangs 100% CPU usage



My Unraid server is really unstable. I've been using Unraid since the beginning of 2019 and know quite a bit about the OS, but I can't get it fully stable. At first I ran a Ryzen CPU in the server. It was very unstable, because first-generation Ryzen and Linux weren't a good combination in my case. I decided to sell the Ryzen server and build a proper one with server-grade hardware. That server is still very unstable. I already disabled C-states in the BIOS, but it's still not stable.

 

The shares I use for Docker and the VMs are Cache Only. I only have 1 VM and a few Docker containers running, and my CPU load is 20% at most.

 

Specs:

- Intel Core i3 8100

- Supermicro X11SCH-LN4F

- 32 GiB DDR4 Single-bit ECC

- 4x 6 TB ST6000VN0033 (1 of them is the parity disk, the others are bulk storage)

- 2x 480 GB KINGSTON_SA400S37480G (both are in the cache pool)

 

I was using Windows Server before, but because of that OS's limitations (Linux-related), I decided to switch to Unraid. When I was running Windows on this server, the problems I mentioned weren't present.

gravity-diagnostics-20200324-1922.zip

2 hours ago, jaspervanisterdael said:

The shares I'm using for Docker and the VM's are Cache Only

Something appears to be wrong with your cache. I'm not well versed in the cache functionality yet, so I could be way off base here. If nobody else chimes in, check that your cables and cards are properly seated, and change the cables if you have spares. Also, I can't really tell, but your cache looks like it's RAID 1 and may be full; I'm not positive.

8 minutes ago, civic95man said:

Looks like you're using the on-board sata ports. Try changing cables or at least re-seating all of them as @Dissones4U suggested and maybe check the power connection to the drives/ssd. Should probably run a scrub afterwards on the cache.

I'll try this tomorrow. I'm just wondering: how can the SATA ports on my motherboard, where the drives are connected, be causing the problems? Is there any explanation for that?

1 minute ago, jaspervanisterdael said:

how can the SATA ports on my motherboard where I connected the drives to causing the problems? Is there any explanation for

SATA connectors are by their very nature a bad design and known to cause all kinds of reliability issues. Are you using cables with clips? I hear those are marginally better. It may not be the motherboard port; it could be the port on the drive(s) too.

 

As an example, my Windows computer began randomly rebooting, and after a week of troubleshooting I narrowed it down to a faulty SATA cable going to the SSD boot drive. That computer had been unopened and unmoved for over a year and worked flawlessly up until a few weeks ago. The cable or connection just decided to give up without notice. So I know it happens and have witnessed it.

 

On top of that, reseating the power and SATA cables is a relatively easy and cost-free troubleshooting step, so it never hurts to try it first. The next step would be swapping the cables for new ones if you have them.

 

Good luck


There have been similar errors on both cache devices:

Mar 14 19:45:59 Gravity kernel: BTRFS info (device sdd1): bdev /dev/sdd1 errs: wr 428, rd 482, flush 0, corrupt 0, gen 0
Mar 14 19:45:59 Gravity kernel: BTRFS info (device sdd1): bdev /dev/sdc1 errs: wr 295, rd 293, flush 0, corrupt 0, gen 0

 

Possibly there's a cable/connection problem on both, for example if they share a SATA splitter, or else some compatibility issue with that model, though that would be strange.
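For anyone who wants to pull these counters out of a syslog automatically, here is a minimal Python sketch. It assumes the kernel lines keep exactly the `bdev ... errs: wr N, rd N, flush N, corrupt N, gen N` layout quoted above; the function name `parse_bdev_errors` and the `SAMPLE_LOG` constant are made up for illustration and are not part of Unraid or btrfs-progs.

```python
import re

# Matches the "bdev <device> errs: ..." part of the kernel lines quoted above.
# Assumption: the field order and wording are the same as in those two lines.
BDEV_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def parse_bdev_errors(log_text):
    """Return {device: {counter_name: value}} for every matching log line."""
    result = {}
    for line in log_text.splitlines():
        match = BDEV_RE.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            # A later line for the same device overwrites the earlier one.
            result[dev] = {name: int(value) for name, value in fields.items()}
    return result

# The two lines from this thread, used as sample input.
SAMPLE_LOG = """\
Mar 14 19:45:59 Gravity kernel: BTRFS info (device sdd1): bdev /dev/sdd1 errs: wr 428, rd 482, flush 0, corrupt 0, gen 0
Mar 14 19:45:59 Gravity kernel: BTRFS info (device sdd1): bdev /dev/sdc1 errs: wr 295, rd 293, flush 0, corrupt 0, gen 0
"""

if __name__ == "__main__":
    for dev, counters in parse_bdev_errors(SAMPLE_LOG).items():
        print(dev, counters)
```

Write and read errors with zero corruption errors on both devices is what this sketch would report for the lines above, which fits the cable/connection theory rather than bad flash.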

 

Also see here for better cache pool monitoring.


@johnnie.black when I check the cache pool monitor, I see several errors:

 

root@Gravity:~# btrfs dev stats /mnt/cache
[/dev/sdd1].write_io_errs    473
[/dev/sdd1].read_io_errs     867
[/dev/sdd1].flush_io_errs    0
[/dev/sdd1].corruption_errs  0
[/dev/sdd1].generation_errs  0
[/dev/sdc1].write_io_errs    395
[/dev/sdc1].read_io_errs     771
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
root@Gravity:~# 
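If you want to check this output from a script instead of reading it by eye, something like the following Python sketch could work. It only parses text in the `[device].counter    value` layout shown above; the names `parse_dev_stats`, `nonzero_counters` and `STATS_OUTPUT` are hypothetical, and feeding it live output (for example by capturing `btrfs dev stats /mnt/cache` with `subprocess`) is left out here and would be an assumption about your setup.

```python
import re

# One line of `btrfs dev stats` output, e.g. "[/dev/sdd1].write_io_errs    473".
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")

def parse_dev_stats(text):
    """Return {device: {counter_name: value}} parsed from the stats text."""
    stats = {}
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            dev_counters = stats.setdefault(match.group("dev"), {})
            dev_counters[match.group("name")] = int(match.group("value"))
    return stats

def nonzero_counters(stats):
    """Keep only devices, and only counters, whose value is above zero."""
    return {
        dev: {name: value for name, value in counters.items() if value > 0}
        for dev, counters in stats.items()
        if any(value > 0 for value in counters.values())
    }

# The output posted in this thread, used as sample input.
STATS_OUTPUT = """\
[/dev/sdd1].write_io_errs    473
[/dev/sdd1].read_io_errs     867
[/dev/sdd1].flush_io_errs    0
[/dev/sdd1].corruption_errs  0
[/dev/sdd1].generation_errs  0
[/dev/sdc1].write_io_errs    395
[/dev/sdc1].read_io_errs     771
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
"""

if __name__ == "__main__":
    for dev, counters in nonzero_counters(parse_dev_stats(STATS_OUTPUT)).items():
        print(dev, counters)
```

Any device that shows up in the `nonzero_counters` result has logged errors since the counters were last reset, so an empty result is what you want to see after fixing the cabling.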

11 minutes ago, jaspervanisterdael said:

but its weird that both of the SSD's are giving errors.

It is. If it still happens with new cables (assuming you also replaced the power cables), it could be a compatibility issue with the board, though it would be a strange one, since the Intel ports are usually problem-free.
