
New ZFS Pool, Freezing Issue. iowait?


Solved by JorgeB

I upgraded to 6.12.6 recently and set up a raidz2 pool with 8x 2TB Crucial MX500 SSDs. The pool and share work, but not without issues. I threw a bunch of videos on there, and sometimes I'm able to scrub through videos with ease, then all of a sudden 1-3 cores on my dashboard will be pegged at 100% and everything freezes for about 10-15 seconds.

 

It seems random too. For example, I can load up a 2hr ~10-20GB clip and scrub through with ease, then I'll load a much smaller sub-300MB clip and that causes things to freeze.

It's not just when I'm scrubbing, either. I can load a video and have it play normally, and all of a sudden everything freezes.

 

I loaded up `top` in the terminal and it seems like `wa` (which I think is the disk I/O wait?) will sit between 10-20% when this happens... I've seen it go over 20% once.
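For anyone wanting to log that `wa` figure over time instead of watching `top`, here is a minimal sketch that derives an iowait percentage from two samples of the aggregate `cpu` line in `/proc/stat`. The function names (`parse_cpu_line`, `iowait_percent`, `sample_iowait`) are made up for illustration, and the result should only roughly match what `top` shows, since `top` applies its own sampling and rounding.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters.

    Only the first eight counters are kept (user, nice, system, idle,
    iowait, irq, softirq, steal); the guest counters that follow are
    already included in user/nice and would be double counted.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Return iowait as a percentage of all CPU time between two samples."""
    start = parse_cpu_line(before)
    end = parse_cpu_line(after)
    deltas = [e - s for s, e in zip(start, end)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 4 is the iowait counter in /proc/stat field order.
    return 100.0 * deltas[4] / total


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two readings `interval` seconds apart on a Linux host."""
    def read_first_line():
        with open(path) as handle:
            return handle.readline()

    before = read_first_line()
    time.sleep(interval)
    after = read_first_line()
    return iowait_percent(before, after)
```

Calling `sample_iowait(1.0)` in a loop on the server and printing the result with a timestamp would make it easier to line the spikes up with the freezes.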

 

Is there anything I can do to help this? I've already enabled "Permit Exclusive Shares" as well as "Enable Disk Shares" from the global share settings. This helped with the speed of navigating the drive from macOS, as well as scrubbing performance, but I'm still seeing the freeze issues. 
 

I'm running an 8700K, 64GB RAM, and the 8 SSDs are connected to an LSI 9305 24i. I'm also connected via 10GbE. When running `iperf3` from my Mac to my Unraid server, I get around 4Gb/s. When running from Unraid to my Mac, I'm able to saturate the connection, so I'm not sure what's going on there.

 

— 
 

Just wanted to add: I've been sitting watching a YouTube video, not touching anything on my shares, no background tasks running (at least that I'm aware of), but I can see the network I/O on Unraid at a relatively constant 200-250kbps (200, spiking to 250 every 10 seconds or so). Then when I see one of the CPU cores get pegged at 100% on the dashboard, the outbound traffic stops momentarily, and when the CPU goes back to normal, the network I/O resumes what it was doing.


I should note that Docker and VMs have been turned off while I'm debugging, so that shouldn't be a factor.

  • Solution

There are constant HBA issues like this:

 

Jan 17 08:50:06 Tower kernel: mpt3sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
Jan 17 08:50:06 Tower kernel: mpt3sas_cm0: fault_state(0x2810)!

 

It keeps resetting. Make sure it's well seated and sufficiently cooled; you can also try a different PCIe slot.
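To check whether the resets are still happening after reseating or adding cooling, a small script can count the fault lines in the syslog. This is only a sketch: the regular expression is based on the two log lines quoted above, the helper name `count_hba_faults` is hypothetical, and other `mpt3sas` message formats are not covered.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: mpt3sas_cm0: fault_state(0x2810)!"
# capturing the controller name and the hexadecimal fault code.
FAULT_RE = re.compile(r"(mpt3sas_cm\d+): fault_state\((0x[0-9a-fA-F]+)\)")


def count_hba_faults(lines):
    """Count fault_state entries per (controller, fault code) pair."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# The two lines quoted in this thread, used as sample input.
sample = [
    "Jan 17 08:50:06 Tower kernel: mpt3sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready",
    "Jan 17 08:50:06 Tower kernel: mpt3sas_cm0: fault_state(0x2810)!",
]

for (controller, code), hits in count_hba_faults(sample).items():
    print(f"{controller} fault_state {code}: {hits} occurrence(s)")
```

To run it against a real log, pass an open file object in place of `sample`; the syslog location depends on the system and is not shown in this thread.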


I'm going to tentatively say this issue is fixed. I'm not sure exactly what fixed it, as I both reseated the HBA and zip-tied a 40mm Noctua fan onto it, but I don't appear to be getting these errors any more. I've just been scrubbing through random videos as fast as I can and it seems to be working.

Thank you! 
