The Q calculation is fairly expensive, especially on less capable CPUs.  The kernel ships a number of hand-coded assembly functions to do the math, benchmarks them at boot, and uses whichever one measures fastest.  Refer to: http://lxr.free-electrons.com/source/lib/raid6/algos.c?v=4.4   In that kernel version, the best performance on x86 is had when your CPU supports the AVX2 instruction set: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX2
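
To give a feel for why Q costs more than P, here is a minimal Python sketch of the two syndromes for one stripe. It is not the kernel's code; the function names `gf_mul2` and `pq_syndromes` are made up for illustration. It assumes the usual RAID-6 convention of arithmetic in GF(2^8) with the polynomial 0x11d and generator 2, and it evaluates Q with Horner's rule, one byte at a time. The optimized kernel routines do the same kind of work on many bytes per instruction.

```python
def gf_mul2(x):
    """Multiply one byte by 2 in GF(2^8), reducing by polynomial 0x11d."""
    x <<= 1
    if x & 0x100:        # the high bit overflowed, so reduce
        x ^= 0x11d
    return x & 0xff


def pq_syndromes(data_blocks):
    """Return (P, Q) for a list of equal-length data blocks (bytes).

    P is the plain XOR of all blocks.
    Q is the sum over i of 2**i * D_i in GF(2^8), computed with
    Horner's rule starting from the highest-numbered block.
    """
    n = len(data_blocks[0])
    p = bytearray(n)
    q = bytearray(n)
    for block in reversed(data_blocks):
        for i in range(n):
            p[i] ^= block[i]                   # P: one XOR per byte
            q[i] = gf_mul2(q[i]) ^ block[i]    # Q: multiply by 2, then XOR
    return bytes(p), bytes(q)


# Hypothetical one-byte "disks", just to exercise the functions.
p, q = pq_syndromes([b"\x01", b"\x02", b"\x80"])
```

P needs a single XOR per byte, while Q needs a shift, a conditional reduction, and an XOR per byte, which is the extra work that vector instructions such as AVX2 speed up.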