Some time back I wrote (fun fact: it’s the first Google hit for “faster quaternion multiplication”) about my favorite commit, which I made to Qt exactly two years ago:
git show cbc22908
commit cbc229081a9df67a577b4bea61ad6aac52d470cb
Author: Ariya Hidayat
Date:   Tue Jun 30 11:18:03 2009 +0200

    Faster quaternion multiplications.

    Use the known factorization trick to speed-up quaternion multiplication.
    Now we need only 9 floating-point multiplications, instead of 16 (but
    at the cost of extra additions and subtractions).
Ages ago, during my Ph.D. research, when I worked with a certain hardware platform (hint: it’s not a general-purpose CPU), minimizing the number of hardware multipliers needed, with very little impact on computation speed, made a huge difference. With today’s advanced processor architectures, armed with vectorized instructions and really smart optimizing compilers, there is often no need to use the factorized version of the multiplication.
Side note: if you want to get to like quaternions, see this simple rotation quiz, which can be solved quite easily once you know quaternions.
I tried to apply the same trick to PhiloGL, an excellent WebGL framework from Nicolas. Recently, to my delight, he added quaternion support to the accompanying math library in PhiloGL. I thought this was a nice chance to try the old trick, as I expected that reducing the number of multiplications from 16 to just 9 could give a slight performance advantage.
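For readers who have not seen the trick, here is a minimal sketch of both variants in plain JavaScript. It assumes quaternions are simple objects with `w`, `x`, `y`, `z` fields; the function names `mulNaive` and `mulFactorized` are made up for this illustration and are not PhiloGL's API. The factorized version is one known 9-multiplication arrangement, not necessarily character-for-character what ended up in Qt or in the benchmark.

```javascript
// Textbook Hamilton product: 16 multiplications, 12 additions/subtractions.
function mulNaive(a, b) {
  return {
    w: a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
    x: a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
    y: a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
    z: a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w
  };
}

// Factorized product: 9 multiplications (8 products of sums plus one
// multiplication by 0.5), paid for with many more additions/subtractions.
function mulFactorized(a, b) {
  var ww = (a.z + a.x) * (b.x + b.y);
  var yy = (a.w - a.y) * (b.w + b.z);
  var zz = (a.w + a.y) * (b.w - b.z);
  var xx = ww + yy + zz;
  var qq = 0.5 * (xx + (a.z - a.x) * (b.x - b.y));

  return {
    w: qq - ww + (a.z - a.y) * (b.y - b.z),
    x: qq - xx + (a.x + a.w) * (b.x + b.w),
    y: qq - yy + (a.w - a.x) * (b.y + b.z),
    z: qq - zz + (a.z + a.y) * (b.w - b.x)
  };
}

// Both calls give w = -60, x = 12, y = 30, z = 24.
var q1 = { w: 1, x: 2, y: 3, z: 4 };
var q2 = { w: 5, x: 6, y: 7, z: 8 };
console.log(mulNaive(q1, q2), mulFactorized(q1, q2));
```

Note how the factorized version trades 7 multiplications for a much longer chain of additions and intermediate values, which is exactly the trade-off the commit message mentions.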
It turns out that this is not the case, at least based on benchmark tests running on modern browsers with very capable JavaScript engines. You can try the test yourself at jsperf.com/quaternion-multiplication. I have no idea whether this is due to JSPerf (very unlikely) or simply because the longer construct of the factorized version does not really speed up anything. If anything, it seems that the number of executed instructions matters more than whether addition is much faster than multiplication. And of course, we’re talking about modern CPUs, where the difference between the two is becoming more subtle.
With the help of Nicolas, I tried various other tricks to help the JavaScript engine, mainly around different ways to prepare the persistent temporary variables: using normal properties, using Array, and using Float32Array (at the cost of precision). Nothing led to any significant improvement.
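To make the "cost of precision" remark concrete, here is a small sketch. The helper `productW` and its scratch-buffer parameter are hypothetical and only illustrate the idea of caller-supplied temporary storage; the actual benchmark variants may have been organized differently. A regular `Array` keeps intermediate values as doubles, while a `Float32Array` rounds every value to single precision when it is stored.

```javascript
// Hypothetical helper: compute only the w component of a quaternion product,
// keeping the four partial products in caller-supplied scratch storage.
function productW(a, b, scratch) {
  scratch[0] = a.w * b.w;
  scratch[1] = a.x * b.x;
  scratch[2] = a.y * b.y;
  scratch[3] = a.z * b.z;
  return scratch[0] - scratch[1] - scratch[2] - scratch[3];
}

var p = { w: 0.1, x: 0.2, y: 0.3, z: 0.4 };
var q = { w: 0.5, x: 0.6, y: 0.7, z: 0.8 };

// Scratch storage as a plain Array: values stay in double precision.
var viaArray = productW(p, q, [0, 0, 0, 0]);

// Scratch storage as a Float32Array: each store rounds to single precision.
var viaFloat32 = productW(p, q, new Float32Array(4));

// The rounding on store is easy to observe directly.
var single = new Float32Array(1);
single[0] = 0.1;
console.log(single[0] === 0.1);              // false: 0.1 was rounded to float32
console.log(single[0] === Math.fround(0.1)); // true

console.log(viaArray, viaFloat32, Math.abs(viaArray - viaFloat32));
```

The difference between the two results is tiny for a single product, but it can accumulate when many rotations are chained together.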
Of course, if you have other tricks up your sleeve, I welcome you to try them with the benchmark. Meanwhile, let’s hope that someday some JavaScript engine will run the factorized version faster. It’s just a much cooler way to multiply quaternions!