Those who work on web-based applications on mobile platforms often recall the advice, “Use translate3d to make sure it’s hardware accelerated”. This advice seems magical at first, and I seldom find anyone who explains (or wants to explain) the actual machinery behind such a practical tip.
For a while, Safari (and Mobile Safari) was the only WebKit-based browser that supported hardware-accelerated CSS animation. Google Chrome then caught up, and QtWebKit-powered browsers (like the one in the Nokia N9) finally supported it as well. This situation often gave the wrong impression that Apple kept the hardware-acceleration code for themselves.
The above two are basically the reasons for this blog post.
In case you missed it (before we dive in further), please read what I wrote before about different WebKit ports (to get the idea of the implementation + back-end approach) and the tiled backing store (decoupling web page complexity from smooth UX). The GraphicsContext abstraction will be especially useful in this topic, because animation is tightly related to efficient graphics.
Imagine if you have to move an image (of a unicorn, for example) from one position to another. The pseudo-code for doing it would be:
for pos = startPosition to endPosition
    draw unicorn at pos
To ensure a smooth 60 fps, your inner loop has only about 16 ms (1000 ms / 60) to draw that unicorn image. Usually this is a piece of cake because all the CPU does is send the pixels of the unicorn image to the GPU once (in the form of a texture) and then just refer to that texture inside the animation loop. No heavy work is needed on either the CPU or the GPU side.
If, however, what you draw is very complicated, e.g. formatted text consisting of different font typefaces and sizes, this gets hairy. The “draw” part can take more than 16 ms and the animation is not butter-smooth anymore. Because your text does not really change during the animation (only the position changes), the usual trick is to cache the text, i.e. draw it onto a buffer and just move the buffer around as needed. Again, the CPU just needs to push the buffer to the GPU once:
prepare a temporary buffer
draw the text onto the buffer
for pos = startPosition to endPosition
    set a new transformation matrix for the buffer
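To make the difference between the two loops concrete, here is a minimal Python sketch. It is only a model: `render_text`, `animate_naive`, `animate_cached` and the `Stats` counter are hypothetical names invented for this illustration, and the "pixel buffer" is a plain list standing in for a real texture. It simply counts how often the expensive drawing step runs in each approach.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    expensive_draws: int = 0  # how many times the content was rasterized
    cheap_moves: int = 0      # how many times a cached buffer was repositioned


def render_text(stats):
    """Stand-in for the costly part: text layout and rasterization."""
    stats.expensive_draws += 1
    return ["formatted", "text"]  # pretend this is a pixel buffer


def animate_naive(start, end, stats):
    """Redraw the content from scratch at every position."""
    frames = []
    for pos in range(start, end + 1):
        pixels = render_text(stats)
        frames.append((pos, pixels))
    return frames


def animate_cached(start, end, stats):
    """Draw once into a buffer, then only change where the buffer goes."""
    buffer = render_text(stats)
    frames = []
    for pos in range(start, end + 1):
        stats.cheap_moves += 1  # models "set a new transformation matrix"
        frames.append((pos, buffer))
    return frames
```

For a 60-frame animation, the naive loop rasterizes the content 60 times, while the cached loop rasterizes once and then performs 60 cheap moves. Both produce the same sequence of frames.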
As you can imagine, that’s exactly what happens when WebKit performs CSS animation. Instead of drawing your div (or whatever you animate) multiple times in different positions, it prepares a layer and redirects the drawing there. After that, animation is a simple matter of manipulating the layer, e.g. moving it around. The WebKit term for this (useful if you comb the source code) is accelerated compositing.
Side note: Mozilla has the same concept, available since Firefox 4, called Layer.
If you understand immediate-mode vs retained-mode rendering, non-composited vs composited is just like that. The idea is to treat the render tree more like a scene graph: a stack of layers, each with a different depth value.
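The "stack of layers" idea can be sketched in a few lines of Python. This is a toy model, not WebKit's data structure: the `Layer` class and the `composite` function are hypothetical names used only for this illustration. Each layer keeps its own position and depth, and compositing just walks the layers from back to front.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    depth: int  # larger values are closer to the viewer
    x: int = 0
    y: int = 0


def composite(layers):
    """Return the draw order, back to front, with each layer's position."""
    ordered = sorted(layers, key=lambda layer: layer.depth)
    return [(layer.name, layer.x, layer.y) for layer in ordered]
```

In this retained-mode style, animating an element means changing `x` and `y` on its layer and compositing again. The content of the other layers is never touched.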
Because compositing reduces the computational burden (the GPU can handle varying transformation matrices efficiently), the animation is smoother. This is not so noticeable if you have a modern machine. For the video demo (http://youtu.be/KujWTTRkPkM), I had to use my slightly old Windows laptop to demonstrate the difference in frames per second.
The excellent falling leaves animation is something you have seen before, back when WebKit support for CSS animation was announced.
Accelerated compositing does not magically make every WebKit port capable of fluid animation. Analogous to my previous rounded corner example, compositing requires support from the underlying platform. On the Mac OS X port of WebKit, compositing is mapped onto CoreAnimation (part of the QuartzCore framework), the official API for animated user interfaces. The same goes for iOS WebKit. On Chromium, it is hooked into the sandboxed GPU process.
With QtWebKit, compositing is achieved via the Graphics View framework (read Noam’s explanation for details). The video above was created with QtWebKit, running without and with compositing, i.e. QGraphicsWebView with different AcceleratedCompositingEnabled run-time settings. If you want to check out the code and try it yourself, head to the usual X2 repository and look under webkit/composition. Use the spacebar (or a mouse click) to switch between composited and non-composited mode. If there is no significant frame rate improvement, increase NUMBER_OF_LEAVES in leaves.js and rebuild. When compositing is active, press D to draw a thin yellow border around each layer. Since it’s all about Graphics View, this debugging aid is easy to implement. I just inject a custom BorderEffect, based on QGraphicsEffect (which I prototyped back when I was with Nokia).

Thus, there is no hidden secret behind Safari’s hardware-accelerated CSS support. In fact, Safari is no different from other Mac apps. If you compile WebKit yourself and build an application with it, you would definitely get animation with hardware acceleration support.
As a bonus, since Mac and iOS WebKit delegate the animation to CoreAnimation (CA), you can use various CA tweaks to debug it. CA_COLOR_OPAQUE=1 will emphasize each layer with a red color overlay (as in the demo). While this applies to any CA-based app (not just WebKit or Safari), it’s still very useful. Chromium’s similar feature is the --show-composited-layer-borders command line option.
How does WebKit determine what to composite? Since the goal is to take full advantage of the GPU, only a few particular operations are suitable for compositing, among them transparency (opacity < 1.0) and transformation matrices. Ideally we would just use compositing for the entire web page. However, compositing implies higher memory allocation and a fairly capable graphics processor. On mobile platforms, these two translate into an additional critical factor: power consumption. Thus, one just needs to draw a line somewhere and stick with it. That is why currently (on iOS) translate3d and scale3d use compositing and their 2-D counterparts do not. Addendum: on desktop WebKit, every transformed element is accelerated, regardless of whether the transform is 2-D or 3-D.
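The "draw a line somewhere" rule for transforms can be written down as a small Python sketch. Please treat it as a simplified model of the behavior described above, not as WebKit's actual decision logic: the function names, the `"ios"` and `"desktop"` platform labels, and the idea of inspecting the CSS transform string are all assumptions made for illustration. Only the two 3-D functions named above are listed for iOS.

```python
import re

# Assumption: only the two functions named in the text are listed here.
IOS_COMPOSITED_FUNCTIONS = {"translate3d", "scale3d"}


def transform_functions(transform):
    """Extract function names from a CSS transform value, e.g. 'scale3d'."""
    return [name.lower() for name in re.findall(r"([A-Za-z0-9]+)\s*\(", transform)]


def is_composited(transform, platform):
    """Simplified model: does this transform get its own composited layer?"""
    functions = transform_functions(transform)
    if not functions:
        return False  # no transform at all
    if platform == "desktop":
        return True   # every transformed element is accelerated
    if platform == "ios":
        return any(name in IOS_COMPOSITED_FUNCTIONS for name in functions)
    raise ValueError(f"unknown platform: {platform}")
```

Under this model, `is_composited("translate(10px, 0)", "ios")` is false while the same movement written as `translate3d(10px, 0, 0)` is true, which is the whole basis of the translate3d advice.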
If you made it this far, here are a few final twists.
First of all, just like the tiled backing store approach I explained before, accelerated compositing does not force you to use the graphics processor for everything. For efficiency, your layer (backing store) might be mapped to GPU textures. However, you are not obligated to prepare the layer, i.e. draw onto it, using the GPU. As an example, you can use a software rasterizer to draw to a buffer which is then mapped to an OpenGL texture.
In fact, a further variation of this would be not to use the GPU at all. This may come as a surprise to you, but Android 2.2 (Froyo) added compositing support (see the commit), albeit doing everything in software (via its Skia graphics engine). The advantage is of course not that great (compared to using OpenGL ES entirely), but the improvement is still really obvious. If you have two Android phones (of the same hardware specification), one still running the outdated 2.1 (Eclair) and the other with Froyo, just open the Falling Leaves demo and watch the frame rate difference.
With the non-GPU, compositing-based CSS animation in Froyo, translate3d and other similar tricks do not speed up anything significantly. In fact, they may haunt you with bugs. For example, placing form elements in a div could wreck the accuracy of touch events, mainly because the hit test procedures forget to take into account that the composited layer has moved. Things which seem to work just fine in Eclair may start behaving weirdly under Froyo and Gingerbread. If that happens to you, check your CSS properties.
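The hit-testing problem is easy to reproduce in a toy model. The following Python sketch is not Android's code: `Rect`, `Layer`, `hit_test_buggy` and `hit_test_correct` are hypothetical names, and the layer transform is reduced to a plain translation. It shows how ignoring the layer's offset makes a touch land on the wrong spot.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: float
    y: float
    width: float
    height: float

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


@dataclass
class Layer:
    dx: float  # horizontal translation applied when compositing
    dy: float  # vertical translation applied when compositing


def hit_test_buggy(element, layer, px, py):
    """Forgets that the layer has moved: tests against the original position."""
    return element.contains(px, py)


def hit_test_correct(element, layer, px, py):
    """Maps the touch point back into the layer's own coordinates first."""
    return element.contains(px - layer.dx, py - layer.dy)
```

With a button at (10, 10) sized 100 by 40 inside a layer translated down by 200 pixels, a touch at (50, 220) is visually on the button. The buggy version misses it, and instead reports a hit at (50, 20), where the button used to be.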
Fortunately (or unfortunately, depending on your point of view), Android madness with accelerated compositing is getting better with Honeycomb and further upcoming releases. Meanwhile, just take it for granted that your magical translate3d spell has no effect on the green robots.
Last but not least, I’m pretty excited about the lightweight scene graph direction in the upcoming Qt 5. If anything, this will be a better match for QtWebKit accelerated compositing than the current Graphics View solution. This would totally destroy the myth (or misconception) that only native apps can take advantage of OpenGL (ES). Thus, if you decide to use web technologies via QtWebKit (possibly through the hybrid approach), your investment would be future-attractive!
Update: The incorrect term composition has been changed to the correct one, compositing.