javascript minifier and variable renaming

This is something I’ve already pointed out during my talk at SenchaCon 2010 last year as well as the recent Velocity 2011.

When you build a JavaScript-heavy web application, usually the final code is passed through a special JavaScript minifier in order to reduce the size without altering the program flow. Notable popular minifiers are JSMin, YUI Compressor, Closure Compiler, UglifyJS, and many more. Some minification tools out there also go an extra mile to rename variables (among others) so as to shorten the code even further. Of course, it is done in a such a way that the program would still work.

Let’s take a look at this trivial function:

function add(firstNumber, secondNumber) {
    return firstNumber + secondNumber;
}

Using Google Closure Compiler, e.g. the online version, one would get the optimized version as follows (UglifyJS and YUI Compressor will give the same result):

function add(a,b){return a+b};

As you can see, the new function does the same amazing job of adding two values passed to the function. The renaming of firstNumber to a and secondNumber to b, as well the extra whitespace elimination, does not have any effect whatsoever, but it does shorten the code.

Now let’s step back and see what a JavaScript engine does when it is about to run a certain script. The first step is the so-called lexical analysis, i.e. breaking down the string representing the code into a list of tokens. Because of that, it’s also called tokenization and the part which carries it out is often referred as the tokenizer.

Here we see that both function and return are keywords, our function arguments firstNumber and secondNumber are identifiers. There are also punctuators, shown above with the green highlight. The minified version will give the same exact list of tokens.

When the tokenizer sees a sequence of characters, e.g. function or firstNumber, it has to decide whether that string is a keyword (the former) or an identifier (the latter). It is obvious that if the character sequence is very long and very similar to a keyword, the tokenizer will have a hard time. For example, deciding whether instanceComponent is an identifier or not is slightly more difficult than just the single-letter a, not only because it is longer but also there is a keyword called instanceof.

Now here is my argument: we shall not blindly use the simple alphabet series (a, b, c, …) as the replacement names for the variables. The reason is as follows. If we have a string, whose first letter can never be the first of letter of any reserved words, then the tokenizer may know immediately that that string is an identifier (in this context). Thus, we save a few CPU cycles.

If we look at the latest ECMAScript language specification, in particular section 7.6.1, we see the list of keywords and future reserved keywords. The reserved words also include true, false, null, and we may want to throw undefined for the practical purpose.

Here is a list of characters which never start a reserved word:

a g h j k l m o p q x y z

If you want to involve all future keywords, such as let (which actually already in Firefox), package, etc, then the choices are slightly restricted to:

a g h j k o q x z

Let me know if I miss other keywords. Update: I removed m (for module), thanks to Rick and Thaddee!

Of course, this is a micro optimization. In most cases, the speed-up is negligible. It also depends on the implementation details, some tokenization implementation (perfect hash? I’m not sure about it) may not be affected by this cheap trick at all.

Having said that, if I create my own JavaScript minifier, am I going to pick this technique? Definitely!

javascript minifier and variable renaming

Related posts: