ariya.io About Talks Articles

JavaScript branching and code shuffling

4 min read

Modern CPU is equipped with a branch predictor to help with optimizing program execution. For example, a loop typically exits only once and the flow is kept inside the loop for a while (it’s called a loop for a reason). Since the terminal condition to quit the loop is only hit once, the loop predictor assumes the path of going back into the loop and preempts the next step of execution.

Generalizing the concept to a higher level is certainly possible. For a decision branch where one path is likely taken more than the others, let’s just optimize for that (hence, the term fast code path). After all, engineering is nothing but setting the right compromise. In this case, we incline ourselves towards the fast case and penalize any other slow cases (hopefully only by a very small margin).

While profiling Esprima, I stumbled upon a case where fast code path optimization makes sense, see issue #171. Consider the simplified version of its primary expression parsing function (see ECMAScript specification section 11.1):

function parsePrimaryExpression() {
 
    if (match('[')) return parseArrayInitialiser();
    if (match('{')) return parseObjectInitialiser();
    if (match('(')) return parseBracketExpression();
 
    if (matchKeyword('function')) return parseFunctionExpression()
    if (matchKeyword('this')) return createThisExpression();
 
    if (match('/') || match('/=')) return createRegExpLiteral();
 
    var token = lex();
    if (token.type === Token.Identifier) return createIdentifier(token);
    if (token.type === Token.NullLiteral) return createNullLiteral();
    if (token.type === Token.NumericLiteral) return createNumericLiteral(token);
    if (token.type === Token.StringLiteral) return createStringLiteral(token);
    if (token.type === Token.BooleanLiteral) return createBooleanLiteral(token);
 
    return throwUnexpected(token);
}

The code should be pretty self-explanatory. All the ‘match’ functions look ahead the next token. While logically nothing is wrong with the code, the profiler outcome suspiciously provoked me into thinking that this function should be biased towards the most common cases. It is hard to believe that primary expressions in a typical JavaScript programs are mostly about arrays and object literals, yet those cases are inspected first.

Just like any profiling strategy, assumption needs to be verified with experiments. By inserting a simple extra line:

console.log(tokenTypeAsString(token));

and running the parser with the corpus of the benchmarks suite, which includes well-known libraries from jQuery to Backbone, I got some pretty exciting result, in particular after filtering it with the pipes:

sort | uniq -c | sort -rn

The result is as follows:


or, for number freaks:

54362 Identifier
10419 Keyword
 8170 String
 5820 Punctuator
 5213 Numeric
 1575 Boolean
  909 Null

This means that 63% of the time, the function has to deal with identifiers and yet it has to pass several hoops until it comes to the point of processing one! Once the hot spot is identified (no pun intended), the remedy is rather straightforward (the order of various checks after the identifier fast path apparently does not matter much):

function parsePrimaryExpression() {
    var token = lookahead();
 
    if (token.type === Token.Identifier) return createIdentifier(token);
    if (token.type === Token.NumericLiteral) return createNumericLiteral(token);
    if (token.type === Token.StringLiteral) return createStringLiteral(token);
 
    if (matchKeyword('this')) return createThisExpression();
    if (matchKeyword('function')) return parseFunctionExpression()
 
    if (token.type === Token.BooleanLiteral) return createBooleanLiteral(token);
    if (token.type === Token.NullLiteral) return createNullLiteral();
 
    if (match('[')) return parseArrayInitialiser();
    if (match('{')) return parseObjectInitialiser();
    if (match('(')) return parseBracketExpression();
 
    if (match('/') || match('/=')) return createRegExpLiteral();
 
    return throwUnexpected(lex());
}

While branch predictor can help reducing the impact of slow paths, the dynamic nature of JavaScript often does not make the life of the JavaScript engine easy. For example, it needs to be careful to preempt the condition check because match function may do crazy thing such as changing the implementation of matchKeyword function! Modern engines are getting smarter these days and can collect enough data flow to exclude such a rare dynamic run-time patching, however nothing is easier than helping the JavaScript engines to take care of the common cases first.

Just like previous trick to keep the object structure unchanged, be careful and please optimize responsibly. I highly recommend doing the initial profiling to ensure that you are dealing with the hot code, executed a bajillion times (like in the Esprima example above). If the suspicious branch is only hit a dozen times, do not worry about that. The purpose of high-level programming language is to free you from fiddling around with micro-optimization unless you reach the point where you absolutely need to.

This approach also shows that there is a requirement to have a representative benchmark suite for your use cases, way before you go crazy with any types of optimizations. Avoid creating a simplistic loop to test your theory. Apply the code reorganization with a set of real-world stress tests and monitor the impact under various conditions. Code shuffling is fun but don’t forget to lather, rinse, and repeat throughly!

Related posts:

♡ this article? Explore more articles and follow me Twitter.

Share this on Twitter Facebook