Primavera updates of Esprima

It’s spring, it’s primavera! Meanwhile, all your code are belong to us.

On a more serious note, it’s been a while since I announced Esprima, the lightning-fast ECMAScript parser project. I already wrote a bit about the idea behind its development strategies, in particular since Esprima was one of my main FOSS focuses last year.

Now, after two months, what kind of new goodies can you get from it? The executive summary (TL;DR) is the following list:

Improved browser compatibility
Optional collection of comments
Location information for all syntax nodes
Source code modification
Experimental code generation

One parser to rule them all

Let me start with browser support. There have been lots of fixes and tweaks so that Esprima finally also runs on less-than-modern web browsers, as old as Internet Explorer 6, Safari 3, and Opera 8. Of course, any version of Firefox and Chrome will handle Esprima just fine. However, keep in mind that older generations of browsers were not equipped with a fast JavaScript engine (nothing we can do about it) and therefore the performance of Esprima parsing may not be as fantastic as you would want it to be.

Speaking of browsers, the recent releases of Chrome 17 and Firefox 10 pack some serious speed improvements in JavaScript execution. Since Esprima also received minor tuning here and there in order to leverage features like type inference and property-access optimization, these browsers highlight the performance boost as well. An updated speed comparison of the Esprima parser with its closest competitor, parse-js from the well-known UglifyJS project, running the benchmark suite is shown in the following chart:

The test machine is the same as in the last comparison: an iMac from late 2010, with a 3 GHz Intel Core i3. Don’t be surprised if your shiny 2012 computer shows an even more impressive performance. It’s not uncommon to see the parser consume the entire jQuery source code (and not the minified version) in just 50 ms. Expressed differently, in one second the 240 KB of jQuery code can be parsed 20 times.

Document it, and they will come

One use case I had in mind when I started working on Esprima is to serve as the basis for a documentation tool. Since such a tool relies heavily on the source code comments, as a form of annotation, Esprima has to be able to optionally keep all the comments in the code. This mechanism is now in place: every comment found in the source is collected in an array, along with its location. As a quick example, the following code:

// Hello, world!
42

will yield a syntax tree which looks like (pay attention to the comments block):

{
    type: 'Program',
    body: [{
        type: 'ExpressionStatement',
        expression: {
            type: 'Literal',
            value: 42
        }
    }],
    comments: [{
        range: [0, 16],
        type: 'Line',
        value: ' Hello, world!'
    }]
}

Both types of comment, line (// ...) and block (/* .. */), are supported.

In the future, there will be improvements to this comment handling (see issue 71). For example, each comment should be attached to the nearest syntax node so that it is easier to match an annotation with the function declaration it documents.
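Until that lands, the pairing can be approximated by hand. The following is only a sketch under stated assumptions: attachComments is a hypothetical helper (not part of Esprima), the nodes are given as a flat list that carries range information, "nearest" is simplified to "the first node starting after the comment", and the node range in the sample data is made up for illustration.

```javascript
// Hypothetical helper, not an Esprima API: pair each collected comment
// with the first node that starts after the comment ends.
function attachComments(comments, nodes) {
    // Work on a copy sorted by start index so the input stays untouched.
    var sorted = nodes.slice().sort(function (a, b) {
        return a.range[0] - b.range[0];
    });
    return comments.map(function (comment) {
        var target = null, i;
        for (i = 0; i < sorted.length; i += 1) {
            if (sorted[i].range[0] > comment.range[1]) {
                target = sorted[i];
                break;
            }
        }
        return { comment: comment.value, node: target };
    });
}

// Hand-written data in the shape shown above, for "// Hello, world!\n42".
// The node range [17, 18] is an assumption made for this illustration.
var comments = [{ type: 'Line', value: ' Hello, world!', range: [0, 16] }];
var nodes = [{ type: 'ExpressionStatement', range: [17, 18] }];
var pairs = attachComments(comments, nodes);
```

A comment with no node after it ends up paired with null, which a documentation tool could treat as a free-standing remark.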

One’s destination is never a place

Location info can be added to every syntax node. This can take the form of an index-based range. For example, this source string:

(1 + 2 ) * 3

will generate a syntax tree which has the following:

{
    type: 'ExpressionStatement',
    expression: {
        type: 'BinaryExpression',
        operator: '*',
        left: {
            type: 'BinaryExpression',
            operator: '+',
            left: {
                type: 'Literal',
                value: 1,
                range: [1, 1]
            },
            right: {
                type: 'Literal',
                value: 2,
                range: [5, 5]
            },
            range: [0, 7]
        },
        right: {
            type: 'Literal',
            value: 3,
            range: [11, 11]
        },
        range: [0, 11]
    },
    range: [0, 11]
}

Note the presence of the range array: the numbers represent the zero-based indices where the node starts and ends (inclusive). This is quite powerful since you can then refer back to the original source string to find the part associated with a syntax node. We’ll see shortly how this can be beneficial to us.
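To make the "refer back to the original source" part concrete, here is a minimal sketch. The helper name sourceOf is hypothetical, and the tree is written by hand to match the example above rather than produced by the parser. It assumes the inclusive end index described in this post; if your parser reports an exclusive end, drop the + 1.

```javascript
// Hypothetical helper: return the part of `source` covered by `node`,
// assuming an inclusive [start, end] range.
function sourceOf(source, node) {
    return source.slice(node.range[0], node.range[1] + 1);
}

var source = '(1 + 2 ) * 3';

// Hand-written copy of the expression node from the example above.
var expression = {
    type: 'BinaryExpression',
    operator: '*',
    left: {
        type: 'BinaryExpression',
        operator: '+',
        left: { type: 'Literal', value: 1, range: [1, 1] },
        right: { type: 'Literal', value: 2, range: [5, 5] },
        range: [0, 7]
    },
    right: { type: 'Literal', value: 3, range: [11, 11] },
    range: [0, 11]
};

var leftText = sourceOf(source, expression.left);   // "(1 + 2 )"
var rightText = sourceOf(source, expression.right); // "3"
```

Note that the left operand comes back with its parentheses and the odd spacing intact, exactly as it was typed.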

For compatibility with the Mozilla (SpiderMonkey) Parser API, location info based on line and column number is also supported. If this is specified, then the syntax tree for the above example source will look like:

 
{
    type: 'ExpressionStatement',
    expression: {
        type: 'BinaryExpression',
        operator: '*',
        left: {
            type: 'BinaryExpression',
            operator: '+',
            left: {
                type: 'Literal',
                value: 1,
                loc: {
                    start: { line: 1, column: 1 },
                    end: { line: 1, column: 2 }
                }
            },
            right: {
                type: 'Literal',
                value: 2,
                loc: {
                    start: { line: 1, column: 5 },
                    end: { line: 1, column: 6 }
                }
            },
            loc: {
                start: { line: 1, column: 0 },
                end: { line: 1, column: 8 }
            }
        },
        right: {
            type: 'Literal',
            value: 3,
            loc: {
                start: { line: 1, column: 11 },
                end: { line: 1, column: 12 }
            }
        },
        loc: {
            start: { line: 1, column: 0 },
            end: { line: 1, column: 12 }
        }
    },
    loc: {
        start: { line: 1, column: 0 },
        end: { line: 1, column: 12 }
    }
}

Line number and column index are very useful when giving a message back to the user. For example, the regex collector demo tracks every node which represents a regular expression and then uses the location info to let the user know where each regex is to be found.
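The gist of such a collector can be sketched in a few lines. This is not the code of the actual demo: traverse and collectRegexes are hypothetical helpers, the tree is written by hand for the source string var re = /ab+c/; and only the regex literal carries loc info to keep the sample short.

```javascript
// Hypothetical helper: call `visitor` on every syntax node in the tree.
function traverse(node, visitor) {
    if (Array.isArray(node)) {
        node.forEach(function (child) { traverse(child, visitor); });
        return;
    }
    // Anything without a string `type` (numbers, loc objects, ...) is skipped.
    if (!node || typeof node.type !== 'string') {
        return;
    }
    visitor(node);
    Object.keys(node).forEach(function (key) {
        traverse(node[key], visitor);
    });
}

// Hypothetical helper: describe where each regular expression literal sits.
function collectRegexes(tree) {
    var found = [];
    traverse(tree, function (node) {
        if (node.type === 'Literal' && node.value instanceof RegExp) {
            found.push('Regex ' + String(node.value) + ' at line ' +
                node.loc.start.line + ', column ' + node.loc.start.column);
        }
    });
    return found;
}

// Hand-written tree for: var re = /ab+c/;
var tree = {
    type: 'Program',
    body: [{
        type: 'VariableDeclaration',
        kind: 'var',
        declarations: [{
            type: 'VariableDeclarator',
            id: { type: 'Identifier', name: 're' },
            init: {
                type: 'Literal',
                value: /ab+c/,
                loc: {
                    start: { line: 1, column: 9 },
                    end: { line: 1, column: 15 }
                }
            }
        }]
    }]
};

var messages = collectRegexes(tree);
```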

Note that index and line/column are not mutually exclusive; both can be enabled together.

Nothing is permanent except change

Getting a complete syntax tree, along with the exact position of every node, gives you the power of non-destructive code modification. An example of such modification is adding a generated prolog at every function entrance. Being able to instrument function invocation like this permits advanced probing; see my previous blog posts on run-time analysis of complexity and execution tracking during application startup for the details.

The trick is rather simple. Given the following function:

Array.prototype.swap = function (i, j) {
    var k = this[i]; this[i] = this[j]; this[j] = k;
}

it is automatically modified to be:

Array.prototype.swap = function (i, j) {
Log({ name: 'Array.prototype.swap', lineNumber: 1, range: [23, 94] });
    var k = this[i]; this[i] = this[j]; this[j] = k;
}

The extra call to that Log function, which is totally customizable, is the key. This is accomplished via the new Esprima modification API.

As of now, only the FunctionEntrance modifier is available. However, I do expect that several built-in modifiers will be at your disposal some time soon. The modification API itself is pretty generic, so it’s rather easy to plug in your own modifier implementation.

Note that this modification approach only touches the relevant part of the source code. This is quite important because often you don’t want any automatic tool to mangle your specially formatted code or change the various indentations. Since the goal is partial source modification, each modifier shall stick to whatever syntax node it is actively targeting. Now you should understand why location info, mentioned before, is extremely important.
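The principle can be illustrated without the modification API at all. The sketch below is not how the FunctionEntrance modifier is implemented; instrumentFunctionEntrance is a hypothetical helper that assumes the inclusive range of the function is already known from the syntax tree, and that the first opening brace inside that range starts the function body.

```javascript
// Hypothetical helper: insert `text` on its own line right after the
// opening brace of the function located at the inclusive `range`.
function instrumentFunctionEntrance(source, range, text) {
    var brace = source.indexOf('{', range[0]);
    if (brace < 0 || brace > range[1]) {
        return source; // no body found inside the range, leave it untouched
    }
    return source.slice(0, brace + 1) + '\n' + text + source.slice(brace + 1);
}

var original = [
    'Array.prototype.swap = function (i, j) {',
    '    var k = this[i]; this[i] = this[j]; this[j] = k;',
    '}'
].join('\n');

var instrumented = instrumentFunctionEntrance(
    original,
    [23, 94],
    "Log({ name: 'Array.prototype.swap', lineNumber: 1, range: [23, 94] });"
);
```

Everything outside the insertion point is copied verbatim, so the original formatting and indentation survive.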

One generation plants the trees, another gets the shade

While Esprima was started with a parser project in mind, one thing which was waiting to happen is the opposite of parsing. What if we carefully construct a syntax tree representing the logic we want to accomplish and then the corresponding code will appear? Such a feature will let us do more crazy things!

That’s exactly what code generation is all about. This is still a work in progress (track issue 89). The initial implementation of the Esprima code generator already allows you to do some useful things.

First, you can create the syntax tree yourself. Esprima expects an input formatted according to the Mozilla (SpiderMonkey) Parser API. Even a partial or incomplete sub-tree will work (to a certain extent). A rather simplistic example follows.

esprima.generate({
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Literal', value: 40 },
    right: { type: 'Literal', value: 2 }
});

The result of the above is as expected:

40 + 2

If you create a more sophisticated expression, the code generator understands the operator precedence very well (read its design idea) and produces sensible output. For example, it won’t insert unnecessary parentheses, since it takes the expected order of evaluation into account.
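To give a feeling for how precedence-aware output can work, here is a toy generator. It is a sketch only and not the Esprima code generator: the function name toSource and the PRECEDENCE table are assumptions made for this illustration, and only Literal and BinaryExpression nodes with four left-associative arithmetic operators are handled.

```javascript
// Binding strength of the supported operators (higher binds tighter).
var PRECEDENCE = { '+': 1, '-': 1, '*': 2, '/': 2 };

// Hypothetical toy generator for Literal and BinaryExpression nodes.
function toSource(node, parentPrecedence, isRightOperand) {
    var current, code, needsParens;
    if (node.type === 'Literal') {
        return JSON.stringify(node.value);
    }
    if (node.type !== 'BinaryExpression' || !PRECEDENCE[node.operator]) {
        throw new Error('Unsupported node: ' + node.type);
    }
    current = PRECEDENCE[node.operator];
    code = toSource(node.left, current, false) + ' ' + node.operator + ' ' +
        toSource(node.right, current, true);
    // Parenthesize when the parent binds tighter, or binds equally and this
    // node is the right operand of a left-associative operator.
    needsParens = current < (parentPrecedence || 0) ||
        (current === parentPrecedence && isRightOperand === true);
    return needsParens ? '(' + code + ')' : code;
}

function lit(value) {
    return { type: 'Literal', value: value };
}

function bin(operator, left, right) {
    return { type: 'BinaryExpression', operator: operator, left: left, right: right };
}

var product = toSource(bin('*', bin('+', lit(1), lit(2)), lit(3))); // "(1 + 2) * 3"
var sum = toSource(bin('+', lit(1), bin('*', lit(2), lit(3))));     // "1 + 2 * 3"
var diff = toSource(bin('-', lit(1), bin('-', lit(2), lit(3))));    // "1 - (2 - 3)"
```

The parentheses appear only where dropping them would change the meaning; the tree itself never stores them.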

Another use case is code regeneration after some syntax tree modification. This is different from the previous partial modification approach. By regenerating the source code, the modification is rather destructive, i.e. the outcome may or may not resemble the original. It is still very powerful, in a different way.

The snippet below is a representative example:

var syntax = esprima.parse('answer = 42;');
syntax.body[0].expression.right.value = 1337;
esprima.generate(syntax)

What would you get? Surprisingly, the string:

answer = 1337;

Pretty neat! And if you’re thinking what we’re thinking, there will be a few more tools which can be derived from this idea. Let’s keep the suspense and save it for another blog post.

To boldly go…

The above (rather long) explanation outlines all the recent highlights of Esprima development. In addition, there are also the usual tons of bug fixes and standard conformance improvements. For more info, always visit its website esprima.org, project page, and issues dashboard. Feedback can also be sent to the mailing list. Contributions are always welcome; just follow the contribution guide.

Last but not least, special thanks to Yusuke Suzuki, Kris Kowal, Joost-Wim Boekesteijn, and Arpad Borsos for their important contributions in the last two months.