JavaScript Source Transformation: Non-Destructive vs Regenerative

Transforming JavaScript source code is essential to building JavaScript tools, from minifiers to transpilers. There are two techniques for doing the transformation: non-destructive modification of the original source, or full regeneration from the syntax tree. They serve different purposes and tend to complement each other.

With both strategies, the original source needs to be parsed first. This can be easily carried out using a parser. After that, what happens to the produced syntax tree depends on the chosen approach, as summarized by the following diagram.


In the case of non-destructive modification, we use the location information of the syntax nodes and/or tokens to find out what needs to be tweaked. An obvious example is changing string literal quotes from single quotes to double quotes (or the other way around). By locating every string literal, we know where the quotes are and can simply replace them in place (note that additional escaping might be necessary, since this is about strings).
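The quote conversion can be sketched roughly as follows. This is only an illustration under a few assumptions: toDoubleQuotes is a hypothetical helper, and each string literal is assumed to arrive with a range of [start, end] character offsets, the kind of location information an ESTree-style parser can attach to tokens. The ranges here are written by hand to stand in for parser output.

```javascript
// Hypothetical helper (not from any library): convert single-quoted string
// literals to double quotes, editing only the characters inside each range.
function toDoubleQuotes(source, literals) {
  // Work from the last literal to the first so earlier offsets stay valid.
  const sorted = literals.slice().sort((a, b) => b.range[0] - a.range[0]);
  let output = source;
  for (const { range: [start, end] } of sorted) {
    const raw = output.slice(start, end);
    if (raw[0] !== "'") continue; // not single-quoted, leave it alone
    // Walk the body one escape sequence at a time: \' loses its backslash,
    // a bare " gains one, every other escape is copied unchanged.
    const body = raw.slice(1, -1).replace(/\\([\s\S])|"/g, (match, escaped) => {
      if (match === '"') return '\\"';
      return escaped === "'" ? "'" : match;
    });
    output = output.slice(0, start) + '"' + body + '"' + output.slice(end);
  }
  return output;
}

const source = "var msg = 'hi';  // keep 'this' comment";
const literals = [{ range: [10, 14] }]; // assumed to come from a parser
console.log(toDoubleQuotes(source, literals));
// prints: var msg = "hi";  // keep 'this' comment
```

Note that the comment and the double space before it survive untouched, because only the characters inside the literal's range are rewritten.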

Simple syntax transpilation is another typical use case. For example, if we want to use the ECMAScript 6 block scope feature today, we need to transform the code (e.g. using defs.js) to run in an ECMAScript 5 environment. This is about converting let to var (taking proper scoping into account, of course).
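To give a feel for the let-to-var case, here is a minimal sketch and not how defs.js is actually implemented. It assumes that scope analysis has already produced a list of edits, each with a range and replacement text. The applyEdits helper, the hand-written edit list, and the x$1 naming are all hypothetical.

```javascript
// Hypothetical helper: apply { range: [start, end], text } edits to a source
// string. Edits are applied back to front so the offsets remain valid.
function applyEdits(source, edits) {
  const sorted = edits.slice().sort((a, b) => b.range[0] - a.range[0]);
  return sorted.reduce(
    (out, { range: [start, end], text }) =>
      out.slice(0, start) + text + out.slice(end),
    source
  );
}

const es6 = 'let x = 1;\n{\n  let x = 2; // inner\n  use(x);\n}';

// Assumed result of scope analysis: both `let` keywords become `var`, and the
// inner x is renamed so that it does not clash with the outer one.
const edits = [
  { range: [0, 3], text: 'var' },
  { range: [15, 18], text: 'var' },
  { range: [19, 20], text: 'x$1' },
  { range: [41, 42], text: 'x$1' }
];

console.log(applyEdits(es6, edits));
```

The renaming is what "proper scoping" amounts to in this sketch: a plain keyword swap would merge the two x variables into one function-scoped binding.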

The advantage of non-destructive transformation is that we do not lose the many important parts of the original source which do not affect the syntax and execution. For example, converting double quotes to single quotes means that the existing indentation, comments, etc. are not touched at all. The modification tool changes only the parts it is interested in and ignores everything else.

If we are building a tool which does not care about the original source, then it is obviously easier to just regenerate the source from the syntax tree. For example, a minifier produces a new source which is semantically equivalent to the syntax tree, but without the extra white space. In many cases, the minifier may also shorten variable names, remove unused code, and apply many other tweaks to the syntax nodes. This way, the code becomes shorter (in terms of bytes) without affecting its execution. Such a minifier will not care about the original indentation and comments.
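Regeneration can be illustrated with a toy code generator. The sketch below assumes an ESTree-style syntax tree and handles only a handful of node types. The generate function is hypothetical, a real generator covers the full language, and the tree is built by hand instead of coming from a parser.

```javascript
// Hypothetical mini generator: emit compact source from an ESTree-style tree.
function generate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generate).join('');
    case 'VariableDeclaration':
      return node.kind + ' ' + node.declarations.map(generate).join(',') + ';';
    case 'VariableDeclarator':
      return generate(node.id) + (node.init ? '=' + generate(node.init) : '');
    case 'BinaryExpression': {
      // Always parenthesize nested binary expressions: not minimal, but it
      // keeps the grouping correct without a precedence table.
      const operand = (n) =>
        n.type === 'BinaryExpression' ? '(' + generate(n) + ')' : generate(n);
      // Word operators such as `in` need spaces around them.
      const op = /^[a-z]/.test(node.operator)
        ? ' ' + node.operator + ' '
        : node.operator;
      return operand(node.left) + op + operand(node.right);
    }
    case 'Identifier':
      return node.name;
    case 'Literal':
      return JSON.stringify(node.value); // strings, numbers, booleans, null only
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

// Hand-built tree for:  var answer = 6 * 7;   // with any comments dropped
const tree = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'var',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'answer' },
      init: {
        type: 'BinaryExpression',
        operator: '*',
        left: { type: 'Literal', value: 6 },
        right: { type: 'Literal', value: 7 }
      }
    }]
  }]
};

console.log(generate(tree)); // prints: var answer=6*7;
```

Since the output is built purely from the tree, whatever formatting and comments the original source had are simply never consulted.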

For the purpose of code coverage analysis, instrumentation is the first necessary step. A coverage tool like Istanbul will sprinkle its instrumentation code around the syntax nodes so that it can keep track of all statements and branches hit by the JavaScript interpreter. The instrumenter is another perfect use case for code regeneration. After it adds the extra instrumentation syntax nodes, the newly generated code is the one which is going to be executed. At this stage, nobody cares about the formatting, indentation, and other cosmetic aspects.
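As a rough illustration of adding instrumentation nodes (this is not Istanbul's actual output or API), the sketch below puts a counter increment in front of every top-level statement of an ESTree-style Program. The instrument and counterNode functions and the __cov counter name are made up for this example. Nested statements and branches are ignored.

```javascript
// Hypothetical builder for the syntax node of `__cov[index]++`.
function counterNode(index) {
  return {
    type: 'ExpressionStatement',
    expression: {
      type: 'UpdateExpression',
      operator: '++',
      prefix: false,
      argument: {
        type: 'MemberExpression',
        computed: true,
        object: { type: 'Identifier', name: '__cov' },
        property: { type: 'Literal', value: index }
      }
    }
  };
}

// Return a new Program whose body interleaves counters and the original
// top-level statements; the input tree itself is left unmodified.
function instrument(program) {
  const body = [];
  program.body.forEach((statement, index) => {
    body.push(counterNode(index), statement);
  });
  return Object.assign({}, program, { body });
}

// Hand-built tree standing in for parser output of:  run();  stop();
const call = (name) => ({
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: { type: 'Identifier', name },
    arguments: []
  }
});
const program = { type: 'Program', body: [call('run'), call('stop')] };

const instrumented = instrument(program);
console.log(instrumented.body.map((node) => node.expression.type));
```

Feeding the instrumented tree to a code generator would then yield the source that actually gets executed, with each counter recording that the statement after it was reached.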

Of course, nothing stops us from composing two or more tools using these two different techniques!
