
From double-quotes to single-quotes


Inconsistency begets insanity. If every developer follows the agreed coding conventions, life feels more wonderful. When a string literal can be enclosed in single or double quotes (ECMAScript 5 specification section 7.8.4), often it helps to stick with one type of quotes. For example, jQuery code style mandates the use of double-quotes.

Personally I prefer single-quotes. That’s just my preference, though. When looking at Esprima, I realized that I could use its non-destructive partial modification feature (see also the summary of its other features, from source location info to code generation) to force every string literal to use single-quotes. And thus the following singlequote.js example was born.

var fs = require('fs'),
    esprima = require('esprima'),
    input = process.argv[2],
    output = process.argv[3],
    offset = 0,
    content = fs.readFileSync(input, 'utf-8'),
    tokens = esprima.parse(content, { tokens: true, range: true }).tokens;

// Turn a double-quoted literal (raw source text) into a single-quoted one.
function convert(literal) {
    var result = literal.substring(1, literal.length - 1);
    // Escape every single-quote inside the literal.
    result = result.replace(/'/g, '\\\'');
    return '\'' + result + '\'';
}

tokens.forEach(function (token) {
    var str;
    if (token.type === 'String' && token.value[0] !== '\'') {
        str = convert(token.value);
        // range[1] is the inclusive end position, hence the + 1.
        content = content.substring(0, offset + token.range[0]) + str +
            content.substring(offset + token.range[1] + 1, content.length);
        // Ranges refer to the original text, so track the length change.
        offset += (str.length - token.value.length);
    }
});
fs.writeFileSync(output, content);

Run it with Node.js like this:

node singlequote.js inputfile outputfile

How does this work? Let’s assume that the content of the input file is:

console.log("Hello")

When we ask the Esprima parser to consume it with the tokens option set to true, the parser also outputs an array of all the tokens collected during parsing. For our example above, the array is:

[
    { type: "Identifier", value: "console", range: [0, 6] },
    { type: "Punctuator", value: ".", range: [7, 7] },
    { type: "Identifier", value: "log", range: [8, 10] },
    { type: "Punctuator", value: "(", range: [11, 11] },
    { type: "String", value: "\"Hello\"", range: [12, 18] },
    { type: "Punctuator", value: ")", range: [19, 19] }
]

Once the tokens are available, all we have to do is to iterate and find the token associated with a string literal. Each token also contains the location info in its range property which denotes the zero-based start and end position (inclusive). Of course, what interests us is only the String token:

{ type: "String", value: "\"Hello\"", range: [12, 18] }
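As a small illustration of this step, the sketch below hard-codes the token list shown earlier (so it runs without Esprima installed), picks out the String tokens, and slices the source using the inclusive range. The helper name `sliceInclusive` is hypothetical and not part of Esprima.

```javascript
// Source and token list copied from the example above; hard-coded so the
// snippet does not need Esprima to run.
var source = 'console.log("Hello")';
var tokens = [
    { type: 'Identifier', value: 'console', range: [0, 6] },
    { type: 'Punctuator', value: '.', range: [7, 7] },
    { type: 'Identifier', value: 'log', range: [8, 10] },
    { type: 'Punctuator', value: '(', range: [11, 11] },
    { type: 'String', value: '"Hello"', range: [12, 18] },
    { type: 'Punctuator', value: ')', range: [19, 19] }
];

// Hypothetical helper: range[1] is treated as inclusive, hence the + 1.
function sliceInclusive(text, range) {
    return text.substring(range[0], range[1] + 1);
}

// Keep only the tokens that represent string literals.
var stringTokens = tokens.filter(function (token) {
    return token.type === 'String';
});

stringTokens.forEach(function (token) {
    // The slice of the source should match the token value.
    console.log(token.range, sliceInclusive(source, token.range));
});
```

If the parser version at hand reports the end of the range as exclusive rather than inclusive, the `+ 1` would have to be dropped.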

With the range in hand, replacing the literal in the original source takes only a few string operations; for the above example, the characters to replace are at positions 12 through 18. Care must be taken: if the literal value contains one or more single-quotes, those single-quotes must be properly escaped (see SingleEscapeCharacter in section 7.8.4). Since escaping may change the total literal length, the ranges of the tokens that follow no longer line up with the modified source, so an offset adjustment is needed as well. An example follows:

// before
"color = 'blue'";
 
// after
'color = \'blue\'';
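The escaping and the offset bookkeeping can be tried out in isolation. The following is only a sketch under stated assumptions: `toSingleQuoted` is a hypothetical helper, and the `literals` array stands in for the String tokens Esprima would report, with inclusive ranges worked out by hand. Unlike a plain global replace, the regular expression here steps over existing escape sequences, so a single-quote that is already escaped is not escaped twice.

```javascript
// Hypothetical helper: convert a double-quoted literal (raw source text,
// quotes included) into a single-quoted one.
function toSingleQuoted(literal) {
    var body = literal.substring(1, literal.length - 1);
    // Match either a whole escape sequence (left as-is) or a bare
    // single-quote (which gets a backslash in front of it).
    body = body.replace(/\\[\s\S]|'/g, function (match) {
        return match === '\'' ? '\\\'' : match;
    });
    return '\'' + body + '\'';
}

// Input text: a = "it's"; b = "ok";
var source = 'a = "it\'s"; b = "ok";';

// Stand-in for the String tokens, with hand-computed inclusive ranges.
var literals = [
    { value: '"it\'s"', range: [4, 9] },
    { value: '"ok"', range: [16, 19] }
];

var offset = 0;
literals.forEach(function (token) {
    var str = toSingleQuoted(token.value);
    // Ranges refer to the original text; offset maps them to the edited text.
    source = source.substring(0, offset + token.range[0]) + str +
        source.substring(offset + token.range[1] + 1);
    offset += str.length - token.value.length;
});

console.log(source); // prints: a = 'it\'s'; b = 'ok';
```

The first literal grows by one character because of the added backslash, and without the offset the second replacement would land one position too early.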

The conversion still does not do the reverse, i.e. remove escapes that are no longer necessary. This is the case for double-quotes in the literal, which need not be escaped once the literal is enclosed in single-quotes. This functionality is left as an exercise for the reader!
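For readers who want a starting point, here is one possible sketch of that reverse step. It is an assumption about how the exercise could be approached, not part of singlequote.js, and the helper name `unescapeDoubleQuotes` is hypothetical. It operates on the text between the quotes, before the single-quotes are put around it.

```javascript
// Hypothetical helper: drop the backslash in front of double-quotes, which
// is no longer needed once the literal is enclosed in single-quotes.
function unescapeDoubleQuotes(body) {
    // Consume escape sequences one at a time so that an escaped backslash
    // (\\) is kept whole and never mistaken for the start of \".
    return body.replace(/\\([\s\S])/g, function (match, ch) {
        return ch === '"' ? '"' : match;
    });
}

// Input text: say \"hi\"
console.log(unescapeDoubleQuotes('say \\"hi\\"')); // prints: say "hi"

// Other escape sequences, such as \t, are left untouched.
console.log(unescapeDoubleQuotes('tab\\t') === 'tab\\t'); // prints: true
```

Because this shortens the literal, the same offset adjustment used for the added backslashes applies here as well, only in the opposite direction.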

Obviously this tool is nothing more than an academic exercise. Most editors support search-and-replace, though you need to be careful not to change unrelated quotes unintentionally. I’m sure there is an IDE out there which can carry out the same task efficiently. I do hope that whatever technique you use takes into account the escaping issue mentioned above.

Got some other ideas with the token list and partial modification?
