JavaScript Identifier Length Distribution

After the fun distribution charts of statements and keywords in popular JavaScript libraries, it is time for another metrics analysis. For a while, I have been wondering how JavaScript developers come up with variable names, function names, and other identifiers. Is it just a few characters? Is it not that short? Is it always descriptive? The following script, idlen.js (to be executed with Node.js), uses the parser from Esprima to dump the length of every identifier, excluding duplicates, in each file of Esprima's corpus of libraries (the one used for its benchmark suite).

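var fs = require('fs'),
    esprima = require('esprima'),
    files = process.argv.slice(2);   // skip "node" and the script path

files.forEach(function (filename) {
    // No prototype, so names such as "constructor" or "toString" are not
    // mistaken for identifiers that were already seen.
    var identifiers = Object.create(null),
        content = fs.readFileSync(filename, 'utf-8'),
        syntax = esprima.parse(content);

    // The replacer visits every key/value pair in the syntax tree;
    // identifier nodes store their text under the "name" key.
    JSON.stringify(syntax, function (key, value) {
        if (key === 'name' && typeof value === 'string') {
            identifiers[value] = value.length;
        }
        return value;
    });

    // Print the length of each unique identifier in this file.
    Object.keys(identifiers).forEach(function (name) {
        console.log(identifiers[name]);
    });
});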
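
To see how the JSON.stringify replacer doubles as a tree walker, here is a minimal sketch that runs without Esprima. The tree below is hand-written and heavily simplified (an assumption for illustration; real parser output carries more fields), and collectNames is a hypothetical helper name.

```javascript
// Hand-written, simplified stand-in for a parsed syntax tree.
// Roughly corresponds to: var answer = 42; print(answer);
var sampleTree = {
    type: 'Program',
    body: [{
        type: 'VariableDeclaration',
        declarations: [{
            type: 'VariableDeclarator',
            id: { type: 'Identifier', name: 'answer' },
            init: { type: 'Literal', value: 42 }
        }]
    }, {
        type: 'ExpressionStatement',
        expression: {
            type: 'CallExpression',
            callee: { type: 'Identifier', name: 'print' },
            arguments: [{ type: 'Identifier', name: 'answer' }]
        }
    }]
};

// Hypothetical helper: maps each unique name to its length.
function collectNames(tree) {
    var seen = Object.create(null);
    JSON.stringify(tree, function (key, value) {
        if (key === 'name' && typeof value === 'string') {
            seen[value] = value.length;
        }
        return value; // returning the value lets the traversal continue
    });
    return seen;
}

var names = collectNames(sampleTree);
Object.keys(names).forEach(function (name) {
    console.log(name + ' ' + names[name]);
});
```

The identifier answer appears twice in the tree but is recorded only once, which is the deduplication the script relies on.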

With the help of Unix tools to sort the lengths and count how often each one occurs:

node idlen.js /path/to/some/*.js | sort -n | uniq -c

the distribution will look like the following diagram:

There is a long tail from 15 characters upward, which makes sense since identifiers that long are likely to be special cases. Excluding this long-tail region, the data roughly follows a normal distribution, as one might expect. The mean identifier length is 8.27 characters.
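
For those who prefer to stay in JavaScript, the counting and the mean can be sketched as follows. This is only an illustration: summarize is a hypothetical helper, and the sample lengths are made up rather than taken from the corpus.

```javascript
// Hypothetical helper: builds a length -> count histogram and the mean
// from a list of identifier lengths.
function summarize(lengths) {
    var histogram = Object.create(null),
        total = 0;

    lengths.forEach(function (len) {
        histogram[len] = (histogram[len] || 0) + 1;
        total += len;
    });

    return {
        histogram: histogram,
        mean: lengths.length ? total / lengths.length : NaN
    };
}

// Made-up sample data, not the measurements from the libraries.
var sample = [1, 3, 3, 5, 8, 8, 8, 12];
var result = summarize(sample);

// Integer-like keys are listed in ascending order.
Object.keys(result.histogram).forEach(function (len) {
    console.log(result.histogram[len] + ' ' + len);
});
console.log('mean: ' + result.mean);
```

The first loop prints a count followed by a length on each line, much like the output of uniq -c.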

For the fun of it, here are the top 5 longest identifiers found among the libraries, each 34 characters or longer:

prototype-1.7.0.0.js   SCRIPT_ELEMENT_REJECTS_TEXTNODE_APPENDING
prototype-1.7.0.0.js   MOUSEENTER_MOUSELEAVE_EVENTS_SUPPORTED
     jquery-1.7.1.js   subtractsBorderForOverflowNotVisible
jquery.mobile-1.0.js   getClosestElementWithVirtualBinding
prototype-1.7.0.0.js   HAS_EXTENDED_CREATE_ELEMENT_SYNTAX
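
A list like this needs the names themselves, not just their lengths. One possible way to rank them, assuming the unique names have already been collected into an array (longestNames is a hypothetical helper, and the short candidates are made up):

```javascript
// Hypothetical helper: returns the `count` longest names, longest first.
// The input array is copied, so the caller's array is left untouched.
function longestNames(names, count) {
    return names.slice().sort(function (a, b) {
        return b.length - a.length;
    }).slice(0, count);
}

var candidates = [
    'i',
    'getClosestElementWithVirtualBinding',
    'callback',
    'HAS_EXTENDED_CREATE_ELEMENT_SYNTAX',
    'subtractsBorderForOverflowNotVisible',
    'el'
];

var top = longestNames(candidates, 3);
top.forEach(function (name) {
    console.log(name.length + ' ' + name);
});
```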

What kind of distribution do you get for your own JavaScript project?
