This is the second part of my weekly series on how to write a math expression evaluator in JavaScript. If you miss the first part, which was about lexical analysis, go and read it first and come back later.
In this installment, we will see the actual parsing process. Based on a sequence of tokens obtained from the lexer, we shall be able to match these tokens to the expression grammar which describes the syntax of the expression. At the end of the process, we want to produce a syntax tree. This entire procedure is commonly known as syntactic analysis.
An abstract syntax tree (AST) represents the syntactic structure of the expression. Consider the following expression:
x = -6 * 7
The associated syntax tree for the above expression should look like:
The above syntax tree matches our logic if we are supposed to evaluate the expression (if we handle every node in the tree using depth-first traversal): take 6
, negate it, multiply the result with 7
, put the result into x
. Unless you don’t pass primary school math, your brain should magically gives -42
as the answer.
Of course, the question is how to create the tree given a list of tokens associated with the expression? There are multiple strategies to analyze the expression’s syntax. The one which I illustrate below is commonly known recursive descent parser. It may not be the fast one in some cases, but it is very illustrative and quite easy to trace should you want to follow the parsing logic.
Again, the following paragraphs explain and discuss various code fragment only (error checkings may be omitted). The full code is available for your pleasure from the TapDigit project.
The main entry point for the parsing looks like this. The lexer itself comes from the implementation of the lexical analyzer discussed in part 1.
function parse(expression) {
var expr;
lexer.reset(expression);
expr = parseExpression();
return {
'Expression': expr
};
}
From this, we go to the main parseExpression
function which is surprisingly simple. This is because our syntax only implies a variable assignment as an expression. For other languages with more elaborate control flow (branching, loop, etc) or some form of domain-specific language (DSL), assignment may not be the only form of expression.
function parseExpression() {
return parseAssignment();
}
For the next subsequent parseFoo variants, we need a function which can match an operator. If the incoming operator is the same as the expected one, then it returns true.
function matchOp(token, op) {
return (typeof token !== 'undefined') &&
token.type === T.Operator &&
token.value === op;
}
An example form of assignment is x = 42
. However, we also want to tackle the case where the expression is also as plain as 42
or a nested assignment such as x = y = 42
. See if you can understand how the implementation of parseAssignment
below handles all the three cases (hint: recursive is a possibility).
function parseAssignment() {
var token, expr;
expr = parseAdditive();
if (typeof expr !== 'undefined' && expr.Identifier) {
token = lexer.peek();
if (matchOp(token, '=')) {
lexer.next();
return {
'Assignment': {
name: expr,
value: parseAssignment()
}
};
}
return expr;
}
return expr;
}
The function parseAdditive
processes both addition and subtraction, i.e. it creates a binary operator node. There will be two child nodes, the left and the right ones. They represent the two subexpression, further handled by parseMultiplicative
, to be added or subtracted. Update: the previous version of the function was copied from a wrong implementation, this has been fixed since.
function parseAdditive() {
var expr, token;
expr = parseMultiplicative();
token = lexer.peek();
while (matchOp(token, '+') || matchOp(token, '-')) {
token = lexer.next();
expr = {
'Binary': {
operator: token.value,
left: expr,
right: parseMultiplicative()
}
}
token = lexer.peek();
};
return expr;
}
The same logic follows for parseMultiplicative
below. It does handle both multiplication and division.
function parseMultiplicative() {
var expr, token;
expr = parseUnary();
token = lexer.peek();
while (matchOp(token, '*') || matchOp(token, '/')) {
token = lexer.next();
expr = {
'Binary': {
operator: token.value,
left: expr,
right: parseUnary()
}
};
token = lexer.peek();
}
return expr;
}
Before we go and check the details of parseUnary
, you may wonder why parseAdditive
is first called and then parseMultiplicative
. This is done in order to satisfy the operator precedence requirement. Consider the expression 2 + 4 * 10
which actually evaluates to 42
(multiply 4 by 10, then add 2) rather than 60
(add 2 to 4, multiply by 10). This is only possible if the topmost node in syntax tree is binary operator + which has two child nodes, the left one is just the number 2
and the right node is actually another binary operator *. The latter holds two numbers as the corresponding child nodes, 4
and 10
.
To handle a negation, like -42
, we use the concept of unary operation. In the syntax tree, this is represented by a unary operator node and it has only one child node (hence the name). While negation is one form of unary operation, there is also a case such as +42
which we need to take into account. Thanks to the recursive nature, expression like ----42
or even -+-+42
can be handled without any problem as well. The code to handle the said unary operation is as simple as the following:
function parseUnary() {
var token, expr;
token = lexer.peek();
if (matchOp(token, '-') || matchOp(token, '+')) {
token = lexer.next();
expr = parseUnary();
return {
'Unary': {
operator: token.value,
expression: expr
}
};
}
return parsePrimary();
}
Now here comes one of the most important function of all: parsePrimary
. First of all, let’s see four possible forms of primary node:
- an identifier (basically referring to a variable in this context), e.g.
x
- a number, e.g.
3.14159
- a function call, e.g.
sin(0)
- another expression enclosed in brackets, e.g.
(4 + 5)
Fortunately, deciding whether the incoming tokens will form one of the above possibilities is rather easy as we just need to examine the token type. There is only ambiguity between an identifier and a function call, which can be solved if we peek at the next token, i.e. whether it is an open bracket or not. Without further ado, here is the code:
function parsePrimary() {
var token, expr;
token = lexer.peek();
if (token.type === T.Identifier) {
token = lexer.next();
if (matchOp(lexer.peek(), '(')) {
return parseFunctionCall(token.value);
} else {
return {
'Identifier': token.value
};
}
}
if (token.type === T.Number) {
token = lexer.next();
return {
'Number': token.value
};
}
if (matchOp(token, '(')) {
lexer.next();
expr = parseAssignment();
token = lexer.next();
if (!matchOp(token, ')')) {
throw new SyntaxError('Expecting )');
}
return {
'Expression': expr
};
}
throw new SyntaxError('Parse error, can not process token ' + token.value);
}
Now the remaining part is parseFunctionCall
. If we see an example of a function call like sin(0)
, it basically consists of a function name, open bracket, function argument, and close bracket. It is important to realize that there can be more than one arguments (foo(1, 2, 3)
) or no argument at all (random()
), depending on the function itself. For simplicity, we split the handling of function argument to parseArgumentList
. Here are both functions for your pleasure:
function parseArgumentList() {
var token, expr, args = [];
while (true) {
expr = parseExpression();
if (typeof expr === 'undefined') {
break;
}
args.push(expr);
token = lexer.peek();
if (!matchOp(token, ',')) {
break;
}
lexer.next();
}
return args;
}
function parseFunctionCall(name) {
var token, args = [];
token = lexer.next();
if (!matchOp(token, '(')) {
throw new SyntaxError('Expecting ( in a function call "' + name + '"');
}
token = lexer.peek();
if (!matchOp(token, ')')) {
args = parseArgumentList();
}
token = lexer.next();
if (!matchOp(token, ')')) {
throw new SyntaxError('Expecting ) in a function call "' + name + '"');
}
return {
'FunctionCall' : {
'name': name,
'args': args
}
};
}
Voila! That’s all our parser code. When combined properly into a functional object, it is just about 200 lines of code, supporting various math operations with proper precedences, brackets, variables, and function calls.
Live demo is available at ariya.github.io/tapdigit/parser.html (most common desktop and mobile browsers are supported).
For the last part next week, we will see the final missing puzzle necessary to be able to evaluate the syntax tree. Update: the final part (on building the interpreter) has been published!