Writing an Interpretter Using JavaCC
Java, Tutorial - - Posted on January, 15 at 11:37 am
Ever wondered how an interpreter works ? Well, while evaluating parser generators for executing expressions I wrote this small example.
Getting to the basics
What is an Interpreter ?
An interpreter is a computer program that executes the program given to it as an input. In contrast, a compiler generates machine code for the program given to it, which is then executed by the operating system or computer.
Lets say the language we are going to interpret supports simple assignment statements like
-
a=1+5;
-
b=a*3;
-
c=a+b;
The execution of this small piece of code would involve four steps
- Lexical Analysis
- Syntax Analysis
- Generation of Abstract Syntax Tree
- Execute the code
In the lexical analysis phase all the tokens in the code are identified. In out example the tokens are [a, =, 1, +, 5, ; ......]. Once the tokens are identified we need to match them to our defined grammar i.e. many lines of Identifier=Expression;
Defining our language as a Context Free Grammar
The first thing we need to do is to define the grammar of our language as a context free grammar. A context free grammar is a way to define the formal structure in the form
V -> w
-
language -> (statement)+ # contains one or more statements statement ->
-
variable=expression; variable -> [“a”-“z”]+ # a variable contains one or more letters from a-z
-
expression -> additiveExpr
-
additiveExpr -> multiplyExpr ((+ | -) multiplyExpr)* # can be 5 or (5 + 4) or (5 - 2)
-
multiplyExpr -> unaryExpr ((* | %) unaryExpr)* # can be 5 or (5 * 4) or (5 % 2)
-
unaryExpr -> -primaryExpr | primaryExpr # can be 5 or -5
-
primaryExpr -> number | variable | [1-9]+ # can be 1 or abc
-
Implementing with JavaCC
JavaCC is a parser generator that takes a context free grammar and generates a parser for that language. Lets try to define our grammar in JavaCC. JavaCC would take a .jj file and generate java classes to parse the file and validate the grammar.
Lexical Analysis
We would need to instruct javacc to identify the tokens. The first thing to define in the javacc file is the tokens it needs to skip and the tokens that it needs to scan and parse out.
-
SKIP :
-
{
-
” “
-
|“\t“
-
|“\n“
-
|“\r“
-
}
Now we need to define all the tokens that the parser generated by javacc should parse out
-
TOKEN:
-
{
-
<number:>
-
| <variable:>
-
|</variable:></number:>
-
<divide:> | <multiply:>
-
|</multiply:></divide:>
-
<plus:> | <minus:>
-
}
Syntax Analysis
In JavaCC the way to define a production of the form V->e is like
-
V() :
-
{}
-
{
-
e()
-
}
The final grammar defined using the javacc format would look like
-
PARSER_BEGIN(ExpressionParser)public class ExpressionParser {
-
-
}
-
}
-
-
PARSER_END(ExpressionParser)
-
-
SKIP :
-
{
-
” “
-
|“\t“
-
|“\n“
-
|“\r“
-
}
-
-
TOKEN:
-
{
-
<number:>
-
| <variable:>
-
|</variable:></number:>
-
<divide:> | <multiply:>
-
|</multiply:></divide:>
-
<plus:> | <minus:>
-
}</minus:></plus:>ASTstart start() :{}
-
{
-
(statement())+
-
{}
-
}void statement() :
-
{}
-
{
-
identifier()“=”expression()“;”
-
}
-
-
void identifier() :
-
{}
-
{
-
<variable>
-
{
-
}
-
}</variable>
-
-
void expression():
-
{}
-
{
-
additiveExpression()
-
}
-
-
void additiveExpression() :
-
{}
-
{
-
multiplicativeExpression()
-
(
-
-
<plus> multiplicativeExpression()
-
| <minus> multiplicativeExpression()
-
)*
-
}</minus></plus>void multiplicativeExpression() :
-
{}
-
{
-
unaryExpression()
-
(
-
<multiply> unaryExpression()
-
|</multiply>
-
<divide> unaryExpression()
-
)*
-
}
-
void unaryExpression() :
-
{}
-
{
-
<minus> numberExpression()|
-
numberExpression()
-
}</minus></divide>void numberExpression() :
-
{
-
}
-
{
-
<number>
-
| <variable>
-
}
This would generate ExpressionParser.java which would contain the code requried to parse the defined grammar. On execution of the main method with a file as an argument the parser would pass if there is a valid grammar in the file, or it would throw an exception if the file does not confine to the defined grammar. Now we are done with the lexical and the syntactical analysis.
Constructing the Abstract Syntax Tree
Now we need to construct the abstract syntax tree. Lets take a simple example
-
a=1+5;
-
b=a*3;
-
c=a+b;
The abstract syntax tree for this should look like
![]()
JavaCC comes along with a preprocessor called jjtree that would create us a abstract syntax tree. Whereever in the grammar we need a tree node we embed #
-
void statement() #Statement:
-
{}
-
{
-
identifier()“=”expression()“;”
-
}
This would create a node ASTStatement.java for the production statement. All nodes by default would extend SimpleNode.java which implements Node that is generated with the node lifecycle methods.
-
/* Generated By:JJTree: Do not edit this line. Node.java */
-
-
/* All AST nodes must implement this interface. It provides basic
-
machinery for constructing the parent and child relationships
-
between nodes. */
-
-
public interface Node {
-
-
/** This method is called after the node has been made the current
-
node. It indicates that child nodes can now be added to it. */
-
public void jjtOpen();
-
-
/** This method is called after all the child nodes have been
-
added. */
-
public void jjtClose();
-
-
/** This pair of methods are used to inform the node of its
-
parent. */
-
public void jjtSetParent(Node n);
-
public Node jjtGetParent();
-
-
/** This method tells the node to add its argument to the node’s
-
list of children. */
-
public void jjtAddChild(Node n, int i);
-
-
/** This method returns a child node. The children are numbered
-
from zero, left to right. */
-
public Node jjtGetChild(int i);
-
-
/** Return the number of children the node has. */
-
public int jjtGetNumChildren();
-
-
/** Accept the visitor. **/
-
}
Our final grammar decorated with all the defintions of the abstract syntax tree nodes would look like
-
options {
-
MULTI=true;
-
VISITOR=true;
-
NODE_DEFAULT_VOID=true;
-
NODE_EXTENDS=“BaseNode”;
-
}
-
-
PARSER_BEGIN(ExpressionParser)
-
-
public class ExpressionParser {
-
-
//ExpressionParser parser = new ExpressionParser(System.in);
-
ASTstart expr=parser.start();
-
ExpressionVisitor v=new ExpressionVisitor();
-
}
-
}
-
-
PARSER_END(ExpressionParser)
-
-
SKIP :
-
{
-
” “
-
|“\t“
-
|“\n“
-
|“\r“
-
}
-
-
TOKEN:
-
{
-
<number:>
-
| <variable:>
-
|</variable:></number:>
-
<divide:> | <multiply:>
-
|</multiply:></divide:>
-
<plus:> | <minus:>
-
}</minus:></plus:>ASTstart start() #start:{}
-
{
-
(statement())+
-
{ return jjtThis; }
-
}void statement() #Statement:
-
{}
-
{
-
identifier()“=”expression()“;”
-
}
-
-
void identifier() :
-
{}
-
{
-
<variable>
-
{
-
jjtThis.data.put(“name”,token.image);
-
}#Variable
-
}</variable>
-
-
void expression():
-
{}
-
{
-
additiveExpression()
-
}
-
-
void additiveExpression() :
-
{}
-
{
-
multiplicativeExpression()
-
(
-
-
<plus> multiplicativeExpression()#AddExpr(2)
-
| <minus> multiplicativeExpression()#SubractExpr(2)
-
)*
-
}</minus></plus>void multiplicativeExpression() :
-
{}
-
{
-
unaryExpression()
-
(
-
<multiply> unaryExpression()#MultiplyExpr(2)
-
|</multiply>
-
<divide> unaryExpression()#DivideExpr(2)
-
)*
-
}
-
void unaryExpression() :
-
{}
-
{
-
<minus> numberExpression()#NegateExpr(1)|
-
numberExpression()
-
}</minus></divide>void numberExpression() :
-
{
-
}
-
{
-
<number>
-
{
-
}#Number
-
| <variable>
-
{
-
jjtThis.data.put(“name”,token.image);
-
}#VariableValue
-
}
If you notice in the option, we have instructed jjtree to extend all sources from BaseNode.java. The base node is decorated with a hashmap that would be used to store the scanned tokens during parsing.
If you observe line 99 of the grammar definition, it shows how the parsed number token is added to the Number node that is constructed by jjtree.
This completes the generation of the abstract syntax tree.
Interpretation and Execution
What now remaing is to walk the tree and execute the code. One of the options that we have specified in the grammar definition is to generate the visitor. JJtree would automatically add accept methods on all the nodes generated and also generate the visitor interface
-
void statement() #Statement:
-
/* Generated By:JJTree: Do not edit this line. C:\workspace\JavaCC\target\ExpressionParserVisitor.java */
-
-
public interface ExpressionParserVisitor
-
{
-
}
All we need to do now is to implement the visitor interface and interpret our language. The Interpreter would need two datastores
- Symbol Table
- Execution Stack
The symbol table would hold all the variables and their values, and the execution stack would contain all the intermediate results while expression are evaluated.
The Visitor code would look like.
-
public interface ExpressionParserVisitor
-
-
import java.util.HashMap;
-
import java.util.LinkedList;
-
-
public class ExpressionVisitor implements ExpressionParserVisitor{
-
-
-
node.childrenAccept(this,data);
-
return null;
-
}
-
-
node.childrenAccept(this,data);
-
return symbolTable;
-
}
-
-
node.childrenAccept(this,data);
-
return null;
-
}
-
-
node.childrenAccept(this,data);
-
return null;
-
}
-
-
node.childrenAccept(this,data);
-
return null;
-
}
-
-
node.childrenAccept(this,data);
-
return null;
-
}
-
-
node.childrenAccept(this,data);
-
return null;
-
}
-
-
node.childrenAccept(this,data);
-
stack.addFirst(node.data.get(“value”));
-
return null;
-
}
-
-
node.childrenAccept(this,data);
-
symbolTable.put(var,value);
-
return null;
-
}
-
-
node.childrenAccept(this,data);
-
stack.addFirst(var);
-
return null;
-
}
-
-
node.childrenAccept(this,data);
-
stack.addFirst(symbolTable.get(var));
-
return null;
-
}
-
-
{
-
}
-
-
}
Well, thats it .. our very own interpretter !!
Source Code As Eclipse Project
January 20th, 2006 at 9:00 am
you might want to check out this wordpress plugin for syntax highlighting
http://blog.igeek.info/wp-plugins/igsyntax-hiliter/
i doesn’t install by default in wordpress 2.0, so you’ll need to remove a couple checks it does in the installer….
January 24th, 2006 at 10:38 pm
Thanks for a very good JavaCC example!
I have to conduct a very similar task to what you are doing in the near future (choose a Java parser generator). Maybe you can save me some trouble. Could you offer me your views on the different ones out there? With regards to the grammar definition, generated code, documentation, market share etc, and possibly your rationale for choosing one over the other. I guess I’m first and foremost looking for simplicity, and not have to deal with too much complexity or over engineered stuff. If there are things you are not comfortable with as a public post, I would highly appreciate an email.
Cheers!
Martin Skarsaune, Norway
January 25th, 2006 at 12:14 am
I am in no way expert on this but I took part in one project where we needed this (https://coyote.dev.java.net/) and the choices there were JavaCC + JJTree or ANTLR. I suggest checking out also ANTLR.
Otherwise also thanks to Anand for a simple example. All the best, David
January 25th, 2006 at 1:24 am
This is a nice example which I expect to be helpful to me in the near future. Thanks!
-Norbert
January 25th, 2006 at 5:26 am
Martin, as David suggests it may be worthwhile trying Antlr out. In terms of features, I think Antlr and JacaCC are similar, however there seems to be more documentation and a broader community for Antlr. Hibernate’s new implementation also uses Antlr. However personally I am more comfortable with JavaCC, and we use JavaCC in quite a lot of instances during work, personally I prefer JavaCC as I am more aquanited with it. One small drawback with Antlr is that the runtime requires the antlr.jar, however javacc does not require any runtime jars.
January 25th, 2006 at 1:59 pm
While I’ve never really used any of them in anger, I sometimes wonder why all the parser libraries are parser generators.
Instead, why can’t you load the library which you can load in at runtime, configure it with your grammar and then add hooks to do whatever your code needs to do with the parser output.
Also, it seems that, with the parser generators, you’ve got to learn an new language, which includes writing fragments of Java.
Why does parsing require code generation, while most other libraries don’t? What’s different about parsing?
I’ve recently come across JParsec (), which seems to be a runtime library based a parser library (parsec) for Haskell.
Are there any other approaches like this? What are the pros and cons of this compared with parser generators?
Thanks,
Calum
March 9th, 2006 at 8:31 am
Hi, Anand
Thanks for the example. I am having problems with using this option NODE_EXTENDS. I tried using JDK_VERSION option, but the jjtree compiler ignores this option saying BAD_OPTION. Any insights on that?
I am using Java version 1.5.0_05, and JavaCC version4.0
Any inputs will be greatly appreciated.
Thanks,
Aatish
June 15th, 2006 at 5:08 am
hy,
I started with some simple parser and I use JavaCC. It works just fine but now I would like to make an interpreter from it. I readed this tutorial but I don’t get it. Can you help me please. thanx in advance
best regards,
erno
June 15th, 2006 at 7:01 am
Hi Erno,
You have completed the lexical and the syntactical analyis phase, your next step would be to generate the abstract syntax tree. You can do this using JJTree, The abstract syntax tree would be the hierachical view of the program, as shown in the diagram in the tutorial. Once the tree is created you could traverse the tree and execute each node.
I recommend you to download the source code, I have a very simple example and I think you would find it useful. I have used javacc and jjtree in the example
anand
July 7th, 2006 at 2:25 am
Hi All,
I’m new to javacc. When i try to run the generated code, i got the following error.
Exception in thread “main” java.lang.NoClassDefFoundError: ExpressionParser/java
Have i did anything wrong? Please help me, thanks.
Ada
July 10th, 2006 at 6:50 am
Hi Ada,
The parser generator generates all the classes requried for parsing the langugage, to use the parser you would need to invoke the parser. Here is a sample code to invoke the parser.
#
public static void main(String args[]) throws Exception {
#
ExpressionParser parser = new ExpressionParser(new java.io.FileReader(args[0]));
#
//ExpressionParser parser = new ExpressionParser(System.in);
#
ASTstart expr=parser.start();
#
ExpressionVisitor v=new ExpressionVisitor();
#
System.out.println(expr.jjtAccept(v,null));
#
}
#
}
#
July 10th, 2006 at 7:42 am
Hi Anand,
Thanks for your reply. I found these code are contained in the generated ExpressionParser.java. Why when i try to run it, it complains “The selection does not contain a main type”? Am i missing something? Why my eclipse does not recognize these code as my main type? Sorry if this is a stupid question but please help me.
Thanks,
Kam Ting
July 21st, 2006 at 9:43 am
Hi Ada,
Your code needs to have the main method
public static void main(String args[])
for it to be entry point of execution
June 2nd, 2007 at 12:29 pm
hi to all
nice tutorial but i have a question: which IDE can i use to run this example?
thanks in advance
bye
June 2nd, 2007 at 12:53 pm
I have fixed the link to the source code. You can use ecplise or any IDE to run this.
June 6th, 2007 at 1:03 pm
Hi Anand,
Great resource!
I would like to know whether we can find the information like package name, file name, method names, getter and setter methods of an object from a java file. I am trying to figure this out, and am hoping your tutorial would be of help. Please send me an email if you have more information regarding my specific issue.
Thanks a lot.
Gem.
June 6th, 2007 at 2:14 pm
The easiest way is to use reflection. However if you want to parse the file and inspect you could use the grammar for java from the community and tune it to your liking. The grammar is available at https://javacc.dev.java.net/servlets/ProjectDocumentView?documentID=882&showInfo=true
August 7th, 2007 at 1:05 am
Is it my browser? or is the display of this example somewhat garbled, especially the TOKEN production?