Writing an Interpretter Using JavaCC

Java, Tutorial - - Posted on January, 15 at 11:37 am

Ever wondered how an interpreter works ? Well, while evaluating parser generators for executing expressions I wrote this small example.

Getting to the basics

What is an Interpreter ?

An interpreter is a computer program that executes the program given to it as an input. In contrast, a compiler generates machine code for the program given to it, which is then executed by the operating system or computer.

Lets say the language we are going to interpret supports simple assignment statements like

  1. a=1+5;
  2. b=a*3;
  3. c=a+b;

The execution of this small piece of code would involve four steps

  1. Lexical Analysis
  2. Syntax Analysis
  3. Generation of Abstract Syntax Tree
  4. Execute the code

In the lexical analysis phase all the tokens in the code are identified. In out example the tokens are [a, =, 1, +, 5, ; ......]. Once the tokens are identified we need to match them to our defined grammar i.e. many lines of Identifier=Expression;

Defining our language as a Context Free Grammar
The first thing we need to do is to define the grammar of our language as a context free grammar. A context free grammar is a way to define the formal structure in the form

V -> w

  1. language -> (statement)+ # contains one or more statements statement ->
  2. variable=expression; variable -> [“a”-“z”]+ # a variable contains one or more letters from a-z
  3. expression -> additiveExpr
  4. additiveExpr -> multiplyExpr ((+ | -) multiplyExpr)* # can be 5 or (5 + 4) or (5 - 2)
  5. multiplyExpr -> unaryExpr ((* | %) unaryExpr)* # can be 5 or (5 * 4) or (5 % 2)
  6. unaryExpr -> -primaryExpr | primaryExpr # can be 5 or -5
  7. primaryExpr -> number | variable | [1-9]+ # can be  1 or abc
  8.  

Implementing with JavaCC

JavaCC is a parser generator that takes a context free grammar and generates a parser for that language. Lets try to define our grammar in JavaCC. JavaCC would take a .jj file and generate java classes to parse the file and validate the grammar.

Lexical Analysis

We would need to instruct javacc to identify the tokens. The first thing to define in the javacc file is the tokens it needs to skip and the tokens that it needs to scan and parse out.

  1. SKIP :
  2. {
  3. ” “
  4. |\t
  5. |\n
  6. |\r
  7. }

Now we need to define all the tokens that the parser generated by javacc should parse out

  1. TOKEN:
  2. {
  3. <number:>
  4. |       <variable:>
  5. |</variable:></number:>
  6. <divide:>       |     <multiply:>
  7. |</multiply:></divide:>
  8. <plus:>         |       <minus:>
  9. }

Syntax Analysis
In JavaCC the way to define a production of the form V->e is like

  1. V() :
  2. {}
  3. {
  4. e()
  5. }

The final grammar defined using the javacc format would look like

  1. PARSER_BEGIN(ExpressionParser)public class ExpressionParser {
  2.  
  3. public static void main(String args[]) throws Exception {
  4. ExpressionParser parser = new ExpressionParser(new java.io.FileReader(args[0]));
  5. }
  6. }
  7.  
  8. PARSER_END(ExpressionParser)
  9.  
  10. SKIP :
  11. {
  12. ” “
  13. |\t
  14. |\n
  15. |\r
  16. }
  17.  
  18. TOKEN:
  19. {
  20. <number:>
  21. |       <variable:>
  22. |</variable:></number:>
  23. <divide:>       |     <multiply:>
  24. |</multiply:></divide:>
  25. <plus:>         |       <minus:>
  26. }</minus:></plus:>ASTstart start() :{}
  27. {
  28. (statement())+
  29. {}
  30. }void statement() :
  31. {}
  32. {
  33. identifier()“=”expression()“;”
  34. }
  35.  
  36. void identifier() :
  37. {}
  38. {
  39. <variable>
  40. {
  41. }
  42. }</variable>
  43.  
  44. void expression():
  45. {}
  46. {
  47. additiveExpression()
  48. }
  49.  
  50. void additiveExpression() :
  51. {}
  52. {
  53. multiplicativeExpression()
  54. (
  55.  
  56. <plus> multiplicativeExpression()
  57. | <minus> multiplicativeExpression()
  58. )*
  59. }</minus></plus>void multiplicativeExpression() :
  60. {}
  61. {
  62. unaryExpression()
  63. (
  64. <multiply> unaryExpression()
  65. |</multiply>
  66. <divide> unaryExpression()
  67. )*
  68. }
  69. void unaryExpression() :
  70. {}
  71. {
  72. <minus> numberExpression()|
  73. numberExpression()
  74. }</minus></divide>void numberExpression() :
  75. {
  76. }
  77. {
  78. <number>
  79. | <variable>
  80. }

This would generate ExpressionParser.java which would contain the code requried to parse the defined grammar. On execution of the main method with a file as an argument the parser would pass if there is a valid grammar in the file, or it would throw an exception if the file does not confine to the defined grammar. Now we are done with the lexical and the syntactical analysis.
Constructing the Abstract Syntax Tree
Now we need to construct the abstract syntax tree. Lets take a simple example

  1. a=1+5;
  2. b=a*3;
  3. c=a+b;

The abstract syntax tree for this should look like
Abstract Syntax Tree
JavaCC comes along with a preprocessor called jjtree that would create us a abstract syntax tree. Whereever in the grammar we need a tree node we embed # in the grammar definition file.

  1. void statement() #Statement:
  2. {}
  3. {
  4. identifier()“=”expression()“;”
  5. }

This would create a node ASTStatement.java for the production statement. All nodes by default would extend SimpleNode.java which implements Node that is generated with the node lifecycle methods.

  1. /* Generated By:JJTree: Do not edit this line. Node.java */
  2.  
  3. /* All AST nodes must implement this interface.  It provides basic
  4. machinery for constructing the parent and child relationships
  5. between nodes. */
  6.  
  7. public interface Node {
  8.  
  9. /** This method is called after the node has been made the current
  10. node.  It indicates that child nodes can now be added to it. */
  11. public void jjtOpen();
  12.  
  13. /** This method is called after all the child nodes have been
  14. added. */
  15. public void jjtClose();
  16.  
  17. /** This pair of methods are used to inform the node of its
  18. parent. */
  19. public void jjtSetParent(Node n);
  20. public Node jjtGetParent();
  21.  
  22. /** This method tells the node to add its argument to the node’s
  23. list of children.  */
  24. public void jjtAddChild(Node n, int i);
  25.  
  26. /** This method returns a child node.  The children are numbered
  27. from zero, left to right. */
  28. public Node jjtGetChild(int i);
  29.  
  30. /** Return the number of children the node has. */
  31. public int jjtGetNumChildren();
  32.  
  33. /** Accept the visitor. **/
  34. public Object jjtAccept(ExpressionParserVisitor visitor, Object data);
  35. }

Our final grammar decorated with all the defintions of the abstract syntax tree nodes would look like

  1. options {
  2. MULTI=true;
  3. VISITOR=true;
  4. NODE_DEFAULT_VOID=true;
  5. NODE_EXTENDS=“BaseNode”;
  6. }
  7.  
  8. PARSER_BEGIN(ExpressionParser)
  9.  
  10. public class ExpressionParser {
  11.  
  12. public static void main(String args[]) throws Exception {
  13. ExpressionParser parser = new ExpressionParser(new java.io.FileReader(args[0]));
  14. //ExpressionParser parser = new ExpressionParser(System.in);
  15. ASTstart expr=parser.start();
  16. ExpressionVisitor v=new ExpressionVisitor();
  17. System.out.println(expr.jjtAccept(v,null));
  18. }
  19. }
  20.  
  21. PARSER_END(ExpressionParser)
  22.  
  23. SKIP :
  24. {
  25. ” “
  26. |\t
  27. |\n
  28. |\r
  29. }
  30.  
  31. TOKEN:
  32. {
  33. <number:>
  34. |       <variable:>
  35. |</variable:></number:>
  36. <divide:>       |     <multiply:>
  37. |</multiply:></divide:>
  38. <plus:>         |       <minus:>
  39. }</minus:></plus:>ASTstart start() #start:{}
  40. {
  41. (statement())+
  42. { return jjtThis; }
  43. }void statement() #Statement:
  44. {}
  45. {
  46. identifier()“=”expression()“;”
  47. }
  48.  
  49. void identifier() :
  50. {}
  51. {
  52. <variable>
  53. {
  54. jjtThis.data.put(“name”,token.image);
  55. }#Variable
  56. }</variable>
  57.  
  58. void expression():
  59. {}
  60. {
  61. additiveExpression()
  62. }
  63.  
  64. void additiveExpression() :
  65. {}
  66. {
  67. multiplicativeExpression()
  68. (
  69.  
  70. <plus> multiplicativeExpression()#AddExpr(2)
  71. | <minus> multiplicativeExpression()#SubractExpr(2)
  72. )*
  73. }</minus></plus>void multiplicativeExpression() :
  74. {}
  75. {
  76. unaryExpression()
  77. (
  78. <multiply> unaryExpression()#MultiplyExpr(2)
  79. |</multiply>
  80. <divide> unaryExpression()#DivideExpr(2)
  81. )*
  82. }
  83. void unaryExpression() :
  84. {}
  85. {
  86. <minus> numberExpression()#NegateExpr(1)|
  87. numberExpression()
  88. }</minus></divide>void numberExpression() :
  89. {
  90. }
  91. {
  92. <number>
  93. {
  94. jjtThis.data.put(“value”,new Integer(Integer.parseInt(token.image)));
  95. }#Number
  96. | <variable>
  97. {
  98. jjtThis.data.put(“name”,token.image);
  99. }#VariableValue
  100. }

If you notice in the option, we have instructed jjtree to extend all sources from BaseNode.java. The base node is decorated with a hashmap that would be used to store the scanned tokens during parsing.

  1. import java.util.HashMap;</variable></number>public class BaseNode{
  2. public HashMap data=new HashMap();
  3. }

If you observe line 99 of the grammar definition, it shows how the parsed number token is added to the Number node that is constructed by jjtree.
This completes the generation of the abstract syntax tree.
Interpretation and Execution
What now remaing is to walk the tree and execute the code. One of the options that we have specified in the grammar definition is to generate the visitor. JJtree would automatically add accept methods on all the nodes generated and also generate the visitor interface

  1. void statement() #Statement:
  2. /* Generated By:JJTree: Do not edit this line. C:\workspace\JavaCC\target\ExpressionParserVisitor.java */
  3.  
  4. public interface ExpressionParserVisitor
  5. {
  6. public Object visit(SimpleNode node, Object data);
  7. public Object visit(ASTstart node, Object data);
  8. public Object visit(ASTStatement node, Object data);
  9. public Object visit(ASTVariable node, Object data);
  10. public Object visit(ASTAddExpr node, Object data);
  11. public Object visit(ASTSubractExpr node, Object data);
  12. public Object visit(ASTMultiplyExpr node, Object data);
  13. public Object visit(ASTDivideExpr node, Object data);
  14. public Object visit(ASTNegateExpr node, Object data);
  15. public Object visit(ASTNumber node, Object data);
  16. public Object visit(ASTVariableValue node, Object data);
  17. }

All we need to do now is to implement the visitor interface and interpret our language. The Interpreter would need two datastores

    Symbol Table
    Execution Stack

The symbol table would hold all the variables and their values, and the execution stack would contain all the intermediate results while expression are evaluated.
The Visitor code would look like.

  1. public interface ExpressionParserVisitor
  2.  
  3. import java.util.HashMap;
  4. import java.util.LinkedList;
  5.  
  6. public class ExpressionVisitor implements ExpressionParserVisitor{
  7.  
  8. private LinkedList stack=new LinkedList();
  9. private HashMap symbolTable=new HashMap();
  10.  
  11. public Object visit(SimpleNode node, Object data) {
  12. node.childrenAccept(this,data);
  13. return null;
  14. }
  15.  
  16. public Object visit(ASTstart node, Object data) {
  17. node.childrenAccept(this,data);
  18. return symbolTable;
  19. }
  20.  
  21. public Object visit(ASTAddExpr node, Object data) {
  22. node.childrenAccept(this,data);
  23. Integer arg1=pop();
  24. Integer arg2=pop();
  25. stack.addFirst(new Integer(arg2.intValue()+arg1.intValue()));
  26. return null;
  27. }
  28.  
  29. public Object visit(ASTSubractExpr node, Object data) {
  30. node.childrenAccept(this,data);
  31. Integer arg1=pop();
  32. Integer arg2=pop();
  33. stack.addFirst(new Integer(arg2.intValue()-arg1.intValue()));
  34. return null;
  35. }
  36.  
  37. public Object visit(ASTMultiplyExpr node, Object data) {
  38. node.childrenAccept(this,data);
  39. Integer arg1=pop();
  40. Integer arg2=pop();
  41. stack.addFirst(new Integer(arg2.intValue()*arg1.intValue()));
  42. return null;
  43. }
  44.  
  45. public Object visit(ASTDivideExpr node, Object data) {
  46. node.childrenAccept(this,data);
  47. Integer arg1=pop();
  48. Integer arg2=pop();
  49. stack.addFirst(new Integer(arg2.intValue()/arg1.intValue()));
  50. return null;
  51. }
  52.  
  53. public Object visit(ASTNegateExpr node, Object data) {
  54. node.childrenAccept(this,data);
  55. Integer arg1=pop();
  56. stack.addFirst(new Integer(arg1.intValue()*-1));
  57. return null;
  58. }
  59.  
  60. public Object visit(ASTNumber node, Object data) {
  61. node.childrenAccept(this,data);
  62. stack.addFirst(node.data.get(“value”));
  63. return null;
  64. }
  65.  
  66. public Object visit(ASTStatement node, Object data) {
  67. node.childrenAccept(this,data);
  68. Integer value=(Integer)stack.removeFirst();
  69. String var=(String)stack.removeFirst();
  70. symbolTable.put(var,value);
  71. return null;
  72. }
  73.  
  74. public Object visit(ASTVariable node, Object data) {
  75. node.childrenAccept(this,data);
  76. String var=(String)node.data.get(“name”);
  77. stack.addFirst(var);
  78. return null;
  79. }
  80.  
  81. public Object visit(ASTVariableValue node, Object data) {
  82. node.childrenAccept(this,data);
  83. String var=(String)node.data.get(“name”);
  84. stack.addFirst(symbolTable.get(var));
  85. return null;
  86. }
  87.  
  88. private Integer pop()
  89. {
  90. return (Integer)stack.removeFirst();
  91. }
  92.  
  93. }

Well, thats it .. our very own interpretter !!
Source Code As Eclipse Project

Posted in Java, Tutorial |

18 Responses to “Writing an Interpretter Using JavaCC”

  1. Greg Says:

    you might want to check out this wordpress plugin for syntax highlighting

    http://blog.igeek.info/wp-plugins/igsyntax-hiliter/

    i doesn’t install by default in wordpress 2.0, so you’ll need to remove a couple checks it does in the installer….

  2. Martin Skarsaune Says:

    Thanks for a very good JavaCC example!
    I have to conduct a very similar task to what you are doing in the near future (choose a Java parser generator). Maybe you can save me some trouble. Could you offer me your views on the different ones out there? With regards to the grammar definition, generated code, documentation, market share etc, and possibly your rationale for choosing one over the other. I guess I’m first and foremost looking for simplicity, and not have to deal with too much complexity or over engineered stuff. If there are things you are not comfortable with as a public post, I would highly appreciate an email.

    Cheers!

    Martin Skarsaune, Norway

  3. David Strupl Says:

    I am in no way expert on this but I took part in one project where we needed this (https://coyote.dev.java.net/) and the choices there were JavaCC + JJTree or ANTLR. I suggest checking out also ANTLR.
    Otherwise also thanks to Anand for a simple example. All the best, David

  4. Norbert Says:

    This is a nice example which I expect to be helpful to me in the near future. Thanks!
    -Norbert

  5. anandsekar Says:

    Martin, as David suggests it may be worthwhile trying Antlr out. In terms of features, I think Antlr and JacaCC are similar, however there seems to be more documentation and a broader community for Antlr. Hibernate’s new implementation also uses Antlr. However personally I am more comfortable with JavaCC, and we use JavaCC in quite a lot of instances during work, personally I prefer JavaCC as I am more aquanited with it. One small drawback with Antlr is that the runtime requires the antlr.jar, however javacc does not require any runtime jars.

  6. Calum Says:

    While I’ve never really used any of them in anger, I sometimes wonder why all the parser libraries are parser generators.
    Instead, why can’t you load the library which you can load in at runtime, configure it with your grammar and then add hooks to do whatever your code needs to do with the parser output.
    Also, it seems that, with the parser generators, you’ve got to learn an new language, which includes writing fragments of Java.
    Why does parsing require code generation, while most other libraries don’t? What’s different about parsing?

    I’ve recently come across JParsec (), which seems to be a runtime library based a parser library (parsec) for Haskell.

    Are there any other approaches like this? What are the pros and cons of this compared with parser generators?

    Thanks,
    Calum

  7. Aatish Says:

    Hi, Anand
    Thanks for the example. I am having problems with using this option NODE_EXTENDS. I tried using JDK_VERSION option, but the jjtree compiler ignores this option saying BAD_OPTION. Any insights on that?

    I am using Java version 1.5.0_05, and JavaCC version4.0

    Any inputs will be greatly appreciated.

    Thanks,
    Aatish

  8. erno Says:

    hy,

    I started with some simple parser and I use JavaCC. It works just fine but now I would like to make an interpreter from it. I readed this tutorial but I don’t get it. Can you help me please. thanx in advance

    best regards,
    erno

  9. anandsekar Says:

    Hi Erno,

    You have completed the lexical and the syntactical analyis phase, your next step would be to generate the abstract syntax tree. You can do this using JJTree, The abstract syntax tree would be the hierachical view of the program, as shown in the diagram in the tutorial. Once the tree is created you could traverse the tree and execute each node.

    I recommend you to download the source code, I have a very simple example and I think you would find it useful. I have used javacc and jjtree in the example

    anand

  10. Ada Says:

    Hi All,

    I’m new to javacc. When i try to run the generated code, i got the following error.

    Exception in thread “main” java.lang.NoClassDefFoundError: ExpressionParser/java

    Have i did anything wrong? Please help me, thanks.

    Ada

  11. anandsekar Says:

    Hi Ada,

    The parser generator generates all the classes requried for parsing the langugage, to use the parser you would need to invoke the parser. Here is a sample code to invoke the parser.
    #
    public static void main(String args[]) throws Exception {
    #
    ExpressionParser parser = new ExpressionParser(new java.io.FileReader(args[0]));
    #
    //ExpressionParser parser = new ExpressionParser(System.in);
    #
    ASTstart expr=parser.start();
    #
    ExpressionVisitor v=new ExpressionVisitor();
    #
    System.out.println(expr.jjtAccept(v,null));
    #
    }
    #
    }
    #

  12. Ada Says:

    Hi Anand,

    Thanks for your reply. I found these code are contained in the generated ExpressionParser.java. Why when i try to run it, it complains “The selection does not contain a main type”? Am i missing something? Why my eclipse does not recognize these code as my main type? Sorry if this is a stupid question but please help me.

    Thanks,
    Kam Ting

  13. anandsekar Says:

    Hi Ada,

    Your code needs to have the main method

    public static void main(String args[])

    for it to be entry point of execution

  14. leo Says:

    hi to all
    nice tutorial but i have a question: which IDE can i use to run this example?

    thanks in advance

    bye

  15. anandsekar Says:

    I have fixed the link to the source code. You can use ecplise or any IDE to run this.

  16. Gem Says:

    Hi Anand,

    Great resource!
    I would like to know whether we can find the information like package name, file name, method names, getter and setter methods of an object from a java file. I am trying to figure this out, and am hoping your tutorial would be of help. Please send me an email if you have more information regarding my specific issue.

    Thanks a lot.
    Gem.

  17. anandsekar Says:

    The easiest way is to use reflection. However if you want to parse the file and inspect you could use the grammar for java from the community and tune it to your liking. The grammar is available at https://javacc.dev.java.net/servlets/ProjectDocumentView?documentID=882&showInfo=true

  18. Ken Beesley Says:

    Is it my browser? or is the display of this example somewhat garbled, especially the TOKEN production?

Leave a Reply