"http://www.w3.org/TR/html4/loose.dtd"> >
This short tutorial presents how to make a simple calculator. The calculator will compute basic mathematical expressions (+, -, *, /) possibly nested in parenthesis. We assume the reader is familiar with regular expressions.
Expressions are defined with a grammar. For example an expression is a term, a term is a sum of factors and a factor is a product of atomic expressions. An atomic expression is either a number or a complete expression in parenthesis.
We describe such grammars with rules. A rule describe the composition of an item of the language. In
our grammar we have 3 items (Term, Factor, Atom). We will call these items ‘symbols’ or ‘non terminal
symbols’. The decomposition of a symbol is symbolized with . The grammar of this tutorial is given in
figure 3.1.
We have defined here the grammar rules (i.e. the sentences of the language). We now need to describe the lexical items (i.e. the words of the language). These words - also called terminal symbols - are described using regular expressions. In the rules we have written some of these terminal symbols (+,-,*,/,(,)). We have to define number. For sake of simplicity numbers are integers composed of digits (the corresponding regular expression can be [0 - 9]+). To simplify the grammar and then the Python script we define two terminal symbols to group the operators (additive and multiplicative operators). We can also define a special symbol that is ignored by TPG. This symbol is used as a separator. This is generaly usefull for white spaces and comments. The terminal symbols are given in figure 3.2
|
This is sufficient to define our parser with TPG. The grammar of the expressions in TPG can be found in figure 3.3.
|
Calc is the name of the Python class generated by TPG. START is a special non terminal symbol treated as the axiom1 of the grammar.
With this small grammar we can only recognize a correct expression. We will see in the next sections how to read the actual expression and to compute its value.
The input of the grammar is a string. To do something useful we need to read this string in order to transform it into an expected result.
This string can be read by catching the return value of terminal symbols. By default any terminal symbol returns a string containing the current token. So the token '(' always returns the string '('. For some tokens it may be useful to compute a Python object from the token. For example number should return an integer instead of a string, add and mul should return a function corresponding to the operator. That why we will add a function to the token definitions. So we associate int to number and make_op to add and mul.
int is a Python function converting objects to integers and make_op is a user defined function (figure 3.4).
|
To associate a function to a token it must be added after the token definition as in figure 3.5
|
We have specified the value returned by the token. To read this value after a terminal symbol is recognized we will store it in a Python variable. For example to save a number in a variable n we write number/n. In fact terminal and non terminal symbols can return a value. The syntax is the same for both sort of symbols. In non terminal symbol definitions the return value defined at the left hand side is the expression return by the symbol. The return values defined in the right hand side are just variables to which values are saved. A small example may be easier to understand (figure 3.6).
|
In the example described in this tutorial the computation of a Term is made by applying the operator to the factors, this value is then returned :
Term/t -> Fact/t ( add/op Fact/f {{ t = op(t,f) }} )* ;
|
This example shows how to include Python code in a rule. Here {{...}} is copied verbatim in the generated parser.
Finally the complete parser is given in figure 3.7.
|
To embed a TPG parser in a Python program, you only need the tpg.compile function. This function takes a grammar (in a string2) and returns a string containing the Python code for the parser. One way to use this parser is to exec its definition. A practical way to build parsers is to exec the result of tpg.compile (figure 3.8).
|
To use this parser you now just need to import tpg, compile the grammar and instanciate an object of the class Calc as in figure 3.9.
|
This tutorial shows some of the possibilities of TPG. If you have read it carefully you may be able to start with TPG. The next chapters present TPG more precisely. They contain more examples to illustrate all the features of TPG.
Happy TPG’ing!