Before version 2 of TPG, lexers were context sensitive: the parser told the lexer which tokens to match, so the same input string could be split into different tokens depending on the grammar rules being applied. These lexers were very flexible but slower than context free lexers, because TPG backtracking caused tokens to be matched several times.
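The idea of a parser-driven lexer can be illustrated with the re module alone. The sketch below is not TPG code: the helper `match_at` and the sample input are hypothetical, and it only shows how the same position in the same string can yield different tokens depending on which pattern the current rule asks for.

```python
import re

def match_at(pattern, text, pos):
    """Try to match `pattern` at `pos`; return (lexeme, new_pos) or None.

    Hypothetical helper for illustration, not part of TPG.
    """
    m = re.compile(pattern).match(text, pos)
    return (m.group(0), m.end()) if m else None

text = "2024x"

# A rule that expects a number reads "2024" at position 0 ...
as_number = match_at(r"\d+", text, 0)

# ... while a rule that expects a word reads "2024x" at the same position.
as_word = match_at(r"\w+", text, 0)
```

If the parser backtracks from the first rule to the second, the input is matched again from position 0, which is the repeated work mentioned above.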
In TPG 2, the lexer runs before the parser and produces a list of tokens from the input string. This list is then passed to the parser, so when TPG backtracks the token list remains unchanged.
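A minimal sketch of this two-stage scheme, written with the re module only. The token table, the names `tokenize` and `expect`, and the sample input are assumptions made for the illustration and do not reflect TPG's internals.

```python
import re

# Hypothetical token table (names are illustrative, not TPG's).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+*()-]"),
    ("SPACE", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Scan the whole input once and return a list of (kind, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        if m.lastgroup != "SPACE":  # separators are dropped
            tokens.append((m.lastgroup, m.group(0)))
        pos = m.end()
    return tokens

def expect(tokens, i, kind):
    """Return the index after tokens[i] if it has the given kind, else None."""
    return i + 1 if i < len(tokens) and tokens[i][0] == kind else None

tokens = tokenize("x + 42")

# Backtracking only moves an index over the list; nothing is scanned again.
first_try = expect(tokens, 0, "NUMBER")   # fails, the parser backtracks
second_try = expect(tokens, 0, "IDENT")   # succeeds on the same token
```

The cost of lexing is paid once, at the price of deciding every token without knowing which grammar rule will consume it.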
Since TPG 2.1.2, context sensitive lexers are available again. Lexers are context free by default, but the CSL option (see 5.3.2) makes TPG use a context sensitive lexer.
CSL grammars have the same structure as non CSL grammars (see 5.1), apart from the CSL option (see 5.3.2).
The CSL lexer is based on the re module. Unlike in non CSL lexers, the given regular expression is compiled as is, without any encapsulation. Grouping is therefore possible and usable.
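The following sketch shows why encapsulation gets in the way of grouping. The wrapped form used here, one alternation of named groups, is only an assumed example of what encapsulation could look like; it is not taken from TPG's source.

```python
import re

date = r"(\d{4})-(\d{2})-(\d{2})"

# Compiled as is: the groups are exactly those written by the user.
m = re.compile(date).match("2005-03-17")
direct_groups = m.groups()

# Wrapped in a larger expression (assumed form of encapsulation):
# the user's groups are shifted and mixed with the wrapper's own groups.
wrapped = re.compile(r"(?P<WORD>[a-z]+)|(?P<DATE>%s)" % date)
w = wrapped.match("2005-03-17")
wrapped_groups = w.groups()
```

With the expression compiled alone, `direct_groups` holds the three fields of the date. In the wrapped version the same fields appear after two extra entries that belong to the wrapper.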
In CSL lexers there are no predefined tokens. Tokens are always inlined, and there is no precedence issue because tokens are matched during parsing, when they are encountered in a grammar rule.
A token definition can be simulated by defining a rule to match a particular token (see figure 8.1).
In non CSL parsers there are two kinds of tokens: true tokens and token separators. To declare separators in CSL parsers you must use the special separator rule. This rule is implicitly applied before matching a token. It is thus necessary to distinguish lexical rules from grammar rules. Lexical rule declarations start with the lex keyword. In such rules the separator rule is not called, which avoids infinite recursion (separator calling separator calling separator ...). Figure 8.2 shows a separator declaration with nested C++ like comments.
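The behaviour described above, skipping separators implicitly before each token, can be sketched in plain Python. This is not the TPG grammar of the figure: the function names are hypothetical, and the comment syntax is assumed to be `/* ... */` with nesting allowed.

```python
import re

_SPACES = re.compile(r"\s+")

def skip_comment(text, pos):
    """Return the index after a (possibly nested) /* ... */ comment
    starting at pos, or pos itself if no comment starts there."""
    if not text.startswith("/*", pos):
        return pos
    depth, i = 1, pos + 2
    while depth > 0:
        if i >= len(text):
            raise SyntaxError("unterminated comment")
        if text.startswith("/*", i):
            depth, i = depth + 1, i + 2
        elif text.startswith("*/", i):
            depth, i = depth - 1, i + 2
        else:
            i += 1
    return i

def skip_separators(text, pos):
    """Skip any run of whitespace and comments starting at pos."""
    while True:
        m = _SPACES.match(text, pos)
        new_pos = skip_comment(text, m.end() if m else pos)
        if new_pos == pos:
            return pos
        pos = new_pos

def match_token(pattern, text, pos):
    """Skip separators, then match one token; return (lexeme, new_pos) or None."""
    pos = skip_separators(text, pos)
    m = re.compile(pattern).match(text, pos)
    return (m.group(0), m.end()) if m else None

source = "  /* a /* nested */ b */  foo"
result = match_token(r"\w+", source, 0)
```

Note that `skip_separators` never calls `match_token`. This mirrors the reason lexical rules do not invoke the separator rule: the separator would otherwise end up calling itself.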
In CSL parsers, tokens are matched as in non CSL parsers (see 6.3). CSL parsers add one special feature: the user can benefit from the grouping possibilities of regular expressions. The text of the token can be saved with the infix / operator, and the groups of the token can be saved with the infix // operator. This operator (available only in CSL parsers) returns all the groups in a tuple. For example, figure 8.3 shows how to read entire tokens and how to split tokens.
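In terms of the re module, the two operators correspond to two views of the same match object. The sketch below is an analogy, not TPG syntax: `token_text` stands in for what / saves, `token_groups` for what // saves, and both names are hypothetical.

```python
import re

def token_text(pattern, text, pos=0):
    """Whole text of the token, analogous to saving with the / operator."""
    m = re.compile(pattern).match(text, pos)
    return m.group(0) if m else None

def token_groups(pattern, text, pos=0):
    """Tuple of all groups of the token, analogous to the // operator."""
    m = re.compile(pattern).match(text, pos)
    return m.groups() if m else None

pattern = r"(\w+)=(\w+)"
whole = token_text(pattern, "key=value")
parts = token_groups(pattern, "key=value")
```

Here `whole` is the entire matched token, while `parts` is a tuple with one entry per group, which lets a single token be split into its components as it is read.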
There is no difference between CSL and non CSL parsers, except for lexical rules, which look like grammar rules.