TC-1 Code to Write
Be sure to read RE/flex and Bison documentations and tutorials, see RE/flex & Bison.
See the lecture notes, and read the C++ chapter of GNU Bison’s documentation.
Pay special attention to its ‘’Complete C++ Example’’ which is very much like our set up.
- Makefile.am
Include your own testsuite in the
tests
directory. Some examples of tests are given in the tests folder (Given Test Cases). Do not hesitate to make use of them.Include your own test suite in the
tests
directory, and hook it tomake check
. See Generating The Test Driver.- lib/misc/symbol.*, lib/misc/unique.*
The class
misc::symbol
keeps a single copy of identifiers, see The lib/misc Directory. Its implementation inlib/misc/symbol.hxx
andlib/misc/symbol.cc
is incomplete. Note that runningmake check
at the root of the project will produce inlib/misc
unit testing binaries. Having this unit testlib/misc/test-symbol.cc
pass should be a goal by itself. As a matter of fact, unit tests were left to help you: once they pass successfully you may proceed to the rest of the compiler.misc::symbol
’s implementation is based onmisc::unique
, a generic class implementing the Flyweight design pattern. The definition of this class,lib/misc/unique.hxx
, is also to be completed.- lib/misc/variant.*
The implementation of the class template
misc::variant<T0, Ts...>
lacks a couple of conversion operators that you have to supply.- src/parse/scantiger.ll
- Specifications:
The scanner must be able to scan inputs as described in Lexical Specifications and Additional Lexical Specifications.
- What is RE/flex:
RE/flex is a C++ library used to generate lexical analyzers that can interact with Bison. Its purpose is to generate a C++ lexer class with a lex method, which can be called to initiate the lexical analysis of the input data and interact with Bison to produce a complete parser.
The file has a special syntax which is explained here.
More about RE/flex & Bison.
- What does RE/flex generate?
RE/flex generates two files
parsetiger.hh
andparsetiger.cc
these files contain theLexer
class which has alex
method that allows the lexer to be launched. Bison will use it to feed the parser with tokens.- How do I write a rule with RE/flex?
The goal of a lexer is to produce a token from an input stream. For that, RE/flex uses a Maximal Munch algorithm and thus proposes to define patterns as regex that will, when matched, create a token.
Here is an example of how to create a token from a regex to create a array token:
%% "array" return parser::make_ARRAY(td.location); // or `return TOKEN(ARRAY); %%
To create the tokens in the parser you have to use a bison function which is defined for each token, which is :
parser::make_TOKEN(...)
.The macro
TOKEN()
allows to simplify the creation of tokens because Bison’s syntax for creating tokens is verbose.You can find out how to write a rule on the RE/flex documentation.
To have more readability on the regexes you can define them before using them as done on this example.
- RE/flex and sublexer:
To be able to lex the strings you will have to use sublexers, here is the syntax of the sublexers:
... "\"" {grown_string.clear(); start(SC_STRING);} /* Handling of the strings. Initial " is eaten. */ <SC_STRING>{ "\"" { start(INITIAL); return TOKEN_VAL(STRING, grown_string); } ... \\x[0-9a-fA-F]{2} { grown_string += strtol(text() + 2, 0, 16); } ... }
On RE/flex documentation, sublexers are called states.
Strings will be stored as C++
std::string
. See the following code for the basics.Symbols (i.e. identifiers) must be returned as
misc::symbol
objects, not strings.- Metavariable:
Be carefull, metavariable (Additional Lexical Specifications) keywords should only be scanned when extensions are enabled, i.e. not on direct user input.
- Trace the lexing
To keep track of lexing you can enable it using the
set_debug(true)
method on the lexer.- Location tracking
The locations are tracked. The class
Location
to use is produced by Bison:src/parse/location.hh
.To track locations, adjust your scanner, use
YY_USER_ACTION
and thelex
prologue:... %% %{ // Everything here is run each time :code:`lex` is *invoked*. %} /* The rules. */ "if" return TOKEN(IF); ... %%
- src/parse/parsetiger.yy
- Complete parser
The parser must be able to parse tokens as described in Syntactic Specifications and Additional Syntactic Specifications and must not produce comportment yet.
- What is Bison?
Bison is a parser generator, allowing to generate parsers of the LR family. Bison is a tool created by the GNU and maintained by Akim Demaille the creator of the Tiger project (Fair Criticism). The goal of Bison is to write grammar rules in YACC (Yet another Compiler Compiler) format in order to be able to parse a language.
- What does Bison generate?
Bison generates two files parsetiger.hh and parsetiger.cc which represent the parser. The parser is a class named
Parser
which contains aparse(...)
method to start parsing.- What is the structure of a Bison file?
A bison file has a special syntax because it will generate C++. Here is a link to the documentation for more information.
- How to write a Bison rule:
To write a rule with bison you have to start from the EBNF grammar defined in the Syntactic Specifications and Additional Syntactic Specifications, then adapt it to the YACC syntax that bison uses.
Here is an example of Bison code that parses this rule:
(* === Expressions. === *) exp = (* Literals. *) | INT | STRING ;
In bison syntax:
exp: INT | STRING;
In this case INT and STRING are tokens created by the lexer.
During the development of your parser you will have problems with conflicts, there are two types of conflicts Shift/Reduce and Reduce/Reduce.
In order to solve these different problems I invite you to read this documentation on the Bison algorithm.
For some cases such as function arguments you will need to use recursive rules.
- GLR parser
We use
%glr-parser
and%skeleton "glr2.cc"
directives to have Bison generate a GLR parser. Thanks to GLR, some conflicts (S/R and/or R/R) can be resolved at runtime. Use%expect
and%expect-rr
to specify their number. For information, we allow zero R/R conflicts, and one S/R related to the “big lvalue” issue.- Trace the parsing
Use
%printer
to implement--parse-trace
support for terminals and non terminals (see TC-1 Samples). For instance,%define api.value.type variant %token <int> INT "integer" %printer { yyo << $$; } <int>
- src/parse/tiger-driver.cc
The class
TigerDriver
drives the lexing and parsing of input file. Its implementation insrc/parse/tiger-driver.cc
is incomplete.You have to add the creation of the lexer, the parser and launch the parsing.
TigerDriver
acts as a bridge between the lexer, the parser and the error handling, among other things. For more information, see old/02-parser-scanner.pdf: most of what you need to know is detailed here.