Compiler Construction – SMD163
- Implement a lexer for MiniJava using JFlex
- A short main method
shall call the lexer repeatedly, and print ”Lexical analysis
depending on whether lexical errors were found in the input.
- The lexer shall handle the MiniJava test suite correctly.
See further the MiniJava project home page.
- The MiniJava lexical syntax is described in Appel,
Appendix 1; we recapitulate it here for the
benefit of those without access to the text book. However, we make one
exception to the original
Handling /* nested /*
comments is not */
We also add the clarification that white space consists of any
sequence of the space, tab,
newline, and the carriage return characters, and that white space may
appear between any two
tokens, or between a token and a comment.
Quoting Andrew W. Appel, Modern
compiler implementation in Java, 2nd edition:
A.1 LEXICAL ISSUES
Identifiers: An identifier is a sequence if
letters, digits, and underscores, starting with a letter.
Uppercase letters are distinguished form lowercase. In
this appendix the symbol id
Integer literals: A
sequence of decimal digits is an integer
constant that denotes the
corresponding integer value. In this appendix the symbol INTEGER_LITERAL stands for an
Binary operators: A binary operator is one of
< + - *
In this appendix the symbol op stands for a binary operator.
Comments: A comment
may appear between any two tokens. There are two forms of
comments: One starts with /*, ends
[and may be nested]; another begins with //
and goes to the end of the line.
See the online
- Implement a parser for the MiniJava language according to the grammar in
A.1. Generate a bottom-up parser with Jacc
- Apart from simply checking whether the input conforms to the
MiniJava definition, the parser
should also construct an abstract syntax representation of the parsed
program. You are advised to make use of the predefined abstract syntax
classes that come with the
MiniJava project (available here
as a gzipped tar-file).
- Input to the parser should be the token stream produced by the
lexer implemented in step 1.
- To make it possible to observe the result of parsing, your
program should print a textual
representation of the abstract syntax tree once it is built. The
MiniJava project provides ready-
made classes using the visitor
pattern for this purpose, see this link
for a gzipped tar-file of the
- Note: the grammar specification in Appel, A.2 cannot be used as is in any of the
implementations listed above. The challenge is thus to rewrite the
grammar so that it suits the
parsing technique you have chosen. This particularly involves sorting
out how nested
expressions like !a.b(c)+d[e]*f.length<g.h().i()
should be read, and how to
express that grammatically.
This step is simple to express:
implement a type-checker for MiniJava according to the type system
definition you have developed as your fourth
. The distinct activities that
constitute this task are:
- Definition of a suitable representation for types. A particular
method that needs to be
supported is an equality-test on types. The existing abstract syntax
classes that implement
types may be used, but they will need to be complemented in order to
- Definition of a flexible structure for representing environments.
A linked list approach is
recommended, but there are many variations to this theme. Moreover, the
between looking up a variable identifier and class identifier needs to
be captured in some
- Definition of abstract methods in classes like Exp and Statement
that correspond to the
respective typing judgment, as well as implementation of these methods
in every concrete
class of the abstract syntax hierarchy.
- Addition of some form of source file position information to the
abstract syntax tree, so that
type errors can be adequately reported. One way of doing that is to
modify the lexer so that the
semantic information returned for a keyword or separator symbol is its file position, and then to
attach that information to selected parts of the syntax tree as it is
constructed by the parser.
Now we've come to the point when it's time to generate executable code
for validated MiniJava
programs. This is the final mandatory step of the programming project,
but it can be implemented
with varying levels of ambition:
- The basic option is to implement a code generator that
corresponds to the E1 and S1
schemes you have constructed as part of homework assignment 5. This implies using
the stack for all intermediate values, and a very naive technique for
the conditionals in if- and while-statements.
- As a more refined, but still very basic attempt, the context
sensitive compilation schemes
branchTrue and branchFalse from lecture 10 (pp 47-53) can be added to the
implementation to improve the handling of if and while conditionals.
- Add a simple register allocator for temporaries, by implementing
the stack simulation
described in the slides for lecture 10.
Apart from that, this option is like no. 2 above.
Please note that the enumerated alternatives is a menu of choices, you
are not supposed to complete
more than one of these tasks.
The Pentium-based server tau10.sm.luth.se
is available for testing the output from your
compiler. The server is running Linux with your personal directories
mounted, and you can log on
using your ordinary username and password.
A note regarding code generation: The Linux calling convention for the
IA32 treats register ebx
as callee saves, and the others as the caller's responsibility. That
means that your method prologue/
epilogue must push/pop ebx if it is used at all within the function.
Other registers must be saved
during method calls only if their values are used after the call. For
more information regarding
assembly language programming for IA32 under Linux, see this online
, in particular
appendices B, C and F.
The programming project will be reviewed all four steps at once after the
deadline of March 10.
Deliverables consist of a written
report and a demonstration
of your implementation. The
purpose of the report is to make it easy for people outside your group
to quickly grasp the
structure of your
implementation, and identify the interesting design choices (see below).
Also, make sure to book a
time for demonstration well in advance of the deadline, by
contacting one of the lecturers.
A successfully completed programming project is worth 2 academic points.
Some report writing guidelines:
The report should be sent in pdf format to
midnight Friday March 9 2006.
- Be very economic: Write for your lecturer and informed
colleagues, not for a
unaware of the assignment.
- Start by describing the structure
of your compiler, in terms of phases, passes, and
- Focus on your design choices;
i.e., things that are important in order to understand
your particular solution.
- Emphasize areas where you belive your solution has particular
benefits, as well as parts
where you think your compiler doesn't work as well as you would like.
- Your aim should be to help the
reader see the big picture of your implementation.
- Do not repeat in plain text facts that are obvious in the source
code, like the type of
- Do not write less than two, and not more than six pages of text.
The programming project should be carried out in pairs, or
Oral discussion is good, copying
files is not!