Computer Science & Electrical Engineering

Compiler Construction – SMD163



Homework assignment 4

Introduction

This homework assignment will lead to a complete formal definition of the type system for
MiniJava. Its primary goal is to help you getting acquainted with mathematical notations like
inference rules for describing the semantics of a programming language, not to check whether
you are able to come up with the "right" rules. As one is unlikely to find a developed
specification for a task that is about actually producing a specification, a correct solution to
this assignment simply doesn't exist. Instead you will have to rely on intuition, common sense,
and your prior knowledge of how Java works. Any design decision may also be discussed
with your lecturer as the work proceeds.

To avoid problems with mathematical typesetting, we will rely entirely on ASCII symbols in
this assignment. For example, the turnstile symbol will be written |-, and the horizontal line
of an inference rule will simply be a sequence of minus signs (----------). Examples of
rules in this format will follow below.

Abstract syntax

As the backbone of our definitions we will use an abstract syntax notation for MiniJava that
closely follows the structure of the syntax trees built up by the predefined classes of the
MiniJava project. A grammar for the abstract syntax is defined as follows:

  p  ::= C s cs

  c  ::= class C { ds ms }

  m  ::= t M(fs) { ds ss e }

  f  ::= t V

  d  ::= t V

  t  ::= int
      |  boolean
      |  int[]
      |  C

  s  ::= { ss }
      |  if(e) s else s
      |  while(e) s
      |  print(e)
      |  V = e
      |  V[e] = e

  e  ::= e + e
      |  e - e
      |  e * e
      |  e > e
      |  e && e
      |  e[e]
      |  e.length
      |  e.M(es)
      |  n
      |  true
      |  false
      |  V
      |  this
      |  new int[e]
      |  new C()
      |  !e

The nonterminals we use in this grammar are p, c, m, f, d, t, s, e; standing for program,
class declaration, method declaration, formal variable declaration, variable declaration, type,
statement, and expression, respectively. As a very convenient notational short-cut, we let a
nonterminal ending in an s stand for a (possibly empty) sequence of of the corresponding
nonterminal without the s. For example, in the production s ::= { ss } one should
read the ss symbol as a sequence of zero or more s. Note that we have abstracted away
from all concerns regarding separator symbols in this grammar, just like we do in the syntax
trees built from the predefined Java classes. Moreover, much of the redundancy in the
concrete MiniJava syntax is also gone; this is especially true for the representation of the
main class.

Terminal symbols are n (integer constant), V (variable identifier), M (method identifier), and C
(class identifier), as well as any other symbol not counting as a nonterminal. Note that we may
separate between variable, method and class identifiers in an abstract grammar like this, even
though they all have the same lexical definition. This is because we are no longer concerned
with the problem of parsing the grammar; all we are interested in now is a convenient way to
denote distinct nodes in the syntax tree. Using a special symbols for method and class names
just improves readability in this respect.

Environments

For the purposes of type-checking, an environment is basically just a mapping from identifiers
to types. However, because the MiniJava type language is too weak to express that some
identifiers stand for methods and classes, we will have to adopt a slightly different strategy.
Specifically, in our setting the environments will map variable identifiers to types, and class
identifiers to class interfaces. Moreover, a class interface will itself be an environment that
maps method identifiers to method signatures. The structure of a method signature can be
expressed using the following grammar:

  h  ::= t(ts)


That is, a method signature is just a method declaration with the method name, argument
names, and method body erased.

Environments will be constructed on basis of the declarations contained in the program to
be checked. We define a meta-function mkenv that takes care of taking the relevant parts
of
a declaration, and turning them into a corresponding singleton environment.

  mkenv(t V)                = { V -> t }
  mkenv(class C {ds ms})    = { C -> mkenv(ms) }
  mkenv(t M(ds) {ds' ss e}) = { M -> t(ts) }
                              where ts = range(env(ds))

Furthermore, we let mkenv be overloaded so that for each form of declaration, a sequence
xs
will result in the union of the singleton environments for each x.

  mkenv(x1 ... xn)          = mkenv(x1) U ... U mkenv(xn)
                              if x1,...,xn are all distinct


It is very convenient to be able to treat an environment as a function. Thus, if we write
E(V) = t
,  we want it to be read as an assertion that there is a mapping from V to t in E.
Similar definitions apply for the notations E(T) = I and I(M) = t(ts), where the
symbol I stands for a class interface. We will also need a notation for extending
environments: Let E1+E2 denote the union of E1 and E2, where a mapping in E2 is
skipped if there is a mapping in E1 for the same identifier.

Rules

The following basic judgments will need to be part of a static semantics for MiniJava:

E |- e : t Expression e has type t in environment E
E |- s
Statement s is well-typed in environment E
E |- m
Method declaration m is well-typed in environment E
E |- c
Class declaration c is well-typed in environment E
  |- p
Program p is well-typed (in an empty environment)

In addition, we will need judgments that state that sequences of statements or method/class
declarations are well-typed. However, in the context of MiniJava, there is only one sensible
rule for such a judgment, as illustrated by the definition of type-correctness for statement
sequences:

      E |- s1  ...  E |- sn
      ---------------------
      E |- s1 ... sn

The same pattern naturally repeats itself for method and class declaration sequences.

As examples of the different forms of judgments listed above, a few of the core rules that
would make up a static semantics for MiniJava follow below:

      E |- e1 : int    E |- e2 : int
 (1)  ------------------------------
      E |- e1 > e2 : boolean

      E |- e : int
 (2)  -------------
      E |- print(e)

      E' = mkenv(fs)+mkenv(ds)+E   E' |- ss    E' |- e : t

 (3)  ----------------------------------------------------
      E |- t M(fs) {ds ss e}

      E = mkenv(class C {} cs)  E |- s  E |- cs
 (4)  -----------------------------------------
      |- C s cs


These rules can preferably be read as follows:
  1. In environment E, expression e1 > e2 has type boolean if e1 and e2 both
    have type int in E.

  2. In environment E, statement print(e) is well-typed if e has type int in E.

  3. In environment E, method declaration t M(fs) {ds ss e} is well-typed
    if both ss is well-typed and e has type t in some environment E', where E'
    is obtained by extending E with the variable declarations in fs and ds.

  4. Program C s cs is well-typed if both the main statement s and the classes cs
    are well-types in E, where E defines the given classes, including the main class C.
Notice that the rules above contain clear design decisions, which may be all challenged.
For example, rule (1) does not allow for ordered comparison between booleans, something
which could make sense. Note also that rules (3) and (4) are the only rules needed in the
form of method and program type-correctness judgments (no other syntax forms exist).

In general, it is a good idea to keep a set of typing rules syntax-directed, which means that
for each construct in the abstract syntax, there is only one rule that can possibly match.
Exceptions to this principle exist, but they usually apply to languages with a more advanced
type system than ours.

Assignment

Your task is now to complete the type system definition of MiniJava, by providing rules
for the missing expression and statement constructs, as well as a rule defining well-typedness
for class declarations. If necessary, you are allowed to provide alternatives to the initial rules
given above, although it is recommended that you take these rules as a starting point.

Limitation: as you might have noticed, there is no construct in the abstract syntax above that
corresponds to class declarations that use extension. That feature will require a bit more
sophistication than what might be feasible at this point, although the basic principles of
subclassing and subtyping will be covered later in the course. You are therefore not required
to handle class extension in this assignment.



Report via email to Subject line: SMD163 - 4.  Deadline is Feb 20, 13:00.




Homework assignments are carried out individually.

Oral discussion is good, copying files is not
!