CSE526: Principles of Programming Languages
Scott Stoller

project: type checker for AtomicJava
version: 6pm, 29mar2004
team formation due: 22 mar 2004
project due: 29 apr 2004

Read: Cormac Flanagan and Shaz Qadeer. A type and effect system for atomicity. In Proc. 2003 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 338-349. ACM Press, 2003. http://www.cs.ucsc.edu/~cormac/papers/pldi03.ps

Implement a type checker for programs in this type system.

This project is done in teams of two students. One person on each team should send a message containing the names of both team members to cse526@cs.sunysb.edu by the above team formation deadline.

You may use any programming language. Teams that use ocaml will receive 10% extra credit, reflecting the fact that they put in the effort to learn and use a new programming language. ocaml is well suited for this kind of program.

Here is a more detailed description of the requirements.

1. Define a data type for abstract syntax trees of the programming language. This corresponds to the grammar in Figure 1 with the extensions in Section 3.3. If you use ocaml, I suggest that you use a datatype similar in style to the datatypes for expressions in hw1. For example,

     type expression = Var of string
                     | FieldUpdate of expression * string * expression
                     | WhileExpression of expression * expression
                     | ...
     type classDef = ...
     type program = Program of classDef list * expression

If you use Java or C++, I suggest that you introduce a class corresponding to each production in the grammar. For example, there will be classes called Atomicity, Program, and Expression. The Expression class will have a subclass WhileExpression, for example,

     class WhileExpression extends Expression {
       Expression guard;
       Expression body;
       ...
     }

You do not need to write a parser. Inputs to your system will be supplied directly as elements of this data type. For example,

     let test1 _ =
       let exp1 = Var "x" in
       let exp2 = Var "y" in
       let exp3 = FieldUpdate (exp1, "field1", exp2) in
       let classdef1 = ... in
       let pgm1 = Program ([classdef1], exp3) in
       typecheck pgm1

     (* execute test1 *)
     let _ = test1 ()

It is not required---but it is probably worthwhile---to write a simple "pretty printer" that prints programs in a user-friendly format.

2. Define a data type for atomicities (including conditional atomicities), with support for operations on atomicities (sequential composition, join, etc.).
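Here is a rough sketch of what the atomicity data type and two of its operations might look like in ocaml. The constructor names and helper functions (rank, lift, etc.) are only illustrative, not prescribed by the assignment or taken verbatim from the paper, and the cases of basic_seq should be checked carefully against the definitions in the paper; treat this as a starting point, not a reference implementation.

     (* sketch only: conditional atomicities and two operations on them.
        check the basic_seq table against the paper's definition. *)
     type lock = string   (* or however you represent lock expressions *)

     type atomicity =
       | Const
       | Mover
       | Atomic
       | Cmpd
       | Error
       | Cond of lock * atomicity * atomicity   (* l ? a1 : a2 *)

     (* position in the ordering const <= mover <= atomic <= cmpd <= error *)
     let rank = function
       | Const -> 0 | Mover -> 1 | Atomic -> 2 | Cmpd -> 3 | Error -> 4
       | Cond _ -> failwith "rank is defined only on basic atomicities"

     (* join of two basic atomicities: the larger one in the ordering *)
     let basic_join a b = if rank a >= rank b then a else b

     (* sequential composition of two basic atomicities *)
     let basic_seq a b =
       match a, b with
       | Error, _ | _, Error -> Error
       | Cmpd, _ | _, Cmpd -> Cmpd
       | Atomic, Atomic -> Cmpd
       | _ -> basic_join a b   (* e.g. mover;atomic = atomic *)

     (* operations on conditional atomicities distribute over the branches *)
     let rec lift op a b =
       match a, b with
       | Cond (l, a1, a2), _ -> Cond (l, lift op a1 b, lift op a2 b)
       | _, Cond (l, b1, b2) -> Cond (l, lift op a b1, lift op a b2)
       | _ -> op a b

     let join a b = lift basic_join a b
     let seq a b = lift basic_seq a b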
3. Define a type inference function that computes the type and atomicity type of an expression. Specifically, given a program P, a type environment E, and an expression e, this function computes a type t and an atomicity type a such that P; E |- e : t & a. Type inference is necessary because subexpressions of a method are not annotated with their atomicity (such annotations do not appear in the grammar extensions in Section 3.3). This function does intra-procedural (not inter-procedural) type inference: each method is annotated with its atomicity, so when this function encounters a method call, it needs to look only at the declaration of that method, not at the code of that method. This function can be defined by recursion on the structure of expressions.

4. Define a type checking function. Specifically, given a program P, this function either indicates that the program is well-typed, or produces an error message providing some details about a type error that was encountered. For simplicity, the type checker can halt after providing details about the first error it encountered. This function computes, for each method m of each class c, a type t and an atomicity type a for m's body, and then checks that type t and atomicity type a are consistent with the declared type and atomicity type of m.

5. Write several testcases to test your type checker. Be sure to include both well-typed programs and programs with type errors. Your project submission should include the testcases and your type checker's output for each testcase. [more details about this were added below on 26apr2004.]

You have a few small examples of well-typed programs:
1. the program in atomicity-typing-example on the cse526 web page.
2. the atomic bank accounts in section 3.4 of the paper on atomicity types.
3. the Vector example in section 4.2 of the paper on atomicity types. Although it is written in Java, you can capture the "essence" of the example in a small AtomicJava program. (AtomicJava is the language defined in section 3.3.)

One point illustrated by the bank account example is that a well-typed method can have a conditional atomicity containing "error". For example, since "requires" is an abbreviation, the atomicity of the method Account.deposit3 is actually "this ? atomic : error". This method is well-typed; this is not a type error. There is a type error only when the declared type of a method (or the main expression) is not greater-than-or-equal-to the actual type of its body (or of the main expression, respectively). Note that, according to the typing rule [PROG], the main expression of a program is implicitly declared to have the unconditional atomicity "cmpd".

I think it is a good exercise for people to try to make some examples on their own, to understand the type system better before implementing it. If you make a few small examples and indicate whether you think they are well-typed or not, I will be happy to check them.

6. Write a brief README describing how to compile and run your system.

7. Submit the assignment in accordance with the general instructions on the cse526 home page.

GRADING
  functionality: 80%
  clean, clear, and well-documented code: 10%
  good testcases: 10%

Your code should not be grossly inefficient, but clarity and correctness are more important than efficiency in this assignment, so don't bother with minor optimizations.

Each team will hold a 30-minute demo with the TA shortly after the project is due. (A sign-up sheet will be available the week before.) The TA will supply some testcases and will ask the team to run some of the testcases from their TESTING file.

-------------------- added 22apr2004

a purely bottom-up approach won't easily work. you need a simple top-down pass before the bottom-up pass. or you could use a purely top-down approach. here's an example of the need for some top-down analysis. consider

     let x:t = exp1 in exp2

suppose you are type-checking a use of variable x in exp2. a purely bottom-up approach will not have processed the enclosing "let" expression yet, so it does not know the declared type t of x. in other words, we need to do some simple top-down type inference, either before or during the type checking. if every expression were labeled with its type, no type inference would be needed, and a purely bottom-up approach would work (since type checking can be done bottom-up).
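to make this concrete, here is a rough ocaml sketch of how the type environment can be threaded through a single recursive pass, so that the declared type of x is already in the environment when the use of x in exp2 is reached. the constructor names, the environment representation, and the infer function are illustrative choices, not requirements; atomicities are omitted to keep the sketch short.

     (* sketch only: carrying a type environment downward while inferring
        types bottom-up.  names are illustrative, not prescribed. *)
     type typ = IntType | ClassType of string

     type expression =
       | Var of string
       | Let of string * typ * expression * expression   (* let x:t = e1 in e2 *)
       (* ... the other expression constructors ... *)

     type env = (string * typ) list

     exception Type_error of string

     let rec infer (env : env) (e : expression) : typ =
       match e with
       | Var x ->
           (try List.assoc x env
            with Not_found -> raise (Type_error ("unbound variable " ^ x)))
       | Let (x, t, e1, e2) ->
           let t1 = infer env e1 in
           (* depending on how you read the corrected [EXP LET] rule (see the
              25apr2004 note below), you may want to check t1 against the
              declared type t here *)
           if t1 <> t then
             raise (Type_error ("initializer of " ^ x ^ " does not have its declared type"));
           (* the binding for x is pushed into the environment (the "top-down"
              part) before the body exp2 is processed *)
           infer ((x, t1) :: env) e2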
-------------------- added 25apr2004

> For our PPL project, the paper mentioned something called "ls" (lockset)
> in section 3.2. Would you please tell us:
>
> How do you generate a lockset? (or how is it implemented conceptually?)

assuming that you have expanded "requires" clauses into conditional atomicities (I think that's the simplest way to deal with them), the only locksets used in the typing rules are implicitly in [EXP SUB], because to prove "a squareLessThanOrEqual b", you might need to check facts of the form "a' squareLessThanOrEqual^h_n b'", where h and n are locksets. you can see that the locksets in the rules defining squareLessThanOrEqual^h_n come from the locks in the conditional atomicities. in some approaches, the algorithm might need to accumulate locksets as it goes down through (possibly nested) "synchronized" expressions. however, I think this is unnecessary if you do most of the work in a bottom-up pass (preceded by a top-down pass to gather types for local variables).

> Besides, for the typing rule [EXP LET], does it forget to compare t = t1?

you are right. there is a typo in [EXP LET]: "E, t x" should be "E, t_1 x".

-------------------- added 26apr2004

> By the way, can you explain what "a' squareLessThanOrEqual^h_n b'" means,
> because I can't understand its definition in the paper. Then I don't know
> how to implement the function "a' squareLessThanOrEqual^h_n b'".

a squareLessThanOrEqual b means that a is a "stronger" atomicity than b. as we discussed in lecture (some time ago), the sets h and n (intuitively, "held" and "not held", respectively) are used to detect consistency of assumptions about which locks are held. such assumptions arise when considering a specific branch of a conditional atomicity. for example, in "l?(l?a:b):c", the atomicity "b" is "unreachable" and should not affect the meaning of this conditional atomicity, hence should not affect the position of "l?(l?a:b):c" in the squareLessThanOrEqual ordering. the ordering squareLessThanOrEqual^h_n is defined to achieve this.
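to make the role of h and n a bit more concrete, here is a small ocaml sketch of a helper that simplifies a conditional atomicity under accumulated "held" / "not held" assumptions. this is not the paper's definition of squareLessThanOrEqual^h_n (which compares two atomicities); it only illustrates how carrying the two sets into the branches makes unreachable branches, such as the inner "b" above, drop out. the names are illustrative only.

     (* sketch only: simplify a conditional atomicity under assumptions about
        which locks are held (h) and not held (n) *)
     module S = Set.Make (String)

     type atomicity =
       | Const | Mover | Atomic | Cmpd | Error
       | Cond of string * atomicity * atomicity   (* l ? a1 : a2 *)

     let rec simplify (h : S.t) (n : S.t) (a : atomicity) : atomicity =
       match a with
       | Cond (l, a1, a2) ->
           if S.mem l h then simplify h n a1        (* l is assumed held *)
           else if S.mem l n then simplify h n a2   (* l is assumed not held *)
           else Cond (l,
                      simplify (S.add l h) n a1,    (* record the assumption in each branch *)
                      simplify h (S.add l n) a2)
       | basic -> basic

     (* example: l?(l?a:b):c simplifies to l?a:c; the unreachable "b" branch
        (here Cmpd) disappears *)
     let _ =
       let ex = Cond ("l", Cond ("l", Atomic, Cmpd), Mover) in
       assert (simplify S.empty S.empty ex = Cond ("l", Atomic, Mover))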
-------------------- added 26apr2004

in addition to a README (as described above), your submission should include a file called TESTING that summarizes how you tested your type checker. specifically, for each non-trivial (e.g., more than 6 or 8 lines) test program that you used, please include

1. the program written in reader-friendly notation (i.e., as normal text, like the program in atomicity-typing-example, not as a data structure).

2. a brief description of the output of your type checker, e.g., "NullPointerException" (I hope not! :-), or "the program is well-typed", or "type error: the expression ... is expected to have type ... but has type ...". if your program produces voluminous diagnostic information during type-checking, you do not need to include all of it here.

-------------------- added 27apr2004

the due date is extended to 4may2004.

there is a minor problem with the typing rules in the paper. the grammar in Figure 1 does not contain productions for integer constants. accordingly, there are no typing rules for integer constants. however, the [FIELD NO GUARD] and [FIELD GUARD] rules have the hypothesis "P; emptyset |- e : t & mover". without integer constants, there is no way to satisfy this hypothesis when t is "int"! thus, with the current rules, every class that declares a field of type "int" is untypable. note: there is no problem when t is a class, because the typing rules support judgments like "P; emptyset |- new c : c & mover".

there are two ways to fix this problem.

1. eliminate initializers for integer fields. in this case, you should also prohibit integer fields from being declared final. thus, the production for field declarations becomes

     field ::= [final]_opt c fd [g]_opt = e
             | int fd [g]_opt

   corresponding minor changes to the typing rules are needed.

2. add integer constants to the language. specifically, add the following production to the grammar

     e ::= i        (i ranges over the integer constants)

   and add the following typing rule:

            P |- E
     ----------------------
     P;E |- i : int & const

all teams should adopt approach 2. if this causes difficulties, please let me know.
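for teams using ocaml, adopting approach 2 might look roughly like the sketch below. the constructor and function names are illustrative, not required; only the case for integer constants is shown.

     (* sketch only: integer constants in the AST and the corresponding case
        of the inference function, mirroring the rule P;E |- i : int & const *)
     type typ = IntType | ClassType of string

     type atomicity = Const | Mover | Atomic | Cmpd | Error
       (* | Cond of ...  (conditional atomicities, as before) *)

     type expression =
       | IntConst of int        (* the new production  e ::= i *)
       | Var of string
       (* ... the other expression constructors ... *)

     let infer (e : expression) : typ * atomicity =
       match e with
       | IntConst _ -> (IntType, Const)   (* the new case *)
       | _ -> failwith "the other cases are as in your existing inference function"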
-------------------- added 30apr2004

In addition to the README and TESTING files, the project submission must include a file called CONTRIBUTIONS that summarizes each team member's contributions to the project. For example, indicate who wrote or significantly revised each part of the code, and who wrote each testcase. Also indicate contributions to design, debugging, documentation, etc. If both teammates contributed comparable amounts, they will both receive the same score for the project. Otherwise, they will receive different scores, commensurate with their contributions.

-------------------- added 3may2004

during the demo, students will first be asked to run some testcases that we have prepared, and then will be asked to run some of the testcases in their TESTING file, to confirm the accuracy of the results reported there. probably there won't be enough time to run all of the testcases in the TESTING file, so the TA will randomly select testcases to run.

one way to design a comprehensive set of testcases is to use a specification-based coverage metric. for example, the grammar for the programming language is part of the specification of your type-checker, so each construct in the programming language should appear in at least one testcase. similarly, the definitions of several operations on atomicities (e.g., ";", "*", and "S(l,a)") are part of the specification, so for each case in the definition of each of these operations, there should be a testcase whose outcome depends on correct implementation of that case. (you can develop such testcases by "working backwards" based on how those operations are used in the typing rules.) due to time limits, you are not required to develop a test suite that systematically covers all of those cases, but your test suite should at least cover some of the more complicated cases.

-------------------- added 3may2004

updated information about testcases and demos:

0. the project submission deadline is 11:59pm on Tuesday. the testcases will be posted on the cse526 announcements page at that time. projects submitted after that time will receive a -15% penalty. note that your submitted zip file should include the README, TESTING, and CONTRIBUTIONS files. if you can't conveniently submit printouts of those files before your demo, then you can simply bring the printouts to the demo.

1. at the beginning of the demo, the TA checks that the files being used are the same files that were submitted, using one of these two methods:

1.1. at the beginning of the demo, the TA mails the zip file from the cse526 account to the students. the students unzip it and use the files.

1.2. when the project is submitted, students include in the README a checksum and the exact command used to compute it, for example,

       cat *.java */*.java | /usr/local/bin/md5
       47fdea02d713b22c114faf800e9133cc

     at the start of the demo, students re-compute the checksum on their copy of the files, and the TA checks that the value is the same as in the submitted README. if not, use approach 1.1. note: if you are using Solaris, you must use /usr/local/bin/md5 to compute the checksum. if you are using Windows, you must use approach 1.1.

2. students modify the files from step 1 to contain our testcases, by cutting-and-pasting text into one of the existing files, or by copying a file containing the testcases into the directory containing the files from step 1. the TA looks at the text being pasted or the file being copied to make sure that it contains only our testcases (and perhaps a small amount of code to call the typechecker on the testcases).

3. students run the testcases we supplied and then run their own testcases.

-------------------- added 5may2004

> is there any difference between some stupid mistakes (like a typo in the
> program) and serious ones? and can we get some of the points back if we
> can easily modify it into a correct program?

we will try to distinguish small errors from serious ones. in part, this happens automatically, if one believes that serious errors generally affect more testcases than minor errors. if your code has small errors that can be fixed by changing a few lines, write a file called CHANGES that describes (1) the errors and how to fix them, and (2) your type checker's output for the testcases affected by the change. bring a printout of this file to the demo. to be fair, changes of more than a few lines would have to be considered equivalent to submitting the project late. successful fixes in your CHANGES file will earn only partial credit; giving too much credit for them would be unfair to students who are busy with other commitments during the relatively short interval between yesterday evening and the demos tomorrow.

during the demo, you should first demonstrate your submitted code on our testcases, so there is a common baseline. Yiping, after looking at printouts of your TESTING file and your CHANGES file (if any), will ask you to demonstrate selected items from them, to confirm their accuracy. (Demonstrating an item in your CHANGES file means applying the fix and re-running the testcase.) demos can be continued at a later time if necessary, but we hope this will be unnecessary for most teams.

-------------------- added 7may2004

for grading, each team member's contribution to the project is quantified as an individual contribution factor (ICF). a person who does the expected amount of work has an ICF of 1. specifically, for teams with 2 people, a person who did half of the team's work would have an ICF of 1; this should be the norm. a person who did all of the team's work would have an ICF of 2. a person who did less than half of the team's work would have an ICF less than 1.