CSE 504: Advanced Compiler Design

[Announcements][Projects]



Instructor: Radu Grosu (grosu@cs.sunysb.edu),
Class: MonWed  5:20pm - 6:40pm, CS 2311
Office hours: MonWed 12:00-2:00pm, CS Building room 1425, and by appointment

TA: Jonghaw Lee (jlee@cs.sunysb.edu)
Office hours: TueThu 4-5pm, CS Building room 2110

TA: Xiaohan Zhang (xhzhang@cs.sunysb.edu)
Office hours: Tue 2:30-4:30pm, CS Building room 1207


Contents


Objectives

Compilers and interpreters are very widely used. Every programmer uses these tools to execute programs written in high level languages. It is hence essential for a computer scientist to understand  the process by which programs written in high-level languages are translated and executed. The main objective of this course is to gain an in depth understanding of the above process.  Most complex software systems are not monolithic; they are programmable using special purpose languages. An understanding of the language translation process thus plays an important role in the design of real life systems.

Prerequisites

You must have completed CSE 214, CSE 220, and CSE 303.

Being a project intensive course, a good working experience with Java programming and the UNIX environment (not just familiarity) is essential. You will also need to know how to set up "make"  files and use symbolic debuggers. At the very least, you should be able to pick these up within the first couple of weeks.

Familiarity with concepts from object oriented languages (classes, inheritance etc.) will be useful but not necessary: I will briefly cover the fundamental concepts we will need for the projects.


Overview

In the class, we will discuss the theoretical aspects of designing a compiler. In the projects, you will then apply the theory you have learned in the class to develop a complete compiler for a high level language. The problem of language translation is traditionally decomposed into many phases. Most common are: All phases use a symbol table to keep track of the properties (types, number of parameters etc.) of each symbol (variables, procedures etc.). In the end, code is generated for an abstract machine  which may be interpreted (as in Java), or may be translated further into machine instructions (as in C). We will cover each of these topics in detail.

The compiler will be written in Java.  Advanced topics such as program optimization, machine code generation and storage management will be discussed in the lectures.


Reading

The course will use the following textbooks: Slides will be posted as links in the tentative schedule below. Following books are recommended for additional reading:


Software

We will use the Java programming language and the JavaCC compiler compiler.  Documentation and free download may be found at the associated web pages. In doing the projects you might want to use the JBuilder javacc support plugin. It is very small and easy to install and does syntax highlighting. I also displays the .jj and .jjt file structure in the structure pane.
 


Communication

This class is project intensive and you may want to exchange ideas and experience with each other. The best way to do this is by using the news group created specially for cse504. To subscribe to this group, proceed as follows (e.g. with  the netscape browser): (1) right click the news.sunysb.edu, (2) select subscribe to Newsgroups, (3) choose the search panel and search for sbcs.cse504, (4) select the sbcs.cse504 newsgroup and click on the subscribe button. Now you are ready to send and retrieve messages.

Grading

Your performance on the programming projects will be the major part of the final grades. In addition, there will be a mid-term exam and a final exam. The relative weights for the above components will be: The grade for projects is split between programming assignments (30%) and project presentation (10%). Near the end of the semester, each student will make a presentation of a particular portion of his/her project work to the instructor and TAs. The portion of project to be presented will be determined by the instructor and the TAs at that time; the student is expected to come prepared on all aspects of his/her code. The motivation for the presentation is to guage the understanding and insight the student had obtained by performing the project. There will be 6 projects in all. Students will work in teams of two on the projects. Both members of the team are expected to know the code.

You are advised to start working on the projects at the earliest possible time even if the deadlines are far away. The date the projects are due will be clearly specified in the project handout, on the web; any changes in deadlines will be posted on the course homepages.

Projects are due by 11:59pm on the specified date. Late projects will be penalized at the rate of 20% per day. So, there is no point submitting a project more than 4 days late! Each team can submit one project late by seven calendar days (1 week) without late penalty. You don't need to send me mail about this. No extensions will be allowed beyond this.


Your Rights and Responsibilities


Special Needs

If you have a physical, psychological, medical or learning disability that may impact on your ability to carry out assigned course work, you are urged to contact the staff in the Disabled

Student Services office (DSS), Room 133 Humanities, 632-6748/TDD. DSS will review your concerns and determine, with you, what accommodations are necessary and appropriate.

All information and documentation of disability is confidential.

The Importance of Being Earnest

Because a primary goal of the course is to teach professionalism, any academic dishonesty will be viewed as evidence that this goal has not been achieved. Any act of cheating will be treated with utmost seriousness.

You can discuss the course material with other students, but not the homework assignments themselves. In effect, you can discuss the problems but not the solutions. If you help another student with a homework, use examples that do not resemble those in the homework. Remember that there are many different ways to solve the same problem; even solutions with the same central idea can be formulated in many different ways. Therefore, suspiciously similar homework solutions will be considered as evidence of disallowed collaboration or copying.

In case you have any questions about whether an act of collaboration may constitute "cheating", please come and talk to the instructor beforehand to clarify the issue.

Copying an assignment from another student in this class or obtaining a solution from some other source will lead to an automatic F for this course and to a disciplinary action. Allowing another student to copy one's work will be treated as an act of academic dishonesty, leading to the same penalty as copying. You should learn how to protect your data. Failure to do so is also unprofessional and it may expose you to the danger that someone will copy your homework and will submit it as his or her own (see above). In this case, you may be given a score of 0 for the assignment in question (and the other party will get an F).

 All cases of academic dishonesty will be reviewed by the Engineeing College's committee (CASA).

Survival Tips

Do not postpone working on assignments. Start working on programming assignments as soon as they are handed out. Do not wait till the day before the deadline. You will see that assignments take much more time when you work on them under pressure, than when you are more relaxed. Remember that no late submissions are allowed.

Do not postpone working on assignments! This cannot be understated. Despite the above warning, most students will end up working only around the deadline. Remember, the homeworks usually take more time that it initially appears. Furthermore, I expect both the TA and me to be swamped on the office hours before projects are due. So, you, being wiser than the rest, should start earlier and beat the rush!

Design before you code. Writing a well designed piece of code is always easier than staring with some code that "almost works" and adding patches to make it "really work".

Follow good programming discipline. Why spend hours debugging when you can party? A little bit of thought before you do some esoteric pointer manipulation will save you a lot of time, headache and may be a grade!


Tentative Schedule


 
  Date Topic Chapter Projects
1. Jan  22 Organization. Overview of Compilation.  l1 1, 2  
2. Jan  24 Lexical Analysis: Regular Expression & Definitions  l3, l4 3.1-3.3  
3. Jan  29  Lexical Analysis: RE<->NFA<->DFA  l4, l5,  Javacc 1 2 3 4 5 6 7 8 9 10 3.4-3.6  
4. Jan  31 Grammars, Recursive Descent Parsing.  l6, l7 4.4 P1 Out
5. Feb  5 LL Parsing.   l8, l9 4.4  
6. Feb  7 Bottom-up Parsing,   l10 4.5  
7. Feb  12 LR Parsers,   l11 4.7 P1 due, P2 out
8.  Feb  14 Item set construction, 1 2 3 4 SLR, LR, LALR,   l12 4.7  
9. Feb  19 Attributes,   l16, l17 4.9  
10. Feb  21 Syntax-Directed Translation,   l15, att 5.1  
11. Feb  26 Symbol Tables,   l13, st 7.6  
12. Feb  28 Name Resolution,    l14, l15    P2 due,P3 out
13. Mar  5 Abstract Syntax Trees l18, ast 5.2  
14. Mar  7 Review for Mid-Term Exam    
15. Mar  12 Mid-Term Exam.     Previous: 00e, 00a; 98e, 98a; 97e, 97a,    
16. Mar 14 Types: type checking expressions and operations l19 6.1, 6.2  
17. Mar 19 Types & Equivalences l20 TypeChecking.ppt 6.3, 6.4   P3 due, P4 out
18. Mar 21 Types in OO-languages: method resolution l21 6.5  
19. Mar 26 Intermediate code generation: languages, expressions  l22, l24 8.1-8.4  
20. Apr 28 Runtime storage organization l23, pCall 7.2, 7.3  
*** Apr 2 Spring recess (no class)    
*** Apr 4 Spring Recess (no class)    
21. Apr 9 Intermediate code generation: statements. Optimizations  l25, l26 8.4-8.7, 9.4  P4 due, P5 out
22. Apr 11 Control-Flow Analysis CFA): blocks, flow-graphs, loops l27.5-9.ppt 9.4,10.1,10.4  
23. Apr 16 CFA:  l28.15-24.ppt

24. Apr 18 Data-Flow Analysis (DFA):   l29.25-34.ppt    
25. Apr 23 DFA: dags, bbOpt.  Global DFA:paths, reaching defs, struct Progs,  l p35-47.ppt    
26. Apr 25 DFA: iterative solution, structure-based solution,  
 
P5 due
27. Apr 30 Review for Final Test    
28. May 2  Final (11:00-13:30)   Previous: 98e ,98a; 97e, 97a    


Last updated on Feb 7, 2007 by Radu Grosu