Lecture 1 (09/05/00)
  Overview: program construction and problem solving,
  program transformation and program analysis, tools and applications;
  two examples: longest common subsequence and image blurring.

Lecture 2 (09/07/00)
  Problems in programming: clarity and simplicity vs efficiency and
  language-specific details, good functional and object-oriented programming,
  a simple list processing example.

Exercises: Handout E1.
Reading: Chapter 5 of the textbook "Introduction to Algorithms"
  by Cormen, Leiserson, and Rivest.
_______________________________________________________________________________

Handout syllabus, form to fill. Call me Annie.

What is this subject like? What is this course like?

Motivation:

Q: a central task/activity in CS?
   construction of correct, efficient computer programs, productively.
   anything else? any arguments? not really. cost drives everything.

Q: what subjects in CS are the basis for addressing this?
   algorithms, programming.
   anything else? arguments? not really, but need to combine them.

Q: what attracts you to study CS?
   easier to get better jobs?
   not entirely; otherwise you would do business and earn twice as much.
   the subjects above, a kind of problem solving!

Q: so, what is problem solving in CS?
   precisely: problem specification ----> machine executable code
   e.g. finding the greatest common divisor of two numbers;
        factorial of a given number; Fibonacci function; Ackermann function;
        sorting a list of numbers; searching;
        longest common subsequence; paragraph formatting; ...
   many in
        network and distributed systems (routing): MST, shortest path
        hardware design (circuit design, layout): multiplying two numbers
        compilers (lexical analysis, parsing, semantic analysis,
                   dataflow analysis, code generation):
        verification: reachability, SCC, topological sort
        database query:
        security (computing security levels): SCC, topological sort too

Q: what's important (forms challenge/difficulty) in such problem solving?
   precise (down to earth, realized in bits), correct, efficient
   go through
     problem analysis/specification: first understand the problem
     design, code, test/debug/profile/performance tuning/(analyze/verify):
       then solve the problem
     possibly re-analyze, design, code, test...
   e.g., reachability in a graph. say code in C.
     is it a directed graph? what strategies/steps?
     what precise data format/representation?
     confidence in correctness? efficiency?
   first challenge: problem analysis/specification. will see, but usually
     can only do it well after knowing the rest well.

Q: 2 major aspects of such problem solving?
   algorithms, in pseudo code, flow graph, English, some "spec" language: UML.
   programming, in a particular programming language: C, Java, Lisp, ...
   the latter is lower-level and the former is higher-level, as shown in companies.
   also lower-level/non-CS courses vs ..., with data structures in the middle.

   so algorithms are at a higher level, more critical in some ways,
   but far from enough to meet the challenges:
     lots of work to go from pseudo code to code.
     lack systematic methods for design (mimicking a classical algorithm)

goal: look at how to design algorithms, how analysis can guide design,
      and how to do so systematically, even automatically, by transformation;
      not merely study designed algorithms and analysis of their efficiency,
      not manually turn design / pseudo code into code.
      (looking for methods, algorithms for algorithm design, at a meta level.
       Algorithms: steps, procedure for problem solving.
       in Webster: a procedure for solving a mathematical problem
       (as of finding the greatest common divisor) in a finite number of steps
       that frequently involves repetition of an operation;
       broadly: a step-by-step procedure for solving a problem or
       accomplishing some end)

means: do design precisely/formally based on languages,
         by program transformation (programming for algorithms)
       look at systematic methods (steps) for such design and analysis
         (algorithms for programming)
       if we fully succeed, problem solving is complete.
       that will be in the future.
       even if only partly, significant improvements to the state of the art.
       already better design/analysis for many real-world problems
       than done by CS experts. (much more complex than textbook algorithms)

more goals: so we can focus on the interesting parts of problem analysis and
       problem solving.
       also look at what existing methods can't do, for research.

means: collect important problems and algorithms, from textbooks and applications
       talk about best general methods known, with known derivations as examples.
       try derivations not yet done (new solutions) and new problems
       (first solutions); may lead to new general methods

Summary of motivation/goals/problems:

program construction
  correctness/clarity, efficiency, productivity
  (clearer code is easier to see correct or not, and more importantly,
   easier to maintain, for productivity/cost reasons)

issues:
  specification: this course starts with executable programs.
  trade-offs: (correctness <--> efficiency) <--> productivity

Methods and techniques:

transformational programming, step-wise refinement; (synthesis)
object-oriented programming, composable software; (reuse, modification)

program transformation: mostly source-to-source, optimization

  basic algebraic properties: (0)
    a-a = 0, a and a = a,        -- primitives
    first(pair(a,b)) = a,        -- data structures
    if true a b = a              -- control structures
  function fold, unfold, inlining, loop unrolling, peeling
                                 -- abstraction, modularity, reuse

  composition / fusion / stream processing / deforestation:
    e.g., sum squares of numbers in a list
    composing list traversal, squares, and sum (reuse code)
    but removing the intermediate list of squares.

  specialization / partial evaluation / mixed computation / staging:
    e.g., if f(x,y) def= x*x+y, then f(5,y) = 25+y.
    specialized for x=5, more efficient
    especially when called repeatedly on different y's

  incrementalization / finite differencing / memoization / tabulation /
  promotion & accumulation / tupling...
    e.g., if sum(x) def= ..., then
          if sum(x)=r and x' = x plus new element y, then sum(x') = y+r;
          call this sum'(y,r).
    e.g., if #S=r, then #(S U {a}) = r+1 (when a is not already in S);
          crucial if #S is used in a loop body that adds elements

  data representation selection:
    set -> list, arrays.

program analysis:

  abstraction:
    e.g., live or dead, not value; date only, not all values, for Y2K problem.
    explain using examples in (0) above

  dependencies: forward vs backward
    e.g., dead-code analysis is a backward analysis
          looking for date-dependent computation is forward

  data-flow analysis:
  abstract interpretation:
  set constraints:
  types:

Tools:

compilers; language-based environments; visualization tools; ...
tool generators;

compiler vs program manipulation environment:
(like advanced compilers; different from traditional compilers)

  Compilers:      Program Manipulation Environments:
  stand alone     interactive
  functional      reactive
  batch           incremental
  imperative      declarative, constraints-oriented
  automatic       semi-automatic

many additional issues: pretty print, editing, tool interaction

Applications:

solving many problems systematically. two examples.

1. longest common subsequence

the following function computes the length of the longest common subsequence
of sequences x and y of lengths n and m, respectively.
(much easier to write this recursion)

  lcs(n,m) = if n=0 or m=0 then 0
             else if x[n]=y[m] then 1+lcs(n-1,m-1)
             else max(lcs(n,m-1),lcs(n-1,m))

this function contains repeated function calls, and takes exponential time.
write a program for this function that has no repeated function calls and
takes only O(n*m) time. Clearly, the optimized program is an exponential
factor faster.

2. image blurring

the following program takes an n-by-n image in array a, blurs it into array s
by averaging m-by-m windows, and takes O(n*n*m*m) time.
(much easier to write 4 nested loops)

  for i = 0 to n-m
    for j = 0 to n-m
      s[i,j]=0;
      for k = 0 to m-1
        for l = 0 to m-1
          s[i,j]=s[i,j]+a[i+k,j+l]
      s[i,j]=s[i,j]/(m*m)
    end
  end

write an efficient program that computes exactly the same but takes only
O(n*n) time. The optimized program is again much more difficult to write,
but it can be many many times faster depending on m.
_______________________________________________________________________________

move to next Wed 12-5p. will email to let you know the meeting time and place.
OK Wed. 12-2:30p, still in CS 2212.

Review:
  a CS central task: constructing programs
  2 levels: algorithm design at higher level, programming at lower level
  preview ideas of some transformations and analyses.
  longest common subsequence, image blurring are bonus-like homework.

today: example programming problems before we study design and analysis later.

CS is data processing, information processing. handle lots of data.
how to represent data, so we can talk about them, do computation on them?
We used arrays for longest common subsequence and image blurring.
Other data structures include simple lists or complicated graphs.

1. a simple list processing example. TO SHOW ALL THE PROBLEMS WE WANT TO AVOID!

Write a program that takes a list x and an element i of x, and returns
the rest of the list, i.e., x without the first occurrence of i.

First a specification problem: can we update list x?
if x is used for other purposes, then we cannot update it;
otherwise we can (and should, since it saves allocating new cells
and freeing the old cells, which is much more efficient).
Code that updates pointers is always difficult to write correctly.
So assume that we just want to return a new list.

in C:

define a list struct. for simplicity, use list of int only

  typedef struct List { int head; struct List* tail; } List;

code? there are >100 ways of writing it, using a loop.
I have one, but I am not giving it here. try your best in the exercises.

can write a recursive function (functional programming).
it is much easier to see that it is correct.

  List* rest(int i, List* x) {
    if (i == x->head) {
      return x->tail;
    } else {
      List* cell = (List *)malloc(sizeof(List));     /* (1) */
      cell->head = x->head;                          /* (2) */
      cell->tail = rest(i,x->tail);                  /* (3) */
      return cell;                                   /* (4) */
    }
  }

can define a constructor mkList (object-oriented programming)
and replace lines (1)-(4) above with (5) below.
the code is clearer, more modular, allows more reuse.

  List* mkList(int a, List* b) {
    List* cell = (List *)malloc(sizeof(List));
    cell->head = a;
    cell->tail = b;
    return cell;
  }

      return mkList(x->head,rest(i,x->tail));        /* (5) */

other language specific things (driver, input/output, more memory management)
(the C code needs #include <stdlib.h> for malloc/free and <stdio.h> for printf)

driver:

  int main(int argc, char* argv[]) {
    ...(rest(3, mkList(1,mkList(3,NULL))));          /* (6) */
  }

input and output: here output only.
replace ... in (6) with printList

  void printList(List* x) {
    List* temp = x;
    printf("[ ");
    while (temp != NULL) {
      printf("%d ",temp->head);
      temp = temp->tail;
    }
    printf("]\n");
  }

memory management: free a list of int.
need to assign the list in (6) to a variable, call freeList on it after rest.
(the result of rest shares the cells after the removed element with its
 argument, so those cells must not be freed twice.)

  void freeList(List* x) {
    List* cur = x;
    List* next;
    while (cur != NULL) {
      next = cur->tail;
      free(cur);
      cur = next;
    }
  }

summarize: high-level design is simple, but has too much low-level stuff

in Java:

OO. should define toString() instead of a separate printList.
no need to free memory; has garbage collection.

  class List {
    int head;
    List tail;
    public List (int hd, List tl) { head = hd; tail = tl; }
    public String toString() {
      return tail==null ? head+"\n" : head+" "+tail.toString();
    }
  }

  class Rest {
    static List rest (int i, List x) {
      if (i==x.head) return x.tail;
      else return (new List(x.head,rest(i,x.tail)));
    }
    public static void main (String[] args) {
      System.out.print(rest(3,new List(1,new List(3,new List(2,null)))));
    }
  }

could define printList in class Rest and call it instead of System.out.print.
but would be bad OO style.
  public static void printList (List x) {
    while (x!=null) {
      System.out.print(x.head+" ");
      x=x.tail;
    }
    System.out.println();
  }

in Lisp:

built-in list, succinct, but strange names; also cannot add new; not typed

  (defun rest (x l)
    (if (equal x (car l)) (cdr l)
        (cons (car l) (rest x (cdr l)))))

driver: (rest '3 (cons 1 (cons 3 nil))) or (rest '3 '(1 3))
return value of the expression is printed; no need to write one for output.
auto GC

in Scheme: similar to Lisp

  (define (rest x l)
    (if (equal? x (car l)) (cdr l)
        (cons (car l) (rest x (cdr l)))))

same driver: (rest '3 '(1 3))
same output, could use (display ...)
auto GC

in ML:

(typed, better than Java in terms of its polymorphism;
 but types can get in the way, as in Java sometimes;
 and type errors can be hard to understand)

  fun rest (x, head::tail) =
        if (x = head) then tail
        else head::(rest (x, tail));

driver: rest (3, [1,3])
output: use output (out, makeString n); need a loop or recursion.
auto GC

Summarize:

if you don't understand all the details, it is not a big problem, as long as
you understand the following point:

functional, object-oriented, GC -> easier for writing better code.
easier to write and easier to understand what it is doing,
which is the number one most important thing in programming.

GC: in many new languages, not in C, but C allows finer performance tuning.
object-oriented styles, other language specific things:
   important but can be done with automatic code generation
functional: e.g., easier to write recursions on recursive data like lists
   but inefficient, e.g., using a loop for rest can be 30 times faster.

What we will do:

First, we write something like below

  rest(x,l) def= if x=head(l) then tail(l)
                 else mkList(head(l),rest(x,tail(l)))

Then, use systematic transformation to improve efficiency,
including efficient memory management.
can use well-known compiler technology to generate code in specific languages.

more good (clearer, and easier to write) and bad (even less efficient)
examples of list manipulation using functional style code:

  append(a,b) def= if empty(a) then b
                   else mkList(head(a),append(tail(a),b))

  takes |a| recursive calls, each allocating a new cell.

  reverse(l) def= if empty(l) then emptyList
                  else append(reverse(tail(l)),mkList(head(l),emptyList))

  takes quadratic time.

  fac(n) def= if n=1 then 1 else n*fac(n-1)

List is actually simple. What about graphs?

2. a simple graph reachability example.

given a graph and a set of nodes, find all the nodes in the graph
reachable from the given set of nodes.

If we write it in C or Java, as we did for rest, a blackboard is not large enough.
But using a functional language, it is nontrivial to write graph algorithms.
But there are methods that allow us to:

First, describe the precise thing we want in one line:
  if a node is reachable, then we can follow an edge to get
  another node that is reachable.
Second, systematically (actually there is a system that automatically)
  transform it into 100-200 lines of C code that has
  all the right data structures and manipulations.
  e.g., adjacency-list representation of the graphs,
        queue for breadth-first search or stack for depth-first search.

We will continue next time.
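
To give a feel for the low-level detail involved in the reachability example, here is a minimal hand-written sketch in C. It is not the output of the transformation system mentioned above, and every name in it (Edge, Graph, mkGraph, addEdge, reach, freeGraph) is hypothetical, chosen only for illustration. It assumes a directed graph with nodes numbered 0..n-1, an adjacency-list representation, and a queue as the worklist, so nodes are visited in breadth-first order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* All names here are illustrative assumptions, not from the lecture. */

typedef struct Edge {
    int to;              /* target node of this edge */
    struct Edge *next;   /* next edge leaving the same source node */
} Edge;

typedef struct {
    int n;        /* number of nodes, numbered 0..n-1 */
    Edge **adj;   /* adj[u] is the list of edges leaving node u */
} Graph;

/* Creates a graph with n nodes and no edges. */
Graph *mkGraph(int n) {
    assert(n > 0);
    Graph *g = malloc(sizeof(Graph));
    if (g == NULL) abort();
    g->n = n;
    g->adj = calloc((size_t)n, sizeof(Edge *));
    if (g->adj == NULL) abort();
    return g;
}

/* Adds a directed edge from -> to at the front of from's edge list. */
void addEdge(Graph *g, int from, int to) {
    assert(from >= 0 && from < g->n && to >= 0 && to < g->n);
    Edge *e = malloc(sizeof(Edge));
    if (e == NULL) abort();
    e->to = to;
    e->next = g->adj[from];
    g->adj[from] = e;
}

/* Sets reached[v] to true exactly for the nodes v reachable from the
   k nodes in start; the start nodes themselves count as reachable.
   Each node enters the queue at most once, so n slots are enough. */
void reach(const Graph *g, const int *start, int k, bool *reached) {
    int *queue = malloc((size_t)g->n * sizeof(int));
    if (queue == NULL) abort();
    int head = 0, tail = 0;
    for (int v = 0; v < g->n; v++) reached[v] = false;
    for (int i = 0; i < k; i++) {
        assert(start[i] >= 0 && start[i] < g->n);
        if (!reached[start[i]]) {
            reached[start[i]] = true;
            queue[tail++] = start[i];
        }
    }
    while (head < tail) {
        int u = queue[head++];
        /* the one-line rule: u is reachable, so follow each edge from u */
        for (const Edge *e = g->adj[u]; e != NULL; e = e->next) {
            if (!reached[e->to]) {
                reached[e->to] = true;
                queue[tail++] = e->to;
            }
        }
    }
    free(queue);
}

/* Frees all edge cells, the adjacency array, and the graph itself. */
void freeGraph(Graph *g) {
    for (int u = 0; u < g->n; u++) {
        Edge *e = g->adj[u];
        while (e != NULL) {
            Edge *next = e->next;
            free(e);
            e = next;
        }
    }
    free(g->adj);
    free(g);
}
```

For example, with edges 0->1, 1->2, 2->0 and 3->4, calling reach with the start set {0} marks nodes 0, 1 and 2 and leaves 3 and 4 unmarked. Even this small sketch spends most of its lines on representation and memory management rather than on the one-line reachability rule, which is the point made above.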