
Parallel Bubblesort

In order for me to give back your midterms, please form a line and sort yourselves in alphabetical order, from A to Z.  

There is traditionally a strong correlation between the midterm grades and the number of daily problems attempted:

dailies attempted   count   sum   average
        0             3     134    44.67
        1             2       0    XXXXX
        2             1      63    63.00
        3             3     194    64.67
        4             5     335    67.00
        5             8     489    61.12
        6             6     381    63.50
        7             6     432    72.00
        8             3     217    72.33
        9             4     293    73.25


Show that there is no sorting algorithm which sorts at least n!/2^n instances in O(n) time. Think of the decision tree which could do this. What is the shortest tree with n!/2^n leaves?


Moral: there cannot be too many good cases for any sorting algorithm!

Show that the Ω(n log n) lower bound for sorting still holds with ternary comparisons.

The maximum number of leaves in a ternary tree of height h is 3^h, so a tree distinguishing all n! orderings still needs height h >= log_3(n!) = Θ(n log n).

So it goes for any constant base: changing the base of the logarithm changes the bound by only a constant factor.


Optimization Problems

In the algorithms we have studied so far, correctness tended to be easier to establish than efficiency. In optimization problems, we are interested in finding the solution which maximizes or minimizes some objective function.

In designing algorithms for an optimization problem, we must prove that the algorithm in fact gives the best possible solution.

Greedy algorithms, which make the best local decision at each step, occasionally produce a global optimum - but you need a proof!

Dynamic Programming

Dynamic programming is a technique for computing recurrence relations efficiently by storing partial results.

Computing Fibonacci Numbers

Implementing it as a recursive procedure is easy but slow!  

We keep calculating the same value over and over!


How slow is slow?

F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1, and the ratio F(n+1)/F(n) approaches the golden ratio φ ≈ 1.618.

Thus F(n) ≈ 1.6^n, and since our recursion tree has only 0 and 1 as leaves, summing up to F(n) means we must make about 1.6^n calls!

What about Dynamic Programming?

We can calculate F(n) in linear time by storing the small values:

    F[0] = 0
    F[1] = 1
    for i = 2 to n
        F[i] = F[i-1] + F[i-2]
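The storing loop above translates directly into Python (a minimal sketch; the function name is my own):

```python
def fib(n):
    """Compute F(n) iteratively, storing partial results (dynamic programming)."""
    if n < 2:
        return n
    f = [0] * (n + 1)
    f[1] = 1
    for i in range(2, n + 1):
        f[i] = f[i - 1] + f[i - 2]  # reuse the stored smaller values
    return f[n]
```

For example, fib(10) returns 55 in 9 additions, where the naive recursion would make well over a hundred calls.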

Moral: we traded space for time.

Dynamic programming is a technique for efficiently computing recurrences by storing partial results.

Once you understand dynamic programming, it is usually easier to reinvent certain algorithms than try to look them up!

Dynamic programming is best understood by looking at a bunch of different examples.

I have found dynamic programming to be one of the most useful algorithmic techniques in practice:

Multiplying a Sequence of Matrices

Suppose we want to multiply a long sequence of matrices M_1 × M_2 × ... × M_n.

Multiplying an x × y matrix by a y × z matrix (using the common algorithm) takes x·y·z multiplications.


We would like to avoid big intermediate matrices, and since matrix multiplication is associative, we can parenthesise however we want.

Matrix multiplication is not commutative, so we cannot permute the order of the matrices without changing the result.

Example

Consider the product A × B × C × D, where A is d_0 × d_1, B is d_1 × d_2, C is d_2 × d_3, and D is d_3 × d_4.

A four-matrix product has five possible parenthesizations, each with a different cost.

The order makes a big difference in real computation. How do we find the best order?
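To see how much the order matters, here is a sketch with hypothetical dimensions (the numbers below are my own illustration, not the original lecture's example):

```python
def mult_cost(x, y, z):
    # multiplying an x-by-y matrix by a y-by-z matrix takes x*y*z multiplications
    return x * y * z

# hypothetical dimensions: A is 30x1, B is 1x40, C is 40x10, D is 10x25
cost_left = (mult_cost(30, 1, 40)      # (A*B) gives a 30x40 result
             + mult_cost(30, 40, 10)   # ((A*B)*C) gives a 30x10 result
             + mult_cost(30, 10, 25))  # (((A*B)*C)*D)
cost_right = (mult_cost(1, 40, 10)     # (B*C) gives a 1x10 result
              + mult_cost(1, 10, 25)   # ((B*C)*D) gives a 1x25 result
              + mult_cost(30, 1, 25))  # (A*((B*C)*D))
# cost_left = 20700, cost_right = 1400: roughly a 15x difference
```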

Let M(i,j) be the minimum number of multiplications necessary to compute M_i × M_(i+1) × ... × M_j.

The key observations are:

  1. The outermost parenthesization splits the product at some k into (M_i ... M_k) × (M_(k+1) ... M_j).
  2. The two subproducts must themselves be parenthesized optimally.


A recurrence for this is:

M(i,j) = min over i <= k < j of ( M(i,k) + M(k+1,j) + d_(i-1)·d_k·d_j ), with M(i,i) = 0,

where matrix M_i has dimensions d_(i-1) × d_i. If there are n matrices, there are n+1 dimensions.

A direct recursive implementation of this will be exponential, since there is a lot of duplicated work as in the Fibonacci recurrence.

Divide-and-conquer would only be efficient if the subproblems did not overlap - but here the same subproblems recur across different splits, just as in the Fibonacci recurrence.

There are only n(n+1)/2 = O(n^2) substrings between 1 and n. Thus it requires only O(n^2) space to store the optimal cost for each of them.

We can represent all the possibilities in a triangular matrix. We can also store the value of k in another triangular matrix to reconstruct the order of the optimal parenthesization.

The diagonal moves up to the right as the computation progresses. On each element of the k-th diagonal, |j-i| = k.



Procedure MatrixOrder
    for i = 1 to n do M[i, i] = 0
    for diagonal = 1 to n-1
        for i = 1 to n-diagonal do
            j = i + diagonal
            M[i, j] = min over i <= k < j of ( M[i, k] + M[k+1, j] + d_(i-1)·d_k·d_j )
            factor(i, j) = the k achieving this minimum
    return M[1, n]

Procedure ShowOrder(i, j)
    if (i = j) write ( A_i )
    else
        k = factor(i, j)
        write "("
        ShowOrder(i, k)
        write "*"
        ShowOrder(k+1, j)
        write ")"

A dynamic programming solution has three components:

  1. Formulate the answer as a recurrence relation or recursive algorithm.
  2. Show that the number of different instances of your recurrence is bounded by a polynomial.
  3. Specify an order of evaluation for the recurrence so you always have what you need.
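The MatrixOrder and ShowOrder procedures above can be sketched in Python (names follow the pseudocode; the dimension list in the test is my own illustration):

```python
def matrix_order(d):
    """Minimum multiplications to compute M_1 * ... * M_n, where matrix M_i
    has dimensions d[i-1] x d[i] (n matrices, n+1 dimensions)."""
    n = len(d) - 1
    M = [[0] * (n + 1) for _ in range(n + 1)]       # M[i][i] = 0 on the diagonal
    factor = [[0] * (n + 1) for _ in range(n + 1)]  # best split point for each (i, j)
    for diagonal in range(1, n):
        for i in range(1, n - diagonal + 1):
            j = i + diagonal
            M[i][j] = float("inf")
            for k in range(i, j):
                cost = M[i][k] + M[k + 1][j] + d[i - 1] * d[k] * d[j]
                if cost < M[i][j]:
                    M[i][j] = cost
                    factor[i][j] = k  # remember where the optimal split occurred
    return M, factor

def show_order(factor, i, j):
    """Reconstruct the optimal parenthesization as a string."""
    if i == j:
        return "A%d" % i
    k = factor[i][j]
    return "(" + show_order(factor, i, k) + "*" + show_order(factor, k + 1, j) + ")"
```

For example, with hypothetical dimensions d = [30, 1, 40, 10, 25], matrix_order reports a minimum of 1400 multiplications, achieved by (A1*((A2*A3)*A4)).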

Approximate String Matching

A common task in text editing is string matching - finding all occurrences of a word in a text.     

Unfortunately, many words are misspelled. How can we search for the string closest to the pattern?

Let P be a pattern string and T a text string over the same alphabet.

A k-approximate match between P and T is a substring of T that matches P with at most k differences.

Differences may be:

  1. the corresponding characters may differ: KAT vs. CAT
  2. P is missing a character from T: CAAT vs. CAT
  3. T is missing a character from P: CT vs. CAT

Approximate matching is important in genetics as well as spell checking.

A 3-Approximate Match

A match with one of each of three edit operations is:

P = unescessaraly

T = unnecessarily

Finding such a match seems like a hard problem, because we must figure out where to insert the gaps, but we can solve it with dynamic programming.

D[i, j] = the minimum number of differences between p_1 p_2 ... p_i and the segment of T ending at position j.

D[i, j] is the minimum of the three possible ways to extend smaller strings:

  1. If p_i = t_j then D[i-1, j-1] else D[i-1, j-1]+1 (corresponding characters do or do not match)
  2. D[i-1, j]+1 (character in the pattern which is not in the text - we advance only the pattern pointer).
  3. D[i, j-1]+1 (extra character in the text - we advance only the text pointer).
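The recurrence can be sketched as a full-table computation in Python (the function name is my own; this uses the full-string initialization D[0,j] = j discussed below):

```python
def edit_distance(p, t):
    """D[i][j] = minimum differences between p[:i] and t[:j],
    comparing all of the pattern against all of the text."""
    n, m = len(p), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i          # deleting the first i pattern characters costs i
    for j in range(m + 1):
        D[0][j] = j          # full-string matching; use 0 here for substring matching
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = D[i-1][j-1] + (0 if p[i-1] == t[j-1] else 1)
            unmatched = D[i-1][j] + 1   # pattern character not in the text
            extra = D[i][j-1] + 1       # extra character in the text
            D[i][j] = min(match, unmatched, extra)
    return D[n][m]
```

On the example above, edit_distance("unescessaraly", "unnecessarily") returns 3, one of each edit operation.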

Once you accept the recurrence it is easy.

To fill in each cell, we need only consider three other cells, not O(n) of them as in other examples. This means we need only store two rows of the table. The total time is O(mn).

Boundary conditions for string matching

What should the value of D[0,i] be, corresponding to the cost of matching the first i characters of the text with none of the pattern?  

It depends. Are we doing string matching in the text or substring matching? If we must match all of the text, D[0,i] = i, since skipping the first i text characters must cost i. If the match may start anywhere in the text, D[0,i] = 0, since an empty prefix of the pattern matches at any starting position for free.

In both cases, D[i,0] = i, since we cannot excuse deleting the first i characters of the pattern without cost.


What do we return?

If we want the cost of comparing all of the pattern against all of the text, such as comparing the spelling of two words, all we are interested in is D[n,m].

But what if we want the cheapest match between the pattern anywhere in the text? Assuming the initialization for substring matching, we seek the cheapest matching of the full pattern ending anywhere in the text. This means the cost equals min over 1 <= j <= m of D[n, j].

This only gives the cost of the optimal matching. The actual alignment - what got matched, substituted, and deleted - can be reconstructed from the pattern, text, and table without auxiliary storage, once we have identified the cell with the lowest cost.

How much space do we need?

We do not need to keep all O(mn) cells: if we evaluate the recurrence filling in the rows of the table one at a time, we never need more than two rows of cells at once. Thus O(m) space is sufficient to evaluate the recurrence without changing the time complexity at all.
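The two-row evaluation can be sketched as follows (cost only; the function name is my own):

```python
def edit_cost(p, t):
    """Edit distance in O(len(t)) space: fill the table one row at a time
    (one row per pattern character), keeping only the previous and current rows."""
    m = len(t)
    prev = list(range(m + 1))      # row 0: D[0][j] = j (full-string matching)
    for i in range(1, len(p) + 1):
        cur = [i] + [0] * m        # D[i][0] = i
        for j in range(1, m + 1):
            cur[j] = min(prev[j-1] + (0 if p[i-1] == t[j-1] else 1),  # (mis)match
                         prev[j] + 1,    # pattern character not in the text
                         cur[j-1] + 1)   # extra character in the text
        prev = cur                 # discard everything older than the last row
    return prev[m]
```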

Unfortunately, because we no longer have the full matrix, we cannot reconstruct the alignment as above.

Saving space in dynamic programming is very important. Since memory on any computer is limited, O(nm) space is a more serious bottleneck than O(nm) time.

Fortunately, there is a clever divide-and-conquer algorithm (due to Hirschberg) which computes the actual alignment in O(nm) time and O(m) space.




Steve Skiena
Tue Sep 15 16:29:04 EDT 1998