In Spring 1996, I taught my Analysis of Algorithms course via EngiNet, the SUNY Stony Brook distance learning program. Each of my lectures that semester was videotaped, and the tapes made available to off-site students. I found it an enjoyable experience.
As an experiment in using the Internet for distance learning, we have digitized the complete audio of all 23 lectures, and have made this available on the WWW. We partitioned the full audio track into sound clips, each corresponding to one page of lecture notes, and linked them to the associated text and images.
In a real sense, listening to all the audio is analogous to sitting through a one-semester college course on algorithms! Properly compressed, the full semester's audio requires less than 300 megabytes of storage, which is much less than I would have imagined. The entire semester's lectures, over thirty hours of audio files, fit comfortably on The Algorithm Design Manual CD-ROM, which also includes a hypertext version of the book and a substantial amount of software.
Lecture Schedule
| subject | topics | reading |
| Preliminaries | Analyzing algorithms | 1-32 |
| " | Asymptotic notation | 32-37 |
| " | Recurrence relations | 53-64 |
| Sorting | Heapsort | 140-150 |
| " | Quicksort | 153-167 |
| " | Linear Sorting | 172-182 |
| Searching | Data structures | 200-215 |
| " | Binary search trees | 244-245 |
| " | Red-Black trees:insertion | 262-272 |
| " | Red-Black trees:deletion | 272-277 |
| MIDTERM 1 | | |
| Comb. Search | Backtracking | |
| " | Elements of dynamic programming | 301-314 |
| " | Examples of dynamic programming | 314-323 |
| Graph Algorithms | Data structures for graphs | 465-477 |
| " | Breadth/depth-first search | 477-483 |
| " | Topological Sort/Connectivity | 485-493 |
| " | Minimum Spanning Trees | 498-510 |
| " | Single-source shortest paths | 514-532 |
| " | All-pairs shortest paths | 550-563 |
| MIDTERM 2 | | |
| Intractability | P and NP | 916-928 |
| " | NP-completeness | 929-939 |
| " | NP-completeness proofs | 939-951 |
| " | Further reductions | 951-960 |
| " | Approximation algorithms | 964-974 |
| " | Set cover / knapsack heuristics | 974-983 |
| FINAL EXAM | | |
What Is An Algorithm?
Algorithms are the ideas behind computer programs.
An algorithm is the thing which stays the same whether the program is in Pascal running on a Cray in New York or is in BASIC running on a Macintosh in Kathmandu!
To be interesting, an algorithm has to solve a general, specified problem. An algorithmic problem is specified by describing the set of instances it must work on and what desired properties the output must have.
Example: Sorting
Input: a sequence of n numbers $a_1, a_2, \ldots, a_n$.
Output: the permutation (reordering) of the input sequence such that $a'_1 \le a'_2 \le \cdots \le a'_n$.
We seek algorithms which are correct and efficient.
Correctness
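For any algorithm, we must prove that it always returns the desired output for all legal instances of the problem. For sorting, this means it must work even if (1) the input is already sorted, or (2) it contains repeated elements.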
Correctness is Not Obvious!
The following problem arises often in manufacturing and transportation testing applications.
Suppose you have a robot arm equipped with a tool, say a soldering iron. To enable the robot arm to do a soldering job, we must construct an ordering of the contact points, so the robot visits (and solders) the first contact point, then visits the second point, third, and so forth until the job is done.
Since robots are expensive, we need to find the order which minimizes the time (ie. travel distance) it takes to assemble the circuit board.
Nearest Neighbor Tour
A very popular solution starts at some point $p_0$ and then walks to its nearest neighbor $p_1$ first, then repeats from $p_1$, etc. until done.
Pick and visit an initial point $p_0$
$p = p_0$
i = 0
While there are still unvisited points
    i = i+1
    Let $p_i$ be the closest unvisited point to $p_{i-1}$
    Visit $p_i$
Return to $p_0$ from $p_i$
This algorithm is simple to understand and implement and very efficient. However, it is not correct!
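The nearest-neighbor idea can be sketched in a few lines of Python. This is only an illustration: the function names, the choice of the first point as the start, and the sample points on a line are assumptions made here, not part of the lecture.

```python
import math

def nearest_neighbor_tour(points):
    """Return a tour (list of indices) built by the nearest-neighbor heuristic."""
    if not points:
        return []
    unvisited = set(range(1, len(points)))
    tour = [0]                      # start at the first point (an arbitrary choice)
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def tour_length(points, tour):
    """Length of the closed tour, including the return to the start."""
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % len(tour)]])
               for k in range(len(tour)))

# Hypothetical instance: points on a line, starting near the middle.
line = [(0, 0), (1, 0), (-2, 0), (5, 0), (-10, 0), (21, 0)]
tour = nearest_neighbor_tour(line)
sweep = sorted(range(len(line)), key=lambda j: line[j][0])   # left-to-right order
```

On this instance the heuristic zig-zags across the starting point and produces a tour of length 78, while simply sweeping from left to right and returning costs 62.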
Closest Pair Tour
Always walking to the closest point is too restrictive, since that point might trap us into making moves we don't want.
Another idea would be to repeatedly connect the closest pair of points whose connection will not cause a cycle or a three-way branch to be formed, until we have a single chain with all the points in it.
Let n be the number of points in the set
For i=1 to n-1 do
    $d = \infty$
    For each pair of endpoints (x,y) of partial paths
        If $dist(x,y) \le d$ then $x_m = x$, $y_m = y$, d = dist(x,y)
    Connect $(x_m, y_m)$ by an edge
Connect the two endpoints by an edge.
Although it works correctly on the previous example, other data causes trouble:
A Correct Algorithm
We could try all possible orderings of the points, then select the ordering which minimizes the total length:
$d = \infty$
For each of the n! permutations $\Pi_i$ of the n points
    If $cost(\Pi_i) \le d$ then $d = cost(\Pi_i)$ and $P_{min} = \Pi_i$
Return $P_{min}$
Since all possible orderings are considered, we are guaranteed to end up with the shortest possible tour.
Because it tries all n! permutations, it is extremely slow, much too slow to use when there are more than 10-20 points.
No efficient, correct algorithm is known for the traveling salesman problem, as we will see later.
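Exhaustive search is easy to write down, which makes its cost easy to feel. The sketch below is an assumption-laden illustration (the name `optimal_tour` and the sample points are made up here); it fixes the first point as the start, so it tries (n-1)! orderings rather than n!, since a closed tour can begin anywhere. It assumes at least one point.

```python
import itertools
import math

def optimal_tour(points):
    """Try every ordering that starts at point 0 and keep the shortest closed tour."""
    n = len(points)
    best_order, best_cost = None, math.inf
    for rest in itertools.permutations(range(1, n)):
        order = (0,) + rest
        cost = sum(math.dist(points[order[k]], points[order[(k + 1) % n]])
                   for k in range(n))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost

# Hypothetical instance: six points on a line spanning 31 units.
line = [(0, 0), (1, 0), (-2, 0), (5, 0), (-10, 0), (21, 0)]
order, cost = optimal_tour(line)
```

With six points this examines only 120 orderings; with twenty points it would examine roughly 10^17, which is why the approach does not scale.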
Efficiency
"Why not just use a supercomputer?"
Supercomputers are for people too rich and too stupid to design efficient algorithms!
A faster algorithm running on a slower computer will always win for sufficiently large instances, as we shall see.
Usually, problems don't have to get that large before the faster algorithm wins.
Expressing Algorithms
In order of increasing precision, we have English, pseudocode, and real programming languages. Unfortunately, ease of expression moves in the reverse order.
I prefer to describe the ideas of an algorithm in English, moving to pseudocode to clarify sufficiently tricky details of the algorithm.
The RAM Model
Algorithms are the only important, durable, and original part of computer science because they can be studied in a machine and language independent way.
The reason is that we will do all our design and analysis for the RAM model of computation:
We measure the run time of an algorithm by counting the number of steps.
This model is useful and accurate in the same sense as the flat-earth model (which is useful)!
Best, Worst, and Average-Case
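The worst case complexity of the algorithm is the function defined by the maximum number of steps taken on any instance of size n.
The best case complexity of the algorithm is the function defined by the minimum number of steps taken on any instance of size n.
The average-case complexity of the algorithm is the function defined by the average number of steps taken over all instances of size n.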
Each of these complexities defines a numerical function - time vs. size!
Insertion Sort
One way to sort an array of n elements is to start with an empty list, then successively insert new elements in the proper position:
At each stage, the inserted element leaves a sorted list, and after n insertions contains exactly the right elements. Thus the algorithm must be correct.
But how efficient is it?
Note that the run time changes with the permutation instance! (even for a fixed size problem)
How does insertion sort do on sorted permutations?
How about unsorted permutations?
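To make the two questions concrete, here is a small Python sketch that counts how many elements insertion sort slides to the right. The function name and the choice to count shifts (rather than every executed step) are assumptions made for this illustration.

```python
def insertion_sort(items):
    """Sort a copy of items; return (sorted_list, number_of_element_shifts)."""
    a = list(items)
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:   # slide larger elements one slot right
            a[i + 1] = a[i]
            i -= 1
            shifts += 1
        a[i + 1] = key
    return a, shifts

n = 8
_, best = insertion_sort(range(n))            # already sorted input
_, worst = insertion_sort(range(n, 0, -1))    # reverse sorted input
```

Sorted input needs no shifts at all, while reverse-sorted input needs n(n-1)/2 of them, so the running time depends heavily on the permutation even for a fixed n.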
Exact Analysis of Insertion Sort
Count the number of times each line of pseudocode will be executed.
| Line | InsertionSort(A) | #Inst. | #Exec. |
| 1 | for j:=2 to len. of A do | c1 | n |
| 2 | key:=A[j] | c2 | n-1 |
| 3 | /* put A[j] into A[1..j-1] */ | c3=0 | / |
| 4 | i:=j-1 | c4 | n-1 |
| 5 | while i > 0 and A[i] > key do | c5 | $\sum_{j=2}^{n} t_j$ |
| 6 | A[i+1]:= A[i] | c6 | $\sum_{j=2}^{n} (t_j - 1)$ |
| 7 | i := i-1 | c7 | $\sum_{j=2}^{n} (t_j - 1)$ |
| 8 | A[i+1]:=key | c8 | n-1 |
Within the for statement, "key:=A[j]" is executed n-1 times.
Steps 5, 6, 7 are harder to count.
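Let $t_j$ be the number of times the while test in step 5 is executed while inserting the jth item, which is one more than the number of elements that have to be slid right.
Step 5 is executed $t_2 + t_3 + \cdots + t_n$ times.
Step 6 is executed $(t_2 - 1) + (t_3 - 1) + \cdots + (t_n - 1)$ times.
Add up the executed instructions for all pseudocode lines to get the run-time of the algorithm:
$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \sum_{j=2}^{n} t_j + c_6 \sum_{j=2}^{n} (t_j - 1) + c_7 \sum_{j=2}^{n} (t_j - 1) + c_8 (n-1)$
What are the $t_j$? They depend on the particular input.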
Best Case
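If the input is already sorted, no element ever has to be slid right, so each insertion takes constant time. Hence, the best case time is $T(n) = Cn + D$, where C and D are constants.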
Worst Case
Problem 1.2-6: How can we modify almost any algorithm to have a good best-case running time?
For sorting, we can check if the values are already ordered, and if so output them. For the traveling salesman, we can check if the points lie on a line, and if so output the points in that order.
The supercomputer people pull this trick on the linpack benchmarks!
Because it is usually very hard to compute the average running time, since we must somehow average over all the instances, we usually strive to analyze the worst case running time.
The worst case is usually fairly easy to analyze and often close to the average or real running time.
Exact Analysis is Hard!
We have agreed that the best, worst, and average case complexity of an algorithm is a numerical function of the size of the instances.
Thus it is usually cleaner and easier to talk about upper and lower bounds of the function.
This is where the dreaded big O notation comes in!
Since running our algorithm on a machine which is twice as fast will affect the running times by a multiplicative constant of 2, we are going to have to ignore constant factors anyway.
Names of Bounding Functions
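Now that we have clearly defined the complexity functions we are talking about, we can talk about upper and lower bounds on them:
$f(n) = O(g(n))$ means $C \times g(n)$ is an upper bound on $f(n)$.
$f(n) = \Omega(g(n))$ means $C \times g(n)$ is a lower bound on $f(n)$.
$f(n) = \Theta(g(n))$ means $C_1 \times g(n)$ is an upper bound on $f(n)$ and $C_2 \times g(n)$ is a lower bound on $f(n)$.
Got it?
$C$, $C_1$, and $C_2$ are all constants independent of n.
All of these definitions imply a constant $n_0$ beyond which they are satisfied.
We do not care about small values of n.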
O, $\Omega$, and $\Theta$
(a) $f(n) = \Theta(g(n))$ if there exist positive constants $n_0$, $c_1$, and $c_2$ such that to the right of $n_0$, the value of f(n) always lies between $c_1 g(n)$ and $c_2 g(n)$ inclusive.
(b) f(n) = O(g(n)) if there are positive constants $n_0$ and c such that to the right of $n_0$, the value of f(n) always lies on or below $c g(n)$.
(c) $f(n) = \Omega(g(n))$ if there are positive constants $n_0$ and c such that to the right of $n_0$, the value of f(n) always lies on or above $c g(n)$.
Asymptotic notation ($O$, $\Theta$, $\Omega$) is as well as we can practically deal with complexity functions.
What does all this mean?
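Think of the equality as meaning "is in the set of functions": $f(n) = O(g(n))$ says that f(n) belongs to the set of functions bounded above by a constant multiple of g(n).
Note that time complexity is every bit as well defined a function as $\sin(x)$ or your bank account as a function of time.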
Testing Dominance
f(n) dominates g(n) if $\lim_{n \to \infty} g(n)/f(n) = 0$, which is the same as saying g(n)=o(f(n)).
Note the little-oh - it means ``grows strictly slower than''.
Knowing the dominance relation between common functions is important because we want algorithms whose time complexity is as low as possible in the hierarchy. If f(n) dominates g(n), f is much larger (ie. slower) than g.
$n^a$ dominates $n^b$ if a > b since $\lim_{n \to \infty} n^b / n^a = \lim_{n \to \infty} n^{b-a} = 0$.
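| Complexity | n = 10 | 20 | 30 | 40 | 50 | 60 |
| $n$ | 0.00001 sec | 0.00002 sec | 0.00003 sec | 0.00004 sec | 0.00005 sec | 0.00006 sec |
| $n^2$ | 0.0001 sec | 0.0004 sec | 0.0009 sec | 0.0016 sec | 0.0025 sec | 0.0036 sec |
| $n^3$ | 0.001 sec | 0.008 sec | 0.027 sec | 0.064 sec | 0.125 sec | 0.216 sec |
| $n^5$ | 0.1 sec | 3.2 sec | 24.3 sec | 1.7 min | 5.2 min | 13.0 min |
| $2^n$ | 0.001 sec | 1.0 sec | 17.9 min | 12.7 days | 35.7 years | 366 cent |
| $3^n$ | 0.059 sec | 58 min | 6.5 years | 3855 cent | $2 \times 10^8$ cent | $1.3 \times 10^{13}$ cent |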
Logarithms
It is important to understand deep in your bones what logarithms are and where they come from.
A logarithm is simply an inverse exponential function.
Saying $b^x = y$ is equivalent to saying that $x = \log_b y$.
Exponential functions, like the amount owed on an n-year mortgage at an interest rate of $c\%$ per year, are functions which grow distressingly fast, as anyone who has tried to pay off a mortgage knows.
Thus inverse exponential functions, i.e. logarithms, grow refreshingly slowly.
Binary search is an example of an $O(\lg n)$ algorithm.
After each comparison, we can throw away half the possible number of keys.
Thus twenty comparisons suffice to find any name in the million-name Manhattan phone book!
If you have an algorithm which runs in $O(\lg n)$ time, take it, because this is blindingly fast even on very large instances.
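The twenty-comparison claim is easy to check with a short Python sketch. The function below counts probes (one three-way comparison per loop iteration); the name `binary_search` and the use of `range(1_000_000)` as a stand-in for a sorted phone book are assumptions made for illustration.

```python
def binary_search(sorted_keys, target):
    """Return (index or None, number of probes) for target in sorted_keys."""
    low, high = 0, len(sorted_keys) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            low = mid + 1          # discard the lower half
        else:
            high = mid - 1         # discard the upper half
    return None, probes

phone_book = range(1_000_000)      # stand-in for a million sorted names
samples = (0, 1, 123_456, 499_999, 999_999, 1_000_000)
worst = max(binary_search(phone_book, k)[1] for k in samples)
```

Each probe halves the remaining range, so no search over a million keys, successful or not, needs more than twenty probes.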
Properties of Logarithms
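Recall the definition, $c^{\log_c x} = x$.
Asymptotically, the base of the log does not matter:
$\log_b a = \frac{\log_c a}{\log_c b}$
Thus, $\log_2 n = \frac{1}{\log_{100} 2} \times \log_{100} n$, and note that $\frac{1}{\log_{100} 2} = 6.643$ is just a constant.
Asymptotically, any polynomial function of n does not matter:
$\log(n^{473} + n^2 + n + 96) = O(\log n)$
since $n^{473} + n^2 + n + 96 = O(n^{473})$, and $\log n^{473} = 473 \times \log n$.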
Any exponential dominates every polynomial. This is why we will seek to avoid exponential time algorithms.
Federal Sentencing Guidelines
2F1.1. Fraud and Deceit; Forgery; Offenses Involving Altered or Counterfeit Instruments other than Counterfeit Bearer Obligations of the United States.
(a) Base offense Level: 6
(b) Specific offense Characteristics
(1) If the loss exceeded $2,000, increase the offense level as follows:
| Loss(Apply the Greatest) | Increase in Level |
| (A) $2,000 or less | no increase |
| (B) More than $2,000 | add 1 |
| (C) More than $5,000 | add 2 |
| (D) More than $10,000 | add 3 |
| (E) More than $20,000 | add 4 |
| (F) More than $40,000 | add 5 |
| (G) More than $70,000 | add 6 |
| (H) More than $120,000 | add 7 |
| (I) More than $200,000 | add 8 |
| (J) More than $350,000 | add 9 |
| (K) More than $500,000 | add 10 |
| (L) More than $800,000 | add 11 |
| (M) More than $1,500,000 | add 12 |
| (N) More than $2,500,000 | add 13 |
| (O) More than $5,000,000 | add 14 |
| (P) More than $10,000,000 | add 15 |
| (Q) More than $20,000,000 | add 16 |
| (R) More than $40,000,000 | add 17 |
| (S) More than $80,000,000 | add 18 |
The federal sentencing guidelines are designed to help judges be consistent in assigning punishment. The time-to-serve is a roughly linear function of the total level.
However, notice that the increase in level as a function of the amount of money you steal grows logarithmically in the amount of money stolen.
This very slow growth means it pays to commit one crime stealing a lot of money, rather than many small crimes adding up to the same amount of money, because the time to serve if you get caught is much less.
The Moral: ``if you are gonna do the crime, make it worth the time!''
Working with the Asymptotic Notation
Suppose $f(n) = O(n^2)$ and $g(n) = O(n^2)$.
What do we know about g'(n) = f(n)+g(n)? Adding the bounding constants shows $g'(n) = O(n^2)$.
What do we know about g''(n) = f(n)-g(n)? Since the bounding constants don't necessarily cancel, $g''(n) = O(n^2)$.
We know nothing about the lower bounds on g' and g'' because we know nothing about lower bounds on f and g.
Now suppose $f(n) = \Omega(n^2)$ and $g(n) = \Omega(n^2)$.
What do we know about g'(n) = f(n)+g(n)? Adding the lower bounding constants shows $g'(n) = \Omega(n^2)$.
What do we know about g''(n) = f(n)-g(n)? We know nothing about the lower bound of this!
The Complexity of Songs
Suppose we want to sing a song which lasts for n units of time. Since n can be large, we want to memorize songs which require only a small amount of brain space, i.e. memory.
Let S(n) be the space complexity of a song which lasts for n units of time.
The amount of space we need to store a song can be measured in either
the words or characters needed to memorize it.
Note that the number of characters is $\Theta(\text{words})$ since every word in a song is at most 34 letters long - Supercalifragilisticexpialidocious!
What bounds can we establish on S(n)?
The Refrain
Most popular songs have a refrain, which is a block of text which gets repeated after each stanza in the song:
Bye, bye Miss American pie
Drove my Chevy to the levee but the levee was dry
Them good old boys were drinking whiskey and rye
Singing this will be the day that I die.
Refrains make a song easier to remember, since you memorize the refrain once yet sing it O(n) times. But do they reduce the space complexity?
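Not according to the big oh. If $n = \text{repetitions} \times (\text{verse-size} + \text{refrain-size})$, then the space complexity is still O(n) since it is only halved (if the verse-size = refrain-size):
$S(n) = \text{repetitions} \times \text{verse-size} + \text{refrain-size}$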
The k Days of Christmas
To reduce S(n), we must structure the song differently.
Consider ``The k Days of Christmas''. All one must memorize is:
On the kth Day of Christmas, my true love gave to me,
On the First Day of Christmas, my true love gave to me,
a partridge in a pear tree
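But the time it takes to sing it is $\sum_{i=1}^{k} i = k(k+1)/2 = \Theta(k^2)$.
If $n = \Theta(k^2)$, then $k = \Theta(\sqrt{n})$, so $S(n) = O(\sqrt{n})$.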
100 Bottles of Beer
What do kids sing on really long car trips?
n bottles of beer on the wall,
n bottles of beer.
You take one down and pass it around
n-1 bottles of beer on the wall.
All you must remember in this song is this template of size $\Theta(1)$, and the current value of n.
The storage size for n depends on its value, but $\log_2 n$ bits suffice.
Thus for this song, $S(n) = O(\lg n)$.
That's the way, uh-huh, uh-huh
I like it, uh-huh, huh
Reference: D. Knuth, `The Complexity of Songs', Comm. ACM, April 1984, pp.18-24
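Problem 2.1-2:
Show that for any real constants a and b, b > 0, $(n + a)^b = \Theta(n^b)$.
Note the need for absolute values.
Problem 2.1-4:
(a) Is $2^{n+1} = O(2^n)$?
(b) Is $2^{2n} = O(2^n)$?
(a) Is $2^{n+1} \le c \cdot 2^n$?
Yes, if $c \ge 2$, for all n.
(b) Is $2^{2n} = O(2^n)$?
Is $2^{2n} \le c \cdot 2^n$?
note $2^{2n} = 2^n \cdot 2^n$
Is $2^n \cdot 2^n \le c \cdot 2^n$?
Is $2^n \le c$?
No! Certainly for any constant c we can find an n such that this is not true.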
Recurrence Relations
Many algorithms, particularly divide and conquer algorithms, have time complexities which are naturally modeled by recurrence relations.
A recurrence relation is an equation which is defined in terms of itself.
Why are recurrences good things?
Recursion is Mathematical Induction!
In both, we have general and boundary conditions, with the general condition breaking the problem into smaller and smaller pieces.
The initial or boundary condition terminates the recursion.
As we will see, induction provides a useful tool to solve recurrences - guess a solution and prove it by induction.
| n | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| $T_n$ | 0 | 1 | 3 | 7 | 15 | 31 | 63 | 127 |
Prove $T_n = 2^n - 1$ by induction:
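The recurrence behind the table is not shown in the surviving text. Assuming it is $T_n = 2T_{n-1} + 1$ with $T_0 = 0$ (which reproduces every table entry), a sketch of the inductive argument for the closed form $2^n - 1$ runs as follows:

```latex
\begin{align*}
\text{Basis: } & T_0 = 0 = 2^0 - 1. \\
\text{Step: }  & \text{assume } T_{n-1} = 2^{n-1} - 1. \text{ Then} \\
               & T_n = 2\,T_{n-1} + 1 = 2\,(2^{n-1} - 1) + 1 = 2^n - 1.
\end{align*}
```

The boundary condition anchors the claim and the general condition carries it from n-1 to n, exactly the two ingredients a recursive definition has.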
Solving Recurrences
No general procedure for solving recurrence relations is known, which is why it is an art. My approach is:
Realize that linear, finite history, constant coefficient recurrences always can be solved
Consider
,
,
It has history = 2, degree = 1, and coefficients of 2 and 1. Thus it can be solved mechanically! Proceed:
Systems like Mathematica and Maple have packages for doing this.
Guess a solution and prove by induction
To guess the solution, play around with small values for insight.
Note that you can do inductive proofs with big-O notation, just be sure you use it right.
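Example: $T(n) \le 2T(\lfloor n/2 \rfloor) + n$.
Show that $T(n) \le c \cdot n \lg n$ for large enough c and n.
Assume that it is true for n/2, then
$T(n) \le 2c \lfloor n/2 \rfloor \lg(\lfloor n/2 \rfloor) + n \le c n \lg(n/2) + n = c n \lg n - c n + n \le c n \lg n$ whenever $c \ge 1$.
Starting with basis cases T(2)=4, T(3)=5 lets us complete the proof for $c \ge 2$.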
Try backsubstituting until you know what is going on
Also known as the iteration method. Plug the recurrence back into itself until you see a pattern.
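Example: $T(n) = 3T(\lfloor n/4 \rfloor) + n$.
Try backsubstituting:
$T(n) = n + 3(\lfloor n/4 \rfloor + 3T(\lfloor n/16 \rfloor)) = n + 3\lfloor n/4 \rfloor + 9\lfloor n/16 \rfloor + 27 T(\lfloor n/64 \rfloor)$
The $(3/4)^i n$ term should now be obvious.
Although there are only $\log_4 n$ terms before we get to T(1), it doesn't hurt to sum them all since this is a fast-converging geometric series:
$T(n) \le n \sum_{i=0}^{\infty} (3/4)^i + \Theta(n^{\log_4 3}) = 4n + o(n) = O(n)$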
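A quick numerical check can confirm what backsubstitution suggests. The sketch below assumes the recurrence $T(n) = 3T(\lfloor n/4 \rfloor) + n$ with the base case T(0) = 0, which is a choice made here for illustration, and evaluates it directly.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Evaluate T(n) = 3*T(n // 4) + n with the assumed base case T(0) = 0."""
    if n == 0:
        return 0
    return 3 * T(n // 4) + n

# Ratio T(n)/n for a range of sizes; a linear bound means it stays bounded.
ratios = [T(n) / n for n in (10, 100, 1_000, 10_000, 100_000, 1_000_000)]
```

Under this base case T(n) never exceeds 4n, matching the sum of the geometric series $\sum (3/4)^i = 4$.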
Recursion Trees
Drawing a picture of the backsubstitution process gives you an idea of what is going on.
We must keep track of two things - (1) the size of the remaining argument to the recurrence, and (2) the additive stuff to be accumulated during this call.
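Example: $T(n) = 2T(n/2) + n^2$
Although this tree has height $\lg n$, the total sum at each level decreases geometrically, so:
$T(n) = \sum_{i=0}^{\lg n} n^2/2^i \le n^2 \sum_{i=0}^{\infty} 1/2^i = 2n^2 = \Theta(n^2)$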
The recursion tree framework made this much easier to see than with algebraic backsubstitution.
See if you can use the Master theorem to provide an instant asymptotic solution
The Master Theorem:
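Let $a \ge 1$ and b>1 be constants, let f(n) be a function, and let T(n) be defined on the nonnegative integers by the recurrence $T(n) = aT(n/b) + f(n)$, where we interpret n/b as $\lfloor n/b \rfloor$ or $\lceil n/b \rceil$.
Then T(n) can be bounded asymptotically as follows:
1. If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.
2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \lg n)$.
3. If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$, and if $a f(n/b) \le c f(n)$ for some constant c<1 and all sufficiently large n, then $T(n) = \Theta(f(n))$.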
Examples of the Master Theorem
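Which case of the Master Theorem applies?
$T(n) = 4T(n/2) + n$
Reading from the equation, a=4, b=2, and f(n) = n.
Is $n = O(n^{\log_2 4 - \epsilon}) = O(n^{2-\epsilon})$?
Yes, so case 1 applies and $T(n) = \Theta(n^2)$.
$T(n) = 4T(n/2) + n^2$
Reading from the equation, a=4, b=2, and $f(n) = n^2$.
Is $n^2 = O(n^{2-\epsilon})$?
No, if $\epsilon > 0$, but it is true if $\epsilon = 0$, so case 2 applies and $T(n) = \Theta(n^2 \lg n)$.
$T(n) = 4T(n/2) + n^3$
Reading from the equation, a=4, b=2, and $f(n) = n^3$.
Is $n^3 = \Omega(n^{2+\epsilon})$?
Yes, for $0 < \epsilon \le 1$, so case 3 might apply.
Is $4 (n/2)^3 \le c \cdot n^3$?
Yes, for $c \ge 1/2$, so there exists a c < 1 to satisfy the regularity condition, so case 3 applies and $T(n) = \Theta(n^3)$.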
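For the common situation where the additive term is a plain power of n, choosing the case is mechanical. The helper below is a hypothetical sketch, not a general Master Theorem solver: it assumes $f(n) = n^k$, for which the regularity condition of case 3 holds automatically because $a (n/b)^k = (a/b^k) n^k$ and $a/b^k < 1$ exactly when $k > \log_b a$.

```python
import math

def master_case(a, b, k):
    """Classify T(n) = a*T(n/b) + n**k (a >= 1, b > 1, k >= 0) by the Master Theorem."""
    critical = math.log(a, b)              # the exponent log_b(a)
    if math.isclose(k, critical):
        return 2, f"Theta(n^{k} lg n)"     # both parts contribute equally
    if k < critical:
        return 1, f"Theta(n^{critical:g})" # the leaves dominate
    return 3, f"Theta(n^{k})"              # the additive term dominates

cases = [master_case(4, 2, k)[0] for k in (1, 2, 3)]
```

The three recurrences with a=4 and b=2 and additive terms n, n^2, n^3 land in cases 1, 2, and 3 respectively.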
Why should the Master Theorem be true?
Consider T(n) = a T(n/b) + f(n).
Suppose f(n) is small enough
Then we have a recursion tree where the only contribution is at the leaves.
There will be $\log_b n$ levels, with $a^l$ leaves at level l.
Suppose f(n) is large enough
Example:
.
In fact this holds unless
!
In case 3 of the Master Theorem, the additive term dominates.
In case 2, both parts contribute equally, which is why the log pops up. It is (usually) what we want to have happen in a divide and conquer algorithm.
Famous Algorithms and their Recurrence
Matrix Multiplication
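Strassen's divide-and-conquer algorithm multiplies two $n \times n$ matrices in $T(n) = 7T(n/2) + O(n^2)$ time.
Since $n^{\lg 7}$ dwarfs $n^2$, case 1 of the master theorem applies and $T(n) = O(n^{2.81})$.
This has been ``improved'' by more and more complicated recurrences until the current best is $O(n^{2.38})$.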
Polygon Triangulation
The simplest algorithm might be to try each pair of points and check if they
see each other.
If so, add the diagonal and recur on both halves, for a
total of
.
However, Chazelle gave an algorithm which runs in $T(n) = 2T(n/2) + O(\sqrt{n})$ time.
Since $\sqrt{n} = O(n^{1-\epsilon})$, by case 1 of the Master Theorem, Chazelle's algorithm is linear, i.e. T(n) = O(n).
Sorting
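Mergesort runs in $T(n) = 2T(n/2) + O(n)$ time.
Since $n = O(n^{\log_2 2}) = O(n)$ but not $n = O(n^{1-\epsilon})$, Case 2 of the Master Theorem applies and $T(n) = O(n \lg n)$.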
In case 2, the divide and merge steps balance out perfectly, as we usually hope for from a divide-and-conquer algorithm.
Approaches to Algorithm Design
Incremental
A good example of this approach is insertion sort.
Divide-and-Conquer
A good example of this approach is Mergesort.
4.2-2 Argue the solution to $T(n) = T(n/3) + T(2n/3) + n$ is $\Omega(n \lg n)$ by appealing to the recursion tree.
The shortest path to a leaf occurs when we take the $n/3$ branch each time.
The height k is given by $n (1/3)^k = 1$, meaning $n = 3^k$ or $k = \log_3 n$.
The longest path to a leaf occurs when we take the $2n/3$ branch each time.
The height k is given by $n (2/3)^k = 1$, meaning $n = (3/2)^k$ or $k = \log_{3/2} n$.
The problem asks to show that $T(n) = \Omega(n \lg n)$, meaning we are looking for a lower bound.
On any full level, the additive terms sum to n.
There are $\log_3 n$ full levels.
Thus $T(n) \ge n \log_3 n = \Omega(n \lg n)$.
4.2-4 Use iteration to solve T(n) = T(n-a) + T(a) + n, where $a \ge 1$ is a constant.
Why don't CS profs ever stop talking about sorting?!
You should have seen most of the algorithms - we will concentrate on the analysis.
Applications of Sorting
One reason why sorting is so important is that once a set of items is sorted, many other problems become easy.
Searching
Speeding up searching is perhaps the most important application of sorting.
Closest pair
Once the numbers are sorted, the closest pair will be next to each other in sorted order, so an O(n) linear scan completes the job.
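A minimal Python sketch of this idea for one-dimensional data follows; the function name and sample numbers are assumptions made for illustration, and the input is assumed to hold at least two numbers.

```python
def closest_pair(numbers):
    """Return the two closest values among at least two numbers, via sorting."""
    ordered = sorted(numbers)                       # O(n log n)
    return min(zip(ordered, ordered[1:]),           # O(n) scan of adjacent pairs
               key=lambda pair: pair[1] - pair[0])

pair = closest_pair([31, 4, 15, 92, 65, 17, 58])
```

The same scan answers the element uniqueness question: the input contains a duplicate exactly when the closest pair has a gap of zero.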
Element uniqueness
Sort them and do a linear scan to check all adjacent pairs.
This is a special case of closest pair above.
Frequency distribution - Mode
Sort them and do a linear scan to measure the length of all adjacent runs.
Median and Selection
Once the keys are placed in sorted order in an array, the kth largest can be found in constant time by simply looking in the kth position of the array.
Convex hulls
Convex hulls are the most important building block for more sophisticated geometric algorithms.
Once you have the points sorted by x-coordinate, they can be inserted from left to right into the hull, since the rightmost point is always on the boundary.
Without sorting the points, we would have to check whether the point is inside or outside the current hull.
Adding a new rightmost point might cause others to be deleted.
Huffman codes
If you are trying to minimize the amount of space a text file is taking up, it is silly to assign each letter the same length (ie. one byte) code.
Example: e is more common than q, a is more common than z.
If we were storing English text, we would want a and e to have shorter codes than q and z.
To design the best possible code, the first and most important step is to sort the characters in order of frequency of use.
| Character | Frequency | Code |
| f | 5 | 1100 |
| e | 9 | 1101 |
| c | 12 | 100 |
| b | 13 | 101 |
| d | 16 | 111 |
| a | 45 | 0 |
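The code lengths in the table can be reproduced with a short sketch of Huffman's construction. Note the swap: instead of sorting once, this version keeps the subtrees in a priority queue (Python's `heapq`) and repeatedly merges the two least frequent ones. The function name and the dictionary representation are assumptions made for illustration, and the input is assumed to be non-empty.

```python
import heapq
from itertools import count

def huffman_code_lengths(frequencies):
    """Return {symbol: code length in bits} for a Huffman code over frequencies."""
    tiebreak = count()   # keeps heap entries comparable when weights tie
    heap = [(weight, next(tiebreak), {symbol: 0})
            for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper.
        merged = {s: depth + 1 for s, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

lengths = huffman_code_lengths({"f": 5, "e": 9, "c": 12, "b": 13, "d": 16, "a": 45})
```

The most frequent character, a, gets a one-bit code, while the rare f and e get four bits each, matching the table.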
Selection Sort
A simple $O(n^2)$ sorting algorithm is selection sort.
Sweep through all the elements to find the smallest item, then the smallest remaining item, etc. until the array is sorted.
Selection-sort(A)
    for i = 1 to n
        for j = i+1 to n
            if (A[j] < A[i]) then swap(A[i],A[j])
It is clear this algorithm must be correct from an inductive argument, since after the ith pass the ith element is in its correct position.
It is clear that this algorithm takes $O(n^2)$ time.
It is clear that the analysis of this algorithm cannot be improved because there will be n/2 iterations which will require at least n/2 comparisons each, so at least $n^2/4$ comparisons will be made.
More careful analysis doubles this.
Thus selection sort runs in $\Theta(n^2)$ time.
Binary Heaps
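A binary heap is defined to be a binary tree with a key in each node such that:
1. All leaves are on, at most, two adjacent levels.
2. All leaves on the lowest level occur to the left, and all levels except the lowest one are completely filled.
3. The key in the root is at least as large as the keys of its children, and the left and right subtrees are again binary heaps.
Conditions 1 and 2 specify the shape of the tree, and condition 3 the labeling of the tree.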
The ancestor relation in a heap defines a partial order on its elements, which means it is reflexive, anti-symmetric, and transitive.
Partial orders can be used to model hierarchies with incomplete information or equal-valued elements. One of my favorite games with my parents is fleshing out the partial order of ``big'' old-time movie stars.
The partial order defined by the heap structure is weaker than that of the total order, which explains
Constructing Heaps
Heaps can be constructed incrementally, by inserting new elements into the left-most open spot in the array.
If the new element is greater than its parent, swap their positions and recur.
Since at each step, we replace the root of a subtree by a larger one, we preserve the heap order.
Since all but the last level is always filled, the height h of an n element heap is bounded because:
$\sum_{i=0}^{h} 2^i = 2^{h+1} - 1 \ge n$
so $h = \lfloor \lg n \rfloor$.
Doing n such insertions takes $\Theta(n \log n)$, since the last n/2 insertions require $O(\log n)$ time each.
Heapify
The bottom up insertion algorithm gives a good way to build a heap, but Robert Floyd found a better way, using a merge procedure called heapify.
Given two heaps and a fresh element, they can be merged into one by making the new one the root and trickling down.
Build-heap(A)
    n = |A|
    For $i = \lfloor n/2 \rfloor$ to 1 do
        Heapify(A,i)
Heapify(A,i)
    left = 2i
    right = 2i+1
    if $(left \le n)$ and (A[left] > A[i]) then
        max = left
    else max = i
    if $(right \le n)$ and (A[right] > A[max]) then
        max = right
    if $(max \ne i)$ then
        swap(A[i],A[max])
        Heapify(A,max)
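The pseudocode translates almost line for line into Python. This sketch assumes 0-based list indexing, so the children of position i sit at 2i+1 and 2i+2 instead of 2i and 2i+1, and it passes the heap size n explicitly; both are adaptations made here.

```python
def heapify(a, i, n):
    """Sift a[i] down within a[0:n], assuming both child subtrees are max-heaps."""
    left, right = 2 * i + 1, 2 * i + 2      # 0-based children of position i
    largest = i
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, n)

def build_heap(a):
    """Rearrange list a in place into a max-heap."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):      # last internal node down to the root
        heapify(a, i, n)

original = [5, 13, 2, 25, 7, 17, 20, 8, 4]
data = list(original)
build_heap(data)
```

After the call, every element is no larger than its parent, and the maximum sits at position 0.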
Rough Analysis of Heapify
Heapify on a subtree containing n nodes takes $T(n) \le T(2n/3) + O(1)$.
The 2/3 comes from merging heaps whose levels differ by one. The last row could be exactly half filled. Besides, the asymptotic answer won't change so long as the fraction is less than one.
Solve the recurrence using the Master Theorem.
Let a = 1, b= 3/2 and f(n) = 1.
Note that $\Theta(n^{\log_{3/2} 1}) = \Theta(1)$, since $\log_{3/2} 1 = 0$.
Thus Case 2 of the Master theorem applies, giving $T(n) = O(\lg n)$.
Exact Analysis of Heapify
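In fact, building a heap with Heapify performs better than $O(n \log n)$, because most of the heaps we merge are extremely small.
In general, there are at most $\lceil n/2^{h+1} \rceil$ nodes of height h, so the cost of building a heap is:
$\sum_{h=0}^{\lfloor \lg n \rfloor} \lceil n/2^{h+1} \rceil O(h) = O(n \sum_{h=0}^{\lfloor \lg n \rfloor} h/2^h)$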
Since this sum is not quite a geometric series, we can't apply the usual identity to get the sum. But it should be clear that the series converges.
Proof of Convergence
Series convergence is the ``free lunch'' of algorithm analysis.
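The identity for the sum of a geometric series is
$\sum_{k=0}^{\infty} x^k = \frac{1}{1-x}$
If we take the derivative of both sides, ...
$\sum_{k=0}^{\infty} k x^{k-1} = \frac{1}{(1-x)^2}$
Multiplying both sides of the equation by x gives the identity we need:
$\sum_{k=0}^{\infty} k x^k = \frac{x}{(1-x)^2}$
Substituting x = 1/2 gives a sum of 2, so Build-heap uses at most 2n comparisons and thus linear time.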
The Lessons of Heapsort, I
"Are we doing a careful analysis? Might our algorithm be faster than it seems?"
Typically in our analysis, we will say that since we are doing at most x operations of at most y time each, the total time is O(x y).
However, if we overestimate too much, our bound may not be as tight as it should be!
Heapsort
Heapify can be used to construct a heap, using the observation that an isolated element forms a heap of size 1.
Heapsort(A)
    Build-heap(A)
    for i = n to 1 do
        swap(A[1],A[i])
        n = n - 1
        Heapify(A,1)
If we construct our heap from bottom to top using Heapify, we do not have to do anything with the last n/2 elements.
With the implicit tree defined by array positions, (i.e. the ith position is the parent of the 2ith and (2i+1)st positions) the leaves start out as heaps.
Exchanging the maximum element with the last element and calling heapify repeatedly gives an $O(n \lg n)$ sorting algorithm, named Heapsort.
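Putting the pieces together, a self-contained Heapsort might look like the sketch below. It assumes 0-based indexing and uses an iterative sift-down in place of the recursive Heapify; the names `sift_down` and `heapsort` are choices made for this illustration.

```python
def sift_down(a, i, n):
    """Iteratively restore the max-heap property for the subtree rooted at i in a[0:n]."""
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] > a[largest]:
                largest = child
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heapsort(a):
    """Sort list a in place in O(n lg n) time."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # Build-heap: the leaves are already heaps
        sift_down(a, i, n)
    for end in range(n - 1, 0, -1):       # move the current maximum to its final slot
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)              # the heap now occupies a[0:end]

keys = [17, 12, 6, 19, 23, 8, 5, 10]
heapsort(keys)
```

The sort works in place: the heap shrinks from the right as the sorted suffix grows, so no extra array is needed.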
The Lessons of Heapsort, II
Always ask yourself, ``Can we use a different data structure?''
Selection sort scans through the entire array, repeatedly finding the smallest remaining element.
For i = 1 to n
    A: Find the smallest of the first n-i+1 items.
    B: Pull it out of the array and put it first.
Using arrays or unsorted linked lists as the data structure, operation A takes O(n) time and operation B takes O(1).
Using heaps, both of these operations can be done within $O(\lg n)$ time, balancing the work and achieving a better tradeoff.
Priority Queues
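A priority queue is a data structure on sets of keys supporting the following operations:
Insert(S, x): insert x into set S.
Maximum(S): return the largest key in S.
Extract-Max(S): return and remove the largest key in S.
These operations can be easily supported using a heap.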
Applications of Priority Queues
Heaps as stacks or queues
Both stacks and queues can be simulated by using a heap, when we add a new time field to each item and order the heap according to this time field.
This simulation is not as efficient as a normal stack/queue implementation, but it is a cute demonstration of the flexibility of a priority queue.
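The simulation can be sketched with Python's `heapq` module, which provides a min-heap. The class names and the counter used as the time field are assumptions made for illustration: a queue orders items by increasing time stamp, and a stack negates the stamp so the newest item comes out first.

```python
import heapq
from itertools import count

class HeapQueue:
    """FIFO queue: the item with the smallest (oldest) time stamp leaves first."""
    def __init__(self):
        self._heap, self._clock = [], count()
    def put(self, item):
        heapq.heappush(self._heap, (next(self._clock), item))
    def get(self):
        return heapq.heappop(self._heap)[1]

class HeapStack(HeapQueue):
    """LIFO stack: negate the time stamp so the newest item leaves first."""
    def put(self, item):
        heapq.heappush(self._heap, (-next(self._clock), item))

q, s = HeapQueue(), HeapStack()
for x in "abc":
    q.put(x)
    s.put(x)
fifo = [q.get() for _ in range(3)]
lifo = [s.get() for _ in range(3)]
```

Each operation costs O(lg n) here instead of the O(1) of a dedicated stack or queue, which is the inefficiency the text mentions.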
Discrete Event Simulations
The stack and queue orders are just special cases of orderings. In real life, certain people cut in line.
Sweepline Algorithms in Computational Geometry
Greedy Algorithms
Example: Sequential strips in triangulations.
Danny Heep
4-2 Find the missing integer from 0 to n using O(n) ``is bit[j] in A[i]'' queries.
Also note, the problem is asking us to minimize the number of bits we read. We can spend as much time as we want doing other things provided we don't look at extra bits.
How can we find the last bit of the missing integer?
Ask all the n integers what their last bit is and see whether 0 or 1 is the bit which occurs less often than it is supposed to. That is the last bit of the missing integer!
How can we determine the second-to-last bit?
Ask the $n/2$ numbers which ended with the correct last bit, and compare the counts against the bit patterns of the numbers from 0 to n which end with this bit.
By recurring on the remaining candidate numbers, we get the answer in T(n) = T(n/2) + n =O(n), by the Master Theorem.
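The halving argument can be sketched in Python as follows. The function name, the list representation of A, and the explicit `reads` counter standing in for the bit queries are assumptions made for illustration; the input is assumed to contain each integer from 0 to n exactly once except for the missing one.

```python
def find_missing(a, n):
    """Find the integer in 0..n absent from list a (len(a) == n); return (missing, bit_reads)."""
    reads = 0
    candidates = list(range(len(a)))   # positions in a that may still match the missing number
    missing, j = 0, 0                  # missing holds the low j bits found so far
    while (1 << j) <= n:
        zeros, ones = [], []
        for i in candidates:
            reads += 1                                   # one "bit j of A[i]" query
            (ones if (a[i] >> j) & 1 else zeros).append(i)
        # Count the numbers in 0..n that share the known low bits and have bit j = 0.
        expected_zeros = (n - missing) // (1 << (j + 1)) + 1
        if len(zeros) < expected_zeros:
            candidates = zeros                           # the missing number has bit j = 0
        else:
            missing |= 1 << j                            # the missing number has bit j = 1
            candidates = ones
        j += 1
    return missing, reads

n = 10
result, reads = find_missing([x for x in range(n + 1) if x != 7], n)
```

Each round keeps only the candidates that agree with the bits found so far, so the number of queries shrinks geometrically and totals at most 2n.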
Quicksort
Although mergesort is $O(n \lg n)$, it is quite inconvenient for implementation with arrays, since we need space to merge.
In practice, the fastest sorting algorithm is Quicksort, which uses partitioning as its main idea.
Example: Pivot about 10.
17 12 6 19 23 8 5 10 - before
6 8 5 10 23 19 12 17 - after
Partitioning places all the elements less than the pivot in the left part of the array, and all elements greater than the pivot in the right part of the array. The pivot fits in the slot between them.
Note that the pivot element ends up in the correct place in the total order!
Partitioning the elements
Once we have selected a pivot element, we can partition the array in one linear scan, by maintaining three sections of the array: < pivot, > pivot, and unexplored.
Example: pivot about 10
| 17 12 6 19 23 8 5 | 10
| 5 12 6 19 23 8 | 17
5 | 12 6 19 23 8 | 17
5 | 8 6 19 23 | 12 17
5 8 | 6 19 23 | 12 17
5 8 6 | 19 23 | 12 17
5 8 6 | 23 | 19 12 17
5 8 6 ||23 19 12 17
5 8 6 10 19 12 17 23
As we scan from left to right, we move the left bound to the right when the element is less than the pivot, otherwise we swap it with the rightmost unexplored element and move the right bound one step closer to the left.
Since the partitioning step consists of at most n swaps, it takes time linear in the number of keys. But what does it buy us?
Thus we can sort the elements to the left of the pivot and the right of the pivot independently!
This gives us a recursive sorting algorithm, since we can use the partitioning approach to sort each subproblem.
Pseudocode
Sort(A)
    Quicksort(A,1,n)
Quicksort(A, low, high)
    if (low < high)
        pivot-location = Partition(A,low,high)
        Quicksort(A,low, pivot-location - 1)
        Quicksort(A, pivot-location+1, high)
Partition(A,low,high)
    pivot = A[low]
    leftwall = low
    for i = low+1 to high
        if (A[i] < pivot) then
            leftwall = leftwall+1
            swap(A[i],A[leftwall])
    swap(A[low],A[leftwall])
    return(leftwall)
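A direct Python rendering of this pseudocode is sketched below. It assumes 0-based indexing and has `partition` return the pivot's final position explicitly, since Quicksort needs that value; the default arguments are a convenience added here.

```python
def partition(a, low, high):
    """Partition a[low..high] around the pivot a[low]; return the pivot's final index."""
    pivot = a[low]
    leftwall = low
    for i in range(low + 1, high + 1):
        if a[i] < pivot:
            leftwall += 1
            a[i], a[leftwall] = a[leftwall], a[i]
    a[low], a[leftwall] = a[leftwall], a[low]   # drop the pivot into its slot
    return leftwall

def quicksort(a, low=0, high=None):
    """Sort list a in place between positions low and high inclusive."""
    if high is None:
        high = len(a) - 1
    if low < high:
        p = partition(a, low, high)
        quicksort(a, low, p - 1)
        quicksort(a, p + 1, high)

keys = [17, 12, 6, 19, 23, 8, 5, 10]
quicksort(keys)
```

Because the pivot is always the first element, an already sorted input triggers the unbalanced worst case.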
Best Case for Quicksort
Since each element ultimately ends up in the correct position, the algorithm correctly sorts. But how long does it take?
The best case for divide-and-conquer algorithms comes when we split the input as evenly as possible. Thus in the best case, each subproblem is of size n/2.
The partition step on each subproblem is linear in its size.
Thus the total effort in partitioning the $2^k$ problems of size $n/2^k$ is O(n).
The recursion tree for the best case looks like this:
Worst Case for Quicksort
Suppose instead our pivot element splits the array as unequally as possible. Thus instead of n/2 elements in the smaller half, we get zero, meaning that the pivot element is the biggest or smallest element in the array.
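In this case the recursion has n-1 levels instead of $\lg n$, and the first n/2 levels each partition at least n/2 elements, for a worst-case time of $\Theta(n^2)$. Thus the worst case time for Quicksort is worse than Heapsort or Mergesort.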
To justify its name, Quicksort had better be good in the average case. Showing this requires some fairly intricate analysis.
The divide and conquer principle applies to real life. If you break a job into pieces, it is best to make the pieces of equal size!
Intuition: The Average Case for Quicksort
Suppose we pick the pivot element at random in an array of n keys.
Whenever the pivot element is from positions n/4 to 3n/4, the larger remaining subarray contains at most 3n/4 elements.
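If we assume that the pivot element is always in this range, what is the maximum number of partitions we need to get from n elements down to 1 element?
After $l$ such partitions at most $(3/4)^l \cdot n$ elements remain, and $(3/4)^l \cdot n = 1$ gives $n = (4/3)^l$, so $l = \log_{4/3} n = \lg n / \lg(4/3) \approx 2.4 \lg n$.
What have we shown?
At most $\log_{4/3} n$ levels of decent partitions suffice to sort an array of n elements.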
But how often when we pick an arbitrary element as pivot will it generate a decent partition?
Since any number ranked between n/4 and 3n/4 would make a decent pivot, we get one half the time on average.
If we need $\log_{4/3} n$ levels of decent partitions to finish the job, and half of random partitions are decent, then on average the recursion tree to quicksort the array has about $2 \log_{4/3} n = O(\lg n)$ levels.
More careful analysis shows that the expected number of comparisons is approximately $1.38 n \lg n$.
Average-Case Analysis of Quicksort
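To do a precise average-case analysis of quicksort, we formulate a recurrence for the exact expected time T(n):
$$T(n) = \sum_{p=1}^{n} \frac{1}{n} \left( T(p-1) + T(n-p) \right) + n - 1$$
Each possible pivot p is selected with equal probability. The number of comparisons needed to do the partition is n-1.
We will need one useful fact about the Harmonic numbers $H_n$, namely
$$H_n = \sum_{i=1}^{n} \frac{1}{i} \approx \ln n$$
It is important to understand (1) where the recurrence relation comes from and (2) how the log comes out from the summation. The rest is just messy algebra.
Since the sums over T(p-1) and T(n-p) contain the same terms,
$$T(n) = \frac{2}{n} \sum_{p=1}^{n} T(p-1) + n - 1$$
Multiplying by n, and writing the same identity for n-1:
$$n T(n) = 2 \sum_{p=1}^{n} T(p-1) + n(n-1)$$
$$(n-1) T(n-1) = 2 \sum_{p=1}^{n-1} T(p-1) + (n-1)(n-2)$$
Subtracting the second from the first:
$$n T(n) - (n-1) T(n-1) = 2 T(n-1) + 2(n-1)$$
Rearranging the terms gives us:
$$\frac{T(n)}{n+1} = \frac{T(n-1)}{n} + \frac{2(n-1)}{n(n+1)}$$
Substituting $a_n = T(n)/(n+1)$ gives
$$a_n = a_{n-1} + \frac{2(n-1)}{n(n+1)} = \sum_{i=1}^{n} \frac{2(i-1)}{i(i+1)} \approx 2 \sum_{i=1}^{n} \frac{1}{i+1} \approx 2 \ln n$$
We are really interested in T(n), so
$$T(n) = (n+1) a_n \approx 2(n+1) \ln n \approx 1.38 n \lg n$$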
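As a sanity check on the algebra, the recurrence can be evaluated exactly with rational arithmetic. The sketch below assumes the recurrence $T(n) = \frac{2}{n}\sum_{p=1}^{n} T(p-1) + n - 1$ with $T(0) = 0$, which matches the description above (every pivot equally likely, n-1 comparisons per partition). The closed form $2(n+1)H_n - 4n$ it is compared against is not stated in the lecture; it follows from splitting each summand as $\frac{2(i-1)}{i(i+1)} = \frac{4}{i+1} - \frac{2}{i}$. The function names are illustrative only.

```python
from fractions import Fraction
import math


def expected_comparisons(n_max):
    """Return [T(0), ..., T(n_max)] for T(n) = (2/n) * sum_{p=1..n} T(p-1) + (n-1)."""
    t = [Fraction(0)]
    prefix = Fraction(0)                 # running sum T(0) + ... + T(n-1)
    for n in range(1, n_max + 1):
        prefix += t[n - 1]
        t.append(Fraction(2, n) * prefix + (n - 1))
    return t


def harmonic(n):
    """Exact n-th Harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, i) for i in range(1, n + 1))


T = expected_comparisons(50)
closed = [2 * (n + 1) * harmonic(n) - 4 * n for n in range(51)]
print(T == closed)                                  # exact agreement for n = 0..50
print(float(T[50]), 1.38 * 50 * math.log2(50))      # exact value vs. the 1.38 n lg n estimate
```

For small n the approximation overshoots noticeably, because it drops the lower-order $-4n$ term; the two agree only asymptotically.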
What is the Worst Case?
The worst case for Quicksort depends upon how we select our partition or pivot element. If we always select either the first or last element of the subarray, the worst-case occurs when the input is already sorted!
A B D F H J K
B D F H J K
D F H J K
F H J K
H J K
J K
K
Having the worst case occur when the keys are sorted or almost sorted is very bad, since that is likely to be the case in certain applications.
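The quadratic behavior on sorted input is easy to observe by counting key comparisons. The sketch below assumes the first element of each subarray is used as the pivot, as in the pseudocode's Partition; `count_comparisons` is a hypothetical helper written for this illustration.

```python
def count_comparisons(a, low, high):
    """Quicksort a[low..high] with a[low] as pivot; return the number of key comparisons."""
    if low >= high:
        return 0
    pivot, leftwall = a[low], low
    for i in range(low + 1, high + 1):
        if a[i] < pivot:
            leftwall += 1
            a[i], a[leftwall] = a[leftwall], a[i]
    a[low], a[leftwall] = a[leftwall], a[low]
    # The scan above compared each of the other (high - low) keys to the pivot once.
    return ((high - low)
            + count_comparisons(a, low, leftwall - 1)
            + count_comparisons(a, leftwall + 1, high))


n = 100
sorted_cost = count_comparisons(list(range(n)), 0, n - 1)
print(sorted_cost, n * (n - 1) // 2)  # 4950 4950
```

On sorted input every partition peels off only the pivot, so the counts are (n-1) + (n-2) + ... + 1 = n(n-1)/2. Note that the recursion depth is also n here, so much larger sorted inputs would exceed Python's default recursion limit.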
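To eliminate this problem, pick a better pivot: use the middle element of the subarray, use a random element of the subarray, or take the median of three elements (first, last, middle).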
Whichever of these three rules we use, the worst case remains $O(n^2)$. However, because the worst case is no longer a natural order, it is much less likely to occur.
Is Quicksort really faster than Heapsort?
Since Heapsort is $\Theta(n \lg n)$ and selection sort is $\Theta(n^2)$, there is no debate about which will be better for decent-sized files.
But how can we compare two $\Theta(n \lg n)$ algorithms to see which is faster?
Using the RAM model and the big Oh notation, we can't!
When Quicksort is implemented well, it is typically 2-3 times faster than mergesort or heapsort. The primary reason is that the operations in the innermost loop are simpler. The best way to see this is to implement both and experiment with different inputs.
Since the difference between the two programs will be limited to a multiplicative constant factor, the details of how you program each algorithm will make a big difference.
If you don't want to believe me when I say Quicksort is faster, I won't argue with you. It is a question whose solution lies outside the tools we are using.
Randomization
Suppose you are writing a sorting program, to run on data given to you by your worst enemy. Quicksort is good on average, but bad on certain worst-case instances.
If you used Quicksort, what kind of data would your enemy give you to run it on? Exactly the worst-case instance, to make you look bad.
But instead of picking the median of three or the first element as pivot, suppose you picked the pivot element at random.
Now your enemy cannot design a worst-case instance to give to you, because no matter which data they give you, you would have the same probability of picking a good pivot!
Randomization is a very important and useful idea. By either picking a random pivot or scrambling the permutation before sorting it, we can say:
``With high probability, randomized quicksort runs in $\Theta(n \lg n)$ time.''
Where before, all we could say is:
``If you give me random input data, quicksort runs in expected $\Theta(n \lg n)$ time.''
Since the time bound now does not depend upon your input distribution, this means that unless we are extremely unlucky (as opposed to ill prepared or unpopular) we will certainly get good performance.
Randomization is a general tool to improve algorithms with bad worst-case but good average-case complexity.
The worst-case is still there, but we almost certainly won't see it.
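A minimal sketch of the random-pivot idea, assuming Python's standard `random` module as the source of randomness and the same leftwall partition as in the pseudocode. The function name is illustrative. Inputs with many equal keys can still partition badly under this simple scheme.

```python
import random


def randomized_quicksort(a, low=0, high=None):
    """Sort list `a` in place, choosing each pivot uniformly at random."""
    if high is None:
        high = len(a) - 1
    if low < high:
        # Move a randomly chosen element to the front, then partition as before.
        r = random.randint(low, high)   # randint includes both endpoints
        a[low], a[r] = a[r], a[low]
        pivot, leftwall = a[low], low
        for i in range(low + 1, high + 1):
            if a[i] < pivot:
                leftwall += 1
                a[i], a[leftwall] = a[leftwall], a[i]
        a[low], a[leftwall] = a[leftwall], a[low]
        randomized_quicksort(a, low, leftwall - 1)
        randomized_quicksort(a, leftwall + 1, high)


already_sorted = list(range(500))   # the worst case for a first-element pivot
randomized_quicksort(already_sorted)
print(already_sorted == list(range(500)))
```

The sorted input that is the worst case for a first-element pivot is now no worse than any other input, since the pivot choice no longer depends on the order of the data.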
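7.1-2: Show that an n-element heap has height $\lfloor \lg n \rfloor$.
The height is defined as the number of edges in the longest simple path from the root.
A complete binary tree of height h has $2^{h+1} - 1$ nodes, so a heap of height h holds between $2^h$ and $2^{h+1} - 1$ elements. Thus the height increases only when $n = 2^h$ for some integer h, or in other words when $\lg n$ is an integer.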
7.1-5 Is a reverse sorted array a heap?
In the array representation of a heap, the children of the ith element are the 2ith and (2i+1)th elements.
If A is sorted in reverse order, then $i \le j$ implies that $A[i] \ge A[j]$.
Since 2i > i and 2i+1 > i then $A[2i] \le A[i]$ and $A[2i+1] \le A[i]$.
Thus by definition A is a heap!
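The argument can be checked mechanically. The sketch below assumes a max-heap (each parent at least as large as its children) stored with 1-based positions, so the parent of position i is position i // 2; `is_max_heap` is a hypothetical helper name.

```python
def is_max_heap(keys):
    """Return True if `keys`, read as a 1-based array, satisfies the max-heap property."""
    a = [None] + list(keys)             # shift so that a[1] is the root
    n = len(keys)
    # Every non-root position i must not exceed its parent at position i // 2.
    return all(a[i // 2] >= a[i] for i in range(2, n + 1))


print(is_max_heap([9, 7, 5, 3, 1]))   # True: reverse sorted
print(is_max_heap([1, 2, 3]))         # False: the root is smaller than its children
```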
Can we sort in better than $O(n \lg n)$?
Any comparison-based sorting program can be thought of as defining a decision tree of possible executions.
Running the same program twice on the same permutation causes it to do exactly the same thing, but running it on different permutations of the same data causes a different sequence of comparisons to be made on each.
Once you believe this, a lower bound on the time complexity of sorting follows easily.
Since any two different permutations of n elements require a different sequence of steps to sort, there must be at least n! different paths from the root to leaves in the decision tree, i.e. at least n! different leaves in the tree.
Since only binary comparisons (less than or greater than) are used, the decision tree is a binary tree.
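Since a binary tree of height h has at most $2^h$ leaves, we know $n! \le 2^h$, or $h \ge \lg(n!)$.
By inspection $n! > (n/2)^{n/2}$, since the last n/2 terms of the product are each greater than n/2.
By Stirling's approximation, a better bound is $n! > (n/e)^n$, where e=2.718, so $h \ge \lg\left((n/e)^n\right) = n \lg n - n \lg e = \Omega(n \lg n)$.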
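To get a feel for the numbers, the decision-tree bound can be evaluated directly. The sketch below computes the smallest height h with $2^h \ge n!$ using exact integer arithmetic and compares it with $n \lg (n/e)$; `min_comparisons` is a hypothetical name introduced here.

```python
import math


def min_comparisons(n):
    """Smallest h with 2**h >= n!, i.e. the ceiling of lg(n!)."""
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)).
    return (math.factorial(n) - 1).bit_length()


for n in (4, 16, 64, 256):
    print(n, min_comparisons(n), round(n * math.log2(n / math.e), 1))
```

In each row the second column is at least the third, as the bound $n! > (n/e)^n$ requires.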
Non-Comparison-Based Sorting
All the sorting algorithms we have seen assume binary comparisons as the basic primitive, questions of the form ``is x before y?''.
Suppose you were given a deck of playing cards to sort. Most likely you would set up 13 piles and put all cards with the same number in one pile.
A 2 3 4 5 6 7 8 9 10 J Q K
A 2 3 4 5 6 7 8 9 10 J Q K
A 2 3 4 5 6 7 8 9 10 J Q K
A 2 3 4 5 6 7 8 9 10 J Q K
With only a constant number of cards left in each pile, you can use insertion sort to order by suit and concatenate everything together.
If we could find the correct pile for each card in constant time, and each pile gets O(1) cards, this algorithm takes O(n) time.
Bucketsort
Suppose we are sorting n numbers from 1 to m, where we know the numbers are approximately uniformly distributed.
We can set up n buckets, each responsible for an interval of m/n numbers from 1 to m.
If we use an array of buckets, each item gets mapped to the right bucket in O(1) time.
With uniformly distributed keys, the expected number of items per bucket is 1. Thus sorting each bucket takes O(1) time!
The total effort of bucketing, sorting buckets, and concatenating the sorted buckets together is O(n).
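The three phases can be sketched as follows, assuming integer keys in the range 1 to m and insertion sort within each bucket. The index formula `(k - 1) * n // m` is one possible way to map a key to its interval in O(1) time; it is an assumption of this sketch, not something specified in the lecture.

```python
def bucketsort(keys, m):
    """Sort integers in the range 1..m using n = len(keys) buckets."""
    n = len(keys)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[(k - 1) * n // m].append(k)   # O(1) mapping from key to bucket
    result = []
    for b in buckets:
        # Insertion sort: cheap when the bucket holds only a few keys.
        for j in range(1, len(b)):
            x, i = b[j], j - 1
            while i >= 0 and b[i] > x:
                b[i + 1] = b[i]
                i -= 1
            b[i + 1] = x
        result.extend(b)                      # concatenate the buckets in order
    return result


print(bucketsort([29, 3, 72, 44, 98, 15, 61, 88, 50, 7], 100))
# [3, 7, 15, 29, 44, 50, 61, 72, 88, 98]
```

The output is correct for any keys in range, but the O(n) running time depends on the keys being spread roughly evenly across the buckets.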
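What happened to our $\Omega(n \lg n)$ lower bound? It does not apply here, because bucketsort is not comparison-based: it uses the key values themselves to index directly into the array of buckets.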
We can use bucketsort effectively whenever we understand the distribution of the data.
However, bad things happen when we assume the wrong distribution.
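Suppose in the previous example all the keys happened to be 1. After the bucketing phase, all n keys sit in the first bucket and every other bucket is empty, so we have spent linear time distributing the items and learned nothing.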
Problems like this are why we worry about the worst-case performance of algorithms!
Such distribution techniques can be used on strings instead of just numbers. The buckets will correspond to letter ranges instead of just number ranges.
The worst case ``shouldn't'' happen if we understand the distribution of our data.
Real World Distributions
Consider the distribution of names in a telephone book.
Either make sure you understand your data, or use a good worst-case or randomized algorithm!
The Shiffletts of Charlottesville
For comparison, note that there are seven Shiffletts (of various spellings) in the 1000-page Manhattan telephone directory.
Rules for Algorithm Design
The secret to successful algorithm design, and problem solving in general, is to make sure you ask the right questions. Below, I give a possible series of questions for you to ask yourself as you try to solve difficult algorithm design problems: