
Application of Functions: Hashing

You are given a set of student records, each of which includes the student's ID number; the records have to be stored in a table. For any given ID number one has to be able to determine whether there is a record with that number and, if so, to retrieve the record. It may also be necessary to add new records and delete existing ones.

It is fairly easy to come up with some method for solving this problem:

Store the first record in the first table slot, the second in the second slot, and so on. To search for a record, simply scan all entries from the beginning (to the end if necessary).
How are records added or deleted?
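One possible answer to this question, sketched in Python: new records are appended at the end, and a deletion moves the last record into the vacated slot so that the occupied slots stay contiguous. The class and method names are illustrative assumptions, not part of the original notes.

```python
class SequentialTable:
    """Records stored in order of arrival; searches scan from the start."""

    def __init__(self):
        self.slots = []  # slot i holds the (i+1)-th record still present

    def add(self, id_number, data):
        # A new record goes into the next free slot at the end.
        self.slots.append((id_number, data))

    def search(self, id_number):
        # Scan from the beginning, to the end if necessary.
        for key, data in self.slots:
            if key == id_number:
                return data
        return None

    def delete(self, id_number):
        # Move the last record into the freed slot, then shrink the table.
        for i, (key, _) in enumerate(self.slots):
            if key == id_number:
                self.slots[i] = self.slots[-1]
                self.slots.pop()
                return True
        return False
```

Adding is cheap, but every search (and every delete) may have to look at all stored records, which is what makes this method slow for large tables.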

It is much harder to design a method for which the search, add, and delete operations are efficient and at the same time storage space is used economically (i.e., no huge table for just a few records).

One solution is to use a hash table based on a well-chosen hash function.

A Hash Table

For simplicity let us first assume that the number of records is small, say no more than 7.

We will use a table with seven entries, numbered $0, 1, 2, \ldots, 6$.

Example. (Table of sample ID numbers and their hash values not recoverable.)

Note. The elements x for which one computes hash values are also called keys. In this example the keys are ID-numbers.
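A minimal sketch of a seven-slot table in Python. It assumes, purely for illustration, the hypothetical hash function h(x) = x mod 7 applied to ID numbers written as integers (so 123-45-6789 becomes 123456789). Collisions are only detected here, not resolved.

```python
TABLE_SIZE = 7  # slots numbered 0, 1, ..., 6

def h(key):
    # Hypothetical hash function (an assumption): the ID number mod 7.
    return key % TABLE_SIZE

table = [None] * TABLE_SIZE  # each slot holds None or a (key, record) pair

def insert(key, record):
    slot = h(key)
    if table[slot] is not None and table[slot][0] != key:
        # Two different keys hash to the same slot: a collision.
        raise ValueError(f"collision at slot {slot}")
    table[slot] = (key, record)

def lookup(key):
    # Only one slot is inspected, no matter how many records are stored.
    entry = table[h(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

insert(123456789, "record of student 123-45-6789")  # stored in slot 1
print(lookup(123456789))
```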

Collisions

Typically, the domain X of a hash function is much larger than its co-domain Y, though the subset X' of those elements of X for which hash values need to be computed is usually about the same size as Y.

If the function f, when restricted to X' as a domain, is one-to-one, then hashing works fine. If it is not one-to-one, there may be collisions.

Example. Where do we store the record of a student with ID-number 223-79-9068?

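Collisions can be resolved in two ways: by chaining, where each table slot holds a list of all records whose keys hash to that slot, or by open addressing, where a colliding record is placed in some other free slot of the table (for example, the next free one).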
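As a sketch of one standard collision-resolution strategy, here is open addressing with linear probing in Python, again assuming the hypothetical hash function h(x) = x mod 7: a record whose slot is taken goes into the next free slot, wrapping around from slot 6 to slot 0. Deletion is omitted because simply emptying a slot can break later searches, which stop at the first empty slot; implementations usually mark deleted slots instead.

```python
class LinearProbingTable:
    """Open addressing with linear probing over a fixed number of slots."""

    def __init__(self, size=7):
        self.size = size
        self.slots = [None] * size  # each slot: None or (key, record)

    def _h(self, key):
        return key % self.size  # hypothetical hash function

    def insert(self, key, record):
        slot = self._h(key)
        for _ in range(self.size):
            entry = self.slots[slot]
            if entry is None or entry[0] == key:
                self.slots[slot] = (key, record)
                return slot
            slot = (slot + 1) % self.size  # collision: try the next slot
        raise OverflowError("table is full")

    def search(self, key):
        slot = self._h(key)
        for _ in range(self.size):
            entry = self.slots[slot]
            if entry is None:
                return None  # an empty slot ends the probe sequence
            if entry[0] == key:
                return entry[1]
            slot = (slot + 1) % self.size
        return None
```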

Hash functions are typically onto. Why is this good?

In the example, if the number of student records is reasonably large, say around 8,000, the function h above, which maps keys into only seven table slots, is not suitable. A more reasonable function might be

(formula not recoverable)
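A counting argument in the spirit of the pigeonhole principle, stated next, shows why seven slots cannot cope with 8,000 records: whatever the hash function, some slot must receive at least ceil(8000 / 7) = 1143 records. A small Python sketch of that bound, plus the actual load of the fullest slot under the hypothetical function h(x) = x mod m; the consecutive keys in the usage lines are an idealized, assumed input.

```python
def forced_bucket_size(num_records, num_slots):
    # Some slot must receive at least ceil(num_records / num_slots) keys,
    # whatever the hash function (integer ceiling, no floats).
    return (num_records + num_slots - 1) // num_slots

def largest_bucket(keys, num_slots):
    # Load of the fullest slot under the hypothetical hash x mod num_slots.
    counts = [0] * num_slots
    for key in keys:
        counts[key % num_slots] += 1
    return max(counts)

print(forced_bucket_size(8000, 7))     # 1143
print(largest_bucket(range(8000), 7))  # 1143
```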

The Pigeonhole Principle

If A and B are finite sets and B has fewer elements than A, then there is no one-to-one function from A to B.

This observation is also known as the Pigeonhole Principle.
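For small sets the principle can be checked by brute force. This Python sketch enumerates every function from A to B, representing A and B (as an assumption) by the sets {0, ..., |A| - 1} and {0, ..., |B| - 1}, and counts the one-to-one ones; the count is 0 exactly when B has fewer elements than A.

```python
from itertools import product

def is_one_to_one(images):
    # images[i] is f(i); f is one-to-one iff no value repeats.
    return len(set(images)) == len(images)

def count_one_to_one(size_a, size_b):
    # Enumerate all size_b ** size_a functions from A to B
    # and count those that are one-to-one.
    return sum(1 for images in product(range(size_b), repeat=size_a)
               if is_one_to_one(images))

print(count_one_to_one(3, 3))  # 6: the permutations of a 3-element set
print(count_one_to_one(4, 3))  # 0: B is smaller than A
```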

Example. Let A be the set $\{1, 2, \ldots, 8\}$. How many of the integers from A need to be selected so that, regardless of the choice of selection, there is at least one pair with a sum of 9?

Four is not enough, as we may select 1, 2, 3, 4, where no pair yields a sum larger than 7.

But any selection of five integers from A must contain a pair whose sum is 9. To see why, observe that A can be partitioned into the four subsets $\{1, 8\}$, $\{2, 7\}$, $\{3, 6\}$, and $\{4, 5\}$, where the two elements of each subset sum to 9.

Now if $a_1, a_2, a_3, a_4$, and $a_5$ are the selected integers from A, we define a function f by setting $f(a_i)$ to be the subset $\{k, 9-k\}$ that contains $a_i$.

Since f maps five integers to only four subsets, by the pigeonhole principle f is not one-to-one, so there exist two integers $a_i$ and $a_j$ with $i \neq j$ and $f(a_i) = f(a_j)$. In other words, there must be one subset $\{k, 9-k\}$ both of whose elements are selected. The corresponding sum is 9.
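The argument can be followed directly in code. This Python sketch labels each integer a by min(a, 9 - a), which identifies its subset {a, 9 - a} and so plays the role of f, stops at the first repeated label, and then checks every selection of a given size exhaustively. The function names are illustrative.

```python
from itertools import combinations

A = range(1, 9)  # the set {1, 2, ..., 8}

def pair_summing_to_9(selection):
    # min(a, 9 - a) names the subset {a, 9 - a}; the four "pigeonholes"
    # are 1, 2, 3, 4. Two selected integers in one hole sum to 9.
    seen = {}
    for a in selection:
        hole = min(a, 9 - a)
        if hole in seen:
            return seen[hole], a
        seen[hole] = a
    return None

def every_selection_works(k):
    # Brute-force check over all k-element selections from A.
    return all(pair_summing_to_9(s) is not None for s in combinations(A, k))

print(every_selection_works(5))  # True
print(every_selection_works(4))  # False, e.g. 1, 2, 3, 4
```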

A Bald Statement

Despite its simplicity, the pigeonhole principle can be used to solve an amazing variety of problems.

Claim: There must be at least two non-bald New Yorkers who have exactly the same number of hairs on their heads!

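Proof: The maximum number of hairs on a human head is 1,000,000, so every non-bald New Yorker has between 1 and 1,000,000 hairs, which allows at most 1,000,000 different hair counts. There are more than 1,000,000 non-bald New Yorkers, so by the pigeonhole principle two of them must have the same count. ∎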

Note that this proof, although completely rigorous, is not constructive. We don't figure out which two people share the same hair count, or what that hair count is; we only show that such a pair must exist.

Other Applications of the Pigeonhole Principle

I own n distinct pairs of socks, which I keep in an unmatched pile in my drawer. How many individual socks must I pull out of the drawer to ensure that I get two that match?

Think of this as having n pigeonholes, one for each type of sock. How many pigeons do I need to ensure that some hole contains 2 of them?


How long a document must you write to ensure that some word is used more than once?

If there are only 100,000 words in the dictionary, a book with 100,001 words will use at least one of them twice.
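The sock and word questions have the same shape: with k kinds (n kinds of sock, or 100,000 words), k + 1 items always force a repeat, while k items need not. A brute-force Python check for small k follows. It treats each draw as an arbitrary choice of kind, an assumption that ignores the limit of two socks per pair; this does not change the answer, since drawing one sock of each kind is still possible.

```python
from itertools import product

def always_repeats(num_kinds, num_items):
    # True if every sequence of num_items choices among num_kinds kinds
    # contains some kind at least twice.
    return all(len(set(seq)) < num_items
               for seq in product(range(num_kinds), repeat=num_items))

for k in range(1, 5):
    print(k, always_repeats(k, k), always_repeats(k, k + 1))  # k False True
```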

A Subset of Divisors

Suppose you are given an arbitrary subset S of 101 distinct integers from the set $\{1, 2, \ldots, 200\}$. Then there must be two distinct integers x, y in S such that x divides y.

Proof: Every positive integer n can be written as $n = 2^k m$, for some $k \geq 0$ and m odd. (Why? Factoring all the twos out of n leaves an odd number.)

Thus every number in S can be mapped to its odd part m, which is an odd number from 1 to 199. There are exactly 100 such numbers. (Why? These are the integers $2i - 1$ for $1 \leq i \leq 100$.)

Thus at least two of the 101 distinct integers must be mapped to the same odd number m, say $x = 2^i m$ and $y = 2^j m$. Since $x \neq y$, we have $i \neq j$; naming them so that $i < j$, we get $y = 2^{j-i} x$, so x divides y. ∎
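The map from each number to its odd part can be run directly: compute the odd part of each element (its pigeonhole) and stop at the first repeat. A Python sketch; the function names are illustrative, and odd_part assumes a positive integer.

```python
def odd_part(n):
    # Factor out all twos: n = 2**k * m with m odd; return m.
    # Assumes n >= 1 (for n = 0 the loop would never end).
    while n % 2 == 0:
        n //= 2
    return n

def find_divisor_pair(s):
    # Two members with the same odd part differ by a power of two,
    # so the smaller one divides the larger one.
    seen = {}
    for x in s:
        m = odd_part(x)
        if m in seen:
            return tuple(sorted((seen[m], x)))
        seen[m] = x
    return None

print(find_divisor_pair(range(100, 201)))  # (100, 200)
```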

This result can be generalized to state that any subset S of n+1 distinct integers from $\{1, 2, \ldots, 2n\}$ must contain two distinct integers x, y in S such that x divides y.
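The generalized statement can be spot-checked exhaustively for small n. A brute-force Python sketch; it examines every (n + 1)-element subset, so it is only feasible for small n.

```python
from itertools import combinations

def has_divisor_pair(s):
    # Look for x < y in s such that x divides y.
    return any(y % x == 0 for x, y in combinations(sorted(s), 2))

def generalization_holds(n):
    # Every (n + 1)-element subset of {1, ..., 2n} must contain such a pair.
    universe = range(1, 2 * n + 1)
    return all(has_divisor_pair(s) for s in combinations(universe, n + 1))

print([generalization_holds(n) for n in range(1, 7)])  # [True, True, True, True, True, True]
```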





Steve Skiena
Tue Aug 24 20:25:28 EDT 1999