

Tables/Searching Algorithms
(Chapter 11)

Definitions

A ________, or ___________, is an abstract data type whose data items are stored and retrieved according to a key value.

The items are called ____________.

The table may be implemented using a variety of data structures: array, tree, heap, etc.

Each record can have a number of ______________.

The data is ordered based on one of these, named the _________ .

The record we are searching for has a key value that is called the ________.

Sequential Search

public static int search(int[] a, int target) {
    int i = 0;
    boolean found = ________;

    while ((i < a.length) && ! found) {
        if (___________________)
            found = true;
        else _________________;
    }

    if (found) return i;
    else return ______________;
}

Sequential Search on N elements

Worst Case

Number of comparisons: ________

Average Case

Number of comparisons: ________

Best Case

Number of comparisons: ________

Binary Search

Can be applied to any random-access data structure (such as an array) whose data elements are sorted.

Additional parameters:

first – index of the first element to examine

size – number of elements to search starting from the first element above

Binary Search

Precondition:

If size > 0, then the data structure must have size elements starting with the element denoted as the first element. In addition, these elements are sorted.

Postcondition:

If target appears, the position of the target is returned (a non-negative integer). Otherwise, -1 is returned.

Recursive Binary Search

Binary Search Implementation

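public static int search(int[] a, int first, int size, int target)
{
    int mid;

    if (size <= 0) return -1;   // empty range: target is not present
    else {
        mid = first + size/2;   // middle of the size elements starting at first
        if (target == a[mid]) return mid;
        else if (target < a[mid])
            return search(a, first, size/2, target);      // left part: size/2 elements
        else
            return search(a, mid+1, (size-1)/2, target);  // right part: (size-1)/2 elements
    }
}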
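A usage sketch for the recursive search above. The class name BinarySearchDemo and the sample array are assumptions made for illustration; the method body repeats the logic of the slide so the example compiles on its own.

```java
class BinarySearchDemo {
    // Same recursive search as above: examines a[first] .. a[first + size - 1].
    static int search(int[] a, int first, int size, int target) {
        if (size <= 0) return -1;
        int mid = first + size / 2;
        if (target == a[mid]) return mid;
        if (target < a[mid]) return search(a, first, size / 2, target);
        return search(a, mid + 1, (size - 1) / 2, target);
    }

    public static void main(String[] args) {
        int[] sorted = {2, 5, 8, 13, 21, 34, 55};   // precondition: elements are sorted
        // To search the whole array, pass first = 0 and size = sorted.length.
        System.out.println(search(sorted, 0, sorted.length, 13));
        System.out.println(search(sorted, 0, sorted.length, 4));
    }
}
```

The first call searches for a value that is present and the second for one that is not, so the two printed results illustrate both branches of the postcondition.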

Binary Search on N elements

Let T(N) = the total number of comparisons for a search on N elements.

T(N) = 1 + T(N/2)

T(N/2) = 1 + T(N/4)

...

T(1) = 1

Thus, T(N) = 1 + 1 + 1 + ... + 1 = O(_______________)
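To get a feel for how T(N) grows, the recurrence can be evaluated directly. This is a small sketch: the class name RecurrenceSketch is hypothetical, and N/2 is taken to be integer division.

```java
class RecurrenceSketch {
    // T(N) = 1 + T(N/2), with T(1) = 1 (integer division, N >= 1).
    static int t(int n) {
        return (n <= 1) ? 1 : 1 + t(n / 2);
    }

    public static void main(String[] args) {
        // Double N each time and watch how much T(N) changes.
        for (int n = 1; n <= 1024; n *= 2) {
            System.out.println("T(" + n + ") = " + t(n));
        }
    }
}
```

Compare how fast N grows from one line of output to the next with how fast T(N) grows.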

Hashing

Data records are stored in a hash table.

The position of a data record in the hash table is determined by its key.

A hash function maps keys to positions in the hash table.

If a hash function maps two keys to the same position in the hash table, then a collision occurs.

Simple Example

Let the hash table be an 11-element array.

If k is the key of a data record, let H(k) represent the hash function, where H(k) = k mod 11.

Insert the keys 83, 14, 29, 70, 55, 72:
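The positions for this example can be checked with a few lines of Java. This is only a sketch: the class name SimpleHashDemo is an assumption, and keys are assumed to be non-negative.

```java
class SimpleHashDemo {
    static final int TABLE_SIZE = 11;

    // H(k) = k mod 11, as defined above (keys assumed non-negative).
    static int hash(int k) {
        return k % TABLE_SIZE;
    }

    public static void main(String[] args) {
        int[] keys = {83, 14, 29, 70, 55, 72};
        for (int k : keys) {
            System.out.println("H(" + k + ") = " + hash(k));
        }
    }
}
```

Any two keys that print the same position collide.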

Goals of Hashing

An insert without a collision takes O(1) time.

A search also takes O(1) time, if the record is stored in its proper location.

The hash function can take many forms:

- If the key k is a non-negative integer:  k % tablesize

- If key k is a String (or any Object): Math.floorMod(k.hashCode(), tablesize), since hashCode() may return a negative value

- Any function that maps k to a table position!

The table size should be a prime number.

Collision Resolution

Linear Probing

- During insert of key k to position p:

If position p contains a different key, then examine positions p+1, p+2, etc.* until an unused position is found and insert k there.

- During a search for key k at position p:

If position p contains a different key, then examine positions p+1, p+2, etc.* until either the key is found or an unused position is encountered.

*wrap around to the beginning of the array if p+i >= tablesize
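The insert and search rules above might be sketched as follows. The class name LinearProbingSketch, the use of null to mark an unused position, and the -1 return values are assumptions made for this example, not the textbook's Table class.

```java
class LinearProbingSketch {
    private final Integer[] table;   // null marks an unused position

    LinearProbingSketch(int tableSize) {
        table = new Integer[tableSize];
    }

    private int hash(int k) {
        return Math.floorMod(k, table.length);
    }

    // Returns the position where k is stored, or -1 if the table is full.
    int insert(int k) {
        int p = hash(k);
        for (int i = 0; i < table.length; i++) {
            int pos = (p + i) % table.length;       // wrap around to the beginning
            if (table[pos] == null || table[pos] == k) {
                table[pos] = k;
                return pos;
            }
        }
        return -1;
    }

    // Returns the position of k, or -1 if an unused position is reached first.
    int search(int k) {
        int p = hash(k);
        for (int i = 0; i < table.length; i++) {
            int pos = (p + i) % table.length;
            if (table[pos] == null) return -1;
            if (table[pos] == k) return pos;
        }
        return -1;
    }
}
```

The loop bound of table.length probes keeps both methods from cycling forever when the table is full. This sketch does not support removals.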

Collision Resolution (cont’d)

Quadratic probing

If position p contains a different key, then examine positions p+1, p+4, p+9, etc. until either the key is found or an empty position is encountered.
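The quadratic probe sequence can be written as a small helper. This is a sketch under assumptions: the names QuadraticProbeSketch, probe, and search are hypothetical, null marks an empty position, and the table is small enough that i * i does not overflow an int.

```java
class QuadraticProbeSketch {
    // Position examined on the i-th probe (i = 0, 1, 2, ...) starting from p:
    // p, p+1, p+4, p+9, ... wrapped around the table.
    static int probe(int p, int i, int tableSize) {
        return (p + i * i) % tableSize;
    }

    // Returns the position of k, or -1 if an empty position is reached first.
    static int search(Integer[] table, int k) {
        int p = Math.floorMod(k, table.length);
        for (int i = 0; i < table.length; i++) {
            int pos = probe(p, i, table.length);
            if (table[pos] == null) return -1;
            if (table[pos] == k) return pos;
        }
        return -1;
    }
}
```

Unlike linear probing, this sequence is not guaranteed to visit every position, which is why the loop is capped at table.length probes.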

Example: Insert additional keys 72, 36, 48 using H(k) = k mod 11 and linear probing.

Special consideration

If we remove a key from the hash table, can we get into problems?

Special consideration

Add another array with boolean values that indicate whether the position is currently used or has been used in the past.

If a key is removed, leave the boolean value set at true so we can search past it if necessary.
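A minimal sketch of this idea with linear probing. The class name RemovalSketch and the method signatures are assumptions for illustration; only the hasBeenUsed array comes from the description above.

```java
class RemovalSketch {
    private final Integer[] keys;         // null means no key stored here right now
    private final boolean[] hasBeenUsed;  // true once a position has ever held a key

    RemovalSketch(int tableSize) {
        keys = new Integer[tableSize];
        hasBeenUsed = new boolean[tableSize];
    }

    private int hash(int k) { return Math.floorMod(k, keys.length); }

    boolean insert(int k) {
        if (search(k) >= 0) return true;        // already present
        int p = hash(k);
        for (int i = 0; i < keys.length; i++) {
            int pos = (p + i) % keys.length;
            if (keys[pos] == null) {            // vacant: never used, or key was removed
                keys[pos] = k;
                hasBeenUsed[pos] = true;
                return true;
            }
        }
        return false;                           // table is full
    }

    int search(int k) {
        int p = hash(k);
        for (int i = 0; i < keys.length; i++) {
            int pos = (p + i) % keys.length;
            if (!hasBeenUsed[pos]) return -1;   // never used: k cannot be further along
            if (keys[pos] != null && keys[pos] == k) return pos;
        }
        return -1;
    }

    void remove(int k) {
        int pos = search(k);
        if (pos >= 0) keys[pos] = null;         // hasBeenUsed[pos] stays true
    }
}
```

If search stopped at every vacant position instead, removing a key could hide any key that had been probed past it during insertion.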

Load Factor

The load factor a of a hash table is given by the following formula:

a = (number of elements in table) / (size of table)

Thus, ______ < a < ________ for linear probing.

For linear probing, as a approaches _________, the number of collisions increases.

Birthday Paradox

Probability that no two of n people in a room share a birthday:

p = (364/365)*(363/365)*...*((365-n+1)/365)

When n > __________, p < 0.5.

This means when there are at least ______ people in the room, chances are better that two people share the same birthday!

MORAL: For any hashing problem of reasonable size, we are almost certain to have collisions.
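The product for p is easy to evaluate numerically. This sketch assumes 365 equally likely birthdays; the names BirthdaySketch and allDistinct are hypothetical.

```java
class BirthdaySketch {
    // Probability that no two of n people share a birthday,
    // assuming 365 equally likely days.
    static double allDistinct(int n) {
        double p = 1.0;
        for (int i = 1; i < n; i++) {
            p *= (365.0 - i) / 365.0;   // factors 364/365, 363/365, ..., (365-n+1)/365
        }
        return p;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 40; n++) {
            System.out.printf("n = %2d  p = %.4f%n", n, allDistinct(n));
        }
    }
}
```

Scan the output for the first n where p drops below 0.5.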

Table ADT (pages 548-549)

WATCH OUT - THERE ARE TYPOS IN THE CODE! 
(EXERCISE: FIND THEM!)

The number of elements in the table is given by manyItems.

We try to store an element with a given key at location hash(key). If a collision occurs, linear probing is used to find a location to store the element and its associated key.

If index i has never been used in the table, data[i] and key[i] are set to null.

If index i is or has been used in the past, then hasBeenUsed[i] is true; otherwise it is false.

Various Hash Functions

Division hash function

1. convert key to a positive integer

2. return the integer modulo the table size

Mid-square hash function

1. convert key to an integer

2. multiply the integer by itself

3. return several digits in the middle of result

Multiplicative hash function

1. convert the key to an integer

2. multiply the integer by a constant less than 1

3. return the first several digits of the fractional part
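The three recipes above might look like this in Java. This is a rough sketch: the key is assumed to be an int already, the parameters digits and c are hypothetical, digits is assumed to be at most 9, and the multiplicative version is subject to floating-point rounding.

```java
class HashFunctionSketches {
    // Division: the integer modulo the table size (floorMod keeps it non-negative).
    static int division(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    // Mid-square: square the key and keep `digits` decimal digits from the middle.
    static int midSquare(int key, int digits) {
        String squared = Long.toString((long) key * key);
        if (squared.length() <= digits) return Integer.parseInt(squared);
        int start = (squared.length() - digits) / 2;
        return Integer.parseInt(squared.substring(start, start + digits));
    }

    // Multiplicative: multiply by a constant c with 0 < c < 1 and keep the
    // first `digits` decimal digits of the fractional part.
    static int multiplicative(int key, double c, int digits) {
        double product = key * c;
        double fraction = product - Math.floor(product);
        return (int) (fraction * Math.pow(10, digits));
    }
}
```

The mid-square and multiplicative results still have to be reduced to a valid table position, for example with the division function.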

Reducing Clustering

Linear probing can cause significant clusters.

To reduce clustering, use double hashing.

Define two hash functions: hash1 and hash2

Use the hash1 function to determine the initial location of the key in the hash table.

If a collision occurs, use the hash2 function to determine how far to move ahead to look for a vacant location in the hash table.

Double Hashing Example

H1(k) = k mod 1231
H2(k) = 1 + (k mod 1229)

For key k = 2000: H1(k) = ______

If this location is occupied, H2(k) = ________

So we check position _________________ to see if it is occupied.

If it is, we check position ___________________, etc.

NOTE: The table size must be relatively prime to every value returned by the second hash function hash2, so that the probe sequence can reach every position in the table.
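The probe sequence for this example can be computed with a short program. This is a sketch: the class name DoubleHashingSketch and the probe helper are assumptions, keys are assumed to be non-negative, and H2 is read as 1 + (k mod 1229).

```java
class DoubleHashingSketch {
    static final int TABLE_SIZE = 1231;

    static int hash1(int k) { return k % TABLE_SIZE; }
    static int hash2(int k) { return 1 + (k % 1229); }

    // Position examined on the i-th probe (i = 0 is the initial location).
    // Each further probe moves hash2(k) positions ahead, wrapping around.
    static int probe(int k, int i) {
        return (int) ((hash1(k) + (long) i * hash2(k)) % TABLE_SIZE);
    }

    public static void main(String[] args) {
        int k = 2000;
        for (int i = 0; i < 3; i++) {
            System.out.println("probe " + i + ": position " + probe(k, i));
        }
    }
}
```

Running main lists the first three positions examined for k = 2000.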

Chained Hashing

With open addressing, the maximum number of elements that can be stored in a hash table implemented using an array is the table size (a = ______).

We can have a load factor greater than ______ by using chained hashing.

Each array position in the hash table is a head reference to a linked list of keys.

All colliding keys that hash to an array position are inserted to that linked list.
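A compact sketch of chained hashing. It swaps in java.util.LinkedList for the hand-written linked lists the slide describes; the class name ChainedHashSketch and its method names are assumptions.

```java
import java.util.LinkedList;

class ChainedHashSketch {
    private final LinkedList<Integer>[] chains;   // one list of keys per array position
    private int manyItems = 0;

    @SuppressWarnings("unchecked")
    ChainedHashSketch(int tableSize) {
        chains = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) chains[i] = new LinkedList<>();
    }

    private int hash(int k) { return Math.floorMod(k, chains.length); }

    // Adds k to the list at its hash position (duplicates are ignored).
    void insert(int k) {
        LinkedList<Integer> chain = chains[hash(k)];
        if (!chain.contains(k)) {
            chain.addFirst(k);
            manyItems++;
        }
    }

    // Only the one list that k hashes to has to be searched.
    boolean contains(int k) {
        return chains[hash(k)].contains(k);
    }

    double loadFactor() {
        return (double) manyItems / chains.length;
    }
}
```

Because each position holds a list rather than a single key, insert never fails for lack of space, and loadFactor() is not limited by the array length.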

Chained Hashing: A Picture

Average Search Time

The average number of table elements examined in a successful search is approximately:

_____________ using linear probing*

_____________ using double hashing*

_____________ using chained hashing

*open addressing, assuming a non-full hash table with no removals

Figure: average number of table elements examined during a successful search, as a function of the load factor a

