Tables/Searching Algorithms
(Chapter 11)
Definitions
A ________, or ___________, is an abstract data type whose data items are stored and retrieved according to a key value.
The items are called ____________.
The table may be implemented using a variety of data structures: array, tree, heap, etc.
Each record can have a number of ______________.
The data is ordered based on one of these, named the _________.
The record we are searching for has a key value that is called the ________.
Sequential Search
public static int search(int[] a, int target) {
    int i = 0;
    boolean found = ________;
    while ((i < a.length) && !found) {
        if (___________________)
            found = true;
        else
            _________________;
    }
    if (found) return i;
    else return ______________;
}
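The slide leaves the key expressions blank as an exercise. One way to complete it is sketched below as a runnable class. The class name `SequentialSearchDemo` and the sample array are hypothetical, and returning -1 for a missing target is an assumption chosen to match the binary search postcondition.

```java
class SequentialSearchDemo {
    // Scan a from left to right; return the index of the first element
    // equal to target, or -1 if no element matches.
    public static int search(int[] a, int target) {
        int i = 0;
        boolean found = false;
        while ((i < a.length) && !found) {
            if (a[i] == target)
                found = true;
            else
                i++;
        }
        if (found) return i;
        else return -1;
    }

    public static void main(String[] args) {
        int[] a = {42, 7, 19, 3, 25};
        System.out.println(search(a, 19)); // prints 2
        System.out.println(search(a, 8));  // prints -1
    }
}
```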
Sequential Search on N elements
Worst Case
Number of comparisons: ________
Average Case
Number of comparisons: ________
Best Case
Number of comparisons: ________
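The three cases can be checked by counting comparisons. This sketch (hypothetical names `ComparisonCount` and `countComparisons`) performs the same left-to-right scan but returns how many times an element was compared with the target. The average assumes each of the N elements is equally likely to be the target.

```java
class ComparisonCount {
    // Same scan as the sequential search, but returns the number of
    // element comparisons (a[i] == target) instead of the index.
    static int countComparisons(int[] a, int target) {
        int comparisons = 0;
        for (int i = 0; i < a.length; i++) {
            comparisons++;
            if (a[i] == target) break;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 10;
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i * 10;   // 0, 10, ..., 90

        int best = countComparisons(a, 0);    // target is the first element
        int worst = countComparisons(a, 90);  // target is last (a miss also costs n)
        int total = 0;
        for (int x : a) total += countComparisons(a, x);
        double average = (double) total / n;  // each target equally likely

        System.out.println(best + " " + worst + " " + average); // prints 1 10 5.5
    }
}
```

For N = 10 this prints `1 10 5.5`: one comparison when the target is first, N when it is last (or absent), and (N + 1)/2 on average over successful searches.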
Binary Search
Can be applied to any data structure whose elements are sorted and can be accessed by index in constant time (such as an array).
Additional parameters:
first – index of the first element to examine
size – number of elements to search, starting at index first
Binary Search (cont’d)
Precondition:
If size > 0, then the data structure must have size elements starting at index first. In addition, these elements must be sorted in increasing order.
Postcondition:
If target appears, the position of the target is returned (a non-negative integer). Otherwise, -1 is returned.
Recursive Binary Search

Binary Search Implementation
public static int search(int[] a, int first, int size, int target)
{
    int mid;
    if (size <= 0) return -1;
    else {
        mid = first + size/2;
        if (target == a[mid]) return mid;
        else if (target < a[mid])
            return search(a, first, size/2, target);      // left half: size/2 elements
        else
            return search(a, mid+1, (size-1)/2, target);  // right half: (size-1)/2 elements
    }
}
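For comparison, the same search can be written as a loop. The sketch below keeps the recursive method and adds a hypothetical `searchIterative` that narrows `first` and `size` exactly the way the recursive calls do; the iterative form is a swapped-in variant, not part of the slides.

```java
class BinarySearchDemo {
    // Recursive version, as on the slide.
    static int search(int[] a, int first, int size, int target) {
        if (size <= 0) return -1;
        int mid = first + size / 2;
        if (target == a[mid]) return mid;
        else if (target < a[mid]) return search(a, first, size / 2, target);
        else return search(a, mid + 1, (size - 1) / 2, target);
    }

    // Iterative equivalent: each pass updates first and size the same way
    // the recursive calls above do.
    static int searchIterative(int[] a, int first, int size, int target) {
        while (size > 0) {
            int mid = first + size / 2;
            if (target == a[mid]) return mid;
            if (target < a[mid]) {
                size = size / 2;          // keep the left half
            } else {
                first = mid + 1;          // keep the right half
                size = (size - 1) / 2;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};   // sorted
        System.out.println(search(a, 0, a.length, 23));          // prints 5
        System.out.println(searchIterative(a, 0, a.length, 23)); // prints 5
        System.out.println(searchIterative(a, 0, a.length, 7));  // prints -1
    }
}
```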
Binary Search on N elements
Let T(N) = the total number of comparisons for a search on N elements.
T(N) = 1 + T(N/2)
T(N/2) = 1 + T(N/4)
...
T(1) = 1
Thus, T(N) = 1 + 1 + 1 + ... + 1 = O(_______________)
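Unrolling the recurrence shows how many 1's appear in the sum. This worked derivation assumes N is a power of 2, so every halving is exact.

```latex
\begin{aligned}
T(N) &= 1 + T(N/2) \\
     &= 1 + 1 + T(N/4) \\
     &= k + T(N/2^{k}) && \text{after } k \text{ halvings} \\
     &= \log_2 N + T(1) && \text{taking } 2^{k} = N \\
     &= \log_2 N + 1
\end{aligned}
```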
Hashing
Data records are stored in a hash table.
The position of a data record in the hash table is determined by its key.
A hash function maps keys to positions in the hash table.
If a hash function maps two keys to the same position in the hash table, then a collision occurs.
Simple Example
Let the hash table be an 11-element array.
If k is the key of a data record, let H(k) represent the hash function, where H(k) = k mod 11.
Insert the keys 83, 14, 29, 70, 55, 72:
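The positions for these keys can be computed directly. This sketch (hypothetical class `SimpleHashDemo`) places each key at H(k) = k mod 11 and only reports collisions rather than resolving them. It assumes non-negative keys, so that -1 can mark an empty slot.

```java
import java.util.Arrays;

class SimpleHashDemo {
    static final int TABLE_SIZE = 11;

    // H(k) = k mod 11
    static int hash(int k) {
        return k % TABLE_SIZE;
    }

    public static void main(String[] args) {
        int[] table = new int[TABLE_SIZE];
        Arrays.fill(table, -1);                  // -1 marks an empty slot
        int[] keys = {83, 14, 29, 70, 55, 72};
        for (int k : keys) {
            int p = hash(k);
            if (table[p] == -1) {
                table[p] = k;
                System.out.println(k + " -> position " + p);
            } else {
                System.out.println(k + " -> position " + p
                        + ": collision with " + table[p]);
            }
        }
    }
}
```

Running it shows 72 hashing to position 6, which 83 already occupies, so that insert collides.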

Goals of Hashing
An insert without a collision takes O(1) time.
A search also takes O(1) time, if the record is stored in its proper location.
The hash function can take many forms:
- If the key k is an integer: k % tablesize
- If the key k is a String (or any Object): Math.abs(k.hashCode() % tablesize), since hashCode() may return a negative value
- Any function that maps k to a table position!
The table size should be a prime number.
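In Java, `hashCode()` may return a negative int, and `%` keeps the sign of its left operand, so a raw `k.hashCode() % tablesize` can produce a negative index. A minimal sketch with a hypothetical helper `hashIndex` that clears the sign bit before taking the remainder:

```java
class StringHashDemo {
    // Map any object key to a position in [0, tableSize).
    // Masking with 0x7fffffff clears the sign bit, so the value is
    // non-negative even when hashCode() is negative.
    static int hashIndex(Object key, int tableSize) {
        return (key.hashCode() & 0x7fffffff) % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 11;                       // a prime table size
        String[] keys = {"apple", "banana", "cherry"};
        for (String k : keys) {
            System.out.println(k + " -> " + hashIndex(k, tableSize));
        }
    }
}
```

Masking is used here instead of `Math.abs(k.hashCode()) % tablesize` because `Math.abs(Integer.MIN_VALUE)` is still negative.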
Collision Resolution
Linear Probing
- During insert of key k to position p:
If position p contains a different key, then examine positions p+1, p+2, etc.* until an unused position is found and insert k there.
- During a search for key k at position p:
If position p contains a different key, then examine positions p+1, p+2, etc.* until either the key is found or an unused position is encountered.
*wrap around to the beginning of the array if p+i >= tablesize, i.e., examine position (p+i) % tablesize
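A sketch of linear probing on an int table, assuming non-negative keys so that -1 can mark an unused position (class and method names are hypothetical). Both `insert` and `search` walk the positions (p + i) % tablesize for i = 0, 1, 2, ..., which implements the wrap-around rule.

```java
import java.util.Arrays;

class LinearProbingDemo {
    static final int EMPTY = -1;   // marks an unused position (keys assumed non-negative)
    final int[] table;

    LinearProbingDemo(int size) {
        table = new int[size];
        Arrays.fill(table, EMPTY);
    }

    int hash(int k) {
        return k % table.length;
    }

    // Insert k at the first unused position among p, p+1, p+2, ... (mod size).
    // If k is already present it is left in place. Returns the position, or -1 if full.
    int insert(int k) {
        int p = hash(k);
        for (int i = 0; i < table.length; i++) {
            int pos = (p + i) % table.length;
            if (table[pos] == EMPTY || table[pos] == k) {
                table[pos] = k;
                return pos;
            }
        }
        return -1;
    }

    // Search along the same sequence; stop at k or at an unused position.
    int search(int k) {
        int p = hash(k);
        for (int i = 0; i < table.length; i++) {
            int pos = (p + i) % table.length;
            if (table[pos] == EMPTY) return -1;
            if (table[pos] == k) return pos;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingDemo t = new LinearProbingDemo(11);
        for (int k : new int[] {83, 14, 29, 70, 55, 72}) {
            System.out.println(k + " -> position " + t.insert(k));
        }
        System.out.println("search 72: " + t.search(72));
        System.out.println("search 40: " + t.search(40));
    }
}
```

Inserting the keys from the simple example places 72 at position 8, since positions 6 and 7 already hold 83 and 29.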
Collision Resolution (cont’d)
Quadratic probing
If position p contains a different key, then examine positions p+1, p+4, p+9, etc. (that is, p+i², wrapping around the end of the array) until either the key is found or an unused position is encountered.
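A sketch of the quadratic probe sequence, taken modulo the table size. The class and the helper name `probeSequence` are hypothetical; the start position uses the same H(k) = k mod tablesize as the earlier examples.

```java
import java.util.ArrayList;
import java.util.List;

class QuadraticProbingDemo {
    // Positions examined for key k: (p + i*i) % tableSize for i = 0, 1, 2, ...,
    // where p = k % tableSize. Only the first `count` positions are returned.
    static List<Integer> probeSequence(int k, int tableSize, int count) {
        List<Integer> positions = new ArrayList<>();
        int p = k % tableSize;
        for (int i = 0; i < count; i++) {
            positions.add((p + i * i) % tableSize);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Key 72 in an 11-element table starts at 72 % 11 = 6.
        System.out.println(probeSequence(72, 11, 6)); // prints [6, 7, 10, 4, 0, 9]
    }
}
```

With a prime table size, the positions (p + i²) % tablesize for i = 0 to (tablesize - 1)/2 are all distinct, which is why quadratic probing is commonly paired with a prime table size and a table kept at most half full.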
Example: Insert additional keys 72, 36, 48 using H(k) = k mod 11 and linear probing.

Special consideration
If we remove a key from the hash table, can we get into problems?

Special consideration (cont’d)
Add another array with boolean values that indicate whether the position is currently used or has been used in the past.
If a key is removed, leave the boolean value set to true so we can search past it if necessary.
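A sketch of removal with the extra boolean array, using linear probing as on the earlier slides. The class name `LazyDeletionDemo` is hypothetical. Following the Table ADT conventions in this chapter, a `null` key marks a position with no key currently stored, and `hasBeenUsed` stays true after a removal so that searches keep probing past it. For brevity, `insert` assumes the key is not already present.

```java
class LazyDeletionDemo {
    final Integer[] keys;          // null = no key stored at this position now
    final boolean[] hasBeenUsed;   // true once a key has ever been stored here

    LazyDeletionDemo(int size) {
        keys = new Integer[size];
        hasBeenUsed = new boolean[size];
    }

    int hash(int k) {
        return k % keys.length;
    }

    // Linear-probing insert into the first position with no current key
    // (never used, or used and then removed). Assumes k is not already present.
    void insert(int k) {
        int p = hash(k);
        for (int i = 0; i < keys.length; i++) {
            int pos = (p + i) % keys.length;
            if (keys[pos] == null) {
                keys[pos] = k;
                hasBeenUsed[pos] = true;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Search stops only at a position that has never been used, so it
    // probes past positions whose keys were removed.
    int search(int k) {
        int p = hash(k);
        for (int i = 0; i < keys.length; i++) {
            int pos = (p + i) % keys.length;
            if (!hasBeenUsed[pos]) return -1;
            if (keys[pos] != null && keys[pos] == k) return pos;
        }
        return -1;
    }

    // Remove clears the key but leaves hasBeenUsed[pos] set to true.
    void remove(int k) {
        int pos = search(k);
        if (pos != -1) keys[pos] = null;
    }

    public static void main(String[] args) {
        LazyDeletionDemo t = new LazyDeletionDemo(11);
        t.insert(83);   // 83 mod 11 = 6
        t.insert(72);   // 72 mod 11 = 6 is taken, so 72 goes to 7
        t.remove(83);
        System.out.println(t.search(72)); // prints 7: the search probes past removed position 6
    }
}
```

Had the removal also reset `hasBeenUsed[6]` to false, the search for 72 would stop at position 6 and wrongly report that 72 is absent.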

Load Factor
The load factor α of a hash table is given by the following formula:
α = number of elements in table / size of table
Thus, ______ ≤ α ≤ ________ for linear probing.
For linear probing, as α approaches _________, the number of collisions increases.
Birthday Paradox
Probability that no two of n people in a room share a birthday (assuming 365 equally likely birthdays):
p = (364/365)*(363/365)*...*((365-n+1)/365)
When n > __________, p < 0.5.
This means when there are at least ______ people in the room, chances are better that two people share the same birthday!
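The formula can be evaluated directly to find the smallest room size with p < 0.5. In this sketch (hypothetical names `BirthdayParadox`, `probAllDifferent`, `smallestRoomSize`), `probAllDifferent(n)` multiplies the n - 1 factors (365 - i)/365 for i = 1 .. n - 1, which is the same product as above.

```java
class BirthdayParadox {
    // p(n) = (364/365) * (363/365) * ... * ((365-n+1)/365):
    // probability that n people all have different birthdays.
    static double probAllDifferent(int n) {
        double p = 1.0;
        for (int i = 1; i < n; i++) {
            p *= (365.0 - i) / 365.0;
        }
        return p;
    }

    // Smallest n for which p(n) < 0.5.
    static int smallestRoomSize() {
        int n = 1;
        while (probAllDifferent(n) >= 0.5) n++;
        return n;
    }

    public static void main(String[] args) {
        int n = smallestRoomSize();
        System.out.println(n + " people: p = " + probAllDifferent(n));
    }
}
```

It reports n = 23, where p ≈ 0.493 (for n = 22, p ≈ 0.524).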
MORAL: For any hashing problem of reasonable size, we are almost certain to have collisions.
Table ADT (pages 548-549)
WATCH OUT - THERE ARE TYPOS IN THE CODE!
(EXERCISE: FIND THEM!)
The number of elements in the table is given by manyItems.
We try to store an element with a given key at location hash(key). If a collision occurs, linear probing is used to find a location to store the element and its associated key.
If index i has never been used in the table, data[i] and key[i] are set to null.
If index i is or has been used in the past, then hasBeenUsed[i] is true; otherwise it is false.
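A compact sketch of a table in the spirit of the description above: parallel `keys` and `data` arrays, a `hasBeenUsed` array, a `manyItems` count, and linear probing. It is not the textbook code from pages 548-549; the class name `SimpleTable` and the helpers `hash`, `nextIndex`, and `findIndex` are assumptions made for illustration.

```java
class SimpleTable<K, V> {
    private final Object[] keys;
    private final Object[] data;
    private final boolean[] hasBeenUsed;  // true once a position has held a key
    private int manyItems;                // number of keys currently stored

    SimpleTable(int capacity) {
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

    private int hash(Object key) {
        return (key.hashCode() & 0x7fffffff) % keys.length;
    }

    private int nextIndex(int i) {
        return (i + 1) % keys.length;     // wrap around to the start
    }

    // Index of key, or -1. Probing stops at a never-used position;
    // removed positions (hasBeenUsed true, key null) are skipped.
    private int findIndex(Object key) {
        int i = hash(key);
        int count = 0;
        while (count < keys.length && hasBeenUsed[i]) {
            if (key.equals(keys[i])) return i;
            count++;
            i = nextIndex(i);
        }
        return -1;
    }

    // Store element under key; returns the previous element, or null.
    @SuppressWarnings("unchecked")
    V put(K key, V element) {
        int index = findIndex(key);
        if (index != -1) {
            V old = (V) data[index];
            data[index] = element;
            return old;
        }
        if (manyItems == keys.length) throw new IllegalStateException("table is full");
        index = hash(key);
        while (keys[index] != null) index = nextIndex(index);  // first free position
        keys[index] = key;
        data[index] = element;
        hasBeenUsed[index] = true;
        manyItems++;
        return null;
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        int index = findIndex(key);
        return index == -1 ? null : (V) data[index];
    }

    // Remove key; hasBeenUsed[index] stays true so later searches probe past it.
    @SuppressWarnings("unchecked")
    V remove(K key) {
        int index = findIndex(key);
        if (index == -1) return null;
        V old = (V) data[index];
        keys[index] = null;
        data[index] = null;
        manyItems--;
        return old;
    }

    int size() {
        return manyItems;
    }

    public static void main(String[] args) {
        SimpleTable<String, Integer> t = new SimpleTable<>(11);
        t.put("alice", 1);
        t.put("bob", 2);
        t.remove("alice");
        System.out.println(t.get("bob") + " " + t.get("alice") + " " + t.size()); // prints 2 null 1
    }
}
```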
Various Hash Functions
Division hash function
1. convert key to a positive integer
2. return the integer modulo the table size
Mid-square hash function
1. convert key to an integer
2. multiply the integer by itself
3. return several digits in the middle of result
Multiplicative hash function
1. convert the key to an integer
2. multiply the integer by a constant less than 1
3. return the first several digits of the fractional part
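Each recipe can be sketched in a few lines. The class `HashFunctions`, the digit count for mid-square, and the constant A = (sqrt(5) - 1)/2 for the multiplicative method are assumptions for illustration; that constant is a commonly suggested choice, not one given on the slide. For the multiplicative method, scaling the fraction by a table size of 100 keeps its first two digits.

```java
class HashFunctions {
    // Division: make the result non-negative, modulo the table size.
    static int division(int key, int tableSize) {
        return Math.abs(key % tableSize);
    }

    // Mid-square: square the key and keep `digits` decimal digits from the
    // middle of the result (assumes 1 <= digits <= 9 so they fit in an int).
    static int midSquare(int key, int digits, int tableSize) {
        String square = Long.toString((long) key * key);   // long avoids overflow
        if (square.length() <= digits) {
            return Integer.parseInt(square) % tableSize;
        }
        int start = (square.length() - digits) / 2;
        return Integer.parseInt(square.substring(start, start + digits)) % tableSize;
    }

    // Multiplicative: multiply by a constant A with 0 < A < 1, keep the
    // fractional part, and scale it to an index in [0, tableSize).
    static int multiplicative(int key, int tableSize) {
        final double A = (Math.sqrt(5) - 1) / 2;   // assumed constant, about 0.618
        double product = key * A;
        double fraction = product - Math.floor(product);
        return (int) (fraction * tableSize);
    }

    public static void main(String[] args) {
        int key = 4567;
        System.out.println(division(key, 100));
        System.out.println(midSquare(key, 2, 100));
        System.out.println(multiplicative(key, 100));
    }
}
```

For example, 4567² = 20857489, and keeping the two middle digits gives 57.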
Reducing Clustering
Linear probing can cause significant clusters.
To reduce clustering, use double hashing.
Define two hash functions: hash1 and hash2
Use the hash1 function to determine the initial location of the key in the hash table.
If a collision occurs, use the hash2 function to determine how far to move ahead to look for a vacant location in the hash table.
Double Hashing Example
H1(k) = k mod 1231
H2(k) = 1 + k mod 1229
For key k = 2000: H1(k) = ______
If this location is occupied, H2(k) = ________
So we check position _________________ to see if it is occupied.
If it is, we check position ___________________, etc.
NOTE: The table size is relatively prime to the value returned by the second hash function hash2, so the probe sequence can reach every position in the table.
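The example can be computed mechanically. This sketch (hypothetical class `DoubleHashingDemo`) uses the slide's H1 and H2 with a table of 1231 positions, assumed from the modulus in H1. It starts at H1(k) and steps by H2(k) until a free position is found. 0 marks an empty slot, so keys are assumed positive.

```java
import java.util.Arrays;

class DoubleHashingDemo {
    static final int TABLE_SIZE = 1231;   // assumed from the modulus in H1

    static int hash1(int k) { return k % 1231; }        // H1(k) = k mod 1231
    static int hash2(int k) { return 1 + k % 1229; }    // H2(k) = 1 + k mod 1229

    // First `count` positions examined for key k:
    // H1(k), H1(k) + H2(k), H1(k) + 2*H2(k), ... all mod TABLE_SIZE.
    static int[] probeSequence(int k, int count) {
        int[] positions = new int[count];
        int pos = hash1(k);
        int step = hash2(k);
        for (int i = 0; i < count; i++) {
            positions[i] = pos;
            pos = (pos + step) % TABLE_SIZE;
        }
        return positions;
    }

    // Insert positive key k (0 marks an empty slot). The step (1..1229) is
    // relatively prime to 1231 (a prime), so TABLE_SIZE probes visit every
    // position once. Returns the position used, or -1 if the table is full.
    static int insert(int[] table, int k) {
        int pos = hash1(k);
        int step = hash2(k);
        for (int i = 0; i < TABLE_SIZE; i++) {
            if (table[pos] == 0) {
                table[pos] = k;
                return pos;
            }
            pos = (pos + step) % TABLE_SIZE;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probeSequence(2000, 3))); // prints [769, 310, 1082]
        int[] table = new int[TABLE_SIZE];
        insert(table, 769);                        // occupies position 769
        System.out.println(insert(table, 2000));   // prints 310
    }
}
```

For k = 2000: H1(2000) = 769, H2(2000) = 1 + 771 = 772, (769 + 772) mod 1231 = 310, and (310 + 772) mod 1231 = 1082.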
Chained Hashing
The maximum number of elements that can be stored in a hash table implemented using an array is the table size (α = ______).
We can have a load factor greater than ______ by using chained hashing.
Each array position in the hash table is a head reference to a linked list of keys.
All colliding keys that hash to an array position are inserted into that linked list.
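A sketch of chained hashing with int keys. It uses `java.util.LinkedList` for each chain instead of hand-built nodes, and the class name `ChainedHashDemo` is hypothetical. New keys go at the front of their chain, and the load factor is computed as number of elements / table size, so it can exceed 1.

```java
import java.util.LinkedList;

class ChainedHashDemo {
    private final LinkedList<Integer>[] buckets;   // one chain per table position
    private int manyItems;

    @SuppressWarnings("unchecked")
    ChainedHashDemo(int tableSize) {
        buckets = new LinkedList[tableSize];
        for (int i = 0; i < tableSize; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int hash(int k) {
        return Math.abs(k % buckets.length);
    }

    // Every key that hashes to a position goes into that position's chain.
    void insert(int k) {
        LinkedList<Integer> chain = buckets[hash(k)];
        if (!chain.contains(k)) {   // skip duplicate keys
            chain.addFirst(k);
            manyItems++;
        }
    }

    boolean contains(int k) {
        return buckets[hash(k)].contains(k);
    }

    // Number of elements / table size; can exceed 1 with chaining.
    double loadFactor() {
        return (double) manyItems / buckets.length;
    }

    LinkedList<Integer> chainAt(int position) {
        return buckets[position];
    }

    public static void main(String[] args) {
        ChainedHashDemo t = new ChainedHashDemo(5);
        for (int k : new int[] {3, 8, 13, 4, 9, 14, 1}) {
            t.insert(k);
        }
        System.out.println(t.chainAt(3));     // prints [13, 8, 3]
        System.out.println(t.loadFactor());   // prints 1.4
    }
}
```

With a 5-position table and seven keys, the load factor is 1.4, and keys 3, 8, and 13 share the chain at position 3.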
Chained Hashing: A Picture

Average Search Time
In a hash table with load factor α, the average number of table elements examined in a successful search is approximately:
_____________ using linear probing*
_____________ using double hashing*
_____________ using chained hashing
*assuming a non-full hash table with no removals
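The commonly cited approximations for these three cases are (1/2)(1 + 1/(1 - α)) for linear probing, -ln(1 - α)/α for double hashing, and 1 + α/2 for chained hashing. The sketch below (hypothetical class `AverageSearchTime`) evaluates them for a few load factors.

```java
class AverageSearchTime {
    // Approximate average number of table elements examined in a
    // successful search, as a function of the load factor alpha.
    static double linearProbing(double alpha) {
        return 0.5 * (1 + 1 / (1 - alpha));     // valid for 0 <= alpha < 1
    }

    static double doubleHashing(double alpha) {
        return -Math.log(1 - alpha) / alpha;    // valid for 0 < alpha < 1
    }

    static double chainedHashing(double alpha) {
        return 1 + alpha / 2;                   // alpha may exceed 1
    }

    public static void main(String[] args) {
        double[] alphas = {0.5, 0.75, 0.9};
        for (double a : alphas) {
            System.out.printf("alpha=%.2f linear=%.2f double=%.2f chained=%.2f%n",
                    a, linearProbing(a), doubleHashing(a), chainedHashing(a));
        }
    }
}
```

At α = 0.9 this gives about 5.5 elements for linear probing, 2.56 for double hashing, and 1.45 for chained hashing.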
Average number of table elements examined during a successful search as a function of the load factor α
