ISE 390 -- Programming Challenges -- Week 4 Notes This Weeks Goals: To discuss sorting and searching library functions To review the basic sorting algorithms selection sort, bubble sort, and insertion sort. To introduce this weeks problems on sorting and variations Topic for Discussion: Should we do the problems in groups instead of individually? Programming Ideas: The problems this week all involve using sorting and/or searching in one way or another. Be sure you know about the built-in sorting/searching libraries in your favorite programming language: The following useful functions appear in the C language standard library: #include void qsort(void *base, size_t nel, size_t width, int (*compar) (const void *, const void *)); DESCRIPTION The qsort() function is an implementation of the quick-sort algorithm. It sorts a table of data in place. The contents of the table are sorted in ascending order according to the user-supplied comparison function. The base argument points to the element at the base of the table. The nel argument is the number of elements in the table. The width argument specifies the size of each ele- ment in bytes. The compar argument is the name of the com- parison function, which is called with two arguments that point to the elements being compared. The function must return an integer less than, equal to, or greater than zero to indicate if the first argument is to be considered less than, equal to, or greater than the second argument. The contents of the table are sorted in ascending order according to the user supplied comparison function. EXAMPLES The following program sorts a simple array: static int intcompare(int *i, int *j) { if (*i > *j) return (1); if (*i < *j) return (-1); return (0); } main() { int a[10]; /* assume you have initialized the array */ qsort((char *) a, 10, sizeof(int), intcompare); for (i=0; i<10; i++) printf(" %d",a[i]); printf("\n"); } Question: I once found that qsort took forever in sorting a large array of 0's and 1's. What stupid thing did the implementor of qsort do and how could you fix it with the right comparison function? Note that qsort destroys the contents of the original array. If you need the original order, make a copy. NAME bsearch - binary search a sorted table SYNOPSIS #include void *bsearch(const void *key, const void *base, size_t nel, size_t size, int (*compar)(const void *, const void *)); DESCRIPTION bsearch() is a binary search routine generalized from Knuth (6.2.1) Algorithm B. It returns a pointer into a table (an array) indicating where a datum may be found or a null pointer if the datum cannot be found. The table must be previously sorted in increasing order according to a com- parison function pointed to by compar. key points to a datum instance to be sought in the table. base points to the element at the base of the table. nel is the number of elements in the table. size is the number of bytes in each element. The function pointed to by compar is called with two arguments that point to the elements being compared. The function must return an integer less than, equal to, or greater than 0 as accordingly the first argument is to be considered less than, equal to, or greater than the second. RETURN VALUES A null pointer is returned if the key cannot be found in the table. EXAMPLE struct node { /* these are stored in the table */ char *string; int length; }; /* routine to compare two nodes based on an */ /* alphabetical ordering of the string field */ static int node_compare(const void *node1, const void *node2) { return (strcmp( ((const struct node *)node1)->string, ((const struct node *)node2)->string)); } /* Inside main program... */ node_ptr = bsearch( &node, table, sizeof(table)/sizeof(struct node), sizeof(struct node), node_compare); if (node_ptr != NULL) { (void) printf("string = %20s, length = %d\n", node_ptr->string, node_ptr->length); } else { (void)printf("not found: %20s\n", node.string); } The C++ Standard Template Library (STL) has methods for doing sorting and searching, and more: void sort(RandomAccessIterator bg, RandomAccessIterator end) void sort(RandomAccessIterator bg, RandomAccessIterator end, BinaryPredicate op) The first of these assume that you will be using the comparison (e.g. <=) function defined for the class, the second of which enables sorting by multiple criteria. void stable_sort(RandomAccessIterator bg, RandomAccessIterator end) void stable_sort(RandomAccessIterator bg, RandomAccessIterator end, BinaryPredicate op) In stable sorting, keys of equal value are guaranteed to remain in the same relative order, which is useful if sorting by multiple criteria. void partial_sort(RandomAccessIterator bg, RandomAccessIterator sortend, RandomAccessIterator end) void partial_sort(RandomAccessIterator bg, RandomAccessIterator sortend, RandomAccessIterator end, BinaryPredicate op) only sort the elements up to sortend void nth_element(RandomAccessIterator bg, RandomAccessIterator nth, RandomAccessIterator end) void nth_element(RandomAccessIterator bg, RandomAccessIterator nth, RandomAccessIterator end, BinaryPredicate op) Extract the smallest elements, without necessarily sorting them. Other STL library functions of interest: merge set_union set_intersection set_difference reverse remove -- remove all occurrences of a value or elements satisfying a predicate unique -- remove consecutive duplicates Algorithm Theory Make sure you understand how the classical sorting algorithms work and what is interesting about them: Selection sort -- repeatedly find the largest element in the unsorted data, and put it in the next position in sorted order. SelectionSort(A[n]) For i = 1 to n do Sort[i] = Find-Minimum from A Delete-Minimum from A Bubble sort -- percolate elements forward through adjacent element swaps. No element is guaranteed to be in the proper position until the end Bubblesort (A[n]) for i := 1 to n - 1 do for j := 1 to n - i do if A[j] > A[j+1] then temp := A[j] A[j] := A[j + 1] A[j + i ] := A[j] Insertion sort -- repeatedly take the next unsorted element, and insert it into the sorted part of the array InsertionSort(A) A[0] = -\infty for i = 1 to n-1 do j=i while (A[j] > A[j-1]) do swap(A[j],A[j-1]) Assigned Problems: 120 -- Sort a stack of pancakes using spatula flips. Don't laugh, this is how Bill Gates got his start! 156 -- Identify which words in a list have a unique letter distribution, i.e. cannot be transformed into another word by permutation. 230 -- Maintain a dictionary of book titles under insertion/deletion, telling which book comes before/after other books. 299 -- Rearrange the cars in a train using the smallest number of pairwise swaps. 390 -- Identify the most frequently occurring substrings in a text to help cracking codes. Problems for Discussion ----------------------- Sorting by Reversals -- upper and lower bounds and applications. biology application in analyzing genomes, find the minimum reversal distance in genes occuring in two different species (mice and men) can you prove that at least f(n) reversals are needed in the worst case? how many always suffice? consider the case of fixed length reversals. Can you always sort with reversals of size 2? size 3, size 4, etc.? (think parity arguments to show you can't do it for certain sizes) Measures of presortedness -- runs and inversions