An element x is a majority element of a set S if the number of times it occurs is greater than |S|/2. Give an O(n) algorithm to test whether an unsorted array S of n elements has a majority element. Sorting the list and checking the median element yields a correct algorithm, but at O(n log n) it is too slow.
Observe that if I delete two occurrences of different elements from the set, I have not changed the majority element, since n is reduced by two while the count of the majority element is decreased by at most one.
Thus we can scan the set from left to right, keeping a count of how many unmatched copies of the current element we have seen. Each time we see an instance of a different element, we delete it together with one copy of the current element and continue. If we are left with copies of one element at the end, this is the only candidate for the majority element.
We must verify that this candidate is in fact a majority element, but that can be tested by counting in a second O(n) sweep over the data.
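The two sweeps described above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code; the function name `majority_element` is an assumption, and the input is assumed to be a list (it is scanned twice).

```python
def majority_element(items):
    """Return the majority element of items, or None if there is none."""
    candidate = None
    count = 0  # unmatched copies of candidate seen so far
    # First sweep: pair off occurrences of different elements.
    for x in items:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1  # x cancels one stored copy of candidate
    # Second sweep: check that the surviving candidate really is a majority.
    if count > 0 and sum(1 for x in items if x == candidate) > len(items) / 2:
        return candidate
    return None
```

For example, `majority_element([1, 2, 1, 1, 3, 1, 1])` returns 1, while `majority_element([1, 2, 3])` returns `None` because the surviving candidate fails the verification sweep.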
Combinatorial Optimization
In most of the algorithmic problems we study, we seek to find the best answer as quickly as possible.
Traditional algorithmic methods fail when (1) the problem is provably hard, or (2) the problem is not clean enough to lead to a nice formulation.
In most problems, there is a natural way to (1) construct possible solutions and (2) measure how good a given solution is, but it is not clear how to find the best solution short of searching all configurations.
Heuristic methods like simulated annealing give us a general approach to search for good solutions.
[Figure: unconstrained optimization]
Simulated Annealing
The inspiration for simulated annealing comes from cooling molten materials down to solids. To end up with the globally lowest energy state you must cool slowly so things cool evenly.
In thermodynamic theory, the likelihood of a particular particle jumping to a higher energy state is given by:
$$e^{(E_i - E_j)/(k_B T)}$$
where $E_i$, $E_j$ denote the before/after energy states, $k_B$ is the Boltzmann constant, and $T$ is the temperature.
Since minimizing energy is a combinatorial optimization problem, we can mimic the physics for computing.
Simulated-Annealing()
    Create initial solution $S$
    Initialize temperature $t$
    repeat
        for i = 1 to iteration-length do
            Generate a random transition from $S$ to $S_i$
            If ($C(S) \geq C(S_i)$) then $S = S_i$
            else if ($e^{(C(S) - C(S_i))/(c \cdot t)} > random[0,1)$) then $S = S_i$
        Reduce temperature $t$
    until (no change in $C(S)$)
    Return $S$
Components of Simulated Annealing
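There are three components to any simulated annealing algorithm for combinatorial search:
Concise problem representation - a representation of the solution space together with an easily computable cost function C() measuring the quality of a given solution.
Transition mechanism between solutions - simple moves that slightly modify the current solution.
Cooling schedule - the parameters governing how likely we are to accept a bad transition, a likelihood that should decrease over time. A typical criterion is to accept any transition from $S$ to $S_i$ that does not increase the cost, and to accept a cost-increasing transition whenever $e^{(C(S) - C(S_i))/(c \cdot t)} > r$, where $r$ is a random number with $0 \leq r < 1$. The constant $c$ normalizes this cost function, so that almost all transitions are accepted at the starting temperature.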
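The annealing loop and acceptance rule can be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the geometric cooling factor `cooling`, the safety bound `max_rounds`, and the callback names `cost` and `random_neighbor` are illustrative choices not given in the notes, and costs are minimized.

```python
import math
import random

def simulated_annealing(initial, cost, random_neighbor, t=1.0, cooling=0.95,
                        iteration_length=100, c=1.0, max_rounds=1000, rng=None):
    """Minimize cost(s), starting from initial, by simulated annealing."""
    rng = rng or random.Random()
    s, s_cost = initial, cost(initial)
    for _ in range(max_rounds):           # assumed bound so the loop always ends
        cost_at_start = s_cost
        for _ in range(iteration_length):
            candidate = random_neighbor(s, rng)
            cand_cost = cost(candidate)
            if cand_cost <= s_cost:
                # Never reject a transition that does not raise the cost.
                s, s_cost = candidate, cand_cost
            elif math.exp((s_cost - cand_cost) / (c * t)) > rng.random():
                # Accept a worse solution with temperature-dependent probability.
                s, s_cost = candidate, cand_cost
        t *= cooling                      # reduce temperature (assumed geometric)
        if s_cost == cost_at_start:       # no change in C(S) over a full pass
            break
    return s
```

As a toy usage, `simulated_annealing(10, lambda x: (x - 3) ** 2, lambda x, rng: x + rng.choice((-1, 1)), t=1e-9)` behaves like pure descent, because at such a low temperature essentially no uphill move is accepted. Raising `t` lets the search wander more before it settles.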
We provide several examples to demonstrate how these components can lead to elegant simulated annealing algorithms for real combinatorial search problems.
Traveling Salesman Problem
Solution space - set of all (n-1)! circular permutations.
Cost function - sum up the costs of the edges defined by S.
Transition mechanism - The most obvious transition mechanism would be to swap the current tour positions of a random pair of vertices $v_i$ and $v_j$. This changes up to eight edges on the tour, deleting the edges currently adjacent to both $v_i$ and $v_j$, and adding their replacements. Better would be to swap two edges on the tour with two others that replace them.
[Figure: TSP edge swap]
Since only four edges change in the tour, the transitions can be performed and evaluated faster. Faster transitions mean that we can evaluate more positions in the given amount of time.
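To illustrate why the edge swap is cheap to evaluate, here is a small Python sketch that computes the change in tour length from the four affected edges alone. It assumes Euclidean points and a tour stored as a list of point indices; the names `two_opt_delta` and `apply_two_opt` are hypothetical, and the swap is implemented as the usual segment reversal.

```python
import math

def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]])
               for i in range(n))

def two_opt_delta(points, tour, i, j):
    """Change in length if edges (tour[i], tour[i+1]) and (tour[j], tour[j+1])
    are replaced by (tour[i], tour[j]) and (tour[i+1], tour[j+1]); i < j."""
    n = len(tour)
    a, b = points[tour[i]], points[tour[i + 1]]
    c, d = points[tour[j]], points[tour[(j + 1) % n]]
    return (math.dist(a, c) + math.dist(b, d)) - (math.dist(a, b) + math.dist(c, d))

def apply_two_opt(tour, i, j):
    """Return the tour with the segment tour[i+1..j] reversed."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
```

Only four distances are computed per candidate move, regardless of the number of cities, whereas recomputing `tour_length` from scratch costs time proportional to n.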
In practice, problem-specific heuristics for TSP outperform simulated annealing, but the simulated annealing solution works admirably, considering it uses very little knowledge about the problem.
Maximum Cut
Given a weighted graph, partition the vertices to maximize the weight of the edges cut.
[Figure: maximum cut]
This NP-complete problem arises in circuit design applications.
Solution space - set of all vertex partitions, represented as a bit string.
Cost function - the weight of the edges which are cut.
Transition mechanism - move one vertex across the partition.
This kind of annealing procedure, built on single-vertex moves, seems to be the right way to do max cut in practice.
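Moving one vertex across the partition affects only the edges incident to that vertex, so the change in cut weight can be computed locally. The following Python sketch shows this, assuming the graph is given as a list of `(u, v, weight)` triples and the partition as a list of 0/1 side labels; the function names are illustrative.

```python
def build_adjacency(n, edges):
    """Adjacency lists of (neighbor, weight) pairs for an undirected graph."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj

def cut_weight(edges, side):
    """Total weight of edges whose endpoints lie on opposite sides."""
    return sum(w for u, v, w in edges if side[u] != side[v])

def flip_gain(adj, side, v):
    """Change in cut weight if vertex v moves across the partition."""
    gain = 0
    for u, w in adj[v]:
        # An uncut edge becomes cut (+w); a cut edge becomes uncut (-w).
        gain += w if side[u] == side[v] else -w
    return gain
```

Since this problem maximizes the cut, an annealing loop written for minimization would use the negated cut weight as its cost.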
Independent Set
An independent set of a graph G is a subset of vertices S such that there is no edge with both endpoints in S. The maximum independent set of a graph is the largest such empty induced subgraph.
[Figure: independent set]
Solution space - set of all vertex subsets, represented as a bit string.
Cost function - $C(S) = |S| - \lambda \cdot m_S / T$, where $\lambda$ is a constant, $T$ is the temperature, and $m_S$ is the number of edges in the subgraph induced by $S$.
The dependence of C(S) on T ensures that the search will drive the edges out faster as the system cools.
Transition mechanism - move one vertex in/out of the subset.
More flexibility in the search space and quicker computations result from allowing non-empty graphs at the early stages of the cooling.
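As a rough illustration of how a temperature-dependent penalty behaves, the Python sketch below assumes a cost of the form $C(S) = |S| - \lambda \cdot m_S / T$, with the subset given as a 0/1 list and the graph as a list of edges. The function name and the sample values of $\lambda$ and $T$ are assumptions chosen for the example.

```python
def independent_set_cost(edges, in_set, lam, temperature):
    """Subset size minus a penalty for induced edges that grows as T falls."""
    size = sum(in_set)
    # m_S: edges with both endpoints inside the chosen subset.
    induced_edges = sum(1 for u, v in edges if in_set[u] and in_set[v])
    return size - lam * induced_edges / temperature

# Path 0 - 1 - 2: taking all three vertices induces two edges.
path = [(0, 1), (1, 2)]
print(independent_set_cost(path, [1, 1, 1], lam=1.0, temperature=2.0))  # 2.0
print(independent_set_cost(path, [1, 1, 1], lam=1.0, temperature=0.5))  # -1.0
print(independent_set_cost(path, [1, 0, 1], lam=1.0, temperature=0.5))  # 2.0
```

At the higher temperature the infeasible subset scores as well as the true independent set {0, 2}, but once the system cools the induced edges dominate and the feasible set wins.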
Chromatic Number
What is the smallest number of colors needed to color vertices such that no edge links two vertices of the same color?
[Figure: vertex coloring]
The solution is complicated by the fact that potentially many vertices have to change color to reduce the number of colors used by one.
To ensure that the proposed colorings are biased in favor of low cardinality subsets (i.e. 28 red, 1 blue, and 1 green is better than 10 red, 10 blue, and 10 green), we will make certain colors more expensive than others.
By weighting the colors (e.g. 100, 99, 97, 93, 85, 69, 37) we get faster convergence, although certain configurations might be cheaper than ones achieving the chromatic number! Guaranteeing that the cheapest configuration achieves the chromatic number requires a more complicated scheme.
By Brooks' Theorem, every graph can be colored with $\Delta + 1$ colors. In fact $\Delta$ colors suffice unless G is complete or an odd cycle.
Solution space - all possible partitions of vertices into $\Delta + 1$ color classes, where $\Delta$ is the maximum vertex degree.
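Cost function - $C(S) = \sum_{i} \left( w_i |V_i| + \lambda |E_i| \right)$, where $w_i$ is the weight of color $i$, $V_i$ is the set of vertices given color $i$, $E_i$ is the set of edges with both endpoints in $V_i$, and $\lambda$ is the penalty constant.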
Transition mechanism - randomly move one vertex to another subset.
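The effect of weighting the colors can be explored with a small Python sketch. Everything about the cost's exact form here is an assumption made for illustration: each vertex pays the weight of its color, each edge joining two vertices of the same color pays a penalty `lam`, and the weights from the text are sorted so that the cheapest color comes first.

```python
def coloring_cost(edges, color, weights, lam):
    """Assumed cost: per-vertex color weights plus a penalty per conflict."""
    color_cost = sum(weights[c] for c in color)
    # A conflict is an edge whose two endpoints share a color.
    conflicts = sum(1 for u, v in edges if color[u] == color[v])
    return color_cost + lam * conflicts

weights = sorted([100, 99, 97, 93, 85, 69, 37])  # cheapest color is index 0

skewed = [0] * 28 + [1] + [2]              # 28 / 1 / 1 split of 30 vertices
balanced = [0] * 10 + [1] * 10 + [2] * 10  # 10 / 10 / 10 split
print(coloring_cost([], skewed, weights, lam=1000))    # 1190
print(coloring_cost([], balanced, weights, lam=1000))  # 1910
```

Under these assumed weights the lopsided split is cheaper than the balanced one, which is the bias the text asks for. The same arithmetic also shows the caveat: a lopsided coloring with more colors can cost less than a balanced coloring with fewer.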
Circuit Board Placement
In designing printed circuit boards, we are faced with the problem of positioning modules (typically integrated circuits) on the board.
Desired criteria in a layout include (1) minimizing the area or aspect ratio of the board, so that it properly fits within the allotted space, and (2) minimizing the total or longest wire length in connecting the components.
Circuit board placement is an example of the kind of messy, multicriterion optimization problems for which simulated annealing is ideally suited.
We are given a collection of rectangular modules $r_1, \ldots, r_n$, each with associated dimensions $h_i \times l_i$. For each pair of modules $r_i, r_j$, we are given the number of wires $w_{ij}$ that must connect the two modules. We seek a placement of the rectangles that minimizes area and wire-length, subject to the constraint that no two rectangles overlap each other.
Solution space - The positions of each rectangle. To provide a discrete representation, the rectangles can be restricted to lie on vertices of an integer grid.
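Cost function - A natural cost function would be
$$C(S) = \lambda_{area} (S_{height} \cdot S_{width}) + \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \lambda_{wire} \cdot w_{ij} \cdot d_{ij} + \lambda_{overlap} (r_i \cap r_j) \right)$$
where $\lambda_{area}$, $\lambda_{wire}$, and $\lambda_{overlap}$ are constants governing the impact of these components on the cost function. Here $S_{height}$ and $S_{width}$ are the dimensions of the layout, $w_{ij}$ is the number of wires between modules $r_i$ and $r_j$, $d_{ij}$ is the distance between them, and $r_i \cap r_j$ is the area of their overlap.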
Transition mechanism - moving one rectangle to a different location, or swapping the position of two rectangles.
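A cost function combining area, wire length, and overlap might be sketched in Python as below. Several details are assumptions for the sake of a runnable example: rectangles are `(x, y, width, height)` tuples, distance is measured between rectangle centers in the Manhattan metric, wire counts live in a dictionary keyed by index pairs `(i, j)` with `i < j`, and each unordered pair is counted once.

```python
from itertools import combinations

def overlap_area(a, b):
    """Area of the intersection of two (x, y, width, height) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    return dx * dy if dx > 0 and dy > 0 else 0

def placement_cost(rects, wires, lam_area, lam_wire, lam_overlap):
    """Weighted sum of bounding-box area, wire length, and overlap."""
    width = max(x + w for x, y, w, h in rects) - min(x for x, y, w, h in rects)
    height = max(y + h for x, y, w, h in rects) - min(y for x, y, w, h in rects)
    cost = lam_area * width * height
    for i, j in combinations(range(len(rects)), 2):
        (xi, yi, wi, hi), (xj, yj, wj, hj) = rects[i], rects[j]
        # Manhattan distance between the two rectangle centers.
        d = abs((xi + wi / 2) - (xj + wj / 2)) + abs((yi + hi / 2) - (yj + hj / 2))
        cost += lam_wire * wires.get((i, j), 0) * d
        cost += lam_overlap * overlap_area(rects[i], rects[j])
    return cost
```

Treating overlap as a penalty rather than a hard constraint lets the search pass through overlapping layouts early on. A large `lam_overlap` then pushes the final placement toward a legal one.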
Lessons from the Backtracking contest
Winning Optimizations