The Algorithm Design Manual
About the Book
Programming Challenges

The Stony Brook Algorithm Repository

Steven Skiena
Stony Brook University
Dept. of Computer Science

1.7.9 Shortest Common Superstring

Problem Input | Problem Output

INPUT                    OUTPUT


Input Description: A set of strings s_1, ..., s_m.

Problem: Find the shortest string S which contains each s_i as a substring of S.

Excerpt from The Algorithm Design Manual: Shortest common superstring arises in a variety of applications, including sparse matrix compression. Suppose we have an n X m matrix with most of the elements being zero. We can partition each row into m/k runs of k elements each and construct the shortest common superstring S' of these runs. We now have reduced the problem to storing the superstring, plus an n X m/k array of pointers into the superstring denoting where each of the runs starts. Accessing a particular element M[i,j] still takes constant time, but there is a space savings when |S| << mn.


Implementations

  • CAP -- Contig Assembly Program (C) (rating 9)
  • Celera Assembler (C) (rating 8)

  • Recommended Books

    Algorithms on Strings, Trees, and Sequences by Dan Gusfield Introduction to Computational Biology by M. Waterman

    Related Problems


      
    Longest Common Substring
      
    Suffix Trees and Arrays
      
    Text Compression



    This page last modified on 2008-07-10 .
    www.algorist.com