=================================================================== String Algorithmics for Fun and Profit Steven Skiena Department of Computer Science State University of New York Stony Brook, NY 11794-4400 skiena@cs.sunysb.edu The demand for efficient algorithms to manipulate large text strings has exploded because of biological applications (i.e. the Human Genome project) and large text databases (i.e. Internet search engines). In this talk, I'll discuss some of our recent research in string algorithms and computational biology. Although the basic tools and paradigms of algorithm analysis are theoretical, its problem-solving approach lends itself naturally to applied problems. Beginning graduate students should have no difficulty following this talk. Algorithms for Fast Fragment Assembly ------------------------------------- We have been working with biologists at Brookhaven National Laboratory to sequence the genome of the bacteria which causes Lyme Disease. Our fragment assembler, STROLL, is based on sophisticated string algorithms and data structures. STROLL assembled the entire Borrelia genome five times faster than the most popular competing assembly program, and found a large repeat region which other assemblers missed. (with Ting Chen) Matching in Run-Length Encoded Strings -------------------------------------- Measuring the similarity of two strings (approximate string matching) is a fundamental problem in computational biology. Strings arising in graphics and image processing applications typically contain long runs of identical symbols, and are best represented in compressed form. We have developed the first algorithm for efficiently matching run-length encoded strings, with applications to graphics morphing and OCR systems. (with Alberto Apostolico and Gad Landau) File Renaming for CD-ROMs ------------------------- My forthcoming book ``The Algorithm Design Manual'' is accompanied by a platform-independent, multimedia CD-ROM. Conforming to the ISO-9660 file system standard for CD-ROMs restricts all file names to eight characters, with a three character extension, all upper-case. Moving general file systems over to ISO-9660 requires shortening the names without causing name collisions, while preserving as much meaning as possible. We have developed a slick algorithm to find optimal collision-free renamings, and implemented it in a tool to simplify the production of CD-ROMs. (with Ricky Bradley and Dario Vlah)