Homework 2
Due Thursday, November 3, 2005
Type and print your answer, which is appreciated, or handwrite clearly and neatly. Each of the problems should be solved on a separate sheet of paper to facilitate grading. Limit the solution of each problem to one sheet of paper unless otherwise stated.
Many of these problems are deliberately open-ended and vague; do the best you can on them but don't make yourself crazy. Please don't wait until the last minute to look at the problems.
You must do this assignment in groups of 2 to 3 people.
Redo the experiments with more restrictive search parameters such that the original sequence is reported as the only match. If this is not possible, explain why! Experiment to determine the percentage of base changes where it starts to be unlikely the original sequence is the best match.
Write a 2-3 page paper describing your experiences. You may
instead follow the assignment presented in
http://www.tigr.org/
salzberg/hw2.html if you prefer a more
constrained exercise.
A simple global alignment program written in C will be available on the course webpage.
Those who want a more macho experience can try to support affine gap
costs (i.e. the cost of a gap of length is
for constants
and
) and/or linear space.
However, these are not required for the assignment.
Analyze one or more (1) viral genomes, (2) bacterial genomes, and (3) human sequences (you do not need to do the full genome) looking for long open reading frames and report what you find. What is the size distribution among the long (say > 100 codon) ORFs you find, and how does that compare to random sequences.
Write a 2 page paper on your experiences with such simple gene recognition techniques. Attach printouts of your programs.
Give a polynomial-time algorithm for finding the minimum-length palindrome, with a time analysis.