These notes are taken from the presentation slides for CSE 648, November 14, 2000, by Vinhthuy Phan (phan@cs.sunysb.edu) and Brian Bowen (bbowen@cs.sunysb.edu).
Protein folding programs, CASP-3
Introduction
· Methods for obtaining information about protein structure from the amino acid sequence have apparently been advancing rapidly. But just what can these methods currently deliver?
· The Protein Structure Prediction Center has been running an ongoing experiment to answer this question. The first round, CASP-1 (Critical Assessment of techniques for protein structure prediction), was held in 1994. Since then there have been CASP-2 (1996), CASP-3 (1998), and CASP-4 (2000).
A) CASP-3: The primary focus was on the evaluation of three-dimensional models and secondary structure predictions.
B) CASP-3 general goals:
· Are the models produced similar to the corresponding experimental structures?
· Are the models correctly aligned with the experimental structures?
· Have similar structures on which a model can be based been identified?
· Which methods are most effective?
· Where can future effort be most productively focused?
C) What about CASP-4?
· Similar to CASP-3, with additional experiments.
· A meeting to evaluate the results will be held December 3-7, 2000.
CASP-3
· Prediction targets were chosen with the help of the experimental community: the list of target proteins was compiled from information supplied by protein crystallographers and NMR spectroscopists about structures they were planning to make public soon.
· How many participants? How many targets? How many predictions?
· Predictions were made using three approaches:
A) Comparative modeling:
Uses the similarity between a target's protein sequence and another sequence of known structure to predict the target's structure. (A threshold is applied in the similarity test.)
B) Threading:
Based on potentials developed for fold assignment, and on hidden Markov models (HMMs) or profile methods that descended from sequence alignment methods.
C) Ab initio:
Based on the laws of physics, and increasingly relying on the knowledge base of known structures, to calculate how the forces between atoms determine their arrangement.
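The comparative modeling decision can be illustrated with a minimal sketch, assuming a simple global alignment (Needleman-Wunsch with toy match/mismatch/gap scores) and percent identity over gap-free aligned columns as the similarity measure. The notes do not give the actual threshold, so the 30% used here is hypothetical, as are the names `global_align`, `percent_identity`, and `choose_template`.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with toy linear scores (hypothetical values)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical residues as a percentage of aligned, gap-free columns."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def choose_template(target, templates, threshold=30.0):
    """Return (name, identity) of the most similar template at or above threshold, else None."""
    best = None
    for name, seq in templates.items():
        pid = percent_identity(*global_align(target, seq))
        if pid >= threshold and (best is None or pid > best[1]):
            best = (name, pid)
    return best


print(choose_template("MKTAYIAKQR", {"t1": "MKTAYLAKQR", "t2": "GGGGGGGGGG"}))
# → ('t1', 90.0)
```

Identity here is divided by the number of gap-free columns; other conventions divide by the length of the shorter sequence instead, which changes where a given threshold falls.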
CASP-3
· Summary of Target Difficulty
The slide's graph plots two measures of similarity between CASP-3 targets and their structural neighbors, as defined by VAST.
Q: What is VAST?
A: VAST (Vector Alignment Search Tool) is NCBI's structure-comparison service. Given a particular 3D structure stored in MMDB (NCBI's Molecular Modeling Database), VAST retrieves related structures that are also stored there.
¨ The y-axis is the fraction of the CASP-3 target protein used in the structure superpositions with its VAST neighbors.
¨ The x-axis gives the agreement between the sequences of the residues matched in the (sequence-independent) structure superpositions calculated by VAST.
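As a rough illustration of the two axes, the sketch below computes one plotted point from a single structure superposition. It assumes the superposition is available as a hypothetical list of `(target_index, neighbor_index)` residue pairs; how VAST actually reports its alignments is not described in these notes, and `plot_point` is an illustrative name.

```python
def plot_point(target_seq, neighbor_seq, aligned_pairs):
    """Return (x, y) for one target/neighbor superposition.

    aligned_pairs: hypothetical list of (target_index, neighbor_index) residue
    pairs matched by the structure superposition (found without using sequence).
    x: fraction of superposed pairs whose residues are identical.
    y: fraction of the target's residues that take part in the superposition.
    """
    if not target_seq:
        raise ValueError("empty target sequence")
    y = len(aligned_pairs) / len(target_seq)
    if not aligned_pairs:
        return 0.0, y
    identical = sum(target_seq[i] == neighbor_seq[j] for i, j in aligned_pairs)
    return identical / len(aligned_pairs), y


print(plot_point("ACDEFGHIKL", "ACDWYGH", [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]))
# → (0.6, 0.5)
```

A target near the top left (large y, small x) has a structural neighbor covering most of it but with little sequence agreement, which is the situation sequence-based comparative modeling finds hardest to detect.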
Square boxes around a prediction problem indicate that at least one group made a prediction that passed an accuracy threshold: at least 20% of the prediction must be based on the correct fold, and at least 50% of the model must be aligned accurately.
Note: the line on the graph separates prediction problems that can be solved from those that cannot.
Ab Initio
· "ROSETTA"
Kim T. Simons (1), Rich Bonneau (1), Ingo Ruczinski (2), and David Baker (1)
(1) Department of Biochemistry, University of Washington, Seattle, Washington
(2) Department of Statistics, University of Washington, Seattle, Washington
¨ The program assembles structures from fragments of known structures whose local sequences are similar to the target's, using simulated annealing and knowledge-based scoring functions.
¨ Simulated annealing searches for the global energy minimum of a scoring function that takes into account sequence-dependent terms representing hydrophobic burial and specific pair interactions such as electrostatics and disulfide bonding, and sequence-independent terms representing hard-sphere packing, alpha-helix and beta-strand packing, and the assembly of beta strands into beta sheets.
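The annealing loop can be sketched as below, assuming a toy model in which a conformation is a string of secondary-structure states (H, E, L), each move replaces a 3-residue window with a fragment from a per-position library, and moves are accepted with the Metropolis criterion under a geometric cooling schedule. The toy energy (mismatches against a hypothetical "native" string) only stands in for ROSETTA's knowledge-based terms; all names and parameter values are hypothetical.

```python
import math
import random


def anneal(initial, library, energy, k=3, steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Fragment-insertion simulated annealing (toy sketch).

    library[i] lists candidate fragments of length k for the window starting at i.
    Returns the lowest-energy conformation seen and its energy.
    """
    rng = random.Random(seed)
    current = list(initial)
    e_cur = energy(current)
    best, e_best = current[:], e_cur
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        start = rng.randrange(len(current) - k + 1)
        fragment = rng.choice(library[start])
        trial = current[:start] + list(fragment) + current[start + k:]
        e_trial = energy(trial)
        delta = e_trial - e_cur
        # Metropolis criterion: always accept downhill moves, and uphill
        # moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = current[:], e_cur
    return best, e_best


# Hypothetical toy problem: distance to a "native" state string replaces the
# real scoring function (burial, pair interactions, packing, sheet terms).
TARGET = "LHHHHHLLEEEELLEEEEL"


def toy_energy(conf):
    return sum(a != b for a, b in zip(conf, TARGET))


lib_rng = random.Random(1)
K = 3
library = []
for i in range(len(TARGET) - K + 1):
    frags = ["".join(lib_rng.choice("HEL") for _ in range(K)) for _ in range(4)]
    frags.append(TARGET[i:i + K])  # one native-like fragment per window
    library.append(frags)

initial_conf = ["L"] * len(TARGET)
best, e_best = anneal(initial_conf, library, toy_energy, k=K)
print("".join(best), e_best)
```

Accepting some uphill moves while the temperature is high is what lets the search escape local minima; as the temperature falls, the loop behaves more and more like a greedy descent.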
· Prediction results include:
1) A 99-residue segment of MarA with an RMSD of 6.4 Å to the native structure
2) A 95-residue (full-length) prediction of the EH2 domain of EPS15 with an RMSD of 6.0 Å
3) A 75-residue segment of a DNA helicase with an RMSD of 4.7 Å
4) A 67-residue segment of ribosomal protein L30 with an RMSD of 3.8 Å
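RMSD to the native structure is normally measured after optimally superimposing the model on it. A minimal pure-Python sketch of that calculation follows, using Horn's quaternion method: the largest eigenvalue of a 4x4 matrix built from the cross-covariance of the centered coordinates gives the best overlap achievable by a proper rotation. It assumes both inputs are equal-length lists of matched (x, y, z) coordinates in Å (for example, alpha carbons); which atoms CASP-3 actually compared is not stated in these notes, and the function names are hypothetical.

```python
import math


def _centered(coords):
    """Translate a coordinate list so its centroid sits at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def _jacobi_eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def superposed_rmsd(model, native):
    """RMSD between matched coordinates after optimal translation and rotation."""
    if len(model) != len(native) or not model:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    a, b = _centered(model), _centered(native)
    # 3x3 cross-covariance: S[i][j] = sum over atoms of a_i * b_j
    S = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    K = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = max(_jacobi_eigenvalues(K))
    ga = sum(x * x + y * y + z * z for x, y, z in a)
    gb = sum(x * x + y * y + z * z for x, y, z in b)
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / len(a)))


native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# The same points rotated 90 degrees about z and then translated.
model = [(-y + 5.0, x - 2.0, z + 3.0) for x, y, z in native]
mirror = [(x, y, -z) for x, y, z in native]
print(superposed_rmsd(model, native), superposed_rmsd(mirror, native))
```

Because only proper rotations are allowed, a mirror image cannot be superimposed on the original and keeps a nonzero RMSD (0.5 Å for the four points above). Kabsch's SVD-based algorithm with the usual reflection correction gives the same value when a linear-algebra library is available.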
Ab Initio
· Harold Scheraga's group, Cornell University
¨ To shorten the computation, they started with a simplified representation of the amino acid chain, taking advantage of the fact that every amino acid has the same backbone, a regularly repeating string of one nitrogen and two carbon atoms. Their program first ignores the nitrogen and one of the carbon atoms, leaving the central (alpha) carbon attached to its side chain.
¨ The program generates rough structures, which serve as starting points for computations that consider all the atoms and their associated forces.
¨ The MarA structure prediction took 100 hours on 64 parallel processors of an IBM SP2 supercomputer.
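The simplification step can be sketched as follows, assuming each residue is given as a dictionary from PDB-style atom names (`N`, `CA`, `C`, `O`, plus side-chain atoms) to coordinates, with hydrogens omitted. Dropping the carbonyl oxygen along with the nitrogen and carbonyl carbon, and collapsing the side chain to a single centroid, are assumptions made for illustration, not a description of the Scheraga group's actual model; the function names are hypothetical.

```python
# Backbone atoms dropped from each residue. The notes mention the nitrogen and
# one carbon; also dropping the carbonyl oxygen "O" is an assumption here.
DROPPED = {"N", "C", "O"}


def reduce_residue(atoms):
    """Return (alpha-carbon coordinate, side-chain centroid or None) for one residue.

    atoms: dict mapping PDB-style atom names to (x, y, z); hydrogens assumed absent.
    """
    ca = atoms["CA"]
    side_chain = [xyz for name, xyz in atoms.items() if name not in DROPPED and name != "CA"]
    if not side_chain:  # glycine has no heavy side-chain atoms
        return ca, None
    n = len(side_chain)
    # Hypothetical simplification: represent the side chain by its centroid.
    centroid = tuple(sum(p[i] for p in side_chain) / n for i in range(3))
    return ca, centroid


def reduce_chain(residues):
    """Apply the simplified representation to every residue in a chain."""
    return [reduce_residue(r) for r in residues]


ala = {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
       "O": (3.2, 1.6, 0.0), "CB": (1.9, -0.8, 1.2)}
gly = {"N": (3.0, 2.5, 0.0), "CA": (4.4, 2.8, 0.0), "C": (5.0, 4.2, 0.0),
       "O": (6.2, 4.4, 0.0)}
print(reduce_chain([ala, gly]))
# → [((1.5, 0.0, 0.0), (1.9, -0.8, 1.2)), ((4.4, 2.8, 0.0), None)]
```

The reduced chain has at most two points per residue instead of the full atom set, which is what makes the rough-structure search cheap enough to run before the all-atom refinement.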
· Results include:
1) HDEA