These notes are taken from the presentation slides for CSE 648 (November 14, 2000) by Vinhthuy Phan (phan@cs.sunysb.edu) and Brian Bowen (bbowen@cs.sunysb.edu).

Protein Folding Programs, CASP-3

Introduction

 

· Methods for obtaining information about protein structure from the amino acid sequence have apparently been advancing rapidly. But just what can these methods currently deliver?

 

· The Protein Structure Prediction Center has been trying to answer this question through an ongoing series of experiments called CASP (Critical Assessment of techniques for protein Structure Prediction). The first, CASP-1, was held in 1994. It was followed by CASP-2 (1996), CASP-3 (1998) and CASP-4 (2000).

 

A) CASP-3: The primary focus was on evaluating three-dimensional models and secondary structure predictions.

 

B) CASP-3 General Goals:

· Are the models produced similar to the corresponding experimental structure?

· Are the models correctly aligned with the experimental structures?

· Have similar known structures, on which a model could be based, been identified?

· What methods are most effective?

· Where can future effort be most productively focused?

 

C) What about CASP-4?

· Similar to CASP-3, with additional experiments

· The meeting to evaluate the results will be held December 3-7, 2000

CASP-3

 

· Prediction targets were chosen with the help of the experimental community. The list of target proteins was compiled from information supplied by protein crystallographers and NMR spectroscopists about structures they planned to make public soon.

 

· How many participants? How many targets? How many predictions?

          --CASP-3 in numbers

 

·        Predictions were made using three approaches:

 

          A) Comparative Modeling:

Uses the similarity between the target's protein sequence and another sequence of known structure to predict the target structure. (A threshold is applied in the similarity tests.)
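A minimal sketch of the kind of similarity test mentioned above follows. It computes percent identity over an existing pairwise alignment, skipping gap columns marked `-`, and compares the result with a cutoff. The function names and the 30% cutoff are hypothetical and chosen only for illustration. The notes do not say which similarity measure or threshold CASP-3 predictors used, and the sketch does not show how the alignment itself is produced.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent of aligned columns (no gap in either sequence) with identical residues."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    columns = matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # gap columns do not count toward identity
        columns += 1
        if a == b:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def usable_template(aligned_target, aligned_template, threshold=30.0):
    """Hypothetical rule: accept the template if identity reaches the threshold (%)."""
    return percent_identity(aligned_target, aligned_template) >= threshold


if __name__ == "__main__":
    target = "MKV-LLAGT"     # hypothetical aligned target
    template = "MKVALIAG-"   # hypothetical aligned template
    print(round(percent_identity(target, template), 1), usable_template(target, template))
```

In this example, 6 of the 7 gap-free columns match, so identity is about 85.7% and the template passes the illustrative cutoff.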

 

          B) Threading:

Based on developing potentials for fold assignment, and on Hidden Markov Models (HMMs) or profile methods descended from sequence alignment methods.
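The fold-assignment side of threading can be illustrated with a toy example. The sketch below assumes each template fold has been reduced to a list of residue-residue contacts. It places the target sequence onto each template without gaps, sums a made-up hydrophobic contact potential, and takes the lowest-scoring template as the assigned fold. The residue classes, potential values, and function names are all hypothetical. Real threading uses statistically derived potentials and allows gaps, and HMM and profile methods are not shown here.

```python
# Hypothetical, simplified residue classes and contact potential.
HYDROPHOBIC = set("AVLIMFWC")


def toy_pair_potential(a, b):
    """Made-up potential: favor hydrophobic-hydrophobic contacts, mildly
    penalize polar-polar contacts, neutral otherwise."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return -1.0
    if a not in HYDROPHOBIC and b not in HYDROPHOBIC:
        return 0.5
    return 0.0


def threading_score(sequence, contacts):
    """Score the sequence threaded onto a template without gaps: template
    position i holds sequence[i]; sum the potential over template contacts."""
    return sum(toy_pair_potential(sequence[i], sequence[j]) for i, j in contacts)


def assign_fold(sequence, templates):
    """Return the name of the template whose contacts give the lowest score."""
    return min(templates, key=lambda name: threading_score(sequence, templates[name]))


if __name__ == "__main__":
    # Two hypothetical six-residue template folds, given as contact lists.
    templates = {
        "fold_A": [(0, 2), (0, 5), (2, 5)],
        "fold_B": [(1, 3), (1, 4), (3, 4)],
    }
    seq = "LKVEAL"
    for name, contacts in templates.items():
        print(name, threading_score(seq, contacts))
    print("assigned:", assign_fold(seq, templates))
```

Here `fold_A` brings the three hydrophobic residues (L, V, L) into contact, so it scores lower and is assigned.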

 

          C) Ab initio:

Based on the laws of physics: these methods calculate how the forces between atoms determine their arrangement, while increasingly also relying on knowledge derived from the base of known structures.
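To make "forces between atoms" concrete, the sketch below sums one classic physics-based interatomic term, the 12-6 Lennard-Jones potential, over all atom pairs. It is an illustration under stated assumptions, not any CASP-3 group's energy function. Parameters are in reduced units (epsilon = sigma = 1), and real force fields add electrostatic, torsional, and solvation terms.

```python
import math
from itertools import combinations


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy of one atom pair at distance r (reduced units):
    strongly repulsive at short range, minimum of -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum pair energies over all distinct pairs of (x, y, z) atom positions."""
    return sum(lennard_jones(math.dist(p, q), epsilon, sigma)
               for p, q in combinations(coords, 2))


if __name__ == "__main__":
    r_min = 2 ** (1 / 6)
    relaxed = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]   # pair at the energy minimum
    squeezed = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0)]    # pair pushed too close
    print(total_energy(relaxed), total_energy(squeezed))
```

Lower total energy means a more favorable arrangement. An ab initio method searches over atomic coordinates for arrangements that minimize a function of this kind.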


CASP-3

 

·        Summary of Target Difficulty

This graph plots two measures of similarity between the CASP-3 targets and their structural neighbors, as defined by VAST.

Q: What is VAST?

A: VAST (Vector Alignment Search Tool) is an NCBI service. Given a particular 3D structure stored in MMDB (NCBI's Molecular Modeling Database), VAST retrieves related structures that are also stored there.

- The Y-axis is the fraction of the CASP-3 target protein used in structure superpositions with its VAST neighbors.

- The X-axis gives the agreement between the two sequences in the structure-based (sequence-independent) superpositions, as calculated by VAST.
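The two plotted measures can be sketched from a structure-based alignment. The Python below assumes the superposition is already available as a list of matched (target index, neighbor index) residue pairs. This input format is hypothetical, since VAST computes its own superpositions. The sketch also reads the X-axis "agreement" as the fraction of superposed pairs with identical residues, which is our interpretation.

```python
def superposition_measures(target_seq, neighbor_seq, aligned_pairs):
    """Return (coverage, identity) for one structure-based superposition.

    aligned_pairs: (target_index, neighbor_index) residue pairs matched by the
    structural superposition (hypothetical input format).
    coverage: fraction of target residues used in the superposition (Y-axis).
    identity: fraction of matched pairs with identical residues (X-axis, as
    interpreted here).
    """
    if not aligned_pairs:
        return 0.0, 0.0
    coverage = len({i for i, _ in aligned_pairs}) / len(target_seq)
    same = sum(1 for i, j in aligned_pairs if target_seq[i] == neighbor_seq[j])
    return coverage, same / len(aligned_pairs)


if __name__ == "__main__":
    target = "ACDEFGHIKL"    # hypothetical 10-residue target
    neighbor = "ACNEYGH"     # hypothetical structural neighbor
    pairs = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
    print(superposition_measures(target, neighbor, pairs))  # (0.5, 0.6)
```

A target whose neighbors cover most of its length with high sequence identity would sit toward the easy corner of such a plot.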


Square boxes around prediction problems indicate that at least one group made a prediction that passed an accuracy threshold: at least 20% of the prediction must be based on the correct fold, and at least 50% of the model must be aligned accurately.
Note the line separating prediction problems that can be solved from those that cannot.

Ab Initio

 

·          "ROSETTA"

Kim T. Simons¹, Rich Bonneau¹, Ingo Ruczinski², and David Baker¹*

¹ Department of Biochemistry, University of Washington, Seattle, Washington

² Department of Statistics, University of Washington, Seattle, Washington

 

- The program assembles structures from fragments of known structures whose local sequences are similar to the target's, using simulated annealing and knowledge-based scoring functions.

 

- Simulated annealing searches for the global energy minimum of a scoring function that takes into account sequence-dependent terms representing hydrophobic burial and specific pair interactions such as electrostatics and disulfide bonding, and sequence-independent terms representing hard-sphere packing, alpha-helix and beta-strand packing, and the assembly of beta strands into beta sheets.
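Below is a minimal sketch of simulated annealing with the Metropolis acceptance rule, applied to a toy fragment-choice problem. Each of 8 windows picks one of 4 candidate fragments, and a made-up energy adds a random per-fragment cost plus a penalty for identical fragments in adjacent windows. The toy problem, the geometric cooling schedule, and all parameter values are assumptions for illustration. ROSETTA's actual move set, scoring terms, and schedule are not reproduced.

```python
import math
import random


def simulated_annealing(initial, energy, propose, t_start=2.0, t_end=0.01,
                        steps=5000, seed=0):
    """Minimize `energy` by simulated annealing with the Metropolis rule.

    initial: starting state; energy: state -> float;
    propose: (state, rng) -> new candidate state.
    The temperature falls geometrically from t_start to t_end over `steps`.
    Returns the lowest-energy state seen and its energy.
    """
    rng = random.Random(seed)
    state, e = initial, energy(initial)
    best, best_e = state, e
    cooling = (t_end / t_start) ** (1.0 / max(steps - 1, 1))
    t = t_start
    for _ in range(steps):
        candidate = propose(state, rng)
        ce = energy(candidate)
        # Downhill moves are always accepted; uphill moves with prob. exp(-dE/T).
        if ce <= e or rng.random() < math.exp(-(ce - e) / t):
            state, e = candidate, ce
            if e < best_e:
                best, best_e = state, e
        t *= cooling
    return best, best_e


# Toy "fragment choice" problem (entirely hypothetical).
N_WINDOWS, N_FRAGS = 8, 4
_cost_rng = random.Random(42)
LOCAL_COST = [[_cost_rng.uniform(0.0, 1.0) for _ in range(N_FRAGS)]
              for _ in range(N_WINDOWS)]


def toy_energy(choice):
    """Sum of per-window fragment costs plus 0.5 for each pair of identical
    fragments in adjacent windows (an arbitrary coupling term)."""
    e = sum(LOCAL_COST[w][f] for w, f in enumerate(choice))
    e += sum(0.5 for a, b in zip(choice, choice[1:]) if a == b)
    return e


def swap_fragment(choice, rng):
    """Move: replace the fragment in one randomly chosen window."""
    new = list(choice)
    new[rng.randrange(len(new))] = rng.randrange(N_FRAGS)
    return tuple(new)


if __name__ == "__main__":
    start = (0,) * N_WINDOWS
    best, best_e = simulated_annealing(start, toy_energy, swap_fragment)
    print(best, round(best_e, 3), "vs start", round(toy_energy(start), 3))
```

The high starting temperature lets the search accept uphill moves and escape local minima. As the temperature falls, the search behaves increasingly like pure descent.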

 

·        Prediction results include:

1) 99-residue segment of MarA, with an RMSD of 6.4 Å to the native structure

2) 95-residue (full-length) prediction for the EH2 domain of EPS15, with an RMSD of 6.0 Å

3) 75-residue segment of a DNA helicase, with an RMSD of 4.7 Å

4) 67-residue segment of ribosomal protein L30, with an RMSD of 3.8 Å
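The RMSD values above are in ångströms. For N matched atoms (typically C-alpha atoms) at positions x_i in the model and y_i in the native structure, RMSD = sqrt((1/N) Σ |x_i - y_i|²). The sketch below assumes the two structures have already been optimally superimposed. Finding that superposition (for example with the Kabsch algorithm) is not shown, and the function name is our own.

```python
import math


def rmsd(model, native):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    coordinates, e.g. matched C-alpha atoms of a model and the native structure.
    Assumes the two sets are already optimally superimposed."""
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model, native))
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    native = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
    print(rmsd(model, native))  # every atom is off by 1 Å, so the RMSD is 1.0
```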

 

Ab Initio

 

· Harold Scheraga's group

  Cornell University

 

- To shorten the computation, they started from a simplified representation of the amino acid chain, taking advantage of the fact that every amino acid has the same backbone unit, a regularly repeating string of one nitrogen and two carbon atoms. Their program first ignores the nitrogen and the other (carbonyl) carbon atom, leaving the central carbon atom attached to its side chain.

 

- This simplified model generates rough structures, which serve as starting points for computations that consider all the atoms and the forces between them.

 

- The MarA structure prediction took 100 hours on 64 parallel processors of an IBM SP2 supercomputer.
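The simplification described above can be sketched as a filter over atom records. The helper below uses a hypothetical name and input format: one `(residue_number, atom_name, (x, y, z))` tuple per atom, with PDB-style names. It drops the backbone N, carbonyl C and O, keeps each residue's central carbon (CA), and collapses the side chain into a single centroid point. Dropping the carbonyl oxygen and using a centroid are our assumptions. The notes say only that the side chain stays attached to the central carbon.

```python
# Backbone atoms left out of the reduced model (PDB-style names). The notes
# mention the nitrogen and one carbon; dropping O and OXT is our assumption.
DROPPED_BACKBONE = {"N", "C", "O", "OXT"}


def reduce_chain(atoms):
    """Reduce full-atom records to one CA point plus one side-chain centroid per residue.

    atoms: iterable of (residue_number, atom_name, (x, y, z)) tuples (hypothetical format).
    Returns a list of (residue_number, ca_xyz, side_chain_centroid) sorted by
    residue number; the centroid is None when a residue has no side-chain
    atoms (glycine). Hydrogens (names starting with "H") are ignored.
    """
    ca = {}
    side_chain = {}
    for res, name, xyz in atoms:
        if name == "CA":
            ca[res] = xyz
        elif name not in DROPPED_BACKBONE and not name.startswith("H"):
            side_chain.setdefault(res, []).append(xyz)
    reduced = []
    for res in sorted(ca):
        pts = side_chain.get(res)
        centroid = tuple(sum(axis) / len(pts) for axis in zip(*pts)) if pts else None
        reduced.append((res, ca[res], centroid))
    return reduced


if __name__ == "__main__":
    # Hypothetical Ala-Gly fragment: 9 atom records in, 3 points out.
    atoms = [
        (1, "N", (0.0, 0.0, 0.0)), (1, "CA", (1.0, 0.0, 0.0)),
        (1, "C", (2.0, 0.0, 0.0)), (1, "O", (2.0, 1.0, 0.0)),
        (1, "CB", (1.0, 1.0, 0.0)),
        (2, "N", (3.0, 0.0, 0.0)), (2, "CA", (4.0, 0.0, 0.0)),
        (2, "C", (5.0, 0.0, 0.0)), (2, "O", (5.0, 1.0, 0.0)),
    ]
    for entry in reduce_chain(atoms):
        print(entry)
```

For this two-residue Ala-Gly fragment, nine atom records shrink to three points: two CA atoms and one side-chain centroid (glycine has no side chain). The coarse structures found with such a model would later be expanded back to all atoms for refinement.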

· Results include:

1) HDEA

2) MarA