The tertiary structure of a protein specifies the location of each carbon atom along its backbone.
The secondary structure (helices and sheets) captures some notion of shape, but it does not suffice to accurately predict whether two proteins bind or dock together.
Predicting protein interactions arises critically in searching databases for potential drugs (rational drug design).
In protein docking, we seek to (1) predict the binding between two different proteins, or a protein and a flexible ligand, and (2) identify the orientation maximizing the interaction.
A variety of different representations can be used for geometric protein structures:
All are somewhat of a fiction since molecules vibrate, move, and bend.
This flexibility limits our ability to use standard geometric algorithms and concepts.
The notion of the shape of a set of points is inherently difficult to define.
The convex hull of a set of points defines the smallest convex polygon which contains all of them.
The convex hull fails to pick up the cavities and protrusions which inherently make shapes interesting.
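To make the limitation concrete, here is a minimal sketch of a planar convex hull using Andrew's monotone chain method, which is one standard choice and not necessarily the one intended in these notes. The function names and the U-shaped test points are illustrative assumptions.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o-a-b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop while the last two points and p fail to make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# A U-shaped outline: the notch between x=1 and x=2 is a cavity.
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
hull = convex_hull(u_shape)
```

On the U-shaped input the hull is just the bounding square, so the points outlining the cavity, such as (1, 1) and (2, 1), do not appear in it.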
Structures based on connecting points to their nearest neighbors can recover the shape if the points have been sampled densely enough.
The alpha-hull is a generalization of the convex hull, where the shape is defined by spheres of radius alpha, for some given value of alpha.
An edge (face) between two (three) points is alpha-exposed if there is a sphere of radius alpha which contacts these points and contains no other points in its interior.
As alpha decreases, concavities get cut out from the convex hull.
The theory gives you little insight into which value of alpha defines your shape, except by trial and error.
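The alpha-exposed test can be tried directly in the plane, where the spheres become circles. The sketch below is a brute-force version written for clarity, not the efficient construction; the function name, the tolerance `eps`, and the sample points are assumptions made for illustration.

```python
import math
from itertools import combinations

def alpha_exposed_edges(points, alpha, eps=1e-9):
    """Brute-force 2D test: an edge (p, q) is alpha-exposed if one of the two
    circles of radius alpha through p and q has no other point strictly inside."""
    edges = []
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        if d == 0 or d > 2 * alpha:
            continue  # no circle of radius alpha passes through both points
        mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
        # Distance from the chord midpoint to either circle center.
        h = math.sqrt(max(alpha * alpha - (d / 2) ** 2, 0.0))
        nx, ny = -(q[1] - p[1]) / d, (q[0] - p[0]) / d  # unit normal to pq
        for sign in (1, -1):
            center = (mx + sign * h * nx, my + sign * h * ny)
            others = (r for r in points if r != p and r != q)
            if all(math.dist(center, r) >= alpha - eps for r in others):
                edges.append((p, q))
                break
    return edges

# Four corners of a square plus one interior point.
pts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
large = alpha_exposed_edges(pts, 100.0)  # behaves like the convex hull
small = alpha_exposed_edges(pts, 0.8)    # square sides are too long to span
```

With a large alpha only the four sides of the square are exposed, matching the convex hull. With alpha = 0.8 the sides can no longer be spanned and the exposed edges are the ones reaching in to the interior point, which is the sense in which decreasing alpha cuts into the hull.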
Two different alpha-shapes of Gramicidin A, the latter highlighting the tunnel through the molecule:
Myoglobin molecule with heme binding pocket:
HIV protease with inhibitor binding site:
The entire spectrum of alpha-hulls can be constructed in $O(n \log n)$ time in the plane, the same as for convex hulls.
Typically, both proteins are modeled as rigid bodies, with the geometry used to constrain the possible sites of interaction.
Energy computations are performed at geometrically possible binding sites.
The ``right'' way to solve such problems is to construct the six dimensional configuration space of allowable positions of the second protein, and perform energy calculations at vertices/edges of the allowable region.
Modeling the interactions between a rigid protein and a small but flexible ligand is more complicated, since every hinge in the ligand increases the dimensionality of the problem.
Rough geometric interactions with parts of a ligand can be used to predict possible binding sites, but detailed energy calculations are needed to make precise predictions.
Preliminary screenings of possible docking sites can be based on maximizing the number of contact pairs or minimizing RMS distance.
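Both screening scores are simple to compute once a candidate placement is fixed. The following is a sketch under stated assumptions: atoms are plain 3D coordinate tuples, the 4.0 contact cutoff is an arbitrary illustrative value, and the RMS distance compares two placements of the same atoms listed in corresponding order.

```python
import math

def contact_pairs(receptor, ligand, cutoff=4.0):
    """Count receptor/ligand atom pairs lying within the cutoff distance."""
    return sum(1 for a in receptor for b in ligand if math.dist(a, b) <= cutoff)

def rms_distance(placed, reference):
    """Root-mean-square distance between corresponding atoms of two placements."""
    if len(placed) != len(reference) or not placed:
        raise ValueError("placements must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(placed, reference))
    return math.sqrt(total / len(placed))

n_contacts = contact_pairs([(0, 0, 0)], [(0, 0, 3), (0, 0, 5)])
deviation = rms_distance([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)])
```

The quadratic loop in `contact_pairs` is fine for a preliminary screen on small inputs; a spatial grid or tree would be the usual replacement for large molecules.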
The docking problem is not purely geometric, since attractive/repulsive forces have strong effects.
The best docking seeks to maximize the surface area and attractive forces while minimizing the energy loss due to solvent interaction.
Small ligands tend to bind in big pockets.
Finding the best docking is a difficult algorithmic problem because it involves six degrees of freedom: the three possible translations (x, y, and z) and three possible rotations.
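A rigid placement is therefore a point in a six-dimensional space. As a sketch, a pose can be stored as three rotation angles and three translations and applied to every atom. The angle convention here (rotate about x, then y, then z, in radians) and the function names are assumptions chosen for illustration; other parameterizations of rotation are equally valid.

```python
import math

def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (R = Rz * Ry * Rx), angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def apply_pose(points, pose):
    """Apply a 6-DOF pose (tx, ty, tz, rx, ry, rz): rotate, then translate."""
    tx, ty, tz, rx, ry, rz = pose
    rot = rotation_matrix(rx, ry, rz)
    moved = []
    for p in points:
        x, y, z = (sum(rot[i][j] * p[j] for j in range(3)) for i in range(3))
        moved.append((x + tx, y + ty, z + tz))
    return moved

# Quarter turn about z followed by a translation of (1, 2, 3).
placed = apply_pose([(1.0, 0.0, 0.0)], (1.0, 2.0, 3.0, 0.0, 0.0, math.pi / 2))
```

An exhaustive search would score `apply_pose` results over a sampling of all six parameters, which is why the cost grows so quickly with sampling resolution.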
Binding flexible ligands is analogous to motion planning for articulated robot arms.
Motion planning with many degrees of freedom becomes difficult as the complexity of the surfaces defining the conformation space grows.
A good general approach is to randomly sample points in the configuration space, and add edges between nearby collision-free points with collision-free straight line paths.
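This sampling idea can be sketched in a toy configuration space. The example below assumes a unit square with circular obstacles standing in for collisions, and it checks each straight-line path by testing a fixed number of intermediate points, which is an approximation. All names and parameters are illustrative.

```python
import math
import random

def collision_free(p, obstacles):
    """A configuration is free if it lies outside every (center, radius) obstacle."""
    return all(math.dist(p, center) > radius for center, radius in obstacles)

def segment_free(p, q, obstacles, steps=20):
    """Approximate check: test evenly spaced points on the straight path p -> q."""
    for t in range(steps + 1):
        point = tuple(a + (b - a) * t / steps for a, b in zip(p, q))
        if not collision_free(point, obstacles):
            return False
    return True

def build_roadmap(n_samples, radius, obstacles, dim=2, seed=0):
    """Sample free configurations and link nearby pairs with free straight paths."""
    rng = random.Random(seed)
    nodes = []
    while len(nodes) < n_samples:
        p = tuple(rng.random() for _ in range(dim))
        if collision_free(p, obstacles):
            nodes.append(p)
    edges = [
        (i, j)
        for i in range(len(nodes))
        for j in range(i + 1, len(nodes))
        if math.dist(nodes[i], nodes[j]) <= radius
        and segment_free(nodes[i], nodes[j], obstacles)
    ]
    return nodes, edges

obstacles = [((0.5, 0.5), 0.2)]
nodes, edges = build_roadmap(50, 0.3, obstacles)
```

A path between two configurations is then found by an ordinary graph search over `edges`. For a flexible ligand, `dim` would grow with the number of hinges, while the sampling code stays the same.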
One approach to simplifying continuous geometric problems is to insist that all sites lie on a 3D grid.
The finer the grid, the more accurate the predictions, though at greater computational cost.
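The trade-off between grid spacing and accuracy shows up even in a tiny example. This sketch snaps 3D coordinates to integer grid cells; the function name and the sample coordinates are assumptions for illustration.

```python
import math

def voxelize(points, spacing):
    """Map each 3D point to the integer index of the grid cell containing it."""
    return {tuple(math.floor(c / spacing) for c in p) for p in points}

atoms = [(0.2, 0.2, 0.2), (0.8, 0.1, 0.4), (1.5, 0.0, 0.0)]
coarse = voxelize(atoms, 1.0)  # the first two atoms share a cell
fine = voxelize(atoms, 0.5)    # every atom gets its own cell
```

Halving the spacing separates atoms that the coarse grid merged, but it also multiplies the number of cells in a fixed volume by eight, which is where the extra computational cost comes from.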
Another approach to discretization is to analyze the possible positions of isolated spheres which contact the surface, with pockets identified where there are many intersecting spheres.
Geometric hashing stores all possible point triplets (triangles) in both ligand and receptor.
The sets of triangles which match define molecular orientations of interest.
Note that conventional hashing techniques do not really apply, since we are looking for approximate matches.
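One common way to get approximate matching out of a hash table is to quantize the key. The sketch below keys each triangle by its sorted side lengths rounded to a bin width, so congruent triangles collide regardless of position and orientation. This is a simplified illustration, not a full geometric hashing scheme: the bin width of 0.5 and all names are assumptions, mirror-image triangles are not distinguished, and triangles whose sides fall near a bin boundary can be missed unless neighboring bins are also probed.

```python
import math
from collections import defaultdict
from itertools import combinations

def triangle_key(a, b, c, bin_width):
    """Quantized sorted side lengths: invariant under rotation and translation."""
    sides = sorted((math.dist(a, b), math.dist(b, c), math.dist(a, c)))
    return tuple(round(s / bin_width) for s in sides)

def build_table(points, bin_width):
    """Hash every point triplet by its quantized shape."""
    table = defaultdict(list)
    for tri in combinations(range(len(points)), 3):
        a, b, c = (points[i] for i in tri)
        table[triangle_key(a, b, c, bin_width)].append(tri)
    return table

def matching_triangles(receptor, ligand, bin_width=0.5):
    """Return (receptor triplet, ligand triplet) index pairs with equal keys."""
    table = build_table(receptor, bin_width)
    matches = []
    for tri in combinations(range(len(ligand)), 3):
        a, b, c = (ligand[i] for i in tri)
        for r_tri in table.get(triangle_key(a, b, c, bin_width), []):
            matches.append((r_tri, tri))
    return matches

receptor = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (10, 10, 10)]
ligand = [(1, 1, 1), (4.05, 1, 1), (1, 5, 1)]  # a shifted, slightly perturbed 3-4-5 triangle
matches = matching_triangles(receptor, ligand)
```

Each matched pair of triangles determines a candidate rigid transformation between the two molecules, which can then be scored by the screening measures discussed earlier.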