Structural genomics: Protein structure determination, classification,
modelling and docking
Structural genomics is the assignment of three-dimensional structures
to proteomes and the investigation of their biological implications.
Protein structure is an important indicator of function, particularly
where the structure of a new protein is homologous to one
already known. Two levels of assignment are employed in structural
genomics, one being experimental large-scale determination
of protein structures using NMR or X-ray crystallography,
and the other computational structure prediction through detection
of homologies with proteins of known structure.
With numerous genome sequences already available and the majority
of human genes represented in the EST database, it is becoming
increasingly likely that a family to which a new protein belongs
is represented already in the databases. Computational methods
involve pairwise or multiple sequence comparisons, fold recognition,
predictions of secondary structures based on statistical rules
derived from structures, and modelling. Mycoplasma genitalium,
with only 479 proteins, is a focus for computational investigations
in structural genomics. As with the other functional genomics
technologies, structure prediction is dependent on databases
and appropriate search programs. Predictions can be made by
searching databases of complete protein domains (CATH, ProDom,
SCOP), collections of structural or functional sequence motifs
(BLOCKS, PRINTS) or libraries of conserved sequence patterns
associated with specific functions (PROSITE). For example,
the SCOP (Structural Classification of Proteins) database
aims to provide a detailed and comprehensive description of
the structural and evolutionary relationships between all
proteins whose structure is known, with links to PDB entries,
sequences, references, images and interactive display systems.
Generally, the problem of obtaining the best results from
a database search is one of signal to noise. To compensate
for the noise, the sensitivity of the search method can be
increased, for example with PSI-BLAST, which combines the popular
BLAST algorithm with profile analysis. Integration of pre-processing
methods into the search scheme also considerably improves
signal to noise. For multidomain proteins, searching with
the entire sequence is much less sensitive than searching
with segments that are located between known domains. Thus
scanning databases of known domains is an important complement
to standard database searches. In the absence of recognisable
sequence similarity, threading approaches (fold assignment
by checking a sequence for compatibility with known three-dimensional
structures, e.g. using ProFIT) may reveal additional insights,
as has been successfully demonstrated for leptin. Model building
can be carried out where genome sequences show good enough
matches (>30% identity) to a PDB sequence; a public repository
of predicted models of protein structures cross-referenced
with sequence databases and PDB has been initiated jointly
by the PDB and the Swiss Institute of Bioinformatics.
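The identity criterion for model building mentioned above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as "-") and that identity is counted over columns where neither sequence has a gap; other definitions of percent identity exist and give different values. The function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: the denominator is the number of gap-free columns. Other
    conventions (e.g. dividing by the shorter sequence length) are also used.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def suitable_for_modelling(aligned_a: str, aligned_b: str,
                           threshold: float = 30.0) -> bool:
    """Apply the >30% identity rule of thumb quoted in the text."""
    return percent_identity(aligned_a, aligned_b) > threshold


if __name__ == "__main__":
    # Hypothetical aligned fragments: a query and a sequence of known structure.
    query = "ACDE-FG"
    template = "ACDQWFG"
    print(round(percent_identity(query, template), 1))
    print(suitable_for_modelling(query, template))
```

In practice the alignment itself would come from a search program such as BLAST or PSI-BLAST; this sketch only covers the thresholding step.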
A number of bioinformatics methods devoted to the analysis of
protein-protein interactions have been developed in recent
years. Different docking procedures have been described which
try to predict whether two proteins are able to interact and,
if so, the structure of the resulting complex. Protein docking methods
can be applied when the three-dimensional structures of the proteins
are known or when a good structural model can be built. Many
docking methods are based on the rigid body approximation,
while 'soft docking' procedures rely on a geometric criterion
and on a simplified representation of the protein surface,
and others combine a shape complementarity search with subsequent
energy refinement. Some other methods, e.g. FlexX, can dock
a flexible ligand into the active site of a protein. Powerful
methods have recently been developed which rely on data other
than protein structure to infer possible interaction networks,
namely protein interaction mapping based on the analysis of
gene fusion events or on comparison of genome sequences. The
different methods discussed can be incorporated in a structural
genomics landscape and will be essential complements to experimental
approaches.
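The idea behind interaction mapping from gene fusion events can be sketched as follows. If two separate proteins in one genome both match distinct, non-overlapping regions of a single protein in another genome, the pair is flagged as a candidate for interaction. This is only an illustration under stated assumptions: the input is a hypothetical list of similarity hits, the identifiers are invented, and real methods add further filtering (for example of promiscuous domains) that is not shown here.

```python
from collections import defaultdict
from itertools import combinations


def predict_fusion_links(hits):
    """Infer candidate interacting pairs from fusion-like similarity hits.

    hits: iterable of (query_id, composite_id, start, end), meaning that
    protein `query_id` matches residues start..end (inclusive) of the
    possibly fused protein `composite_id` from another genome.
    Returns a set of (query_a, query_b, composite_id) with query_a < query_b.
    """
    by_composite = defaultdict(list)
    for query_id, composite_id, start, end in hits:
        by_composite[composite_id].append((query_id, start, end))

    links = set()
    for composite_id, regions in by_composite.items():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            if q1 == q2:
                continue  # two hits from the same protein carry no pair information
            if e1 < s2 or e2 < s1:  # matched regions do not overlap
                a, b = sorted((q1, q2))
                links.add((a, b, composite_id))
    return links


if __name__ == "__main__":
    # Hypothetical hits against one composite protein "fused1".
    example_hits = [
        ("geneA", "fused1", 1, 120),
        ("geneB", "fused1", 130, 300),
        ("geneC", "fused1", 100, 200),  # overlaps both other hits
    ]
    print(predict_fusion_links(example_hits))
```

Requiring non-overlapping regions is a deliberate choice: two proteins that match the same region of the composite protein are more likely homologues of each other than fused partners.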
Contacts within the programme
Tom Blundell
Peer Bork
Carlo V. Bruschi
Cyrus Chothia
Maurizio Cirilli
Manuela Helmer-Citterich
Jeremy Clarke
Arthur Lesk
Christine Orengo
Dave Ritchie
Arne Skerra
Geremia Silvano
Enrico Stura
Sarah Teichmann
Anna Tramontano