Proteomics

Structural genomics: Protein structure determination, classification, modelling and docking

Co-ordinators:
Arthur Lesk University of Cambridge, Cambridge, UK
Manuela Helmer-Citterich University of Rome, Rome, Italy

Structural genomics is the assignment of three-dimensional structures to proteomes and the investigation of their biological implications. Protein structure is an important indicator of function, particularly where a new protein is homologous to one whose structure is already known. Two levels of assignment are employed in structural genomics: large-scale experimental determination of protein structures using NMR or X-ray crystallography, and computational structure prediction through detection of homology with proteins of known structure.

With numerous genome sequences already available and the majority of human genes represented in the EST database, it is becoming increasingly likely that the family to which a new protein belongs is already represented in the databases. Computational methods involve pairwise or multiple sequence comparisons, fold recognition, prediction of secondary structure based on statistical rules derived from known structures, and modelling. Mycoplasma genitalium, with only 479 proteins, is a focus for computational investigations in structural genomics.

As with the other functional genomics technologies, structure prediction depends on databases and appropriate search programs. Predictions can be made by searching databases of complete protein domains (CATH, ProDom, SCOP), collections of structural or functional sequence motifs (BLOCKS, PRINTS) or libraries of conserved sequence patterns associated with specific functions (PROSITE). For example, the SCOP (Structural Classification of Proteins) database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, with links to PDB entries, sequences, references, images and interactive display systems.

Generally, the problem of obtaining the best results from a database search is one of signal to noise. To compensate for the noise, the sensitivity of the search method can be increased, for example with PSI-BLAST, which combines the popular BLAST algorithm with profile analysis. Integrating pre-processing methods into the search scheme also considerably improves the signal-to-noise ratio. For multidomain proteins, searching with the entire sequence is much less sensitive than searching with the segments located between known domains. Scanning databases of known domains is therefore an important complement to standard database searches.

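
The point about multidomain proteins can be made concrete with a small sketch. Assuming that known domains have already been located on a sequence (for instance by scanning a domain database), the hypothetical helper below returns the stretches not covered by any domain, which could then be submitted as separate queries. The function names, the 1-based inclusive coordinates and the minimum segment length of 30 residues are illustrative assumptions, not part of any of the tools named above.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # 1-based, inclusive (start, end) residue positions


def unassigned_segments(seq_len: int, domains: List[Range],
                        min_len: int = 30) -> List[Range]:
    """Return the ranges of a sequence not covered by any known domain.

    `min_len` is an assumed cut-off: shorter gaps are unlikely to give a
    meaningful database hit and are skipped.
    """
    segments: List[Range] = []
    cursor = 1  # first residue not yet accounted for
    for start, end in sorted(domains):
        # A gap exists if the next domain starts beyond the cursor.
        if start > cursor and (start - cursor) >= min_len:
            segments.append((cursor, start - 1))
        # Overlapping or nested domains never move the cursor backwards.
        cursor = max(cursor, end + 1)
    # Trailing stretch after the last domain.
    if seq_len - cursor + 1 >= min_len:
        segments.append((cursor, seq_len))
    return segments


def slice_segments(sequence: str, domains: List[Range],
                   min_len: int = 30) -> List[str]:
    """Cut the unassigned stretches out of `sequence` as query strings."""
    return [sequence[start - 1:end]
            for start, end in unassigned_segments(len(sequence), domains, min_len)]
```

For a 500-residue protein with domains assigned at residues 50-150 and 300-400, `unassigned_segments(500, [(50, 150), (300, 400)])` yields three candidate query regions: the N-terminal stretch, the linker between the two domains, and the C-terminal tail.
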
In the absence of recognisable sequence similarity, threading approaches, which assign folds by checking a sequence for compatibility with known three-dimensional structures (e.g. using ProFIT), may reveal additional insights, as has been successfully demonstrated for leptin. Model building can be carried out where genome sequences show a good enough match (>30% identity) to a PDB sequence; a public repository of predicted protein structure models, cross-referenced with sequence databases and the PDB, has been initiated jointly by the PDB and the Swiss Institute of Bioinformatics.
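
As a rough illustration of the >30% identity criterion, the sketch below computes percent identity from a pre-computed pairwise alignment and suggests which route to try. It assumes the two aligned strings are already available from an alignment program, uses `-` as the gap character, and counts identity over aligned (non-gap) columns only. Other definitions of percent identity use a different denominator, so the numbers are indicative, and the function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # gap column: not counted as aligned
        aligned += 1
        if q == t:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def suggest_strategy(aligned_query: str, aligned_template: str,
                     threshold: float = 30.0) -> str:
    """Apply the >30% identity rule of thumb quoted in the text."""
    if percent_identity(aligned_query, aligned_template) > threshold:
        return "comparative modelling"
    return "threading / fold recognition"
```

In practice the length of the aligned region also matters: a high identity over a very short stretch is much weaker evidence than the same identity over a full domain, so a threshold of this kind is best treated as a first filter.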

A number of bioinformatics methods devoted to the analysis of protein-protein interactions have been developed in recent years. Various docking procedures have been described that try to determine whether two proteins are able to interact and to predict the structure of the resulting complex. Protein docking methods can be applied when the three-dimensional structures of the proteins are known or when a good structural model can be built. Many docking methods are based on the rigid body approximation, while 'soft docking' procedures rely on a geometric criterion and on a simplified representation of the protein surface, and others combine a shape complementarity search with subsequent energy refinement. Some other methods, e.g. FLEXX, can dock a flexible ligand into the active site of a protein. Powerful methods have recently been developed that rely on data other than protein structure to infer possible interaction networks, namely protein interaction mapping based on the analysis of gene fusion events or on the comparison of genome sequences. The methods discussed here can be integrated into the structural genomics landscape and will be essential complements to experimental approaches.
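
The gene fusion idea can be sketched in a few lines. The assumption is that each protein in two genomes has already been annotated with the sequence families it contains (the family labels and the dictionary layout below are hypothetical). If two families occur fused in a single protein in a reference genome but as separate single-family proteins in the target genome, the separate proteins are proposed as candidate interaction partners. This is a simplified illustration of the principle, not a reproduction of any published implementation.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def predict_links_from_fusions(reference: Dict[str, List[str]],
                               target: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Propose interacting protein pairs in `target` from fusions in `reference`.

    Both arguments map a protein identifier to the list of family labels
    found in that protein (an assumed, simplified annotation format).
    """
    # Index target proteins that consist of exactly one family.
    single_family: Dict[str, Set[str]] = {}
    for protein_id, families in target.items():
        if len(set(families)) == 1:
            single_family.setdefault(families[0], set()).add(protein_id)

    links: Set[Tuple[str, str]] = set()
    for families in reference.values():
        # Every pair of distinct families fused in one reference protein.
        for fam_a, fam_b in combinations(sorted(set(families)), 2):
            for prot_a in single_family.get(fam_a, ()):
                for prot_b in single_family.get(fam_b, ()):
                    links.add(tuple(sorted((prot_a, prot_b))))
    return links
```

Predictions of this kind are hypotheses: families with many members produce many candidate pairs, so the output would normally be filtered or checked against experimental data.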

Contacts within the programme

Tom Blundell
Peer Bork
Carlo V. Bruschi
Cyrus Chothia
Maurizio Cirilli
Manuela Helmer-Citterich
Jeremy Clarke
Arthur Lesk
Christine Orengo
Dave Ritchie
Arne Skerra
Geremia Silvano
Enrico Stura
Sarah Teichmann
Anna Tramontano