Combination and integration of functional genomics data to derive new biological knowledge

Co-ordinators:
Antoine de Daruvar, Lion Bioscience, Bordeaux, France
Antoine Danchin, Institut Pasteur, Paris, France

This area is designed as the main point of convergence for researchers working in the other areas. We propose to develop integrative platforms (concepts and software tools) that can expand the knowledge accumulated in the individual fields. Biological information and knowledge are dispersed among numerous sources: experimental results obtained with various technologies, the scientific literature, and specialised molecular biology databases of annotated sequences, families, structures, phenotypes, etc. (there are even databases of biological databases, such as DBCAT from Infobiogen or the Biocatalog from the EBI). The experimentalist's first requirement is to link their own results to external sources in order to interpret and exploit them better. Typically, a biologist would like to ask questions such as: to which metabolic pathways do the genes found to be over-expressed in an expression profiling experiment belong? Or: was the protein interaction found in a two-hybrid experiment already described in the literature? In fact, the need for linkage is even broader, since the information to be brought together can span areas such as medicine, chemistry, agronomy, ecology or patents.
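As a minimal illustration of the two example questions, the sketch below links a local expression result to an external pathway annotation and checks an interaction against a set of previously described ones. All gene names, fold-change values, the threshold and the helper functions (`overexpressed`, `pathways_of`, `already_described`) are hypothetical; in practice the two lookup tables would be filled from a pathway database and from the literature.

```python
# Hypothetical local result: gene -> log2 fold change from an expression profiling experiment.
expression = {"geneA": 2.1, "geneB": -0.3, "geneC": 1.7, "geneD": 0.2}

# Hypothetical external source: pathway -> member genes (stands in for a pathway database).
pathways = {
    "glycolysis": {"geneA", "geneD"},
    "histidine biosynthesis": {"geneC"},
    "cell wall synthesis": {"geneB"},
}

# Hypothetical external source: interactions already described in the literature.
known_interactions = {frozenset(("geneA", "geneC"))}


def overexpressed(expr, threshold=1.0):
    """Return the genes whose log2 fold change exceeds the threshold."""
    return {gene for gene, lfc in expr.items() if lfc > threshold}


def pathways_of(genes, pathway_map):
    """Map each pathway to the subset of the given genes it contains (empty hits omitted)."""
    hits = {}
    for pathway, members in pathway_map.items():
        shared = members & genes
        if shared:
            hits[pathway] = shared
    return hits


def already_described(a, b, known):
    """Check whether an interaction (e.g. from a two-hybrid screen) is already known.

    Interactions are stored as unordered pairs, so the argument order does not matter.
    """
    return frozenset((a, b)) in known


if __name__ == "__main__":
    up = overexpressed(expression)
    print(pathways_of(up, pathways))
    print(already_described("geneC", "geneA", known_interactions))
```

Storing interactions as `frozenset` pairs makes the lookup symmetric, which matches the undirected way two-hybrid interactions are usually reported.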

Linking local data to external sources of information requires either direct references, such as cross-references in sequence databases, or computer-supported methods, such as programs for homology searches, literature extraction or genome comparison. Systematic attempts to build and explore such links are found in the tools developed over the past few years to support the annotation of genome sequences. One focus of the programme will be how this first-generation software should be extended or adapted to handle the new types of data produced by functional genomics technologies (expression, protein-protein interactions, etc.). We will also exchange experience and information on all the technical means available to biologists for building links to and from their own data.

Linking information is essential to support the deductive reasoning of biologists who want to derive biological knowledge from their data. We believe, however, that an even more promising outcome can be expected from tools that support inductive exploration of functional genomics information. An inductive process requires a unified framework in which heterogeneous types of information can be projected and visualised. The principle underlying the "neighbourhood concept" is to study relationships between biological objects rather than to consider them as isolated entities. Genomics and functional analysis naturally focus on relationships between biological sequences: proximity on the chromosome, coregulation and interactions can all be seen as relationships that correspond, from the viewpoint of the sequence, to different types of neighbourhood. The definition of neighbourhood can readily be generalised to cluster sequences with similar structural features or physicochemical properties, or to define distances between sequences based on their co-occurrence in the literature. Neighbourhood also opens up new approaches to genome comparison: not only the sequences themselves but also their relationships will be compared. In this model, each type of neighbourhood sheds a specific light on a gene, which could lead to fundamentally new findings.
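A toy sketch of the neighbourhood concept, assuming small hypothetical inputs: gene positions, expression profiles, an interaction list and an index of abstracts mentioning each gene. Each relation type (chromosomal proximity; coregulation, measured here by Pearson correlation; interaction; and literature co-occurrence, measured here as a Jaccard distance) yields its own set of neighbouring gene pairs, so one gene can be viewed under several neighbourhoods at once. A last helper compares neighbourhoods between two genomes through an orthologue mapping. The thresholds, the choice of Pearson and Jaccard, the orthologue mapping and all function names are illustrative assumptions, not methods defined by the programme.

```python
import math
from itertools import combinations

# Hypothetical chromosomal positions (gene start, in bp) for one genome.
positions = {"g1": 100, "g2": 1200, "g3": 2500, "g4": 90000}

# Hypothetical expression profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 2.0, 3.1, 3.9],
}

# Hypothetical protein-protein interactions (unordered pairs).
interactions = {frozenset(("g1", "g3"))}

# Hypothetical literature index: abstract id -> genes mentioned in it.
abstracts = {"a1": {"g1", "g2"}, "a2": {"g1", "g2", "g4"}, "a3": {"g3"}}


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def cooccurrence_distance(a, b, index):
    """Jaccard distance between the sets of abstracts mentioning genes a and b."""
    docs_a = {d for d, genes in index.items() if a in genes}
    docs_b = {d for d, genes in index.items() if b in genes}
    union = docs_a | docs_b
    if not union:
        return 1.0
    return 1.0 - len(docs_a & docs_b) / len(union)


def neighbourhoods(max_gap=5000, min_corr=0.9, max_lit_dist=0.5):
    """Return {relation type: set of gene pairs}, one set per kind of neighbourhood."""
    rel = {"chromosome": set(), "coregulation": set(), "interaction": set(), "literature": set()}
    for a, b in combinations(sorted(positions), 2):
        pair = frozenset((a, b))
        if abs(positions[a] - positions[b]) <= max_gap:
            rel["chromosome"].add(pair)
        if pearson(profiles[a], profiles[b]) >= min_corr:
            rel["coregulation"].add(pair)
        if pair in interactions:
            rel["interaction"].add(pair)
        if cooccurrence_distance(a, b, abstracts) <= max_lit_dist:
            rel["literature"].add(pair)
    return rel


def neighbours_of(gene, rel):
    """For one gene, list its neighbours under each relation type."""
    return {
        kind: {g for pair in pairs if gene in pair for g in pair if g != gene}
        for kind, pairs in rel.items()
    }


def conserved_pairs(pairs_a, pairs_b, orthologs):
    """Neighbour pairs in genome A whose orthologues are also neighbours in genome B."""
    kept = set()
    for pair in pairs_a:
        a, b = tuple(pair)
        if a in orthologs and b in orthologs:
            if frozenset((orthologs[a], orthologs[b])) in pairs_b:
                kept.add(pair)
    return kept
```

For example, `neighbours_of("g1", neighbourhoods())` shows g1 as a chromosomal neighbour of g2 and g3, coregulated with g2 and g4, and interacting with g3. Keeping each relation type separate, rather than merging them into a single score, preserves the specific light that each neighbourhood sheds on a gene.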

Within the framework of the programme, we will further investigate the applicability of the neighbourhood concept in functional genomics and try to identify and evaluate other innovative concepts relevant to the integration and combination of information. Another objective will be to investigate how combining experimental results with predictive methods could help define prediction-driven experimental protocols or strategies. Such approaches could greatly help biologists make the best use of their investments and efforts.

Large functional genomics projects have currently been launched and publicly funded at the European level for several organisms (Eurofan for S. cerevisiae, REGIA for Arabidopsis, REALIS for Listeria, BFA for B. subtilis). Each of these projects will have to address the question of information integration. This programme is an ideal forum for organising fruitful exchanges about the different strategies chosen for integrating information in these projects.

Contacts within the programme
Miguel Andrade
Francisco Azuaje
Antoine Danchin
Antoine de Daruvar
Werner Dubitzky
Guillaume Dussert
Alessandro Guffanti
Martijn Huynen
Daniel Kahn
Juha Kere
Marie-Paule Lefranc
Steve Oliver
Christos Ouzounis
José E. Pérez-Ortín
Isabel Rojas
Brian Sturgeon
Alain Viari
Anil Wipat
Marc Zabeau