Combination and integration of functional genomics data to derive new biological knowledge

This area is designed as the main point of convergence for researchers
working in the other areas. We propose the development of
integrative platforms (concepts and software tools) able to
expand and increase knowledge accumulated in the individual
fields. Biological information and knowledge are dispersed
among numerous sources: experimental results obtained
using various technologies, the scientific literature and
specialised molecular biology databases for annotated sequences,
families, structures, phenotypes, etc. (even databases of
biological databases such as DBCAT from Infobiogen or The
Biocatalog from the EBI). The first requirement for experimentalists
is to link their own results to external sources in order to
interpret and exploit them better. Typically, a biologist
would like to ask questions such as: to which
metabolic pathways do the genes found to be over-expressed
in an expression profiling experiment belong? Or: was the protein
interaction found in a two-hybrid experiment already described
in the literature? In fact, the requirement for linkage is
even broader since the information to be brought together
can cover areas such as medicine, chemistry, agronomy, ecology
or patents.
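
As a minimal sketch of the first question above, assume that the local expression results and the external pathway annotation are both available as simple lookup tables. The gene identifiers, expression ratios, pathway assignments, the threshold and the function name `pathways_of_overexpressed` are all hypothetical and serve only as an illustration; a real tool would obtain the annotation from an external database through its cross-references.

```python
from collections import defaultdict

# Hypothetical local result: gene -> log2 expression ratio (toy values).
expression = {"geneA": 2.3, "geneB": 0.1, "geneC": 1.8, "geneD": -1.2}

# Hypothetical external annotation: gene -> metabolic pathways (toy values).
pathway_xref = {
    "geneA": ["glycolysis"],
    "geneC": ["glycolysis", "TCA cycle"],
    "geneD": ["TCA cycle"],
}


def pathways_of_overexpressed(expression, pathway_xref, threshold=1.0):
    """Group genes whose ratio reaches the threshold by annotated pathway."""
    by_pathway = defaultdict(list)
    for gene, ratio in sorted(expression.items()):
        if ratio >= threshold:
            # Genes without a cross-reference are kept under a placeholder
            # so that missing links remain visible to the biologist.
            for pathway in pathway_xref.get(gene, ["(no annotation)"]):
                by_pathway[pathway].append(gene)
    return dict(by_pathway)


result = pathways_of_overexpressed(expression, pathway_xref)
print(result)
```

The same join pattern would apply to the second question, with a table of interactions reported in the literature taking the place of the pathway annotation.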

Linking local data to external sources of information requires either
direct references, such as cross-references in sequence databases,
or computer supported methods, such as programs for homology
searches, literature extraction or genome comparison. Systematic
attempts to build up and explore those links are found in
tools that have been developed over the past few years to
support the annotation of genome sequences. How this first-generation
software should be extended or adapted to handle
new types of data produced by functional genomics technologies
(expression, protein-protein interactions, etc.) will be one
focus of the programme. We will also exchange experience and
information on all technical means available to biologists
for building links to and from their own data.

Linking information is essential to support the
deductive reasoning of biologists who want to derive biological
knowledge from their data. We believe, however, that an even
more promising outcome can be expected from tools that
support the inductive exploration of functional genomics
information. An inductive process requires a unified framework
in which heterogeneous types of information can be projected
and visualised. The principle underlying the "neighbourhood
concept" is to study relationships between biological
objects and not to consider them as isolated entities. In
fact, genomics and functional analysis naturally focus on
relationships between biological sequences. Indeed, proximity
on the chromosome, coregulation or interactions can be seen
as relationships that correspond, from the viewpoint of sequence,
to different types of neighbourhood. The definition of neighbourhood
can readily be generalised to cluster sequences with similar
structural features or similar physicochemical properties,
or to define distances between sequences based on their co-occurrence
in the literature. Neighbourhood also opens up opportunities
for new approaches in the field of genome comparison: not
only sequences themselves but their relationships will be
compared. In this model, each neighbourhood sheds a specific
light on a gene, which could lead to fundamentally new findings.
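
The neighbourhood concept described above can be sketched as follows, under simplifying assumptions. Each type of relationship is stored as a set of gene pairs, and the neighbourhoods of a gene are read off per relationship type. The text does not specify how a literature-based distance should be computed; the Jaccard distance used here is only one possible choice. All gene identifiers, article identifiers and function names are hypothetical.

```python
# Toy relationships between hypothetical genes; each relationship type
# defines its own neighbourhood.
relations = {
    "chromosome_proximity": {("g1", "g2"), ("g2", "g3")},
    "coregulation": {("g1", "g3"), ("g2", "g3")},
    "interaction": {("g1", "g4")},
}

# Hypothetical literature index: gene -> identifiers of articles citing it.
literature = {"g1": {1, 2, 3}, "g2": {2, 3, 4}, "g3": {5}}


def neighbours(gene, pairs):
    """Return the genes paired with `gene` in a set of unordered pairs."""
    found = set()
    for a, b in pairs:
        if a == gene:
            found.add(b)
        elif b == gene:
            found.add(a)
    return found


def neighbourhoods(gene, relations):
    """Return one neighbourhood of `gene` per relationship type."""
    return {name: neighbours(gene, pairs) for name, pairs in relations.items()}


def cooccurrence_distance(a, b, literature):
    """Jaccard distance between the article sets of two genes (assumed metric)."""
    docs_a = literature.get(a, set())
    docs_b = literature.get(b, set())
    union = docs_a | docs_b
    if not union:
        return 1.0  # no literature at all: treat the genes as maximally distant
    return 1.0 - len(docs_a & docs_b) / len(union)


print(neighbourhoods("g3", relations))
print(cooccurrence_distance("g1", "g2", literature))
```

Keeping one set of pairs per relationship type makes it straightforward to compare neighbourhoods with each other, for instance by intersecting the neighbours of a gene under coregulation with its neighbours on the chromosome.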

Within the framework of the programme, we will investigate further
the applicability of the neighbourhood concept in functional
genomics and try to identify and evaluate other innovative
concepts that could be relevant for integration and combination
of information. Another objective will be to investigate how
the combination of experimental results and predictive methods
could help in the definition of prediction-driven experimental
protocols or strategies. Such approaches are key to
optimising biologists' investment and effort.

There are currently several organisms for which large functional
genomics projects have been launched and publicly funded at
the European level (Eurofan for S. cerevisiae, REGIA for Arabidopsis,
REALIS for Listeria, BFA for B. subtilis). Each of those projects
will have to address the question of information integration.
This programme is a unique and ideal forum through which to
organise fruitful exchanges about the different strategies
that have been chosen for integration of information in those
different projects.

Contacts within the programme
Miguel Andrade
Francisco Azuaje
Antoine Danchin
Antoine de Daruvar
Werner Dubitzky
Guillaume Dussert
Alessandro Guffanti
Martijn Huynen
Daniel Kahn
Juha Kere
Marie-Paule Lefranc
Steve Oliver
Christos Ouzounis
José E. Pérez-Ortín
Isabel Rojas
Brian Sturgeon
Alain Viari
Anil Wipat
Marc Zabeau