
ESF Workshop: Mining High-Throughput Data in Functional Genomics
8-9 May 2007
Coleraine, Northern Ireland

Organisers
Report
1. Scientific content
2. Results & Contributions
3. Programme

Organisers:

Daniel Berrar, University of Ulster, Coleraine, Northern Ireland
Werner Dubitzky, University of Ulster, Coleraine, Northern Ireland

Report

Scientific Content

This workshop was intended for academic researchers and industrial practitioners who
wish to understand the state of the art of the presented methods and identify the areas in which gaps in our knowledge demand further research and development. To this end, a key aim of the workshop was to demonstrate the practical application of the methods in the context of real questions and problems encountered in research and development.

Focusing on actual questions, problems and applications arising in research,
development and real-world applications, the workshop revolved around two major themes:
(1) State of the art and challenges in analyzing high-throughput data in
functional genomics and proteomics.
This theme addressed the state of the art of analytical methods from data mining,
statistics, machine learning, and artificial intelligence for the analysis of high-throughput
data such as DNA microarrays (genomics) and mass spectrometry data (proteomics). The key challenges and limitations of current methods were discussed.
(2) State of the art and challenges in integrative systems biology.
This theme addressed the issues related to the integration of heterogeneous data arising from genomics and proteomics experiments in order to support the systemic analysis (interaction of genomic and proteomics information with other elements and system dynamics).

The first presentation was given by Prof. Dr. Joaquín Dopazo, who talked about
methods for the functional profiling of microarray data. This talk was an excellent
introduction to the state of the art of current research in functional genomics. His
presentation put particular emphasis on caveats and pitfalls in the analysis of high-throughput data (e.g., the use and misuse of clustering techniques and the importance of methods for correcting for multiple hypothesis testing).

The next three presentations demonstrated the application of statistical and machine
learning techniques to the analysis of genomic data. Prof. Des Higgins reported on a
combination of multivariate analysis methods for analyzing high-throughput genomic data
sets. His presentation included correspondence analysis (CA) and variations on it, especially co-inertia analysis (CIA) and between-group analysis (BGA). Prof. Higgins and colleagues used these methods to find trends common to multiple gene expression data sets on the same data, and to match gene expression data sets with proteomics data sets and with transcription factor binding site data sets.

Prof. Jean-Michel Claverie reported on recent studies on chromosome conformation,
which show that chromosomes colocalize in the nucleus, bringing together active genes in transcription factories. This spatial proximity of actively transcribing genes could provide a means for RNA interaction at the transcript level. Prof. Claverie and his colleagues have screened public databases for chimeric EST and mRNA sequences with the intent of mapping transcription-induced interchromosomal interactions. They suggest that chimeric transcripts may be the result of close encounters of active genes, either as functional products or “noise” in the transcription process, and that they could be used as probes for chromosome interactions.

Prof. Tony Bjourson reported on a novel model of invasion and metastasis in human
breast cancer. This model is derived from microarray data. Prof. Bjourson and colleagues
have identified a number of key players in aggressive tumor development and used network analysis to decipher the interplay between the genes.

The following five presentations shared a common theme: network
analysis. High-throughput techniques and the massive accumulation of biological data have revealed the huge number of interacting components that make up a living cell. The relations observed in metabolism, the proteome and gene regulation form part of large-scale networks that cannot be understood by simple observation. To overcome this limitation, graph theory offers a suitable framework for characterizing and modeling these interacting multi-component systems. This is particularly true of protein-protein interaction networks, where relations can be established between network properties, such as modularity or the number of connections of an element, and the functional and structural features of the elements. In his introductory talk on this topic, Dr. Carlos Rodriguez-Caso showed that graph analysis can help elucidate the evolutionary rules that shape molecular networks.

Prof. Jean-Daniel Zucker illustrated graph/network analysis by presenting a study
that explored underlying functional mechanisms in gene co-expression networks. The
strong modularity exhibited by co-expression networks illustrates, at a genomic level, the functional partitioning of the various molecular processes that constitute cellular
environments. Such modular networks of functionally related genes and proteins most often
rely on a small number of highly connected hub nodes, which have been
shown to play key roles in modulating the expression of a large number of other genes in response to environmental changes. Prof. Zucker and colleagues proposed an original conceptual framework designed to support the assessment of functional interactions at the genomic level. It relies on abstract multiple-instance objects to represent the relationship between expression profiles and the relevant functional themes involving these transcripts.
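
The notion of a hub node in a co-expression network can be made concrete with a small sketch. The example below is an illustration only and not the framework presented by Prof. Zucker: it assumes that two genes are linked when the absolute Pearson correlation of their expression profiles reaches a chosen threshold, and it treats the gene with the most links as the hub. The gene names, the toy profiles and the threshold of 0.7 are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpression_degrees(profiles, threshold=0.7):
    """Link genes whose |correlation| >= threshold and count links per gene."""
    degree = {gene: 0 for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


# Hypothetical toy profiles: geneA is the sum of geneB and geneC,
# so it correlates with both, while geneB and geneC are uncorrelated.
profiles = {
    "geneA": [2.0, 0.0, 0.0, -2.0],
    "geneB": [1.0, -1.0, 1.0, -1.0],
    "geneC": [1.0, 1.0, -1.0, -1.0],
    "geneD": [1.0, -1.0, -1.0, 1.0],
}

degrees = coexpression_degrees(profiles)
hub = max(degrees, key=degrees.get)
print(degrees, "hub:", hub)
```

In real data the choice of threshold strongly affects which genes appear as hubs, so it would normally be justified statistically rather than fixed by hand.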

Prof. Dr. Alvis Brazma reported on the analysis and interpretation of DNA microarray
data, and the integration of publicly available data repositories.

Prof. Dr. Martin Stetter reported on methods for robust learning of very large
biomedical interaction networks. His talk revolved around Bayesian networks, which
have become a popular framework for analyzing high-dimensional data. However, learning their structure is constrained by computational limitations, so that the maximal network size is usually restricted dramatically to a small set of variables. Prof. Stetter summarized methods for robust learning of high-dimensional interaction networks, such as genetic networks or genotype-phenotype networks, and for performing inference in them.
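
One way to see why structure learning is computationally constrained is to count the candidate structures. The sketch below is not taken from the talk; it uses Robinson's recurrence for the number of labeled directed acyclic graphs on n nodes, which is the size of the search space for a Bayesian network over n variables. The function name `count_dags` is chosen here for illustration.

```python
from math import comb


def count_dags(n):
    """Number of labeled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    counts = [1]  # one (empty) graph on zero nodes
    for size in range(1, n + 1):
        total = 0
        for k in range(1, size + 1):
            # Inclusion-exclusion over the k nodes that have no incoming edges.
            sign = 1 if k % 2 == 1 else -1
            total += sign * comb(size, k) * 2 ** (k * (size - k)) * counts[size - k]
        counts.append(total)
    return counts[n]


for n in (3, 4, 5, 10):
    print(n, count_dags(n))
```

Three variables already admit 25 structures and five admit 29281, so exhaustive search becomes infeasible well before the number of variables approaches that of a typical high-throughput data set.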

Dr. Markus Ringnér reported on methods for revealing signaling pathway deregulation
in cancer using gene expression signatures. There is still a lack of methods for identifying cell signaling pathways whose deregulation results in an observed expression signature. Dr. Ringnér presented a strategy for identifying such signaling pathways and evaluated the strategy using six human and mouse gene expression signatures. Dr. Ringnér and colleagues also showed that pathway signatures defined in one data set correlate significantly with patient survival in independent data sets across multiple types of carcinoma.

Prof. Dr. Mark Girolami reported on methods for inferring the parameters and topology
of mechanistic pathway models from high-throughput data. Prof. Girolami and colleagues
developed a Bayesian methodology, which might shed new light on our understanding of
these pathways.

Prof. Dr. Geoff McMullan reported on the issues involved in the analysis of
high-throughput proteomic data, with an emphasis on data curation in the validation of
microbial proteomic experiments.

While all talks so far focused on the analysis of wet-lab high-throughput data from
genomics and proteomics, Prof. Dr. Rui Brito reported on in silico simulation data, which
represent truly massive data sets and pose unprecedented challenges, both with respect to data analysis and data management. Prof. Brito’s talk on mining protein folding and unfolding simulations highlighted these challenges. Prof. Brito and colleagues are currently investigating the mechanisms involved in protein folding/unfolding that cause amyloid diseases. Furthermore, Prof. Brito reported on an ongoing project aiming at the development of a grid-enabled, distributed data warehouse for the analysis of protein folding simulation data. Ultimately, these simulation studies could give new insights for the prediction of protein structures and have real pharmacological impact.

Results & Contributions

It is widely accepted that the challenges of modern biology can only be tackled by a truly multidisciplinary approach, which embraces the expertise of scientists from various
disciplines. This workshop was tailored for an audience with diverse academic
backgrounds, such as biology, mathematics, physics, computer science, bioinformatics and medicine. With the aim of bridging the “language barrier” that still exists between these disciplines, this workshop comprised 12 presentations of 55 minutes each, with 30 minutes for the talk and ample time (25 min) for discussions.

The workshop presented an arsenal of modern tools and methodologies from applied
statistics and machine learning that are useful for the analysis of high-throughput genomic and proteomic data. Of particular importance are methods for generating, visualizing and analyzing large graphs and networks, for example, gene-gene interaction networks. These methods represented the centerpiece of five talks.

One important insight of this workshop is that many statistical problems
encountered in early microarray studies, such as the issue of correcting for multiple
hypothesis testing, are now largely solved. However, similar problems now arise in high-throughput proteomic studies such as mass spectrometry. Many solutions proposed in the context of microarray data analysis might be transferable to the analysis of mass spectrometry data.
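
As an illustration of what correcting for multiple hypothesis testing involves, the sketch below implements the Benjamini-Hochberg procedure, one widely used correction that controls the false discovery rate. The report does not say which corrections were discussed, so the choice of procedure is an assumption, and the p-values are made up for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, e.g. one per gene or per mass spectrometry peak.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```

The same function applies whether the tests come from microarray probes or from mass spectrometry peaks, which is the sense in which such solutions may be transferable.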

The workshop provided a platform for an exchange of ideas, discussions of the state
of the art in analyzing data from functional genomics, and critical assessment of the
limitations of current approaches. The workshop has supported the consolidation of existing collaborations between the delegates and the development of new ones. These new collaborations might include guest lectures by visiting scientists, student exchange
programs, joint research grant proposals, and the organization of future scientific meetings (workshops, conferences).

Programme
Tuesday, 8th of May 2007
08:15-08:45 Registration
08:45-09:00 Welcome and Introduction
09:00-09:55 Joaquín Dopazo Methods for functional profiling of high-throughput data
09:55-10:50 Des Higgins Multivariate analysis of genes, proteins and promoters
10:50-11:45 Jean-Michel Claverie Tentative mapping of transcription-induced interchromosomal interaction using chimeric EST and mRNA data
11:45-12:00 Coffee Break in the Science Research Park
12:00-12:55 Tony Bjourson A model of breast cancer metastasis
12:55-13:55 Lunch Break in the Science Research Park
13:55-14:50 Carlos Rodriguez-Caso Networks in molecular biology
14:50-15:45 Jean-Daniel Zucker Exploring underlying functional mechanisms in gene co-expression networks
15:45-16:00 Coffee Break in the Science Research Park
16:00-16:30 Summary and Open Discussion
16:30-19:00 Informal Get-together in the Senior Common Room, University Campus
19:30 Dinner in the Bannview Restaurant, University Campus

Wednesday, 9th of May 2007
09:00-09:55 Alvis Brazma Analyzing and interpreting DNA microarray data
09:55-10:50 Martin Stetter Robust learning of very large biomedical interaction networks
10:50-11:45 Markus Ringnér Revealing signaling pathway deregulation by using gene expression signatures
11:45-12:00 Coffee Break in the Science Research Park
12:00-12:55 Mark Girolami Inferring the parameters and topology of mechanistic pathway models from high-throughput data
12:55-13:55 Lunch Break in the Science Research Park
13:55-14:50 Geoff McMullan What is the importance of data curation in the validation of microbial proteomic experiments?
14:50-15:45 Rui Brito Mining protein folding and unfolding simulations: from amyloid diseases to protein structure prediction
15:45-16:00 Coffee Break in the Science Research Park
16:00-16:30 Summary and Discussion