Data management: Databases, interfaces and ontologies

Co-ordinators:
Günther Zehetner, Max Planck Institute for Molecular Genetics, Berlin, Germany
Thure Etzold, LION Bioscience, Cambridge, UK

This area addresses the development of new guidelines and directions for data management in a form suited to rational use. Bioinformatics has become an integral part of almost all projects within genome research. Data acquisition, data analysis and databases to store and search for data are the main areas where informatics supports the researcher. Informatics is essential at various steps throughout an experiment, especially for high-throughput approaches producing large amounts of complex data, such as hybridisations of complex probes to high-density membranes or microarray chips, expression profiling experiments or protein 2-D analysis. Specific examples include image analysis to identify the positions and densities of spots after hybridisation, conversion of captured images to digital values, statistical analysis, comparison of data points and normalisation of data sets, data display, and many more.

Currently there are a number of deficiencies. In contrast to structural genomics, where many software tools and databases exist, only a few sophisticated tools have been developed so far for the complex data arising from expression profiling studies: two European databases are in advanced development, one at the European Bioinformatics Institute (Cambridge) and another at the Max Planck Institute for Molecular Genetics (Berlin). Furthermore, no major database exists to deal with the large amounts of experimental data from different functional genomic projects (ranging from DNA-based to protein-based techniques, from knock-in/out mice to in situ hybridisations), although a small number of specialised databases for certain expression data have been developed. Despite several attempts around the world to build databases specifically for the management of biomolecular interactions, no standard has yet emerged. In general, more programs are available for the early stages of experiments, such as the detection and quantification of spots on microarrays or 2-D gels, than for the efficient and user-friendly management, analysis and display of large complex data sets.

The incorporation of known functional information into databases at various levels is thus a pressing need requiring the combined efforts of experimentalists, computational biologists and database developers. The challenge is particularly formidable as new types of information, such as tissue and organ gene expression patterns on a genome scale and numerous data on protein interactions, post-translational modification and protein structure, are rapidly becoming available. The continuous flow of information also requires 'update' and 'awareness' tools that filter and incorporate incoming data (sequences, literature, etc.) and new applications (servers, methods). These tools will systematically integrate the filtered information into dynamic databases.

Central questions related to the management of information in functional genomics are the accessibility of the data, the classifications and ontologies used, and the identification of errors. The variations in database formats and technologies add significantly to the already complex task of fruitfully accessing the available data, which are distributed over many different sites. Even though many databases directly or indirectly reference data stored elsewhere, these links are difficult to exploit because of the large differences between individual database implementations. A standardisation of formats, or at least an agreed interface for database interconnectivity, would greatly alleviate these problems. Moreover, there is a need to improve the accuracy of existing databases, since not all the data are accurate and most are likely to be less reliable than the gene sequences themselves. Biologists must be able to rely on up-to-date, accurate information on topics such as gene expression patterns.

Without getting involved in maintaining a database as such, the programme will aim to support efforts to achieve a common subset of ontologies, classes and structures that are best able to store and represent the experiments and resulting data in the areas involved. It will also focus on identifying or defining general ways of presenting and visualising the complex data, probably in a graphical format. This is particularly important for 'wet lab' workers who want to analyse their data and are often unhappy with available databases, which they may not find easy to use at a practical level. We would seek to establish guidelines for the development of such interfaces, taking into account the specific needs of experimentalists.

At the level of European scientific infrastructure, the computer requirements for functional genomics should be identified and a proposal made for how the facilities may be provided most effectively. Another issue is the economics of bioinformatics resources: the costs and effectiveness of service when linking to a national, regional or local server should be compared with those of establishing local copies of databanks (costs of equipment, technical support staff). The value of shared access to software and licensing should also be investigated. It will be important to understand the problems in data archives and how to disseminate information to scientists at the bench most effectively. Similarly, we need to identify the problems of multiple copies of databanks installed all over Europe, existing as they do in different versions and at different update levels. Possible solutions can be discussed, e.g. provision of software to analyse what is present locally and to produce a report recommending updating if appropriate.
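
The idea of software that analyses what is present locally and reports on outdated copies could be sketched roughly as follows. This is only an illustration under stated assumptions: the databank names, the integer release numbers and the function `outdated_copies` are all hypothetical, and a real tool would have to read release information from each databank's own metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabankCopy:
    """A locally installed databank copy (hypothetical record layout)."""
    name: str
    release: int  # assumed: releases are numbered with increasing integers


def outdated_copies(local, reference):
    """Return (name, local_release, latest_release) for each stale local copy.

    `reference` maps a databank name to its latest known release. Databanks
    that are not listed in `reference` are skipped, not reported.
    """
    report = []
    for copy in local:
        latest = reference.get(copy.name)
        if latest is not None and copy.release < latest:
            report.append((copy.name, copy.release, latest))
    return sorted(report)


# Made-up example data, for illustration only.
local = [DatabankCopy("databank_a", 40), DatabankCopy("databank_b", 12)]
reference = {"databank_a": 42, "databank_b": 12}

for name, have, latest in outdated_copies(local, reference):
    print(f"{name}: local release {have}, latest release {latest}, update recommended")
```

Keeping the comparison separate from the reporting means the same check could feed either a printed report or an automated update request.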

Contacts within the programme
John Armstrong
Francisco Azuaje
Amos Bairoch
Cyrus Chothia
Werner Dubitzky
Thure Etzold
Jay Hinton
Benoit Leblanc
Marie-Paule Lefranc
Rune Linding
Antoni Matilla
Syed Asad Rahman
Isabel Rojas
Susanna-Assunta Sansone
Peter Savic
Gavin H. Thomas
Paul van der Vet
Ulrike Wittig
Marc Zabeau
Günther Zehetner