BIOINFORMATICS & SYSTEMS BIOLOGY
In the present era of post-genome research, the various high throughput genomics and proteomics approaches make it possible for the first time to think in terms of the construction of conceptual models of the living cell. This global and inter-disciplinary approach, designated systems biology, requires the organisation and integration of complex experimental knowledge and its expression in models with sufficient predictive capacity, therefore being closely linked to developments in computational biology. Systems biology aims to analyse the relationships among the elements in a system in response to genetic or environmental perturbations, where a biological system may encompass molecules, cells, organs, individuals, or even ecosystems. Systems approaches have already been applied to a number of fundamental biological processes, such as the modelling of the yeast galactose metabolic pathway and embryonic development of the sea urchin. Most recent work includes the comparison of biological and non-biological networks at the level of the organisation of their regulatory components, from evolutionary or functional viewpoints.
Systems biology acts as a natural link between experimental functional genomics activities and computational biology and bioinformatics. Relevant computational steps are the organisation of complex genomics information, modelling of the biological system, prediction of properties of uncharacterised systems components, and the generation of hypotheses concerning the reaction of the system to external perturbation. The computational aspects of systems biology require a combination of classical developments in bioinformatics, followed by the application of mathematical modelling tools. This unique combination of expertise is inherently multidisciplinary and requires input from biomedical experimental research, high throughput genomics and proteomics technologies, bioinformatics, mathematics, computer sciences, physics and engineering.
Bioinformatics aspects of Systems Biology: Handling and integration of the basic information for systems biology
A number of computational developments are essential for the aspects related to basic data information, preceding the work in systems biology and modelling. The organisation and integration of the molecular information underlies the development of genomics and biomedicine, and builds the basic data necessary for the work on computational systems biology. In particular, we will focus on the definition of standards for different experimental information and ways of integrating data provided by the various technologies. We propose to focus on the following emerging topics and new technical initiatives:
Basic assessment of molecular function. It is still the case that 20% of proteins are of unknown function, even in the best characterised systems. Approaches based on sequence searches are commonly used, but they cannot extend functional annotation beyond detectable sequence similarity and tend to introduce errors that are difficult to correct later. Assessment by experiment remains a major challenge, and it has been difficult to assemble consortia to carry out systematic function determination. An alternative is provided by methods related to the concept of functional neighbourhood: computational methods that combine information about gene order, conservation and organisation with standard sequence searches are a very interesting route for the prediction of function, and one that will be explored during the next few years.
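The functional neighbourhood idea can be illustrated with a minimal sketch. It assumes that each genome is already available as an ordered list of orthologous-group identifiers (the identifiers, genome names and the min_genomes threshold below are hypothetical), and it simply counts how many genomes keep two groups adjacent. It is not the method of any particular published tool; real approaches also weigh phylogenetic distance, strand and operon structure.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def neighbour_pairs(gene_order: List[str], window: int = 1) -> Set[Pair]:
    """Return unordered pairs of groups lying within `window` positions of each other."""
    pairs: Set[Pair] = set()
    for i, first in enumerate(gene_order):
        for second in gene_order[i + 1 : i + 1 + window]:
            if first != second:
                pairs.add(tuple(sorted((first, second))))
    return pairs


def conserved_neighbours(
    genomes: Dict[str, List[str]], min_genomes: int = 2, window: int = 1
) -> Dict[Pair, int]:
    """Keep pairs that are neighbours in at least `min_genomes` genomes."""
    counts: Counter = Counter()
    for order in genomes.values():
        # Each genome contributes at most one vote per pair.
        counts.update(neighbour_pairs(order, window))
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


# Hypothetical gene orders, expressed as orthologous-group identifiers.
example_genomes = {
    "genome_1": ["og1", "og2", "og3", "og4"],
    "genome_2": ["og5", "og2", "og3", "og1"],
    "genome_3": ["og3", "og2", "og6", "og4"],
}

links = conserved_neighbours(example_genomes, min_genomes=2)
```

In this toy data set only the pair (og2, og3) stays adjacent in more than one genome, so an uncharacterised og3 would be a candidate for sharing a functional context with og2. Such a link is a hypothesis to combine with sequence evidence, not an annotation in itself.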
Ontologies. Related to the exploration of protein function are the problems of classification and description (i.e. ontologies), which include the corresponding definitions of function at various levels (molecular, cellular), associated with experimental information. While practical aspects of function definition have been addressed by the Gene Ontology (GO) consortium, there are still many open questions in different species and systems, including the conditions under which it is possible to transfer functional information from model organisms to other species, and the definition of function in areas related to the molecular basis of disease. At the same time there is a great need to integrate research done at the more theoretical level in the definition, building, maintenance (congruence) and comparison of ontologies, with the practical work in the construction of classifications such as GO. An associated aspect is the linkage of the classification of genes and proteins provided by ontologies with the corresponding molecular information stored in databases (sequences, variants, structures, etc.) and in the literature. The separation between the classifications, molecular data and experimental (bibliographic) information remains one of the chief obstacles to the growth and maintenance of ontologies. The initial work of the Biocreative consortium in assessing the capacity of text mining to extract information about molecular function from textual sources is a good example of the type of new activities being developed in this area.
Graphical data and life science image processing. Various microscopy technologies have increased the capacity to generate images of biological processes in cells, organs and individuals. Unfortunately, images are difficult to handle in a systematic way by databases, and their integration with the underlying molecular, anatomical, pathological and physiological information remains a significant challenge. Images are more difficult to organise than common genomic information, and labelling and classification are very much dependent on manual work. Common standards for the exchange of organised information between different repositories must be developed to interface between experimental biologists and bioinformaticians.
Technologies for connecting primary biological databases. The most obvious of these are SRS, for the integration of and access to databases, and DAS, the basic technology used by the ENSEMBL human genome annotation platform for accessing and integrating annotations provided by different groups around the world. Alongside data integration, current work on the integration of methods provided by web servers is equally important. A technology that is providing a basic level of integration is Biomoby, which is emerging as a standard for the connection of information servers in molecular biology. Projects related to the grid concept provide an advanced means for the integration of services, in particular the Mygrid project, which has developed the initial tools for the distribution of services in biomedicine.
Computational Systems Biology (CSB)
This relates to the use of molecular and cellular information for the computational modelling of biological networks, including the analysis of network properties, dynamics and possible simulation. Large-scale experimental approaches have made possible the first descriptions of genetic regulatory networks and protein interaction maps as ‘directed graphs’. Their properties can be studied with a variety of methods. These include mathematical frameworks known as ‘Bayesian graphs’, able to model the dynamic aspects of networks; more qualitative approaches, such as the ‘generalised logical formalism’; and ‘stochastic equations’, which take into account the explicit probability of molecular encounters and are therefore adequate to reproduce the properties of systems with very low numbers of components (e.g. proteins expressed in low quantities). All of these require careful analysis and consideration in the context of the experimental work on the corresponding systems, with the aim of combining data and models that, through subsequent testing and simulation, can lead to new experiments and to further refinement.
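As a minimal sketch of how a regulatory network described as a directed graph can be analysed with a logical approach, the example below encodes a hypothetical three-gene negative feedback loop and follows its dynamics under a synchronous Boolean update. The genes A, B and C, their interactions and the update rules are illustrative assumptions. The generalised logical formalism itself allows multi-level variables and asynchronous updates, which this simplified Boolean version does not attempt to reproduce.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]

# Hypothetical regulatory network as a directed graph: (source, target, effect).
EDGES: List[Tuple[str, str, str]] = [
    ("A", "B", "activates"),
    ("B", "C", "activates"),
    ("C", "A", "represses"),
]

# One logical rule per gene, written to match the edges above.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["C"],  # A is on unless its repressor C is on
    "B": lambda s: s["A"],      # B follows its activator A
    "C": lambda s: s["B"],      # C follows its activator B
}


def in_degree(gene: str) -> int:
    """Number of regulators acting on `gene` (a simple graph property)."""
    return sum(1 for _, target, _ in EDGES if target == gene)


def step(state: State) -> State:
    """Synchronous update: every gene reads the previous state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def find_attractor(initial: State, max_steps: int = 100) -> List[State]:
    """Iterate until a state repeats and return the cycle of states reached."""
    seen: List[State] = []
    state = dict(initial)
    for _ in range(max_steps):
        if state in seen:
            return seen[seen.index(state):]
        seen.append(state)
        state = step(state)
    raise RuntimeError("no attractor found within max_steps")


attractor = find_attractor({"A": True, "B": False, "C": False})
```

With these rules the system never settles into a fixed state: it cycles through six states, which is the qualitative signature of a negative feedback loop. A stochastic treatment of the same loop would instead track molecule counts and reaction probabilities, which matters when the components are present in low numbers.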
European needs in the area of CSB have been discussed extensively in a document entitled ‘Workshop Report on Computational Systems Biology (CSB) - Its Future in Europe’, prepared for the EC Research Directorate General, Directorate F - Health Research. The document describes the current state of the discussion in Europe. This programme will contribute to some of the key needs it identified, relating to standards, scientific interchange and teaching. The following points were identified as key items to be addressed:
Fragmented nature of research in Europe. CSB research in Europe is conducted in multiple locations, with little coordination and inadequate funding. Fragmentation affects research and funding across different countries, as well as know-how and approaches across different scientific fields. Through this programme, a network of workshops and courses will be organised in which representatives of different scientific fields, institutions and countries can be brought together for information exchange. In turn, this activity will increase the awareness of the different national (and international) institutions regarding their position in CSB.
General modelling requirements. A major post-genome challenge is to advance from sequences to complete understanding of gene function and biological processes. A key priority is the development of Europe-wide initiatives to create and integrate relevant databases and analysis software, enabling systems-level interpretation of complex experimental data in functional genomics. Research projects should focus on integrated modelling of cellular processes leading to as complete an understanding as possible of the dynamic behaviour of the cell.
Model organisms as data sources, with a focus on disease. At the current stage in the development of CSB, results from all biological systems and model organisms are relevant, but some have been particularly well studied and are able to provide the full range of data needed for modelling. Potential single cell systems include S. cerevisiae, B. subtilis, E. coli and filamentous fungi. Multicellular models could include any of the ‘standard’ organisms, depending on the data available, e.g. mouse, rat, zebrafish, worm, Arabidopsis, mosquito, fly, plus human cells relevant to particular health aspects, e.g. neurones, hepatocytes, heart, etc.
Standardisation of in vitro / in vivo experiments and their data. It is often the case that in vitro and in vivo data for CSB modelling are inconsistent, inaccessible, incomplete or unstructured. Experiments should be designed to take account of the standards already defined and in place for data collection, storage in databases and analysis, and should be consistent with further modelling. This process is already under way for a range of data at the bioinformatics level, and needs to be extended to make it applicable to CSB analysis, making full use of standard ontologies and controlled vocabularies.
Standardisation of databases, software and modelling. A separate standardisation issue relates to the computational software and modelling procedures themselves. Standard computer platforms are being developed as part of a number of worldwide projects.
Data required beyond present 'omics (genomics, proteomics, metabolomics, etc.). Genome, transcriptome, proteome and metabolome studies currently dominate large-scale functional analyses. A missing link is the capacity to observe the output of true units of function. Such functional data may comprise the precise cellular localisation of proteins, their interaction in supramolecular structures or reliable protein-protein interaction data. Capturing and modelling dynamic properties, with time course and spatial distribution, is important. In addition, most data generated today give relative values, while it is more difficult, but necessary, to generate data with absolute levels. Although all biological data should be collected with bioinformatics and CSB analysis in mind, key types of quantitative data are becoming available which support CSB and which require special attention for standardisation and analysis. These include gene expression and transcription (microarrays), protein-protein interaction (mass spectrometry, two-hybrid analysis), genetic analysis and mutations (knockouts), comparative genomics (bioinformatics analysis), metabolic flux analysis (13C-labelling), and in vivo imaging (e.g. time-lapse microscopy).
Training requirements. Training in bioinformatics remains a concern of European dimensions. The current programme has contributed to progress through the organisation of courses, and we can now do more in collaboration with the FP6 Networks of Excellence and Integrated Projects, focusing on ‘training the trainers’. This will be possible by combining general and specialised courses in collaboration with Networks such as BioSapiens, and we will actively seek similar alliances with other projects.
Increasing excellence in experimental research projects via bioinformatics and CSB. While this programme does not have the ability to address directly the development of research projects, we propose to contribute to the creation of a framework for discussion and scientific exchange in systems biology. In particular, we will foster activities related to the interchange of standards between databases of different types. These include the protein interaction and pathway databases, joining the current efforts led by HUPO, as well as the model organism databases and those developed by the ongoing European projects in this area, including COMBIO, EMI-CD and others. Our workshops and courses will address these points. Issues related to fragmentation will be a particular focus. We also propose to include a number of topics related to the basic technology, including:
Standards for systems biology, such as languages and simulation techniques.
A new proposal for a network of software for systems biology.
Incorporation of information from small molecules into biological contexts.
Standards for access to disease information at various levels, including medical records.
Combination and integration of functional genomics data to derive new biological knowledge. This remains an area where additional efforts are still very much needed. We propose to explore the possibilities of this new field of knowledge discovery in biology.