Ortholog Databases
5-6 July 2009
Cambridge, UK

Organisers
Report
1. Summary
2. Scientific content
3. Assessment of the results & impact of the event

Organisers:

Erik Sonnhammer: Stockholm Bioinformatics Center, AlbaNova University Center, Stockholm University and Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
Michael Ashburner: University of Cambridge, UK

Draft Report

Summary

Orthologs are genes in different organisms that originate from a single gene in the last common ancestral species. Therefore, orthologs are more likely to have the same function than other homologs. When comparing complete proteomes it is thus of primary interest to identify the orthologs while excluding homologs that are not orthologous.
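As an illustration of this definition only (not a method presented at the workshop), the sketch below classifies gene pairs in a toy gene tree whose internal nodes are already labelled as speciation or duplication events. Two genes are treated as orthologs when their last common ancestor in the tree is a speciation node, and as paralogs when it is a duplication node. The `Node` class, the function names, and the gene names are all hypothetical, and real ortholog databases use far more elaborate inference.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Node in a toy, pre-labelled gene tree (hypothetical structure)."""
    name: str
    # "speciation" or "duplication" for internal nodes; None for leaves (genes).
    event: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, gene: str) -> Optional[List[Node]]:
    """Return the nodes from the root down to the named leaf, or None if absent."""
    if not root.children:
        return [root] if root.name == gene else None
    for child in root.children:
        sub = path_to(child, gene)
        if sub is not None:
            return [root] + sub
    return None


def relationship(root: Node, gene_a: str, gene_b: str) -> str:
    """Classify two distinct genes by the event at their last common ancestor."""
    if gene_a == gene_b:
        raise ValueError("expected two different genes")
    path_a = path_to(root, gene_a)
    path_b = path_to(root, gene_b)
    if path_a is None or path_b is None:
        raise ValueError("gene not found in tree")
    lca = None
    # Walk both root-to-leaf paths in step; the last shared node is the LCA.
    for a, b in zip(path_a, path_b):
        if a is not b:
            break
        lca = a
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Toy tree: an ancient duplication produced copies A1 and A2, and each copy
# was later split by the (assumed) human/mouse speciation.
tree = Node("D0", "duplication", [
    Node("S1", "speciation", [Node("human_A1"), Node("mouse_A1")]),
    Node("S2", "speciation", [Node("human_A2"), Node("mouse_A2")]),
])

print(relationship(tree, "human_A1", "mouse_A1"))  # LCA is S1 (speciation)
print(relationship(tree, "human_A1", "mouse_A2"))  # LCA is D0 (duplication)
```

In this toy tree, `human_A1` and `mouse_A2` are homologs in different species but not orthologs, which is the kind of pair an ortholog database aims to exclude.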

To date, at least 15 ortholog databases exist. The reason for this diversity is that different research groups have focused on different species, different methodology, and different resolution. Some have prioritised sensitivity while others have minimised the error rate. Several ortholog databases have been reviewed. Alexeyenko et al. (2006) compared 15 databases and Dolinski and Botstein (2007) compared 9. Hulsen et al. (2006) compared six different orthology assignment methods. Although these reviews give a general picture of the advantages and disadvantages of each method, they were hampered by the lack of standards and reference data sets. There is a need to improve quantitative and qualitative assessment of ortholog databases, in order to compare them objectively.

The Model Organism Databases (MODs) are probably the most important users of ortholog databases. The MOD for a completely sequenced genome needs to cross-reference other MODs, and the biologically most relevant way to do this is via orthologs. At present, a few MODs, e.g. FlyBase and WormBase, have implemented pipelines to incorporate ortholog links. Many MODs have not been able to do this, however, partly due to the diversity of ortholog databases and the lack of standards.

The two-day workshop brought together ortholog database providers and MODs in order to exchange information and to promote the development of better standards and better tools for analysing and processing orthology information. The outcome of the meeting was agreement among the providers of ortholog datasets on both (i) the input protein datasets to be used for analysis, and (ii) a common output data format. Both would be enormously helpful to "consumers" of these datasets, primarily Model Organism Databases and protein function annotators.

Scientific Content

Each delegate presented his or her approach to the orthology problem. The scientific content took the form of original algorithms, comparative studies of multiple algorithms, or approaches to using orthology for improved protein function annotation. The panel discussions focused mostly on interoperability issues, i.e. how to create standardized proteome sets and formats. Standardized benchmarks were also seen as a great advantage.

For easy communication, a Google Group called “Quest for orthologs” was set up. It was agreed that all speakers should upload their presentations there.

A meeting report has been published in Genome Biology: Gabaldón T, Dessimoz C, Huxley-Jones J, Vilella AJ, Sonnhammer EL, Lewis S. “Joining forces in the quest for orthologs”. Genome Biol. 2009 10:403.

Assessment of the results & impact of the event on the future direction of the field

The main result is that the orthology community will become much more coherent. Previously, most groups would develop their own method, apply it to their own datasets, and make the results available in their own format. This made it impossible to compare different approaches, and very hard to integrate results from different groups. The workshop made it clear that the field needs to standardise data and formats, and the participants also agreed on how to do this.

The impact on orthology consumers is that they will be able to utilise a wide range of orthology databases with a relatively small effort.

The impact on orthology providers is that they will be able to access prepared standardised datasets and no longer will have to painstakingly assemble them themselves.

The impact on comparative studies is that it will finally be possible to compare different orthology algorithms using the same data, and it will become much less labor-intensive to do so.

Future direction: This workshop marked the start of a global collaboration between orthology researchers. The agreements were mostly on a high level, outlining broad goals and general properties of the data exchange facilities. The details remain to be defined, and the “Quest for orthologs” Google Group will be used to discuss proposed solutions.

It was agreed to arrange a follow-up meeting in about a year.