Ortholog Databases
Cambridge, UK, Date to be confirmed, 2009

 

Organisers

Erik Sonnhammer, Stockholm Bioinformatics Center, AlbaNova University Center, Stockholm University and Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
Michael Ashburner, University of Cambridge, UK


Introduction

Orthologs are genes in different organisms that originate from a single gene in the last common ancestral species. For this reason, orthologs are more likely than other homologs to share the same function. When comparing complete proteomes, it is therefore of primary interest to identify the orthologs while excluding homologs that are not orthologous.

To date, at least 15 ortholog databases exist. This diversity arises because different research groups have focused on different species, different methods, and different levels of resolution. Some have prioritised sensitivity, while others have minimised the error rate. Several reviews have compared ortholog databases: Alexeyenko et al. (2006) compared 15 databases, and Dolinski and Botstein (2007) compared nine. Hulsen et al. (2006) benchmarked six different orthology assignment methods. Although these reviews give a general picture of the advantages and disadvantages of each method, they were hampered by the lack of standards and reference data sets. Quantitative and qualitative assessment of ortholog databases needs to improve so that they can be compared objectively.

The Model Organism Databases (MODs) are probably the most important users of ortholog databases. The MOD for a completely sequenced genome needs to cross-reference other MODs, and the biologically most relevant way to do this is via orthologs. At present, a few MODs, e.g. FlyBase and WormBase, have implemented pipelines to incorporate ortholog links. Many MODs have not been able to do this, however, partly because of the diversity of ortholog databases and the lack of standards. This two-day workshop will bring together ortholog database providers and MODs in order to exchange information and to promote the development of better standards and better tools for analysing and processing orthology information. The expected outcome of the meeting is agreement between the providers of ortholog datasets on both (i) the input protein datasets to be used for analysis, and (ii) a common output data format. This would be enormously helpful to the "consumers" of these datasets, namely the Model Organism Databases.

Current ortholog databases include: InParanoid, COGs, OrthoMCL, EGO, eggNOG, OMA, HomoloGene, TreeFam, PPOD, PANTHER

The most relevant MODs and related resources are: FlyBase, WormBase, MGI, SGD, dictyBase, Gramene, TAIR, PlasmoDB, Gene Ontology

Major sequence databases: UniProt, Ensembl, NCBI

References
Alexeyenko A, Lindberg J, Perez-Bercoff A, Sonnhammer ELL. Overview and comparison of ortholog databases. Drug Discovery Today: Technologies. 2006, 3:137-143
Dolinski K, Botstein D. Orthology and functional conservation in eukaryotes. Annu Rev Genet. 2007, 41:465-507
Hulsen T, Huynen MA, de Vlieg J, Groenen PM. Benchmarking ortholog identification methods using functional genomics data. Genome Biol. 2006, 7:R31

Invited Speakers

Erik Sonnhammer, InParanoid
Michael Ashburner, FlyBase
Eugene Koonin, COGs
David Roos, OrthoMCL, PlasmoDB
John Quackenbush, EGO
Peer Bork, eggNOG
Gaston Gonnet, OMA
Richard Durbin, TreeFam
David Botstein, PPOD
Lincoln Stein, Gramene
Judith Blake, MGI
Michael Cherry, SGD
Erik Just, dictyBase
Robert Poole, TAIR
Amos Bairoch, UniProt
Paul Thomas, PANTHER
Suzanna Lewis, Gene Ontology
Ewan Birney, Ensembl
David Lipman, NCBI

Draft Programme

The programme will be available shortly.

Venue

Details of the venue will be available shortly.

Registration

Registration is not yet open.