2nd BioCreAtIvE Challenge Evaluation:
assessment of text mining methods in molecular biology

Madrid, Spain, 23 - 25 April 2007

 

Organisers
Martin Krallinger, CNIO, Madrid, Spain
Alfonso Valencia, CNIO, Madrid, Spain

Lynette Hirschman, MITRE, Bedford, MA, USA

Introduction

_____________________________________________________________________________

The biomedical literature contains functional characterizations of genes and proteins and is the main information source for biological database annotations. The growth of scientific literature databases such as PubMed, together with the biology community's increasing demand for more efficient information access, has led to methods that can automatically process collections of biological texts. Text mining aims to efficiently retrieve and classify documents in response to complex user queries and to perform a deeper analysis of the literature to extract specific associations, such as protein-protein interactions and protein annotations.

BioCreAtIvE is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. BioCreAtIvE arose out of the needs of working biologists, biological curators and bioinformaticians to access the wealth of information in the literature, and to link this information to biological databases and ontologies. BioCreAtIvE focuses on the comparison of methods and community assessment of scientific progress, rather than on the purely competitive aspects. BioCreAtIvE is organized through collaborations between text mining groups, biological database curators and bioinformatics researchers.

The Second BioCreAtIvE challenge will be held during October 2006, with the workshop to be held in April 2007. It will consist of three tracks:
•  Gene Mention: finding the mentions of genes and proteins in sentences drawn from MEDLINE abstracts.
•  Gene Normalization: producing a list of the EntrezGene identifiers for all the human genes/proteins mentioned in a collection of MEDLINE abstracts. This task is similar to BioCreAtIvE I Task 1B.
•  Protein-Protein Interaction: a new advanced task on protein interaction detection, coordinated by our group at the CNIO in collaboration with two of the main protein interaction databases (MINT and IntAct). The complexity of the first large-scale proteomics experiments makes the extraction of experimentally validated interactions from text a key, rapidly developing area. A number of text mining tools are already accessible to the community, including the popular iHOP system developed by our group, which has captured the interest of both biologists and biological database developers. The BioCreAtIvE protein interaction challenge will include the detection of articles containing information relevant to protein interactions, the detection of actual protein interaction pairs, and the corresponding text evidence. More than 55 teams are already training their systems for the protein interaction task, which will be evaluated using a test collection released in October 2006. We will then be in the best position to assess their performance and estimate the capacity of current text mining systems.

Venue

The meeting will be held at the Centro Nacional de Investigaciones Oncológicas (CNIO), located in northern Madrid.

Registration

Registration is closed.

Draft Programme

A PDF of the programme can be downloaded from here.

Sunday 22nd April
20:00 Get together dinner.

Monday 23rd April
9:30-10:30 Welcome and Introduction to Second BioCreAtIvE challenge
10:30-11:15 Session 1: Detection and Evaluation of Gene Mentions
11:30-13:30 Selected participants of the Gene Mention Task (6 presentations)
13:00-14:30 Lunch
14:30-15:10 Organized discussion: Current state and future direction of NER in Biology
15:25-15:50 Invited Speaker
15:50-16:50 Poster session I: Gene Mention task
16:50-17:50 BioCreAtIvE and the RegCreative jamboree: Mining Genetic Interactions
20:30 Dinner

Tuesday 24th April
9:00-10:00 Session 2: Introduction and Evaluation of Gene Normalization task
10:00-11:00 Selected participants of the Gene Normalization Task (3 presentations)
11:20-12:30 Selected participants of the Gene Normalization Task (3 presentations)
12:30-13:10 Organized discussion: Current state and future direction of Gene Mention and Normalization
13:10-14:10 Lunch
14:10-14:30 Invited speaker
14:30-15:30 Poster session II: Gene Normalization task
15:30-15:50 Invited speaker
15:50-16:20 Session 3: Introduction to Protein-Protein Interaction task
16:20-16:50 The Interaction-Article Sub-Task evaluation
16:50-18:20 Selected participants of the Interaction-Article Sub-Task (4 presentations)
20:30 Dinner

Wednesday 25th April
9:15-10:00 The Interaction-Pair Sub-Task evaluation
10:00-11:00 Selected participants of the Interaction-Pair Sub-Task (4 presentations)
11:20-11:40 Invited speaker
11:40-12:10 The Interaction-Method Sub-Task evaluation
12:10-13:10 Selected participants of the Interaction-Method Sub-Task (4 presentations)
13:10-14:20 Lunch
14:20-14:40 Invited speaker
14:40-15:40 Poster session III: Protein-Protein Interaction Task
15:40-16:10 The Interaction-Sentence Sub-Task evaluation
16:10-17:10 Selected participants of the Interaction-Sentence Sub-Task (4 presentations)
17:10-18:00 Organized discussion:
18:00-18:30 Closing Open Public Lecture.

Speakers

Gianni Cesareni, Dept of Biology, University Rome Tor Vergata, Rome, Italy
Aaron M. Cohen, School of Medicine, Dept of Medical Informatics and Clinical Epidemiology (DMICE), Oregon Health & Science University, USA
Matthew Day, Database Publisher, Nature Publishing Group, UK
Lynette Hirschman, MITRE Corporation, USA
Samuel Kerrien, EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
Martin Krallinger, Structural and Computational Bioinformatics Programme, Spanish National Cancer Centre, CNIO, Spain
Suzanna Lewis, Director of Bioinformatics, Berkeley Drosophila Genome Project, USA
Alex Morgan, Biomedical Informatics Programme, Stanford University, USA
Jun'ichi Tsujii, Dept of Computer Science, University of Tokyo, & University of Manchester, UK, & Director, National Center for Text Mining, UK
Alfonso Valencia, Head of Structural and Computational Bioinformatics Programme, Spanish National Cancer Centre, CNIO, Spain
John Wilbur, Computational Biology Branch, National Center for Biotechnology Information, NCBI, NIH, USA

Sponsors

The organisers are grateful to the CNIO and the ESF for supporting this meeting.

 

 
