Computational Challenges of the Next Generation Sequencing
15-16 January 2009
Uppsala, Sweden

Organisers
Report
1. Summary
2. Scientific content
3. Assessment of the results & impact of the event
4. Programme

Organisers:

Jan Komorowski, Uppsala University, Sweden
Philipp Bucher, Swiss Institute of Bioinformatics, Lausanne, Switzerland

Draft Report

Summary

Next generation sequencing technologies have started to generate a new burst of biological data of different types. This symposium brought together leading bioinformatics and computational biology experts with first-hand experience in the management and analysis of the data generated by these new technologies. World-leading experts in their respective fields presented talks on the most promising application areas of next generation sequencing (NGS), including genome re-sequencing, transcriptome analysis, chromatin immunoprecipitation (ChIP-Seq), and DNA-methylation profiling. The format of the symposium provided substantial time for moderated and informal discussions, which benefited from very active participation by young scientists, including Ph.D. students. The panel discussion also left room for discussion of organisational and political issues by senior scientists holding decision-making positions regarding infrastructure investments and the choice of future research directions. Based on feedback from participants, we conjecture that a number of new collaborations were initiated at this meeting, mostly between users of the technologies and bioinformatics experts. Overall, the conference largely confirmed our assumption that next generation sequencing will have a revolutionary impact on biology, and that the challenges for computational biologists will be immense. At the same time, we were impressed by the rapid response of the European bioinformatics community to this new challenge. Without exaggeration, we can say that virtually every speaker presented brand-new ideas and results that we had not heard before. In summary, we feel that this conference was more than timely and fulfilled its promise of giving a snapshot of the state of the art in NGS data analysis and of bringing together people with complementary experience.

Scientific Content

We were lucky in that we were able to invite excellent speakers and world leading experts for virtually all areas we intended to cover. A brief summary of the main messages and highlights from the presentations follows.

Eran Segal gave a fascinating talk on nucleosome positioning in yeast based on the computational analysis of ChIP-Seq (chromatin immunoprecipitation combined with NGS) data. The work presented constitutes an impressive demonstration of the power of integrating NGS data analysis with classical bioinformatics approaches to study DNA regulatory sequences. Claes Wadelius presented additional examples of new discoveries made possible by the ChIP-Seq technique. In particular, he showed how this method can be used to analyse the in vivo function of regulatory SNPs in mammalian genomes. Esko Ukkonen presented results from a joint project with Jussi Taipale aimed at defining the binding specificity of human transcription factors using SAGE/SELEX and NGS. His talk focused on the highly non-trivial computational problem of extracting a sequence motif from hundreds of thousands of sequences containing a binding site for the same transcription factor. Thomas Down gave a technical presentation of a state-of-the-art Bayesian algorithm for genome-wide inference of the methylation status of CpG dinucleotides from bisulfite sequencing data.

Three talks illustrated the potential of NGS for the quantification and discovery of RNA molecules. Stefan Haas presented convincing evidence that RNA-Seq is more sensitive and accurate than microarray-based techniques in measuring the expression levels of all mRNAs in a given cell type. Richard Sandberg focused on the detection of new splice variants. Finally, Frank Schwach explained how deep sequencing can be used to discover novel microRNAs and other non-coding RNAs. All three speakers explained in a didactic fashion the computational issues involved in the analysis of the data and discussed the pros and cons of currently available methods and software tools.

NGS technology makes it possible to sequence multiple genomes from the same or closely related species at a very low price. Leif Andersson's talk was an impressive illustration of what can be learned about the evolution and function of a domestic animal (chicken) by re-sequencing individuals from different strains. Ed Green reported fascinating results from a mammoth sequencing project, explaining the very intricate problems one faces when trying to reconstruct a genome from short sequence reads of damaged ancient DNA samples. The last speaker, Erik Bongcam-Rudloff, presented the computational challenges of NGS from the viewpoint of a bioinformatics support provider in a provocative and enlightening manner.

The symposium ended with a panel discussion. The panel was composed of senior researchers from experimental and computational biology as well as a graduate student representative. The following four issues were discussed:

Data sharing: Opinions differed on whether raw data or derived data should be stored and exchanged. However, there was agreement that the question is important and that it would be desirable to reach a consensus on this matter. There was also general consensus that proper annotation of data sets is important and that standards like MIAME for microarray data should be proposed. On a technical note, it was mentioned that sets of related genomes could be published in a compressed format in which only the differences from a reference genome are recorded.

Education issues: It was recognised that the users of NGS technology often do not know the methods, computer programs and computational infrastructure necessary to interpret the data. At most universities there is at least one person who knows something about the bioinformatics analysis of NGS data, but the know-how typically does not spread to other users. The graduate student representative proposed regular informal meetings of all graduate students and postdoctoral fellows involved in NGS projects for the exchange of technical tips.

Political issues: Should researchers be forced to archive data or to submit data to public repositories? It was mentioned that data storage costs may exceed the cost of an individual experiment. The obligation to archive raw data (images) could significantly increase the costs of a project. Moreover, running a public archive of NGS data may also turn into an expensive operation. The question is therefore highly relevant to funding agencies. The processing of primary data into more compact derived data (peak positions in the case of ChIP-Seq, mRNA expression levels, etc.) may also be computationally expensive. The conclusion was that funding agencies and research centres should be aware of the computational costs of NGS-based research, and that investments in computational infrastructure will be necessary to ensure good use of the new technology.

Bioinformatics analysis: What holds for laboratory equipment and computational resources also holds for manpower. Experiments are quick, whereas bioinformatics analysis is laborious and time-consuming. Clearly, the advent of NGS calls for an extension of the bioinformatics training programmes at the bachelor and masters level. It was also felt that the computational challenges in the different application areas of NGS are very different from each other. The model in which a single bioinformatics expert attached to a sequencing facility takes care of the data analysis or advises users in all the different areas therefore does not seem to be a viable one.
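
The technical note raised under data sharing, that related genomes could be stored as differences from a reference, can be illustrated with a minimal sketch. It is not a format proposed at the meeting: the function names are hypothetical, and the sketch assumes equal-length sequences that differ only by single-base substitutions (no insertions or deletions).

```python
# Minimal sketch of reference-based storage (assumption: substitutions only,
# sequences of equal length; function names are illustrative, not a standard).

def encode_against_reference(reference, genome):
    """Return (position, base) pairs where genome differs from reference."""
    if len(reference) != len(genome):
        raise ValueError("this sketch handles equal-length sequences only")
    return [
        (position, base)
        for position, (ref_base, base) in enumerate(zip(reference, genome))
        if ref_base != base
    ]


def decode_against_reference(reference, differences):
    """Rebuild the genome from the reference and the recorded differences."""
    bases = list(reference)
    for position, base in differences:
        bases[position] = base
    return "".join(bases)


reference = "ACGTACGTAC"
strain = "ACGTTCGTAA"

differences = encode_against_reference(reference, strain)
restored = decode_against_reference(reference, differences)
```

For closely related genomes the list of differences is much shorter than the sequence itself, which is the source of the saving; a realistic format would also have to represent insertions, deletions and rearrangements.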

Assessment of the results & impact of the event on the future direction of the field

Besides presenting the state of the art of the field, the various talks and the panel discussion helped to clarify the main challenges and future needs in the computational analysis of NGS data. Reinforcement of bioinformatics training programmes, investments in computational infrastructure for data storage and analysis, and improved exchange of bioinformatics know-how may be among the most effective measures to take in the near future.

The various talks made clear that the computational methods developed in the different application areas are surprisingly different from each other. This even applies to the basic initial data processing steps, where some applications require the elimination of low-quality sequences prior to downstream analysis, whereas others are error-tolerant and benefit from more sequences. In short, NGS data analysis is not a single sub-discipline of computational biology.
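
As an illustration of the kind of initial filtering step mentioned above, the following sketch discards reads whose mean base quality falls below a threshold. It is not taken from any of the talks: the Phred+33 quality encoding, the threshold of 20 and the function names are all assumptions chosen for the example.

```python
# Minimal sketch of quality filtering (assumptions: Phred+33 encoded quality
# strings, a mean-quality threshold of 20; names are illustrative only).

def mean_phred_quality(quality_string, offset=33):
    """Average Phred score of a read, decoded from its quality string."""
    scores = [ord(character) - offset for character in quality_string]
    return sum(scores) / len(scores)


def filter_reads(reads, min_mean_quality=20.0):
    """Keep (sequence, quality) pairs whose mean quality meets the threshold."""
    return [
        (sequence, quality)
        for sequence, quality in reads
        if quality and mean_phred_quality(quality) >= min_mean_quality
    ]


reads = [
    ("ACGT", "IIII"),  # Phred 40 at every base
    ("ACGT", "!!!!"),  # Phred 0 at every base
    ("GGCC", "5555"),  # Phred 20 at every base
]

kept = filter_reads(reads)
```

An error-tolerant application would simply skip this step, or lower `min_mean_quality`, and keep more reads for downstream analysis.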

The presentations of ChIP-Seq and RNA-Seq confirmed that these technologies are both more accurate and cost-effective than microarray-based alternatives and therefore likely to take over in the near future. Regarding methylation profiling the situation is less clear: the sequencing-based approaches are definitely more accurate but substantially more expensive than alternative approaches. Moreover, the analysis of the data is highly non-trivial.

The various discussions at the meeting made clear that access to bioinformatics tools and knowledge of how to use them are the bottlenecks in the application of NGS technology. We were therefore very pleased to see many NGS users with little bioinformatics experience among the participants. This shows a strong interest in the topic of this symposium from outside the bioinformatics community, which we find encouraging.

In conclusion, we are confident that this conference had a significant and positive impact on the future of the field. For one thing, it showed that the European computational biology community is very active in responding to needs created by NGS technology, perhaps more active than their colleagues in the US. The symposium certainly helped shape a community feeling among us. All participants had the opportunity to talk in person to the experts of a particular application area. We believe that this will promote establishing European research networks capable of integrating NGS technology in the large-scale biomedical research efforts funded by the EU.

Programme
Friday 15 January
8:30-9:00 Registration
9:00-9:15 Opening of the conference: Jan Komorowski/Philipp Bucher
Chair: Jan Komorowski
9:15-10:15 Eran Segal, Weizmann Institute, Israel. Using high-throughput sequencing to decipher the gene regulatory code
10:15-11:00 Coffee and mingle session
11:00-11:45 Esko Ukkonen, Helsinki, Finland. Computational prediction of gene regulator modules in DNA
11:45-13:30 Lunch
Chairman: Claes Wadelius
13:30-14:30 Stefan Haas, Berlin, Germany. Deep sequencing of the transcriptome of two human cell lines
14:30-15:30 Leif Andersson, Uppsala University, Sweden. Whole genome resequencing - a major leap forward in chicken genomics
15:30-16:10 Coffee break
16:15-17:00 Richard Sandberg, Karolinska Institute. RNA-analyses of human tissue transcriptomes reveals alternative isoform regulation and compositional differences
17:00 End of day 1
19:00 Dinner at the Linne Orangeriet

Saturday 16 January
8:30-9:00 Registration and coffee
Chairman: Philipp Bucher
9:00-10:00 Claes Wadelius, Uppsala, Sweden. Reading the regulatory code
10:00-10:45 Coffee and mingle session
10:45-11:30 Thomas Down, Cambridge, UK. Quantifying the epigenome
11:30-13:00 Lunch
13:00-13:45 Frank Schwach, Norwich, UK. Computational analysis of large-scale small-RNA data sets
13:45-14:30 Ed Green, Dresden, Germany. Computational issues in ancient genomics
14:30-15:15 Erik Bongcam-Rudloff, Uppsala, Sweden. Annotating next-sequencing data
15:15-16:00 Coffee and mingle session
16:00-16:45 Panel discussion
16:45-17:00 Closing remarks and end of conference