Computational methods for RNA analysis
26 July - 8 August 2009
Benasque, Spain
Organisers:
Eric Westhof: University Louis Pasteur of Strasbourg and Institut Universitaire de France, France
Elena Rivas: Janelia Farm Research Campus, Ashburn, Virginia, USA
Draft Report
Scientific Content
Every day, several seminars were held on RNA structure. They spanned the
whole range of RNA knowledge and also covered some practical and
experimental aspects of RNA research. It is the only school in the world
where the major scientists in RNA bioinformatics assemble and freely
discuss their research. The gap between the theoretical scientists and
the experimentalists is especially large in this field because of the
complexities of the theoretical approaches and the sophistication of the
experimental techniques. At the same time, the difficulties in
communicating the real needs of the practical scientists to the computer
scientists are real.
On the first day, we had talks by experimentalists explaining what they
were looking for and their despair at communicating, in precise terms,
their needs for more computer science. An amazing talk was given on the
discovery of numerous novel RNAs in the sea and in extreme
environments on the basis of metagenomes. The computer tools for such
searches are far from trivial. The second day was dedicated to the de
novo search for RNAs.
The semantics of family grammars were all reviewed in depth,
emphasising the power and the limits of each of them and making sure the
audience grasped all points. These grammars are central to RNA research.
Later, all programmes dedicated to searching for RNAs were reviewed and
assessed: how well do they perform? What are the limitations? What goes
wrong? Why is there so little overlap between the various programmes for
the same sets of experimental data?
On the third day, more classical approaches were tackled, especially
partition functions and the underlying problems of the algorithms and
combinatorics of RNA sampling. The state of the art of 2D structure
prediction was reviewed, together with the tools available for
computations that include pseudoknots.
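As an illustration of what a partition function over secondary structures looks like, here is a minimal sketch in Python. It is not one of the tools discussed at the meeting. It assumes a deliberately simplified energy model in which every base pair contributes the same energy (`pair_energy`, a hypothetical parameter), excludes pseudoknots, and enforces a minimum hairpin loop length (`min_loop`). Realistic programmes use nearest-neighbour loop energies instead.

```python
import math

# Canonical and wobble pairs; a simplifying assumption of this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, rt=0.616, min_loop=3):
    """Sum of Boltzmann weights over all pseudoknot-free structures of seq.

    Each base pair contributes pair_energy (kcal/mol, hypothetical value);
    rt is the thermal energy RT (about 0.616 kcal/mol at 37 degrees C).
    """
    n = len(seq)
    w = math.exp(-pair_energy / rt)  # Boltzmann weight of one base pair
    # z[i][j] is the partition function of the half-open segment seq[i:j];
    # empty and very short segments only admit the open chain (weight 1).
    z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            last = j - 1
            total = z[i][j - 1]  # case 1: the last nucleotide is unpaired
            # case 2: the last nucleotide pairs with k, leaving at least
            # min_loop unpaired nucleotides between them
            for k in range(i, last - min_loop):
                if (seq[k], seq[last]) in PAIRS:
                    total += z[i][k] * w * z[k + 1][last]
            z[i][j] = total
    return z[0][n]
```

With `pair_energy=0.0` every structure has weight 1, so the function simply counts the admissible structures, which ties the partition function to the combinatorics mentioned above. For example, `partition_function("GAAAC", pair_energy=0.0)` counts the open chain and the single G-C hairpin.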
On the fourth day, the central role of databases was discussed. What are
the practical tools? Why are the databases of today insufficient? The
famous Rfam database was much discussed. How can it be improved? The
available tools are not reliable for automatic classification; manual
intervention is necessary. Can we promote it? In a Wikipedia style, with
everyone improving the annotations and alignments? Because databases
contain sequences extracted in part by homology searches (and
annotated this way too), databases cannot be better than the tools used
for the searches and the alignments. A recurrent question on this issue
is "what is the meaning of homology?" Although the theoretical
understanding, based on Darwinian evolution, is well appreciated, its
manifestation at the sequence level (and especially at the 3D structure
level) is much less so.
The last days of the week were devoted to a continuation of the preceding
discussion, with emphasis on local alignments and on alignments with
respect to a given 2D structure. The integration of substructures is, as
with combinatorics, a cumbersome problem.
Further, we had discussions on how to assess the validity of 3D structural
models. There now exist several programmes that automatically produce 3D
models. What is their validity? How close are they to reality? A clear
assessment of the proximity between prediction and reality (again,
definitions of what is meant by reality, or of what one compares the
prediction to, have to be clarified first) is absolutely necessary in
order to improve the modelling methods and our common understanding of
RNA structure and function.
The following week, we started with presentations and discussions on new
technologies for fast and deep sequencing. The impact of those
technologies on RNA biology is enormous, but we need to cope with the
production of data and their interpretation. How should short reads of
RNAs be treated? Can we produce RNA structures in a high-throughput way?
Those were some of the questions treated. In this framework, visualisation
tools are critical, since our brains cannot process raw data of such
magnitude. A whole afternoon was dedicated to visualisation tools.
Clearly, although many such tools are similar and redundant, several
others that are needed to reach the experimentalists are missing.
The next day was dedicated to RNA ontologies and to explaining why they
are needed. Can we use ontologies to improve alignments? How reliable
are sequence alignments compared with structure alignments? Which kinds
of benchmarks should be offered?
Following this discussion, instead of analysing isolated RNAs, we started
to look closely at RNA-RNA interactions. Such interactions are key
intramolecularly for the folding of RNA architecture and intermolecularly
for microRNAs binding to their targets or other non-coding RNAs binding
to other RNAs.
Accessibilities of the RNAs are important parameters. Such accessibilities
can in principle be calculated on the basis of the secondary structure but,
without knowledge of the types and numbers of proteins bound to the
RNAs, they are not particularly valuable.
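To make the notion of accessibility concrete, the following Python sketch computes the probability that a stretch of nucleotides is entirely unpaired, as the ratio of a constrained partition function to the unconstrained one. It rests on strong assumptions that are not from the meeting: a uniform energy per base pair (`pair_energy`, a hypothetical parameter), no pseudoknots, and no bound proteins, which is exactly the limitation raised above.

```python
import math

# Canonical and wobble pairs; a simplifying assumption of this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def constrained_z(seq, blocked=frozenset(), pair_energy=-1.0, rt=0.616, min_loop=3):
    """Partition function in which positions in `blocked` must stay unpaired."""
    n = len(seq)
    w = math.exp(-pair_energy / rt)  # Boltzmann weight of one base pair
    # z[i][j] covers the half-open segment seq[i:j]; empty segments weigh 1.
    z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            last = j - 1
            total = z[i][j - 1]  # the last nucleotide is unpaired
            if last not in blocked:
                for k in range(i, last - min_loop):
                    if k not in blocked and (seq[k], seq[last]) in PAIRS:
                        total += z[i][k] * w * z[k + 1][last]
            z[i][j] = total
    return z[0][n]


def accessibility(seq, start, end, **model):
    """Probability that seq[start:end] is fully unpaired (half-open, 0-based)."""
    region = frozenset(range(start, end))
    return constrained_z(seq, region, **model) / constrained_z(seq, **model)
```

For instance, `accessibility("GAAAC", 1, 4, pair_energy=0.0)` concerns the loop nucleotides, which are unpaired in every admissible structure of this toy sequence, whereas `accessibility("GAAAC", 0, 1, pair_energy=0.0)` concerns the G that closes the only possible hairpin.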
Next, we had to consider the folding kinetics of the RNAs, and especially
the steps occurring during the synthesis of the RNA itself on the
polymerase. In bacteria, it is now experimentally proven that kinetics is
exploited by biology; i.e. the RNA adopts different conformations
depending on the number of nucleotides produced and the speed of
synthesis. In other words, one can no longer consider
thermodynamic stability as the sole criterion for selecting the secondary
structure of an RNA. Experimentally, some structures are produced under
one condition and do not change when put in another condition in
which a second structure is stabilised. Kinetics adds a huge difficulty to an
already extremely complex and subtle field.
On the last day, we discussed RNA systems biology, a new buzzword. This
was kept for the last day since it encompasses all of the other points
discussed during the meeting. Can we integrate all the data gathered in a
coherent and useful fashion? Are we stuck with collecting butterflies
without a deep understanding of the underlying biology?