European Summer Institute in Statistical Genetics
31 August - 9 September 2009
Liège, Belgium
Organiser:
Michel Georges: University of Liège, Belgium
Draft Report
Summary
The European Summer Institute in Statistical Genetics (ESISG) in Liège was the 2009 edition, held
outside the USA, of the well-known Summer Institute in Statistical Genetics (SISG) organised by Prof. Bruce
Weir, Head of the Department of Biostatistics of the University of Washington in Seattle. The SISGs
aim to provide advanced post-graduate courses in statistical genetics both to statisticians interested
in genetics and to geneticists wanting to strengthen their knowledge of statistics. Modern
genomics entails the manipulation of very large data sets (whether genomic, transcriptomic or
proteomic in nature) that require specific expertise to be analysed properly. There is at present a strong
demand for scientists who are competent in this field, and hence for high-quality
courses in these disciplines.
The SISG is typically organised as a series of 2½-day modules, comprising both theoretical courses
and practicals. Each module is taught by at least two internationally acclaimed experts in the
corresponding field, and attended by up to 60 students including PhD students, post-doctoral fellows
as well as senior scientists from both academia and the private sector.
The ESISG in Liège comprised 9 modules organised over a period of 10 days (3 series of 2½-day
modules). It was attended by 120 students from 32 countries, who together followed 250
modules. Seventeen instructors, coming primarily from the US and the UK, taught at the 2009
ESISG in Liège.
Scientific Content
Overview. The 2009 ESISG aimed to offer three sets of three consecutive modules that would allow
the students first to strengthen basic concepts, and then progressively to address more advanced and
sophisticated issues. The first set targeted QTL mapping and association studies and was
composed of (i) a module on quantitative genetics, (ii) a module on QTL mapping and (iii) a module
on association mapping. The second targeted expression analysis and systems biology and was
composed of (i) a module on genomic and proteomic data analysis, (ii) an R/Bioconductor workshop
and (iii) a module on graphical models for genetics. The third set targeted population genetics
and included (i) a module on population genetic data analysis, (ii) a module on DNA evidence, and (iii) a module on coalescent theory.
Module 1: Population genetic data analysis. This module covered estimation of allele and haplotype
frequencies, inferences about Hardy-Weinberg and linkage disequilibrium, characterisation of
population structure, linkage estimation, joint genotype probabilities, and relationship estimation.
The module included the use of public-domain software packages such as GDA and PowerMarker.
Module 2: Quantitative Genetics. This module covered quantitative trait models, variances and
covariances of relatives, estimation of variance components, response to selection, and the effects of
mutation.
Module 3: Expression data analysis. This module provided an overview of technologies and
statistical methods for analysis of expression data, including RNA, proteins, and metabolites,
obtained by microarrays, mass spectrometry, or sequencing. Sessions covered data normalisation,
experimental design, statistical modelling and inference (e.g., detecting differential gene expression,
ANOVA, multiple testing correction, clustering and classification, network behaviour), and expression
QTL analysis. Examples were taken from human genetics, model systems, and evolutionary biology.
The module included software demonstrations.
Module 4: QTL mapping. This module covered linkage map construction, single-marker analyses,
multiple and partial regression methods, and interval, composite-interval, and multiple-interval
mapping. Model selection and the determination of significance levels were addressed. The module
included the use of the Windows QTL-Cartographer software package.
Module 5: DNA evidence. This module covered statistical and population genetic topics for the
interpretation of forensic DNA profiles. Topics addressed included allelic independence, Bayes’
theorem and likelihood ratios, genotype probabilities for one and two individuals, effects of relatives
and population structure, interpretation of mixtures, low copy number profiles, and paternity index
and missing person calculations. The module included the use of the public-domain software GDA and DNAMix-3.
Module 6: Coalescent theory. Sequence variation within populations is important to medical genetics
and to the study of evolutionary history. Coalescent models that describe genealogical histories
underlying sampled chromosomes in natural populations are central to the analysis of such data. The
module covered the derivation and properties of the basic model and its extension to include factors
such as recombination, geographical structure, and natural selection; use of the coalescent in
analysing data, considering different statistical approaches to inference in the settings of disease
mapping, estimating recombination rates, and detecting recent adaptive evolution; and use of
coalescent methodologies in large-scale surveys of genetic variation, such as the HapMap project.
Computer programmes that can analyse real data and simulate genealogies were demonstrated and
used in computer sessions.
Module 7: Association mapping. Topics for this module included an introduction to the theory of
linkage disequilibrium and mapping, population and family-based association techniques for discrete
and quantitative traits, detecting and accounting for population structure, estimating haplotypes
from population data, haplotype blocks, and multiple testing issues.
Module 8: R/Bioconductor workshop. This module introduced software for the analysis of genetic data
in the R statistical environment. Data management in R, programming concepts for R, and standard
regression analyses were discussed. These topics were followed by analyses more specific to genetic
data, including association analysis and haplotype inference. Use of the extensive collection of
genomics packages from the Bioconductor project was introduced. Finally, the use of R as an
interface to other more specialised “legacy” software was demonstrated.
Module 9: Graphical models for genetics. Probabilistic graphical models have their origins in genetic
path analysis and provide a natural general framework for expressing and manipulating many
important concepts in statistical genetics. Local computational algorithms can be described in this
way, and complex issues of identification in forensic settings, for example, together with genetic
mapping and pedigree uncertainty, can all be handled in this context, as can issues of causal inference
and identification of regulatory networks. This module introduced the basic ideas and illustrated how
graphical models can be used in a variety of settings.
Assessment of the results & impact of the event on the future direction of the field
We are of the strong opinion that the ESISG makes an important contribution in raising the level of
expertise in statistical genetics amongst young European scientists. As genomic sciences increasingly
rely on such knowledge, the ESISG has an important mission to fulfill. It is therefore our aim to
secure more long-term funding to be able to more systematically organise the SISG in Europe.
Programme
Monday 31st 9:00 AM to Wednesday 2nd 12:30 PM:
a. Population genetic data analysis: D. Nielsen & B. Weir
b. Quantitative genetics: W. Muir & B. Walsh
c. Expression data analysis: J. Storey & G. Gibson
Wednesday 2nd 1:30 PM to Friday 4th 5:00 PM:
a. QTL mapping: R. Doerge & Z. Zeng
b. DNA evidence: J. Buckleton & B. Weir
c. Coalescent theory: A. Hobolt & P. Awadalla
Monday 7th 9:00 AM to Wednesday 9th 12:30 PM:
a. Association mapping: A. Motsinger-Reif, D. Nielsen & M. Georges
b. R/Bioconductor workshop: T. Lumley & K. Rice
c. Graphical models for genetics: V. Didelez & N. Sheehan