European Summer Institute in Statistical Genetics
3-12 September, 2007
Liège, Belgium

Organisers
Report
1. Summary
2. Scientific content
3. Impact of the event
4. Programme

Organiser:

Michel Georges: University of Liège, Belgium

Draft Report

Summary

The European Summer Institute in Statistical Genetics (ESISG) in Liège was the 2007 edition, held outside the USA, of the well-known Summer Institute in Statistical Genetics (SISG) (http://www.biostat.washington.edu/sisg08/index.php) organized by Prof. Bruce Weir, Head of the Department of Biostatistics of the University of Washington in Seattle. The SISGs aim to provide advanced postgraduate courses in statistical genetics both to statisticians interested in genetics and to geneticists wanting to strengthen their knowledge of statistics. Modern genomics entails the manipulation of huge data sets (genomic, transcriptomic or proteomic) whose proper analysis requires specific expertise. There is currently a strong demand for scientists competent in this field, and hence for high-quality courses in these disciplines.

The SISG is typically organised as a series of 2½-day modules comprising both theoretical courses and practicals. Each module is taught by at least two internationally acclaimed experts in the corresponding field and attended by up to 60 students, including PhD students, postdoctoral fellows and senior scientists from both academia and the private sector.

The ESISG in Liège comprised 9 modules organized over a period of 10 days (3 series of 2½-day modules). It was attended by 223 students from 32 countries, who together accounted for 457 module enrolments. Seventeen instructors, coming primarily from the US and the UK, taught at the 2007 ESISG in Liège.

Scientific content

Overview. The 2007 ESISG aimed to offer three sets of three consecutive modules that would allow the students first to strengthen basic concepts and then progressively to address more advanced and sophisticated issues. The first set targeted QTL mapping and association studies and was composed of (i) a module on quantitative genetics, (ii) a module on QTL mapping and (iii) a module on association mapping. The second targeted expression analysis and systems biology and was composed of (i) a module on genomic and proteomic data analysis, (ii) an R/Bioconductor workshop and (iii) a module on graphical models for genetics. The third set targeted population genetics and included (i) a module on population genetic data analysis, (ii) a module on DNA evidence and (iii) a module on coalescent theory.

Module 1: Population genetic data analysis. This module covered estimation of allele and haplotype frequencies, inferences about Hardy-Weinberg and linkage disequilibrium, characterization of population structure, linkage estimation, joint genotype probabilities, and relationship estimation. It included the use of public-domain software packages such as GDA and PowerMarker.

Module 2: Quantitative Genetics. This module covered quantitative trait models, variances and covariances of relatives, estimation of variance components, response to selection, and the effects of mutation.

Module 3: Genomic and proteomic data analysis. This module provided an overview of array technologies, image analysis and normalization, experimental design, statistical modeling and inference (e.g., detecting differential gene expression, ANOVA, multiple testing, false discovery rate, clustering, and classification), and expression QTL. Applications to molecular and evolutionary biology were covered. The module included software demonstrations.

Module 4: QTL mapping. This module covered linkage map construction, single-marker analyses, multiple and partial regression methods, and interval, composite-interval, and multiple-interval mapping. Model selection and the determination of significance levels were addressed. The module included the use of the Windows QTL-Cartographer software package.

Module 5: DNA evidence. This module covered statistical and population genetic topics for the interpretation of forensic DNA profiles. Topics addressed included allelic independence, Bayes' theorem and likelihood ratios, genotype probabilities for one and two individuals, effects of relatives and population structure, interpretation of mixtures, low copy number profiles, and paternity index and missing person calculations. The module included the use of the public-domain software GDA and DNAMix-3.

Module 6: R/Bioconductor workshop. This module introduced software for the analysis of genetic data in the R statistical environment. Data management in R, programming concepts for R, and standard regression analyses were discussed. These topics were followed by analyses more specific to genetic data, including association analysis and haplotype inference. Use of the extensive collection of genomics packages from the Bioconductor project was introduced. Finally, the use of R as an interface to other, more specialized “legacy” software was demonstrated.

Module 7: Association mapping. Topics for this module included an introduction to the theory of linkage disequilibrium and mapping, population and family-based association techniques for discrete and quantitative traits, detecting and accounting for population structure, estimating haplotypes from population data, haplotype blocks, and multiple testing issues.

Module 8: Coalescent theory. Sequence variation within populations is important to medical genetics and to the study of evolutionary history. Coalescent models that describe genealogical histories underlying sampled chromosomes in natural populations are central to the analysis of such data. The module covered the derivation and properties of the basic model and its extension to include factors such as recombination, geographical structure, and natural selection; use of the coalescent in analyzing data, considering different statistical approaches to inference in the settings of disease mapping, estimating recombination rates, and detecting recent adaptive evolution; and use of coalescent methodologies in large-scale surveys of genetic variation, such as the HapMap project. Computer programs that can analyze real data and simulate genealogies were demonstrated and used in computer sessions.

Module 9: Graphical models for genetics. Probabilistic graphical models have their origins in genetic path analysis and provide a natural general framework for expressing and manipulating many important concepts in statistical genetics. Local computation algorithms can be described in this framework, and complex problems such as identification in forensic settings, genetic mapping and pedigree uncertainty can all be handled within it, as can causal inference and the identification of regulatory networks. This module introduced the basic ideas and illustrated how graphical models can be used in a variety of settings.

Impact of the event

We strongly believe that the ESISG makes an important contribution to raising the level of expertise in statistical genetics amongst young European scientists. As genomic sciences increasingly rely on such knowledge, the ESISG has an important mission to fulfil. We therefore aim to secure longer-term funding so that the SISG can be organised in Europe on a more regular basis. A 2009 edition of the ESISG is already scheduled to take place, again in Liège.

As for all other editions of the SISG, evaluation forms were circulated amongst attendees at the 2007 ESISG in Liège. The results, available from the organizers on request, were very positive overall, with regard to both the content and the organization of the courses, and included a number of valuable comments and suggestions that will help us to improve the 2009 edition.

Programme
Monday 3rd 8:00 AM to Wednesday 5th 12:30 PM:

•  Population genetic data analysis: D. Nielsen & B. Weir

•  Quantitative genetics: W. Muir & B. Walsh

•  Genomic and proteomic data analysis: E. Dermitzakis & G. Gibson

Wednesday 5th 1:30 PM to Friday 7th 5:00 PM:

•  QTL mapping: R. Doerge & Z. Zeng

•  DNA evidence: J. Buckleton & B. Weir

•  R/Bioconductor workshop: T. Lumley & K. Rice

Monday 10th 8:00 AM to Wednesday 12th 12:30 PM:

•  Association mapping: L. Cardon, M. Georges & D. Nielsen

•  Coalescent theory: G. McVean & P. Awadalla

•  Graphical models for genetics: V. Didelez & N. Sheehan