Module Aims
This module aims to provide the understanding, specialist knowledge and practical skills required to analyse large genomic datasets to answer questions about the genetic contribution to human health and disease. The module will particularly focus on processing, annotating and interpreting short-read data, and applying a number of different analytical approaches to both rare and common genetic variation in the context of different diseases and traits.
Module Learning Outcomes
By the end of this module, students should be able to:
- Understand how high-throughput sequencing (HTS) data are generated, and the related quality control processes
- Analyse and interpret HTS data using command-line programming and bioinformatic approaches
- Use appropriate publicly available resources to annotate genetic variation and interpret it in the context of the patient’s disease and clinical presentation
- Understand how HTS data are used to provide genetic diagnoses for patients with rare disease in a clinical diagnostic setting
- Understand how genetic associations with rare and common diseases are discovered via cohort studies and different analytical approaches
- Understand and appraise the purpose, current benefits, and future opportunities of the large genomic datasets generated by genome sequencing and deep phenotyping of both patient and population-based cohorts
- Appropriately apply different analytical approaches to rare genetic variation
- Develop basic practical skills in the analysis of genomic data in a Cloud computing environment
Pre-requisites
Understanding of the following concepts: DNA structure; the central dogma (DNA -> RNA -> protein); different types of genetic sequence variation (e.g. single nucleotide substitutions, whole gene deletions) and their effects on protein structure and function; common vs rare variants; common vs Mendelian disease; inheritance patterns (e.g. dominant, recessive, X-linked); linkage disequilibrium and haplotypes. Ideally, students will have completed the Genetic Epidemiology module.
This is a very hands-on module involving cloud-based and command-line programming, and students are expected to have completed Linux training and to be comfortable writing commands. Students are encouraged to refresh their Linux skills, e.g. by working through some of the many online tutorials (e.g. https://tutorials.ubuntu.com/tutorial/command-line-for-beginners#0; http://www.ee.surrey.ac.uk/Teaching/Unix/index.html). Note that all necessary commands used in the practical sessions will be provided.
Teaching Strategy
Students are required to do around 3 hours of preparatory work for each session (self-directed learning). They will be told in advance which software and applications to download, or set up user accounts for, in preparation for each session.
The sessions will be a mixture of interactive lectures/workshops/group discussions used to consolidate and expand the knowledge gained during the preparatory work, and hands-on genomic data analysis. The module will use anonymised real patient data and publicly available resources, databases and software.
Assessment
Group presentation, 20%: Groups of 3-4 students deliver a 20-minute presentation on the last day of the module. Presentations should cover the background and context of the task, a description of the dataset provided, the analysis methods used, and the results and their interpretation, including wider implications for the use of genomics in public health.
Individual technical report, 80%: Using the feedback provided during the presentations, each student submits a technical report of up to 1,200 words. The report will focus on applying the genomic data methods covered in the course, describing and justifying the data processing methods used, and interpreting results at different stages of genomic data processing.
Module Length
4 days