DRP-HCB Training Courses
This page is intended to provide a set of resources for those interested in learning about bioinformatics and acquiring new skills. It is by no means a comprehensive list of online training materials and does not cover everything you need to know to become a skilled bioinformatician. However, it should be enough to get you started and we will happily add more subjects if they are of interest to members of the centre.
Our dedicated bioinformatics servers provide an expansive list of tools via the Linux command line, an RStudio server and access to programming languages and libraries such as Python and Perl. If you require an account, please contact Shaun Webb.
Linux command line
Linux operating systems are the workhorse for most large-scale genomics applications and are well suited to managing and processing the huge data files produced by high-throughput sequencing technologies. Although command-line computing can be daunting to new users, grasping a few fundamental commands and learning to run software packages and pipelines from the terminal will open new doors.
Beginner
Intermediate
- git, Conda, Snakemake, R Markdown, Jupyter, Docker and Singularity
Package management with Conda (coming soon)
Running automated pipelines (coming soon)
Programming in R
R is a programming language designed for data retrieval, manipulation and visualisation as well as statistical analysis. Learning R will allow you to move beyond spreadsheets and access the full analysis potential of large datasets. You can build multi-dimensional visualisations as well as interactive web applications and documents. It’s easy to get to grips with the basics of R and the RStudio interface provides an intuitive graphical environment in which to develop your code. The extensive Bioconductor libraries also provide pre-built functions to analyse and interpret all sorts of genomic datasets.
Beginner
- RStudio, base R, Tidyverse packages
Beginner - Intermediate
Advanced topics
Programming in Python
Python is a general-purpose programming language that is becoming increasingly popular with data scientists. Python has a growing number of libraries for data manipulation, exploration and visualisation as well as machine learning and is ideal for deploying applications and reproducible code. Jupyter notebooks provide an interactive development environment for coding in Python and other languages.
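As a small illustration of the kind of data manipulation described above, the sketch below averages counts per gene using only the Python standard library. The table contents and the function name `mean_count_per_gene` are hypothetical and chosen purely for illustration; a real analysis would more likely read a file and use a dedicated data library.

```python
import csv
import io
import statistics

# Hypothetical expression table (gene, sample, count); values are illustrative only.
TABLE = """gene,sample,count
geneA,s1,10
geneA,s2,14
geneB,s1,3
geneB,s2,5
"""


def mean_count_per_gene(text):
    """Return {gene: mean count} from CSV text with 'gene' and 'count' columns."""
    counts = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Group the integer counts by gene name.
        counts.setdefault(row["gene"], []).append(int(row["count"]))
    return {gene: statistics.mean(values) for gene, values in counts.items()}


if __name__ == "__main__":
    for gene, mean in mean_count_per_gene(TABLE).items():
        print(gene, mean)
```

The same code can be pasted into a Jupyter notebook cell and run interactively.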
Beginner - Intermediate
Advanced topics
Building pipelines with Snakemake
Statistics for biology
While there are countless resources for learning statistics and a wealth of engaging videos on YouTube, it is important for bioinformaticians to understand statistics in the context of biology. Knowledge of statistical analysis will allow you to correctly describe and make inferences from your data, and to design meaningful and useful experiments. Here are some links to get started:
Introduction to Stats (slides)
Introduction to statistics with R
Points of Significance (Nature collection)
Statistics for biologists (Nature collection)
Statistical learning (comprehensive stats applied in R)
Analysing HTS experiments
The tutorials below provide introductions to analysing specific types of high-throughput sequencing experiments. They are designed to help you understand your data, complete generic processing steps, and explore and interpret the output. Please note the prerequisite skills in brackets.
In most cases, the pipelines provided are simple and generic and may not be suitable for your own analysis. The bioinformatics core facility is available to collaborate on your project and to offer advice on experimental design and data analysis strategies. We also provide an expansive set of software tools and analysis workflows accessible through our bioinformatics servers:
ChIP-seq
- WCB ChIP-seq analysis workshop (Linux command line)
RNA-seq
- WCB RNA-seq analysis (Linux command line)
Hi-C
- WCB Hi-C data analysis (command line)
CLIP/CRAC
CLASH
Methyl-seq
- QC and alignment of BS-seq data using Bismark (command line)
Further Bioinformatics Training Resources
- Training programme for Edinburgh Genomics workshops and online courses.
- Community-driven project delivering training in data science and coding skills. The Software Carpentry and Data Carpentry websites include many free online courses. Edinburgh Carpentries host regular events at the university and include a Data Carpentry for Genomics course.
- A comprehensive set of courses covering R, Python, and biostatistics among other things. Includes case studies in analysing RNA-seq, ChIP-seq and Methyl-seq data.
Harvard Chan Bioinformatics Core
- Training courses in R and command line including many different HTS techniques.
- Training courses offered by the Babraham Bioinformatics group in R, Python, statistics and sequencing applications.
- A collection of bioinformatics training courses
- A collection of R courses in data science, statistics and computational biology.
- Interactive learning with videos, lessons, tests and a built-in programming interface. DataCamp is a subscription service with some free modules for learning R, Python and data science skills.
- A collection of bioinformatics training courses from the University of Cambridge bioinformatics core facility.