Deep Learning in Genomics and Biomedicine

Course Overview

Recent breakthroughs in high-throughput genomic and biomedical data are transforming biological sciences into "big data" disciplines. In parallel, progress in deep neural networks is revolutionizing fields such as image recognition, natural language processing and, more broadly, AI. This course explores the exciting intersection between these two advances.
The course will start with an introduction to deep learning and an overview of the relevant background in genomics and high-throughput biotechnology, focusing on the available data and their relevance. It will then cover ongoing developments in deep learning (supervised, unsupervised, and generative models), with a focus on applications of these methods to biomedical data, which are beginning to produce dramatic results. In addition to predictive modeling, the course emphasizes how to visualize and extract interpretable biological insights from such models.
Recent papers from the literature will be presented and discussed. Experts in the field will present guest lectures. Students will be introduced to and work with popular deep learning software frameworks. Students will work in groups on a final class project using real world datasets.

Prerequisites

College calculus, linear algebra, basic probability and statistics such as CS 109, and basic machine learning such as CS 229. No prior knowledge of biology is necessary.

Time and Location

In person, Mon and Wed, 3:00 PM - 4:20 PM, at 370-370

Instructors

Anshul Kundaje, Associate Professor (akundaje@stanford.edu)

James Zou, Assistant Professor (jamesz@stanford.edu)

TAs

Kyle Swanson (swansonk@stanford.edu)

Soumya Kundu (soumyak@stanford.edu)

Austin Wang (atwang@stanford.edu)

Recitation

There are no regularly scheduled recitations/discussions. Instead, short videos and example notebooks will be posted to Canvas.

Office hours

All times are in PST

Anshul Kundaje and James Zou: after each class, in the classroom

Kyle Swanson, Soumya Kundu, Austin Wang: Friday 10-11am, Bytes Cafe (1st floor of David Packard Electrical Engineering)

Assignments

Course project (80%): Students will form teams of 3–5 and either choose one of the suggested projects or propose their own. Teams are expected to work on the research project throughout the second half of the quarter and produce conference-style papers. Each team will present its paper to the entire class at the end of the quarter. The course project consists of the following milestones:

  1. One project update presentation in class.
  2. First draft of the paper for peer review.
  3. Project presentation in class.
  4. Final paper.

Peer review (10%): Each student will be assigned another group's project paper draft to review. The review should concisely summarize the key findings of the paper, highlight interesting ideas and weaknesses, and give suggestions for improvement.

Class participation (10%): We encourage students to actively participate in all the classes by asking questions and offering comments.

Schedule

Note that as the quarter progresses, some parts of the schedule are subject to change, including the papers to read prior to each class.

Each entry gives the date, week, and topic, followed by suggested readings and slides.

4/3 (Week 1) Bioimaging
  Deep learning for cellular image analysis.
  Slides: PDF

4/5 (Week 1) Bioimaging
  Image-based profiling for drug discovery.
  DynaMorph
  Slides: PPT

4/10 (Week 2) Bioimaging
  Predicting transcriptomics from histology.
  Deep learning in histopathology: the path to the clinic.
  Slides: PPT

4/12 (Week 2) Bioimaging
  OpenCell: Endogenous tagging for the cartography of human cellular organization (Science)
  Slides: PPT

4/17 (Week 3) Regulatory genomics
  Life and its molecules (basic primer on molecular biology for computational students)
    https://www.biostat.wisc.edu/bmi576/papers/hunter04.pdf
  Human genetic variation and its contribution to complex traits (a review on human genetic variation and disease)
    https://www.nature.com/articles/nrg2554
  Slides: PPTX

4/19 (Week 3) Regulatory genomics
  (Review) Deep learning: new computational modelling techniques for genomics
    https://www.nature.com/articles/s41576-019-0122-6
  Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks
    https://genome.cshlp.org/content/26/7/990.long
  Base-resolution models of transcription-factor binding reveal soft motif syntax
    https://www.nature.com/articles/s41588-021-00782-6
  An interpretable bimodal neural network characterizes the sequence and preexisting chromatin predictors of induced transcription factor binding
    https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02218-6
  maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks
    https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010863
  Slides: PPTX

4/24 (Week 4) Regulatory genomics
  Base-resolution models of transcription-factor binding reveal soft motif syntax
    https://www.nature.com/articles/s41588-021-00782-6
  Interpretable Machine Learning
    https://christophm.github.io/interpretable-ml-book/
  A unified approach to interpreting model predictions
    https://arxiv.org/abs/1705.07874
  Slides: PPTX

4/26 (Week 4) Regulatory genomics
  Molecular quantitative trait loci
    https://www.nature.com/articles/s43586-022-00188-6
  A method to predict the impact of regulatory variants from DNA sequence
    https://www.nature.com/articles/ng.3331
  Predicting effects of noncoding variants with deep learning–based sequence model
    https://www.nature.com/articles/nmeth.3547
  Slides: PPTX

5/1 (Week 5) Regulatory genomics/Single cell
  Slides: PPTX

5/3 (Week 5) Single cell/spatial

5/8 (Week 6) Regulatory genomics
  Sequence-based modeling of genome 3D architecture from kilobase to chromosome-scale
    https://www.nature.com/articles/s41588-022-01065-4
  Effective gene expression prediction from sequence by integrating long-range interactions
    https://www.nature.com/articles/s41592-021-01252-x
  How far are we from personalized gene expression prediction using sequence-to-expression deep neural networks?
    https://www.biorxiv.org/content/10.1101/2023.03.16.532969v2
  Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers
    https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02899-9
  Chromatin interaction–aware gene regulatory modeling with graph attention networks
    https://genome.cshlp.org/content/early/2022/04/08/gr.275870.121
  Slides: PPTX

5/10 (Week 6) Single cell/spatial
  Romain Lopez guest lecture

5/15 (Week 7) Project proposal

5/17 (Week 7) Protein sequence
  Disease variant prediction with deep generative models of evolutionary data (EVE)
    https://www.nature.com/articles/s41586-021-04043-8
  Evolutionary-scale prediction of atomic-level protein structure with a language model (ESMFold)
    https://www.science.org/doi/10.1126/science.ade2574
  Simplified description of AlphaFold2 Architecture and loss functions
    https://www.uvio.bio/alphafold-architecture/
  Slides: PPTX

5/22 (Week 8) Molecules/proteins
  Brian Hie guest lecture
  Slides: PDF

5/24 (Week 8) Molecules/proteins
  Slides: PDF

5/29 (Week 9) Holiday/no class

5/31 (Week 9) Population omics/EHR

6/5 (Week 10) Student presentation

6/7 (Week 10) Student presentation

 
