Deep Learning in Genomics and Biomedicine

Course Overview

Recent breakthroughs in high-throughput genomic and biomedical data generation are transforming the biological sciences into "big data" disciplines. In parallel, progress in deep neural networks is revolutionizing fields such as image recognition, natural language processing and, more broadly, AI. This course explores the exciting intersection between these two advances. The course will provide an introduction to deep learning and an overview of the relevant background in genomics, high-throughput biotechnology, protein and drug/small molecule interactions, medical imaging and other clinical measurements, focusing on the available data and their relevance. We will cover ongoing developments in deep learning (supervised, unsupervised and generative models), with an emphasis on applications of these methods to biomedical data, where they are beginning to produce dramatic results. In addition to predictive modeling, the course emphasizes how to visualize and extract interpretable biological insights from such models. Recent papers from the literature will be presented and discussed. Students will work in groups on a final class project using real-world datasets.


Prerequisites

College calculus, linear algebra, basic probability and statistics (such as CS109), and basic machine learning (such as CS229). Prior knowledge of the biology domains is not necessary.


Instructors

Anshul Kundaje, Assistant Professor

James Zou, Assistant Professor


Teaching Assistants

Kevin Wu, Graduate Student, Computer Science

Laksshman Sundaram, Graduate Student, Computer Science


Recitation

Fridays 10:30-11:20am in 380-380Y.

Note: we will not have recitation every week. Please check the syllabus for recitation topics. 

Office hours

James Zou: find me after each class or on Mondays 5-5:30pm in Packard 258.

Anshul Kundaje: find me after each class or on Wednesdays 5-5:30pm in Lane Med School Building L301.

TAs: Thursdays at 5pm in the Packard Engineering 2nd floor kitchen.


Grading

Course project (60%): students will form teams of 3-4 and either choose one of the suggested projects or propose their own. Teams will be given Google Cloud credits to implement algorithms and perform analyses. Teams are expected to work on the research project throughout the second half of the quarter and produce conference-style papers. Each team will present its work to the entire class at the end of the quarter. The course project consists of the following milestones:

  1. Project proposal in class (3 minute talk).
  2. First draft of the paper for peer review. 
  3. Poster presentation (12/10, 8:30-11:30am, Gates lobby).
  4. Final paper (due at noon on Friday 12/13). 

A significant portion of the class will be based on reading and discussing the latest literature. Every student should read the assigned papers before class and participate in discussions.

Paper presentation (20%): each team selects one of the suggested papers to present in detail to the class. The presentation should be 20 minutes plus 5 minutes for Q&A. Each team will also write a concise review of the paper, which will be published on bioRxiv.

Peer project review (10%): each team will be assigned two other groups' paper drafts to review. The review should concisely summarize the key findings of the paper, highlight interesting ideas and weaknesses, and give suggestions for improvement.

Class participation (10%): every student should actively engage in paper discussions in class. 
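As a quick illustration of how the four weights above combine, here is a minimal sketch of a weighted course score. It assumes every component is scored on a 0-100 scale; the names `WEIGHTS` and `weighted_grade` are hypothetical, and the actual grading may involve adjustments not described here.

```python
# Hypothetical sketch: combine component scores (assumed 0-100 scale)
# using the percentage weights listed in this syllabus.
WEIGHTS = {
    "course_project": 0.60,
    "paper_presentation": 0.20,
    "peer_project_review": 0.10,
    "class_participation": 0.10,
}


def weighted_grade(scores: dict) -> float:
    """Return the weighted course score; every component must be present."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = {
    "course_project": 90,
    "paper_presentation": 85,
    "peer_project_review": 80,
    "class_participation": 100,
}
print(round(weighted_grade(example), 2))  # prints 89.0
```

Because the project carries 60% of the weight, a 10-point change in the project score moves the total by 6 points, while the same change in peer review or participation moves it by only 1.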


Schedule

The first few lectures will cover the basics of deep learning: convolutional and recurrent architectures, generative models, and optimization/regularization. We will also study applications of deep learning in several biomedical domains: genomics, protein structure, small molecule/drug interactions, medical imaging and medical records.

Date Topic Papers Recitation topic Assignment
9/23 Class overview, Intro to supervised ML and neural networks, SGD + backprop + optimization + regularization
  1. Deep Learning
  2. Neural Nets and Deep learning primer
  3. ML cheatsheet
9/25 Intro to genomics + genetics + biological application domains for deep learning
  1. Life and its molecules (Basic primer on mol. biology for computational students)
  2. Next generation genomics: An Integrative Approach (A review on next generation sequencing and functional genomics)
  3. Human genetic variation and its contribution to complex traits (A review on human genetic variation and disease)
Genomics primer
9/30 Deep learning for genomics and intro to CNNs
  1. Deep learning: new computational modelling techniques for genomics
  2. Deep learning for computational biology (Review)
  3. Deep learning in Biomedicine (Review)

Primers on deep learning for genomics

10/2 Applications of CNNs and Dilated CNNs in regulatory genomics and genetics
  1. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning
  2. Deep learning at base-resolution reveals motif syntax of the cis-regulatory code
  3. Sequential regulatory activity prediction across chromosomes with convolutional neural networks
Deep learning primer
10/7 Recurrent neural networks + Transformers
  1. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences
  2. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning

Methods for interpreting deep learning models and generating mechanistic hypotheses


  1. Interpretable Machine Learning
  2. Learning Important Features Through Propagating Activation Differences
  3. Axiomatic Attribution for Deep Networks
  4. A unified approach to interpreting model predictions
  5. Section 5.3 in Opportunities and obstacles for deep learning in biology and medicine
  6. The Building Blocks of Interpretability
  7. Differentiable Image Parameterizations
DragoNN tutorial

Projects released on Oct. 11.



Autoencoders + unsupervised representation learning for single cell genomics

  1. CYCLOPS reveals human transcriptional rhythms in health and disease
  2. Bayesian Inference for a Generative Model of Transcriptome Profiles from Single-cell RNA Sequencing
  3. Manifold learning-based methods for analyzing single-cell RNA-sequencing data
Select projects and papers on Oct. 14 (signups open at 9am PT). 
10/16 Generative models
  1. Feedback GANs
  2. Boltzmann generators
Keras + PyTorch tutorial
10/21 Molecules and drug discovery + graph convolution
  1. Deep generative models of genetic variation capture the effects of mutations
    1. TA Note: useful for understanding VAE:
  2. MoleculeNet
10/23 Faculty led paper discussions
  1. Deep learning at base-resolution reveals motif syntax of the cis-regulatory code
  2. Modular modeling improves the predictions of genetic variant effects on splicing
  3. Predicting Splicing from Primary Sequence with Deep Learning
  4. Deep generative models of genetic variation capture the effects of mutations
Deep learning review
10/28 Project proposal presentations


5-minute proposal presentations
10/30 Faculty led paper discussions
  1. Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing
  2. Skyhawk: An Artificial Neural Network-based discriminator for reviewing clinically significant genomic variants
  3. Exploring Single-Cell Data with Deep Multitasking Neural Networks
  4. Deep Count Autoencoder
11/4 Guest lecture
11/6 Guest lecture
11/11 Student paper presentations
  1. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk
  2. Multi-scale deep tensor factorization learns a latent representation of the human epigenome
  3. Cross-species regulatory sequence activity prediction
11/13 Student paper presentations
  1. End-to-End Differentiable Learning of Protein Structure
  2. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
  3. Convolutional Networks on Graphs for Learning Molecular Fingerprints
  4. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
  5. Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery
11/18 Faculty led paper discussion
  1. Label-free prediction of three-dimensional fluorescence images from transmitted-light microscopy
  2. Improving Automated Veterinary Disease Coding Via Large-scale Language Modeling 
11/20 Student paper presentations
  1. Learning Sleep Stages from Radio Signals: A Conditional Adversarial Architecture
  2. Dermatologist-level classification of skin cancer with deep neural networks
Initial paper submitted for peer review (due at 5pm 11/22). 
12/2 Student paper presentations
  1. Content-Aware Image Restoration: Pushing the Limits of Fluorescence Microscopy
  2. Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier
Peer review out
12/4 Student paper presentations
  1. Privacy-preserving generative deep neural networks support clinical data sharing
  2. Scalable and accurate deep learning with electronic health records
  3. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning
12/10 Finals week: poster presentation

The poster session will be held in the atrium/lobby of the Gates Computer Science Building and will run from 8:30 AM to 11:30 AM. We will provide easels and board backing to mount your posters.

Final paper due 12/13. 
12/13 Final paper due

