Deep Learning in Genomics and Biomedicine

Course Overview

Recent breakthroughs in high-throughput genomic and biomedical data generation are transforming the biological sciences into "big data" disciplines. In parallel, progress in deep neural networks is revolutionizing fields such as image recognition, natural language processing and, more broadly, AI. This course explores the exciting intersection between these two advances. The course will provide an introduction to deep learning and an overview of the relevant background in genomics, high-throughput biotechnology, protein and drug/small molecule interactions, medical imaging and other clinical measurements, focusing on the available data and their relevance. We will cover ongoing developments in deep learning (supervised, unsupervised and generative models) with a focus on applying these methods to biomedical data, where they are beginning to produce dramatic results. In addition to predictive modeling, the course emphasizes how to visualize and extract interpretable biological insights from such models. Recent papers from the literature will be presented and discussed. Students will work in groups on a final class project using real-world datasets.

Prerequisites

College calculus, linear algebra, basic probability and statistics (e.g., CS109), and basic machine learning (e.g., CS229). No prior knowledge of biology is required.

Instructors

Anshul Kundaje, Assistant Professor (akundaje@stanford.edu)

James Zou, Assistant Professor

TAs

Kevin Wu, Graduate Student, Computer Science (wukevin@stanford.edu)

Laksshman Sundaram, Graduate Student, Computer Science (lakss@stanford.edu)

Recitation

Fridays 10:30 - 11:20am at 380-380Y. 

Note: we will not have recitation every week. Please check the syllabus for recitation topics. 

Office hours

James Zou: Find me after each class or on Mondays, 5-5:30 pm, in Packard 258.

Anshul Kundaje: Find me after each class or on Wednesdays, 5-5:30 pm, in the Lane Med School Building (L301).

TAs: Thursdays at 5 pm, Packard Engineering 2nd Floor Kitchen.

Assignments

Course project (60%): students will form teams of 3-4 and either choose one of the suggested projects or propose their own. Teams will be given Google Cloud credits to implement algorithms and perform analyses. Teams are expected to work on the research project throughout the second half of the quarter and produce conference-style papers. Each team will present its work to the entire class at the end of the quarter. The course project consists of the following milestones:

  1. Project proposal in class (3-minute talk).
  2. First draft of the paper for peer review.
  3. Poster presentation (12/10, 8:30-11:30 am, Gates lobby).
  4. Final paper (due at noon on Friday 12/13).

A significant portion of the class will be based on reading and discussing the latest literature. Every student should read the assigned papers before class and participate in discussions.

Paper presentation (20%): each team selects one of the suggested papers to present in detail to the class. The presentation should be 20 minutes plus 5 minutes for Q&A. Each team will also write a concise review of the paper, which will be published on bioRxiv.

Peer project review (10%): each team will be assigned two other teams' paper drafts to review. The review should concisely summarize the key findings of the paper, highlight interesting ideas and weaknesses, and offer suggestions for improvement.

Class participation (10%): every student should actively engage in paper discussions in class. 

Schedule

The first few lectures will cover the basics of deep learning: convolutional and recurrent architectures, generative models, and optimization/regularization. We will also study applications of deep learning in several biomedical domains: genomics, protein structure, small molecule/drug interactions, medical imaging and medical records.

Date Topic Papers Recitation topic Assignment
9/23 Class overview, Intro to supervised ML and neural networks, SGD + backprop + optimization + regularization
  1. Deep Learning 
    http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html
  2. Neural Nets and Deep learning primer 
    http://neuralnetworksanddeeplearning.com/
  3. ML cheatsheet https://ml-cheatsheet.readthedocs.io/en/latest/index.html
  4. Deep Learning book, Chapter 8: Optimization for Training Deep Models https://www.deeplearningbook.org/contents/optimization.html
  5. Deep Learning book, Chapter 7: Regularization for Deep Learning https://www.deeplearningbook.org/contents/regularization.html
9/25 Intro to genomics + genetics + biological application domains for deep learning
  1. Life and its molecules https://www.biostat.wisc.edu/bmi576/papers/hunter04.pdf (Basic primer on molecular biology for computational students)
  2. Next generation genomics: An Integrative Approach https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3321268/ (A review on next-generation sequencing and functional genomics)
  3. Human genetic variation and its contribution to complex traits https://www.nature.com/articles/nrg2554 (A review on human genetic variation and disease)
Recitation: Genomics primer
9/30 Deep learning for genomics and intro to CNNs
  1. Deep learning: new computational modelling techniques for genomics
    https://www.nature.com/articles/s41576-019-0122-6
  2. Deep learning for computational biology (Review) http://msb.embopress.org/content/12/7/878
  3. Deep learning in Biomedicine (Review) https://www.nature.com/articles/nbt.4233?linkId=56568955

Primers on deep learning for genomics:
  1. https://kundajelab.github.io/dragonn/tutorials.html
  2. https://colab.research.google.com/drive/17E4h5aAOioh5DiTo7MZg4hpL6Z_0FyWr
10/2 Applications of CNNs and dilated CNNs in regulatory genomics and genetics
  1. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning https://www.nature.com/articles/nbt.3300
  2. Deep learning at base-resolution reveals motif syntax of the cis-regulatory code https://www.biorxiv.org/content/10.1101/737981v1
  3. Sequential regulatory activity prediction across chromosomes with convolutional neural networks https://genome.cshlp.org/content/28/5/739
Recitation: Deep learning primer
10/7 Recurrent neural networks + Transformers
  1. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences https://www.ncbi.nlm.nih.gov/pubmed/27084946
  2. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1189-z
10/9 Methods for interpreting deep learning models and generating mechanistic hypotheses
  1. Interpretable Machine Learning https://christophm.github.io/interpretable-ml-book/
  2. Learning Important Features Through Propagating Activation Differences https://arxiv.org/abs/1704.02685
  3. Axiomatic Attribution for Deep Networks https://arxiv.org/abs/1703.01365
  4. A unified approach to interpreting model predictions https://arxiv.org/abs/1705.07874
  5. Section 5.3 in Opportunities and obstacles for deep learning in biology and medicine http://rsif.royalsocietypublishing.org/content/15/141/20170387.long
  6. The Building Blocks of Interpretability https://distill.pub/2018/building-blocks/
  7. Differentiable Image Parameterizations https://distill.pub/2018/differentiable-parameterizations/
Recitation: DragoNN tutorial
Assignment: Projects released on Oct. 11.
10/14 Autoencoders + unsupervised representation learning for single-cell genomics
  1. CYCLOPS reveals human transcriptional rhythms in health and disease http://www.pnas.org/content/pnas/114/20/5312.full.pdf
  2. Bayesian Inference for a Generative Model of Transcriptome Profiles from Single-cell RNA Sequencing https://www.biorxiv.org/content/biorxiv/early/2018/03/30/292037.full.pdf
  3. Manifold learning-based methods for analyzing single-cell RNA-sequencing data https://www.sciencedirect.com/science/article/pii/S2452310017301877
Assignment: Select projects and papers on Oct. 14 (sign-ups open at 9 am PT).
10/16 Generative models
  1. Feedback GANs https://arxiv.org/pdf/1804.01694.pdf
  2. Boltzmann generators https://science.sciencemag.org/content/365/6457/eaaw1147.full
Recitation: Keras + PyTorch tutorial
10/21 Molecules and drug discovery + graph convolution
  1. Deep generative models of genetic variation capture the effects of mutations https://www.nature.com/articles/s41592-018-0138-4
    1. TA Note: useful for understanding VAE: https://arxiv.org/pdf/1606.05908.pdf
  2. MoleculeNet https://arxiv.org/abs/1703.00564
10/23 Faculty-led paper discussions
  1. Deep learning at base-resolution reveals motif syntax of the cis-regulatory code https://www.biorxiv.org/content/10.1101/737981v1.full
  2. Modular modeling improves the predictions of genetic variant effects on splicing https://www.biorxiv.org/content/early/2018/10/10/438986
  3. Predicting Splicing from Primary Sequence with Deep Learning
    https://www.sciencedirect.com/science/article/pii/S0092867418316295?via%3Dihub
  4. Deep generative models of genetic variation capture the effects of mutations https://www.nature.com/articles/s41592-018-0138-4
Recitation: Deep learning review
10/28 Project proposal presentations
5-minute proposal presentations
10/30 Faculty-led paper discussions
  1. Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing https://www.biorxiv.org/content/early/2018/04/28/310458
  2. Skyhawk: An Artificial Neural Network-based discriminator for reviewing clinically significant genomic variants https://www.biorxiv.org/content/early/2018/05/01/311985
  3. Exploring Single-Cell Data with Deep Multitasking Neural Networks https://www.biorxiv.org/content/early/2018/08/27/237065.1
  4. Deep Count Autoencoder https://www.nature.com/articles/s41467-018-07931-2
11/4 Guest lecture
11/6 Guest lecture
11/11 Student paper presentations
  1. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk https://www.nature.com/articles/s41588-018-0160-6
  2. Multi-scale deep tensor factorization learns a latent representation of the human epigenome
    https://www.biorxiv.org/content/early/2018/07/08/364976
  3. Cross-species regulatory sequence activity prediction
    https://www.biorxiv.org/content/10.1101/660563v2.full
11/13 Student paper presentations
  1. End-to-End Differentiable Learning of Protein Structure https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3239970
  2. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005324
  3. Convolutional Networks on Graphs for Learning Molecular Fingerprints
    http://papers.nips.cc/paper/5954-convolutional-networks-on-graphs-for-learning-molecular-fingerprints
  4. Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation https://arxiv.org/pdf/1806.02473.pdf
  5. Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery https://pubs.acs.org/doi/10.1021/acs.molpharmaceut.8b00839
11/18 Faculty-led paper discussion
  1. Label-free prediction of three-dimensional fluorescence images from transmitted-light microscopy https://www.nature.com/articles/s41592-018-0111-2
  2. Improving Automated Veterinary Disease Coding Via Large-scale Language Modeling
11/20 Student paper presentations
  1. Learning Sleep Stages from Radio Signals: A Conditional Adversarial Architecture http://proceedings.mlr.press/v70/zhao17d.html
  2. Dermatologist-level classification of skin cancer with deep neural networks https://www.nature.com/articles/nature21056
Assignment: Initial paper draft submitted for peer review (due 11/22 at 5 pm).
12/2 Student paper presentations
  1. Content-Aware Image Restoration: Pushing the Limits of Fluorescence Microscopy https://www.biorxiv.org/content/early/2018/07/03/236463
  2. Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier https://arxiv.org/abs/1706.04152
Assignment: Peer review out
12/4 Student paper presentations
  1. Privacy-preserving generative deep neural networks support clinical data sharing https://www.biorxiv.org/content/early/2018/06/05/159756
  2. Scalable and accurate deep learning with electronic health records https://www.nature.com/articles/s41746-018-0029-1
  3. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning https://www.nature.com/articles/s41591-018-0177-5
12/10 Finals week: poster presentation

Poster session will be in the atrium/lobby of the Gates Computer Science Building. The poster session will run from 8:30 AM to 11:30 AM. We will provide easels and board backing to mount your posters.

Final paper due 12/13. 
12/13 Final paper due