Deep Learning in Genomics and Biomedicine
Course Overview
Recent breakthroughs in high-throughput genomic and biomedical data are transforming biological sciences into "big data" disciplines. In parallel, progress in deep neural networks is revolutionizing fields such as image recognition, natural language processing and, more broadly, AI. This course explores the exciting intersection between these two advances. The course will provide an introduction to deep learning and survey the relevant background in genomics, high-throughput biotechnology, protein and drug/small molecule interactions, medical imaging and other clinical measurements, focusing on the available data and their relevance. We will cover ongoing developments in deep learning (supervised, unsupervised and generative models) with a focus on the applications of these methods to biomedical data, which are beginning to produce dramatic results. In addition to predictive modeling, the course emphasizes how to visualize and extract interpretable biological insights from such models. Recent papers from the literature will be presented and discussed. Students will work in groups on a final class project using real-world datasets.
Prerequisites
College calculus, linear algebra, basic probability and statistics such as CS109, and basic machine learning such as CS229. Prior knowledge of the biology domains is not necessary.
Instructors
Anshul Kundaje, Assistant Professor (akundaje@stanford.edu)
James Zou, Assistant Professor (jamesz@stanford.edu)
TAs
Kevin Wu, Graduate Student, Computer Science (wukevin@stanford.edu)
Laksshman Sundaram, Graduate Student, Computer Science (lakss@stanford.edu)
Recitation
Fridays 10:30 - 11:20am at 380-380Y.
Note: we will not have recitation every week. Please check the syllabus for recitation topics.
Office hours
James Zou: Find me after each class or Mondays 5-5:30pm in Packard 258.
Anshul Kundaje: Find me after each class or Wednesdays 5-5:30pm in Lane Med School Building L301.
TAs: Thursdays 5pm - Packard Engineering 2nd Floor Kitchen
Assignments
Course project (60%): students will form teams of 3-4 and either choose one of the suggested projects or propose their own. Teams will be given Google Cloud credits to implement algorithms and perform analysis. Teams are expected to work on the research project throughout the second half of the quarter and produce conference-style papers. Each team will present its paper to the entire class at the end of the quarter. The course project will consist of the following milestones:
- Project proposal in class (3 minute talk).
- First draft of the paper for peer review.
- Poster presentation (12/10 8:30-11:30am, Gates lobby).
- Final paper (due at noon on Friday 12/13).
A significant portion of the class will be based on reading and discussing the latest literature. Every student should read the assigned papers before class and participate in discussions.
Paper presentation (20%): each team selects one of the suggested papers to present in detail to the class. The presentation should be 20 mins + 5 mins for Q&A. Each team will also write a concise review of the paper. The review will be published on bioRxiv.
Peer project review (10%): each team will be assigned two other groups' paper drafts to review. The review should concisely summarize the key findings of the paper, highlight interesting ideas and weaknesses, and give suggestions.
Class participation (10%): every student should actively engage in paper discussions in class.
Schedule
The first few lectures will cover the basics of deep learning: convolutional and recurrent architectures, generative models, and optimization/regularization. We will also study applications of deep learning in several biomedical domains: genomics, protein structure, small molecule/drug interactions, medical imaging and medical records.
Date | Topic | Papers | Recitation topic | Assignment |
---|---|---|---|---|
9/23 | Class overview, intro to supervised ML and neural networks, SGD + backprop + optimization + regularization | | | |
9/25 | Intro to genomics + genetics + biological application domains for deep learning | | Genomics primer | |
9/30 | Deep learning for genomics and intro to CNNs | Primers on deep learning for genomics https://kundajelab.github.io/dragonn/tutorials.html | | |
10/2 | Applications of CNNs and dilated CNNs in regulatory genomics and genetics | | Deep learning primer | |
10/7 | Recurrent neural networks + Transformers | | | |
10/9 | Methods for interpreting deep learning models and generating mechanistic hypotheses | | DragoNN tutorial | Projects released on Oct. 11. |
10/14 | Autoencoders + unsupervised representation learning for single cell genomics | | | Select projects and papers on Oct. 14 (signups open at 9am PT). |
10/16 | Generative models | | Keras + PyTorch tutorial | |
10/21 | Molecules and drug discovery + graph convolution | | | |
10/23 | Faculty-led paper discussions | | Deep learning review | |
10/28 | Project proposal presentations | | | 5 minute proposal presentations |
10/30 | Faculty-led paper discussions | | | |
11/4 | Guest lecture | | | |
11/6 | Guest lecture | | | |
11/11 | Student paper presentations | | | |
11/13 | Student paper presentations | | | |
11/18 | Faculty-led paper discussion | | | |
11/20 | Student paper presentations | | | Initial paper submitted for peer review (due at 5pm 11/22). |
12/2 | Student paper presentations | | | Peer review out |
12/4 | Student paper presentations | | | |
12/10 | Finals week: poster presentation. The poster session will be in the atrium/lobby of the Gates Computer Science Building and will run from 8:30 AM to 11:30 AM. We will provide easels and board backing to mount your posters. | | | Final paper due 12/13. |
12/13 | Final paper due | | | |