Course Syllabus
STATS 216. Intro to Statistical Learning (Winter '20)
Class (15108): 1/6 - 3/13/2020, M/W 3-4:20pm at Gates B3
Textbook: An Introduction to Statistical Learning, with Applications in R, G. James et al., ISBN: 9781461471387. Errata & data. Available as a PDF through the Stanford libraries or from the book website. Important: international editions may have missing or swapped exercises. This complicates your learning and may affect your grade if you submit solutions to the wrong exercises.
Pre-recorded lecture videos are available on YouTube. These are the same videos you have in Modules, collected in one place on YouTube. Use
Overview: An overview of supervised learning, with a focus on regression and classification methods. The syllabus includes:
- linear and polynomial regression, logistic regression and linear discriminant analysis
- cross-validation and the bootstrap
- model selection and regularization methods (ridge and lasso)
- nonlinear models, splines and generalized additive models
- tree-based methods, random forests and boosting
- support-vector machines
- some unsupervised learning: principal components and clustering (k-means and hierarchical)
Computing is done in R, through tutorial sessions and homework assignments. This math-light course is offered via video segments (MOOC style) and in-class problem-solving sessions.
Prerequisites: Introductory courses in statistics or probability (e.g., Stats60 or Stats101), linear algebra (e.g., Math51), and computer programming (e.g., CS105).
Objectives of the course: The course covers the entire textbook, with the objectives to:
- introduce fundamental tools for building predictive models, including some "state-of-the-art" methods in data science
- understand the role of model selection and assessment using cross-validation and randomization
- learn how to use the vast collection of tools in R to implement the methods learned
Flipped format: This course is in the "flipped" format. The lectures are pre-recorded, and students will watch them in their own time. The course material consists of recorded video chunks (typically around 10-12 minutes long), as well as quizzes and review questions. Each week new material will become available, and students are expected to keep up. The lectures follow the course textbook.
In-class sessions focus on hands-on experience: we will solve problems both with pen and paper and in R. The in-class sessions are recorded and posted on Canvas for all students to view.
Attendance of the in-class sessions is required for all on-campus students.
Laptop: in-class students will still need to bring their laptops for lab work. If you don't have a laptop, you can team up with someone who does. Our in-class sessions will often require the use of computing.
Computing: Students will be required to use R, and the lectures include some instruction in its use. R is excellent open-source (free) software, available from CRAN, where you will also find many tutorials. John Maindonald’s guide is especially recommended. You can also learn from a variety of YouTube videos and MOOCs (LinkedIn Learning, Udacity, Coursera). You will be introduced to the RStudio IDE (integrated development environment), a user-friendly environment for developing, running, and documenting R code.
Course outline: Each week's material on Canvas, including the slides and R-session scripts used in the videos, will be available at 8am PST on Sunday.
| Week | Dates | Ch | Topics | Assisting TAs |
|---|---|---|---|---|
| 1 | Jan 6, 8 | 1, 2 | Introduction and statistical learning | Ying |
| 2 | 13, 15 | 3 | Linear regression | Ying |
| 3 | 20, 22 | 4 | MLK Holiday (no classes), Classification | Han |
| 4 | 27, 29 | 5 | Resampling methods | Ying |
| 5 | Feb 3, 5 | 6 | Linear model selection and regularization | Han |
| 6 | 10, 12 | 7 | Moving beyond linearity, Tree-based methods | Kenneth |
| 7 | 17, 19 | 8 | Presidents' Day, Midterm exam (same place/time) | Han |
| 8 | 24, 26 | 9 | Support vector machines | Kenneth |
| 9 | Mar 2, 4 | 10 | Unsupervised learning | Kenneth |
| 10 | 9, 11 | | Review | Yu |
| | 17 | | Final exam (see schedule), Gates B3, 8:30-11:30am | Yu |
TAs: Kenneth is the head TA.
To make office hours (OH) most effective, please come prepared: bring clear questions and a description of what you have tried that did or did not work.

| Name | OH Date/Time | OH Location | Piazza Monitoring Days |
|---|---|---|---|
| Jingyi Kenneth Tay | Thu 2-4pm | Zoom Meeting ID 685-553-1394 | Sun |
| Ying Jin | Tue 10:30am-12:30pm | Sequoia 207 | Mon, Tue |
| Han Wu | Mon 9-11am | Sequoia 105 | Wed, Thu |
| Yu Wang | Fri 9-11am | 160-B35 | Fri, Sat |
Assignments & Grading: Your assignment submissions must be your own effort, but you can discuss high-level ideas with your peers. Do NOT share or post code or solutions on the forums or the Internet. Keep the level of solution detail similar to that in the textbook, which we consider the key source of truth (though it may still contain typos and require clarification).
There are no practice exams, but we will post the textbook topics you need to know to succeed on the exam. A TA may be available to do a review session before the exams.
HW (35%): We will have regular biweekly graded homework assignments (4 total), which will include analysis of datasets, analytical and conceptual problems, and programming assignments. These are to be completed individually, since they count toward the final grade for the course. Some parts of the assignments involve computing. Where appropriate, we will indicate that students MAY do the COMPUTING part of the assignments in groups (of size at most 4). If so, students must indicate the membership of the groups.
HW is submitted via Gradescope.com. We will manually sync its student list with Canvas, so if you don't have access to Gradescope in Week 1, let us know. Late HW will be penalized at 10% of the maximum score per day. HW turned in more than 4 days late (hard deadline) will not be graded. The final homework has a hard deadline on its due date, so it cannot be turned in late.
Each problem should start on a new page, be properly tagged in Gradescope, and execute correctly and independently of the other problems (seed your RNG, if needed).
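To make the late-penalty arithmetic concrete, here is a minimal sketch in Python. It assumes (these are illustrative assumptions, not stated rules) that days late are counted as whole started days, that the penalty is subtracted from your raw score, and that a score cannot drop below zero. The function name `late_adjusted_score` is hypothetical.

```python
from typing import Optional

PENALTY_PER_DAY = 0.10  # fraction of the MAXIMUM score deducted per late day
MAX_LATE_DAYS = 4       # HW more than 4 days late is not graded (hard deadline)


def late_adjusted_score(raw: float, max_score: float, days_late: int,
                        is_final_hw: bool = False) -> Optional[float]:
    """Return the score after the late penalty, or None if it is not graded.

    Assumption: days_late counts whole (started) days past the deadline,
    and the adjusted score is floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late == 0:
        return raw  # on time: no penalty
    # The final homework accepts no late submissions at all.
    if is_final_hw or days_late > MAX_LATE_DAYS:
        return None
    penalty = PENALTY_PER_DAY * max_score * days_late
    return max(raw - penalty, 0.0)


# 85/100 turned in 2 days late: 85 - 2 * (10% of 100) = 65
print(late_adjusted_score(85, 100, 2))  # prints 65.0
```

Because the penalty is a fraction of the maximum score rather than of your earned score, two late days cost 20 points on a 100-point assignment regardless of how well you did.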
Gradescope will only accept one PDF document for each homework. The best way to have all your text, code and code results in one PDF document is to write up your homework as an R markdown file, then either (i) knit it directly to PDF, or (ii) knit it to HTML, then save that HTML file as a PDF.
Midterm Exam (15%): The midterm (date posted above) is held in class (unless changed). Logistics: the exams are closed-book, with no notes, calculators, or phones. We will provide blue books, but the pencils and effort are yours. SCPD students will be assisted by SCPD. Exam questions differ from homework questions: the HW deepens your understanding, while the exams measure it.
Final Exam (40%): Location is TBA; date/time is here. It is cumulative. Do not book travel that conflicts with this date. University policy is that students may not register for two classes with exams at the same time.
In-Canvas Quizzes (10%) are based on the video lectures, slides, and textbook. We highly recommend taking these as you watch the videos, instead of saving them until the end of the quarter. Answers can only be submitted once, so please check them carefully before submitting. Quizzes will close 24 hours after the final exam's submission deadline (so that we can submit all course grades on time).
Grade Scale is firm for now: NP: <60, D−: 60-63, D: 63-67, D+: 67-70, C−: 70-73, C: 73-77, C+: 77-80, B−: 80-83, B: 83-87, B+: 87-90, A−: 90-93, A: 93-97, A+: ≥97 (boundary values count in your favor). See grade definitions.
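The weights above (HW 35%, midterm 15%, final 40%, quizzes 10%) and the scale combine as in the following minimal Python sketch. It assumes each component is scored on a 0-100 scale and that Piazza extra credit (up to 3%) is added as percentage points to the course total; both are illustrative assumptions, and the function names are hypothetical. Using `>=` against each lower bound is what puts a boundary value such as 63 into the higher grade (D rather than D−).

```python
# Component weights in percent (they sum to 100).
WEIGHTS = {"hw": 35, "midterm": 15, "final": 40, "quizzes": 10}
MAX_EXTRA_CREDIT = 3.0  # percentage points, earned by answering on Piazza

# (lower bound, letter) pairs, checked from highest to lowest.
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def course_total(scores: dict, extra_credit: float = 0.0) -> float:
    """Weighted course percentage; each component score is on a 0-100 scale."""
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    # Assumption: extra credit is added directly, capped at 3 points.
    return weighted + min(max(extra_credit, 0.0), MAX_EXTRA_CREDIT)


def letter_grade(total: float) -> str:
    """Map a percentage to a letter; '>=' counts boundary values in your favor."""
    for lower, letter in CUTOFFS:
        if total >= lower:
            return letter
    return "NP"  # below 60


scores = {"hw": 90, "midterm": 80, "final": 85, "quizzes": 100}
print(course_total(scores))                # prints 87.5
print(letter_grade(course_total(scores)))  # prints B+
```

Since the final carries 40% of the grade, a 10-point change on the final moves the course total by 4 points, more than the width of any single letter band.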
Re-grading: We aim to grade fairly, accurately, and in a timely manner. If you believe we made a clear grading error, please notify the staff privately via the forum ASAP (within 1 week of the grades’ release). To discourage frivolous appeals, we reserve the right to deduct 2-5% of the issued grade if your appeal lacks a strong justification or the potential gain does not exceed 2-5%. Be sure it is worth the mutual effort.
Make-up policy: If you miss an exam with a valid, verifiable excuse (be prepared to document it), contact Oleg ASAP to reschedule the exam. Please keep in mind that making exceptions is difficult and time-consuming, and can only be done before exams or solutions are distributed. Typical valid reasons are medical emergencies; travel and conferences are not.
Extra credit: see Communication & Piazza.
Academic Integrity: Students are expected to maintain exemplary integrity in their class efforts. Make sure you understand the Stanford Honor Code. It’s a must!
Examples of honor code violations:
- Looking at the solutions from previous years’ HW or exams - either official or written up by another student.
- Sharing the write up or code with another student (showing to or looking at).
- Uploading your write up or code to a public repository so that it can be accessed by other students.
- Discussing homework problems in such detail that your solution (write up or code) is almost identical to another student's answer.
Unless explicitly mentioned otherwise, we will assume that any submitted work is
- your own
- created without assistance from anyone else (except possibly course staff)
- created without consulting any resources other than the course materials.
Note that Stack Overflow, Stack Exchange, GitHub repos, etc. are also accessible to our staff and to plagiarism detection software.
Collaboration: You are encouraged to discuss the homework problems and the material with your classmates, but you must submit your own individually developed HW solutions. Please indicate at the top of your write-up the names of the students with whom you worked.
Communication & Piazza: Most e-communication will be done on Piazza (course link, also available at the "Piazza" tab within Canvas). Post your personal concerns (your grading, your absences, your accommodations, etc.) via private posts for the staff. All other concerns that are beneficial to your peers (who may be seeking the same info) should be posted as public messages. If emailing, use "stats 216" in the subject line to avoid delays in reply.
Piazza offers a reliable and convenient interface for discussions, learning and assistance, but its functionality is only valuable if it is used well. So, please format your posts (use code blocks, LaTeX, embedded images, references to other posts, tables, links, etc.). Collaboratively edit posts to make corrections and clarifications instead of chaining follow-up posts. Profile photos in Piazza and Canvas also help your audience.
Do NOT post HW code or solutions while your peers are still working on the HW. Give your peers a chance for discovery and learning. We do encourage higher-level discussions of problems and solutions, and posts of R code that clarify how tools work. The instructor team will try to check Piazza daily, so please be patient. Peer assistance is encouraged; just please avoid posting for the sake of posting (quality does matter). Before asking a question, quickly search prior posts for similar concerns. Anonymous posts are OK, but the staff can still see your identity.
If you have a complaint about the class, please notify the instructor team privately. If it remains unresolved, contact Oleg. Keep in mind that forum answers may not always be correct (typos, misunderstood questions, etc.). Naturally, keep communication professional, respectful and cordial.
Please use Piazza for all questions related to lectures, homework and exams. Students may earn up to 3% extra credit by answering other students’ questions in a substantial and helpful way.
Other great resources:
- The Elements of Statistical Learning, T. Hastie et al. (free pdf).
- Piazza's post on R tutorials
Disabilities assistance: If you have a disability and need help or reasonable accommodations, contact the Office of Accessible Education (OAE). The OAE should notify the instructors about needed accommodations at the beginning of the quarter. Do not wait until the last minute.
Classroom: Please avoid browsing irrelevant websites (movies, social media, games) in class, as these distract your peers and divert your attention.
Copyright: Each document disseminated in this course is copyrighted with all rights reserved. These documents may not be reproduced or distributed without explicit permission of the author(s). Stanford University also has a policy on audio/video recording that you should know before recording.
Recording policy: see University policy on recording and copyright reminder.
Video recordings: Video cameras located in the back of the room will capture the instructor presentations in this course. For your convenience, you can access these recordings by logging into the course Canvas site. These recordings might be reused in other Stanford courses, viewed by other Stanford students, faculty, or staff, or used for other education and research purposes. Note that while the cameras are positioned with the intention of recording only the instructor, occasionally a part of your image or voice might be incidentally captured. If you have questions, please contact a member of the teaching team.
SCPD Students: submit HW assignments on Gradescope just like non-SCPD students.
Disclaimer: This syllabus and course details are subject to change, but we will keep changes to a minimum and give advance warning whenever possible.
Course assistance: If you feel the TAs did not resolve your issue, please escalate to Oleg.
