Value of Data and AI

Course overview

Many of the most valuable companies in the world and the most innovative startups have business models based on data and AI, but our understanding of the economic value of data, networks, and algorithmic assets remains at an early stage. For example, what is the value of a new dataset or an improved algorithm? How should investors value a data-centric business such as Netflix, Uber, Google, or Facebook? And what business models can best leverage data and algorithmic assets in settings as diverse as e-commerce, manufacturing, biotech, and humanitarian organizations? In this graduate seminar, we will investigate these questions by studying recent research on these topics and by hosting in-depth discussions with experts from industry and academia. Key topics will include the value of data quantity and quality in statistics and AI, business models around data, networks, scaling effects, economic theory around data, and emerging data protection regulations. Students will also conduct a group research project in this field.

Time and location: Wednesdays and Fridays, 10:30-12:00, Room 200-203

Prerequisites

This course will require sufficient mathematical maturity to follow the technical content; some familiarity with data mining and machine learning and at least an undergraduate course in statistics are recommended. However, the course will be accessible to a wide audience including graduate students in computer science, engineering, economics, law and business.

Instructors

Matei Zaharia (matei@cs.stanford.edu)

James Zou (jamesz@stanford.edu)

Steve Eglash (seglash@stanford.edu)

TA

Amirata Ghorbani (amiratag@stanford.edu)

Piazza sign-up link: piazza.com/stanford/winter2020/cs320

Scribe notes should be forwarded to the TA within one week after each lecture.

Office hours

The instructors will be available after each class. Please email for additional meetings. 

Assignments

Course project (65%): The main assignment is a quarter-long project, which could range from original research to a case study of a particular company or industry. We will ask students to form small groups and submit a project proposal in the first two weeks of the course, and we will then meet with each group several times to gauge their progress and provide advice. Each group will give an in-class presentation at the end of the course, and possibly a mid-quarter presentation too. Each group will also submit a final report (up to 8 pages).

Class participation (20%): every student should read the assigned papers and actively engage in class discussions.

Class scribing (15%): students will be responsible for scribing one class. Good scribing should supplement the class discussion with additional readings. 

Schedule

Date Topic

Readings (required in red)

1/8

Introduction, examples of data and business models

(Lecture Slides) (Lecture Notes)

1. AI and economics.

2. AWS data exchange.

3. FlatIron cancer data

1/10

Business value of data, Netflix case study

(Lecture Slides) (Lecture Notes)

1. Netflix recommender system.

2. Data inverting

1/15

Design of data platforms, Databricks case study

(Lecture Slides) (Lecture Notes)

1. Evolution of decision support systems (up to page 18) (Ch.1 of Building the data warehouse)

2. How to build an analytics team for impact in an organization

1/17

What can ML do. Statistical methods to evaluate impact.

(Lecture Slides) (Lecture Notes)

1. What can ML do.

2. Evaluating causal impact. 

1/22

Data-driven targeting and commerce 

(Lecture Slides) (Lecture Notes)

1. How WaPo optimizes articles.

2. Personalized recommendation with bandit algorithms.

1/24

Consumer privacy and data security

(Lecture Slides) (Lecture Notes)

1. NYTimes cell phone tracking

2. California data brokers registry (from CCPA).

3. Robust De-anonymization of Large Sparse Datasets.

1/29

Guest speaker: Hal Varian, Google Chief Economist (Bio & CV): Google Ad Auction History

(Lecture Slides) (Lecture Notes)

Project proposals due

1. The Economics of Internet Search

2. Position Auctions

1/31

Business models and scaling effects

(Lecture Slides) (Lecture Notes)

1. WeWork: Blitzscaling or Blitzflailing?

2. Reid Hoffman Shares Lessons

3. The fundamental problem with Silicon Valley's favorite growth strategy

4. Response to The fundamental problem...

2/5

Data valuation; case study

(Lecture Slides) (Lecture Notes)

1. What is this data worth?

2. Business model of consumer genetics

2/7

Guest speakers: Brad Peterson (NASDAQ CIO) & William Dague (Head of Alternative Data): NASDAQ Data Products

(Lecture Slides) (Lecture Notes)

 

2/12

Data quality

(Lecture Notes)

1. Data validation for machine learning

2/14

Guest speaker: Chris Ré, Stanford CS, Lattice Data and Apple: data and machine learning

(Lecture Notes)

 

2/19

ML accountability and fairness

(Lecture Slides) (Lecture Notes)

1. AI fairness.

2. Health prediction disparity.

3. Brookings AI and bias overview

2/21

Guest speaker: David Engstrom, Professor at Stanford Law School: Government agency use of AI and data

(Lecture Slides) (Lecture Notes)

1. Engstrom-Gelbach, Legal Tech, Civ Pro, and Future of American Adversarialism (CS 320)-1.pdf

2. AI-Report_v13-1.pdf

2/26

Project update presentations.

 

2/28

Regulations around data sharing, data dividend

(Lecture Notes)

1. GDPR.

2. California Consumer Privacy Act.

3/4

Guest speaker: Nicole Vadivel, research engineer at Tesla: Fleet field data and battery development

(Lecture Slides) (Lecture Notes)

1. Severson, Kristen A., et al. "Data-driven prediction of battery cycle life before capacity degradation." Nature Energy 4.5 (2019): 383-391.

3/6

How VCs value data-driven startups. Julia Schottenstein, David Pezeshki, Alexander Beard.

(Lecture Notes)

1. The new business of AI (and how it's different from traditional software)

3/11

Challenges for ML production

 

3/13

Final presentation.

 
