CS 329S Lecture 1 Slides
1. Course overview
What’s machine learning systems design?
The process of defining the interface, algorithms, data, infrastructure, and
hardware for a machine learning system to satisfy specified requirements.
We’ll learn about all of this:
[Diagram: an ML system comprises the interface, data, ML algorithms, infrastructure, and hardware]
This class will cover ...
● ML production in the real world from software, hardware, and business perspectives
● The iterative process for building ML systems at scale
○ project scoping, data management, model development, deployment, monitoring & maintenance, infrastructure & hardware, business analysis
● Challenges and solutions in ML engineering
This class will not teach ...
● Machine learning/deep learning algorithms
○ CS 229: Machine Learning
○ CS 230: Deep Learning
○ CS 231N: Convolutional Neural Networks for Visual Recognition
○ CS 224N: Natural Language Processing with Deep Learning
● Computer systems
○ CS 110: Principles of Computer Systems
○ CS 140E: Operating Systems Design and Implementation
● UX design
○ CS 147: Introduction to Human-Computer Interaction Design
○ DESINST 240: Designing Machine Learning: A Multidisciplinary Approach
Machine learning: expectation
Machine learning: reality
Prerequisites
● Knowledge of CS principles and skills (CS 106B/X)
● Understanding of ML algorithms (CS 229, CS 230, CS 231N, or CS 224N)
● Familiarity with at least one framework such as TensorFlow, PyTorch, or JAX
● Familiarity with basic probability theory (CS 109/Stat 116)
AI value creation by 2030
● 13 trillion USD
● Most of it will be outside the consumer internet industry
Zoom etiquette
● Write questions into the Zoom chat
○ Feel free to reply to each other; TAs will also reply
○ Ping Karan if you want to opt out
● I will stop occasionally for Q&A
○ TAs will re-share some of the questions with me
● After each lecture, a random question will get a random reward
● We appreciate it if you keep your video on!
Grading
● Assignments (30%)
○ 2-3 assignments
● Final project (60%)
● Class participation (10%)
○ Zoom questions + Piazza
○ It’s a bad sign if, by the end of the quarter, we still don’t know who you are
Final project
● Build an ML-powered application
● Must work in groups of three
● Demo + report (creative formats encouraged)
● Evaluated by course staff and industry experts
● See course website: https://cs329s.stanford.edu
Honor code: permissive but strict - don’t test us ;)
● OK to search for and ask publicly about the systems we’re studying. Cite all the resources you reference.
○ E.g. if you read it in a paper, cite it. If you ask on Quora, include the link.
● NOT OK to ask someone to do assignments/projects for you.
● OK to discuss questions with classmates. Disclose your discussion partners.
● NOT OK to copy solutions from classmates.
● OK to use existing solutions as part of your projects/assignments. Clarify your
contributions.
● NOT OK to pretend that someone’s solution is yours.
● OK to publish your final project after the course is over (we encourage that!)
● NOT OK to post your assignment solutions online.
● ASK the course staff if unsure!
Course staff
⚠⚠ Work in progress ⚠⚠
● First time the course is offered
● First time Chip’s taught a course online
● The subject is new; we don’t have all the answers
○ We are all learning too!
● We appreciate your:
○ enthusiasm for trying out new things
○ patience with things that don’t quite work yet
○ feedback to improve the course
Inspired by Michael Bernstein’s CS 278
● https://cs329s.stanford.edu
● Office hours (OHs) start next week
● If you enrolled without submitting an application, send us an email!
● Questions so far?
2. ML in research vs. production
ML in research vs. in production
[Table comparing research and production, built up row by row over the following slides]
* This is actively being worked on. See Utility is in the Eye of the User: A Critique of NLP Leaderboards (Ethayarajh and Jurafsky, EMNLP 2020)
Stakeholder objectives
● ML team: highest accuracy
● Sales: sells more ads
● Product: fastest inference
● Manager: maximizes profit (= laying off ML teams)
Computational priority
● Research: fast training, high throughput
● Production: fast inference (generating predictions), low latency
Latency matters
● Latency: time to move one leaf
● Throughput: how many leaves moved in 1 second
● Real-time (one at a time): low latency = high throughput
● Batched: high latency, but also high throughput (see the sketch below)
Data
● Research: clean, static, mostly historical data
● Production: messy, constantly shifting, historical + streaming data
● Production data is biased, and you don’t know how biased
● Privacy + regulatory concerns
ML in research vs. in production (continued)
● Fairness
● Interpretability
3. Breakout exercise
Each lecture, you’ll be randomly assigned to a group
7 minutes; there is no one right answer!
1. How can academic leaderboards be modified to account for multiple
objectives? Should they?
2. ML models are getting bigger and bigger. How does this affect the usability of
these models in production?
Future of leaderboards
● A more comprehensive utility function (a toy scoring sketch follows this list)
○ Model performance (e.g. accuracy)
○ Latency
○ Prediction cost
○ Interpretability
○ Robustness
○ Ease of use (e.g. OSS tools)
○ Hardware requirements
● Adaptive to different use cases
○ Instead of a leaderboard for each dataset/task, each use case has its own leaderboard
● Dynamic datasets
○ Distribution shifts
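To make the idea of a more comprehensive utility function concrete, here is a toy scoring sketch; the metrics, weights, and model numbers are invented for illustration and are not from the lecture.

```python
# Toy multi-objective leaderboard score; all metrics, weights, and numbers are
# made up for illustration. Higher utility is better.

def utility(model: dict, weights: dict) -> float:
    return (
        weights["accuracy"] * model["accuracy"]          # reward model performance
        - weights["latency"] * model["latency_ms"]       # penalize slow inference
        - weights["cost"] * model["cost_per_1k_preds"]   # penalize prediction cost
    )

models = [
    {"name": "big_transformer", "accuracy": 0.92, "latency_ms": 120.0, "cost_per_1k_preds": 0.40},
    {"name": "small_cnn",       "accuracy": 0.88, "latency_ms": 8.0,   "cost_per_1k_preds": 0.02},
]
weights = {"accuracy": 100.0, "latency": 0.1, "cost": 10.0}  # would differ per use case

for m in sorted(models, key=lambda m: utility(m, weights), reverse=True):
    print(m["name"], round(utility(m, weights), 2))
```

With these particular weights the smaller, cheaper model outranks the more accurate one; a different use case would pick different weights and get a different ranking.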
Dynamic datasets
WILDS (Koh and Sagawa et al., 2020): 7 datasets with evaluation metrics and
train/test splits representative of distribution shifts in the wild.
4. ML systems vs. traditional software
Traditional software
Separation of concerns is a design principle for separating a computer program into distinct sections such that each section addresses a separate concern.
Image by Arda Cetinkaya
ML systems
● Code and data are tightly coupled
○ ML systems are part code, part data
● Need to test and version not only code but also data
○ Testing and versioning data is the hard part
ML System: versioning data
● Line-by-line diffs like Git’s don’t work with datasets
● Can’t naively create multiple copies of large datasets
● How to merge changes? (a hash-based sketch of one approach follows)
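One common approach, which tools such as DVC build on, is to track data by content hash and version a small manifest instead of the data itself. A stripped-down sketch of the idea; the paths and layout are assumptions, not any particular tool’s API.

```python
# Minimal sketch of hash-based data versioning: record a content hash for every
# file so a dataset version is a small manifest, not another copy of the data.
import hashlib
import json
import pathlib

def file_hash(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot(data_dir: str, manifest_path: str) -> None:
    """Write {relative file path: sha256} for every file under data_dir."""
    root = pathlib.Path(data_dir)
    manifest = {
        str(p.relative_to(root)): file_hash(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)

# snapshot("data/train", "manifests/train_v1.json")  # hypothetical paths
# Diffing two manifests shows exactly which files changed between versions.
```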
ML System: test data
● How to test data correctness/usefulness?
● How to know if data meets model assumptions?
● How to know when the underlying data distribution has changed? How to measure the changes? (see the drift-check sketch after this list)
● How to know if a data sample is good or bad for your system?
○ Not all data points are equal (e.g. images of road surfaces with cyclists are more important for autonomous vehicles)
○ Bad data might harm your model and/or make it susceptible to attacks such as data poisoning
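A common baseline for detecting that the underlying distribution has changed is a two-sample test per feature between a reference window and recent data. A rough sketch, assuming numeric features and a fixed significance threshold (both assumptions, not prescriptions from the lecture):

```python
# Rough drift check: compare each feature's live distribution against a
# reference sample with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

def drifted_features(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01):
    """Return indices of features whose distributions look different."""
    flagged = []
    for i in range(reference.shape[1]):
        result = ks_2samp(reference[:, i], live[:, i])
        if result.pvalue < alpha:      # small p-value: distributions likely differ
            flagged.append(i)
    return flagged

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(5000, 3))
live = reference.copy()
live[:, 2] += 0.5                      # simulate a shift in one feature
print(drifted_features(reference, live))   # expected: [2]
```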
ML System: data poisoning attacks
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning (Chen et al., 2017)
Engineering challenges with large ML models
● Too big to fit on-device (see the back-of-the-envelope sketch after this list)
● Consume too much energy to work on-device
● Too slow to be useful
○ Autocompletion is useless if it takes longer to make a prediction than to type
● How to run CI/CD tests if a test takes hours/days?
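A back-of-the-envelope sketch (not from the slides) of why large models don’t fit on-device: the memory just for the weights is the parameter count times the bytes per parameter, before counting activations or any other overhead.

```python
# Back-of-the-envelope check for "too big to fit on-device":
# memory for the weights alone = number of parameters x bytes per parameter.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Assumes fp16 weights (2 bytes each); activations and overhead excluded."""
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(175e9))  # ~350 GB for a 175B-parameter model (GPT-3 scale)
print(weight_memory_gb(100e6))  # ~0.2 GB for a 100M-parameter model
```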
5. ML production myths
Myth #1: Deploying is hard
Myth #2: You only deploy one or two ML models at a time
Myth #3: If we don’t do anything, model performance remains the same
● Reality: concept drift degrades model performance over time
● Tip: train models on data generated 2 months ago and test on current data to see how much worse they get (sketch below)
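A rough sketch of that staleness check, assuming a timestamped tabular dataset and a scikit-learn classifier; the file name, column names, and model choice are all assumptions.

```python
# Rough sketch of the tip: train on data from ~2 months ago, then compare
# accuracy on held-out old data vs. current data. The file name, column names
# ("timestamp", "label"), and model choice are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_parquet("events.parquet")                     # hypothetical dataset
cutoff = df["timestamp"].max() - pd.Timedelta(days=60)
old, current = df[df["timestamp"] < cutoff], df[df["timestamp"] >= cutoff]
features = [c for c in df.columns if c not in ("timestamp", "label")]

X_train, X_old_test, y_train, y_old_test = train_test_split(
    old[features], old["label"], test_size=0.2, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
acc_old = accuracy_score(y_old_test, model.predict(X_old_test))
acc_now = accuracy_score(current["label"], model.predict(current[features]))
print(f"accuracy on old data: {acc_old:.3f}, on current data: {acc_now:.3f}")
# A large gap means performance decays quickly and the model needs frequent retraining.
```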
Myth #4: You won’t need to update your models as much
DevOps standard:
● Etsy deployed 50 times/day
● Netflix deploys 1000s of times/day
● AWS deploys every 11.7 seconds
Weibo’s iteration cycle: 10 mins
Machine learning with Flink in Weibo (Qian Yu, QCon 2019)
ML + DevOps =
Myth #5: Most ML engineers don’t need to worry about scale
(Source: StackOverflow Developer Survey 2019)
Myth #6: ML can magically transform your business overnight
● Magically: possible
● Overnight: no
Efficiency improves with maturity
2020 state of enterprise machine learning (Algorithmia, 2020)
Ishan got a head scratcher!
Machine Learning Systems Design
Next class: Designing an ML system