CS 329S Lecture 1 Slides
1. Course overview
What’s machine learning systems design?
The process of defining the interface, algorithms, data, infrastructure, and
hardware for a machine learning system to satisfy specified requirements.
We’ll learn about all of this:
[Diagram: an ML system comprises the interface, data, ML algorithms, infrastructure, and hardware]
This class will cover ...
● ML production in the real world from software, hardware, and business perspectives
● The iterative process for building ML systems at scale
○ project scoping, data management, model development, deployment, monitoring & maintenance, infrastructure & hardware, business analysis
● Challenges and solutions in ML engineering
This class will not teach ...
● Machine learning/deep learning algorithms
○ CS 229: Machine Learning
○ CS 230: Deep Learning
○ CS 231N: Convolutional Neural Networks for Visual Recognition
○ CS 224N: Natural Language Processing with Deep Learning
● Computer systems
○ CS 110: Principles of Computer Systems
○ CS 140E: Operating Systems Design and Implementation
● UX design
○ CS 147: Introduction to Human-Computer Interaction Design
○ DESINST 240: Designing Machine Learning: A Multidisciplinary Approach
Machine learning: expectation
Machine learning: reality
Prerequisites
● Knowledge of CS principles and skills (CS 106B/X)
● Understanding of ML algorithms (CS 229, CS 230, CS 231N, or CS 224N)
● Familiarity with at least one framework such as TensorFlow, PyTorch, or JAX
● Familiarity with basic probability theory (CS 109/Stat 116)
AI value creation by 2030
● 13 trillion USD
● Most of it will be outside the consumer internet industry
Zoom etiquette
● Write questions into the Zoom chat
○ Feel free to reply to each other; TAs will also reply
○ Ping Karan if you want to opt out
● I will stop occasionally for Q&A
○ TAs will re-share some of the questions with me
● After each lecture, a random question will get a random reward
● We appreciate it if you keep your video on!
Grading
● Assignments (30%)
○ 2-3 assignments
● Final project (60%)
● Class participation (10%)
○ Zoom questions + Piazza
○ It’s a bad sign if, by the end of the quarter, we still don’t know who you are
Final project
● Build an ML-powered application
● Must work in groups of three
● Demo + report (creative formats encouraged)
● Evaluated by course staff and industry experts
● See course website: https://cs329s.stanford.edu
Honor code: permissive but strict - don’t test us ;)
● OK to search for and ask publicly about the systems we’re studying. Cite all the resources you reference.
○ E.g. if you read it in a paper, cite it. If you ask on Quora, include the link.
● NOT OK to ask someone to do assignments/projects for you.
● OK to discuss questions with classmates. Disclose your discussion partners.
● NOT OK to copy solutions from classmates.
● OK to use existing solutions as part of your projects/assignments. Clarify your
contributions.
● NOT OK to pretend that someone’s solution is yours.
● OK to publish your final project after the course is over (we encourage that!)
● NOT OK to post your assignment solutions online.
● ASK the course staff if unsure!
Course staff
⚠⚠ Work in progress ⚠⚠
● First time the course is offered
● First time Chip’s taught a course online
● The subject is new; we don’t have all the answers
○ We are all learning too!
● We appreciate your:
○ enthusiasm for trying out new things
○ patience with things that don’t quite work yet
○ feedback to improve the course
Inspired by Michael Bernstein’s CS 278
● https://cs329s.stanford.edu
● Office hours (OHs) start next week
● If you enrolled without submitting an application, send us an email!
● Questions so far?
2. ML in research vs. production
ML in research vs. in production
[Table comparing research and production, built up row by row over the following slides]
* This is actively being worked on. See Utility is in the Eye of the User: A Critique of NLP Leaderboards (Ethayarajh and Jurafsky, EMNLP 2020)
Stakeholder objectives
● ML team: highest accuracy
● Sales: sells more ads
● Product: fastest inference
● Manager: maximizes profit (= laying off ML teams)
Computational priority
● Research: fast training, high throughput
● Production: fast inference (generating predictions), low latency
Latency matters
● Latency: time to move one leaf
● Throughput: how many leaves moved in 1 second
● Real-time (one at a time): low latency = high throughput
● Batched: high latency, but also high throughput (see the sketch below)
Data
● Research: clean, static, mostly historical data
● Production: messy, constantly shifting, historical + streaming data
● Production data is biased, and you don’t know how biased
● Privacy + regulatory concerns
ML in research vs. in production (continued)
● Fairness
● Interpretability
3. Breakout exercise
Each lecture, you’ll be randomly assigned to a group
7 minutes; there is no one right answer!
1. How can academic leaderboards be modified to account for multiple
objectives? Should they?
2. ML models are getting bigger and bigger. How does this affect the usability of
these models in production?
Future of leaderboards
● A more comprehensive utility function (a toy scoring sketch follows this list)
○ Model performance (e.g. accuracy)
○ Latency
○ Prediction cost
○ Interpretability
○ Robustness
○ Ease of use (e.g. OSS tools)
○ Hardware requirements
● Adaptive to different use cases
○ Instead of a leaderboard for each dataset/task, each use case has its own leaderboard
● Dynamic datasets
○ Distribution shifts
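To make the idea of a more comprehensive utility function concrete, here is a toy scoring sketch; the metrics, weights, and model numbers are invented for illustration and are not from the lecture.

```python
# Toy multi-objective leaderboard score; all metrics, weights, and numbers are
# made up for illustration. Higher utility is better.

def utility(model: dict, weights: dict) -> float:
    return (
        weights["accuracy"] * model["accuracy"]          # reward model performance
        - weights["latency"] * model["latency_ms"]       # penalize slow inference
        - weights["cost"] * model["cost_per_1k_preds"]   # penalize prediction cost
    )

models = [
    {"name": "big_transformer", "accuracy": 0.92, "latency_ms": 120.0, "cost_per_1k_preds": 0.40},
    {"name": "small_cnn",       "accuracy": 0.88, "latency_ms": 8.0,   "cost_per_1k_preds": 0.02},
]
weights = {"accuracy": 100.0, "latency": 0.1, "cost": 10.0}  # would differ per use case

for m in sorted(models, key=lambda m: utility(m, weights), reverse=True):
    print(m["name"], round(utility(m, weights), 2))
```

With these particular weights the smaller, cheaper model outranks the more accurate one; a different use case would pick different weights and get a different ranking.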
Dynamic datasets
WILDS (Koh and Sagawa et al., 2020): 7 datasets with evaluation metrics and
train/test splits representative of distribution shifts in the wild.
4. ML systems vs. traditional software
Traditional software
Separation of concerns is a design principle for separating a computer program into distinct sections such that each section addresses a separate concern.
Image by Arda Cetinkaya
ML systems
● Code and data are tightly coupled
○ ML systems are part code, part data
● Need to test and version not only code but also data
○ Testing and versioning data is the hard part
ML System: versioning data
● Line-by-line diffs like Git’s don’t work with datasets
● Can’t naively create multiple copies of large datasets
● How to merge changes? (a hash-based sketch of one approach follows)
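One common approach, which tools such as DVC build on, is to track data by content hash and version a small manifest instead of the data itself. A stripped-down sketch of the idea; the paths and layout are assumptions, not any particular tool’s API.

```python
# Minimal sketch of hash-based data versioning: record a content hash for every
# file so a dataset version is a small manifest, not another copy of the data.
import hashlib
import json
import pathlib

def file_hash(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot(data_dir: str, manifest_path: str) -> None:
    """Write {relative file path: sha256} for every file under data_dir."""
    root = pathlib.Path(data_dir)
    manifest = {
        str(p.relative_to(root)): file_hash(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)

# snapshot("data/train", "manifests/train_v1.json")  # hypothetical paths
# Diffing two manifests shows exactly which files changed between versions.
```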
ML System: test data
● How to test data correctness/usefulness?
● How to know if data meets model assumptions?
● How to know when the underlying data distribution has changed? How to measure the changes? (see the drift-check sketch after this list)
● How to know if a data sample is good or bad for your system?
○ Not all data points are equal (e.g. images of road surfaces with cyclists are more important for autonomous vehicles)
○ Bad data might harm your model and/or make it susceptible to attacks such as data poisoning
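A common baseline for detecting that the underlying distribution has changed is a two-sample test per feature between a reference window and recent data. A rough sketch, assuming numeric features and a fixed significance threshold (both assumptions, not prescriptions from the lecture):

```python
# Rough drift check: compare each feature's live distribution against a
# reference sample with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

def drifted_features(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01):
    """Return indices of features whose distributions look different."""
    flagged = []
    for i in range(reference.shape[1]):
        result = ks_2samp(reference[:, i], live[:, i])
        if result.pvalue < alpha:      # small p-value: distributions likely differ
            flagged.append(i)
    return flagged

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(5000, 3))
live = reference.copy()
live[:, 2] += 0.5                      # simulate a shift in one feature
print(drifted_features(reference, live))   # expected: [2]
```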
ML System: data poisoning attacks
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning (Chen et al., 2017)
Engineering challenges with large ML models
● Too big to fit on-device (see the back-of-the-envelope sketch after this list)
● Consume too much energy to work on-device
● Too slow to be useful
○ Autocompletion is useless if it takes longer to make a prediction than to type
● How to run CI/CD tests if a test takes hours/days?
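A back-of-the-envelope sketch (not from the slides) of why large models don’t fit on-device: the memory just for the weights is the parameter count times the bytes per parameter, before counting activations or any other overhead.

```python
# Back-of-the-envelope check for "too big to fit on-device":
# memory for the weights alone = number of parameters x bytes per parameter.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Assumes fp16 weights (2 bytes each); activations and overhead excluded."""
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(175e9))  # ~350 GB for a 175B-parameter model (GPT-3 scale)
print(weight_memory_gb(100e6))  # ~0.2 GB for a 100M-parameter model
```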
5. ML production myths
Myth #1: Deploying is hard
Myth #2: You only deploy one or two ML models at a time
Myth #3: If we don’t do anything, model performance remains the same
● Reality: concept drift degrades model performance over time
● Tip: train models on data generated 2 months ago and test on current data to see how much worse they get (sketch below)
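A rough sketch of that staleness check, assuming a timestamped tabular dataset and a scikit-learn classifier; the file name, column names, and model choice are all assumptions.

```python
# Rough sketch of the tip: train on data from ~2 months ago, then compare
# accuracy on held-out old data vs. current data. The file name, column names
# ("timestamp", "label"), and model choice are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_parquet("events.parquet")                     # hypothetical dataset
cutoff = df["timestamp"].max() - pd.Timedelta(days=60)
old, current = df[df["timestamp"] < cutoff], df[df["timestamp"] >= cutoff]
features = [c for c in df.columns if c not in ("timestamp", "label")]

X_train, X_old_test, y_train, y_old_test = train_test_split(
    old[features], old["label"], test_size=0.2, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
acc_old = accuracy_score(y_old_test, model.predict(X_old_test))
acc_now = accuracy_score(current["label"], model.predict(current[features]))
print(f"accuracy on old data: {acc_old:.3f}, on current data: {acc_now:.3f}")
# A large gap means performance decays quickly and the model needs frequent retraining.
```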
Myth #4: You won’t need to update your models as much
DevOps standard:
● Etsy deployed 50 times/day
● Netflix deploys 1000s of times/day
● AWS deploys every 11.7 seconds
Weibo’s iteration cycle: 10 mins
Machine learning with Flink in Weibo (Qian Yu, QCon 2019)
ML + DevOps =
Myth #5: Most ML engineers don’t need to worry about scale
(Source: StackOverflow Developer Survey 2019)
Myth #6: ML can magically transform your business overnight
● Magically: possible
● Overnight: no
Efficiency improves with maturity
2020 state of enterprise machine learning (Algorithmia, 2020)
Ishan got a head scratcher!
Machine Learning Systems Design
Next class: Designing an ML system