
Machine Learning Systems Design

Lecture 1: Understanding ML production

Reply in Zoom chat: Where are you (physically)?

CS 329S | Chip Huyen | cs329s.stanford.edu

Agenda
1. Course overview
2. ML in research vs. production
3. Breakout exercise
4. ML systems vs. traditional software
5. ML production myths

1. Short class today
2. Lecture notes are on the course website / syllabus
3. I’m in Vermont, so apologies in advance for the bad Internet
1. Course overview

What’s machine learning systems design?
The process of defining the interface, algorithms, data, infrastructure, and
hardware for a machine learning system to satisfy specified requirements:
reliable, scalable, maintainable, adaptable.
We’ll learn about all of this

[Diagram: an ML system spans the interface, data, ML algorithms, infrastructure, and hardware]
This class will cover ...
● ML production in the real world from software, hardware, and business
perspectives
● The iterative process for building ML systems at scale
○ project scoping, data management, development, deployment, monitoring & maintenance,
infrastructure & hardware, business analysis
● Challenges and solutions in ML engineering
This class will not teach ...
● Machine learning/deep learning algorithms
○ CS 229: Machine Learning
○ CS 230: Deep Learning
○ CS 231N: Convolutional Neural Networks for Visual Recognition
○ CS 224N: Natural Language Processing with Deep Learning
● Computer systems
○ CS 110: Principles of Computer Systems
○ CS 140E: Operating Systems Design and Implementation
● UX design
○ CS 147: Introduction to Human-Computer Interaction Design
○ DESINST 240: Designing Machine Learning: A Multidisciplinary Approach
Machine learning: expectation

This class won’t teach you how to do this
Machine learning: reality

This class will teach you how to build something like this (buggy but cool)
Prerequisites
● Knowledge of CS principles and skills (CS 106B/X)
● Understanding of ML algorithms (CS 229, CS 230, CS 231N, or CS 224N)
● Familiarity with at least one framework such as TensorFlow, PyTorch, or JAX
● Familiarity with basic probability theory (CS 109 / STAT 116)
AI value creation by 2030

13 trillion USD, most of it outside the consumer internet industry.

We need more people from non-CS backgrounds in AI!
Zoom etiquette
● Write questions into the Zoom chat
○ Feel free to reply to each other; TAs will also reply
● I will stop occasionally for Q&A
○ TAs will re-share some of the questions with me
● After each lecture, a random question will get a random reward
○ Ping Karan if you want to opt out

We appreciate it if you keep your video on!
Grading
● Assignments (30%)
○ 2-3 assignments
● Final project (60%)
● Class participation (10%)
○ Zoom questions + Piazza
○ It’s a bad sign if, by the end of the quarter, we still don’t know who you are

Final project
● Build an ML-powered application
● Must work in groups of three
● Demo + report (creative formats encouraged)
● Evaluated by course staff and industry experts
Honor code: permissive but strict. Don’t test us ;)
● OK to search and ask in public about the systems we’re studying. Cite all the
resources you reference.
○ E.g. if you read it in a paper, cite it. If you ask on Quora, include the link.
● NOT OK to ask someone to do assignments/projects for you.
● OK to discuss questions with classmates. Disclose your discussion partners.
● NOT OK to copy solutions from classmates.
● OK to use existing solutions as part of your projects/assignments. Clarify your
contributions.
● NOT OK to pretend that someone else’s solution is yours.
● OK to publish your final project after the course is over (we encourage that!)
● NOT OK to post your assignment solutions online.
● ASK the course staff if unsure!
Course staff

⚠⚠ Work in progress ⚠⚠
● First time the course is offered
● First time Chip has taught a course online
● The subject is new; we don’t have all the answers
○ We are all learning too!
● We appreciate your:
○ enthusiasm for trying out new things
○ patience in bearing with things that don’t quite work
○ feedback to improve the course
(Inspired by Michael Bernstein’s CS 278)

● https://cs329s.stanford.edu
● Office hours start next week
● If you enrolled without submitting an application, send us an email!
● Questions so far?
2. ML in research vs. production

ML in research vs. in production

Objectives. Research: model performance*. Production: different stakeholders
have different objectives.

* Actively being worked on; see Utility is in the Eye of the User: A Critique of
NLP Leaderboards (Ethayarajh and Jurafsky, EMNLP 2020).
Stakeholder objectives
● ML team: highest accuracy
● Sales: sells more ads
● Product: fastest inference
● Manager: maximizes profit, which may mean laying off ML teams
Computational priority. Research: fast training, high throughput. Production:
fast inference (generating predictions), low latency.
Latency matters
● Increasing latency from 100 ms to 400 ms reduces searches by 0.2%-0.6% (Google, 2009)
● A 30% increase in latency costs 0.5% in conversion rate (Booking.com, 2019)
● Latency: the time it takes to process one query (time to move one leaf)
● Throughput: how many queries are processed per second (leaves moved in 1 second)

● Real-time (one query at a time): low latency = high throughput
● Batched: higher latency, but also higher throughput (see the sketch below)
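To make this concrete, here is a minimal sketch of the tradeoff, assuming a
hypothetical cost model in which each model call pays a fixed overhead plus a
small per-example cost (roughly how GPU inference behaves); the numbers are
made up for illustration:

```python
# Minimal sketch of the latency/throughput tradeoff under batching.
# Assumed cost model (hypothetical numbers): each call to the model pays a
# fixed overhead, plus a small marginal cost per example in the batch.

CALL_OVERHEAD_MS = 10.0   # fixed cost per model call (hypothetical)
PER_ITEM_MS = 1.0         # marginal cost per example (hypothetical)

def call_latency_ms(batch_size: int) -> float:
    """Latency of a single model call on `batch_size` examples."""
    return CALL_OVERHEAD_MS + PER_ITEM_MS * batch_size

for batch_size in [1, 8, 64]:
    latency = call_latency_ms(batch_size)          # time for one call
    throughput = batch_size / (latency / 1000.0)   # examples per second
    print(f"batch={batch_size:3d}  latency={latency:6.1f} ms  "
          f"throughput={throughput:8.1f} examples/s")

# batch=1  : each example waits ~11 ms, ~91 examples/s
# batch=64 : each example waits ~74 ms, ~865 examples/s
# Batching raises throughput, but every example in the batch pays the
# higher latency: the core tension between online and batch serving.
```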
Data. Research: static. Production: constantly shifting.
Data

Research:
● Clean
● Static
● Mostly historical data

Production:
● Messy
● Constantly shifting
● Historical + streaming data
● Biased, and you don’t know how biased
● Privacy + regulatory concerns
Fairness. Research: good to have (sadly). Production: important.
Interpretability. Research: good to have. Production: important.
Interpretability

[Result from the Zoom poll]
ML in research vs. in production (summary)

                        Research                         Production
Objectives              Model performance                Different stakeholders have
                                                         different objectives
Computational priority  Fast training, high throughput   Fast inference, low latency
Data                    Static                           Constantly shifting
Fairness                Good to have (sadly)             Important
Interpretability        Good to have                     Important
3. Breakout exercise
Each lecture, you’ll be randomly assigned to a group

7 minutes; there is no single right answer!
1. How can academic leaderboards be modified to account for multiple
objectives? Should they be?
2. ML models are getting bigger and bigger. How does this affect the usability
of these models in production?

Don’t forget to introduce yourself to your classmates!
Future of leaderboards
● More comprehensive utility functions (see the sketch after this list)
○ Model performance (e.g. accuracy)
○ Latency
○ Prediction cost
○ Interpretability
○ Robustness
○ Ease of use (e.g. OSS tools)
○ Hardware requirements
● Adaptive to different use cases
○ Instead of one leaderboard per dataset/task, each use case gets its own leaderboard
● Dynamic datasets
○ Distribution shifts
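As a toy illustration of such a utility function, here is a minimal sketch; all
metric names, weights, and numbers are hypothetical, chosen only to make the
point that the "best" model depends on the use case:

```python
# Toy sketch of a multi-objective leaderboard score.
# All metric names, weights, and numbers are hypothetical; a real leaderboard
# would pick weights to match its users' actual use case.

def utility_score(model: dict, weights: dict) -> float:
    """Weighted sum of metrics. Higher is better.

    Metrics where smaller is better (latency, cost) enter negatively."""
    return (
        weights["accuracy"] * model["accuracy"]           # in [0, 1]
        - weights["latency"] * model["latency_ms"] / 100  # per 100 ms
        - weights["cost"] * model["cost_per_1k_preds"]    # in dollars
    )

# A latency-sensitive use case weights latency heavily ...
realtime_weights = {"accuracy": 1.0, "latency": 0.5, "cost": 0.1}
# ... while an offline batch use case barely cares about it.
batch_weights = {"accuracy": 1.0, "latency": 0.01, "cost": 0.05}

big_model = {"accuracy": 0.95, "latency_ms": 350, "cost_per_1k_preds": 0.40}
small_model = {"accuracy": 0.88, "latency_ms": 40, "cost_per_1k_preds": 0.05}

for name, m in [("big", big_model), ("small", small_model)]:
    print(name,
          round(utility_score(m, realtime_weights), 3),
          round(utility_score(m, batch_weights), 3))

# The small model wins for real-time use; the big model wins for batch use.
# Since rankings flip with the weights, each use case needs its own leaderboard.
```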
Dynamic datasets
WILDS (Koh and Sagawa et al., 2020): 7 datasets with evaluation metrics and
train/test splits representative of distribution shifts in the wild.

4. ML systems vs. traditional software

Traditional software
● Separation of concerns: a design principle for separating a computer program
into distinct sections, such that each section addresses a separate concern
● Code and data are separate
○ Inputs into the system shouldn’t change the underlying code
ML systems
● Code and data are tightly coupled
○ ML systems are part code, part data
● You need to test and version not only the code but also the data (that’s the
hard part)
ML systems: versioning data
● Line-by-line diffs, as in Git, don’t work with datasets
● Can’t naively create multiple copies of large datasets
● How do you merge changes? (One common workaround is sketched below)
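One common workaround, and the idea behind tools like DVC, is content-addressed
versioning: keep the large files in a store keyed by their hash and commit only
a small manifest to Git. A minimal from-scratch sketch, with hypothetical paths
and manifest format:

```python
# Minimal sketch of content-addressed data versioning (the idea behind
# tools like DVC). Paths and manifest format are hypothetical.
import hashlib
import json
import shutil
from pathlib import Path

STORE = Path(".data_store")  # large files live here, outside Git

def snapshot(data_dir: str, manifest_path: str = "data.manifest.json") -> None:
    """Hash every file, copy it into the store under its hash, and write
    a small manifest that CAN be committed to Git."""
    STORE.mkdir(exist_ok=True)
    manifest = {}
    for f in sorted(Path(data_dir).rglob("*")):
        if f.is_file():
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            dest = STORE / digest
            if not dest.exists():  # dedup: identical files are stored once
                shutil.copy2(f, dest)
            manifest[str(f)] = digest
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))

# Git then diffs and merges the tiny manifest instead of the data itself;
# a changed file shows up as a changed hash, not a million-line diff.
```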
ML systems: testing data
● How do you test data correctness/usefulness?
● How do you know if data meets your model’s assumptions?
● How do you know when the underlying data distribution has changed, and how do
you measure the change? (A first-pass check is sketched below)
● How do you know if a data sample is good or bad for your system?
○ Not all data points are equal (e.g. images of road surfaces with cyclists are more important
for autonomous vehicles)
○ Bad data might harm your model and/or make it susceptible to attacks like data poisoning
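For detecting distribution change, a common first-pass check is a two-sample
test comparing a reference window of data against recent production data. A
minimal sketch on one numeric feature; the significance threshold, the window
framing, and the synthetic data are illustrative choices:

```python
# Minimal sketch of drift detection on one numeric feature using a
# two-sample Kolmogorov-Smirnov test. The 0.05 threshold and the
# "reference vs. recent window" framing are illustrative, not a standard.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(reference: np.ndarray, recent: np.ndarray,
                    alpha: float = 0.05) -> bool:
    """True if the recent window looks drawn from a different distribution."""
    result = ks_2samp(reference, recent)
    return result.pvalue < alpha

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training-time data
recent = rng.normal(loc=0.3, scale=1.0, size=5_000)     # production data, shifted

print(feature_drifted(reference, recent))  # True: the mean has shifted
# In practice you would run a check like this per feature, on a schedule,
# and alert (or trigger retraining) when drift persists.
```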
ML systems: data poisoning attacks

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning (Chen et al., 2017)
Engineering challenges with large ML models
● Too big to fit on-device
● Consume too much energy to run on-device
● Too slow to be useful
○ Autocompletion is useless if it takes longer to predict the next word than to type it
● How do you run CI/CD tests if a single test run takes hours or days?
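Back-of-the-envelope arithmetic shows why "too big to fit on-device" happens
quickly; the model sizes below are round illustrative numbers:

```python
# Back-of-the-envelope model memory footprint: parameters x bytes per
# parameter. Model sizes are round illustrative numbers.
def model_size_gb(n_params: float, bytes_per_param: int = 4) -> float:
    """Memory just to hold the weights (fp32 = 4 bytes, fp16 = 2 bytes)."""
    return n_params * bytes_per_param / 1e9

for name, n in [("100M-param model", 100e6),
                ("1B-param model", 1e9),
                ("100B-param model", 100e9)]:
    print(f"{name}: {model_size_gb(n):7.1f} GB fp32, "
          f"{model_size_gb(n, 2):7.1f} GB fp16")

# 100M params ~ 0.4 GB fp32: fits on a phone.
# 100B params ~ 400 GB fp32: doesn't fit on a single GPU, let alone a phone,
# before even counting activations and serving overhead.
```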
5. ML production myths

Myth #1: Deploying is hard

Deploying is easy. Deploying reliably is hard.
Myth #2: You only deploy one or two ML models at a time

Booking.com: 150+ models in production; Uber: thousands.
Myth #3: If we don’t do anything, model performance remains the same

It doesn’t, because of concept drift.
Tip: train models on data generated 2 months ago and test them on current data
to see how much worse they get (a sketch of this check follows).
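A minimal sketch of that tip; `train_model` and `evaluate` are placeholders for
your own training and evaluation routines, and the 60-day window mirrors the
"2 months" above:

```python
# Minimal sketch of the "train on old data, test on current data" tip.
# `train_model` and `evaluate` are placeholders for your own pipeline.
from datetime import datetime, timedelta

import pandas as pd

def staleness_check(df: pd.DataFrame, train_model, evaluate,
                    now: datetime, window_days: int = 60) -> dict:
    """Train on data older than `window_days`, evaluate on data since then.

    The gap between the two scores estimates how much drift hurts you."""
    cutoff = now - timedelta(days=window_days)
    old = df[df["timestamp"] < cutoff]
    current = df[df["timestamp"] >= cutoff]

    model = train_model(old)
    return {
        "score_on_old": evaluate(model, old),          # training-time performance
        "score_on_current": evaluate(model, current),  # what users see today
    }

# If score_on_current is much worse than score_on_old, your data is shifting
# fast and you need a correspondingly fast retraining cadence.
```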
Myth #4: You won’t need to update your models as much

DevOps standard:
● Etsy: 50 deploys/day
● Netflix: thousands of deploys/day
● AWS: a deploy every 11.7 seconds

Weibo’s ML iteration cycle: 10 minutes
(Machine learning with Flink in Weibo, Qian Yu, QCon 2019)
ML + DevOps = MLOps
Myth #5: Most ML engineers don’t need to worry about scale

(StackOverflow Developer Survey 2019)
Myth #6: ML can magically transform your business overnight

Magically: possible
Overnight: no
Efficiency improves with maturity

2020 state of enterprise machine learning (Algorithmia, 2020)
Ishan got a head scratcher!
Machine Learning Systems Design
Next class: Designing an ML system

cs329s.stanford.edu | Chip Huyen
