Week-4 Lecture Notes
Edge AI - Intelligence at the Edge
Dr. Rajiv Misra, Professor
Dept. of Computer Science & Engineering
Indian Institute of Technology Patna
rajivm@iitp.ac.in
Contents
• Review of Deep Learning
• Optimizing AI for the Edge
• Knowledge Distillation
• Federated Learning
Review of Deep Learning
Figure: nested scope of the fields: Artificial Intelligence ⊇ Machine Learning ⊇ Representation Learning ⊇ Deep Learning.
Source: Deep Learning, Ian Goodfellow and Yoshua Bengio and Aaron Courville, MIT Press 2016
Optimizing Models
Model compression and acceleration techniques can be divided into the following categories:
• Model binarization
• Parameter sharing
• Low-rank factorization
Figure source: Knowledge Distillation: A Survey, Jianping Gou, Baosheng Yu, Stephen J. Maybank, Dacheng Tao, arXiv:2006.05525v7
Knowledge Distillation
• In vanilla knowledge distillation, knowledge is transferred from the teacher model to the
student model by minimizing a loss function in which the target is the distribution of class
probabilities predicted by the teacher model.
• Specifically, Knowledge Distillation (KD) is accomplished by minimizing the Kullback-Leibler (KL) divergence between the predictions of the teacher and the student.
Figure: the teacher and student logits each pass through a softmax; the KD loss between the two predictions is minimized.
Knowledge Distillation: KL-Divergence
● KL divergence is a measure of how much one probability distribution diverges from another.
Properties of KL divergence:
● Non-negativity
● Not symmetric
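For reference, with teacher distribution p and student distribution q over the classes, the divergence being minimized can be written in the standard form (stated here for completeness; the exact notation on the slide is not reproduced):
D_KL(p ∥ q) = Σᵢ pᵢ log(pᵢ / qᵢ)
It satisfies D_KL(p ∥ q) ≥ 0, with equality only when p = q, and in general D_KL(p ∥ q) ≠ D_KL(q ∥ p).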
Knowledge Distillation: Types of Knowledge
● Response-based
● Feature-based
● Relation-based
Knowledge Types: Response-Based Knowledge
Why not use "hard targets"?
Hard (one-hot) labels say nothing beyond which class is correct, whereas the teacher's soft targets also encode how probable the other classes are, and it is this extra signal that the student distils.
Knowledge Modeling
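The figures for the knowledge-modeling slides are not reproduced in these notes. As a hedged illustration of the standard soft-target formulation (teacher and student logits softened with a temperature T, combined with the usual cross-entropy on hard labels), here is a minimal PyTorch-style sketch; the temperature, the weighting factor alpha, and the function name kd_loss are illustrative choices, not taken from the lecture:

import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: teacher and student class probabilities at temperature T
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)

    # KL divergence between the softened teacher and student predictions,
    # scaled by T^2 so its gradient magnitude matches the hard-label loss
    distill = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)

    # Ordinary cross-entropy on the ground-truth (hard) labels
    hard = F.cross_entropy(student_logits, labels)

    # Weighted combination of the distillation and hard-label terms
    return alpha * distill + (1.0 - alpha) * hard

During training, the same mini-batch is passed through the (frozen) teacher and the student, and this loss is back-propagated through the student only.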
Distillation Methods
Federated Learning
Federated Learning: Distributed ML
● 2016: the term FL is first coined by Google researchers
● We have already seen real-world deployments by companies and researchers for large-scale IoT devices
● Several open-source libraries are under development: PySyft, TensorFlow Federated, FATE, Flower, Substra...
● FL is highly multidisciplinary: it involves machine learning, numerical optimization, privacy & security, networks, systems, hardware...
Federated Learning: Decentralised data
● Federated Learning (FL) aims to collaboratively train an ML model while keeping the data decentralized
● It enables devices to learn from each other: ML training is brought close to where the data is generated
● The nodes in the network share a central server, but instead of sending data to that server, each node sends only its locally trained model
Gradient Descent Procedure
The procedure starts with initial values for the coefficient or coefficients of the function. These could be 0.0 or a small random value.
coefficient = 0.0
The cost of the coefficients is evaluated by plugging them into the function and calculating the cost.
cost = f(coefficient) or cost = evaluate(f(coefficient))
We need to know the slope so that we know the direction (sign) in which to move the coefficient values in order to get a lower cost on the next iteration.
delta = derivative(cost)
We can now update the coefficient values. A learning rate parameter (alpha) must be specified that controls how much the coefficients can change on each update.
coefficient = coefficient - (alpha * delta)
This process is repeated until the cost of the coefficients is 0.0 or close enough to 0. It does require you to know the gradient of your cost function or the function you are optimizing.
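A minimal Python sketch of this loop; the quadratic example cost f(x) = (x - 3)^2, its derivative, and the hyperparameter values are illustrative assumptions, not values from the lecture:

def gradient_descent(derivative, coefficient=0.0, alpha=0.1, n_iters=100):
    # Repeat the update rule: coefficient = coefficient - (alpha * delta)
    for _ in range(n_iters):
        delta = derivative(coefficient)            # slope of the cost at the current value
        coefficient = coefficient - alpha * delta  # move against the slope to lower the cost
    return coefficient

# Example: minimize f(x) = (x - 3)^2, whose derivative is 2 * (x - 3)
best = gradient_descent(lambda x: 2.0 * (x - 3.0))
print(best)  # approaches 3.0, where the cost is 0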
Gradient Descent Algorithm
Gradient descent update rule:
θ = θ − α ⋅ ∇J(θ)
Advantages:
• Easy computation
• Easy to implement
• Easy to understand
Edge Computing ML: FL
• FL trains an ML algorithm on local data samples distributed over multiple edge devices or servers without any exchange of data.
• Federated learning distributes deep learning by eliminating the necessity of pooling the data into a single place.
Figure: deep learning model training.
Finding the function: model training
How is this aggregation applied? The FedAvg algorithm
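The FedAvg algorithm itself is not reproduced in these notes. As a hedged sketch, its core aggregation step is a data-size-weighted average of the client model parameters; the NumPy code below illustrates that step only, and the function and variable names are illustrative:

import numpy as np

def fedavg(client_weights, client_sizes):
    # client_weights: one list of per-layer NumPy arrays per client
    # client_sizes: number of local training samples on each client
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        # Each client's layer is scaled by its share of the total data
        layer_avg = sum(
            (size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes)
        )
        averaged.append(layer_avg)
    return averaged

With four clients, for example, fedavg([w1, w2, w3, w4], [1200, 800, 500, 1500]) returns the layer parameters of the aggregated main model.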
Example: FL with i.i.d. data
Each client trains its model locally; the training process is carried out separately for each client.
Only the learned model parameters are sent to a trusted center, where they are combined to form the aggregated main model.
The trusted center then sends the aggregated main model back to the clients, and this process is repeated.
Apple personalizes Siri without hoovering up data
The tech giant is using privacy-preserving machine learning to improve its voice assistant while keeping your data on your phone: training happens on the data available locally.
In this way, raw audio of users' Siri requests never leaves their iPhones and iPads, but the assistant continuously gets better at identifying the right speaker. In addition to federated learning, Apple also uses something called differential privacy to add a further layer of protection. The technique injects a small amount of noise into any raw data before it is fed into a local machine-learning model. The additional step makes it exceedingly difficult for malicious actors to reverse-engineer the original audio files from the trained model.
Federated Learning: Training
● Suppose we have a cluster of four connected devices and one central server that holds an untrained model.
● We send a copy of the model to each node.
● Each node receives a copy of that model.
Federated Learning: Training
● Now all the nodes in the network have the untrained model received from the server.
Federated Learning: Training
● In the next step, each node works with its own data; this does not mean that the data is shared.
● Every node has its own data, on which it will train the model.
Federated Learning: Training
● Each node trains the model to fit the data that it has, so the model is adapted to its local data.
Federated Learning: Training
● The server now combines the models received from the nodes by taking an average, i.e., it aggregates all the received models.
● The resulting central model, trained by aggregating the models from each node, captures the patterns in the training data of all the nodes.
Federated Learning: Training
● Once the model is aggregated, the server sends a copy of the updated model back to the nodes.
● Everything is achieved at the edge, so no data is shared, which preserves privacy and keeps the communication overhead low.
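Putting the steps above together, one communication round can be sketched as follows; train_locally stands for a client-side training routine and is a hypothetical placeholder rather than an API shown in the lecture, and the aggregation reuses the fedavg sketch given earlier:

def federated_round(server_model, clients, train_locally):
    local_models, local_sizes = [], []
    for client in clients:
        # 1. The server sends a copy of the current model to the node
        # 2. The node trains that copy on its own data; raw data never leaves the node
        updated_model, n_samples = train_locally(server_model, client)
        local_models.append(updated_model)
        local_sizes.append(n_samples)
    # 3. The server aggregates the returned models (e.g., with the fedavg sketch above)
    server_model = fedavg(local_models, local_sizes)
    # 4. The aggregated model is sent back to the nodes at the start of the next round
    return server_model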
Federated Learning: Challenges
Systems heterogeneity
● Size of data
● Computational power
● Network stability
● Local learner
● Learning rate
Expensive communication
● Communication in the network can be slower than local computation by many orders of magnitude.
Federated Learning: Challenges
Dealing with non-i.i.d. data (i.i.d.: independent and identically distributed)
● Learning from non-i.i.d. data is difficult and slow because each IoT device needs the model to go in a particular direction
● If the data distributions are very different, learning a single model that performs well for all IoT devices may require a very large number of parameters
● Another direction for dealing with non-i.i.d. data is thus to lift the requirement that the learned model be the same for all IoT devices ("one size fits all")
● Instead, we can allow each IoT device k to learn a (potentially simpler) personalized model θ_k, but design the objective so as to enforce some kind of collaboration
● When local datasets are non-i.i.d., FedAvg suffers from client drift
● To avoid this drift, one must use fewer local updates and/or smaller learning rates, which hurts convergence
Federated Learning: Challenges
Preserving privacy
● ML models are susceptible to various attacks on data privacy
● Membership inference attacks try to infer the presence of a known individual in the training set, e.g., by exploiting the confidence in model predictions
● Reconstruction attacks try to infer some of the points used to train the model, e.g., by differencing attacks
● Federated Learning offers an additional attack surface because the server and/or other clients observe model updates (not only the final model)
Key differences with Distributed Learning
Data distribution
● In distributed learning, data is centrally stored (e.g., in a data center)
○ The main goal is just to train faster
○ We control how data is distributed across workers: usually, it is distributed uniformly at random across workers
● In FL, data is naturally distributed and generated locally
○ Data is not independent and identically distributed (non-i.i.d.), and it is imbalanced
● Bandwidth and power consumption are concerns
● High cost of data transfer
When NOT to apply Federated Learning
● Healthcare (wearables, drug discovery, prognostics, etc.)
● Enterprise/corporate IT (chat, issue trackers, emails, etc.)
Conclusion
• What knowledge distillation is
• Types of knowledge distillation
• Methods of knowledge distillation
• Understanding of Federated Learning
• Different issues with federated learning
Thank You!
References
• Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville, MIT Press, http://www.deeplearningbook.org
• Knowledge Distillation: A Survey, Jianping Gou, Baosheng Yu, Stephen J. Maybank, Dacheng Tao, arXiv:2006.05525v7