Analyzing IoT Data in Python Chapter4

The document discusses preparing IoT data for machine learning in Python. It covers splitting time series data into training and test sets while avoiding looking into the future, extracting features and labels, building and evaluating a logistic regression model, scaling data for modeling, creating a machine learning pipeline with scaling and modeling steps, saving and loading the pipeline model, making predictions on new data, and applying the model to an IoT data stream.

Prepare data for machine learning
ANALYZING IOT DATA IN PYTHON

Matthias Voppichler
IT Developer
Machine Learning Refresher
Supervised learning
Classification

Regression

Unsupervised learning
Cluster analysis

Deep learning
Neural networks

Labels
print(environment_labeled.head())

                     humidity  temperature  pressure  label
timestamp
2018-10-01 00:00:00      81.0         11.8    1013.4      1
2018-10-01 00:15:00      79.7         11.9    1013.1      1
2018-10-01 00:30:00      81.0         12.1    1013.0      1
2018-10-01 00:45:00      79.7         11.7    1012.7      1
2018-10-01 01:00:00      84.3         11.2    1012.6      1

Train / Test split
Splitting time series data

Model should not see test data during training

Cannot use random split

Model should not be allowed to look into the future

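Train / test split
split_day = "2018-10-13"

# Label-based slicing includes the whole split day, so the test set
# must start after the last training timestamp to avoid overlap
train = environment.loc[:split_day]
test = environment.loc[environment.index > train.index[-1]]

print(train.iloc[0].name)
print(train.iloc[-1].name)
print(test.iloc[0].name)
print(test.iloc[-1].name)

2018-10-01 00:00:00
2018-10-13 23:45:00
2018-10-14 00:00:00
2018-10-15 23:45:00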
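
With a DatetimeIndex, slicing by a date string is inclusive of the whole day at both ends, so using the same day as the end of the training slice and the start of the test slice would place that day in both sets. The following is a minimal sketch on synthetic 15-minute data (the `temperature` values are made up for illustration) that measures such an overlap and then builds a strictly chronological split.

```python
import pandas as pd

# Synthetic 15-minute readings for 2018-10-01 .. 2018-10-15 (illustrative only)
index = pd.date_range("2018-10-01", "2018-10-15 23:45", freq="15min", name="timestamp")
environment = pd.DataFrame({"temperature": range(len(index))}, index=index)

split_day = "2018-10-13"

# Slicing with the same day string on both sides shares that whole day
overlap = environment.loc[:split_day].index.intersection(
    environment.loc[split_day:].index
)

# Strict chronological split: test starts after the last training timestamp
train = environment.loc[:split_day]
test = environment.loc[environment.index > train.index[-1]]

print(f"Rows shared by the naive slices: {len(overlap)}")
print(f"Last train timestamp: {train.index.max()}")
print(f"First test timestamp: {test.index.min()}")
```

Comparing `train.index.max()` with `test.index.min()` is a cheap sanity check to run before any model is fitted.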

Features and Labels
X_train = train.drop("target", axis=1)
y_train = train["target"]
X_test = test.drop("target", axis=1)
y_test = test["target"]

print(X_train.shape)
print(y_train.shape)

(1248, 3)
(1248,)

Logistic Regression
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()

logreg.fit(X_train, y_train)

print(logreg.predict(X_test))

[0 0 1 1 1 1 1 0 0]

Scaling data for machine learning

Evaluate the model
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

print(logreg.score(X_test, y_test))

0.78145113
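
For a classifier such as `LogisticRegression`, `score` returns the mean accuracy on the data passed in. A small sketch on synthetic data (the features, labels and split sizes here are assumptions for illustration, not the course data) shows that it matches `accuracy_score` computed from the predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic features and labels (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Keep the row order: first 150 rows for training, the rest for testing
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

logreg = LogisticRegression()
logreg.fit(X_train, y_train)

score = logreg.score(X_test, y_test)
manual = accuracy_score(y_test, logreg.predict(X_test))

print(score, manual)
```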

Scaling
scikit-learn's StandardScaler
remove mean

scale data to unit variance
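
As a sketch of what these two steps amount to, the snippet below standardizes the five sample rows shown in this chapter and compares the result with the same calculation done by hand: subtract each column's mean and divide by its population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# humidity, temperature, pressure (first five rows shown in this chapter)
data = np.array([
    [81.0, 11.8, 1013.4],
    [79.7, 11.9, 1013.1],
    [81.0, 12.1, 1013.0],
    [79.7, 11.7, 1012.7],
    [84.3, 11.2, 1012.6],
])

sc = StandardScaler()
scaled = sc.fit_transform(data)

# Same result by hand: remove the column mean, divide by the column std (ddof=0)
manual = (data - data.mean(axis=0)) / data.std(axis=0)

print(np.allclose(scaled, manual))
```

After scaling, every column has mean 0 and variance 1, so features measured in very different units contribute on a comparable scale.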

Unscaled data
print(data.head())

                     humidity  temperature  pressure
timestamp
2018-10-01 00:00:00      81.0         11.8    1013.4
2018-10-01 00:15:00      79.7         11.9    1013.1
2018-10-01 00:30:00      81.0         12.1    1013.0
2018-10-01 00:45:00      79.7         11.7    1012.7
2018-10-01 01:00:00      84.3         11.2    1012.6

StandardScaler
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

sc.fit(data)

print(sc.mean_)
print(sc.var_)

[ 71.8826716 14.17002019 1018.17042396]

[372.78261022 20.37926608 53.67519188]

data_scaled = sc.transform(data)

StandardScaler
# transform() returns a NumPy array; wrap it in a DataFrame to keep labels
df_scaled = pd.DataFrame(data_scaled,
                         columns=data.columns,
                         index=data.index)
print(df_scaled.head())

                     humidity  temperature  pressure
timestamp
2018-10-01 00:00:00  0.472215    -0.524998 -0.651134
2018-10-01 00:15:00  0.404884    -0.502847 -0.692082
2018-10-01 00:30:00  0.472215    -0.458543 -0.705731
2018-10-01 00:45:00  0.404884    -0.547150 -0.746679
2018-10-01 01:00:00  0.643132    -0.657908 -0.760329

Evaluate the model
logreg = LogisticRegression()
# Only the features are scaled; the class labels stay as they are
logreg.fit(X_train_scaled, y_train)

print(logreg.score(X_test_scaled, y_test))

0.88145113
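
The slides do not show how the scaled training and test features are produced. One common approach, sketched here on synthetic data (the feature distributions and the labelling rule are assumptions for illustration), is to fit the scaler on the training rows only and reuse its statistics for the test rows, so no information from the test period leaks into training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Three synthetic features on different scales (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(loc=[70.0, 14.0, 1018.0], scale=[19.0, 4.5, 7.3], size=(300, 3))
y = (X[:, 1] > 14.0).astype(int)  # hypothetical labelling rule

# Chronological-style split: earlier rows train, later rows test
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # statistics from training rows only
X_test_scaled = sc.transform(X_test)        # reuse the training mean and variance

logreg = LogisticRegression()
logreg.fit(X_train_scaled, y_train)         # labels are class ids and are not scaled
score = logreg.score(X_test_scaled, y_test)
print(score)
```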

Develop machine learning pipeline

Pipeline
Transform
Conversion

Scaling

Estimator
Model

Create a Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Initialize Objects
sc = StandardScaler()
logreg = LogisticRegression()

# Create pipeline
pl = Pipeline([
    ("scale", sc),
    ("logreg", logreg)
])

Inspect Pipeline
pl

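Pipeline(memory=None,
     steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)),
            ('logreg', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn', n_jobs=None, penalty='l2',
          random_state=None, solver='warn', tol=0.0001, verbose=0, warm_start=False))])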

pl.fit(X_train, y_train)

print(pl.predict(X_test))

[0 0 1 1 0 1 1 0 0]
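
Calling `predict` on a fitted pipeline runs each transform step and then the final estimator. The sketch below, on synthetic data (values and labelling rule are assumptions for illustration), checks this by repeating the steps by hand through `named_steps`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on different scales (illustrative only)
rng = np.random.default_rng(2)
X = rng.normal(loc=[70.0, 14.0, 1018.0], scale=[19.0, 4.5, 7.3], size=(300, 3))
y = (X[:, 0] > 70.0).astype(int)  # hypothetical labelling rule
X_train, X_test, y_train = X[:200], X[200:], y[:200]

pl = Pipeline([
    ("scale", StandardScaler()),
    ("logreg", LogisticRegression()),
])
pl.fit(X_train, y_train)
pipeline_pred = pl.predict(X_test)

# The same prediction step by step, using the fitted pipeline steps
scaler = pl.named_steps["scale"]
model = pl.named_steps["logreg"]
manual_pred = model.predict(scaler.transform(X_test))

print(np.array_equal(pipeline_pred, manual_pred))
```

Because the scaler lives inside the pipeline, it is fitted only on whatever data is passed to `pl.fit`, which keeps the scaling statistics tied to the training set.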

Save model
import pickle
from pathlib import Path

# Write the fitted pipeline to disk in binary mode
with Path("pipeline_model.pkl").open("wb") as f:
    pickle.dump(pl, f)

Load Model
import pickle
from pathlib import Path

# Read the pipeline back from disk in binary mode
with Path("pipeline_model.pkl").open("rb") as f:
    pl = pickle.load(f)

Pipeline(memory=None,
steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)),
('logreg', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='warn', n_jobs=None, penalty='l2',
random_state=None, solver='warn', tol=0.0001, verbose=0, warm_start=False))])

A word of caution
DO NOT unpickle untrusted files; this can lead to malicious code being executed.
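
A minimal round-trip sketch, assuming a pipeline fitted on synthetic data and a temporary directory instead of a fixed path, confirms that the restored pipeline gives the same predictions as the original. It only loads a file that the same script has just written.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data (illustrative only)
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)

pl = Pipeline([("scale", StandardScaler()), ("logreg", LogisticRegression())])
pl.fit(X, y)

# Save and reload inside a temporary directory that is cleaned up afterwards
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "pipeline_model.pkl"
    with model_path.open("wb") as f:
        pickle.dump(pl, f)
    with model_path.open("rb") as f:
        restored = pickle.load(f)

same = np.array_equal(pl.predict(X), restored.predict(X))
print(same)
```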

Apply a machine learning model

Model Recap
# Create Pipeline
pl = Pipeline([
    ("scale", StandardScaler()),
    ("logreg", LogisticRegression())
])

# Fit the pipeline
pl.fit(X_train, y_train)
print(pl.score(X_test, y_test))

0.8897932222860425

Predict
predictions = pl.predict(X_test)

print(predictions)
print(f"Test length: {len(X_test)}")
print(f"Prediction length: {len(predictions)}")

[0 0 0 ... 1 1 1]
Test length: 500
Prediction length: 500

Record conversion
print(single_record)

{'timestamp': '2018-11-30 18:15:00',
 'humidity': 81.7,
 'pressure': 1019.8,
 'temperature': 1.5}

cols = X_train.columns
df = pd.DataFrame.from_records([single_record],
                               index="timestamp",
                               columns=cols)
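
The point of the conversion is that the model expects a one-row DataFrame whose columns are in the same order as the training features, while the record arrives as a dictionary in arbitrary key order. The sketch below is a variant that uses the plain `DataFrame` constructor with `set_index` rather than `from_records`; the column list is an assumption taken from the tables earlier in this chapter.

```python
import pandas as pd

single_record = {
    "timestamp": "2018-11-30 18:15:00",
    "humidity": 81.7,
    "pressure": 1019.8,
    "temperature": 1.5,
}

# Assumed training column order (as displayed earlier in this chapter)
cols = ["humidity", "temperature", "pressure"]

# One-row DataFrame, indexed by timestamp, columns reordered to match training
df = pd.DataFrame([single_record]).set_index("timestamp")[cols]
df.index = pd.to_datetime(df.index)

print(df)
```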

Apply to datastream
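import json

import pandas as pd
import paho.mqtt.subscribe as subscribe

def on_message(client, userdata, message):
    # Decode the JSON payload of the incoming MQTT message
    data = json.loads(message.payload)

    # Convert the single record into a one-row DataFrame
    df = pd.DataFrame.from_records([data],
                                   index="timestamp",
                                   columns=cols)

    # Predict the category and alert if necessary
    category = pl.predict(df)
    maybe_alert(category[0])

subscribe.callback(on_message, topic, hostname=MQTT_HOST)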
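
The callback can be exercised without a broker by calling it directly with an object that has a `payload` attribute. The sketch below does that under several assumptions: the pipeline is a stand-in trained on synthetic data, `maybe_alert` is a hypothetical hook that only records the prediction, and the record is converted with the plain `DataFrame` constructor instead of `from_records`.

```python
import json
from types import SimpleNamespace

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

cols = ["humidity", "temperature", "pressure"]

# Stand-in pipeline trained on synthetic data (hypothetical labelling rule)
rng = np.random.default_rng(4)
X_train = pd.DataFrame(
    rng.normal(loc=[70.0, 10.0, 1018.0], scale=[19.0, 4.5, 7.3], size=(200, 3)),
    columns=cols,
)
y_train = (X_train["temperature"] > 10.0).astype(int)
pl = Pipeline([("scale", StandardScaler()), ("logreg", LogisticRegression())])
pl.fit(X_train, y_train)

alerts = []

def maybe_alert(category):
    """Hypothetical alert hook: record the predicted category."""
    alerts.append(int(category))

def on_message(client, userdata, message):
    # Decode the payload, build a one-row DataFrame, predict, then alert
    data = json.loads(message.payload)
    df = pd.DataFrame([data]).set_index("timestamp")[cols]
    category = pl.predict(df)
    maybe_alert(category[0])

# Simulate one incoming message; only the payload attribute is used
payload = json.dumps({
    "timestamp": "2018-11-30 18:15:00",
    "humidity": 81.7,
    "pressure": 1019.8,
    "temperature": 1.5,
}).encode()
on_message(None, None, SimpleNamespace(payload=payload))

print(alerts)
```

Testing the callback this way keeps the prediction logic separate from the network connection, so it can be checked before subscribing to a live topic.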

Wrapping up

What you have learned
Accessing IoT data
from a REST API

from a datastream

Data Cleaning

Correlations

Time series decomposition

Machine learning pipeline

Next steps
Machine Learning

Database

Big data

PySpark

Congratulations!