Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
83 views

Analyzing IoT Data in Python Chapter4

The document discusses preparing IoT data for machine learning in Python. It covers splitting time series data into training and test sets while avoiding looking into the future, extracting features and labels, building and evaluating a logistic regression model, scaling data for modeling, creating a machine learning pipeline with scaling and modeling steps, saving and loading the pipeline model, making predictions on new data, and applying the model to an IoT data stream.

Uploaded by

Fgpeqw
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
83 views

Analyzing IoT Data in Python Chapter4

The document discusses preparing IoT data for machine learning in Python. It covers splitting time series data into training and test sets while avoiding looking into the future, extracting features and labels, building and evaluating a logistic regression model, scaling data for modeling, creating a machine learning pipeline with scaling and modeling steps, saving and loading the pipeline model, making predictions on new data, and applying the model to an IoT data stream.

Uploaded by

Fgpeqw
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 34

Prepare data for

machine learning
A N A LY Z I N G I OT D ATA I N P Y T H O N

Matthias Voppichler
IT Developer
Machine Learning Refresher
Supervised learning
Classi cation

Regression

Unsupervised learning
Cluster analysis

Deep learning
Neural networks

ANALYZING IOT DATA IN PYTHON


Machine Learning Refresher
Supervised learning
Classi cation

Regression

Unsupervised learning
Cluster analysis

Deep learning
Neural networks

ANALYZING IOT DATA IN PYTHON


Labels
print(environment_labeled.head())

humidity temperature pressure label


timestamp
2018-10-01 00:00:00 81.0 11.8 1013.4 1
2018-10-01 00:15:00 79.7 11.9 1013.1 1
2018-10-01 00:30:00 81.0 12.1 1013.0 1
2018-10-01 00:45:00 79.7 11.7 1012.7 1
2018-10-01 01:00:00 84.3 11.2 1012.6 1

ANALYZING IOT DATA IN PYTHON


Train / Test split
Splitting time series data

Model should not see test-data during training

Cannot use random split

Model should not be allowed to look into the future

ANALYZING IOT DATA IN PYTHON


Train / test split
split_day = "2018-10-13"

train = environment[:split_day]
test = environment[split_day:]

print(train.iloc[0].name)
print(train.iloc[-1].name)
print(test.iloc[0].name)
print(test.iloc[-1].name)

2018-10-01 00:00:00
2018-10-13 23:45:00
2018-10-13 00:00:00
2018-10-15 23:45:00

ANALYZING IOT DATA IN PYTHON


Features and Labels
X_train = train.drop("target", axis=1)
y_train = train["target"]
X_test = test.drop("target", axis=1)
y_test = test["target"]

print(X_train.shape)
print(y_train.shape)

(1248, 3)
(1248,)

ANALYZING IOT DATA IN PYTHON


Logistic Regression
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()

logreg.fit(X_train, y_train)

print(logreg.predict(X_test))

[0 0 1 1 1 1 1 0 0]

ANALYZING IOT DATA IN PYTHON


Let's practice!
A N A LY Z I N G I OT D ATA I N P Y T H O N
Scaling data for
machine learning
A N A LY Z I N G I OT D ATA I N P Y T H O N

Matthias Voppichler
IT Developer
Evaluate the model
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

print(logreg.score(X_test, y_test))

0.78145113

ANALYZING IOT DATA IN PYTHON


Scaling
scikit-learn's StandardScaler
remove mean

scale data to variance

ANALYZING IOT DATA IN PYTHON


Unscaled data
print(data.head())

humidity temperature pressure


timestamp
2018-10-01 00:00:00 81.0 11.8 1013.4
2018-10-01 00:15:00 79.7 11.9 1013.1
2018-10-01 00:30:00 81.0 12.1 1013.0
2018-10-01 00:45:00 79.7 11.7 1012.7
2018-10-01 01:00:00 84.3 11.2 1012.6

ANALYZING IOT DATA IN PYTHON


Standardscaler
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

sc.fit(data)

print(sc.mean_)
print(sc.var_)

[ 71.8826716 14.17002019 1018.17042396]


[372.78261022 20.37926608 53.67519188]

data_scaled = sc.transform(data)

ANALYZING IOT DATA IN PYTHON


Standardscaler
df_scaled = pd.DataFrame(data_scaled,
columns=data.columns,
index=data.index)
print(data_scaled.head())

humidity temperature pressure


timestamp
2018-10-01 00:00:00 0.472215 -0.524998 -0.651134
2018-10-01 00:15:00 0.404884 -0.502847 -0.692082
2018-10-01 00:30:00 0.472215 -0.458543 -0.705731
2018-10-01 00:45:00 0.404884 -0.547150 -0.746679
2018-10-01 01:00:00 0.643132 -0.657908 -0.760329

ANALYZING IOT DATA IN PYTHON


Evaluate the model
logreg = LogisticRegression()
logreg.fit(X_train_scaled, y_train_scaled)

print(logreg.score(X_test_scaled, y_test_scaled))

0.88145113

ANALYZING IOT DATA IN PYTHON


Let's practice!
A N A LY Z I N G I OT D ATA I N P Y T H O N
Develop machine
learning pipeline
A N A LY Z I N G I OT D ATA I N P Y T H O N

Matthias Voppichler
IT Developer
Pipeline
Transform
Conversation

Scaling

Estimator
Model

ANALYZING IOT DATA IN PYTHON


Create a Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Initialize Objects
sc = StandardScaler()
logreg = LogisticRegression()

# Create pipeline
pl = Pipeline([
("scale", sc),
("logreg", logreg)
])

ANALYZING IOT DATA IN PYTHON


Inspect Pipeline
pl

Pipeline(memory=None,
steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)),
('logreg', <class 'sklearn.linear_model.logistic.LogisticRegression'>)])

pl.fit(X_train, y_train)

print(pl.predict(X_test))

[0 0 1 1 0 1 1 0 0]

ANALYZING IOT DATA IN PYTHON


Save model
import pickle

with Path("pipeline_model.pkl").open("bw") as f:
pickle.dump(pl, f)

ANALYZING IOT DATA IN PYTHON


Load Model
import pickle
with Path("pipeline_model.pkl").open('br') as f:
pl = pickle.load(f)

pl

Pipeline(memory=None,
steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)),
('logreg', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='warn', n_jobs=None, penalty='l2',
random_state=None, solver='warn', tol=0.0001, verbose=0, warm_start=False))])

A word of caution
DO NOT unpickle untrusted les, this can lead to malicious code being executed.

ANALYZING IOT DATA IN PYTHON


Let's practice!
A N A LY Z I N G I OT D ATA I N P Y T H O N
Apply a machine
learning model
A N A LY Z I N G I OT D ATA I N P Y T H O N

Matthias Voppichler
IT Developer
Model Recap
# Create Pipeline
pl = Pipeline([
("scale", StandardScaler()),
("logreg", LogisticRegression())
])

# Fit the pipeline


pl.fit(X_train, y_train)
print(pl.score(X_test, y_test))

0.8897932222860425

ANALYZING IOT DATA IN PYTHON


Predict
predictions = pl.predict(X_test)

print(predictions)
print(f"Test length: {len(X_test)}")
print(f"Prediction length: {len(predictions)}")

[0 0 0 ... 1 1 1]
Test length: 500
Prediction length: 500

ANALYZING IOT DATA IN PYTHON


Record conversation
print(single_record)

{'timestamp': '2018-11-30 18:15:00',


'humidity': 81.7,
'pressure': 1019.8,
'temperature': 1.5},

cols = X_train.columns
df = pd.DataFrame.from_records([single_record],
index="timestamp",
columns=cols)

ANALYZING IOT DATA IN PYTHON


Apply to datastream
def on_message(client, userdata, message):
data = json.loads(message.payload)

df = pd.DataFrame.from_records([data],
index="timestamp",
columns=cols)

category = pl.predict(df)
maybe_alert(category[0])

subscribe.callback(on_message, topic, hostname=MQTT_HOST)

ANALYZING IOT DATA IN PYTHON


Let's practice!
A N A LY Z I N G I OT D ATA I N P Y T H O N
Wrapping up
A N A LY Z I N G I OT D ATA I N P Y T H O N

Matthias Voppichler
IT Developer
What you have learned
Accessing IoT data
from a REST API

from a datastream

Data Cleaning

Correlations

Time series decomposition

Machine learning pipeline

ANALYZING IOT DATA IN PYTHON


Next steps
Machine Learning

Database

Big data

PySpark

ANALYZING IOT DATA IN PYTHON


Congratulations!
A N A LY Z I N G I OT D ATA I N P Y T H O N

You might also like