Analyzing IoT Data in Python Chapter4
machine learning
ANALYZING IOT DATA IN PYTHON
Matthias Voppichler
IT Developer
Machine Learning Refresher
Supervised learning
Classification
Regression
Unsupervised learning
Cluster analysis
Deep learning
Neural networks
# Split day inferred from the printed output below
split_day = "2018-10-13"
train = environment[:split_day]
test = environment[split_day:]
print(train.iloc[0].name)
print(train.iloc[-1].name)
print(test.iloc[0].name)
print(test.iloc[-1].name)
2018-10-01 00:00:00
2018-10-13 23:45:00
2018-10-13 00:00:00
2018-10-15 23:45:00
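The output above shows the training set ending at 2018-10-13 23:45 while the test set starts at 2018-10-13 00:00, so the split day appears in both sets. Slicing a DatetimeIndex with a date string includes the whole day on either side of the slice. A minimal sketch of a non-overlapping split, assuming `environment` has a DatetimeIndex of 15-minute readings (the data and the `temperature` column here are synthetic stand-ins):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `environment`: one reading every 15 minutes
index = pd.date_range("2018-10-01", "2018-10-15 23:45", freq="15min")
environment = pd.DataFrame(
    {"temperature": np.arange(len(index), dtype=float)}, index=index
)

split_day = pd.Timestamp("2018-10-13")

# Boolean masks make the boundary exclusive on the training side,
# so no row ends up in both sets
train = environment[environment.index < split_day]
test = environment[environment.index >= split_day]

print(train.index[-1])  # 2018-10-12 23:45:00
print(test.index[0])    # 2018-10-13 00:00:00
```

Keeping the two sets disjoint avoids scoring the model on rows it has already seen during training.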
print(X_train.shape)
print(y_train.shape)
(1248, 3)
(1248,)
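from sklearn.linear_model import LogisticRegression

# Fit the classifier on the training data, then predict the test labels
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
print(logreg.predict(X_test))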
[0 0 1 1 1 1 1 0 0]
Evaluate the model
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
print(logreg.score(X_test, y_test))
0.78145113
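For a classifier, `score` returns the mean accuracy on the given data, that is, the fraction of test rows whose predicted label matches the true label. A small sketch that checks this against `accuracy_score`, using synthetic data in place of the course dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic three-feature data standing in for the IoT readings
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

logreg = LogisticRegression()
logreg.fit(X_train, y_train)

# score() and accuracy_score() on the predictions give the same number
score = logreg.score(X_test, y_test)
manual = accuracy_score(y_test, logreg.predict(X_test))
print(score, manual)
```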
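from sklearn.preprocessing import StandardScaler

# Learn the per-column mean and variance, then standardize the data
sc = StandardScaler()
sc.fit(data)
print(sc.mean_)
print(sc.var_)
data_scaled = sc.transform(data)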
# Only the features are scaled; the class labels stay as they are
print(logreg.score(X_test_scaled, y_test))
0.88145113
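The slide shows only fragments of the scaling workflow. One common way to connect them, sketched here with synthetic data and hypothetical feature scales, is to fit the scaler on the training features only, apply the same transformation to the test features, and train the model on the scaled training set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (hypothetical sensor columns)
rng = np.random.default_rng(0)
X = rng.normal(loc=[20.0, 50.0, 1000.0], scale=[5.0, 10.0, 30.0], size=(200, 3))
y = (X[:, 0] > 20.0).astype(int)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Fit the scaler on the training features only, then reuse it for the test set
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)

# Train and evaluate on scaled features; the labels are left untouched
logreg = LogisticRegression()
logreg.fit(X_train_scaled, y_train)
print(logreg.score(X_test_scaled, y_test))
```

Fitting the scaler on the training data alone keeps test-set statistics out of the model.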
Pipeline
Transform
Conversion
Scaling
Estimator
Model
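from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Initialize objects
sc = StandardScaler()
logreg = LogisticRegression()
# Create pipeline: scale the features first, then classify
pl = Pipeline([
    ("scale", sc),
    ("logreg", logreg)
])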
Pipeline(memory=None,
steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)),
('logreg', <class 'sklearn.linear_model.logistic.LogisticRegression'>)])
pl.fit(X_train, y_train)
print(pl.predict(X_test))
[0 0 1 1 0 1 1 0 0]
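Calling `fit` on a pipeline fits each transformer in order and passes the transformed data on to the final estimator; `predict` applies the same transformations before predicting. A brief sketch, on synthetic data, comparing the pipeline with the equivalent manual steps:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the course dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
X_train, X_test = X[:150], X[150:]
y_train = y[:150]

# Pipeline: scaling and classification in one object
pl = Pipeline([("scale", StandardScaler()), ("logreg", LogisticRegression())])
pl.fit(X_train, y_train)
pipeline_pred = pl.predict(X_test)

# The same two steps carried out by hand
sc = StandardScaler().fit(X_train)
logreg = LogisticRegression().fit(sc.transform(X_train), y_train)
manual_pred = logreg.predict(sc.transform(X_test))

print(np.array_equal(pipeline_pred, manual_pred))
```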
import pickle
from pathlib import Path

# Save the fitted pipeline to disk in binary mode
with Path("pipeline_model.pkl").open("wb") as f:
    pickle.dump(pl, f)
pl
Pipeline(memory=None,
steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)),
('logreg', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='warn', n_jobs=None, penalty='l2',
random_state=None, solver='warn', tol=0.0001, verbose=0, warm_start=False))])
A word of caution
DO NOT unpickle untrusted files; doing so can execute malicious code.
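Loading a pickled pipeline is the mirror image of saving it. A minimal round-trip sketch, assuming the file was written by your own code; the temporary directory and the synthetic training data are illustrative choices, not part of the course material:

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Fit a small pipeline on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)
pl = Pipeline([("scale", StandardScaler()), ("logreg", LogisticRegression())])
pl.fit(X, y)

# Save to a temporary location (hypothetical path for this sketch)
model_path = Path(tempfile.mkdtemp()) / "pipeline_model.pkl"
with model_path.open("wb") as f:
    pickle.dump(pl, f)

# Load it back; only do this with files you created yourself
with model_path.open("rb") as f:
    pl_loaded = pickle.load(f)

# The restored pipeline reproduces the original predictions
same = np.array_equal(pl.predict(X), pl_loaded.predict(X))
print(same)
```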
Model Recap
# Create pipeline
pl = Pipeline([
    ("scale", StandardScaler()),
    ("logreg", LogisticRegression())
])
0.8897932222860425
print(predictions)
print(f"Test length: {len(X_test)}")
print(f"Prediction length: {len(predictions)}")
[0 0 0 ... 1 1 1]
Test length: 500
Prediction length: 500
cols = X_train.columns
df = pd.DataFrame.from_records([single_record],
                               index="timestamp",
                               columns=cols)
df = pd.DataFrame.from_records([data],
                               index="timestamp",
                               columns=cols)
category = pl.predict(df)
maybe_alert(category[0])
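The steps above turn one incoming record into a single-row DataFrame with the training columns and pass it to the fitted pipeline. A self-contained sketch of that flow follows. The column names, the record values, and the body of `maybe_alert` are hypothetical, and the DataFrame is built with `set_index` rather than `from_records`, as one alternative way to get the same shape:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature columns; the course data may use different names
cols = ["temperature", "humidity", "pressure"]
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(100, 3)), columns=cols)
y_train = (X_train["temperature"] > 0).astype(int)

pl = Pipeline([("scale", StandardScaler()), ("logreg", LogisticRegression())])
pl.fit(X_train, y_train)


def maybe_alert(category):
    """Hypothetical stand-in: report whether the predicted class is 1."""
    return bool(category == 1)


# One incoming record, e.g. a decoded message payload
single_record = {
    "timestamp": "2018-10-16 00:00:00",
    "temperature": 2.5,
    "humidity": 0.1,
    "pressure": -0.3,
}

# Build a one-row frame indexed by timestamp, with columns in training order
df = pd.DataFrame([single_record]).set_index("timestamp")[cols]
category = pl.predict(df)
alert = maybe_alert(category[0])
print(category[0], alert)
```

Selecting `[cols]` keeps the column order identical to the training data, which the fitted pipeline relies on.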
What you have learned
Accessing IoT data
from a REST API
from a datastream
Data Cleaning
Correlations
Database
Big data
PySpark