Analyzing IoT Data in Python, Chapter 3
Combining data sources for further analysis
Matthias Voppichler
IT Developer
Combining data sources
print(temp.head())
                     value
timestamp
2018-10-03 08:00:00   16.3
2018-10-03 09:00:00   17.7
2018-10-03 10:00:00   20.2
2018-10-03 11:00:00   20.9
2018-10-03 12:00:00   21.8
print(sun.head())
                      value
timestamp
2018-10-03 08:00:00  1798.7
2018-10-03 08:30:00  1799.9
2018-10-03 09:00:00  1798.1
2018-10-03 09:30:00  1797.7
2018-10-03 10:00:00  1798.0
print(temp.head(2))
print(sun.head(2))
temperature
timestamp
2018-10-03 08:00:00 16.3
2018-10-03 09:00:00 17.7
sunshine
timestamp
2018-10-03 08:00:00 1798.7
2018-10-03 08:30:00 1799.9
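Both frames start out with a column called value, and the second pair of printouts shows them renamed to temperature and sunshine. A minimal sketch of one way to do that rename, assuming small sample frames rebuilt from the first rows printed above (the slides do not show the rename step itself):

```python
import pandas as pd

# Sample frames rebuilt from the first two rows of each printout above.
temp = pd.DataFrame(
    {"value": [16.3, 17.7]},
    index=pd.DatetimeIndex(
        ["2018-10-03 08:00", "2018-10-03 09:00"], name="timestamp"
    ),
)
sun = pd.DataFrame(
    {"value": [1798.7, 1799.9]},
    index=pd.DatetimeIndex(
        ["2018-10-03 08:00", "2018-10-03 08:30"], name="timestamp"
    ),
)

# Descriptive column names keep the two measurements
# distinguishable once the frames are combined.
temp.columns = ["temperature"]
sun.columns = ["sunshine"]

print(temp.head(2))
print(sun.head(2))
```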
print(environ.head())
temperature sunshine
timestamp
2018-10-03 08:00:00 16.3 1798.7
2018-10-03 08:30:00 NaN 1799.9
2018-10-03 09:00:00 17.7 1798.1
2018-10-03 09:30:00 NaN 1797.7
2018-10-03 10:00:00 20.2 1798.0
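The combined frame above has a row for every timestamp that occurs in either source, with NaN where the hourly temperature has no half-hourly reading. One way to get that shape is pd.concat along the column axis. This is a sketch under that assumption, since the slides do not show how environ was built; the sample values are copied from the printouts.

```python
import pandas as pd

# Hourly temperature and half-hourly sunshine, values taken from the printouts.
temp = pd.DataFrame(
    {"temperature": [16.3, 17.7, 20.2]},
    index=pd.date_range("2018-10-03 08:00", periods=3, freq="h", name="timestamp"),
)
sun = pd.DataFrame(
    {"sunshine": [1798.7, 1799.9, 1798.1, 1797.7, 1798.0]},
    index=pd.date_range("2018-10-03 08:00", periods=5, freq="30min", name="timestamp"),
)

# axis=1 places the frames side by side and aligns rows on the timestamp index.
# Timestamps missing from one source become NaN in that source's column.
environ = pd.concat([temp, sun], axis=1).sort_index()

print(environ.head())
```

The explicit sort_index() is a precaution so the rows come out in time order regardless of how the two indexes are merged.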
env1h = environ.resample("1h").agg(agg_dict)
print(env1h.head())
temperature sunshine
timestamp
2018-10-03 08:00:00 16.3 3598.6
2018-10-03 09:00:00 17.7 3595.8
2018-10-03 10:00:00 20.2 3596.2
2018-10-03 11:00:00 20.9 3594.1
2018-10-03 12:00:00 21.8 3599.9
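The resample call relies on an agg_dict that the slides do not show. The printed result is consistent with summing sunshine per hour (1798.7 + 1799.9 = 3598.6) and taking a NaN-skipping statistic for temperature. The sketch below assumes "max" for temperature; with a single reading per hour, "mean" or "min" would print the same numbers.

```python
import pandas as pd

# Half-hourly combined frame, values copied from the environ printout.
environ = pd.DataFrame(
    {
        "temperature": [16.3, float("nan"), 17.7, float("nan")],
        "sunshine": [1798.7, 1799.9, 1798.1, 1797.7],
    },
    index=pd.date_range("2018-10-03 08:00", periods=4, freq="30min", name="timestamp"),
)

# Assumed aggregation rules: one function per column.
agg_dict = {"temperature": "max", "sunshine": "sum"}

# Downsample to one row per hour, applying each column's rule.
env1h = environ.resample("1h").agg(agg_dict)

print(env1h)
```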
temperature sunshine
timestamp
2018-10-03 08:00:00 16.3 1798.7
2018-10-03 08:30:00 16.3 1799.9
2018-10-03 09:00:00 17.7 1798.1
2018-10-03 09:30:00 17.7 1797.7
2018-10-03 10:00:00 20.2 1798.0
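In the table above, each missing temperature carries the previous reading, which is what a forward fill would produce. A minimal sketch, assuming DataFrame.ffill() is the intended step (the slides show only the result):

```python
import pandas as pd

# Half-hourly combined frame with gaps in the hourly temperature column.
environ = pd.DataFrame(
    {
        "temperature": [16.3, float("nan"), 17.7, float("nan")],
        "sunshine": [1798.7, 1799.9, 1798.1, 1797.7],
    },
    index=pd.date_range("2018-10-03 08:00", periods=4, freq="30min", name="timestamp"),
)

# Forward fill: every NaN takes the last valid value above it in the same column.
environ_filled = environ.ffill()

print(environ_filled)
```

Unlike resampling, this keeps the half-hourly resolution, but the filled temperatures are repeated values rather than new measurements.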
df.corr()
print(data.corr())
Positive correlation
Correlation close to 1
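To make the reading of data.corr() concrete, here is a small sketch. The temperature values come from the earlier printout, while the sunshine and humidity columns are hypothetical numbers chosen so that one rises and one falls with temperature.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "temperature": [16.3, 17.7, 20.2, 20.9, 21.8],
        "sunshine": [100.0, 220.0, 410.0, 450.0, 520.0],  # hypothetical, rises with temperature
        "humidity": [80.0, 74.0, 61.0, 58.0, 52.0],       # hypothetical, falls with temperature
    }
)

# Pairwise Pearson correlation between all numeric columns.
corr = data.corr()
print(corr)

# A single pair can also be computed directly.
temp_sun = data["temperature"].corr(data["sunshine"])
print(temp_sun)
```

Values near 1 mean the two columns rise together, values near -1 mean one falls as the other rises, and the diagonal is always 1 because every column correlates perfectly with itself.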
Outliers
Reasons why outliers appear in datasets:
Measurement error
Manipulation
Extreme events
data["mean"] = temp_mean
data["upper_limit"] = temp_mean + (temp_std * 3)
data["lower_limit"] = temp_mean - (temp_std * 3)
print(data.iloc[0]["upper_limit"])
print(data.iloc[0]["mean"])
print(data.iloc[0]["lower_limit"])
29.513933116002725
14.5345
-0.44493311600272456
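The limits above sit three standard deviations either side of the mean. The sketch below shows how temp_mean and temp_std might be computed and how the limits can be used to flag rows. It assumes they come from the temperature column, and the readings are hypothetical (twenty plausible values plus one spike).

```python
import pandas as pd

# Hypothetical readings: 14.0 to 15.9 in steps of 0.1, plus one spike at 60.0.
readings = [14.0 + 0.1 * i for i in range(20)] + [60.0]
data = pd.DataFrame({"temperature": readings})

# Assumed definitions of the statistics used on the slide.
temp_mean = data["temperature"].mean()
temp_std = data["temperature"].std()

data["mean"] = temp_mean
data["upper_limit"] = temp_mean + (temp_std * 3)
data["lower_limit"] = temp_mean - (temp_std * 3)

# A row is an outlier candidate when it falls outside the band.
is_outlier = (data["temperature"] > data["upper_limit"]) | (
    data["temperature"] < data["lower_limit"]
)
outliers = data[is_outlier]

print(outliers[["temperature", "lower_limit", "upper_limit"]])
```

Because the spike itself inflates the mean and standard deviation, this rule can miss outliers in very short series, so it works best on longer histories.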
from statsmodels.graphics import tsaplots
tsaplots.plot_acf(data['temperature'], lags=50)
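plot_acf draws one bar per lag, showing how strongly the series correlates with a copy of itself shifted by that many steps. As a plain pandas sketch of the same idea, Series.autocorr(lag=k) returns that correlation for a single lag. The hourly series here is synthetic (a pure daily sine wave), and pandas' estimate can differ slightly from the values statsmodels plots because the two use different estimators.

```python
import numpy as np
import pandas as pd

# Synthetic hourly temperature with a clean 24-hour cycle over ten days.
idx = pd.date_range("2018-10-01", periods=24 * 10, freq="h", name="timestamp")
hours = np.arange(len(idx))
data = pd.DataFrame(
    {"temperature": 15 + 5 * np.sin(2 * np.pi * hours / 24)}, index=idx
)

# Correlation of the series with itself shifted by 12 and by 24 hours.
lag_12 = data["temperature"].autocorr(lag=12)
lag_24 = data["temperature"].autocorr(lag=24)

print(f"lag 12: {lag_12:.2f}, lag 24: {lag_24:.2f}")
```

A strong positive value at lag 24 and a strong negative value at lag 12 is the signature of a daily cycle: readings one day apart look alike, readings half a day apart are opposite.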
Time series components
Trend
Seasonal
Residual / Noise
decomp.plot()
plt.show()
timestamp
2018-10-01 00:00:00 -3.670394
2018-10-01 01:00:00 -3.987451
2018-10-01 02:00:00 -4.372217
2018-10-01 03:00:00 -4.534066
2018-10-01 04:00:00 -4.802165
Freq: H, Name: temperature, dtype: float64