Data Reduction Using Python
Import Dependencies
In [0]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
Dataset
In [0]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# load dataset into Pandas DataFrame
df = pd.read_csv(url, names=['sepal length','sepal width','petal length','petal width','target'])
In [4]:
df.head()
Out[4]:
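The cells that standardize the four features and project them onto two principal components did not survive the export, so finaldf appears below without its definition. A minimal sketch of those steps, assuming the df loaded above and using illustrative names (principaldf and the column labels 'principal component 1' / 'principal component 2') for the intermediate results:

# Separate the four measurement columns from the class label
features = ['sepal length', 'sepal width', 'petal length', 'petal width']
x = df.loc[:, features].values

# Standardize the features to zero mean and unit variance before PCA
x = StandardScaler().fit_transform(x)

# Project the standardized data onto two principal components
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
principal_components = pca.fit_transform(x)
principaldf = pd.DataFrame(data=principal_components,
                           columns=['principal component 1', 'principal component 2'])

# Re-attach the target column so each projected row keeps its class label
finaldf = pd.concat([principaldf, df[['target']]], axis=1)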
In [16]:
finaldf.head()
Out[16]:
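matplotlib is imported at the top but no plotting cell survived the export; a sketch of how the two components could be visualised, assuming the finaldf and column names from the sketch above:

# Scatter plot of the two principal components, coloured by iris class
fig, ax = plt.subplots(figsize=(8, 8))
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    mask = finaldf['target'] == target
    ax.scatter(finaldf.loc[mask, 'principal component 1'],
               finaldf.loc[mask, 'principal component 2'],
               c=color, label=target)
ax.set_xlabel('principal component 1')
ax.set_ylabel('principal component 2')
ax.legend()
plt.show()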
In [21]:
pca.explained_variance_ratio_
Out[21]:
array([0.72770452, 0.23030523])
Conclusion:
The explained variance tells you how much information (variance) can be attributed to each of the principal components. This is important because, while you can reduce a 4-dimensional space to a 2-dimensional space, you lose some of the variance (information) when you do so. Using the attribute explained_variance_ratio_, you can see that the first principal component contains 72.77% of the variance and the second principal component contains 23.03% of the variance. Together, the two components contain 95.80% of the information.
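The combined figure is simply the sum of the two ratios reported above (0.72770452 + 0.23030523 ≈ 0.9580), and can be checked directly on the fitted pca object:

# Total variance retained by the first two principal components (≈ 0.9580)
pca.explained_variance_ratio_.sum()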
Variance Threshold
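The cell that builds X_high_variance is not reproduced here. A minimal sketch using scikit-learn's VarianceThreshold, assuming the same df as above and an illustrative threshold of 0.5:

from sklearn.feature_selection import VarianceThreshold

# Use the unscaled numeric feature columns, so each feature keeps its own variance
X = df[['sepal length', 'sepal width', 'petal length', 'petal width']].values

# Drop every feature whose variance falls below the chosen threshold
thresholder = VarianceThreshold(threshold=0.5)
X_high_variance = thresholder.fit_transform(X)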
In [1]:
# View the first five rows of the features whose variance is above the threshold
X_high_variance[0:5]
Out[1]: