Cia Code
Importing data
[32]: import pandas as pd
df1 = pd.read_csv('C:\\Users\\lariy\\Downloads\\price prediction.csv')
df = pd.DataFrame(df1)
df
4 3 Condominium 2012 94109
.. … … … …
434 8 Condominium 1914 94123
435 10 SingleFamily 1908 94123
436 13 SingleFamily 1905 94123
437 5 Condominium 1900 94123
438 5 Condominium 1900 94123
df.drop(columns=columns_to_drop, inplace=True)
[34]: df.columns
[35]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 439 entries, 0 to 438
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 bathrooms 439 non-null float64
1 bedrooms 439 non-null int64
2 finishedsqft 438 non-null float64
3 lastsoldprice 439 non-null int64
4 latitude 439 non-null float64
5 longitude 437 non-null float64
6 totalrooms 439 non-null int64
dtypes: float64(4), int64(3)
memory usage: 24.1 KB
2. Detecting and Plotting Missing Values
[36]: missing_values = df.isnull().sum()
print("Missing values in the data:",missing_values)
finishedsqft 1
lastsoldprice 0
latitude 0
longitude 2
totalrooms 0
dtype: int64
[38]: missing_values = df.isnull().sum()
print("Missing values before handling:")
print(missing_values)
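The handling step itself is not shown in the cell above. The following is a minimal sketch, assuming median imputation is acceptable for the two affected columns (`finishedsqft`, `longitude`). The small `demo` frame is made-up illustration data, not the notebook's dataset, and `demo_filled` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df, with gaps in the same two columns
demo = pd.DataFrame({
    "finishedsqft": [1463.0, np.nan, 653.0, 2272.0],
    "longitude": [-122.425, -122.428, np.nan, -122.426],
    "totalrooms": [7, 7, 3, 6],
})

print("Missing values before handling:")
print(demo.isnull().sum())

# Fill the gaps in each numeric column with that column's median
demo_filled = demo.fillna(demo.median(numeric_only=True))

print("Missing values after handling:")
print(demo_filled.isnull().sum())
```

The median is less sensitive to extreme prices and sizes than the mean, which is why it is used in this sketch. Mean imputation would work the same way with `demo.mean(numeric_only=True)`.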
4. Detecting Outliers
[41]: import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import zscore
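# Select numerical columns for outlier detection
numerical_columns = df.select_dtypes(include=['float64', 'int64']).columns
# One box plot per numerical column. Note: a 2 x 3 grid has only 6 slots,
# so the 7th numerical column raises the ValueError shown in the output.
plt.figure(figsize=(12, 6))
for i, column in enumerate(numerical_columns, 1):
    plt.subplot(2, 3, i)
    sns.boxplot(data=df[column])
    plt.title(column)
    plt.xlabel('Index')
    plt.ylabel(column)
plt.tight_layout()
plt.show()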
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[41], line 15
13 plt.figure(figsize=(12, 6))
14 for i, column in enumerate(numerical_columns, 1):
---> 15 plt.subplot(2, 3, i)
16 sns.boxplot(data=df[column])
17 plt.title(column)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\pyplot.py:1425, in subplot(*args, **kwargs)
1430 if (ax.get_subplotspec() == key
1431 and (kwargs == {}
1432 or (ax._projection_init
1433 == fig._process_projection_requirements(**kwargs)))):
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\gridspec.py:599, in SubplotSpec._from_subplot_args(figure, args)
597 else:
598 if not isinstance(num, Integral) or num < 1 or num > rows*cols:
--> 599 raise ValueError(
600 f"num must be an integer with 1 <= num <= {rows*cols}, "
601 f"not {num!r}"
602 )
603 i = j = num
604 return gs[i-1:j]
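The ValueError arises because a 2 x 3 grid has six slots while `df` has seven numerical columns, so the seventh call `plt.subplot(2, 3, 7)` is out of range. One way around it, sketched here with a hypothetical helper `subplot_grid` that is not part of the original notebook, is to derive the grid from the number of plots:

```python
import math


def subplot_grid(n_plots, n_cols=3):
    """Return (rows, cols) such that rows * cols >= n_plots."""
    n_rows = math.ceil(n_plots / n_cols)
    return n_rows, n_cols


# Seven numerical columns, as reported by df.info()
rows, cols = subplot_grid(7)
print(rows, cols)
```

Inside the plotting loop the call would then read `plt.subplot(rows, cols, i)`, which leaves two slots of the 3 x 3 grid empty instead of failing.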
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[43], line 3
1 plt.figure(figsize=(12, 6))
2 for i, column in enumerate(numerical_columns, 1):
----> 3 plt.subplot(2, 3, i)
4 sns.boxplot(data=df_no_outliers[column])
5 plt.title(column)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\pyplot.py:1425, in subplot(*args, **kwargs)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\gridspec.py:599, in SubplotSpec._from_subplot_args(figure, args)
597 else:
598 if not isinstance(num, Integral) or num < 1 or num > rows*cols:
--> 599 raise ValueError(
600 f"num must be an integer with 1 <= num <= {rows*cols}, "
601 f"not {num!r}"
602 )
603 i = j = num
604 return gs[i-1:j]
[29]: # The outliers have been reduced as far as possible
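The filtering code that produced `df_no_outliers` is not shown. The following is a minimal sketch of z-score filtering, assuming a cut-off of 3 standard deviations (a common rule of thumb, not a value confirmed by the notebook). The prices are synthetic, and the z-score is computed by hand with `ddof=0`, which is also the default of `scipy.stats.zscore`.

```python
import numpy as np
import pandas as pd

# Synthetic prices: 50 ordinary values plus one extreme value
rng = np.random.default_rng(0)
prices = np.append(rng.normal(1_000_000, 100_000, size=50), 23_889_000)
demo = pd.DataFrame({"lastsoldprice": prices})

# Population z-score per column (ddof=0)
z = (demo - demo.mean()) / demo.std(ddof=0)

threshold = 3.0  # assumed cut-off
keep = (z.abs() < threshold).all(axis=1)  # keep rows with no extreme column
demo_no_outliers = demo[keep]

print(len(demo), "->", len(demo_no_outliers))
```

Because a single extreme value inflates both the mean and the standard deviation, a robust alternative such as an IQR rule may behave better on heavily skewed price data.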
[44]: df.columns
df.drop(columns=columns_to_drop, inplace=True)
df
totalrooms
0 7
1 7
2 3
3 6
4 3
.. …
434 8
435 10
436 13
437 5
438 5
[ ]: #CORRELATION ESTIMATION
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap=colors, fmt=".2f", linewidths=0.5,
            cbar=False)
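`corr_matrix` and `colors` are defined in lines that are not shown. A minimal sketch, assuming `corr_matrix` is the Pearson correlation of the numeric columns and `colors` is a matplotlib colormap name (both are assumptions). The rows are the first five records of the `df` table printed elsewhere in this notebook, restricted to three columns.

```python
import pandas as pd

# First five rows of df, three columns only
demo = pd.DataFrame({
    "bedrooms": [2, 3, 1, 2, 1],
    "totalrooms": [7, 7, 3, 6, 3],
    "lastsoldprice": [1950000, 4200000, 665000, 2735000, 1050000],
})

corr_matrix = demo.corr()  # Pearson correlation by default
colors = "coolwarm"        # assumed colormap name

print(corr_matrix.round(2))
```

Correlation is scale-free, so the very different units of price, rooms, and coordinates do not distort the heatmap the way they distort a raw covariance matrix.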
0.2 PRINCIPAL COMPONENT ANALYSIS
1. Standardization
[52]: df_index = df.index
df_scaled = pd.DataFrame(scaled_features)
df_scaled
[76]: 0 1 2 3 4 5 6
0 -0.285556 -0.399122 -0.403174 -0.211435 1.185443 1.148603 0.105818
1 0.698425 0.140000 0.680420 0.624953 0.874861 0.720970 0.105818
2 -0.941543 -0.938244 -0.883323 -0.689106 0.018791 1.152340 -0.909570
3 0.042437 -0.399122 0.076381 0.080372 0.996031 1.010062 -0.148029
4 -0.941543 -0.938244 -0.774252 -0.545990 0.342496 1.357481 -0.909570
.. … … … … … … …
434 -0.285556 0.140000 0.001099 -0.322953 1.464529 0.118894 0.359665
435 0.698425 0.679121 0.532819 0.251367 1.268994 -0.350380 0.867359
436 3.322374 1.757365 1.528090 1.795897 1.232249 -0.477175 1.628900
437 -0.941543 -0.399122 -0.496240 -0.434844 1.819293 0.570418 -0.401876
438 -0.941543 -0.938244 -0.618352 -0.471645 1.236186 0.138114 -0.401876
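The scaled table above has lost its column names (they appear as 0 to 6). The following is a minimal sketch, assuming `scaled_features` comes from scikit-learn's `StandardScaler`, which the notebook imports elsewhere. The rows are the first five records of `df`, three columns only.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# First five rows of df, three columns only
demo = pd.DataFrame({
    "bathrooms": [2.0, 3.5, 1.0, 2.5, 1.0],
    "bedrooms": [2, 3, 1, 2, 1],
    "finishedsqft": [1463.0, 3291.0, 653.0, 2272.0, 837.0],
})

# Standardize each column to zero mean and unit variance
scaled_features = StandardScaler().fit_transform(demo)

# Carry the original labels over so later tables stay readable
df_scaled = pd.DataFrame(scaled_features, columns=demo.columns, index=demo.index)
print(df_scaled.round(3))
```

Passing `columns=` and `index=` keeps every later output (covariance matrix, loadings) labelled by feature name rather than by position.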
column_names = ['bathrooms', 'bedrooms', 'finishedsqft', 'lastsoldprice', 'latitude',
                'longitude', 'totalrooms']
covariance_df = pd.DataFrame(covariance_matrix, columns=column_names,
                             index=column_names)
Covariance Matrix:
bathrooms bedrooms finishedsqft lastsoldprice \
bathrooms 2.329161e+00 2.017295e+00 2.231689e+03 3.157419e+06
bedrooms 2.017295e+00 3.448394e+00 2.356723e+03 3.033648e+06
finishedsqft 2.231689e+03 2.356723e+03 2.845895e+06 3.819981e+09
lastsoldprice 3.157419e+06 3.033648e+06 3.819981e+09 7.253364e+12
latitude 3.289555e-04 8.149072e-05 3.582618e-01 7.795914e+02
longitude -2.819931e-03 -3.999855e-03 -3.881395e+00 -6.205566e+03
totalrooms 4.699683e+00 5.724836e+00 5.793232e+03 7.124227e+06
finishedsqft 0.358262 -3.881395 5.793232e+03
lastsoldprice 779.591402 -6205.566010 7.124227e+06
latitude 0.000005 0.000009 2.110409e-04
longitude 0.000009 0.000056 -9.048211e-03
totalrooms 0.000211 -0.009048 1.555414e+01
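The entries above range from about 5e-06 (latitude variance) to 7.25e+12 (`lastsoldprice` variance), which suggests the covariance was computed on the raw, unscaled columns. In that case the decomposition is dominated by price. A minimal sketch of the covariance of standardized data, using three columns from the first five rows of `df` (the variable names `X` and `Z` are assumptions):

```python
import numpy as np

# finishedsqft, lastsoldprice, totalrooms for the first five rows of df
X = np.array([
    [1463.0, 1950000.0, 7.0],
    [3291.0, 4200000.0, 7.0],
    [653.0, 665000.0, 3.0],
    [2272.0, 2735000.0, 6.0],
    [837.0, 1050000.0, 3.0],
])

# Standardize column-wise (population standard deviation, ddof=0)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance of standardized columns; rowvar=False treats columns as variables
covariance_matrix = np.cov(Z, rowvar=False, ddof=0)
print(np.round(covariance_matrix, 3))
```

With standardized inputs the diagonal is 1 and the matrix coincides with the correlation matrix, so every feature contributes on the same footing.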
3. Eigen Decomposition
[94]: import pandas as pd
Eigenvectors:
PC1 PC2 PC3 PC4 PC5 \
0 -4.353040e-07 6.819708e-04 -7.635453e-02 1.506102e-01 9.856396e-01
1 -4.182401e-07 9.100233e-04 -3.482915e-01 9.222261e-01 -1.679016e-01
2 -5.266495e-04 9.999962e-01 2.655423e-03 -7.046468e-05 -4.754296e-04
3 -9.999999e-01 -5.266507e-04 -3.019358e-07 -6.439164e-08 -9.080315e-08
4 -1.074800e-10 -6.271480e-08 1.190288e-04 -2.203851e-05 1.232789e-04
5 8.555430e-10 -7.352065e-07 4.166878e-04 -1.856730e-04 8.704261e-04
6 -9.821963e-07 2.447252e-03 -9.342675e-01 -3.561116e-01 -1.796084e-02
PC6 PC7
0 8.041470e-04 4.783139e-05
1 -4.697221e-04 -8.824796e-06
2 9.181873e-09 -1.982864e-07
3 -6.302828e-10 -2.212895e-10
4 -1.940234e-01 9.809969e-01
5 -9.809964e-01 -1.940235e-01
6 -3.528579e-04 3.782708e-05
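In the table above, PC1 is almost exactly the `lastsoldprice` axis (row 3 is -9.999999e-01) and PC2 the `finishedsqft` axis, which is what an unscaled covariance matrix would produce. The decomposition code is not shown. A minimal sketch, assuming a symmetric covariance matrix and NumPy's `eigh`; the 3 x 3 matrix is illustrative, not taken from the notebook.

```python
import numpy as np

# Illustrative symmetric matrix (not the notebook's values)
covariance_matrix = np.array([
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.5],
    [0.3, 0.5, 1.0],
])

# eigh is intended for symmetric matrices and returns ascending eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(covariance_matrix)

# Reorder so that PC1 carries the largest variance
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]  # columns are the components

explained_variance_ratio = eigenvalues / eigenvalues.sum()
print(np.round(explained_variance_ratio, 3))
```

Sorting explicitly matters because a later slice such as `eigenvectors[:, :k]` only selects the leading components if the columns are in descending order of eigenvalue.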
5. Selecting the best features k
[106]: from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA
# Apply PCA
pca = PCA()
pca.fit(imputed_features)
[106]: PCA()
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance Ratio')
plt.title('Cumulative Explained Variance Ratio by Number of Components')
plt.grid(True)
plt.show()
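The plot can also be read off numerically. A minimal sketch, assuming a target of 85% cumulative explained variance (an assumed threshold, not one stated in the notebook). The ratios are hypothetical; in the notebook they would come from `pca.explained_variance_ratio_`.

```python
import numpy as np

# Hypothetical explained-variance ratios for seven components
explained_variance_ratio = np.array([0.55, 0.20, 0.12, 0.06, 0.04, 0.02, 0.01])

cumulative = np.cumsum(explained_variance_ratio)
target = 0.85  # assumed threshold

# Smallest k whose cumulative ratio reaches the target
k = int(np.searchsorted(cumulative, target) + 1)
print("k =", k)
```

`np.searchsorted` returns the first position at which the cumulative curve reaches the target, so adding 1 converts that zero-based index into a component count.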
[158]: k = 3 # Number of selected best features
best_k = eigenvectors[:, :k]
The selected Principal Components are:
PC5: [ 9.85639614e-01 -1.67901638e-01 -4.75429552e-04 -9.08031536e-08
1.23278868e-04 8.70426120e-04 -1.79608433e-02]
The selected Principal Components are:
PC6: [ 8.04146996e-04 -4.69722146e-04 9.18187349e-09 -6.30282826e-10
-1.94023389e-01 -9.80996398e-01 -3.52857856e-04]
The selected Principal Components are:
PC7: [ 4.78313902e-05 -8.82479632e-06 -1.98286392e-07 -2.21289454e-10
9.80996888e-01 -1.94023456e-01 3.78270850e-05]
# Perform projection
projected_data = np.dot(df, principal_components)
# Optionally, you can print the first few rows of the projected data
print("Projected data:")
print(projected_data)
[-1.80000060e+06 6.68041584e+02]
[-1.02500065e+06 9.60192920e+02]
[-6.25000258e+05 3.25849888e+02]
[-2.73500082e+06 8.31620176e+02]
[-1.61150059e+06 7.01309677e+02]
[-1.40000006e+07 -2.47009155e+03]
[-1.76000055e+06 5.73112033e+02]
[-1.18500063e+06 8.75928816e+02]
[-1.02500038e+06 4.60190771e+02]
[-7.90000475e+05 6.92953262e+02]
[-1.15500043e+06 5.06728199e+02]
[ nan nan]
[-8.15000500e+05 7.34787698e+02]
[-1.13000043e+06 5.13893632e+02]
[-2.68100147e+06 2.08806798e+03]
[-7.95000360e+05 4.74318378e+02]
[-1.90000062e+06 6.69376315e+02]
[-9.50000351e+05 4.16691475e+02]
[-7.45000255e+05 2.88651712e+02]
[-1.75000064e+06 7.45373924e+02]
[-6.83500242e+05 2.80038435e+02]
[-8.25000435e+05 6.09520735e+02]
[-1.00000045e+06 5.91359038e+02]
[-1.14000038e+06 4.29629872e+02]
[-9.20000386e+05 4.90490776e+02]
[-1.17500050e+06 6.49196199e+02]
[-1.20000059e+06 8.05025255e+02]
[-5.00000318e+06 4.72676112e+03]
[-1.80000075e+06 9.59037127e+02]
[-9.15000381e+05 4.82122479e+02]
[-6.56000212e+05 2.29524020e+02]
[-3.70000099e+06 9.08405035e+02]
[-4.20000115e+06 1.07907716e+03]
[-1.30100043e+06 4.68839330e+02]
[-5.50000174e+05 1.86344469e+02]
[-1.18000066e+06 9.43561823e+02]
[-8.90000377e+05 4.81286351e+02]
[-1.97500088e+06 1.15987619e+03]
[-1.08000064e+06 9.31227134e+02]
[-1.30000037e+06 3.56363279e+02]
[-8.50000356e+05 4.52356605e+02]
[-1.12612546e+06 5.76936620e+02]
[-1.29000061e+06 8.21626454e+02]
[-9.50000425e+05 5.57693390e+02]
[-7.22000319e+05 4.15766689e+02]
[-8.03000402e+05 5.51108906e+02]
[-1.00000041e+06 5.14358474e+02]
[-8.25000469e+05 6.73522085e+02]
[-1.61150059e+06 7.01309677e+02]
[-1.70000076e+06 1.00470626e+03]
[-2.05000087e+06 1.12037740e+03]
[-9.15000370e+05 4.61124150e+02]
[-1.50000077e+06 1.06003254e+03]
[-5.71000071e+06 -1.61164325e+02]
[-2.52500220e+06 3.50722891e+03]
[-9.20000662e+05 1.01548879e+03]
[-1.20000035e+06 3.48031022e+02]
[-5.37500462e+05 7.34933236e+02]
[-8.40000621e+05 9.57621221e+02]
[-9.80000592e+05 8.65894752e+02]
[-9.30000253e+05 2.35221175e+02]
[-6.25000235e+05 2.80852506e+02]
[-4.00000051e+06 -7.85910660e+01]
[-3.99500080e+06 4.74040790e+02]
[-1.95000050e+06 4.36046102e+02]
[-1.00000389e+05 7.12341065e+02]
[-9.70000819e+05 1.29915509e+03]
[-1.15500043e+06 5.06728199e+02]
[-3.90000231e+06 3.36309129e+03]
[-9.00000233e+05 2.06020865e+02]
[-7.25000308e+05 3.93184370e+02]
[-8.25000473e+05 6.80520467e+02]
[-9.40000486e+05 6.74959473e+02]
[-6.20000441e+05 6.73482179e+02]
[-4.95000247e+05 3.39313771e+02]
[-1.60000095e+06 1.38537464e+03]
[-7.50000565e+05 8.75021809e+02]
[-1.49500074e+06 1.01267002e+03]
[-2.47000206e+06 3.25920161e+03]
[-8.22500360e+05 4.66835459e+02]
[-1.27500071e+06 1.01253292e+03]
[-5.37000263e+05 3.57192755e+02]
[-1.19500073e+06 1.06966247e+03]
[-1.24000068e+06 9.58960159e+02]
[-8.10000471e+05 6.81421845e+02]
[-9.65000503e+05 7.01790267e+02]
[-6.25000305e+05 4.13849556e+02]
[-7.19000401e+05 5.71346969e+02]
[-1.38500076e+06 1.07060034e+03]
[-1.31000055e+06 7.00099492e+02]
[-7.55000443e+05 6.42390335e+02]
[-9.30000687e+05 1.06022210e+03]
[-1.30000067e+06 9.31366910e+02]
[-2.15000103e+06 1.39572002e+03]
[-7.25000531e+05 8.18186120e+02]
[-7.75000359e+05 4.76855460e+02]
[-8.49000447e+05 6.24882605e+02]
[-5.59000212e+05 2.55609228e+02]
[-1.90000074e+06 9.05376105e+02]
[-1.32500085e+06 1.26020140e+03]
[-1.95000050e+06 4.36046102e+02]
[-1.45700039e+06 3.57678809e+02]
[-1.77000065e+06 7.67838339e+02]
[-1.12500035e+06 3.59526569e+02]
[-1.15000058e+06 8.00361262e+02]
[-6.50000113e+06 4.27796070e+02]
[-4.50000076e+06 2.54083462e+02]
[-2.72000067e+06 5.53524033e+02]
[-1.41800050e+06 5.83219175e+02]
[-8.25000311e+05 3.73519180e+02]
[-1.85300048e+06 4.32126187e+02]
[-4.00000114e+06 1.10542164e+03]
[-1.51000072e+06 9.72774593e+02]
[-3.90000091e+06 6.96080545e+02]
[-1.71000012e+07 -2.20571130e+03]
[-1.00000042e+06 5.43356773e+02]
[-3.65000063e+06 2.37735822e+02]
[-9.95000172e+06 6.53841092e+02]
[-4.35000201e+05 2.66910756e+02]
[-1.25000039e+06 4.11696557e+02]
[-2.00000051e+06 4.44708542e+02]
[-2.82500073e+06 6.39226774e+02]
[-1.70000074e+06 9.54702407e+02]
[-1.90500072e+06 8.60744775e+02]
[-1.95000055e+06 5.23038090e+02]
[-8.25000311e+05 3.72519183e+02]
[-8.00000231e+05 2.28686043e+02]
[-8.80000309e+05 3.54557395e+02]
[-1.15500033e+06 3.24726439e+02]
[-2.53200068e+06 6.26540030e+02]
[-6.50000109e+06 3.49784984e+02]
[-4.07500365e+05 5.85399214e+02]
[-3.80000085e+06 6.11744536e+02]
[-1.20000010e+07 -1.31978680e+03]
[-3.69999995e+06 -1.07360136e+03]
[-5.62700147e+06 1.31155559e+03]
[-6.55000100e+06 1.82462146e+02]
[-2.10000073e+06 8.42045475e+02]
[-1.42500063e+06 8.27537259e+02]
[-2.61000064e+06 5.26456608e+02]
[-7.05000090e+06 -1.48860135e+02]
[-1.15000030e+06 2.64361523e+02]
[-3.21000058e+06 2.48462944e+02]
[-2.66000061e+06 4.64118294e+02]
[-2.15000069e+06 7.42717771e+02]
[-7.99500110e+06 -1.05509351e+01]
[-6.41000180e+05 1.73424022e+02]
[-1.19900043e+06 5.07553032e+02]
[-3.60000283e+06 4.42911322e+03]
[-5.60000175e+06 1.84077042e+03]
[-1.26000040e+06 4.25427531e+02]
[-4.97500126e+06 1.08893128e+03]
[-2.41000069e+06 6.74786235e+02]
[-2.38890021e+07 -2.34511635e+03]
[-1.70000064e+06 7.61705584e+02]
[-7.87500145e+06 6.77640060e+02]
[-1.09950019e+07 7.29495682e+02]
[-5.99499996e+06 -1.65525150e+03]
[-3.75000019e+06 -6.33928254e+02]
[-8.30000294e+05 3.39888491e+02]
[-2.25000069e+06 7.16050843e+02]
[-1.46500039e+06 3.53464922e+02]
[-6.80000262e+05 3.18884019e+02]
[-1.13000036e+06 3.82895893e+02]
[-1.24500059e+06 7.86329314e+02]
[-8.85000459e+05 6.37921810e+02]
[-6.50000349e+06 4.91179488e+03]
[-1.02000036e+06 4.09824224e+02]
[-1.99500069e+06 7.82343205e+02]
[-8.90000358e+05 4.45288934e+02]
[-3.35000104e+06 1.08873523e+03]
[-9.90000389e+05 4.78624226e+02]
[-8.05000323e+05 4.02052807e+02]
[-1.25500042e+06 4.69062222e+02]
[-3.60000105e+06 1.04607399e+03]
[-1.25000041e+06 4.41695588e+02]
[-1.97500071e+06 8.29876760e+02]
[-2.52500100e+06 1.22521631e+03]
[-2.40000062e+06 5.36060627e+02]
[-2.22500097e+06 1.26421668e+03]
[-9.41750376e+05 4.65037032e+02]
[-1.30000046e+06 5.23361055e+02]
[-1.15000028e+06 2.22358324e+02]
[-1.80000094e+06 1.31404404e+03]
[-1.95000074e+06 8.98042818e+02]
[-3.45000067e+06 3.59070272e+02]
[-1.84000077e+06 9.69974109e+02]
[-4.15000213e+06 2.94441993e+03]
[-1.70000074e+06 9.54695066e+02]
[-3.75000122e+06 1.32007210e+03]
[-1.08300041e+06 4.91645709e+02]
[-1.90000068e+06 7.87378316e+02]
[-2.67500079e+06 7.91230527e+02]
[-7.75000124e+06 3.04473906e+02]
[-1.63500061e+06 7.24939055e+02]
[-1.07500041e+06 5.03857974e+02]
[-2.10000073e+06 8.31042728e+02]
[-1.52500061e+06 7.55871411e+02]
[-1.45000050e+06 5.66366354e+02]
[-1.16500049e+06 6.32462789e+02]
[-1.31000057e+06 7.29099897e+02]
[-1.15000052e+06 6.80358358e+02]
[-2.15000078e+06 9.10711332e+02]
[-2.90000086e+06 8.72723245e+02]
[-6.41500174e+06 1.62155299e+03]
[-8.60000308e+06 3.59081937e+03]
[-1.01175018e+07 8.46635381e+02]
[-1.10000041e+06 4.92689594e+02]
[-2.01000050e+06 4.19445469e+02]
[-2.07500042e+06 2.57207856e+02]
[-1.07500051e+06 6.85860643e+02]
[-7.15000320e+05 4.18450800e+02]
[-2.15000090e+06 1.15071714e+03]
[-1.40000082e+06 1.18070651e+03]
[-1.00000037e+06 4.42358746e+02]
[-3.20000085e+06 7.79736396e+02]
[-6.50000266e+05 3.34683539e+02]
[-9.95000610e+05 8.96992056e+02]
[-1.02500042e+06 5.22191447e+02]
[-7.40000094e+06 -1.72200080e+02]
[-7.81000310e+05 3.83690320e+02]
[-4.95000075e+06 1.18093644e+02]
[-5.88001921e+05 3.49236230e+03]
[-2.00000076e+06 9.19721485e+02]
[-5.65000114e+06 6.69442220e+02]
[-7.50000620e+05 9.80022322e+02]
[-1.73500078e+06 1.03127753e+03]
[-1.85000082e+06 1.07070498e+03]
[-8.65000136e+06 3.06486074e+02]
[-3.80000085e+06 6.11744536e+02]
[-6.50000109e+06 3.49784984e+02]
[-9.46000514e+05 7.26797769e+02]
[-1.80000076e+06 9.77044455e+02]
[-7.45000281e+05 3.37653974e+02]
[-3.15000133e+06 1.69107816e+03]
[-1.60000073e+06 9.62373106e+02]
[-1.73000090e+06 1.24690508e+03]
[-3.35000104e+06 1.08873523e+03]
[-1.78000081e+06 1.06957403e+03]
[-3.85000126e+06 1.37241552e+03]
[-2.01000116e+06 1.66844979e+03]
[-3.67500085e+06 6.52573174e+02]
[-1.45000065e+06 8.61368368e+02]
[-1.28000049e+06 5.93891394e+02]
[-9.50000426e+05 5.58688491e+02]
[-3.00000110e+06 1.30506067e+03]
[-8.90000387e+05 5.00292765e+02]
[-3.80000279e+06 4.29876503e+03]
[-3.15000140e+06 1.82607247e+03]
[-5.25000104e+06 6.01106210e+02]
[-1.62500028e+06 1.13202089e+02]
[-1.68500082e+06 1.11861228e+03]
[-1.26100074e+06 1.07391487e+03]
[-7.12500518e+05 7.96768790e+02]
[-1.74000086e+06 1.18363948e+03]
[-7.30000357e+05 4.85550757e+02]
[-8.40000320e+05 3.85620026e+02]
[-6.35000307e+05 4.15583023e+02]
[-8.80000359e+06 4.49049300e+03]
[-1.36800084e+06 1.22555410e+03]
[-1.02500096e+06 1.56019242e+03]
[-1.78200060e+06 6.61517999e+02]
[-1.20000081e+06 1.22402771e+03]
[-2.85500192e+06 2.89643477e+03]
[-9.60000394e+06 4.94419228e+03]
[-1.75000051e+06 5.13371784e+02]
[-1.70000074e+06 9.54702407e+02]
[-2.10000083e+06 1.01604937e+03]
[-8.85000376e+05 4.80922744e+02]
[-9.80000417e+05 5.33890202e+02]
[-1.65000031e+06 1.49037403e+02]
[-8.60000276e+06 2.97081705e+03]
[-7.30000228e+05 2.40551683e+02]
[-1.62900053e+06 5.72099207e+02]
[-1.45000080e+06 1.13236490e+03]
[-2.00000101e+06 1.39371076e+03]
[-1.77000065e+06 7.67838339e+02]
[-5.32000208e+06 2.55023014e+03]
[-2.00000070e+06 8.02712312e+02]
[-1.80000083e+06 1.10504642e+03]
[-6.70000352e+05 4.91149895e+02]
[-4.45000128e+06 1.25642084e+03]
[-1.58950079e+06 1.07990286e+03]
[-1.61000072e+06 9.52102065e+02]
[-2.51500109e+06 1.41548826e+03]
[-4.99900132e+06 1.18728965e+03]
[-6.55000186e+05 1.80050860e+02]
[-2.15000069e+06 7.42717771e+02]
[-3.35000194e+06 2.80673454e+03]
[-1.05000029e+06 2.77023382e+02]
[-8.95000424e+05 5.69655201e+02]
[-1.34900042e+06 4.34559451e+02]
[-1.30000080e+06 1.17136265e+03]
[-1.08500067e+06 9.86596109e+02]
[-1.34000061e+06 8.06301277e+02]
[-2.16500102e+06 1.35980990e+03]
[-9.65000281e+05 2.79790611e+02]
[-8.30000502e+05 7.34887680e+02]
[-9.50000939e+05 1.53369061e+03]
[-7.94000449e+05 6.43844390e+02]
[-6.50000277e+06 3.54478218e+03]
[-1.47500063e+06 8.10199798e+02]
[-2.05000134e+06 2.01038171e+03]
[-4.99000099e+06 5.72030316e+02]
[-5.00000149e+06 1.50876014e+03]
[-9.10000458e+05 6.30758537e+02]
[-7.50000105e+06 1.01425589e+01]
[-1.15000050e+06 6.46361844e+02]
[-1.80000080e+06 1.05204594e+03]
[-1.25000075e+06 1.09169962e+03]
[-1.27500062e+06 8.48533539e+02]
[-7.05000433e+05 6.36716502e+02]
[-8.72500461e+05 6.46505505e+02]
[-2.60000116e+06 1.52672146e+03]
[-1.10000037e+06 4.20693564e+02]
[-2.10000097e+06 1.28204694e+03]
[-1.19900043e+06 5.07553032e+02]
[-1.62500100e+06 1.47820308e+03]
[-2.71000073e+06 6.77793788e+02]
[-2.53200068e+06 6.26540030e+02]
[-3.21000058e+06 2.48462944e+02]
[-1.50000077e+06 1.06003254e+03]
[-1.90300029e+06 4.27993523e+01]
[-3.50005829e+04 1.09757852e+03]
[-7.20000459e+05 6.82816538e+02]
[-1.30000046e+06 5.39365034e+02]
[-3.80000279e+06 4.29876503e+03]
[-4.62100210e+06 2.76636720e+03]
[-7.70000562e+05 8.64489024e+02]
[-1.60000075e+06 1.00736747e+03]
[-4.00000168e+06 2.14540963e+03]
[-6.26000357e+05 5.12324978e+02]
[-1.85300048e+06 4.32126187e+02]
[-1.16600078e+06 1.17093655e+03]
[-3.35000373e+06 6.19278769e+03]
[-1.10000047e+06 6.07692175e+02]
[-4.00000113e+06 1.09340763e+03]
[-1.45700039e+06 3.57678809e+02]
[-1.77000065e+06 7.67838339e+02]
[-2.40000077e+06 8.32056547e+02]
[-7.45000014e+06 -1.69352958e+03]
[-1.30750030e+07 2.30405906e+03]
[-2.50000052e+06 3.30386011e+02]
[-4.15000071e+06 2.56414152e+02]
[-2.30000030e+06 -4.02878644e+01]
[-9.00000190e+06 1.23215198e+03]
[-3.50000192e+06 2.72376843e+03]
[-1.25000028e+07 2.09188138e+03]
[-1.30000050e+06 6.12362311e+02]
[-1.51000049e+06 5.29763976e+02]
[-1.67600019e+07 -8.26659911e+02]
[-1.20000016e+07 -2.08787587e+02]
[-2.98900122e+06 1.53084935e+03]
[-2.81000056e+06 3.29121021e+02]
[-2.52500165e+06 2.47022049e+03]
[-2.47500083e+06 9.21551970e+02]
[-1.15000049e+06 6.30358547e+02]
[-1.00900030e+06 2.96617831e+02]
[-7.45000014e+06 -1.69352958e+03]
[-2.40000062e+06 5.36044693e+02]
[-8.30000342e+05 4.29888151e+02]
[-1.41000037e+06 3.32431578e+02]
[-3.00000087e+06 8.70060097e+02]
[-3.50000112e+06 1.20173252e+03]
[-9.50000362e+06 4.37482971e+03]
[-4.15000071e+06 2.56414152e+02]
[-3.62500102e+06 9.90906119e+02]
[-2.50000064e+06 5.53384258e+02]
[-9.20000525e+05 7.55488182e+02]
[-2.50200099e+06 1.22234003e+03]
[-4.75000111e+06 8.56422746e+02]
[-3.40000034e+06 -2.55602169e+02]
[-2.90000037e+06 -6.72769005e+01]
[-5.00000262e+06 3.65976652e+03]
[-1.84500068e+06 7.96341161e+02]
[-1.23000053e+06 6.77231963e+02]
[-6.00000061e+06 -4.25890317e+02]
[-3.17500099e+06 1.04589681e+03]
[-9.65000382e+05 4.70812074e+02]
[-1.33000047e+06 5.49562287e+02]
[-1.00000013e+07 -1.24496790e+02]
[-9.50000182e+06 9.46827929e+02]
[-2.85000104e+06 1.21705736e+03]
[-2.72500105e+06 1.28288961e+03]
[-5.25000238e+06 3.13511444e+03]
[-2.01500055e+06 5.13808492e+02]
[-5.55000131e+06 1.02710488e+03]
[-4.65000138e+06 1.40110678e+03]
[-1.75000058e+06 6.38371312e+02]
[-1.57600071e+06 9.33028246e+02]
[-6.25000159e+06 1.36746268e+03]
[-9.49000623e+05 9.32216179e+02]
[-8.95000549e+05 8.07655211e+02]
[-1.25000055e+06 7.13700706e+02]
[-1.65000090e+06 1.27604208e+03]
[-3.19500116e+06 1.35937026e+03]
[-7.35000147e+06 8.50142342e+02]
[-1.34900050e+06 5.95558160e+02]
[-1.25000041e+06 4.41696443e+02]]
[159]: projected_data.shape
[159]: (439, 2)
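Two details of the output above are worth noting: the `[nan nan]` row indicates that `df` still contained missing values when it was projected, and the first column appears to be essentially the negated sale price, as expected when raw rather than standardized data is multiplied by the components. A minimal sketch of a projection on imputed, standardized data, using synthetic values and assumed names (`X_filled`, `Z`, `top_k`):

```python
import numpy as np

# Synthetic stand-in for the feature matrix, with one missing entry
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
X[3, 1] = np.nan

# Mean-impute the gap, then standardize column-wise
col_means = np.nanmean(X, axis=0)
X_filled = np.where(np.isnan(X), col_means, X)
Z = (X_filled - X_filled.mean(axis=0)) / X_filled.std(axis=0)

# Eigen decomposition of the covariance, sorted by descending eigenvalue
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigenvalues)[::-1]

k = 2
top_k = eigenvectors[:, order[:k]]  # shape (n_features, k)

# Project every row onto the k leading components
projected_data = Z @ top_k
print(projected_data.shape)
```

The same matrix `Z` is used both to estimate the components and to project, so the resulting scores are centered and free of NaN rows.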
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance Ratio')
plt.title('Scree Plot')
plt.grid(True)
plt.show()
2. Biplot
[180]:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[180], line 3
1 import pandas as pd
2 import matplotlib.pyplot as plt
----> 3 from prince import PCA
5 # Assuming X contains your standardized data
6 pca = PCA(n_components=2)
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages
(0.13.0)
Requirement already satisfied: altair<6.0.0,>=4.2.2 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
prince) (5.3.0)
Requirement already satisfied: pandas<3.0.0,>=1.4.1 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
prince) (2.1.3)
Requirement already satisfied: scikit-learn<2.0.0,>=1.0.2 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
prince) (1.4.2)
Requirement already satisfied: jinja2 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
altair<6.0.0,>=4.2.2->prince) (3.1.2)
Requirement already satisfied: packaging in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
altair<6.0.0,>=4.2.2->prince) (23.2)
Requirement already satisfied: jsonschema>=3.0 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
altair<6.0.0,>=4.2.2->prince) (4.20.0)
Requirement already satisfied: typing-extensions>=4.0.1 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
altair<6.0.0,>=4.2.2->prince) (4.7.1)
Requirement already satisfied: toolz in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
altair<6.0.0,>=4.2.2->prince) (0.12.1)
Requirement already satisfied: numpy in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
altair<6.0.0,>=4.2.2->prince) (1.26.2)
Requirement already satisfied: pytz>=2020.1 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
pandas<3.0.0,>=1.4.1->prince) (2023.3.post1)
Requirement already satisfied: python-dateutil>=2.8.2 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
pandas<3.0.0,>=1.4.1->prince) (2.8.2)
Requirement already satisfied: tzdata>=2022.1 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
pandas<3.0.0,>=1.4.1->prince) (2023.3)
Requirement already satisfied: scipy>=1.6.0 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
scikit-learn<2.0.0,>=1.0.2->prince) (1.12.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
scikit-learn<2.0.0,>=1.0.2->prince) (3.2.0)
Requirement already satisfied: joblib>=1.2.0 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
scikit-learn<2.0.0,>=1.0.2->prince) (1.3.2)
Requirement already satisfied: attrs>=22.2.0 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
jsonschema>=3.0->altair<6.0.0,>=4.2.2->prince) (23.1.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
jsonschema>=3.0->altair<6.0.0,>=4.2.2->prince) (2023.11.1)
Requirement already satisfied: referencing>=0.28.4 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
jsonschema>=3.0->altair<6.0.0,>=4.2.2->prince) (0.31.1)
Requirement already satisfied: rpds-py>=0.7.1 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
jsonschema>=3.0->altair<6.0.0,>=4.2.2->prince) (0.13.2)
Requirement already satisfied: six>=1.5 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
python-dateutil>=2.8.2->pandas<3.0.0,>=1.4.1->prince) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in
c:\users\lariy\appdata\local\programs\python\python310\lib\site-packages (from
jinja2->altair<6.0.0,>=4.2.2->prince) (2.1.3)
Note: you may need to restart the kernel to use updated packages.
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(df_scaled)
# Perform PCA
pca = PCA(n_components=2)
principal_components = pca.fit_transform(X_df)
# Impute missing values with mean (replace 'mean' with 'median' or
# 'most_frequent' if desired)
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(df_scaled)
# Extract loadings of each feature for the first two principal components
loadings = pca.components_.T[:, :2]
plt.show()
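The plotting lines of the biplot are not shown. A minimal sketch that avoids the `prince` dependency by using scikit-learn's `PCA` and plain matplotlib; the data is synthetic and the feature names are hypothetical. This is one common way to draw a biplot, not necessarily the one used in the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the imputed, standardized features
rng = np.random.default_rng(0)
feature_names = ["f1", "f2", "f3"]  # hypothetical names
X_imputed = rng.normal(size=(50, 3))

pca = PCA(n_components=2)
scores = pca.fit_transform(X_imputed)  # row coordinates on PC1 and PC2
loadings = pca.components_.T           # shape (n_features, 2)

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(scores[:, 0], scores[:, 1], s=10, alpha=0.5)

# One arrow per feature, pointing along its loading vector
for name, (lx, ly) in zip(feature_names, loadings):
    ax.arrow(0, 0, lx, ly, color="red", head_width=0.03)
    ax.text(lx * 1.1, ly * 1.1, name, color="red")

ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("Biplot")
```

Arrows that point in similar directions indicate features that load similarly on the first two components.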
[189]: import pandas as pd
import numpy as np
from prince import PCA
# Perform PCA
pca = PCA(n_components=2)
principal_components = pca.fit_transform(X_df)
3. Pairplot
[130]: import seaborn as sns
import pandas as pd
sns.pairplot(df)
plt.show()
# Impute missing values with mean (replace 'mean' with 'median' or
# 'most_frequent' if desired)
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)
[135]: df
[135]: bathrooms bedrooms finishedsqft lastsoldprice latitude longitude \
0 2.0 2 1463.0 1950000 37.795139 -122.425309
1 3.5 3 3291.0 4200000 37.794429 -122.428513
2 1.0 1 653.0 665000 37.792472 -122.425281
3 2.5 2 2272.0 2735000 37.794706 -122.426347
4 1.0 1 837.0 1050000 37.793212 -122.423744
.. … … … … … …
434 2.0 3 2145.0 1650000 37.795777 -122.433024
435 3.5 4 3042.0 3195000 37.795330 -122.436540
436 7.5 6 4721.0 7350000 37.795246 -122.437490
437 1.0 2 1306.0 1349000 37.796588 -122.429641
438 1.0 1 1100.0 1250000 37.795255 -122.432880
totalrooms
0 7
1 7
2 3
3 6
4 3
.. …
434 8
435 10
436 13
437 5
438 5
# Perform PCA
pca = PCA(n_components=2)
principal_components = pca.fit_transform(X_imputed)
# Create a DataFrame for the principal components
pc_df = pd.DataFrame(data=principal_components, columns=['PC1', 'PC2'])
# Impute missing values with mean (replace 'mean' with 'median' or
# 'most_frequent' if desired)
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(df_scaled)
# Assuming X contains your standardized data
pca = PCA(n_components=2)
principal_components = pca.fit_transform(X_imputed)
plt.xlabel('Principal Component')
plt.ylabel('Data Point')
plt.title('Heatmap of Principal Components')
plt.show()
Score plot
[152]: import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
imputed_data = imputer.fit_transform(df)
# Perform PCA
pca = PCA(n_components=2) # Reduce to 2 components for a 2D plot
principal_components = pca.fit_transform(scaled_data)
0.3 ADDITIONAL EXPLORATION
[148]: from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestRegressor
# Assuming you have already reduced your data dimensionality and stored it in
# X_reduced
# Perform cross-validation
scores = cross_val_score(model, df_scaled, y, cv=5,
                         scoring='neg_mean_squared_error')
Cross-validation Mean Squared Error: 195630660332.1332
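The definitions of `model` and `y` are not shown. A minimal sketch of the same evaluation, assuming a `RandomForestRegressor` and a continuous target; `LogisticRegression` is a classifier and would not fit a price target. The data is synthetic, and `X_reduced` and `y` are stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for the reduced features and the target
rng = np.random.default_rng(0)
X_reduced = rng.normal(size=(100, 2))
y = 3.0 * X_reduced[:, 0] + rng.normal(scale=0.1, size=100)

model = RandomForestRegressor(n_estimators=50, random_state=0)

# scikit-learn maximizes scores, so MSE is reported with a negative sign
scores = cross_val_score(model, X_reduced, y, cv=5,
                         scoring="neg_mean_squared_error")

mse = -scores.mean()
rmse = np.sqrt(mse)  # same units as the target
print("Cross-validation Mean Squared Error:", mse)
print("Cross-validation RMSE:", rmse)
```

Taking the square root puts the error back in the units of the target: the MSE of about 1.96e11 reported above corresponds to an RMSE of roughly 4.4e5. If the target is `lastsoldprice`, that column should be excluded from the features before scaling or PCA, otherwise the model sees the answer.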
[164]: df
437 1.0 2 1306.0 1349000 37.796588 -122.429641
438 1.0 1 1100.0 1250000 37.795255 -122.432880
totalrooms
0 7
1 7
2 3
3 6
4 3
.. …
434 8
435 10
436 13
437 5
438 5