BDA Assignment
• Plot a histogram for the variable ‘TotalBill’ to check which range has the highest
frequency.
• Draw a bar chart for the variable “Day”. Identify the category with the maximum
count.
• Demonstrate the data distributions using box plot, scatter plot, histogram, and bar chart
on the iris dataset.
• Demonstrate the correlation plot on the iris dataset and perform exploratory
visualization giving an
CODE:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the data (file names assumed; adjust the paths to your CSVs)
Data = pd.read_csv('tips.csv')   # contains 'TotalBill' and 'day'
iris = pd.read_csv('iris.csv')   # columns: sepallength, sepalwidth, petallength, petalwidth, Flowers
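The first bullet asks for a histogram of 'TotalBill'. A minimal sketch, using a small stand-in data frame so it runs on its own (in the assignment, Data would come from the bills CSV); the bin with the highest count gives the highest-frequency range:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the assignment's Data frame; replace with the real CSV
Data = pd.DataFrame({"TotalBill": [10.5, 14.2, 16.8, 21.0, 24.5, 12.3, 18.9, 30.1]})

plt.figure(figsize=(8, 6))
counts, bin_edges, _ = plt.hist(Data["TotalBill"], bins=5, edgecolor="black")
plt.title("Histogram of TotalBill")
plt.xlabel("TotalBill")
plt.ylabel("Frequency")
plt.show()

# The tallest bin is the range with the highest frequency
top = counts.argmax()
print(f"Highest-frequency range: {bin_edges[top]:.1f}-{bin_edges[top + 1]:.1f}")
```

With real data the bin count (`bins=5` here) can be tuned to make the dominant range clearer.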
# Task 4: Draw a bar chart for the variable "day". Identify the category with the maximum
count.
plt.figure(figsize=(8, 6))
sns.countplot(x='day', data=Data, hue='day', palette='viridis', legend=False)
plt.title('Bar Chart for Day')
plt.xlabel('Day')
plt.ylabel('Count')
plt.show()
# Task 6: Demonstrate data distributions using box, scatter plot, histogram, and bar chart
# Box plot
plt.figure(figsize=(12, 6))
sns.boxplot(data=iris[['sepallength', 'sepalwidth', 'petallength', 'petalwidth']])
plt.title('Box Plot of Iris Dataset')
plt.show()
# Scatter plot
sns.pairplot(iris)
plt.suptitle('Scatter Plot of Iris Dataset', y=1.02)  # pairplot creates its own grid, so use a figure-level title
plt.show()
# Histogram
plt.figure(figsize=(8, 6))
sns.histplot(data=iris[['sepallength', 'sepalwidth', 'petallength', 'petalwidth']], kde=True)
plt.title('Histogram of Iris Dataset')
plt.show()
# Bar chart
plt.figure(figsize=(8, 6))
sns.countplot(x='Flowers', data=iris, hue='Flowers', palette='Set2', legend=False)
plt.title('Bar Chart of Flowers in Iris Dataset')
plt.xlabel('Flowers')
plt.ylabel('Count')
plt.show()
# Covariance matrix (numeric columns only; 'Flowers' is categorical)
covariance_matrix = iris.cov(numeric_only=True)
print("Covariance Matrix:")
print(covariance_matrix)
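The last bullet also asks for a correlation plot on the iris dataset. A sketch, using scikit-learn's bundled iris data as a stand-in and renaming its columns to match the assignment's naming (sepallength, sepalwidth, petallength, petalwidth):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Stand-in for the assignment's iris data frame
bunch = load_iris(as_frame=True)
iris = bunch.data
iris.columns = ["sepallength", "sepalwidth", "petallength", "petalwidth"]

# Pairwise Pearson correlations, drawn as an annotated heatmap
corr = iris.corr()
plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation Plot of Iris Dataset")
plt.show()
```

The heatmap makes the strong positive relationship between petal length and petal width immediately visible.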
OUTPUT:
Covariance Matrix:
Question 2: Split the Iris dataset into two datasets - IrisTest_TrainData.csv and
IrisTest_TestData.csv.
• Read them as two separate data frames named Train_Data and Test_Data
respectively.
➢ What is the accuracy score of the K-Nearest Neighbor model (model_1) with 2/3
neighbors?
CODE:
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Read the two splits as separate data frames
Train_Data = pd.read_csv('IrisTest_TrainData.csv')
Test_Data = pd.read_csv('IrisTest_TestData.csv')

# 3. Train the K-Nearest Neighbor model (model_1) with 2/3 neighbors and calculate accuracy
features = ['sepallength', 'sepalwidth', 'petallength', 'petalwidth']
model_1 = KNeighborsClassifier(n_neighbors=2)  # adjust the number of neighbors as needed
model_1.fit(Train_Data[features], Train_Data['Flowers'])
predictions_model_1 = model_1.predict(Test_Data[features])
accuracy_model_1 = accuracy_score(Test_Data['Flowers'], predictions_model_1)
print(f"Accuracy score of model_1: {accuracy_model_1}")
# 5. Train the Logistic Regression model (model_2) and find its accuracy
model_2 = LogisticRegression(max_iter=200)  # raised iteration limit to help convergence
model_2.fit(Train_Data[features], Train_Data['Flowers'])
predictions_model_2 = model_2.predict(Test_Data[features])
accuracy_model_2 = accuracy_score(Test_Data['Flowers'], predictions_model_2)
print(f"Accuracy score of model_2: {accuracy_model_2}")
OUTPUT:
CODE:
import pandas as pd
import statsmodels.api as sm

# Load the admissions data (columns: admit, gre, gpa, rank); file name assumed
data = pd.read_csv('binary.csv')
print(data.head())

# Fit a logistic regression of admit on the remaining variables
X = sm.add_constant(data[['gre', 'gpa', 'rank']])
result = sm.Logit(data['admit'], X).fit()
print(result.summary())
print("Model Fit:")
print(result.prsquared)

# Make predictions
predictions = result.predict(X)
print("Predicted Probabilities:")
print(predictions)
OUTPUT:
   admit  gre   gpa  rank
0      0  380  3.61     3
1      1  660  3.67     3
2      1  800  4.00     1
3      1  640  3.19     4
4      0  520  2.93     4
Iterations 6
[Logit regression results summary table omitted]
Model Fit:
0.08107331276954477
Predicted Probabilities:
0 0.189553
1 0.317781
2 0.717814
3 0.148949
4 0.097954
...
395 0.490176
396 0.184989
397 0.186814
398 0.468108
399 0.325045
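The predicted values above are probabilities, not class labels; applying a threshold (commonly 0.5) converts them into admit/reject decisions. A sketch, using the first five predicted probabilities from the output as a stand-in series:

```python
import pandas as pd

# Stand-in for the predicted probabilities computed above
predictions = pd.Series([0.189553, 0.317781, 0.717814, 0.148949, 0.097954])

# Apply a 0.5 threshold to obtain class labels (1 = admit)
predicted_classes = (predictions >= 0.5).astype(int)
print(predicted_classes.tolist())  # → [0, 0, 1, 0, 0]
```

Only the third observation crosses the threshold, so it is the only predicted admission in this sample.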
CODE:
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
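The accuracy and classification report shown in the output can be produced alongside the confusion matrix. A self-contained sketch, using stand-in labels that reproduce the all-diagonal matrix from the output (in the assignment, y_test and y_pred come from the trained classifier):

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Stand-in labels: 10, 9, and 11 test samples of the three classes,
# all predicted correctly, matching the output below
y_test = [0] * 10 + [1] * 9 + [2] * 11
y_pred = list(y_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print("Classification Report:")
print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)
```

A perfect prediction puts every count on the diagonal; any off-diagonal entry would indicate a misclassified sample.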
OUTPUT:
Accuracy: 1.00

Classification Report:
    accuracy                           1.00        30

Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
Question 5: Demonstrate any of the Clustering model and evaluate the performance on
Iris dataset.
CODE:
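One way to answer Question 5 is K-Means clustering with k=3 (iris has three species), evaluated both internally (silhouette score) and against the true labels (adjusted Rand index). A sketch, using scikit-learn's bundled iris data as a stand-in for the assignment's CSV:

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Stand-in: load iris from scikit-learn; swap in the assignment's CSV if preferred
X, y = load_iris(return_X_y=True)

# K-Means with k=3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Internal evaluation: silhouette score (closer to 1 means tighter, better-separated clusters)
sil = silhouette_score(X, labels)
# External evaluation: agreement with the true species labels
ari = adjusted_rand_score(y, labels)
print(f"Silhouette score: {sil:.3f}")
print(f"Adjusted Rand Index: {ari:.3f}")
```

The silhouette score needs no ground truth, which is why it is the usual choice for real clustering tasks; the Rand index is only available here because iris happens to be labelled.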
OUTPUT: