ml record
Lab Manual
Course Objective:
The objective of this lab is to give an overview of the various machine
learning techniques and to demonstrate them using Python.
Course Outcomes:
Understand modern notions in predictive data analysis
Select data and models, assess model complexity, and identify trends
Understand a range of machine learning algorithms along with their strengths and weaknesses
Build predictive models from data and analyze their performance
List of Experiments
1. Write a Python program to compute Central Tendency Measures (Mean, Median, Mode)
and Measures of Dispersion (Variance, Standard Deviation)
2. Study of Python Basic Libraries such as Statistics, Math, Numpy and Scipy
3. Study of Python Libraries for ML application such as Pandas and Matplotlib
4. Write a Python program to implement Simple Linear Regression
5. Implementation of Multiple Linear Regression for House Price Prediction
using sklearn
6. Implementation of Decision tree using sklearn and its parameter tuning
7. Implementation of KNN using sklearn
8. Implementation of Logistic Regression using sklearn
9. Implementation of K-Means Clustering
10. Performance analysis of Classification Algorithms on a specific dataset (Mini
Project)
1. Write a Python program to compute Central Tendency Measures (Mean, Median, Mode)
and Measures of Dispersion (Variance, Standard Deviation)
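import statistics as stats

def compute_statistics(data):
    # Guard against an empty list, for which none of the statistics are defined
    if not data:
        print("The data list is empty.")
        return
    # Measures of Central Tendency
    mean = stats.mean(data)
    median = stats.median(data)
    try:
        mode = stats.mode(data)
    except stats.StatisticsError:
        mode = "No unique mode found"
    # Measures of Dispersion (sample statistics need at least two data points)
    variance = stats.variance(data) if len(data) > 1 else "Variance requires at least two data points"
    std_dev = stats.stdev(data) if len(data) > 1 else "Standard deviation requires at least two data points"
    # Display Results
    print(f"Mean: {mean}")
    print(f"Median: {median}")
    print(f"Mode: {mode}")
    print(f"Variance: {variance}")
    print(f"Standard Deviation: {std_dev}")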
# Example usage
OUTPUT
Mean: 28.333333333333332
Median: 25.0
Mode: 20
Variance: 266.6666666666667
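The variance printed by statistics.variance is the sample variance, which divides the sum of squared deviations by n - 1 rather than n. A minimal sketch of that calculation is given here, checked against the library. The list `sample` and the helper `sample_variance` are hypothetical, chosen only for illustration, and are not the data used for the output above.

```python
import math
import statistics as stats


def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two data points")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical data, used only to illustrate the formula
sample = [2, 4, 4, 4, 5, 5, 7, 9]

manual_var = sample_variance(sample)
manual_std = math.sqrt(manual_var)

print("Manual variance:    ", manual_var)
print("statistics.variance:", stats.variance(sample))
print("Manual std dev:     ", manual_std)
print("statistics.stdev:   ", stats.stdev(sample))
```

The manual and library values should agree up to floating-point rounding. Dividing by n instead would give the population variance (statistics.pvariance).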
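2. Study of Python Basic Libraries such as Statistics, Math, Numpy and Scipy
Statistics:
import statistics as stats
data = [10, 20, 30]
print(stats.mean(data)) # 20
print(stats.stdev(data)) # 10.0 (sample standard deviation)
print(stats.pstdev(data)) # 8.16 (population standard deviation)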
Math:
import math
print(math.sqrt(16)) # 4.0
print(math.pi) # 3.141592653589793
Numpy:
import numpy as np
arr = np.array([1, 2, 3, 4])
print(np.mean(arr)) # 2.5
print(np.std(arr)) # 1.118033988749895 (population standard deviation)
Scipy:
from scipy import stats
data = [10, 20, 30] # sample data assumed for illustration
print(stats.zscore(data)) # Z-scores
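A z-score expresses each value as its distance from the mean in units of standard deviation, z = (x - mean) / std. The sketch below reproduces that calculation with the standard library only, using the population standard deviation, which is also the default for scipy.stats.zscore (ddof=0). The helper `z_scores` and the list `values` are hypothetical names introduced for illustration.

```python
import statistics


def z_scores(values):
    """Standardize values as (x - mean) / population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mean) / std for x in values]


# Hypothetical data, used only for illustration
values = [10, 20, 30]
scores = z_scores(values)
print(scores)
```

By construction, the standardized values have mean 0 and population standard deviation 1.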
3. Study of Python Libraries for ML application such as Pandas and Matplotlib
Pandas:
import pandas as pd
# Load data
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'Score': [85, 90, 95]}
df = pd.DataFrame(data)
# Basic operations
print(df)
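Matplotlib:
import matplotlib.pyplot as plt
# Example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
# Line plot; the label is the text that plt.legend() displays
plt.plot(x, y, label="y = 2x")
plt.title("Example Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()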
4. Write a Python program to implement Simple Linear Regression
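import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split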
# Sample dataset
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Display results
print("Intercept:", model.intercept_)
print("Slope:", model.coef_[0])
plt.legend()
plt.show()
OUTPUT:
Evaluation Metrics:
R² Score: Indicates how well the model explains the variability of the data.
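To make the R² score concrete, the sketch below fits a least-squares line by hand and computes R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y from its mean. The data points and the helper names `fit_line` and `r2_score_manual` are hypothetical. This is an illustration of the formula, not the dataset or code used in the experiment.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical, nearly linear data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
r2 = r2_score_manual(ys, predictions)

print("Slope:", slope)
print("Intercept:", intercept)
print("R2 score:", r2)
```

An R² close to 1 means the line accounts for almost all of the variation in y. A value near 0 means the line does no better than always predicting the mean.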
5. Implementation of Multiple Linear Regression for House Price Prediction using sklearn
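import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split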
data = {
# Create a DataFrame
df = pd.DataFrame(data)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Intercept:", model.intercept_)
predicted_price = model.predict(new_house)
OUTPUT:
Intercept: 250000.0
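6. Implementation of Decision tree using sklearn and its parameter tuning
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.model_selection import train_test_split, GridSearchCV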
data = {
# Create a DataFrame
df = pd.DataFrame(data)
regressor = DecisionTreeRegressor(random_state=42)
regressor.fit(X_train, y_train)
y_pred_reg = regressor.predict(X_test)
classifier = DecisionTreeClassifier(random_state=42)
classifier.fit(X_train_class, y_train_class)
y_pred_class = classifier.predict(X_test_class)
param_grid = {
"min_samples_leaf": [1, 2, 4]
grid_search_reg.fit(X, y)
grid_search_class.fit(X, y_class)
OUTPUT:
Decision Tree
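Parameter tuning with a grid search can be sketched as follows. Because the dataset for this experiment is not shown, the sketch substitutes the Iris dataset bundled with scikit-learn. The "min_samples_leaf" values come from the listing above, while the "max_depth" and "min_samples_split" values, the 5-fold setting, and the 80/20 split are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris is a stand-in dataset (assumption); the manual's own data is not shown
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

param_grid = {
    "max_depth": [2, 3, 4, None],     # assumed candidate values
    "min_samples_split": [2, 5, 10],  # assumed candidate values
    "min_samples_leaf": [1, 2, 4],    # values from the listing above
}

# Try every parameter combination with 5-fold cross-validation on the training set
grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
test_accuracy = grid_search.score(X_test, y_test)

print("Best parameters:", best_params)
print("Best cross-validation accuracy:", grid_search.best_score_)
print("Test accuracy:", test_accuracy)
```

Keeping the test set out of the grid search means the final accuracy is measured on data that played no part in choosing the parameters.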
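7. Implementation of KNN using sklearn
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error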
# Sample dataset
data = {
"Target_Value": [1.1, 2.0, 3.0, 4.1, 5.2, 6.3, 7.4, 8.5] # Regression values
# Create a DataFrame
df = pd.DataFrame(data)
X = df[["Feature1", "Feature2"]]
knn_classifier.fit(X_train_class, y_train_class)
y_pred_class = knn_classifier.predict(X_test_class)
print("KNN Classifier")
knn_regressor.fit(X_train_reg, y_train_reg)
y_pred_reg = knn_regressor.predict(X_test_reg)
print("\nKNN Regressor")
# Classification
predicted_class = knn_classifier.predict(new_sample_class)
# Regression
predicted_value = knn_regressor.predict(new_sample_reg)
OUTPUT
KNN Classifier
Accuracy: 1.0
KNN Regressor
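For intuition about what the KNN classifier does internally, here is a minimal standard-library sketch: measure the distance from the query to every training point, keep the k nearest, and take a majority vote. The points, labels, and the helper `knn_predict` are hypothetical. sklearn's KNeighborsClassifier is more capable and uses optimized neighbor searches.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature data forming two well-separated groups
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = [0, 0, 0, 1, 1, 1]

pred_low = knn_predict(train_points, train_labels, (1.5, 1.5))
pred_high = knn_predict(train_points, train_labels, (6.5, 6.5))

print("Prediction for (1.5, 1.5):", pred_low)
print("Prediction for (6.5, 6.5):", pred_high)
```

A KNN regressor follows the same steps but averages the target values of the k neighbors instead of voting.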
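8. Implementation of Logistic Regression using sklearn
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report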
# Sample dataset
data = {
# Create a DataFrame
df = pd.DataFrame(data)
# Features and Target
y = df["Target"] # Target (0 or 1)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_pred_proba = model.predict_proba(X_test)
# Evaluation Metrics
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Example Prediction
predicted_class = model.predict(new_sample)
predicted_probabilities = model.predict_proba(new_sample)
OUTPUT
Accuracy: 1.0
Confusion Matrix:
[[1 0]
[0 1]]
Classification Report:
accuracy 1.00 2
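The confusion matrix above can be read as follows: rows are the true classes, columns are the predicted classes, and the diagonal counts the correct predictions. The standard-library sketch below builds such a matrix and derives accuracy from it. The labels and the helper names are hypothetical, chosen so that the matrix contains some errors.

```python
def confusion_matrix_manual(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels containing two mistakes
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

matrix = confusion_matrix_manual(y_true, y_pred, labels=[0, 1])
accuracy = accuracy_from_matrix(matrix)

for row in matrix:
    print(row)
print("Accuracy:", accuracy)
```

With two test samples that are both classified correctly, the same function returns [[1, 0], [0, 1]], matching the matrix shown in the output above.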
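9. Implementation of K-Means Clustering
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans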
y_kmeans = kmeans.fit_predict(X)
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.show()
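# Elbow method: fit K-Means for each candidate k and record the inertia
inertia = []
for k in k_range:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X)
    inertia.append(kmeans.inertia_)
plt.plot(k_range, inertia, marker="o")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Inertia")
plt.show()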
OUTPUT
Clustering Visualization
A scatter plot shows the data points in their respective clusters, with red "X" marks
for centroids.
Elbow Curve
A plot showing the inertia against the number of clusters helps determine the
optimal number of clusters.
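The elbow idea can be checked numerically without a plot. The sketch below generates synthetic data with three groups using make_blobs (an assumption; the experiment's own data is not shown) and prints the inertia for k = 1 to 6. The range of k, n_init=10, and the random seed are also illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (assumed for illustration)
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.8, random_state=42)

k_range = range(1, 7)
inertia = []
for k in k_range:
    # A new model is needed for each k; inertia_ is the within-cluster sum of squares
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)
    inertia.append(kmeans.inertia_)

for k, value in zip(k_range, inertia):
    print(f"k={k}: inertia={value:.1f}")
```

Inertia drops sharply up to the true number of groups and only slowly after it. The bend in that curve is the "elbow" used to pick k.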
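10. Performance analysis of Classification Algorithms on a specific dataset (Mini Project)
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report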
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
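y = pd.Series(data.target, name="Target")
# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)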
models = {
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
}
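results = []
for name, model in models.items():
    model.fit(X_train, y_train)
    # Predictions
    y_pred = model.predict(X_test)
    # Evaluation Metrics
    acc = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)
    report = classification_report(y_test, y_pred, output_dict=True)
    # Cross-validation
    cv_scores = cross_val_score(model, X, y, cv=5)
    cv_mean = np.mean(cv_scores)
    # Append results
    results.append({
        "Model": name,
        "Accuracy": acc,
        "Confusion Matrix": cm,
        "Classification Report": report,
        "CV Mean Accuracy": cv_mean,
    })
# Display the metrics collected for each model
for result in results:
    print(f"Model: {result['Model']}")
    print(f"Accuracy: {result['Accuracy']:.2f}")
    print("Confusion Matrix:")
    print(result["Confusion Matrix"])
    print("Classification Report:")
    print(pd.DataFrame(result["Classification Report"]).transpose())
    print("-" * 40)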
OUTPUT
Accuracy: 0.97
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 1 10]]
Classification Report:
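A single train/test split can flatter or penalize a model by chance, so the cross-validated mean accuracy is a useful second opinion when comparing classifiers. The sketch below is a compact, self-contained comparison on the Iris dataset. The two models, their default settings, and the 5-fold choice are assumptions for illustration, not necessarily the configuration used in the mini project.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative model set with default hyperparameters (assumption)
models = {
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

cv_means = {}
for name, model in models.items():
    # cross_val_score refits a clone of the model on each of the 5 folds
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean CV accuracy = {cv_means[name]:.3f}")
```

Comparing these means alongside the per-split accuracy and confusion matrices gives a steadier picture of which classifier generalizes better.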