ML Lab Manual
Installation of Python
Type “Python download” in the Google search bar and press the Enter key. In the list of links shown, select
the
Choose the correct link for your device from the options provided: either Windows installer (64-bit) or
Once you have downloaded the installer, open the .exe file, such as python-3.12.3-amd64.exe, by
double-clicking it to launch the Python installer. Check the corresponding checkbox to install the
launcher for all users, so that every user of the computer can access the Python launcher.
To run Python from the command line, check the "Add python.exe to PATH" checkbox. After you click
the "Install Now" button, setup starts installing Python on your Windows system.
After Python is installed successfully, close the installation window. You can verify the installation
from either the command line or IDLE (Python's Integrated Development and Learning Environment), which
you may have installed. To access the command line, click on the Start menu and type “cmd” in the
search bar.
Then click on Command Prompt and type the command “python -V” or “python --version”. The installed
version of Python is displayed.
Alternatively, type IDLE in the Windows search bar; an entry such as “IDLE (Python 3.12 64-bit)”
appears. Open IDLE; the version is shown at the top of the IDLE shell window. This confirms that
Python was installed successfully.
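The same check can also be made from inside Python, for example at the IDLE prompt. The following is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import sys
import platform

# Version of the running interpreter as a string, e.g. "3.12.3"
version = platform.python_version()
print("Python version:", version)

# sys.version_info can be compared against a required minimum version
is_python3 = sys.version_info >= (3, 0)
print("Running Python 3:", is_python3)
```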
NumPy is an open-source Python library for efficient numerical operations on large quantities of
data. Several NumPy functions can be applied directly to pandas DataFrames, and pandas itself is built
on top of NumPy, so NumPy is required to use pandas.
NumPy is a Python package for numerical computation on single-dimensional and multidimensional
arrays. Calculations on NumPy arrays are typically much faster than the same calculations on ordinary
Python lists. NumPy can also handle large amounts of data and makes matrix multiplication and data
reshaping convenient. To install NumPy, run the following command at the command prompt:
pip install numpy
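To make the points about array arithmetic, reshaping, and matrix multiplication concrete, here is a small sketch. The array values and variable names are illustrative and not taken from the manual.

```python
import numpy as np

# A one-dimensional array of six elements: [0 1 2 3 4 5]
a = np.arange(6)

# Reshape the same data into a 2 x 3 matrix: [[0 1 2], [3 4 5]]
m = a.reshape(2, 3)

# Element-wise arithmetic applies to the whole array at once
doubled = m * 2

# Matrix multiplication: (2 x 3) @ (3 x 2) gives a 2 x 2 result
product = m @ m.T

print(m.shape)      # (2, 3)
print(doubled)
print(product)      # [[ 5 14]
                    #  [14 50]]
```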
Pandas is a very popular library for working with data (its stated goal is to be the most powerful and
flexible open-source data analysis tool, and in our opinion it has reached that goal). DataFrames are
at the center of pandas. A DataFrame is structured like a table or spreadsheet. Both the rows and the
columns have indexes, and you can perform operations on rows or columns separately. Pandas supports
the five main steps of data processing and analysis, irrespective of the origin of the data: load,
manipulate, prepare, model, and analyze.
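A short sketch of a DataFrame with row and column indexes, and of operating on rows and columns separately. The table contents (names, marks, ages) are hypothetical values chosen for illustration.

```python
import pandas as pd

# A small table: each dictionary key becomes a column
df = pd.DataFrame(
    {"name": ["A", "B", "C"], "marks": [72, 85, 90], "age": [20, 21, 19]},
    index=["s1", "s2", "s3"],   # row labels
)

# Column operation: mean of one column
mean_marks = df["marks"].mean()

# Row operation: select one row by its label
row_s2 = df.loc["s2"]

# Filter rows using a condition on a column
top = df[df["marks"] > 80]

print(mean_marks)
print(row_s2)
print(list(top.index))     # ['s2', 's3']
```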
To install pandas, run the following command at the command prompt:
pip install pandas
Upon successful installation, the installed version can be verified using the program below.
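A minimal sketch of such a version check, assuming pandas was installed into the same Python that runs the script. The variable name is illustrative.

```python
import pandas as pd

# __version__ holds the installed pandas release as a string
pandas_version = pd.__version__
print("pandas version:", pandas_version)
```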
Output:
Scikit-learn is a widely-used Python library that provides a comprehensive suite of tools and
functionalities for machine learning tasks.
1. Versatility: Scikit-learn offers a wide range of tools and functionalities for various machine learning
tasks, including but not limited to:
• Supervised Learning: Classification, Regression. Almost all the popular supervised learning
algorithms, such as Linear Regression, Support Vector Machine (SVM), and Decision Tree, are part of
scikit-learn.
• Unsupervised Learning: Clustering, Dimensionality Reduction. It supports popular unsupervised
learning algorithms, from clustering, factor analysis, and PCA (Principal Component Analysis) to
unsupervised neural networks.
• Model Selection and Evaluation: Cross-validation, Hyperparameter Tuning
2. Consistent Interface: It provides a consistent and user-friendly API, making it easy to experiment with
different algorithms and techniques without needing to learn new syntax for each.
3. Integration with Other Libraries: Scikit-learn seamlessly integrates with other Python libraries like
NumPy, pandas, and Matplotlib, allowing smooth data manipulation, preprocessing, and visualization.
4. Ease of Learning: Its well-documented and straightforward interface makes it suitable for both
beginners and experienced machine learning practitioners. It's often recommended for educational
purposes due to its simplicity.
5. Performance and Scalability: While focusing on simplicity, scikit-learn also emphasizes performance.
It's optimized for efficiency and scalability, making it suitable for handling large datasets and complex
models.
7. Application in Industry and Academia: Scikit-learn's robustness and ease of use have made it a go-to
choice in various domains, including finance, healthcare, natural language processing, and more. It's
widely used in research and production environments.
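The consistent interface and the model-evaluation tools listed above can be illustrated with a short sketch. It assumes the bundled Iris dataset and two arbitrarily chosen classifiers; the hyperparameters and variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Iris is bundled with scikit-learn, so no download is needed
X, y = load_iris(return_X_y=True)

# Two different algorithms, driven through the same interface
models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validation returns one accuracy value per fold
    fold_scores = cross_val_score(model, X, y, cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name}: mean accuracy = {scores[name]:.3f}")
```

Swapping in another estimator only requires changing the entries of the dictionary; the evaluation loop stays the same.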
To access the command line, click on the Start menu and type “cmd” in the search bar. To install the
scikit-learn module, type the command below:
pip install scikit-learn
The built-in dir() function can be used to list the names (submodules and functions) that the sklearn
package exposes.
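A minimal sketch of using dir() on the sklearn package. The exact list printed depends on the installed scikit-learn version, so no output is shown here.

```python
import sklearn

# dir() returns the names exposed by the package as a list of strings
names = dir(sklearn)

# Keep only the public names (those not starting with an underscore)
public_names = [n for n in names if not n.startswith("_")]
print(public_names)
```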
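Example: linear_example.py
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# Sample dataset (illustrative values following y = 2x + 1)
X = np.arange(10).reshape(-1, 1)
y = 2 * X.ravel() + 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit the model on the training data and predict the held-out samples
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
# Print predictions
print("Predictions:", predictions)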
Output:
Scikit-learn (sklearn) is one of the most widely used and robust libraries for machine learning in
Python. It provides a selection of efficient tools for machine learning and statistical modeling,
including classification, regression, clustering, and dimensionality reduction, via a consistent
interface in Python.
This library, which is largely written in Python, is built upon NumPy and SciPy and works well with
pandas and Matplotlib.
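Pip upgrade:
python -m pip install --upgrade pip
Using pip:
a) Install NumPy:
pip install numpy
b) Install pandas:
pip install pandas
c) Install matplotlib:
pip install matplotlib
d) Install scipy:
pip install scipy
e) Install scikit-learn (sklearn):
pip install scikit-learn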
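Verify installations:
Open IDLE and run:
import numpy
import pandas
import scipy
import sklearn
import matplotlib
# Print each library's version followed by a confirmation message
libraries = [("Numpy", numpy), ("Pandas", pandas), ("scipy", scipy),
             ("sklearn", sklearn), ("matplotlib", matplotlib)]
for name, module in libraries:
    print(f"{name} version:")
    print(module.__version__)
    print(f"{name} library is successfully installed")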
Output:
Numpy version:
1.26.3
Numpy library is successfully installed
Pandas version:
2.2.0
Pandas library is successfully installed
scipy version:
1.12.0
scipy library is successfully installed
sklearn version:
1.4.0
sklearn library is successfully installed
matplotlib version:
3.8.2
matplotlib library is successfully installed
4. Write a program to load and explore a dataset from .csv and Excel files using pandas.
import pandas as pd

def explore_dataset(file_path):
    # Pick the reader that matches the file extension
    if file_path.endswith('.csv'):
        df = pd.read_csv(file_path)
    elif file_path.endswith('.xlsx'):
        df = pd.read_excel(file_path)  # needs the openpyxl package
    else:
        print("Unsupported file format.")
        return
    print("Dataset information:")
    print(df.info())  # info() prints its report and returns None
    print("\nFirst few rows of the data are:")
    print(df.head())
    print("Summary statistics:")
    print(df.describe())
    print("Unique values are: ")
    # Only text (object) columns are listed
    for column in df.select_dtypes(include='object').columns:
        print(f"{column}: {df[column].unique()}")

file_path = "C:/Prabha/BCA/Machine Learning/ML_LAB/iris.csv"
explore_dataset(file_path)
Output:
Dataset information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Id             150 non-null    int64
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
None
Summary statistics:
               Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  150.000000     150.000000    150.000000     150.000000    150.000000
mean    75.500000       5.843333      3.054000       3.758667      1.198667
std     43.445368       0.828066      0.433594       1.764420      0.763161
min      1.000000       4.300000      2.000000       1.000000      0.100000
25%     38.250000       5.100000      2.800000       1.600000      0.300000
50%     75.500000       5.800000      3.000000       4.350000      1.300000
75%    112.750000       6.400000      3.300000       5.100000      1.800000
max    150.000000       7.900000      4.400000       6.900000      2.500000
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def visualize_dataset(file_path):
    df = pd.read_csv(file_path)
    # Pairwise scatter plots of the numeric columns
    grid = sns.pairplot(df)
    grid.figure.suptitle("Pairplot of the Dataset", y=1.02)
    plt.show()
    # Bar chart of the first categorical (text) column, if there is one
    categorical = df.select_dtypes(include='object').columns
    if len(categorical) > 0:
        column = categorical[0]
        sns.countplot(x=column, data=df)
        plt.title("Bar chart of categorical column")
        plt.xlabel(column)
        plt.ylabel("count")
        plt.show()
    else:
        print("No categorical column found to plot bar chart")

file_path = "C:/Prabha/BCA/Machine Learning/ML_LAB/iris.csv"
visualize_dataset(file_path)
Output:
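6. Write a program to handle missing data, encode categorical variables, and perform feature scaling.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def preprocess_dataset(df):
    # Introduce missing values in every 10th row of the first column
    df.iloc[::10, 0] = float('NaN')
    # Replace missing values with the column mean
    imputer = SimpleImputer(strategy='mean')
    df[df.columns] = imputer.fit_transform(df[df.columns])
    # Standardize every column except the last one (the target)
    scaler = StandardScaler()
    df[df.columns[:-1]] = scaler.fit_transform(df[df.columns[:-1]])
    return df

# Build the DataFrame from the bundled Iris data (features plus numeric target)
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target
preprocessed_df = preprocess_dataset(iris_df)
print("Preprocessed dataset:")
print(preprocessed_df.head())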
Output:
7. Write a program to implement a k-Nearest Neighbours (k-NN) classifier using scikit-learn, train
the classifier on the dataset, and evaluate its performance.
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
Output:
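Only the import lines of this program appear above. A minimal sketch of how the remaining steps might look, assuming the bundled Iris dataset, an 80/20 train-test split, and k = 3 (all illustrative choices, not taken from the manual):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the bundled Iris dataset as feature matrix X and label vector y
X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples for testing (the split ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train a k-NN classifier; k = 3 is an illustrative choice
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Evaluate on the held-out samples
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
```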
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
Output:
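The imports above point to a linear regression exercise on the California housing data. A minimal sketch of such a program follows. Because fetch_california_housing downloads its data on first use, this sketch substitutes the bundled load_diabetes dataset so that it runs offline. The dataset swap, the 80/20 split, and the variable names are assumptions.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Bundled regression dataset, used here as an offline stand-in
X, y = load_diabetes(return_X_y=True)

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit ordinary least-squares linear regression on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate with mean squared error and the R^2 score
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean squared error:", mse)
print("R^2 score:", r2)
```

To run it on the housing data instead, replace the load_diabetes call with fetch_california_housing(return_X_y=True); the rest of the code is unchanged.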
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt
Output:
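The imports above suggest a decision tree classifier on the Iris dataset. A minimal sketch under that assumption is given below. It prints the learned rules with export_text instead of drawing them with plot_tree, so that it runs without a display; the depth limit and split ratio are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Limit the depth to keep the tree small and readable (illustrative setting)
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

# Fraction of held-out samples classified correctly
accuracy = tree.score(X_test, y_test)
print("Test accuracy:", accuracy)

# Text rendering of the learned decision rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

For a graphical view, plot_tree(tree, feature_names=iris.feature_names, filled=True) followed by plt.show() draws the same tree with Matplotlib.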
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
# Generate sample data
X, y = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)
Output:
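The listing above stops after generating the sample data. A minimal sketch of the clustering step, assuming k-means with four clusters (matching the four generated blobs); the n_init and random_state values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Same synthetic data as in the listing above
X, y = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)

# Fit k-means with as many clusters as blobs were generated
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster centres:")
print(kmeans.cluster_centers_)
print("Samples per cluster:", np.bincount(labels))
print("Inertia:", kmeans.inertia_)
```

A scatter plot of X coloured by labels, with the cluster centres marked, can then be drawn with Matplotlib's plt.scatter.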