Iml Practical Assignment
Semester: 5
Overview of Scikit-learn
Key Features
Example Workflow
1. Import Libraries:
2. Load data:
data = load_iris()
X = data.data
y = data.target
3. Split data:
4. Preprocess data:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
5. Train model:
model = SVC(kernel='linear')
model.fit(X_train_scaled, y_train)
6. Make predictions:
y_pred = model.predict(X_test_scaled)
7. Evaluate model:
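The seven steps above can be put together as one runnable script. This is a minimal sketch rather than the assignment's original listing: the 80/20 split, `random_state=42`, the stratified split, and the use of `accuracy_score` and `classification_report` for the evaluation step are assumptions filled in for illustration.

```python
# 1. Import libraries
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 2. Load data
data = load_iris()
X, y = data.data, data.target

# 3. Split data (assumed: 80% train, 20% test, fixed seed for repeatability)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 4. Preprocess: fit the scaler on the training set only, then reuse it
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 5. Train model
model = SVC(kernel="linear")
model.fit(X_train_scaled, y_train)

# 6. Make predictions
y_pred = model.predict(X_test_scaled)

# 7. Evaluate model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred, target_names=data.target_names))
```

Fitting the scaler on the training data alone keeps information about the test set out of the preprocessing step.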
Ease of Use: It has a clean and consistent API that makes it easy to use and integrate into
existing Python code.
Comprehensive Documentation: Excellent documentation and a large community make it
easier to find support and resources.
Efficiency: Scikit-learn is optimized for performance and can handle large datasets.
Integration: It integrates well with other Python libraries, making it a great choice for
building end-to-end machine learning solutions.
Scikit-learn is a powerful and user-friendly tool that caters to both beginners and experienced
practitioners in machine learning. Its extensive library of algorithms, preprocessing tools, and
model evaluation methods makes it a go-to choice for many machine learning tasks in Python.
2. Write a NumPy program to implement the following operations:
to convert a list of numeric values into a one-dimensional NumPy array
to create a 3x3 matrix with values ranging from 2 to 10
to append values at the end of an array
to create another shape from an array without changing its data (3x2 to 2x3)
import numpy as np
# 4. Create another shape from an array without changing its data (from 3x2 to 2x3)
# First, create a 3x2 array
original_array = np.arange(6).reshape((3, 2))
print("\nOriginal 3x2 array:")
print(original_array)
# Reshape it to 2x3
reshaped_array = original_array.reshape((2, 3))
print("\nReshaped 2x3 array:")
print(reshaped_array)
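The listing above covers only the fourth operation. The first three could be written along the following lines; this is a sketch in which the sample values and variable names are arbitrary choices, not taken from the assignment.

```python
import numpy as np

# 1. Convert a list of numeric values into a one-dimensional NumPy array
values = [12.23, 13.32, 100, 36.32]  # arbitrary sample values
one_d_array = np.array(values)
print("One-dimensional array:", one_d_array)

# 2. Create a 3x3 matrix with values ranging from 2 to 10
#    (np.arange excludes the stop value, so the stop is 11)
matrix_3x3 = np.arange(2, 11).reshape(3, 3)
print("\n3x3 matrix:")
print(matrix_3x3)

# 3. Append values at the end of an array
#    (np.append returns a new array; base_array itself is unchanged)
base_array = np.array([10, 20, 30])
appended_array = np.append(base_array, [40, 50, 60])
print("\nAfter appending:", appended_array)
```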
Here's how you can achieve a horizontal stack with equal-length arrays or view
them individually:
import numpy as np
# Optionally, pad the arrays to the same length and then stack them if needed
# (the arrays need a float dtype here, since NaN cannot be stored in an integer array)
max_length = max(len(array1), len(array2), len(array3))
array1_padded = np.pad(array1, (0, max_length - len(array1)), constant_values=np.nan)
array2_padded = np.pad(array2, (0, max_length - len(array2)), constant_values=np.nan)
array3_padded = np.pad(array3, (0, max_length - len(array3)), constant_values=np.nan)
# Print differences
print("\nDifference Between Neighboring Elements:")
1. Element-wise Operations:
o np.add, np.subtract, np.multiply, and np.divide perform element-wise
operations on the arrays array1 and array2.
2. Rounding Elements:
o np.round(array1) rounds each element of array1 to the nearest integer.
3. Mean Across Dimensions:
o np.mean(matrix, axis=1) calculates the mean of each row.
o np.mean(matrix, axis=0) calculates the mean of each column.
4. Difference Between Neighboring Elements:
o np.diff(array1) calculates the difference between each pair of neighboring
elements in array1.
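The four operations described above can be tried together in a short script. This is a sketch with arbitrary sample inputs; `array1`, `array2` and `matrix` follow the names used in the explanation, but their values are assumptions.

```python
import numpy as np

# Sample inputs (arbitrary values chosen for illustration)
array1 = np.array([1.5, 2.5, 3.7, 4.2])
array2 = np.array([0.5, 1.5, 2.0, 4.0])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# 1. Element-wise operations
added = np.add(array1, array2)
subtracted = np.subtract(array1, array2)
multiplied = np.multiply(array1, array2)
divided = np.divide(array1, array2)

# 2. Round each element to the nearest integer
rounded = np.round(array1)

# 3. Mean across dimensions
row_means = np.mean(matrix, axis=1)     # one value per row
column_means = np.mean(matrix, axis=0)  # one value per column

# 4. Difference between neighboring elements
differences = np.diff(array1)

print("Rounded:", rounded)
print("Row means:", row_means)
print("Column means:", column_means)
print("Differences:", differences)
```

Note that `np.round` sends values lying exactly halfway between two integers to the nearest even one, so both 1.5 and 2.5 become 2.0 here.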
# Sample data
flattened_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
# 2. Compute the mean, standard deviation, and variance of the 2D array along the second axis
mean_along_axis_1 = np.mean(matrix, axis=1)
std_dev_along_axis_1 = np.std(matrix, axis=1)
variance_along_axis_1 = np.var(matrix, axis=1)
In the code:
import numpy as np
import pandas as pd
# Sample data
numpy_array = np.array([10, 20, 30, 40, 50])
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)
# Take the first column of the DataFrame as a Series
first_column_series = df.iloc[:, 0]
# Print the Series created from the first column of the DataFrame
print("\nFirst column of DataFrame as a Pandas Series:")
print(first_column_series)
# 3. Compute the mean and standard deviation of the data of a given Series
mean_of_series = first_column_series.mean()
std_dev_of_series = first_column_series.std()
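One detail worth checking when comparing these results with the NumPy statistics computed earlier: `Series.std()` and `np.std()` use different defaults. The sketch below, with an arbitrary five-value sample, shows the difference and how the `ddof` argument reconciles the two.

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, 4, 5]
series = pd.Series(values)

# pandas divides by (n - 1) by default: the sample standard deviation
sample_std = series.std()

# NumPy divides by n by default: the population standard deviation
population_std = np.std(values)

# Either library can reproduce the other's result through ddof
sample_std_numpy = np.std(values, ddof=1)
population_std_pandas = series.std(ddof=0)

print("pandas default:", sample_std)
print("NumPy default: ", population_std)
```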
# Sample data
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
8. Write a Pandas program to create a line plot of the opening, closing stock prices of
given company between two specific dates.
To create a line plot of the opening and closing stock prices for a company between two
specific dates using Pandas and Matplotlib, you'll need to follow these steps:
For this example, let's assume you have stock price data in a CSV file with columns for Date,
Open, and Close. Here’s how you can implement this:
import pandas as pd
import matplotlib.pyplot as plt
# Plotting
plt.figure(figsize=(12, 6))
plt.plot(filtered_df.index, filtered_df['Open'], label='Opening Price', color='blue')
plt.plot(filtered_df.index, filtered_df['Close'], label='Closing Price', color='red')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
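The plotting calls above expect a `filtered_df` indexed by date. One way to build it is sketched below. The file name `stock_data.csv` and the start and end dates are hypothetical placeholders, and a small inline table with made-up prices stands in for the CSV so that the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up sample rows standing in for a real CSV file (hypothetical values)
csv_text = """Date,Open,Close
2020-04-01,100.0,101.5
2020-04-02,101.0,99.8
2020-04-03,99.5,102.2
2020-04-06,102.0,103.1
2020-04-07,103.5,102.7
"""

# With a real file this would be pd.read_csv("stock_data.csv", ...)
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
df = df.sort_index()

# Hypothetical date range; label slicing with .loc includes both endpoints
start_date = "2020-04-02"
end_date = "2020-04-06"
filtered_df = df.loc[start_date:end_date]

print(filtered_df)
```

Parsing the `Date` column and using it as the index is what makes the date-range slice possible and gives the plot a proper time axis.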
9. Write a Pandas program to create a plot of Open, High, Low, Close, Adjusted
Closing prices and Volume of given company between two specific dates.
To create a comprehensive plot of multiple stock prices (Open, High, Low, Close, Adjusted
Close) and Volume between two specific dates using Pandas and Matplotlib, follow these steps:
import pandas as pd
import matplotlib.pyplot as plt
# Plotting
fig, ax1 = plt.subplots(figsize=(14, 7))
# Plot Open, High, Low, Close, and Adjusted Close prices on the primary y-axis
ax1.plot(filtered_df.index, filtered_df['Open'], label='Open', color='blue')
ax1.plot(filtered_df.index, filtered_df['High'], label='High', color='green')
ax1.plot(filtered_df.index, filtered_df['Low'], label='Low', color='red')
ax1.plot(filtered_df.index, filtered_df['Close'], label='Close', color='orange')
ax1.plot(filtered_df.index, filtered_df['Adj Close'], label='Adjusted Close', color='purple')
# Adding grid
ax1.grid(True)
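Volume is usually several orders of magnitude larger than the prices, so it is commonly drawn on a secondary y-axis. The sketch below shows one way to do that with `Axes.twinx()`. The sample values are made up, only the Close column is plotted to keep it short, and the bar style and colors are assumptions rather than the assignment's original choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data (hypothetical values) indexed by date
filtered_df = pd.DataFrame(
    {
        "Close": [101.5, 99.8, 102.2, 103.1],
        "Volume": [120000, 98000, 143000, 110000],
    },
    index=pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03", "2020-04-06"]),
)

fig, ax1 = plt.subplots(figsize=(14, 7))

# Prices on the primary (left) y-axis
ax1.plot(filtered_df.index, filtered_df["Close"], label="Close", color="orange")
ax1.set_xlabel("Date")
ax1.set_ylabel("Price")

# Volume on a secondary (right) y-axis that shares the same x-axis
ax2 = ax1.twinx()
ax2.bar(filtered_df.index, filtered_df["Volume"], label="Volume", color="gray", alpha=0.3)
ax2.set_ylabel("Volume")

# Merge the legend entries of both axes into a single legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(handles1 + handles2, labels1 + labels2, loc="upper left")

ax1.grid(True)
fig.tight_layout()
# In an interactive session, drop the Agg line above and call plt.show() here
```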
import pandas as pd
# 2. Remove duplicates
# Dropping duplicate rows based on all columns
df_dropped_duplicates = df_dropped_missing.drop_duplicates()
Additional Notes:
Missing Values Handling: Sometimes, instead of dropping missing values, you might choose
to fill them with a specific value or use interpolation. For example, df.fillna(value=0)
can be used to replace missing values with 0.
Duplicates Handling: Removing duplicates helps ensure that the dataset is unique, which is
important for accurate analysis and model training.
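The two cleaning steps and the `fillna` alternative can be compared on a small table. This is a sketch with made-up sample data; the column names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data containing one missing value and one duplicated row
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Bob", "Charlie", "David"],
        "age": [25, 30, 30, np.nan, 40],
    }
)

# 1. Handle missing values: either drop the affected rows ...
df_dropped_missing = df.dropna()
# ... or keep them and fill the gaps with a chosen value instead
df_filled = df.fillna(value=0)

# 2. Remove duplicate rows (compared across all columns)
df_dropped_duplicates = df_dropped_missing.drop_duplicates()

print(df_dropped_duplicates)
```

Both `dropna` and `drop_duplicates` return new DataFrames, so the original `df` stays available for comparison.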
11. Write a Pandas program to filter all columns where all entries are present, check which
rows and columns have a NaN, and finally drop rows with any NaNs from the given data.
import pandas as pd
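A possible solution is sketched below on a small made-up DataFrame. The column names and values are assumptions; the pandas calls (`notna`, `isna`, `any`, `all`, `dropna`) are the standard ones for this kind of check.

```python
import numpy as np
import pandas as pd

# Made-up sample data with NaNs in two columns
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "score": [88.0, np.nan, 75.0, 92.0],
        "grade": ["B", "A", np.nan, "A"],
    }
)

# Columns in which every entry is present
complete_columns = df.loc[:, df.notna().all()]

# Which rows and which columns contain at least one NaN
rows_with_nan = df[df.isna().any(axis=1)]
columns_with_nan = df.columns[df.isna().any()].tolist()

# Drop every row that has any NaN
df_no_nan = df.dropna(how="any")

print("Complete columns:", complete_columns.columns.tolist())
print("Columns with NaN:", columns_with_nan)
print(df_no_nan)
```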
12. Write a Python program using Scikit-learn to print the keys, number of rows and
columns, feature names and the description of the given data.
from sklearn.datasets import load_iris  # You can replace this with any dataset you want to use
Scikit-learn datasets come with useful attributes like data, target, feature_names, and
DESCR, which together provide a comprehensive view of the dataset.
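These attributes can be printed as follows. This is a minimal sketch that assumes the Iris dataset is the "given data"; any other loader from `sklearn.datasets` that returns the same kind of object could be substituted.

```python
from sklearn.datasets import load_iris

dataset = load_iris()

# Keys of the Bunch object returned by the loader
print("Keys:", list(dataset.keys()))

# Number of rows and columns in the feature matrix
n_rows, n_columns = dataset.data.shape
print("Rows:", n_rows, "Columns:", n_columns)

# Feature names
print("Feature names:", dataset.feature_names)

# Description of the dataset
print(dataset.DESCR)
```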
Make sure to install Scikit-learn if it’s not already installed. You can do this via pip:
pip install scikit-learn
To implement the K-Nearest Neighbors (KNN) supervised machine learning algorithm using
Scikit-learn, you need to follow these steps:
1. Load and Prepare the Dataset: Load the dataset and split it into training and test sets.
2. Create and Train the KNN Model: Initialize the KNN model and train it on the
training data.
3. Evaluate the Model: Predict on the test set and evaluate the model’s performance.
Here's a complete Python program that demonstrates these steps using the Iris dataset as an example:
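A minimal sketch of the three steps, under stated assumptions: a 70/30 stratified split, `random_state=42`, and `n_neighbors=3` are illustrative choices rather than values given in the assignment.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Load and prepare the dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# 2. Create and train the KNN model (k = 3 is an assumed starting value)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# 3. Evaluate the model on the held-out test set
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Because KNN classifies by distance, the choice of `n_neighbors` and the scale of the features both affect the result, so it is worth trying a few values of k.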
Ensure you have Scikit-learn installed. You can install it using pip if necessary:
pip install scikit-learn
14. Write a Python program to implement a machine learning algorithm for given
dataset. (It is recommended to assign different machine learning algorithms group
wise – micro project)
To implement various machine learning algorithms on a given dataset, you should follow a
structured approach. For this example, I will demonstrate how to implement three common
machine learning algorithms using Scikit-learn on a dataset:
1. Logistic Regression
2. Decision Tree Classifier
3. Support Vector Machine (SVM)
We'll use the Iris dataset for this demonstration. This dataset is often used for classification
tasks and comes built-in with Scikit-learn.
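The comparison could look like the sketch below. The split ratio, the random seeds, `max_iter=1000` for logistic regression and the linear SVM kernel are assumptions made for illustration, and accuracy is used as the only metric to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Load the data and hold out 30% of it for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The three classifiers, all trained and scored the same way
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=42),
    "Support Vector Machine (SVM)": SVC(kernel="linear"),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {accuracies[name]:.3f}")
```

Keeping the models in a dictionary means every algorithm sees the same split, which makes the accuracies directly comparable.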
Additional Notes:
Hyperparameter Tuning: For each algorithm, you can perform hyperparameter tuning to
improve performance. This can be done using techniques like GridSearchCV.
Feature Scaling: For SVM, it's often beneficial to scale features using StandardScaler.
This step is omitted here for simplicity but is worth considering for real-world applications.
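Both notes can be combined in one short sketch: a `Pipeline` that scales the features before the SVM, tuned with `GridSearchCV`. The parameter grid, the five-fold cross-validation and the split settings are assumed values chosen to keep the search small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scaling lives inside the pipeline, so each cross-validation fold
# fits the scaler on its own training portion only
pipeline = Pipeline([("scaler", StandardScaler()), ("svm", SVC())])

# Assumed small grid; the "svm__" prefix routes each parameter to the SVC step
param_grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Test accuracy: {test_accuracy:.3f}")
```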