IML Practical Assignment
Semester: 5
Overview of Scikit-learn
Example Workflow
1. Import libraries:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
2. Load data:
data = load_iris()
X = data.data
y = data.target
3. Split data:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
4. Preprocess data:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
5. Train model:
model = SVC(kernel='linear')
model.fit(X_train_scaled, y_train)
6. Make predictions:
y_pred = model.predict(X_test_scaled)
7. Evaluate model:
print("Accuracy:", accuracy_score(y_test, y_pred))
Key Features
Ease of Use: A clean and consistent API makes it easy to use and integrate into existing Python code.
Comprehensive Documentation: Excellent documentation and a large community make it easier to find support and resources.
Efficiency: Scikit-learn is optimized for performance and can handle large datasets effectively.
Integration: It integrates well with other Python libraries, making it a great choice for building end-to-end machine learning solutions.
Conclusion
Scikit-learn is a powerful and user-friendly tool that caters to both beginners and experienced
practitioners in machine learning. Its extensive library of algorithms, preprocessing tools, and
model evaluation methods make it a go-to choice for many machine learning tasks in Python.
2. Write a NumPy program to implement the following operations:
to convert a list of numeric values into a one-dimensional NumPy array
to create a 3x3 matrix with values ranging from 2 to 10
to append values at the end of an array
to create another shape from an array without changing its data (3x2 to 2x3)
import numpy as np
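# Operations 1-3, a minimal sketch with illustrative sample values:
# 1. Convert a list of numeric values into a one-dimensional NumPy array
numeric_list = [12.23, 13.32, 100, 36.32]
one_d_array = np.array(numeric_list)
print("One-dimensional NumPy array:")
print(one_d_array)
# 2. Create a 3x3 matrix with values ranging from 2 to 10
matrix_2_to_10 = np.arange(2, 11).reshape((3, 3))
print("\n3x3 matrix with values from 2 to 10:")
print(matrix_2_to_10)
# 3. Append values at the end of an array
base_array = np.array([10, 20, 30])
appended_array = np.append(base_array, [40, 50, 60])
print("\nArray after appending values:")
print(appended_array)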
# 4. Create another shape from an array without changing its data (from 3x2 to 2x3)
# First, create a 3x2 array
original_array = np.arange(6).reshape((3, 2))
print("\nOriginal 3x2 array:")
print(original_array)
# Reshape it to 2x3
reshaped_array = original_array.reshape((2, 3))
print("\nReshaped 2x3 array:")
print(reshaped_array)
Explanation:
Here's how you can achieve a horizontal stack with equal-length arrays or view
them individually:
import numpy as np
# Sample arrays of different lengths (float dtype so NaN padding is valid)
array1 = np.array([1.0, 2.0, 3.0])
array2 = np.array([4.0, 5.0])
array3 = np.array([6.0, 7.0, 8.0, 9.0])
# Pad the shorter arrays with NaN so all three have the same length, then stack
max_length = max(len(array1), len(array2), len(array3))
array1_padded = np.pad(array1, (0, max_length - len(array1)), constant_values=np.nan)
array2_padded = np.pad(array2, (0, max_length - len(array2)), constant_values=np.nan)
array3_padded = np.pad(array3, (0, max_length - len(array3)), constant_values=np.nan)
print(np.hstack((array1_padded, array2_padded, array3_padded)))
# Compute and print the difference between neighboring elements
difference_array = np.diff(array1)
print("\nDifference Between Neighboring Elements:")
print(difference_array)
Explanation:
1. Element-wise Operations:
o np.add, np.subtract, np.multiply, and np.divide perform element-wise operations on the arrays array1 and array2.
2. Rounding Elements:
o np.round(array1) rounds each element of array1 to the nearest integer.
3. Mean Across Dimensions:
o np.mean(matrix, axis=1) calculates the mean of each row.
o np.mean(matrix, axis=0) calculates the mean of each column.
4. Difference Between Neighboring Elements:
o np.diff(array1) calculates the difference between each pair of neighboring elements in array1.
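For reference, a minimal sketch that exercises each of the calls above on small illustrative sample arrays:
import numpy as np
array1 = np.array([1.5, 2.5, 3.5, 4.5])
array2 = np.array([0.5, 1.0, 1.5, 2.0])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
# 1. Element-wise operations
print(np.add(array1, array2))
print(np.subtract(array1, array2))
print(np.multiply(array1, array2))
print(np.divide(array1, array2))
# 2. Rounding elements
print(np.round(array1))
# 3. Mean across dimensions
print(np.mean(matrix, axis=1))  # mean of each row
print(np.mean(matrix, axis=0))  # mean of each column
# 4. Difference between neighboring elements
print(np.diff(array1))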
# Sample data
flattened_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
# 2. Compute the mean, standard deviation, and variance of the 2D array along the second axis
mean_along_axis_1 = np.mean(matrix, axis=1)
std_dev_along_axis_1 = np.std(matrix, axis=1)
variance_along_axis_1 = np.var(matrix, axis=1)
print("Mean along axis 1:", mean_along_axis_1)
print("Standard deviation along axis 1:", std_dev_along_axis_1)
print("Variance along axis 1:", variance_along_axis_1)
Explanation:
In the code below:
import numpy as np
import pandas as pd
# Sample data
numpy_array = np.array([10, 20, 30, 40, 50])
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)
# 1. Convert the NumPy array to a Pandas Series
series_from_array = pd.Series(numpy_array)
print("Pandas Series created from the NumPy array:")
print(series_from_array)
# 2. Create a Pandas Series from the first column of the DataFrame
first_column_series = df['A']
# Print the Series created from the first column of the DataFrame
print("\nFirst column of DataFrame as a Pandas Series:")
print(first_column_series)
# 3. Compute the mean and standard deviation of the data of a given Series
mean_of_series = first_column_series.mean()
std_dev_of_series = first_column_series.std()
print("\nMean of the Series:", mean_of_series)
print("Standard deviation of the Series:", std_dev_of_series)
Explanation:
import pandas as pd
# Sample data
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
print(df)
Explanation:
8. Write a Pandas program to create a line plot of the opening and closing stock prices of a given company between two specific dates.
To create a line plot of the opening and closing stock prices for a company between two specific dates, you can use Pandas together with Matplotlib. For this example, let's assume you have stock price data in a CSV file with columns for Date, Open, and Close. Here's how you can implement this:
import pandas as pd
import matplotlib.pyplot as plt
# Load the stock data (hypothetical file) and keep rows in the date range of interest
df = pd.read_csv('stock_data.csv', parse_dates=['Date'], index_col='Date')
filtered_df = df.loc['2023-01-01':'2023-06-30']
# Plotting
plt.figure(figsize=(12, 6))
plt.plot(filtered_df.index, filtered_df['Open'], label='Opening Price', color='blue')
plt.plot(filtered_df.index, filtered_df['Close'], label='Closing Price', color='red')
plt.legend()
plt.show()
Explanation:
9. Write a Pandas program to create a plot of the Open, High, Low, Close, Adjusted Closing prices and Volume of a given company between two specific dates.
To create a comprehensive plot of multiple stock prices (Open, High, Low, Close, Adjusted Close) and Volume between two specific dates, you can again use Pandas and Matplotlib, plotting the prices on a primary y-axis and the volume on a secondary y-axis:
import pandas as pd
import matplotlib.pyplot as plt
# Load the stock data (hypothetical file) and keep rows in the date range of interest
df = pd.read_csv('stock_data.csv', parse_dates=['Date'], index_col='Date')
filtered_df = df.loc['2023-01-01':'2023-06-30']
# Plotting
fig, ax1 = plt.subplots(figsize=(14, 7))
# Plot Open, High, Low, Close, and Adjusted Close prices on the primary y-axis
ax1.plot(filtered_df.index, filtered_df['Open'], label='Open', color='blue')
ax1.plot(filtered_df.index, filtered_df['High'], label='High', color='green')
ax1.plot(filtered_df.index, filtered_df['Low'], label='Low', color='red')
ax1.plot(filtered_df.index, filtered_df['Close'], label='Close', color='orange')
ax1.plot(filtered_df.index, filtered_df['Adj Close'], label='Adjusted Close', color='purple')
ax1.set_xlabel('Date')
ax1.set_ylabel('Price')
ax1.legend(loc='upper left')
# Plot Volume as bars on a secondary y-axis
ax2 = ax1.twinx()
ax2.bar(filtered_df.index, filtered_df['Volume'], color='gray', alpha=0.3, label='Volume')
ax2.set_ylabel('Volume')
# Adding grid
ax1.grid(True)
plt.show()
Explanation:
import pandas as pd
# Sample data with missing values and duplicate rows
df = pd.DataFrame({'A': [1, 2, 2, None, 5], 'B': [10, 20, 20, 40, None]})
# 1. Remove rows with missing values
df_dropped_missing = df.dropna()
# 2. Remove duplicates: drop duplicate rows based on all columns
df_dropped_duplicates = df_dropped_missing.drop_duplicates()
print(df_dropped_duplicates)
Explanation:
Additional Notes:
Missing Values Handling: Sometimes, instead of dropping missing values, you might choose
to fill them with a specific value or use interpolation. For example, df.fillna(value=0)
can be used to replace missing values with 0.
Duplicates Handling: Removing duplicates helps ensure that the dataset is unique, which is
important for accurate analysis and model training.
11. Write a Pandas program to filter all columns where all entries are present, check which rows and columns have a NaN, and finally drop rows with any NaNs from the given dataset.
import pandas as pd
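# A minimal sketch of the full program, assuming a small illustrative DataFrame with a few NaNs:
import numpy as np
# Sample data with some missing values
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, 6, 7, 8],
    'C': [np.nan, 10, 11, 12]
})
# 1. Filter the columns where all entries are present
print("Columns with all entries present:")
print(df.loc[:, df.notna().all()])
# 2. Check which rows and columns have a NaN
print("\nRows containing a NaN:")
print(df.isna().any(axis=1))
print("\nColumns containing a NaN:")
print(df.isna().any(axis=0))
# 3. Drop rows with any NaNs
print("\nDataFrame after dropping rows with any NaN:")
print(df.dropna())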
Explanation:
12. Write a Python program using Scikit-learn to print the keys, number of rows-
columns, feature names and the description of the given data.
from sklearn.datasets import load_iris  # You can replace this with any dataset you want to use
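# A minimal completion (sketch): load the dataset and print its key attributes
data = load_iris()
print("Keys:", data.keys())
print("Number of rows and columns:", data.data.shape)
print("Feature names:", data.feature_names)
print("Description:\n", data.DESCR)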
Explanation:
Note:
Scikit-learn datasets come with useful attributes like data, target, feature_names, and DESCR, which provide a comprehensive view of the dataset.
Make sure to install Scikit-learn if it’s not already installed. You can do this via pip:
pip install scikit-learn
13. Write a Python program to implement the K-Nearest Neighbors (KNN) supervised machine learning algorithm for a given dataset.
To implement the K-Nearest Neighbors (KNN) supervised machine learning algorithm using Scikit-learn, you need to follow these steps:
1. Load and Prepare the Dataset: Load the dataset and split it into training and test
sets.
2. Create and Train the KNN Model: Initialize the KNN model and train it on the
training data.
3. Evaluate the Model: Predict on the test set and evaluate the model’s performance.
Here's a complete Python program that demonstrates these steps using the Iris dataset as an
example:
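A minimal sketch of such a program, following the three steps above (k=3 is an arbitrary illustrative choice):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# 1. Load and prepare the dataset
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)
# 2. Create and train the KNN model (k=3 neighbors)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# 3. Evaluate the model on the test set
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))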
Explanation:
Notes:
Ensure you have Scikit-learn installed. You can install it using pip if necessary:
pip install scikit-learn
14. Write a Python program to implement a machine learning algorithm for a given dataset. (It is recommended to assign different machine learning algorithms group-wise – micro project)
To implement various machine learning algorithms on a given dataset, you should follow a
structured approach. For this example, I will demonstrate how to implement three common
machine learning algorithms using Scikit-learn on a dataset:
1. Logistic Regression
2. Decision Tree Classifier
3. Support Vector Machine (SVM)
We'll use the Iris dataset for this demonstration. This dataset is often used for classification
tasks and comes built-in with Scikit-learn.
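A minimal sketch that trains and compares the three algorithms on the Iris dataset (the hyperparameters shown are illustrative defaults):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load the Iris dataset and split it into training and test sets
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)
# Train and evaluate each algorithm in turn
models = {
    'Logistic Regression': LogisticRegression(max_iter=200),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'SVM': SVC(kernel='linear'),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"{name} accuracy: {accuracy_score(y_test, y_pred):.3f}")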
Additional Notes:
Hyperparameter Tuning: For each algorithm, you can perform hyperparameter tuning to improve performance, using techniques like GridSearchCV (a sketch follows below).
Feature Scaling: For SVM, it's often beneficial to scale features using StandardScaler.
This step is omitted here for simplicity but is worth considering for real-world applications.
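As mentioned under Hyperparameter Tuning, a minimal GridSearchCV sketch for the SVM, combined with feature scaling in a pipeline (the parameter grid values are illustrative):
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
# Load data and hold out a test set
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)
# Scale features and search over SVM hyperparameters with 5-fold cross-validation
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {'svc__C': [0.1, 1, 10], 'svc__kernel': ['linear', 'rbf']}
grid = GridSearchCV(pipeline, param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Test accuracy:", grid.score(X_test, y_test))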