Assignment 2

The document outlines the assignment for a Bachelor of Technology course in Information Technology, focusing on Artificial Intelligence, Machine Learning, and Deep Learning. Students are required to work with a unique dataset, complete various tasks including data import, visualization, preprocessing, model selection, training, and evaluation, and submit their findings in a Google Colab file converted to PDF. The assignment has a submission deadline of March 29, 2025, and provides a list of datasets and additional resources for assistance.

Uploaded by

devhirpara8

School of Technology Design and Computer Application

College of Technology
Bachelor of Technology
Information Technology

Semester: 6 Academic Year: 2024-2025

Course Name: Artificial Intelligence with concepts of Machine Learning & Deep Learning
Course Code: 1010103322

Assignment 2 [Unit: 5,6]

Instructions
Each student/group will be assigned a unique dataset. The following tasks must be completed and
documented in the report:

1. Import the Dataset

●​ Load the dataset using appropriate Python libraries (pandas, tensorflow, sklearn, etc.).
●​ Display the first few rows and understand the dataset’s structure.
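
A minimal sketch of this step with pandas. The CSV text below is a hypothetical stand-in for the assigned dataset (its column names are invented for illustration); in Colab, the `io.StringIO` object would be replaced by the path to the downloaded file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the assigned dataset; replace with the real file path.
csv_text = """age,fare,embarked,survived
22,7.25,S,0
38,71.28,C,1
26,,S,1
35,53.10,S,1
,8.05,Q,0
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first few rows
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column
df.info()          # non-null counts per column, useful for spotting gaps
```

Empty fields in the CSV are read as `NaN`, so `df.info()` already hints at which columns will need attention during preprocessing.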

2. Data Visualization & Preprocessing

● Identify missing values and handle them appropriately.
●​ Perform exploratory data analysis (EDA) using matplotlib and seaborn.
●​ Check for class imbalances and outliers.
●​ Perform necessary feature scaling and encoding if required.
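
The preprocessing checks above could be sketched as follows. The toy DataFrame and its column names are assumptions made for illustration, and median/mode imputation and the 1.5 × IQR rule are only one reasonable choice among several. Plots are left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the assigned dataset.
df = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0, 27.0, 14.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 512.33, 21.07, np.nan, 30.07],
    "embarked": ["S", "C", "S", "S", None, "Q", "S", "C"],
    "survived": [0, 1, 1, 1, 0, 0, 0, 1],
})

# 1. Count missing values per column.
print(df.isna().sum())

# 2. Impute: median for numeric columns, most frequent value for the categorical one.
for col in ["age", "fare"]:
    df[col] = df[col].fillna(df[col].median())
df["embarked"] = df["embarked"].fillna(df["embarked"].mode()[0])

# 3. Class balance: share of each target class.
class_share = df["survived"].value_counts(normalize=True)
print(class_share)

# 4. Flag outliers in "fare" with the 1.5 * IQR rule.
q1, q3 = df["fare"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["fare"] < q1 - 1.5 * iqr) | (df["fare"] > q3 + 1.5 * iqr)]
print(outliers)

# 5. One-hot encode the categorical column and standardize the numeric ones.
X = pd.get_dummies(df.drop(columns="survived"), columns=["embarked"])
X[["age", "fare"]] = StandardScaler().fit_transform(X[["age", "fare"]])
print(X.head())
```

In the full workflow, the scaler should be fitted on the training split only and then applied to the test split, so that test-set statistics do not leak into training.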

3. Feature Extraction

● Identify important features using correlation, mutual information, or PCA.
●​ Drop irrelevant or redundant features.
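
One possible way to carry out this step, shown on synthetic data. The feature names and the 0.95 correlation threshold are assumptions chosen for illustration; PCA is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Synthetic data: one informative feature, a near-duplicate of it, and pure noise.
rng = np.random.default_rng(0)
n = 300
signal = rng.normal(size=n)
X = pd.DataFrame({
    "signal": signal,
    "signal_copy": signal * 2.0 + rng.normal(scale=0.01, size=n),  # redundant
    "noise": rng.normal(size=n),                                   # irrelevant
})
y = (signal > 0).astype(int)

# Absolute pairwise correlation; keep only the upper triangle so each pair is seen once.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop the later feature of every pair whose correlation exceeds the threshold.
redundant = [col for col in upper.columns if (upper[col] > 0.95).any()]
X_reduced = X.drop(columns=redundant)

# Mutual information between each remaining feature and the target.
mi = pd.Series(
    mutual_info_classif(X_reduced, y, random_state=0),
    index=X_reduced.columns,
).sort_values(ascending=False)

print("Dropped as redundant:", redundant)
print(mi)
```

Features with mutual information close to zero (here, `noise`) are candidates for removal, but the cut-off is a judgment call that should be justified in the report.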

4. Train-Test Data Split

●​ Split the dataset into training and testing sets (e.g., 80-20 or 70-30 split).
●​ Use train_test_split() from sklearn.model_selection.
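
A short sketch of an 80-20 split on synthetic data. The `stratify` and `random_state` arguments are optional additions not required by the assignment: the first keeps class proportions equal in both splits, the second makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and an imbalanced binary target (70% class 0, 30% class 1).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = np.array([0] * 70 + [1] * 30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 80-20 split; use 0.3 for 70-30
    random_state=42,   # reproducible shuffling
    stratify=y,        # preserve the class ratio in both splits
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())  # share of class 1 in each split
```

For time-series datasets, a shuffled split lets the model see the future; passing `shuffle=False` (without `stratify`) keeps the rows in chronological order instead.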

5. Model Selection

● Choose an appropriate machine learning or deep learning model.
●​ Justify your choice of model for the given dataset.
●​ Consider traditional ML models (SVM, Decision Trees, Random Forest, Logistic Regression)
and deep learning models (CNN, LSTMs, Transformers) where applicable.
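
One way to back up the justification with evidence is to compare a few candidate models using cross-validation. The sketch below assumes a tabular binary classification task and uses synthetic data from `make_classification`; the candidate list and the F1 scoring choice are illustrative, not prescribed by the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a tabular classification dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline so scaling is refit in every fold.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated F1 score for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")

best_name = max(scores, key=scores.get)
print("Best by mean F1:", best_name)
```

For image or text datasets, the same comparison idea applies, but the candidates would be deep learning architectures and the comparison would typically use a held-out validation set rather than 5-fold cross-validation.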

6. Model Training

● Train the selected model on the training dataset.
●​ Use hyperparameter tuning (GridSearchCV, RandomizedSearchCV, etc.) to improve model
performance.
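
A minimal sketch of training with hyperparameter tuning, assuming a Random Forest on synthetic data. The parameter grid is deliberately tiny and purely illustrative; a real grid would depend on the chosen model and dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the assigned dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Illustrative grid: every combination is evaluated with 3-fold cross-validation.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)  # tuning uses the training split only

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)

# The best model is refit on the full training split and can score the test split.
test_accuracy = search.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

`RandomizedSearchCV` takes the same estimator and a `param_distributions` argument, and samples a fixed number of combinations (`n_iter`) instead of trying all of them, which is cheaper for large search spaces.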

7. Model Evaluation

● Evaluate model performance using appropriate metrics:
○ Classification: Accuracy, Precision, Recall, F1-score, AUC-ROC
○ Regression: RMSE, MAE, R² score
○ Time Series: MSE, Mean Absolute Percentage Error (MAPE)
●​ Visualize results using confusion matrix, ROC curves, or loss/accuracy plots.
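
The metrics listed above are all available in `sklearn.metrics`. The sketch below computes them on small hand-written arrays (hypothetical labels and predictions, not results from any real model) so that each value can be checked by hand.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
    roc_auc_score,
)

# --- Classification (hypothetical labels, predictions, and predicted probabilities) ---
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]

accuracy = accuracy_score(y_true, y_pred)     # 6 of 8 correct = 0.75
precision = precision_score(y_true, y_pred)   # 3 TP / (3 TP + 1 FP) = 0.75
recall = recall_score(y_true, y_pred)         # 3 TP / (3 TP + 1 FN) = 0.75
f1 = f1_score(y_true, y_pred)                 # harmonic mean of the two = 0.75
auc = roc_auc_score(y_true, y_score)          # needs scores, not hard labels
cm = confusion_matrix(y_true, y_pred)         # rows: true class, columns: predicted

print(accuracy, precision, recall, f1, auc)
print(cm)

# --- Regression / time series (hypothetical targets and predictions) ---
y_true_r = [3.0, 5.0, 2.0, 7.0]
y_pred_r = [2.0, 5.0, 4.0, 8.0]

mse = mean_squared_error(y_true_r, y_pred_r)  # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_true_r, y_pred_r)  # (1 + 0 + 2 + 1) / 4 = 1.0
r2 = r2_score(y_true_r, y_pred_r)
mape = mean_absolute_percentage_error(y_true_r, y_pred_r)  # a fraction, not a percentage

print(mse, rmse, mae, r2, mape)
```

For the visual part, `sklearn.metrics` also provides `ConfusionMatrixDisplay` and `RocCurveDisplay`, which plot with matplotlib; loss/accuracy curves for deep learning models are usually drawn from the history returned by Keras `model.fit()`.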

8. Conclusion

● Interpret model performance.
●​ Suggest improvements and future enhancements.
●​ Compare different models (if applicable) and justify the best choice.

Datasets & Assignments

Each student/group will work on one of the following datasets:

1. Titanic Survival Prediction (Classification) - Kaggle Link
2.​ House Price Prediction (Regression) - Kaggle Link
3.​ IMDB Movie Reviews Sentiment Analysis (NLP) - tensorflow.keras.datasets.imdb
4.​ CIFAR-10 Image Classification (Computer Vision) - tensorflow.keras.datasets.cifar10
5.​ UCI Heart Disease Prediction (Medical Classification) - Kaggle Link
6.​ Retail Sales Forecasting (Walmart Sales Data) (Time Series) - Kaggle Link
7.​ Fake News Detection (NLP) - Kaggle Link
8.​ Credit Card Fraud Detection (Anomaly Detection) - Kaggle Link
9.​ Human Activity Recognition (HAR) with Smartphones (Classification) - Kaggle Link
10.​Plant Seedlings Classification (Image Classification) - Kaggle Link

Submission Guidelines

● Complete the assignment in a Google Colab notebook, convert the notebook to PDF, print it, and submit the printout after the mid-semester exam.
●​ A PDF summarizing the approach, results, and analysis must be included.
● Deadline for submission: Saturday, 29/03/2025.

Additional Resources

● Python Libraries: pandas, numpy, sklearn, tensorflow, matplotlib, seaborn
●​ Kaggle Datasets: https://www.kaggle.com/datasets
●​ Google Colab for running models online: https://colab.research.google.com

Need Help?

If you have any questions, feel free to reach out via email or during teaching hours at EA-601 (Ms. Purvi Patel).

Good luck and happy coding!