
Lab-4 - Muhammad Ahmad - 282660 - BESE-10B


Department of Computing

CS423: Data Warehousing and Data Mining


Class: BSCS
Lab 04: Data Pre-processing-Dimensionality Reduction through
Principal Component Analysis (PCA)
Date: 6th October, 2022
Time: 10:00 am- 1:40 pm

Name: Muhammad Ahmad


CMS: 282660

Course Instructor: Dr. Rabia Irfan


Lab Engineer: Shakeela

CS423: Data Warehousing and Data Mining Page 1


Lab 4: Data Pre-processing-Dimensionality Reduction through Principal Component
Analysis (PCA)
Introduction
Data preprocessing is crucial in any data mining process because it directly affects the success
rate of the project. Real-world data is often unclean, and preprocessing reduces the complexity
of the data under analysis. Data is said to be unclean if it has missing attributes or attribute
values, contains noise or outliers, or includes duplicate or wrong data. The presence of any of
these degrades the quality of the results. Furthermore, data sparsity increases as dimensionality
increases, which makes operations such as clustering and outlier detection less meaningful
because they depend heavily on the density of points and the distances between them. The
purpose of dimensionality reduction is to:
• Avoid the curse of dimensionality
• Reduce the time required by algorithms
• Reduce memory consumption
• Ease visualization of the data
• Eliminate irrelevant features
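The effect of dimensionality on distance-based methods can be illustrated with a small experiment. The following is a minimal sketch, not part of the original lab: it assumes uniformly random points in a unit hypercube, and the helper name relative_contrast is hypothetical. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(n_dims, n_points=200, seed=0):
    """Return (max_dist - min_dist) / min_dist from one query point
    to n_points uniformly random points in the unit hypercube."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


# Illustrative assumption: random data, not the lab's dataset.
for d in (2, 10, 100, 1000):
    print(f"{d:>5} dims: relative contrast = {relative_contrast(d):.3f}")
```

As the number of dimensions grows, the contrast shrinks, so the nearest and farthest neighbours become hard to tell apart. This is one reason clustering and outlier detection lose meaning in high-dimensional data.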
Principal Component Analysis (PCA) is a method used to reduce the number of variables in your
data by extracting the important ones from a large pool. It reduces the dimension of your data
with the aim of retaining as much information as possible. In other words, this method combines
highly correlated variables to form a smaller set of artificial variables, called “principal
components”, that account for most of the variance in the data.
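The idea can be sketched from first principles with NumPy. This is an illustrative sketch under stated assumptions, not the lab's prescribed method: the data is synthetic (three variables driven by one hidden signal), the function name pca is hypothetical, and the data is only centered, not scaled to unit variance as is commonly done before PCA.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Returns the projected data and the fraction of total variance
    explained by each kept component.
    """
    X_centered = X - X.mean(axis=0)           # remove the per-feature mean
    cov = np.cov(X_centered, rowvar=False)    # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return X_centered @ components, explained_ratio


# Synthetic, highly correlated data (an assumption for illustration only).
rng = np.random.default_rng(0)
base = rng.normal(size=500)
X = np.column_stack([
    base,
    2.0 * base + rng.normal(scale=0.1, size=500),
    -base + rng.normal(scale=0.1, size=500),
])

scores, ratio = pca(X, n_components=1)
print("reduced shape:", scores.shape)
print("explained variance ratio:", ratio)
```

Because the three variables are almost perfectly correlated, a single principal component retains nearly all of the variance, which is the sense in which PCA keeps "as much information as possible" while dropping dimensions.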

Objectives
After performing this lab students should be able to:
1. Develop an understanding of the dimensionality reduction concept
2. Perform dimensionality reduction using PCA

Tools/Software Requirement
Jupyter Notebook (Python)

Procedure
1. First, go through the following tutorial to install and familiarize yourself with Jupyter
Notebook and Python, if you are not already familiar with them:
https://www.youtube.com/watch?v=fiQTb7-rCPo
2. Set up Jupyter Notebook on your machine.
3. After that, go through the following video tutorial to perform PCA on the dataset
mentioned there: https://www.youtube.com/watch?v=kApPBm1YsqU. Only the first 9:00 minutes
of the 19:55-minute video concern us. The support article for this video tutorial is
available at: https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60.
The part of this article that concerns us is PCA for data visualization. The other part,
covering machine learning with PCA, is outside the scope of this lab and our course.
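A notebook cell for the data-visualization part might look roughly like the sketch below. It is an assumption-laden outline rather than the tutorial's exact code: it assumes scikit-learn is installed and uses the Iris dataset bundled with scikit-learn as a stand-in, since this handout does not name the dataset. Follow the tutorial for the actual data source.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption): 150 samples, 4 numeric features.
iris = load_iris()
X, y = iris.data, iris.target

# Standardize features to zero mean and unit variance so that no single
# feature dominates the components because of its scale.
X_scaled = StandardScaler().fit_transform(X)

# Keep two principal components so the data can be drawn on a 2-D plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("reduced shape:", X_2d.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

The two columns of X_2d can then be drawn as a scatter plot coloured by y, and explained_variance_ratio_ shows how much of the original variance the two-dimensional view retains.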

Task
Familiarize yourself with Jupyter Notebook and work with PCA, following the same steps as in
the tutorial. We will use the same environment and the same dataset for Lab 5.

Deliverable
The deliverables of this lab are:
1. Screenshots showing Jupyter Notebook running on your machine.
2. Jupyter Notebook file (.ipynb) containing the code for PCA.
The submission should be a single zipped folder (.zip format) uploaded at the LMS link provided.
