Lab-4 - Muhammad Ahmad - 282660 - BESE-10B

This document provides instructions for a lab assignment on dimensionality reduction using Principal Component Analysis (PCA). The lab aims to help students understand and apply PCA to reduce the number of variables in a dataset. Students are asked to install Jupyter Notebook, follow video tutorials demonstrating PCA, and submit screenshots and a Jupyter notebook file applying PCA to a provided dataset. The notebook file will then be used in a subsequent lab.

Uploaded by

Ahmad Dogar

Department of Computing

CS423: Data Warehousing and Data Mining

Class: BSCS
Lab 04: Data Pre-processing-Dimensionality Reduction through
Principal Component Analysis (PCA)
Date: 6th October, 2022
Time: 10:00 am- 1:40 pm

Name: Muhammad Ahmad

CMS: 282660

Course Instructor: Dr. Rabia Irfan

Lab Engineer: Shakeela


Lab 4: Data Pre-processing-Dimensionality Reduction through Principal Component
Analysis (PCA)
Introduction
Data preprocessing is crucial in any data mining process because it directly affects the success
of the project. Real-world data is often unclean, and preprocessing reduces the complexity of the
data under analysis. Data is said to be unclean if it is missing attributes or attribute values,
contains noise or outliers, or includes duplicate or wrong data. The presence of any of these
degrades the quality of the results. Furthermore, data sparsity increases as dimensionality
increases, which makes operations such as clustering and outlier detection less meaningful,
because they depend heavily on the density of points and the distances between them. The
purpose of dimensionality reduction is to:
• Avoid the curse of dimensionality
• Reduce the time required by algorithms
• Reduce memory consumption
• Make the data easier to visualize
• Eliminate irrelevant features
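The point about distances becoming less meaningful in high dimensions can be illustrated with a small experiment. The sketch below is not part of the lab tutorial; the function name `distance_contrast` and the choice of uniformly random points are assumptions made purely for illustration. It measures how much the largest pairwise distance differs from the smallest one, relative to the smallest.

```python
import math
import random


def distance_contrast(n_points, n_dims, seed=0):
    """Return (max - min) / min over all pairwise Euclidean distances
    between uniformly random points in the unit hypercube."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    dists = [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]
    return (max(dists) - min(dists)) / min(dists)


# Hypothetical settings: 100 random points in spaces of growing dimension.
for dims in (2, 10, 100, 1000):
    print(dims, round(distance_contrast(100, dims), 2))
```

As the number of dimensions grows, the contrast should shrink: the nearest and farthest pairs end up at nearly the same distance, which is why density- and distance-based methods such as clustering and outlier detection tend to struggle on high-dimensional data.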
Principal Component Analysis (PCA) is a method for reducing the number of variables in your
data by deriving a small set of new variables from a large pool of original ones. It reduces the
dimension of the data while aiming to retain as much information as possible. In other words,
the method combines highly correlated variables into a smaller set of artificial variables,
called “principal components”, that account for most of the variance in the data.
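To make the idea of combining correlated variables concrete, here is a minimal sketch of PCA on just two variables, using only the Python standard library. It is an illustration under stated assumptions, not the method used in the tutorial: the function name `first_principal_component` and the synthetic data are hypothetical, and the closed-form eigenvalue formula applies only to the 2x2 case.

```python
import math
import random
import statistics


def first_principal_component(xs, ys):
    """PCA for two variables using the closed-form eigen-decomposition
    of their 2x2 sample covariance matrix [[a, b], [b, c]].

    Returns (explained_ratio, direction): the share of total variance
    captured by the first principal component and its unit direction.
    """
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix: half-trace plus/minus a radius.
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = half_trace + radius, half_trace - radius

    # Eigenvector belonging to the larger eigenvalue lam1.
    if b != 0:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return lam1 / (lam1 + lam2), (vx / norm, vy / norm)


# Hypothetical data: y is roughly 2 * x, so the two variables are highly correlated.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(500)]
ys = [2 * x + rng.gauss(0, 0.1) for x in xs]

ratio, direction = first_principal_component(xs, ys)
print(f"variance explained by PC1: {ratio:.3f}")
print(f"PC1 direction: ({direction[0]:.3f}, {direction[1]:.3f})")
```

Because the two variables move together, the first principal component should capture nearly all of the variance, so the pair can be replaced by a single derived variable with little loss of information.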

Objectives
After performing this lab students should be able to:
1. Develop an understanding of the dimensionality reduction concept
2. Perform dimensionality reduction using PCA

Tools/Software Requirement
Jupyter Notebook (Python)

Procedure
1. First, go through the following tutorial to install and familiarize yourself with Jupyter
Notebook and Python, if you are not already familiar with them:
https://www.youtube.com/watch?v=fiQTb7-rCPo
2. Set up Jupyter Notebook on your machine.
3. Next, go through the following video tutorial to perform PCA on the dataset mentioned in
the tutorial: https://www.youtube.com/watch?v=kApPBm1YsqU. Only the first 9:00 minutes of
the 19:55-minute video are relevant to this lab. The supporting article for this video
tutorial is available at:
https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60. The relevant
part of the article is PCA for data visualization. The other part, covering machine learning
with PCA, is out of scope for this lab and our course.
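The visualization part of the procedure can be sketched roughly as follows with scikit-learn. This is a minimal sketch, not a copy of the tutorial's notebook: the use of the Iris data bundled with scikit-learn is an assumption, so substitute the dataset from the tutorial if it differs. The steps are to standardize the features, fit PCA with two components, and inspect how much variance those components retain.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Iris data stands in for the lab dataset.
iris = load_iris()
X, y = iris.data, iris.target  # 150 samples, 4 numeric features

# Standardize each feature to zero mean and unit variance,
# since PCA is sensitive to the scale of the variables.
X_scaled = StandardScaler().fit_transform(X)

# Project the 4 standardized features onto the first 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                     # rows stay the same, columns drop to 2
print(pca.explained_variance_ratio_)  # share of variance kept by each component
```

A scatter plot of the two columns of `X_2d`, coloured by the class labels in `y`, then gives a two-dimensional view of the data.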

Task
Familiarize yourself with Jupyter Notebook and work through PCA by following the same steps as
in the tutorial. We will use the same environment and the same dataset in Lab 5.

Deliverables
The deliverables of this lab are:
1. Screenshots showing Jupyter Notebook running on your machine.
2. A Jupyter notebook file (.ipynb) containing the code for PCA.
Submit both in a single zipped folder (.zip format) at the LMS link provided.
