Clustering Documentation R Code

The document discusses performing hierarchical clustering on two datasets: an airlines dataset and a crime dataset. For the airlines data, 9 clusters were identified and the data was aggregated and summarized. For the crime data, 6 clusters were identified after normalizing the values. The clustered data was then saved as a CSV file.

Uploaded by

nehal gundrapally


1.) Perform clustering (both hierarchical and K-means clustering) for the airlines data to obtain the optimum number of clusters.

=> Inferences from the data set

The data set describes airline transaction data.

=> In RStudio:
install.packages("readxl")   # one-time install
library(readxl)

=> Loading the data with the "readxl" library, because the data set is in Excel format
air <- read_xlsx("E:\\EastWestAirlines.xlsx", sheet = 2)
=> Selecting columns 2 to 12 of the data set for the analysis
air1 <- air[, c(2:12)]

=> Summarizing the data to find the minimum and maximum value of each column
summary(air1)
    Balance          Qual_miles        cc1_miles      cc2_miles
 Min.   :      0   Min.   :    0.0   Min.   :1.00   Min.   :1.000
 1st Qu.:  18528   1st Qu.:    0.0   1st Qu.:1.00   1st Qu.:1.000
 Median :  43097   Median :    0.0   Median :1.00   Median :1.000
 Mean   :  73601   Mean   :  144.1   Mean   :2.06   Mean   :1.015
 3rd Qu.:  92404   3rd Qu.:    0.0   3rd Qu.:3.00   3rd Qu.:1.000
 Max.   :1704838   Max.   :11148.0   Max.   :5.00   Max.   :3.000
   cc3_miles      Bonus_miles      Bonus_trans    Flight_miles_12mo
 Min.   :1.000   Min.   :     0   Min.   : 0.0   Min.   :    0.0
 1st Qu.:1.000   1st Qu.:  1250   1st Qu.: 3.0   1st Qu.:    0.0
 Median :1.000   Median :  7171   Median :12.0   Median :    0.0
 Mean   :1.012   Mean   : 17145   Mean   :11.6   Mean   :  460.1
 3rd Qu.:1.000   3rd Qu.: 23801   3rd Qu.:17.0   3rd Qu.:  311.0
 Max.   :5.000   Max.   :263685   Max.   :86.0   Max.   :30817.0

=> Normalizing the data

# Note: columns 2:11 of air1 leave out its first column (Balance)
normalized_data <- scale(air1[, 2:11])
=> Summarizing the normalized data
summary(normalized_data)
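
As a rough check on what scale() does, the sketch below standardizes R's built-in USArrests data and confirms that every column ends up with mean 0 and standard deviation 1. USArrests is an assumption used only so the example runs without the airlines file.

```r
# Stand-in data (assumption): R's built-in USArrests, not the airlines file.
x <- USArrests
z <- scale(x)                     # subtract column means, divide by column SDs

col_means <- colMeans(z)          # expected to be ~0 for every column
col_sds   <- apply(z, 2, sd)      # expected to be 1 for every column

# Manual standardization of one column: (value - mean) / sd
manual_murder <- (x$Murder - mean(x$Murder)) / sd(x$Murder)

print(round(col_means, 10))
print(col_sds)
print(all.equal(as.numeric(z[, "Murder"]), manual_murder))
```

Because every column is put on the same scale, no single large-valued column dominates the Euclidean distances computed in the next step.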
=> Computing the distance matrix of the data
d <- dist(normalized_data, method = "euclidean")
fit <- hclust(d, method = "complete")
=> Plotting the dendrogram
plot(fit)
plot(fit, hang = -1)

=> Cluster Dendrogram

=> Grouping the tree into 9 clusters
groups <- cutree(fit, k = 9)  # cut tree into 9 clusters
rect.hclust(fit, k = 9, border = "blue")
membership <- as.matrix(groups)

final <- data.frame(membership, air1)

=> Aggregating the data by cluster (mean of each column) and listing the result
aggregate(air1[, 2:11], by = list(final$membership), FUN = mean)
  Group.1 Qual_miles cc1_miles cc2_miles cc3_miles Bonus_miles Bonus_trans
1       1   89.60026  2.039119  1.007254  1.000777    15870.57    11.04870
2       2  472.40000  2.120000  1.000000  1.000000    31986.40    30.66000
3       3  648.73333  4.600000  1.000000  1.000000   112247.17    31.53333
4       4  118.20000  3.600000  1.000000  3.600000    79268.70    30.60000
5       5   66.66667  1.000000  3.000000  1.000000    20410.47    18.93333
6       6 7352.20000  1.760000  1.000000  1.000000    14299.56    11.48000
7       7    0.00000  3.200000  1.000000  5.000000   123246.20    23.00000
8       8  694.00000  2.500000  1.000000  1.000000    76325.00    75.50000
9       9    0.00000  2.500000  1.000000  1.000000    54943.50    63.00000
  Flight_miles_12mo Flight_trans_12 Days_since_enroll    Award?
1          312.3731        0.965544          4093.794 0.3575130
2         7867.2400       21.300000          4304.600 0.7800000
3         3739.0000       11.433333          6646.567 0.9333333
4          650.0000        2.100000          4891.600 0.4000000
5          692.6667        3.200000          4075.533 0.4000000
6         1225.6400        3.560000          4572.240 0.6400000
7          220.0000        0.600000          4058.400 0.8000000
8        26458.5000       49.000000          2602.000 1.0000000
9        13461.5000       49.500000          1798.500 1.0000000
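
To make the aggregation step concrete, here is a minimal sketch on a hypothetical toy data frame (the column names and values are invented for illustration and are not taken from the airlines file). It computes the per-cluster mean of each column and the number of rows in each cluster.

```r
# Hypothetical toy data: two numeric columns and one cluster label per row.
toy <- data.frame(
  miles = c(100, 200, 300, 1000, 1200),
  trans = c(1, 2, 3, 10, 12)
)
membership <- c(1, 1, 1, 2, 2)

# Mean of every column within each cluster
cluster_means <- aggregate(toy, by = list(cluster = membership), FUN = mean)

# Number of rows per cluster; very small clusters often flag outliers
cluster_sizes <- table(membership)

print(cluster_means)
print(cluster_sizes)
```

Reading the means next to the cluster sizes helps when interpreting a table like the one above, since a cluster with only a handful of rows can show extreme averages.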

=> Saving the data
library(readr)
write_csv(final, "hclustoutput.csv")
getwd()  # shows the working directory where the file was written
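
The task statement also asks for K-means, which the steps above do not cover. A minimal sketch of one common approach, the "elbow" check on the total within-cluster sum of squares, is given here. It runs on the built-in USArrests data as a stand-in (an assumption); the range of k and the nstart value are illustrative choices, not values from the original analysis.

```r
set.seed(42)                      # kmeans uses random starts; fix the seed
z <- scale(USArrests)             # stand-in data (assumption), standardized

ks <- 2:8                         # candidate numbers of clusters (illustrative)
wss <- sapply(ks, function(k) {
  # nstart = 25 reruns kmeans from 25 random starts and keeps the best fit
  kmeans(z, centers = k, nstart = 25)$tot.withinss
})

elbow <- data.frame(k = ks, wss = round(wss, 2))
print(elbow)

# Draw the elbow curve only in an interactive session
if (interactive()) {
  plot(ks, wss, type = "b",
       xlab = "Number of clusters k",
       ylab = "Total within-cluster sum of squares")
}
```

The usual reading is to pick the k after which the curve flattens; on the airlines data the same code would take normalized_data in place of z.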
2.) Perform clustering for the crime data, identify the number of clusters formed, and draw inferences.

=> Inferences from the data set: it lists states and their crime data
=> Loading the "readxl" library, because the data set is in Excel format
library(readxl)
=> Loading the data set
cdata <- read_excel(file.choose())
=> Selecting columns 2 to 5 for the analysis
data <- cdata[, c(2:5)]
=> Summarizing the data
summary(data)
     Murder          Assault         UrbanPop          Rape
 Min.   : 0.800   Min.   : 45.0   Min.   :32.00   Min.   : 7.30
 1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50   1st Qu.:15.07
 Median : 7.250   Median :159.0   Median :66.00   Median :20.10
 Mean   : 7.788   Mean   :170.8   Mean   :65.54   Mean   :21.23
 3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75   3rd Qu.:26.18
 Max.   :17.400   Max.   :337.0   Max.   :91.00   Max.   :46.00

=> Normalizing the data because the variables are measured on very different scales

normalized_data <- scale(data)
summary(normalized_data)
     Murder            Assault           UrbanPop             Rape
 Min.   :-1.6044   Min.   :-1.5090   Min.   :-2.31714   Min.   :-1.4874
 1st Qu.:-0.8525   1st Qu.:-0.7411   1st Qu.:-0.76271   1st Qu.:-0.6574
 Median :-0.1235   Median :-0.1411   Median : 0.03178   Median :-0.1209
 Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000
 3rd Qu.: 0.7949   3rd Qu.: 0.9388   3rd Qu.: 0.84354   3rd Qu.: 0.5277
 Max.   : 2.2069   Max.   : 1.9948   Max.   : 1.75892   Max.   : 2.6444
=> Computing the distances between rows with the "euclidean" method
d <- dist(normalized_data, method = "euclidean")
=> Clustering the data with the "complete" linkage method
fit <- hclust(d, method = "complete")

# display dendrogram
plot(fit)
=> Cluster Dendrogram (vertical axis: Height)
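
The choice of "complete" linkage can be sanity-checked against other linkage methods. The sketch below compares three methods by cophenetic correlation, which measures how well the tree heights preserve the original pairwise distances. It uses the built-in USArrests data as a stand-in (an assumption), and the list of methods compared is an illustrative choice.

```r
z <- scale(USArrests)             # stand-in data (assumption), standardized
d <- dist(z, method = "euclidean")

methods <- c("complete", "average", "single")

# Correlation between the original distances and the tree-implied
# (cophenetic) distances, one value per linkage method
coph <- sapply(methods, function(m) {
  fit_m <- hclust(d, method = m)
  cor(as.vector(d), as.vector(cophenetic(fit_m)))
})

print(round(coph, 3))
```

A higher value suggests the dendrogram distorts the distances less, though it is only one criterion and does not by itself settle which linkage is best.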
plot(fit, hang = -1)

=> Cluster Dendrogram with the leaf labels aligned at the baseline (vertical axis: Height)

=> Cutting the tree into 6 clusters

groups <- cutree(fit, k = 6)  # cut tree into 6 clusters
rect.hclust(fit, k = 6, border = "blue")
=> Cluster Dendrogram with the 6 clusters outlined in blue (vertical axis: Height)
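
Cutting the tree into k clusters corresponds to drawing a horizontal line across the dendrogram at some height. The sketch below illustrates this link on the built-in USArrests data (a stand-in, assumed here so the code runs without the Excel file): cutting by k = 6 and cutting at a height between the relevant two merges should give the same grouping, provided those two merge heights are not tied.

```r
z <- scale(USArrests)             # stand-in data (assumption), standardized
fit <- hclust(dist(z, method = "euclidean"), method = "complete")

n <- nrow(z)
k <- 6
groups_k <- cutree(fit, k = k)    # cut by number of clusters

# fit$height holds the n - 1 merge heights in increasing order.
# After n - k merges exactly k clusters remain, so a height between
# merge n - k and merge n - k + 1 yields the same partition.
h <- mean(fit$height[c(n - k, n - k + 1)])
groups_h <- cutree(fit, h = h)    # cut by height instead

print(table(groups_k))            # cluster sizes
print(h)                          # one height that produces k clusters
```

Looking at the gap between consecutive merge heights is one informal way to judge whether a given k is a natural cut.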

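membership <- as.matrix(groups)

final <- data.frame(membership, data)

# data has only 4 columns, so aggregate over all of them
aggregate(data, by = list(final$membership), FUN = mean)
=> Saving the data
library(readr)
# Note: this reuses the file name of the airlines output and overwrites it
# if both parts are run in the same working directory
write_csv(final, "hclustoutput.csv")

getwd()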
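
To support the "draw inferences" part of the task, it helps to see which rows fall into each cluster next to the cluster means on the original scale. A minimal sketch follows, again on the built-in USArrests data as a stand-in (an assumption; its row names are state names, which the Excel file may store in a column instead).

```r
z <- scale(USArrests)             # stand-in data (assumption), standardized
fit <- hclust(dist(z, method = "euclidean"), method = "complete")
groups <- cutree(fit, k = 6)

# Cluster means computed on the original, unscaled values for readability
cluster_means <- aggregate(USArrests, by = list(cluster = groups), FUN = mean)

# Names of the rows (states) that belong to each cluster
members <- split(rownames(USArrests), groups)

print(cluster_means)
print(members)
```

Clusters can then be described in plain terms, for example by comparing the mean of each crime variable across clusters and noting which states share a profile.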
