Clustering Documentation R Code
=> In RStudio:
install.packages("readxl")  # one-time install of the package
library(readxl)
=> Loading the data set with the “readxl” library, since the data set is in Excel format
air <- read_xlsx("E:\\EastWestAirlines.xlsx", sheet = 2)
=> Reducing the data set to the columns that will be summarized
air1 <- air[ , c(2:12)]
=> Summarizing the data to find the minimum and maximum value of every column
summary(air1)
    Balance          Qual_miles        cc1_miles      cc2_miles
 Min.   :      0   Min.   :    0.0   Min.   :1.00   Min.   :1.000
 1st Qu.:  18528   1st Qu.:    0.0   1st Qu.:1.00   1st Qu.:1.000
 Median :  43097   Median :    0.0   Median :1.00   Median :1.000
 Mean   :  73601   Mean   :  144.1   Mean   :2.06   Mean   :1.015
 3rd Qu.:  92404   3rd Qu.:    0.0   3rd Qu.:3.00   3rd Qu.:1.000
 Max.   :1704838   Max.   :11148.0   Max.   :5.00   Max.   :3.000
   cc3_miles      Bonus_miles      Bonus_trans    Flight_miles_12mo
 Min.   :1.000   Min.   :     0   Min.   : 0.0   Min.   :    0.0
 1st Qu.:1.000   1st Qu.:  1250   1st Qu.: 3.0   1st Qu.:    0.0
 Median :1.000   Median :  7171   Median :12.0   Median :    0.0
 Mean   :1.012   Mean   : 17145   Mean   :11.6   Mean   :  460.1
 3rd Qu.:1.000   3rd Qu.: 23801   3rd Qu.:17.0   3rd Qu.:  311.0
 Max.   :5.000   Max.   :263685   Max.   :86.0   Max.   :30817.0
library(readr)
write_csv(final, "hclustoutput.csv")
=> Saving the data; getwd() shows the folder the file is written to
getwd()
2.) Perform clustering for the crime data, identify the number of
clusters formed, and draw inferences
A first look at the data shows that it lists states and their crime figures
=> Loading the “readxl” library, since the data set is in Excel format
library(readxl)
=> Loading the data set
cdata <- read_excel(file.choose())
=> Reducing the data set to the columns that will be summarized
data <- cdata[, c(2:5)]
=> Summarizing the data
summary(data)
     Murder          Assault         UrbanPop          Rape
 Min.   : 0.800   Min.   : 45.0   Min.   :32.00   Min.   : 7.30
 1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50   1st Qu.:15.07
 Median : 7.250   Median :159.0   Median :66.00   Median :20.10
 Mean   : 7.788   Mean   :170.8   Mean   :65.54   Mean   :21.23
 3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75   3rd Qu.:26.18
 Max.   :17.400   Max.   :337.0   Max.   :91.00   Max.   :46.00
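The step that creates normalized_data is not shown in these notes. A minimal sketch of one common choice, z-score standardization with scale(), is given here; both the use of scale() and the built-in USArrests data set (used as a stand-in for the crime spreadsheet, so that the example runs without the Excel file) are assumptions, not the original author's code.

```r
# Assumption: the built-in USArrests data stands in for the crime spreadsheet.
data <- USArrests[, c("Murder", "Assault", "UrbanPop", "Rape")]

# Assumption: z-score standardization; the original normalization step is not shown.
# scale() subtracts each column's mean and divides by its standard deviation,
# so that Assault (hundreds) does not dominate Murder (single digits) in distances.
normalized_data <- scale(data)

# Every column now has mean ~0 and standard deviation 1.
summary(normalized_data)
```

Without some form of scaling, the Euclidean distance would be driven almost entirely by the columns with the largest numeric range.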
summary(normalized_data)
=> Computing the pairwise distances between observations with the ‘euclidean’ method
d <- dist(normalized_data, method = "euclidean")
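To make the distance step concrete, the sketch below applies dist() to two made-up points and checks the result by hand. The points and the names pts, d_example, and manual are hypothetical and are not part of the crime data.

```r
# Two hypothetical points, one per row.
pts <- rbind(a = c(0, 0), b = c(3, 4))

# dist() returns the pairwise distances between the rows of its input.
d_example <- dist(pts, method = "euclidean")
print(d_example)  # the distance between a and b is 5

# The same value by hand: sqrt((0 - 3)^2 + (0 - 4)^2)
manual <- sqrt(sum((pts["a", ] - pts["b", ])^2))
print(manual)
```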
=> Clustering the data with the ‘complete’ linkage method
fit <- hclust(d, method = "complete")
# display dendrogram
plot(fit)
[Figure: cluster dendrogram produced by plot(fit); vertical axis: Height (0 to 6); leaves: observation numbers]
plot(fit, hang = -1)
[Figure: cluster dendrogram produced by plot(fit, hang = -1), with the leaf labels aligned along the bottom; vertical axis: Height (0 to 6); leaves: observation numbers]
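The task asks for the number of clusters and for inferences, which the dendrogram alone does not give directly. A possible next step, sketched below, cuts the tree with cutree() and compares the cluster means. The value k = 4, the use of scale(), and the built-in USArrests data set as a stand-in for the crime spreadsheet are all assumptions chosen for illustration.

```r
# Assumption: USArrests stands in for the crime data, and scale() stands in
# for the normalization step that is not shown in the notes.
normalized_data <- scale(USArrests)
d <- dist(normalized_data, method = "euclidean")
fit <- hclust(d, method = "complete")

# Assumption: k = 4 is picked by eye from the dendrogram; other values are defensible.
k <- 4
groups <- cutree(fit, k = k)  # cluster label (1..k) for every state

# Number of states in each cluster.
print(table(groups))

# Mean of each original (unscaled) variable per cluster, used to describe the clusters.
cluster_means <- aggregate(USArrests, by = list(cluster = groups), FUN = mean)
print(cluster_means)
```

Comparing the rows of cluster_means is one way to draw inferences, for example by spotting which cluster combines high Murder and Assault averages. After plot(fit, hang = -1), calling rect.hclust(fit, k = k) outlines the same groups on the dendrogram.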
getwd()