Hierarchical Cluster Analysis - R Tutorial
http://www.r-tutor.com/gpu-computing/clustering/hierarchical-cluster-an...
With the distance matrix found in the previous tutorial, we can use various techniques of cluster analysis for relationship discovery. For example, in the data set mtcars, we can run the distance matrix through hclust, and plot a dendrogram that displays a hierarchical relationship among the vehicles.

> d <- dist(as.matrix(mtcars))   # find distance matrix
> hc <- hclust(d)                # apply hierarchical clustering
> plot(hc)                       # plot the dendrogram
Careful inspection of the dendrogram shows that the 1974 Pontiac Firebird and the Camaro Z28 are classified as close relatives, as expected.
Similarly, the dendrogram shows that the 1974 Honda Civic and Toyota Corolla are close to each other.
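The visual inspection above can be confirmed programmatically with cutree, which cuts the dendrogram into a chosen number of groups. The group count of 5 below is an illustrative assumption, not part of the tutorial:

```r
# Cut the mtcars dendrogram into k groups; cars with the same group
# label are close relatives on the dendrogram.
d <- dist(as.matrix(mtcars))   # Euclidean distance matrix
hc <- hclust(d)                # complete linkage by default
groups <- cutree(hc, k = 5)    # k = 5 is an arbitrary illustrative choice
groups[c("Honda Civic", "Toyota Corolla")]  # do they share a group label?
```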
In general, there are many choices of cluster analysis methodology. The hclust function in R uses the complete linkage method for hierarchical clustering by default. This particular clustering method defines the distance between two clusters to be the maximum distance between their individual components. At every stage of the clustering process, the two nearest clusters are merged into a new cluster. The process is repeated until the whole data set is agglomerated into a single cluster. For a data set with 4,500 elements, it takes hclust about 2 minutes to finish the job on an AMD Phenom II X4 CPU.

> test.data <- function(dim, num, seed=17) {
+     set.seed(seed)
+     matrix(rnorm(dim * num), nrow=num)
+ }
> m <- test.data(120, 4500)
>
> library(rpud)            # load rpud with rpudplus
> d <- rpuDist(m)          # Euclidean distance
>
> system.time(hclust(d))   # complete linkage
   user  system elapsed
115.765   0.087 115.914
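A quick sanity check of the complete-linkage definition above: the height of the final merge reported by hclust should equal the maximum distance between any pair of points drawn from the last two clusters. A small sketch with synthetic data (point count and dimensions chosen arbitrarily):

```r
set.seed(17)
m <- matrix(rnorm(10 * 2), nrow = 10)  # 10 random points in 2 dimensions
d <- dist(m)
hc <- hclust(d)                        # complete linkage by default
groups <- cutree(hc, k = 2)            # the last two clusters to be merged
dm <- as.matrix(d)
max.cross <- max(dm[groups == 1, groups == 2])  # largest cross-cluster distance
all.equal(max.cross, max(hc$height))   # TRUE under complete linkage
```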
Thanks to code optimization, the rpuHclust function in rpud, equipped with the rpudplus add-on, performs much better. Moreover, as an added bonus, rpuHclust produces cluster analysis output identical to that of the original hclust function in R. Note that the algorithm is mostly CPU based: the memory access turns out to be too excessive for GPU computing.

> system.time(rpuHclust(d))   # rpuHclust with rpudplus
   user  system elapsed
  0.792   0.104   0.896
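The claim that rpuHclust reproduces hclust's output exactly can be checked directly when rpud with the rpudplus add-on is installed; the sketch below guards the comparison so it degrades gracefully when the package is unavailable:

```r
d  <- dist(as.matrix(mtcars))
hc <- hclust(d)                                  # reference CPU result
if (requireNamespace("rpud", quietly = TRUE)) {
  hc.gpu <- rpud::rpuHclust(d)                   # assumes rpudplus add-on
  print(identical(hc$merge, hc.gpu$merge) &&     # same merge order
        isTRUE(all.equal(hc$height, hc.gpu$height)))  # same merge heights
}
```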
[Chart: performance comparison of hclust and rpuHclust with rpudplus in R]
Exercises

1. Run the performance test with more vectors in higher dimensions.
2. Compute hierarchical clustering with other linkage methods, such as single, median, average, centroid, Ward's and McQuitty's.
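A starting point for Exercise 2, using the method argument of hclust. Note that recent versions of R spell Ward's criterion "ward.D" or "ward.D2", and that the centroid and median methods formally expect squared Euclidean distances:

```r
d <- dist(as.matrix(mtcars))
hc.single   <- hclust(d, method = "single")    # minimum inter-cluster distance
hc.average  <- hclust(d, method = "average")   # mean inter-cluster distance
hc.centroid <- hclust(d^2, method = "centroid")  # distance between centroids
hc.ward     <- hclust(d, method = "ward.D2")   # Ward's minimum variance
sapply(list(hc.single, hc.average, hc.centroid, hc.ward),
       function(h) h$method)                   # which linkage was used
```

Each result can be passed to plot to compare the dendrogram shapes the different linkages produce.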
Copyright 2009 - 2014 Chi Yau All Rights Reserved Theme design by styleshout Fractal graphics by zyzstar Adaptation by Chi Yau