Deep Density-Based Image Clustering
Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys
Article history: Received 11 August 2019; Received in revised form 4 January 2020; Accepted 29 March 2020; Available online 11 April 2020

Keywords: Deep clustering; Density-based clustering; Feature learning

Abstract

Recently, deep clustering, which is able to perform feature learning that favors clustering tasks via deep neural networks, has achieved remarkable performance in image clustering applications. However, the existing deep clustering algorithms generally need the number of clusters in advance, which is usually unknown in real-world tasks. In addition, the initial cluster centers in the learned feature space are generated by k-means. This only works well on spherical clusters and probably leads to unstable clustering results. In this paper, we propose a two-stage deep density-based image clustering (DDC) framework to address these issues. The first stage is to train a deep convolutional autoencoder (CAE) to extract low-dimensional feature representations from high-dimensional image data, and then apply t-SNE to further reduce the data to a 2-dimensional space favoring density-based clustering algorithms. In the second stage, we propose a novel density-based clustering technique for the 2-dimensional embedded data to automatically recognize an appropriate number of clusters with arbitrary shapes. Concretely, a number of local clusters are generated to capture the local structures of clusters, and then are merged via their density relationship to form the final clustering result. Experiments demonstrate that the proposed DDC achieves comparable or even better clustering performance than state-of-the-art deep clustering methods, even though the number of clusters is not given.

© 2020 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.knosys.2020.105841
are naturally raised: (1) How can deep clustering methods effectively find an appropriate number of clusters with irregular shapes when the number of clusters is not known a priori? (2) Do we really need to refine the deep neural networks with the initial cluster assignment?

In this paper, we aim to answer these two questions and propose a novel and effective deep density-based clustering (DDC) method for images. Specifically, DDC first learns deep feature representations of the data via a deep autoencoder. Second, t-SNE [32] is adopted to further reduce the learned features to a 2-dimensional space while preserving the pairwise similarities of data instances. Finally, we develop a novel density-based clustering method which considers both the local structures of clusters and the importance of instances to generate the final clustering results. The source code of the proposed DDC is available at https://github.com/Yazhou-Ren/DDC.

The contributions of this work are stated as below:

• We propose an effective density-based technique for deep clustering which can automatically find an appropriate number of image clusters with arbitrary shapes. We first reduce the original data to a 2-dimensional space and then develop a novel density-based clustering method for the learned data.
• DDC offers good cluster visualization and interpretability. Its properties are theoretically and empirically analyzed. Its efficiency and robustness to parameter setting are also empirically verified.
• Extensive experiments are conducted to show that DDC becomes the new state-of-the-art deep clustering method on various image cluster discovery tasks when the number of clusters is unknown.
2. Related work

2.1. Deep clustering

Due to their good representation ability, deep neural networks (DNN) have gained impressive achievements in various types of machine learning and computer vision applications [33–35]. Most of the DNN methods focus on supervised problems in which the label information is known. In recent years, increasing attention has been paid to adopting DNN in unsupervised learning tasks, and a number of deep clustering methods have been proposed.

One kind of deep clustering method divides the clustering procedure into two stages, i.e., feature learning and clustering. These methods first perform feature learning via DNN and then apply clustering algorithms in the learned space [19,20,36,37]. The other kind of deep clustering method incorporates the abovementioned two stages into one framework. Song et al. [38] refine the autoencoder such that data representations in the learned space are close to their affiliated cluster centers. Xie et al. [21] propose deep embedded clustering (DEC) to jointly learn the cluster assignments and the feature representations. Ren et al. [39] propose semi-supervised deep embedded clustering to enhance the performance of DEC by using pairwise constraints. Yang et al. [23] and Chang et al. [17] apply convolutional neural networks (CNN) for exploring image clusters. Guo et al. [40] improve DEC with local structure preservation. Guo et al. [18] use data augmentation in the DEC framework and achieve state-of-the-art clustering performance on several image data sets.

2.2. Density-based clustering

The key advantage of density-based clustering is that the number of clusters is not needed and clusters with arbitrary shapes can be found. Over the past decades, many density-based clustering methods have been developed. DBSCAN [7] defines a cluster with points from continuous high-density regions and treats those points in low-density regions as outliers or noise. Inspired by this popular algorithm, a lot of density-based clustering methods have been designed, such as OPTICS [41], DENCLUE [42], DESCRY [43], and others [44–47]. DenPeak (clustering by fast search and find of density peaks) [48] is another immensely popular density-based clustering method, which assumes that cluster centers locate in regions with higher density and that the distances among different centers should be relatively large. Some improvements of DenPeak have also been made [49–51]. The methods described above are applied in the original feature space. Thus, their performance for grouping images, which are of high dimensionality, is not satisfactory due to their limited representation ability.

Most recently, several deep clustering methods [29–31] which seek to address the issue of estimating the number of clusters have been proposed, i.e., DDC-UF (deep density clustering of unconstrained faces) [29], DCC (deep continuous clustering) [30], and DED (deep embedding determination) [31]. However, these methods ignore the local structures in each cluster and do not allow points to play different roles according to their densities. By contrast, the proposed DDC takes both the local information of clusters and the importance of points into account and achieves significant improvements in clustering performance.

3. Deep density-based image clustering

This section presents the proposed deep density-based image clustering (DDC) in detail. Let X = {x_i ∈ R^D}_{i=1}^n denote the image data set, where n is the number of data points and D is the dimensionality. DDC aims at grouping X into an appropriate number of disjoint clusters without any prior knowledge such as the number of clusters or label information. DDC is a two-stage deep clustering model which contains two main steps, i.e., deep feature learning, which nonlinearly transfers the original features to a low-dimensional space, and density-based clustering, which automatically recognizes an appropriate number of clusters with arbitrary shapes in the latent space.

3.1. Deep feature learning

As deep clustering methods generally do, we adopt a deep autoencoder to initialize the feature transformation due to its excellent representation ability. An autoencoder consists of two parts: the encoder h = f_Θ(x) (maps each data point x to a learned representation h) and the decoder x′ = g_Ω(h) (transfers data from the learned feature space back to the original one). Here, the feature dimensionality of h is d. Θ and Ω denote the parameters of the encoder and decoder, respectively. In this paper, we use the denoising autoencoder [52] that solves the following problem:

\arg\min_{\Theta,\Omega} \frac{1}{n} \sum_{i=1}^{n} \| x_i - g_\Omega(f_\Theta(\tilde{x}_i)) \|_2^2    (1)

where x̃ is a corrupted copy of x obtained by adding noise, e.g., adding Gaussian noise or randomly setting a portion of the input to 0. We use the stacked autoencoder (SAE) [53] in this work, in which each layer is a denoising autoencoder trained to reconstruct the previous layer's output. For image clustering, we adopt the deep convolutional autoencoder (CAE) in the experiments, whose structure will be stated in Section 3.3.

In [18], the data augmentation (DA) technique is used in the training process of the deep autoencoder and has achieved significant improvements in clustering performance. The resulting optimization model is:

\arg\min_{\Theta,\Omega} \frac{1}{n} \sum_{i=1}^{n} \| \bar{x}_i - g_\Omega(f_\Theta(\bar{x}_i)) \|_2^2    (2)

where x̄_i = T_rand(x_i) denotes the random transformation¹ of x_i.

¹ As in [18], we randomly shift by at most 3 pixels in each direction and randomly rotate by at most 10°.
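As a concrete illustration of this training stage, the following is a minimal Keras sketch (not the authors' released implementation) of a convolutional autoencoder trained on randomly transformed images in the spirit of Eq. (2). The convolutional layer sizes are assumptions; the 10-dimensional embedding and the shift/rotation ranges of T_rand follow the text.

```python
# Hedged sketch of the CAE-with-DA objective of Eq. (2); layer sizes are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

def build_cae(input_shape=(28, 28, 1), embedding_dim=10):
    # Encoder f_Theta: convolutions followed by a dense d-dimensional embedding h.
    inp = keras.Input(shape=input_shape)
    x = layers.Conv2D(32, 5, strides=2, padding='same', activation='relu')(inp)
    x = layers.Conv2D(64, 5, strides=2, padding='same', activation='relu')(x)
    x = layers.Flatten()(x)
    h = layers.Dense(embedding_dim, name='embedding')(x)
    # Decoder g_Omega: map h back to the image space.
    x = layers.Dense(7 * 7 * 64, activation='relu')(h)
    x = layers.Reshape((7, 7, 64))(x)
    x = layers.Conv2DTranspose(32, 5, strides=2, padding='same', activation='relu')(x)
    out = layers.Conv2DTranspose(1, 5, strides=2, padding='same')(x)
    return keras.Model(inp, out), keras.Model(inp, h)

cae, encoder = build_cae()
cae.compile(optimizer='adam', loss='mse')   # squared reconstruction error of Eqs. (1)-(2)

# T_rand: random shifts of at most 3 pixels and rotations of at most 10 degrees (footnote 1).
augmenter = keras.preprocessing.image.ImageDataGenerator(
    width_shift_range=3, height_shift_range=3, rotation_range=10)

# X: an (n, 28, 28, 1) array of images scaled to [0, 1].
# gen = augmenter.flow(X, batch_size=256, shuffle=True)
# for step in range(total_steps):
#     x_bar = next(gen)                  # x_bar = T_rand(x)
#     cae.train_on_batch(x_bar, x_bar)   # reconstruct the transformed image, Eq. (2)
```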
When the training of the deep autoencoder (solving Eq. (1) or Eq. (2)) is finished, we obtain the feature representations H = {h_i = f_Θ(x_i) ∈ R^d}_{i=1}^n. For visualization and to better fit the designed density-based clustering algorithm, we further reduce the data H to a 2-dimensional space Z = {z_i ∈ R^2}_{i=1}^n by using t-SNE [32], which preserves the pairwise similarities of the data well.

t-SNE is a dimensionality reduction method which can visualize high-dimensional data in a 2-dimensional space. Firstly, t-SNE defines the joint probability p_ij of data points h_i and h_j as:

p_{ij} = \frac{p_{i|j} + p_{j|i}}{2n}    (3)

where

p_{j|i} = \frac{\exp(-\|h_i - h_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|h_i - h_k\|^2 / 2\sigma_i^2)}    (4)

Here, σ_i is a parameter for h_i. Secondly, the joint probability q_ij of z_i and z_j in the learned 2-dimensional space is calculated as:

q_{ij} = \frac{(1 + \|z_i - z_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|z_k - z_l\|^2)^{-1}}    (5)

Both p_ii and q_ii are set to 0. Then, t-SNE seeks to minimize the Kullback–Leibler divergence between the two joint probability distributions P and Q:

KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}    (6)

When all the 2-dimensional data points {z_i ∈ R^2}_{i=1}^n are obtained, we develop a novel density-based clustering in the embedded space Z as below.

3.2. Density-based clustering

We propose a novel density-based clustering method to obtain an appropriate partition of the data Z = {z_i ∈ R^2}_{i=1}^n in the 2-dimensional feature space when the number of clusters is unavailable.

3.2.1. Local clusters generation

DDC shares two fundamental definitions (i.e., ρ_i and δ_i of point z_i) with DenPeak [48]. Concretely, DDC defines the density ρ_i of point z_i via a Gaussian kernel:

\rho_i = \sum_{z_j \in Z \setminus \{z_i\}} \exp\left( -\left( \frac{d_{ij}}{d_c} \right)^2 \right)    (7)

where d_ij is the Euclidean distance between points z_i and z_j, and d_c is the cutoff distance that needs to be predefined. A higher value of ρ_i means a higher density around point z_i. δ_i of point z_i denotes the minimum Euclidean distance between z_i and those points whose densities are larger than that of z_i. That is,

\delta_i = \min_{j:\, \rho_j > \rho_i} (d_{ij})    (8)

For the point with the highest density, its δ is set to the maximum of the pairwise distances. DenPeak simply chooses several points with the highest ρ and δ values as cluster centers. Different from DenPeak, we consider those points with relatively large ρ and δ values as local cluster centers. The corresponding definition is given in Definition 1.

Definition 1 (Local Cluster Centers). Those points satisfying the following condition are defined as local cluster centers:

\delta_i > d_c \quad \text{and} \quad \rho_i > \bar{\rho}    (9)

where \bar{\rho} = \frac{1}{n} \sum_{j=1}^{n} \rho_j is the average density of all the points {z_i}_{i=1}^n.

It is easy to verify that a local cluster center z_i owns the largest density in its d_c-neighborhood, i.e., a circle with z_i and d_c as the center and radius, respectively. When all the local cluster centers are obtained, we assign each remaining point to the same cluster as its nearest neighbor of higher density. Then, a set of local clusters is found and will be used to generate the final clustering. To analyze the characteristics of local cluster centers, the following two theorems are stated.

Theorem 1. A local cluster center z_i owns the largest density value ρ_i locally in its d_c-neighborhood.

Proof. We prove the theorem by contradiction. For a local cluster center z_i, assume that there exists a point z_j in the d_c-neighborhood of z_i satisfying ρ_j > ρ_i. Then, δ_i ≤ d_c holds according to Eq. (8). This contradicts Eq. (9) in Definition 1. Thus, the assumption is wrong and the theorem is proved. □

Theorem 2. The distance between two local cluster centers with different densities is at least d_c.

Proof. Suppose z_i and z_j are two local cluster centers with ρ_i ≠ ρ_j. Assume the distance d_ij < d_c; then z_i and z_j are in the d_c-neighborhoods of each other. Since z_i is a local cluster center, it owns the highest density in its d_c-neighborhood. Thus, ρ_i ≥ ρ_j. z_j is also a local cluster center, so similarly we have ρ_j ≥ ρ_i. Thus, ρ_i = ρ_j. This contradicts the condition of the theorem. □

Thus, the distance between two local cluster centers is smaller than d_c only when they have the same density and Eq. (9) holds at the same time. In real tasks, this situation occurs extremely rarely. As a consequence, Theorems 1 and 2 indicate two important properties of local cluster centers: (1) each local center has the highest density locally, and (2) the selected cluster centers are not too close to each other, which prevents a huge number of cluster centers from being selected.

3.2.2. Merging local clusters

Suppose L local clusters (C^(1), C^(2), ..., C^(L)) are obtained; they will be merged to form the final clustering result. First, we define core and border points in Definition 2.

Definition 2 (Core and Border Points of a Cluster). Suppose a point z_i is from local cluster C^(k); it is defined as a core point if the following condition holds:

\rho_i > \bar{\rho}^{(k)}    (10)

where \bar{\rho}^{(k)} = \frac{1}{n_k} \sum_{z_j \in C^{(k)}} \rho_j is the average density of all the points in C^(k) and n_k is the number of points in C^(k). Otherwise, z_i is considered as a border point.

Definition 2 indicates that whether a point is a core or border point depends on its own density and the average density of the local cluster to which this point belongs. Generally, the core points of a cluster are located in its central regions, while the border points lie at the boundary, in areas with lower density.

Then, we define connectivity of clusters in Definitions 3 and 4.
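Before moving on, a compact NumPy sketch of the clustering stage described so far may help; it is an illustration written from the definitions above, not the released DDC code. It reduces the CAE features to 2 dimensions with t-SNE (Section 3.1), computes ρ and δ (Eqs. (7)-(8)), picks local cluster centers by Definition 1 (Eq. (9)), and assigns each remaining point to the cluster of its nearest higher-density neighbor; the choice of d_c is left to the caller.

```python
# Hedged sketch of local-cluster generation (Eqs. (7)-(9)); not the released DDC code.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE

# Z: 2-d embedding of the CAE features (Section 3.1); `encoder` is from the sketch above.
# Z = TSNE(n_components=2).fit_transform(encoder.predict(X))

def local_clusters(Z, dc):
    n = len(Z)
    d = squareform(pdist(Z))                         # pairwise Euclidean distances d_ij
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0   # Eq. (7); subtract the self term exp(0)
    delta = np.empty(n)                              # Eq. (8)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d.max() if higher.size == 0 else d[i, higher].min()
    # Definition 1: local cluster centers satisfy delta_i > d_c and rho_i > mean(rho).
    centers = np.where((delta > dc) & (rho > rho.mean()))[0]
    labels = -np.ones(n, dtype=int)
    labels[centers] = np.arange(centers.size)
    # Assign the remaining points, from high to low density, to the cluster of their
    # nearest higher-density neighbor (that neighbor is already labeled at this point).
    for i in np.argsort(-rho):
        if labels[i] == -1:
            higher = np.where(rho > rho[i])[0]
            labels[i] = labels[higher[np.argmin(d[i, higher])]]
    return labels, rho, centers
```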
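Continuing the sketch, Definition 2 translates directly into a per-cluster density threshold (Eq. (10)); the merging step itself relies on the connectivity notions of Definitions 3 and 4, which follow in the original text and are therefore not sketched here.

```python
# Hedged sketch of Definition 2: split each local cluster into core and border points.
import numpy as np

def core_points(labels, rho):
    is_core = np.zeros(rho.size, dtype=bool)
    for k in np.unique(labels):
        members = np.where(labels == k)[0]
        # core if denser than the average density of its own local cluster, Eq. (10)
        is_core[members] = rho[members] > rho[members].mean()
    return is_core
```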
Fig. 1. Twomoon: Clustering performance comparison of DenPeak and DDC. The Twomoon data set has 2000 points from two classes. (a): The decision graph of DenPeak. (b): The final result of DenPeak. (c): Initial local clusters of DDC. (d): The final result of DDC. (e): The border points detected by DDC are plotted as black points. The center of each cluster is highlighted with a black '♦'. Points with the same color are from the same cluster. As shown in (a), a number of points with high ρ and δ values could be considered as centers, and it is hard for DenPeak to choose an appropriate number of clusters. Even when it is told that 2 clusters exist, the result of DenPeak is still not satisfactory, as (b) shows. By contrast, DDC first generates a relatively large number of local cluster centers and then merges them to form the final clustering result. Comparing (c) with (e), we find that two clusters are typically merged if there exist core points that are from both clusters and are close to each other. It is shown in (e) that border points generally lie around the boundary of each real cluster, while core points lie in the central areas.
Fig. 2. Clustering results of DenPeak and DDC on the Flame and t4 data sets. (a) and (b) correspond to the Flame data set. (c) and (d) show the results on t4. DenPeak is told to select the true number of clusters. Due to the loss of local structure information, DenPeak fails to find suitable clusters (as shown in (a) and (c)). In contrast, DDC performs perfectly on these two data sets. Even when noisy data exist (as exhibited in (d)), DDC can still automatically recognize the 4 irregular clusters.
Table 2
Results of the comparing methods. In each column, the best two results are highlighted in boldface. The results marked by ‘*’ are excerpted from the papers. ‘–’
denotes the results are unavailable from the papers or codes, and ‘- -’ means ‘out of memory’ when applying.
MNIST MNIST-test USPS Fashion LetterA-J
ACC NMI ACC NMI ACC NMI ACC NMI ACC NMI
k-means 0.485 0.470 0.563 0.510 0.611 0.607 0.554 0.512 0.354 0.309
DBSCAN - - - - 0.114 0 0.167 0 0.100 0 0.100 0
DenPeak - - - - 0.357 0.399 0.390 0.433 0.344 0.398 0.300 0.211
DEC 0.849 0.816 0.856 0.830 0.758 0.769 0.591 0.618 0.407 0.374
IDEC 0.881* 0.867* 0.846 0.802 0.759 0.777 0.523 0.600 0.381 0.318
DCN 0.830* 0.810* 0.802* 0.786* 0.688* 0.683* – – – –
JULE 0.964* 0.913* 0.961* 0.915* 0.950* 0.913* – – – –
DEPICT 0.965* 0.917* 0.963* 0.915* 0.964* 0.927* – – – –
ClusterGAN 0.950* 0.890* – – – – – – – –
DWSC 0.948* 0.889* – – – – – – – –
DKM 0.840* 0.796* – – 0.757* 0.776* – – – –
VaDE 0.945* 0.876* 0.287* 0.287* 0.566* 0.512* – – – –
DCC 0.963* – – – – – – – – –
DED - - - - 0.690 0.818 0.781 0.855 0.473 0.617 0.371 0.440
ConvDEC 0.940 0.916 0.861 0.847 0.784 0.820 0.514 0.588 0.517 0.536
ConvDEC-DA 0.985 0.961 0.955 0.949 0.970 0.953 0.570 0.632 0.571 0.608
DDC 0.965 0.932 0.965 0.916 0.967 0.918 0.619 0.682 0.573 0.546
DDC-DA 0.969 0.941 0.970 0.927 0.977 0.939 0.609 0.661 0.691 0.629
images. The USPS data set⁴ is collected from handwritten digits on envelopes scanned by the U.S. Postal Service. It contains 9298 grayscale images of size 16 × 16. Fashion [54] is a data set comprising 28 × 28 gray images of 70000 fashion products from 10 categories. Its test set with 10000 images is used in our experiments. The LetterA-J data set⁵ consists of more than 500k 28 × 28 grayscale images of English letters from A to J. We randomly select 10000 images from its uncleaned subset as the test set.

The summary of all data sets is shown in Table 1. The features of each data set are scaled to [0, 1].

⁴ https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html
⁵ https://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html

4.2. Evaluation measures

Clustering accuracy (ACC) and normalized mutual information (NMI) are used to evaluate the performance of the comparing algorithms. Their values are both in [0, 1]. A higher value of ACC or NMI indicates better clustering performance.
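As an illustration, the two measures can be computed as below; NMI comes from scikit-learn and ACC uses the usual optimal one-to-one matching between clusters and classes. This is the standard formulation, not necessarily the authors' exact evaluation script.

```python
# Hedged sketch of the evaluation measures: ACC with Hungarian matching, NMI from scikit-learn.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                        # co-occurrence counts of (cluster, class)
    rows, cols = linear_sum_assignment(-cost)  # best one-to-one cluster-to-class mapping
    return cost[rows, cols].sum() / y_true.size

# acc = clustering_accuracy(y, y_pred)
# nmi = normalized_mutual_info_score(y, y_pred)
```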
4.3. Comparing methods

We compare the proposed DDC with both shallow clustering methods and deep ones. Shallow baselines are k-means [4], DBSCAN [7], and DenPeak [48]. Deep methods based on both fully connected and convolutional autoencoders are compared,
including DEC (deep embedded clustering) [21], IDEC (improved DEC with local structure preservation) [40], DCN (deep clustering network) [22], JULE (joint unsupervised learning for image clustering) [23], DEPICT (deep embedded regularized clustering) [24], ClusterGAN [25], DWSC (deep weighted k-subspace clustering) [26], DKM (deep k-means) [27], VaDE (variational deep embedding) [28], DCC (deep continuous clustering) [30], DED (deep embedding determination) [31], and DEC-DA (DEC with data augmentation) [18].

Among all the comparing methods, DBSCAN, DenPeak, DCC, DED, and the proposed DDC do not need the number of clusters in advance. For all other methods, the number of clusters is set to the ground-truth number of categories. When applying DBSCAN, the 4-th nearest neighbor distances are computed w.r.t. the entire data, and the parameter Eps is set to the median of those values. The MinPts value of DBSCAN is always set to 4. For DenPeak, the Gaussian kernel is used and d_c is set such that the average number of points in the d_c-neighborhood is approximately 1% × n. To give DenPeak and DED an advantage, their detected number of clusters is set to the true number of classes according to the decision graph. So far, given the ground-truth number of clusters, ConvDEC-DA achieves state-of-the-art clustering performance in image clustering [18]. We compare against ConvDEC-DA and its version without using DA in our experiments.
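For concreteness, the baseline parameter choices above can be sketched as follows; the function names and the quantile-based reading of the d_c rule are our assumptions, not the authors' scripts.

```python
# Hedged sketch of the baseline parameter choices described above.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def dbscan_baseline(X):
    # Eps = median distance to the 4-th nearest neighbor (self excluded), MinPts = 4.
    dist, _ = NearestNeighbors(n_neighbors=5).fit(X).kneighbors(X)
    eps = np.median(dist[:, 4])
    return DBSCAN(eps=eps, min_samples=4).fit_predict(X)

def denpeak_cutoff(Z, fraction=0.01):
    # Choose d_c as the `fraction` quantile of all pairwise distances, so that a
    # d_c-neighborhood contains roughly fraction * n points on average.
    d = np.sort(pdist(Z))
    return d[int(fraction * d.size)]
```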
The reported ACC and NMI values are either excerpted from the original papers or are the averages over 10 independent trials of running the released code with the corresponding suggested parameters.

5. Results and analysis

5.1. Results on real image data

Table 2 gives the clustering results of the comparing methods measured by ACC and NMI. In each column, the best two results are highlighted in boldface. From Table 2 we have the following observations: (1) The shallow models generally perform worse than deep clustering methods. DBSCAN works the worst, mainly because it is hard to choose suitable parameters in a high-dimensional space. (2) Data augmentation (DA) can improve the clustering performance. Except for the two methods using DA (i.e., ConvDEC-DA and DDC-DA), our DDC always achieves the highest ACC and NMI values. (3) Our DDC-DA always achieves one of the best two clustering results, even though the number of clusters is not given. Even given the true number of clusters, DED still performs much worse than DDC and DDC-DA. (4) We also find that ConvDEC-DA sometimes performs unstably. For instance, it can usually obtain a high ACC value (>0.98) on MNIST-test, but it occasionally performs much worse (ACC <0.84) on this data set. This might be caused by the bad initial cluster centers provided by k-means in the learned feature space. By contrast, our DDC and DDC-DA are more stable, with small standard deviations.

The average numbers of clusters detected by our DDC and DDC-DA, as well as the corresponding standard deviations, are given in Table 3. From Table 3 we find that our methods can always find the correct numbers of categories on MNIST-test and USPS. On MNIST, Fashion and LetterA-J, the recognized numbers of clusters are slightly different from the true values. These results indicate the capability of the proposed DDC framework to automatically recognize reasonable numbers of clusters.

5.3. Runtime analysis

We compare our method with DEC-DA [18] because these two models use the same CAE structure and DEC-DA has been shown to be efficient compared with other existing deep clustering methods. The experiments are run on a server with 32 GB RAM and two Tesla P100 GPUs. Concretely, the runtimes of our DDC-DA on MNIST-test and USPS are 737 and 583 s, respectively. Those of ConvDEC-DA are 798 and 436 s, respectively. DDC-DA needs time to estimate the density ρ and δ for each point, while ConvDEC-DA needs to refine the CAE with the initial cluster centers. Thus, these two methods show competitive performance in terms of efficiency.

6. Discussion

We also conduct experiments that directly use t-SNE to reduce the original data to the 2-dimensional space and then apply the proposed density-based clustering technique. The clustering results are much worse than those of our DDC methods. The main reason is that the CAE can transform the original data to a lower-dimensional space in which the intrinsic local structures are preserved. It is better to further reduce these lower-dimensional representations to a 2-dimensional space rather than to extract it directly from the original high-dimensional data. As a consequence, DED [31] and our DDC make use of both CAE and t-SNE to obtain the 2-dimensional representations that favor the density-based clustering.

Now, let us come back to the question raised in Section 1: Is it really necessary to refine the deep autoencoder with the initial cluster assignment? To answer this question, we first visualize the clustering results on MNIST-test and LetterA-J in the embedded 2-dimensional space of DDC-DA in Figs. 4 and 5, respectively. For data whose clusters are well separated (as shown in Fig. 4(a)), centroid-based clustering methods such as ConvDEC-DA, which depend greatly on the initial selection of cluster centers, need to refine the CAE iteratively to achieve satisfactory results. By contrast, our DDC can deliver remarkable performance without refinement, even when several clusters in the middle area overlap.

For data in which many points from different categories are mixed together (as shown in the middle area of Fig. 5(a)), the refinement of ConvDEC-DA cannot separate the mixed points correctly, and neither can our DDC. If this happens and no additional information is given, the effectiveness of refining the autoencoder is not significant for either centroid-based or density-based clustering. In our opinion, one needs prior information (e.g., pairwise constraints) or knowledge transferred from related tasks to handle this situation.

7. Conclusion and future work

We propose a novel deep density-based clustering (DDC) method for image clustering. It is well known that for high-dimensional data such as images, it is difficult to obtain satisfactory performance by applying clustering methods in the original space of image data. So in DDC, first, we use a CAE with good representation ability to extract 10-dimensional features from the original image data.
Fig. 4. Visualization of DDC-DA on MNIST-test. (a) The ground truth labels of the embedded 2-dimensional data. (b) The initial result of DDC-DA. (c) The final result
of DDC-DA. (d) The border points detected by DDC-DA.
Fig. 5. Visualization of DDC-DA on LetterA-J. (a) The ground truth labels of the embedded 2-dimensional data. (b) The initial result of DDC-DA. (c) The final result
of DDC-DA. (d) The border points detected by DDC-DA.
[8] D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, TPAMI 24 (5) (2002) 603–619.
[9] Y. Ren, U. Kamath, C. Domeniconi, G. Zhang, Boosted mean shift clustering, in: ECML-PKDD, 2014, pp. 646–661.
[10] Y. Ren, C. Domeniconi, G. Zhang, G. Yu, A weighted adaptive mean shift clustering algorithm, in: SDM, 2014, pp. 794–802.
[11] Y. Ren, X. Hu, K. Shi, G. Yu, D. Yao, Z. Xu, Semi-supervised denpeak clustering with pairwise constraints, in: Proceedings of the 15th Pacific Rim International Conference on Artificial Intelligence, 2018, pp. 837–850.
[12] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006, pp. 430–439.
[13] D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, in: NIPS, MIT Press, 2001, pp. 556–562.
[14] S. Huang, Z. Xu, J. Lv, Adaptive local structure learning for document co-clustering, Knowl.-Based Syst. 148 (2018) 74–84.
[15] S. Huang, P. Zhao, Y. Ren, T. Li, Z. Xu, Self-paced and soft-weighted nonnegative matrix factorization for data representation, Knowl.-Based Syst. 164 (2019) 29–37.
[16] F.D.l. Torre, T. Kanade, Discriminative cluster analysis, in: ICML, 2006, pp. 241–248.
[17] J. Chang, L. Wang, G. Meng, S. Xiang, C. Pan, Deep adaptive image clustering, in: CVPR, 2017, pp. 5879–5887.
[18] X. Guo, E. Zhu, X. Liu, J. Yin, Deep embedded clustering with data augmentation, in: ACML, 2018, pp. 550–565.
[19] X. Peng, S. Xiao, J. Feng, W.Y. Yau, Z. Yi, Deep subspace clustering with sparsity prior, in: IJCAI, 2016, pp. 1925–1931.
[20] F. Tian, B. Gao, Q. Cui, E. Chen, T.-Y. Liu, Learning deep representations for graph clustering, in: AAAI, 2014, pp. 1293–1299.
[21] J. Xie, R.B. Girshick, A. Farhadi, Unsupervised deep embedding for clustering analysis, in: ICML, 2016, pp. 478–487.
[22] B. Yang, X. Fu, N.D. Sidiropoulos, M. Hong, Towards K-means-friendly spaces: Simultaneous deep learning and clustering, in: ICML, 2017, pp. 3861–3870.
[23] J. Yang, D. Parikh, D. Batra, Joint unsupervised learning of deep representations and image clusters, in: CVPR, 2016, pp. 5147–5156.
[24] K. Ghasedi Dizaji, A. Herandi, C. Deng, W. Cai, H. Huang, Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization, in: ICCV, 2017, pp. 5736–5745.
[25] S. Mukherjee, H. Asnani, E. Lin, S. Kannan, ClusterGAN: Latent space clustering in generative adversarial networks, AAAI (2019) 4610–4617.
[26] W. Huang, M. Yin, J. Li, S. Xie, Deep clustering via weighted k-subspace network, IEEE Signal Process. Lett. 26 (11) (2019) 1628–1632.
[27] M.M. Fard, T. Thonet, E. Gaussier, Deep k-means: Jointly clustering with k-means and learning representations, 2018, pp. 1–14, arXiv preprint arXiv:1806.10069.
[28] Z. Jiang, Y. Zheng, H. Tan, B. Tang, H. Zhou, Variational deep embedding: An unsupervised and generative approach to clustering, in: IJCAI, 2017, pp. 1965–1972.
[29] W.-A. Lin, J.-C. Chen, C.D. Castillo, R. Chellappa, Deep density clustering of unconstrained faces, in: CVPR, 2018, pp. 8128–8137.
[30] S.A. Shah, V. Koltun, Deep continuous clustering, 2018, pp. 1–11, arXiv preprint arXiv:1803.01449.
[31] Y. Wang, E. Zhu, Q. Liu, Y. Chen, J. Yin, Exploration of human activities using sensing data via deep embedded determination, in: Proceedings of the International Conference on Wireless Algorithms, Systems, and Applications, 2018, pp. 473–484.
[32] L.v.d. Maaten, G. Hinton, Visualizing data using t-SNE, JMLR 9 (2008) 2579–2605.
[33] Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn. 2 (1) (2009) 1–127.
[34] Y. Bengio, A. Courville, P. Vincent, Representation learning: A review and new perspectives, TPAMI 35 (8) (2013) 1798–1828.
[35] G.E. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. (2006) 1527–1554.
[36] G. Chen, Deep learning with nonparametric clustering, 2015, pp. 1–14, arXiv preprint arXiv:1501.03084.
[37] M. Shao, S. Li, Z. Ding, Y. Fu, Deep linear coding for fast graph clustering, in: IJCAI, 2015, pp. 3798–3804.
[38] C. Song, F. Liu, Y. Huang, L. Wang, T. Tan, Auto-encoder based data clustering, in: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Springer, 2013, pp. 117–124.
[39] Y. Ren, K. Hu, X. Dai, L. Pan, S.C. Hoi, Z. Xu, Semi-supervised deep embedded clustering, Neurocomputing 325 (2019) 121–130.
[40] X. Guo, L. Gao, X. Liu, J. Yin, Improved deep embedded clustering with local structure preservation, in: IJCAI, 2017, pp. 1573–1759.
[41] M. Ankerst, M.M. Breunig, H.-P. Kriegel, J. Sander, OPTICS: ordering points to identify the clustering structure, in: SIGMOD, ACM, 1999, pp. 49–60.
[42] A. Hinneburg, D.A. Keim, et al., An efficient approach to clustering in large multimedia databases with noise, in: KDD, vol. 98, 1998, pp. 58–65.
[43] F. Angiulli, C. Pizzuti, M. Ruffolo, DESCRY: a density based clustering algorithm for very large data sets, in: Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning, Springer, 2004, pp. 203–210.
[44] Q. Du, Z. Dong, C. Huang, F. Ren, Density-based clustering with geographical background constraints using a semantic expression model, ISPRS Int. J. Geo-Inf. 5 (5) (2016) 72.
[45] Y. Gu, X. Ye, F. Zhang, Z. Du, R. Liu, L. Yu, A parallel varied density-based clustering algorithm with optimized data partition, J. Spatial Sci. (2017) 1–22.
[46] Y. Lv, T. Ma, M. Tang, J. Cao, Y. Tian, A. Al-Dhelaan, M. Al-Rodhaan, An efficient and scalable density-based clustering algorithm for datasets with complex structures, Neurocomputing 171 (2016) 9–22.
[47] S.T. Mai, X. He, J. Feng, C. Plant, C. Böhm, Anytime density-based clustering of complex data, Knowl. Inf. Syst. 45 (2) (2015) 319–355.
[48] A. Rodriguez, A. Laio, Clustering by fast search and find of density peaks, Science 344 (6191) (2014) 1492–1496.
[49] Y. Liu, Z. Ma, F. Yu, Adaptive density peak clustering based on k-nearest neighbors with aggregating strategy, Knowl.-Based Syst. 133 (2017) 208–220.
[50] R. Mehmood, S. El-Ashram, R. Bie, H. Dawood, A. Kos, Clustering by fast search and merge of local density peaks for gene expression microarray data, Sci. Rep. 7 (2017) 45602.
[51] J. Xu, G. Wang, W. Deng, DenPEHC: Density peak based efficient hierarchical clustering, Inform. Sci. 373 (2016) 200–218.
[52] P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: ICML, 2008, pp. 1096–1103.
[53] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, JMLR 11 (2010) 3371–3408.
[54] H. Xiao, K. Rasul, R. Vollgraf, Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017, pp. 1–6, arXiv preprint arXiv:1708.07747.