Do logarithmic proximity measures outperform plain ones in graph clustering?

Ivashkin, Vladimir; Chebotarev, Pavel


arXiv:1605.01046v3 (cs)

[Submitted on 3 May 2016 (v1), last revised 18 Feb 2017 (this version, v3)]

Abstract: We consider a number of graph kernels and proximity measures, including the commute time kernel, regularized Laplacian kernel, heat kernel, exponential diffusion kernel (also called "communicability"), etc., and the corresponding distances as applied to clustering nodes in random graphs and several well-known datasets. The model of generating random graphs involves edge probabilities for the pairs of nodes that belong to the same class or to different predefined classes of nodes. It turns out that in most cases, logarithmic measures (i.e., measures obtained by taking the logarithm of the proximities) perform better at distinguishing the underlying classes than the "plain" measures. A comparison in terms of reject curves of inter-class and intra-class distances confirms this conclusion. A similar conclusion can be made for several well-known datasets. A possible origin of this effect is that most kernels have a multiplicative nature, while the nature of distances used in clustering algorithms is an additive one (cf. the triangle inequality). The logarithmic transformation is a tool for converting the former into the latter. Moreover, some distances corresponding to the logarithmic measures possess a meaningful cutpoint additivity property. In our experiments, the leader is usually the logarithmic Communicability measure. However, we indicate some more complicated cases in which other measures, typically Communicability and plain Walk, can be the winners.
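The pipeline described in the abstract (random graph with predefined classes, a kernel, its logarithmic counterpart, and the corresponding distances) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the function names, the parameters `p_in` and `p_out`, the value `t=1.0`, the truncated Taylor series used for the matrix exponential, and the kernel-to-distance formula d(i,j) = (h_ii + h_jj - h_ij - h_ji)/2. The element-wise logarithm is only defined when every kernel entry is positive, which for the communicability kernel requires a connected graph.

```python
import math
import random


def planted_partition(sizes, p_in, p_out, seed=0):
    """Random graph with predefined classes (hypothetical helper).

    Nodes in the same class are joined with probability p_in,
    nodes in different classes with probability p_out.
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1.0
    return adj, labels


def communicability(adj, t=1.0, terms=40):
    """Approximate exp(t*A) by a truncated Taylor series (small graphs only)."""
    n = len(adj)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        # After this step, term holds (t*A)^k / k!
        term = [[t / k * sum(term[i][m] * adj[m][j] for m in range(n))
                 for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
    return total


def log_transform(kernel):
    """Element-wise logarithm; raises ValueError if any entry is not positive."""
    return [[math.log(x) for x in row] for row in kernel]


def kernel_to_distance(h):
    """Assumed transformation: d(i,j) = (h_ii + h_jj - h_ij - h_ji) / 2."""
    n = len(h)
    return [[(h[i][i] + h[j][j] - h[i][j] - h[j][i]) / 2 for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Two complete classes of three nodes, then one bridge edge added by hand
    # so that the graph is connected and the logarithm is defined.
    adj, labels = planted_partition([3, 3], p_in=1.0, p_out=0.0)
    adj[2][3] = adj[3][2] = 1.0
    kernel = communicability(adj)
    d_plain = kernel_to_distance(kernel)
    d_log = kernel_to_distance(log_transform(kernel))
    print("plain: intra", d_plain[0][1], "inter", d_plain[0][4])
    print("log:   intra", d_log[0][1], "inter", d_log[0][4])
```

Either distance matrix could then be passed to a distance-based clustering routine. The sketch says nothing about which measure clusters better; that comparison is the subject of the paper's experiments.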

Comments:	11 pages, 5 tables, 9 figures. Accepted for publication in the Proceedings of 6th International Conference on Network Analysis, May 26-28, 2016, Nizhny Novgorod, Russia
Subjects:	Machine Learning (cs.LG); Discrete Mathematics (cs.DM)
MSC classes:	62H30, 91C20, 05C12, 05C50
Cite as:	arXiv:1605.01046 [cs.LG]
	(or arXiv:1605.01046v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1605.01046

Submission history

From: Pavel Chebotarev
[v1] Tue, 3 May 2016 19:52:48 UTC (2,496 KB)
[v2] Thu, 15 Dec 2016 20:01:08 UTC (5,917 KB)
[v3] Sat, 18 Feb 2017 09:04:02 UTC (5,554 KB)
