research-article

Coupled nominal similarity in unsupervised learning

Authors:

Wei Wei,

Yuming Ou

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

Pages 973 - 978

https://doi.org/10.1145/2063576.2063715

Published: 24 October 2011


Abstract

Measuring the similarity between nominal objects is not straightforward, especially in unsupervised learning. This paper proposes coupled similarity metrics for nominal objects that consider not only the intra-coupled similarity within an attribute (i.e., the value frequency distribution) but also the inter-coupled similarity between attributes (i.e., feature dependency aggregation). Four metrics are designed to calculate the inter-coupled similarity between two categorical values by considering their relationships with other attributes. Theoretical analysis shows that the four metrics are equivalent in accuracy and that the intersection-based metric is more efficient than the others, in particular for large-scale data. Substantial experiments on a wide range of UCI data sets verify these theoretical conclusions. In addition, clustering experiments based on the derived dissimilarity metrics show a significant performance improvement.
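The abstract names two ingredients: an intra-coupled part driven by how often each value occurs within an attribute, and an inter-coupled part that compares two values through their relationships with the other attributes, with the most efficient variant restricted to an intersection. The Python sketch below illustrates that idea on a small table of categorical rows. It is not the authors' code, and the exact formulas are assumptions: the frequency-based intra-coupled expression, comparing empirical conditional distributions P(attribute k = w | attribute j = x) by summing min(P(w|x), P(w|y)) over the values w shared by both, uniform weights over the other attributes, combining the two parts by product, and a per-attribute dissimilarity of 1 minus similarity. All function names (`intra_coupled`, `cond_dist`, `inter_coupled_jk`, `inter_coupled`, `coupled_similarity`, `object_dissimilarity`) and the `toy` data are hypothetical.

```python
from collections import Counter


def intra_coupled(data, j, x, y):
    """Frequency-based intra-coupled similarity of values x and y on attribute j.
    Assumed form: |g(x)||g(y)| / (|g(x)| + |g(y)| + |g(x)||g(y)|), where |g(v)|
    is the number of objects taking value v. Both values must occur in data."""
    sizes = Counter(row[j] for row in data)
    gx, gy = sizes[x], sizes[y]
    return gx * gy / (gx + gy + gx * gy)


def cond_dist(data, j, k, x):
    """Empirical P(attribute k = w | attribute j = x) for every observed w."""
    counts = Counter(row[k] for row in data if row[j] == x)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def inter_coupled_jk(data, j, k, x, y):
    """Intersection-based comparison of x and y through attribute k:
    sum of min(P(w|x), P(w|y)) over values w that co-occur with both."""
    px, py = cond_dist(data, j, k, x), cond_dist(data, j, k, y)
    return sum(min(px[w], py[w]) for w in px.keys() & py.keys())


def inter_coupled(data, j, x, y):
    """Aggregate over all other attributes k != j (uniform weights assumed)."""
    others = [k for k in range(len(data[0])) if k != j]
    if not others:
        return 1.0
    return sum(inter_coupled_jk(data, j, k, x, y) for k in others) / len(others)


def coupled_similarity(data, j, x, y):
    """Product of intra- and inter-coupled parts (assumed combination rule).
    A value is defined to be fully similar to itself."""
    if x == y:
        return 1.0
    return intra_coupled(data, j, x, y) * inter_coupled(data, j, x, y)


def object_dissimilarity(data, a, b):
    """Dissimilarity of two objects: sum over attributes of 1 - similarity."""
    return sum(1.0 - coupled_similarity(data, j, a[j], b[j]) for j in range(len(a)))


# Hypothetical toy data: (colour, size, price) for five objects.
toy = [
    ("red", "small", "cheap"),
    ("red", "small", "cheap"),
    ("blue", "small", "cheap"),
    ("blue", "large", "pricey"),
    ("green", "large", "pricey"),
]

print(round(object_dissimilarity(toy, toy[0], toy[2]), 3))  # 0.75
print(round(object_dissimilarity(toy, toy[0], toy[4]), 3))  # 2.818
```

Because the inter-coupled term only compares conditional distributions over their shared support, "red" and "green" get similarity 0 here: no size or price value co-occurs with both. "red" and "blue" share "small" and "cheap", so they are treated as partly similar even though they are different symbols. A dissimilarity like `object_dissimilarity` could then be plugged into any distance-based clustering algorithm.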



Index Terms

Coupled nominal similarity in unsupervised learning
1. General and reference
  1. Cross-computing tools and techniques
    1. Metrics
2. Information systems
  1. Information systems applications
    1. Data mining

Information

Published In

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

October 2011

2712 pages

ISBN:9781450307178

DOI:10.1145/2063576

Editors:
Bettina Berendt
Arjen de Vries
Wenfei Fan
Craig Macdonald, University of Glasgow, UK
Iadh Ounis, University of Glasgow, UK
Ian Ruthven, University of Strathclyde, UK

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 October 2011


Qualifiers

Research-article

Conference

CIKM '11

Sponsor:

CIKM '11: International Conference on Information and Knowledge Management

October 24 - 28, 2011

Glasgow, Scotland, UK

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Bibliometrics

Total Citations: 56
Total Downloads: 426
Downloads (last 12 months): 13
Downloads (last 6 weeks): 1

Reflects downloads up to 10 Aug 2024


Cited By

Cendana M, Kuo R (2024). Categorical Data Clustering: A Bibliometric Analysis and Taxonomy. Machine Learning and Knowledge Extraction, 6:2 (1009-1054). Online publication date: 7-May-2024.
https://doi.org/10.3390/make6020047
Tripathi K, Biswas S, Khare N, Shukla S (2024). Tackling Privacy Concerns in Correlated Big Data: A Comprehensive Review with Machine Learning Insights. 2024 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS), (1-6). Online publication date: 24-Feb-2024.
https://doi.org/10.1109/SCEECS61402.2024.10482215
Bai L, Zhu L (2023). Keyword Coupling Query of Spatiotemporal XML Data. Uncertain Spatiotemporal Data Management for the Semantic Web, (211-226). Online publication date: 15-Dec-2023.
https://doi.org/10.4018/978-1-6684-9108-9.ch012
Wang C, Chi C, Yao L, Liew A, Shen H (2023). Interdependence analysis on heterogeneous data via behavior interior dimensions. Knowledge-Based Systems, 279 (110893). Online publication date: Nov-2023.
https://doi.org/10.1016/j.knosys.2023.110893
Zhou X, Chen L (2022). Migrating social event recommendation over microblogs. Proceedings of the VLDB Endowment, 15:11 (3213-3225). Online publication date: 1-Jul-2022.
https://dl.acm.org/doi/10.14778/3551793.3551864
Shahrivari F, Zlatanov N (2021). On Supervised Classification of Feature Vectors with Independent and Non-Identically Distributed Elements. Entropy, 23:8 (1045). Online publication date: 13-Aug-2021.
https://doi.org/10.3390/e23081045
Biswas S, Khare N, Agrawal P, Jain P (2021). Machine learning concepts for correlated Big Data privacy. Journal of Big Data, 8:1. Online publication date: 15-Dec-2021.
https://doi.org/10.1186/s40537-021-00530-x
Shahrivari F, Zlatanov N (2021). An Asymptotically Optimal Algorithm For Classification of Data Vectors with Independent Non-Identically Distributed Elements. 2021 IEEE International Symposium on Information Theory (ISIT), (2637-2642). Online publication date: 12-Jul-2021.
https://doi.org/10.1109/ISIT45174.2021.9518006
Wang Y, Jiang H, Zhou H, Wang C (2021). An Improved Numerical DBSCAN Algorithm Based on Non-IIDness Learning. IEEE Access, 9 (117052-117066). Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3081500
Wang Z, Wang T, Wan B, Han M (2020). Partial Classifier Chains with Feature Selection by Exploiting Label Correlation in Multi-Label Classification. Entropy, 22:10 (1143). Online publication date: 10-Oct-2020.
https://doi.org/10.3390/e22101143
