DOI: 10.1145/2857218.2857222
short-paper

Unsupervised cue-words discovery for tag-sense disambiguation: comparing dissimilarity metrics

Published: 25 October 2015

Abstract

Although tagging simplifies resource browsing and retrieval, it suffers from several issues, among them redundancy and ambiguity. In this work we focus on resolving tag word-sense ambiguity within a typical semi-automatic tagging procedure. In that procedure a user proposes a tag for a resource; if the tag is found to be related to more than one context, she is offered two or more cues to choose from, so that the tag ambiguity is removed. The key phases of such a disambiguation procedure are ambiguous-tag detection and cue discovery, and both should rely on effective word-to-context relatedness metrics. Among the most effective relatedness metrics are those defined on a feature-vector representation of words. Here we compare different word-to-context relatedness metrics in terms of their effectiveness within the disambiguation process. We propose a metric derived from a Maximum Likelihood estimator of the Jensen-Shannon divergence among feature-count histograms, and we show that, in terms of output quality, it performs better than both the Jensen-Shannon divergence and the symmetrized Kullback-Leibler divergence between histograms. We study the relative gain in quality on the task of unsupervised cue discovery using a synthetic language corpus.
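For readers unfamiliar with the two baseline measures named in the abstract, here is a minimal sketch that computes them on sparse feature-count histograms. It assumes the simplest plug-in approach: each histogram is converted to relative frequencies (the maximum-likelihood estimate of a multinomial distribution), and then JSD(P, Q) = 0.5 * D(P||M) + 0.5 * D(Q||M), with M = 0.5 * (P + Q), and the symmetrized Kullback-Leibler divergence D(P||Q) + D(Q||P) are evaluated. This is not the authors' proposed metric, whose formula the abstract does not give. The smoothing constant `alpha` (needed because the symmetrized KL divergence is infinite whenever one histogram has a zero bin where the other does not), the function names, and the toy counts for an ambiguous tag are all hypothetical.

```python
import math
from collections import Counter


def to_probs(counts, vocab, alpha=0.0):
    """Relative frequencies over `vocab`.

    alpha=0 gives the maximum-likelihood (plug-in) estimate; alpha>0 adds
    `alpha` to every bin (additive smoothing, an assumption of this sketch).
    """
    total = sum(counts.get(f, 0) for f in vocab) + alpha * len(vocab)
    return [(counts.get(f, 0) + alpha) / total for f in vocab]


def kl(p, q):
    """KL divergence D(p || q) in bits; bins with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(c1, c2):
    """Plug-in Jensen-Shannon divergence between two count histograms, in bits (0..1)."""
    vocab = sorted(set(c1) | set(c2))
    p, q = to_probs(c1, vocab), to_probs(c2, vocab)
    # Mixture M; m_i > 0 wherever p_i > 0 or q_i > 0, so kl() never divides by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def symmetrized_kl(c1, c2, alpha=0.5):
    """D(P||Q) + D(Q||P) on smoothed frequencies; smoothing keeps it finite."""
    vocab = sorted(set(c1) | set(c2))
    p, q = to_probs(c1, vocab, alpha), to_probs(c2, vocab, alpha)
    return kl(p, q) + kl(q, p)


# Hypothetical co-occurrence counts of feature words with a tag and two contexts.
tag = Counter({"speed": 4, "engine": 3, "jungle": 1})
contexts = {
    "car": Counter({"speed": 5, "engine": 6, "road": 2}),
    "animal": Counter({"jungle": 7, "prey": 4, "speed": 2}),
}

# Lower divergence means higher relatedness: pick the closest context for the tag.
best = min(contexts, key=lambda name: jensen_shannon(tag, contexts[name]))
print(best)  # car
```

With these made-up counts both measures rank the "car" context closer to the tag than the "animal" context. Note that some texts define the symmetrized KL divergence as the average of the two directions rather than their sum; the ranking is unaffected.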

Cited By

  • (2016) Selecting Feature-Words in Tag Sense Disambiguation Based on Their Shapley Value. In 2016 12th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp. 236-240. DOI: 10.1109/SITIS.2016.45

    Published In

    ACM Other conferences
    MEDES '15: Proceedings of the 7th International Conference on Management of computational and collective intElligence in Digital EcoSystems
    October 2015
    271 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Sponsors

    • The French Chapter of ACM Special Interest Group on Applied Computing
    • IFSP: Federal Institute of São Paulo

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Jensen-Shannon divergence
    2. disambiguation
    3. dissimilarity metrics
    4. retrieval models and ranking
    5. semantic relatedness
    6. similarity measures
    7. tagging

    Conference

    MEDES '15
    Sponsor:
    • IFSP

    Acceptance Rates

    MEDES '15 paper acceptance rate: 13 of 64 submissions (20%)
    Overall acceptance rate: 267 of 682 submissions (39%)
