DOI: 10.1145/1498759.1498810
research-article

Classifying tags using open content resources

Published: 09 February 2009

Abstract

Tagging has emerged as a popular means of annotating online objects such as bookmarks, photos and videos. Tags vary in semantic meaning and can describe different aspects of a media object: its content, as well as locations, dates, people and other associated metadata. Automatically classifying tags into semantic categories lets us better understand how users annotate media objects and build tools for viewing and browsing those objects. In this paper we present a generic method for classifying tags using third-party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource metadata. We describe the implementation of our method on Wikipedia, using WordNet categories as our classification schema and ground truth. Two structural patterns found in Wikipedia are used for training and classification: categories and templates. We apply our system to classifying Flickr tags. Compared to a WordNet baseline, our method increases the coverage of the Flickr vocabulary by 115%. We can classify many important entities that are not covered by WordNet, such as London Eye, Big Island, Ronaldinho, geocaching and wii.
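
The idea of learning semantic categories from Wikipedia's structural patterns can be illustrated with a minimal sketch. It assumes each article is described by the names of its categories and templates, and that some articles already carry a WordNet-style label to train on. The feature strings, labels, training rows and the simple vote-counting rule are all hypothetical stand-ins chosen for illustration; the abstract does not specify the classifier used in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (article title, structural features, label).
# "cat:" marks a category name and "tpl:" a template name. None of these
# rows come from the paper; they only show the shape of the input.
TRAIN = [
    ("Tower Bridge", ["cat:Bridges in London", "tpl:Infobox bridge"], "artifact"),
    ("Eiffel Tower", ["cat:Towers in Paris", "tpl:Infobox building"], "artifact"),
    ("Hawaii", ["cat:Islands of Hawaii", "tpl:Infobox islands"], "location"),
    ("Maui", ["cat:Islands of Hawaii", "tpl:Infobox islands"], "location"),
    ("Pelé", ["cat:Brazilian footballers", "tpl:Infobox football biography"], "person"),
]


def train(examples):
    """Count how often each structural feature co-occurs with each label."""
    counts = defaultdict(Counter)
    for _title, features, label in examples:
        for feature in features:
            counts[feature][label] += 1
    return counts


def classify(counts, features):
    """Sum label votes over the known features; None if no feature was seen."""
    votes = Counter()
    for feature in features:
        votes.update(counts.get(feature, {}))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


model = train(TRAIN)
# A tag is classified through the features of the article it resolves to.
ronaldinho = classify(model, ["cat:Brazilian footballers", "tpl:Infobox football biography"])
big_island = classify(model, ["cat:Islands of Hawaii"])
unknown = classify(model, ["cat:Some unseen category"])
```

In this sketch a tag absent from the labelled set can still receive a category as long as its article shares a category or template with a labelled article, which is the intuition behind extending coverage beyond a fixed lexicon.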

    Published In

    WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining
    February 2009
    314 pages
    ISBN:9781605583907
    DOI:10.1145/1498759

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Flickr
    2. categorization
    3. multimedia annotation
    4. user-generated content
    5. wikipedia

    Conference

    WSDM'09
    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%
