DOI: 10.1145/2539150.2539197
research-article

Two Phase Extraction Method for Multi-label Classification of Real Life Tweets

Published: 02 December 2013

Abstract

Many users now share their daily events and opinions on Twitter. Some of these posts are useful because they comment on aspects of a user's real life, e.g., eating, traffic, weather, and disasters. A post such as "The train is not coming!" is categorized in the "Traffic" aspect and helps users who want to ride the train. A tweet such as "The train is not coming due to heavy rain" is categorized in both the "Traffic" and "Weather" aspects. In this paper, we propose a multi-label method that estimates appropriate aspects for unknown tweets by extending the two phase extraction method. In this method, many topics are extracted from a large collection of tweets using Latent Dirichlet Allocation (LDA). Associations between the many topics and the smaller number of aspects are then built using a small set of labeled tweets. Aspect scores for an unknown tweet are calculated from these topic-aspect associations, based on the terms extracted from the tweet, and appropriate aspects are assigned by averaging the aspect scores. When one aspect score is much larger than the others, that aspect is assigned to the tweet. When several aspect scores are large and similar in value, all of those aspects are assigned. Experimental evaluations on a large number of actual tweets demonstrate the high efficiency of the proposed multi-label classification method. Based on the evaluation results, our prototype system shows that the proposed method can appropriately estimate aspects for each unknown tweet.
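
The abstract does not give formulas for the aspect scores or for the rule that decides between one label and several, so the following Python sketch is only one plausible reading of it. The function names, the weighted-average scoring, the association values, and the `ratio` threshold are all illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List


def aspect_scores(topic_weights: Dict[int, float],
                  topic_to_aspect: Dict[int, Dict[str, float]]) -> Dict[str, float]:
    """Average topic-aspect associations, weighted by a tweet's topic weights.

    Assumption: the paper's exact scoring formula is not stated in the
    abstract; a weighted average is used here purely for illustration.
    """
    weight_sum = sum(topic_weights.values())
    if weight_sum == 0:
        return {}
    totals: Dict[str, float] = {}
    for topic, weight in topic_weights.items():
        for aspect, assoc in topic_to_aspect.get(topic, {}).items():
            totals[aspect] = totals.get(aspect, 0.0) + weight * assoc
    return {aspect: total / weight_sum for aspect, total in totals.items()}


def select_aspects(scores: Dict[str, float], ratio: float = 0.8) -> List[str]:
    """Keep every aspect whose score is within `ratio` of the top score.

    One dominant score yields a single label; several similar large
    scores yield several labels. `ratio` is a hypothetical threshold.
    """
    if not scores:
        return []
    top = max(scores.values())
    if top <= 0:
        return []
    return sorted(a for a, s in scores.items() if s >= ratio * top)


# Hypothetical associations between LDA topics (by index) and aspects.
TOPIC_TO_ASPECT = {
    0: {"Traffic": 0.9, "Weather": 0.1},
    1: {"Traffic": 0.2, "Weather": 0.8},
    2: {"Eating": 1.0},
}

# A tweet dominated by topic 0, e.g. "The train is not coming!"
single = select_aspects(aspect_scores({0: 1.0}, TOPIC_TO_ASPECT))

# A tweet split across topics 0 and 1, e.g. "... due to heavy rain"
multi = select_aspects(aspect_scores({0: 0.5, 1: 0.5}, TOPIC_TO_ASPECT))

print(single)
print(multi)
```

With these made-up associations, the first tweet receives only the "Traffic" label, while the second receives both "Traffic" and "Weather" because the two scores are close to each other.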

Cited By

  • (2021) Design of a Novel Query System for Social Network. Journal of Information Technology Research 12:2 (175-193). DOI: 10.4018/JITR.2019040110. Online publication date: 23-Mar-2021
  • (2017) A Tweet Visualization System for Composite Facilities based on Spatio-Temporal Analysis of Geo-Tagged Tweets. Transactions of the Japanese Society for Artificial Intelligence 32:1 (WII-I_1-11). DOI: 10.1527/tjsai.WII-I. Online publication date: 2017
  • (2014) Two phase estimation method for multi-classifying real life tweets. International Journal of Web Information Systems 10:4 (378-393). DOI: 10.1108/IJWIS-04-2014-0013. Online publication date: 11-Nov-2014

    Published In

    IIWAS '13: Proceedings of International Conference on Information Integration and Web-based Applications & Services
    December 2013
    753 pages
    ISBN:9781450321136
    DOI:10.1145/2539150

    In-Cooperation

    • @WAS: International Organization of Information Integration and Web-based Applications and Services

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Latent Dirichlet Allocation
    2. Multi-label
    3. Real Life
    4. Twitter
    5. Two Phase Extraction

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    IIWAS '13
