Mining the interests of Chinese microbloggers via keyword extraction

Authors:

Maosong Sun

Frontiers of Computer Science in China, Volume 6, Issue 1

Pages 76 - 87

Published: 01 February 2012

Abstract

Microblogging provides a new platform for communicating and sharing information among Web users. Users can express opinions and record daily life using microblogs. Microblogs that are posted by users indicate their interests to some extent. We aim to mine user interests via keyword extraction from microblogs. Traditional keyword extraction methods are usually designed for formal documents such as news articles or scientific papers. Messages posted by microblogging users, however, are usually noisy and full of new words, which is a challenge for keyword extraction. In this paper, we combine a translation-based method with a frequency-based method for keyword extraction. In our experiments, we extract keywords for microblog users from the largest microblogging website in China, Sina Weibo. The results show that our method can identify users' interests accurately and efficiently.
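
The abstract does not say how the translation-based and frequency-based scores are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a TF-IDF score for the frequency side, a word-to-keyword probability table (`trans_prob`, hypothetical toy values) for the translation side, and a linear interpolation with weight `lam`. All function names, the toy data, and the interpolation scheme are illustrative.

```python
import math
from collections import Counter


def tfidf_scores(tokens, doc_freq, n_docs):
    """Frequency-based score: relative term frequency times smoothed IDF."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {
        w: (c / total) * math.log((1 + n_docs) / (1 + doc_freq.get(w, 0)))
        for w, c in counts.items()
    }


def translation_scores(tokens, trans_prob):
    """Translation-based score: sum over words w of P(candidate | w) * P(w | posts)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    scores = Counter()
    for w, c in counts.items():
        for cand, p in trans_prob.get(w, {}).items():
            scores[cand] += p * c / total
    return dict(scores)


def normalize(scores):
    """Scale scores so the largest is 1.0, making the two sides comparable."""
    top = max(scores.values(), default=0.0)
    return {k: v / top for k, v in scores.items()} if top > 0 else dict(scores)


def combine(tokens, doc_freq, n_docs, trans_prob, lam=0.5, k=3):
    """Assumed combination: lam * translation score + (1 - lam) * frequency score."""
    freq = normalize(tfidf_scores(tokens, doc_freq, n_docs))
    trans = normalize(translation_scores(tokens, trans_prob))
    candidates = set(freq) | set(trans)
    mixed = {
        c: lam * trans.get(c, 0.0) + (1 - lam) * freq.get(c, 0.0)
        for c in candidates
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(mixed, key=lambda c: (-mixed[c], c))[:k]


# Hypothetical toy data: one user's segmented posts, corpus document
# frequencies, and a hand-written translation table.
tokens = ["iphone", "ipad", "iphone", "today", "lunch"]
doc_freq = {"iphone": 10, "ipad": 8, "today": 900, "lunch": 400}
n_docs = 1000
trans_prob = {
    "iphone": {"apple": 0.6, "iphone": 0.3},
    "ipad": {"apple": 0.7, "ipad": 0.2},
}

top_keywords = combine(tokens, doc_freq, n_docs, trans_prob, lam=0.5, k=3)
```

In this toy setup the translation table can propose a keyword ("apple") that never occurs in the user's posts, while the frequency side keeps rare but repeated words near the top. Setting `lam=0.0` or `lam=1.0` falls back to either score alone.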

Published In

Frontiers of Computer Science in China Volume 6, Issue 1

February 2012

130 pages

ISSN:1673-7350

Copyright © 2012 Higher Education Press and Springer-Verlag Berlin Heidelberg.

Publisher

Springer-Verlag

Berlin, Heidelberg
