TexRep: A Text Mining Framework for Online Reputation Monitoring

Authors:

Eduarda Mendes Rodrigues,

Eugénio Oliveira

Volume 35, Issue 4

Pages 365 - 389

https://doi.org/10.1007/s00354-017-0021-3

Published: 01 October 2017

Abstract

This work aims to understand, formalize and explore the scientific challenges of using unstructured text data from different Web sources for Online Reputation Monitoring. We present TexRep, an adaptable text mining framework specifically tailored for Online Reputation Monitoring that can be reused in multiple application scenarios, from politics to finance. The framework collects texts from online media, such as Twitter, identifies entities of interest, and classifies sentiment polarity and intensity. It supports multiple data aggregation methods, as well as visualization and modeling techniques that can be used for both descriptive analytics, such as analyzing how political polls evolve over time, and predictive analytics, such as predicting elections. We present case studies that illustrate and validate TexRep for Online Reputation Monitoring. In particular, we evaluate TexRep's Entity Filtering and Sentiment Analysis modules using well-known external benchmarks. We also present an illustrative example of applying TexRep in the political domain.
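
To make the aggregation step in the abstract more concrete, the following is a minimal sketch of one possible sentiment aggregate, computed per entity and per day. It is not TexRep's implementation: the `Mention` record, the polarity encoding (+1, 0, -1), and the `daily_net_sentiment` function are hypothetical names chosen for illustration, and the net-sentiment ratio is only one of many aggregate functions such a framework might offer. The sketch assumes that entity filtering and sentiment classification have already been applied to each message.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mention:
    """One message already filtered to an entity and labelled with a polarity.

    Hypothetical record type; the encoding below is an assumption.
    """
    entity: str
    day: date
    polarity: int  # +1 positive, 0 neutral, -1 negative


def daily_net_sentiment(mentions):
    """Return {(entity, day): (positives - negatives) / total mentions}."""
    counts = defaultdict(lambda: [0, 0, 0])  # [positive, negative, total]
    for m in mentions:
        bucket = counts[(m.entity, m.day)]
        if m.polarity > 0:
            bucket[0] += 1
        elif m.polarity < 0:
            bucket[1] += 1
        bucket[2] += 1
    return {key: (pos - neg) / total for key, (pos, neg, total) in counts.items()}
```

The resulting per-day values form a time series for each entity, which is the kind of signal that could be plotted for descriptive analytics or used as an input feature for a predictive model.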

Published In

New Generation Computing, Volume 35, Issue 4

Oct 2017

164 pages

ISSN: 0288-3635

Copyright © 2017 Ohmsha, Ltd. and Springer Japan KK.

Publisher

Ohmsha

Japan
