Abstract
With the rapid development of micro-blogging, micro-blogs have become one of the main platforms for publishing news and expressing opinions, and analyzing micro-blog posts for hot event detection has attracted wide attention from researchers. However, hot event detection is difficult because micro-blog posts are numerous, short, and grammatically irregular. To improve the performance of hot event detection, a two-stage clustering hot event detection model for micro-blogs is proposed. The model is implemented in the Spark environment and consists of two parts. First, the K-Means method is improved with threshold setting and cosine similarity to cluster the blogs. Then, the resulting blog clusters are clustered again with the LDA (Latent Dirichlet Allocation) model to detect hot events. Experiments carried out in the Spark environment show that the proposed model achieves higher accuracy and better time efficiency for hot event detection.
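The abstract does not spell out how threshold setting and cosine similarity modify K-Means. The following is a minimal single-machine sketch of one plausible reading, not the authors' method: each blog, represented as a sparse term-weight vector (e.g. TF-IDF), is assigned to the centroid with the highest cosine similarity, and a blog whose best similarity falls below a threshold opens a new cluster, so the number of clusters need not be fixed in advance. The function names `cosine`, `centroid`, and `threshold_kmeans` and the parameters `threshold` and `max_iter` are hypothetical, and the sketch does not show how the work is distributed on Spark.

```python
import math
from typing import Dict, List

# A blog as a sparse vector: term -> weight (e.g. TF-IDF). Assumed representation.
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(members: List[Vector]) -> Vector:
    """Component-wise mean of the member vectors."""
    c: Vector = {}
    for v in members:
        for t, w in v.items():
            c[t] = c.get(t, 0.0) + w / len(members)
    return c


def threshold_kmeans(docs: List[Vector], threshold: float,
                     max_iter: int = 10) -> List[int]:
    """Hypothetical threshold-based cosine K-Means; returns one label per doc."""
    centroids: List[Vector] = []
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels: List[int] = []
        for d in docs:
            # Find the most similar existing centroid.
            best, best_sim = -1, -1.0
            for i, c in enumerate(centroids):
                s = cosine(d, c)
                if s > best_sim:
                    best, best_sim = i, s
            # Below the threshold (or no clusters yet): seed a new cluster.
            if best == -1 or best_sim < threshold:
                centroids.append(dict(d))
                best = len(centroids) - 1
            new_labels.append(best)
        # Drop empty clusters, renumber labels compactly, recompute centroids.
        used = sorted(set(new_labels))
        remap = {old: new for new, old in enumerate(used)}
        new_labels = [remap[lab] for lab in new_labels]
        centroids = [centroid([d for d, lab in zip(docs, new_labels) if lab == k])
                     for k in range(len(used))]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels


docs = [
    {"earthquake": 1.0, "sichuan": 1.0},
    {"earthquake": 1.0, "rescue": 1.0},
    {"football": 1.0, "match": 1.0},
    {"football": 1.0, "goal": 1.0},
]
print(threshold_kmeans(docs, threshold=0.4))  # [0, 0, 1, 1]
```

With a threshold of 0.4 the two earthquake blogs and the two football blogs form two clusters; raising the threshold to 0.9 leaves every blog in its own cluster. In the two-stage model described above, clusters like these would then be passed to LDA for the second clustering stage.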
Notes
1. “Jieba” (Chinese for “to stutter”) Chinese text segmentation: built to be the best Python Chinese word segmentation module. GitHub: https://github.com/fxsjy/jieba/.
Acknowledgments
This work was financially supported by the Natural Science Foundation of China (41571401).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Xia, Y., Huang, H. (2020). Two-Stage Clustering Hot Event Detection Model for Micro-blog on Spark. In: Wen, S., Zomaya, A., Yang, L.T. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2019. Lecture Notes in Computer Science, vol. 11945. Springer, Cham. https://doi.org/10.1007/978-3-030-38961-1_16
DOI: https://doi.org/10.1007/978-3-030-38961-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38960-4
Online ISBN: 978-3-030-38961-1
eBook Packages: Computer Science, Computer Science (R0)