Dynamic Clustering of Stream Short Documents Using Evolutionary Word Relation Network

Yang, Shuiqiao; Huang, Guangyan; Zhou, Xiangmin; Xiang, Yang

doi:10.1007/978-981-15-2810-1_40

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1179)

Included in the following conference series:

International Conference on Data Service

Abstract

The explosive growth of Web 2.0 applications (e.g., social networks, question-answering forums, and blogs) leads to the continuous generation of short texts. Clustering analysis, which automatically categorizes streaming short texts, has proven to be a critical unsupervised learning technique. However, the unique attributes of short texts (e.g., few meaningful keywords, noisy features, and a lack of context) and the temporal dynamics of data in the stream make this task challenging.

To tackle this problem, we propose EWNStream, a stream clustering algorithm that explores an Evolutionary Word relation Network. The word relation network is constructed from the word co-occurrence patterns aggregated over batches of short texts in the stream, which overcomes the feature sparsity of short texts at the document level. To cope with the temporal dynamics of the stream, the word relation network is incrementally updated as new batches of data arrive. Changes in the word relation network indicate the evolution of the underlying clusters in the stream. Based on the evolutionary word relation network, we propose a keyword group discovery strategy to extract representative terms for the underlying short text clusters. The keyword groups are then used as cluster centers to group the short texts in the stream. Experimental results on a real-world Twitter dataset show that our method achieves much better clustering accuracy and time efficiency.
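
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the EWNStream implementation: the abstract does not detail the keyword group discovery strategy, so the sketch substitutes a simple stand-in (connected components over edges whose co-occurrence count reaches a threshold). The names `WordRelationNetwork`, `keyword_groups`, `assign`, and the `min_weight` parameter are hypothetical, and the toy batches are invented for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations


class WordRelationNetwork:
    """Toy word co-occurrence graph, updated one batch at a time (hypothetical)."""

    def __init__(self):
        # (word_a, word_b) with word_a < word_b -> aggregated co-occurrence count
        self.edge_weight = Counter()

    def update(self, batch):
        """Incrementally add co-occurrence counts from a batch of tokenized texts."""
        for tokens in batch:
            for a, b in combinations(sorted(set(tokens)), 2):
                self.edge_weight[(a, b)] += 1

    def keyword_groups(self, min_weight=2):
        """Stand-in for keyword group discovery (an assumption, not the paper's
        strategy): connected components over edges with weight >= min_weight."""
        neighbors = defaultdict(set)
        for (a, b), weight in self.edge_weight.items():
            if weight >= min_weight:
                neighbors[a].add(b)
                neighbors[b].add(a)
        groups, seen = [], set()
        for start in sorted(neighbors):
            if start in seen:
                continue
            stack, component = [start], set()
            while stack:
                word = stack.pop()
                if word in component:
                    continue
                component.add(word)
                stack.extend(neighbors[word] - component)
            seen |= component
            groups.append(component)
        return groups


def assign(tokens, groups):
    """Index of the keyword group sharing the most words with the text, or -1."""
    overlaps = [len(set(tokens) & group) for group in groups]
    if not overlaps or max(overlaps) == 0:
        return -1
    return overlaps.index(max(overlaps))


# Invented toy stream: two batches of already-tokenized short texts.
batch_1 = [["storm", "flood", "rain"], ["flood", "rain", "warning"], ["match", "goal", "win"]]
batch_2 = [["goal", "match", "striker"], ["rain", "flood", "river"]]

network = WordRelationNetwork()
network.update(batch_1)
groups_after_1 = network.keyword_groups()
network.update(batch_2)
groups_after_2 = network.keyword_groups()
print(groups_after_1, groups_after_2)
```

In this toy run, the second batch strengthens the link between "goal" and "match", so a second keyword group appears only after the network is updated. This loosely mirrors the idea that changes in the network reflect the evolution of the underlying clusters.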

Acknowledgments

This work was partially supported by the Australian Research Council (ARC) under Grant No. DE140100387.

Author information

Authors and Affiliations

School of Information Technology, Deakin University, Burwood, VIC, Australia
Shuiqiao Yang & Guangyan Huang
School of Computer Science and Information Technology, RMIT University, Melbourne, VIC, Australia
Xiangmin Zhou
Swinburne University of Technology, Hawthorn, VIC, Australia
Yang Xiang

Corresponding author

Correspondence to Guangyan Huang.

Editor information

Editors and Affiliations

Swinburne University of Technology, Melbourne, VIC, Australia
Jing He
University of Illinois at Chicago, Chicago, USA
Philip S. Yu
College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE, USA
Yong Shi
Research Institute of Extenics and Innovation Methods, Guangdong University of Technology, Guangzhou, China
Xingsen Li
Ningbo University, Ningbo, China
Zhijun Xie
Deakin University, Burwood, VIC, Australia
Guangyan Huang
Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China
Jie Cao
Nanjing University of Posts and Telecommunications, Nanjing, China
Fu Xiao

About this paper

Cite this paper

Yang, S., Huang, G., Zhou, X., Xiang, Y. (2020). Dynamic Clustering of Stream Short Documents Using Evolutionary Word Relation Network. In: He, J., et al. Data Science. ICDS 2019. Communications in Computer and Information Science, vol 1179. Springer, Singapore. https://doi.org/10.1007/978-981-15-2810-1_40

DOI: https://doi.org/10.1007/978-981-15-2810-1_40
Published: 02 February 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2809-5
Online ISBN: 978-981-15-2810-1
eBook Packages: Computer Science, Computer Science (R0)
