DOI: 10.1145/3448734.3450777

Research on Multi-granularity Ensemble Learning Based on Korean

Published: 17 May 2021


Ensemble learning trains multiple classifiers and combines them. In Stacking, the predictions of the base classifiers are used as new features to train a meta-classifier, which improves the accuracy of the model. This paper proposes a multi-granularity model based on Stacking ensemble learning for Korean text classification. First, eojeol and subeojeol granularities are proposed according to the composition of the Korean language. Because different feature granularities carry different semantic information, six granularities (phoneme, syllable, subword, word, subeojeol, and eojeol) are compared on a Korean text classification task. Second, suffix words are constructed based on Korean grammatical morphology, and the effects of the different granularities are compared after suffix preprocessing. Finally, a multi-granularity ensemble learning model for Korean, called MGEL-K, is proposed. Using different granularities enriches the diversity of the ensemble by making the base learners differ from one another. The results show that the proposed MGEL-K model performs best on the Korean text classification task, with an accuracy of 92.33%.
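To make the idea of feature granularity concrete, the following is a minimal sketch of how one Korean sentence could be split at three of the six granularities named in the abstract: eojeol (taken here to be a whitespace-delimited unit), syllable, and phoneme. It is not the authors' preprocessing. The function names and the sample sentence are hypothetical, and the phoneme split relies only on the standard Unicode arithmetic for precomposed Hangul syllables. The subword, word, and subeojeol granularities need a trained subword model or a morphological analyzer and are left out.

```python
# Illustrative sketch only: function names and the sample sentence are
# assumptions, not taken from the paper.

HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable
N_VOWELS = 21          # number of medial vowels
N_TAILS = 28           # number of final consonants, counting "no final"


def to_eojeols(text):
    """Split on whitespace; each resulting unit is treated as one eojeol."""
    return text.split()


def to_syllables(text):
    """Split into individual characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def to_phonemes(text):
    """Decompose each Hangul syllable into its conjoining jamo."""
    out = []
    for ch in to_syllables(text):
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            lead, rest = divmod(index, N_VOWELS * N_TAILS)
            vowel, tail = divmod(rest, N_TAILS)
            out.append(chr(0x1100 + lead))    # initial consonant
            out.append(chr(0x1161 + vowel))   # medial vowel
            if tail:                          # tail == 0 means no final
                out.append(chr(0x11A7 + tail))
        else:
            out.append(ch)                    # leave non-Hangul unchanged
    return out


sentence = "한국어 문장"  # hypothetical example input
views = {
    "eojeol": to_eojeols(sentence),
    "syllable": to_syllables(sentence),
    "phoneme": to_phonemes(sentence),
}
for name, tokens in views.items():
    print(name, len(tokens), tokens)
```

In a Stacking setup of the kind the abstract describes, each such view would feed its own base classifier, and the base classifiers' predictions would then become the input features of the meta-classifier.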



Cited By

  • (2023) A multi-granular stacked regression for forecasting long-term demand in Emergency Departments. BMC Medical Informatics and Decision Making 23:1. DOI: 10.1186/s12911-023-02109-3. Online publication date: 7-Feb-2023
  • (2022) DIFM: An Effective Deep Interaction and Fusion Model for Sentence Matching. Chinese Computational Linguistics, pp. 19-30. DOI: 10.1007/978-3-031-18315-7_2. Online publication date: 14-Oct-2022





Published In

CONF-CDS 2021: The 2nd International Conference on Computing and Data Science
January 2021
1142 pages


Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Ensemble learning
  2. Korean natural language processing
  3. multi-granularity segment
  4. text classification


  • Research-article
  • Research
  • Refereed limited



