A Semantic Kernel for Text Classification Based on Iterative Higher–Order Relations between Words and Documents

Altinel, Berna; Ganiz, Murat Can; Diri, Banu

doi:10.1007/978-3-319-07173-2_43


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8467)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

We propose a semantic kernel for Support Vector Machines (SVM) that takes advantage of higher-order relations between words and between documents. The conventional approach in text categorization systems is to represent documents as a “Bag of Words” (BOW), in which the relations between the words and their positions are lost. Additionally, traditional machine learning algorithms assume that instances, in our case documents, are independent and identically distributed. This assumption simplifies the underlying models, but it ignores the semantic connections between words as well as the semantic relations between documents that stem from the words. In this study, we improve the semantic knowledge capture capability of the χ-Sim algorithm proposed in [1] and use the resulting method as a semantic kernel in the SVM. The proposed approach is evaluated on several benchmark textual datasets. Experimental results show that classification performance improves over the well-known traditional kernels used in the SVM, such as the linear kernel (one of the state-of-the-art algorithms for text classification), the polynomial kernel, and the Radial Basis Function (RBF) kernel.
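The idea of iterative higher-order relations can be illustrated with a small sketch. It is not the chapter's algorithm, which is not reproduced on this page. It assumes a χ-Sim-style scheme in which document similarities and word similarities are refined in turn from a document-by-word count matrix, and it then plugs the word-similarity matrix into a kernel of the form k(a, b) = a S bᵀ. The function names, the normalization by products of row or column sums, the resetting of the diagonal to 1, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def chi_sim_style_similarities(M, iterations=2):
    """Alternately refine document (SR) and word (SC) similarity matrices.

    M is a documents-by-words count matrix with no empty rows or columns.
    Assumption of this sketch: each entry is normalized by the product of
    the two row sums (or column sums), and the diagonal is reset to 1.
    """
    M = np.asarray(M, dtype=float)
    n_docs, n_words = M.shape
    SR = np.eye(n_docs)    # document-document similarities
    SC = np.eye(n_words)   # word-word similarities
    row_sums = M.sum(axis=1)
    col_sums = M.sum(axis=0)
    for _ in range(iterations):
        # Documents are similar when they contain similar words.
        SR = (M @ SC @ M.T) / np.outer(row_sums, row_sums)
        np.fill_diagonal(SR, 1.0)
        # Words are similar when they occur in similar documents.
        SC = (M.T @ SR @ M) / np.outer(col_sums, col_sums)
        np.fill_diagonal(SC, 1.0)
    return SR, SC


def semantic_kernel(a, b, S):
    """Kernel value a S b^T for two bag-of-words vectors a and b."""
    return float(np.asarray(a, dtype=float) @ S @ np.asarray(b, dtype=float))


# Toy corpus: word 0 and word 2 never co-occur, but both co-occur with word 1.
M = [[1, 1, 0],
     [0, 1, 1]]
SR, SC = chi_sim_style_similarities(M, iterations=1)

a = [1, 0, 0]  # a document containing only word 0
b = [0, 0, 1]  # a document containing only word 2
linear_value = float(np.dot(a, b))
semantic_value = semantic_kernel(a, b, SC)
print(linear_value, semantic_value)
```

In this toy case the linear kernel sees no overlap between the two documents, while the semantic kernel assigns them a positive value because the two words are linked through a shared neighbor. A Gram matrix built this way could be handed to an SVM implementation that accepts precomputed kernels. Note that resetting the diagonal does not by itself guarantee a positive semi-definite matrix, so a practical kernel would need that property checked or enforced.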


References

1. Bisson, G., Hussain, F.: Chi-Sim: A New Similarity Measure for the Co-clustering Task. In: Proceedings of the 2008 Seventh International Conference on Machine Learning and Applications, pp. 211–217 (2008)
2. Wang, P., Domeniconi, C.: Building Semantic Kernels for text classification using Wikipedia. In: Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 713–721. ACM Press, New York (2008)
3. Ganiz, M.C., Lytkin, N.I., Pottenger, W.M.: Leveraging Higher Order Dependencies between Features for Text Classification. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009, Part I. LNCS, vol. 5781, pp. 375–390. Springer, Heidelberg (2009)
4. Ganiz, M.C., George, C., Pottenger, W.M.: Higher Order Naive Bayes: A Novel Non-IID Approach to Text Classification. IEEE Transactions on Knowledge and Data Engineering 23(7), 1022–1034 (2011)
5. Poyraz, M., Kilimci, Z.H., Ganiz, M.C.: Higher-Order Smoothing: A Novel Semantic Smoothing Method for Text Classification. Journal of Computer Science and Technology (accepted, 2014)
6. Poyraz, M., Kilimci, Z.H., Ganiz, M.C.: A Novel Semantic Smoothing Method Based on Higher Order Paths for Text Classification. In: IEEE International Conference on Data Mining (ICDM), Brussels, Belgium (2012)
7. Altinel, B., Ganiz, M.C., Diri, B.: A Novel Higher-order Semantic Kernel. In: ICECCO 2013 (The 10th International Conference on Electronics Computer and Computation), Ankara, Turkey, November 7-9 (2013)
8. Joachims, T.: Text Categorization with Many Relevant Features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998)
9. Dumais, S., Platt, J., Heckerman, D., Sahami, M.: Inductive learning algorithms and representations for text categorization. In: Proceedings of the Seventh International Conference on Information Retrieval and Knowledge Management (ACM-CIKM 1998), pp. 148–155 (1998)
10. Siolas, G., D’Alche-Buc, F.: Support vectors machines based on a semantic kernel for text Categorization. In: Proceedings of the International Joint Conference on Neural Networks. IEEE Press, Como (2000)
11. Leopold, E., Kindermann, J.: Text Categorization with Support Vector Machines. How to Represent Texts in Input Space? Machine Learning 46, 423–444 (2002)
12. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A Training Algorithm for Optimal Margin Classifier. In: Proc. 5th ACM Workshop, Comput. Learning Theory, Pittsburgh, pp. 144–152 (1992)
13. Hsu, C.W., Lin, C.J.: A Comparison of Methods for Multi-Class Support Vector Machines, 415–425 (2002)
14. Bloehdorn, S., Basili, R., Cammisa, M., Moschitti, A.: Semantic kernels for text classification based on topological measures of feature similarity. In: ICDM 2006: Proceedings of the Sixth International Conference on Data Mining, pp. 808–812 (2006)
15. Miller, G., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.: Five Papers on WordNet. Technical report, Stanford University (1993)
16. Miller, Q., Chen, E., Xiong, H.: A Semantic Term Weighting Scheme for Text Categorization. Journal of Expert Systems with Applications (2011)
17. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing & Management 24(5) (1988)
18. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann (2012)
19. Dumais, S.: LSI meets TREC: A status report. In: Hartman, D. (ed.) The First Text Retrieval Conference: NIST Special Publication 500-215, pp. 105–116 (1993)
20. Kontostathis, A., Pottenger, W.M.: A Framework for Understanding LSI Performance. Information Processing & Management, 56–73 (2006)
21. Witten, H.I., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann (1999)
22. Platt, J.C.: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. In: Advances in Kernel Method: Support Vector Learning, pp. 185–208. MIT Press (1998)

Author information

Authors and Affiliations

Department of Computer Engineering, Marmara University, Istanbul, Turkey
Berna Altinel
Department of Computer Engineering, Dogus University, Istanbul, Turkey
Murat Can Ganiz
Department of Computer Engineering, Yildiz Technical University, Istanbul, Turkey
Banu Diri

Editor information

Editors and Affiliations

Częstochowa University of Technology, Armii Krajowej 36, 42-200, Częstochowa, Poland
Leszek Rutkowski, Marcin Korytkowski & Rafał Scherer
AGH University of Science and Technology, Mickiewicza 30, 30-059, Kraków, Poland
Ryszard Tadeusiewicz
Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California Berkeley, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh
Computational Intelligence Laboratory, Electrical and Computer Engineering, University of Louisville, 405 Lutz Hall, 40292, Louisville, KY, USA
Jacek M. Zurada

About this paper

Cite this paper

Altinel, B., Ganiz, M.C., Diri, B. (2014). A Semantic Kernel for Text Classification Based on Iterative Higher–Order Relations between Words and Documents. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8467. Springer, Cham. https://doi.org/10.1007/978-3-319-07173-2_43

DOI: https://doi.org/10.1007/978-3-319-07173-2_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07172-5
Online ISBN: 978-3-319-07173-2
eBook Packages: Computer Science, Computer Science (R0)
