Evaluating semantic similarity between concepts is a common component of many applications that process textual data, such as information extraction, information retrieval, natural language processing, and knowledge acquisition. This paper presents an approach to assessing semantic similarity between Vietnamese concepts using Vietnamese Wikipedia. First, the structure of Vietnamese Wikipedia is exploited to derive a Vietnamese ontology. Next, based on the resulting ontology, we apply similarity measures from the literature to evaluate the semantic similarity between Vietnamese concepts. We then conduct an experiment in which 18 human subjects assess the similarity of 30 Vietnamese concept pairs. Finally, we use the Pearson product-moment correlation coefficient to estimate the correlation between the human judgments and the scores produced by the similarity measures. The experimental results show that our system achieves quite good performance and that similarity measures between Vietnamese concepts have the potential to improve the performance of applications that process textual data.
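The evaluation pipeline described in the abstract (score concept pairs against a taxonomy, then correlate those scores with human ratings) can be illustrated with a small sketch. The abstract does not say which measures were used, so this sketch assumes Wu and Palmer's path-based measure as one representative ontology-based measure. It also assumes a hypothetical toy is-a hierarchy in place of the ontology derived from Vietnamese Wikipedia (concept labels are in English for readability) and invented human ratings. All names (`PARENT`, `wu_palmer`, `pearson`) are illustrative, not the paper's implementation.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent is-a edges), standing in for a
# hierarchy derived from Wikipedia categories. NOT the paper's actual ontology.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "bicycle": "vehicle",
}


def ancestors(concept: str) -> List[str]:
    """Return the path from `concept` up to the root, inclusive."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(concept: str) -> int:
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lcs(c1: str, c2: str) -> str:
    """Least common subsumer: the deepest ancestor shared by both concepts."""
    shared = set(ancestors(c2))
    for node in ancestors(c1):  # walks upward, so the first match is the deepest
        if node in shared:
            return node
    raise ValueError("concepts share no ancestor")


def wu_palmer(c1: str, c2: str) -> float:
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(c1) + depth(c2))."""
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    pairs = [("dog", "cat"), ("car", "bicycle"), ("dog", "car"), ("cat", "bicycle")]
    # Invented mean human ratings (0-4 scale), for illustration only.
    human = [3.2, 2.9, 0.6, 0.4]
    system = [wu_palmer(a, b) for a, b in pairs]
    print("measure scores:", system)
    print("Pearson r:", pearson(system, human))
```

Swapping in a different measure from the literature (for example, an information-content-based one) would only change the scoring function; the correlation step against human judgments stays the same.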
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, H.T. (2015). Computing Semantic Similarity for Vietnamese Concepts Using Wikipedia. In: Dang, Q., Nguyen, X., Le, H., Nguyen, V., Bao, V. (eds) Some Current Advanced Researches on Information and Computer Science in Vietnam. NAFOSTED 2014. Advances in Intelligent Systems and Computing, vol 341. Springer, Cham. https://doi.org/10.1007/978-3-319-14633-1_7
DOI: https://doi.org/10.1007/978-3-319-14633-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14632-4
Online ISBN: 978-3-319-14633-1
eBook Packages: Engineering, Engineering (R0)