
Computing Semantic Similarity for Vietnamese Concepts Using Wikipedia

  • Conference paper
Some Current Advanced Researches on Information and Computer Science in Vietnam (NAFOSTED 2014)

Abstract

Evaluating semantic similarity between concepts is a common component of many applications that process textual data, such as information extraction, information retrieval, natural language processing, and knowledge acquisition. This paper presents an approach to assessing semantic similarity between Vietnamese concepts using Vietnamese Wikipedia. First, the structure of Vietnamese Wikipedia is exploited to derive a Vietnamese ontology. Next, based on the obtained ontology, we apply similarity measures from the literature to evaluate the semantic similarity between Vietnamese concepts. We then conduct an experiment in which 18 human subjects rate the similarity of 30 Vietnamese concept pairs. Finally, we use the Pearson product-moment correlation coefficient to estimate the correlation between the human judgments and the scores produced by the similarity measures. The experimental results show that our system achieves quite good performance and that similarity measures between Vietnamese concepts have the potential to enhance the performance of applications that process textual data.
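The evaluation pipeline described in the abstract (score concept pairs with a taxonomy-based similarity measure, then correlate those scores with averaged human ratings using Pearson's r) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the small `PARENT` taxonomy is a hypothetical stand-in for the Wikipedia-derived ontology, Wu & Palmer's measure is used only as one example of a taxonomy-based measure from the literature, and the concept pairs and human ratings are invented for demonstration and are not the paper's data.

```python
from math import sqrt
from statistics import mean

# HYPOTHETICAL toy taxonomy (child -> parent "is-a" edges). It stands in for
# an ontology derived from Wikipedia's category structure; not the paper's data.
PARENT = {
    "động vật": None,     # root ("animal")
    "thú": "động vật",    # "mammal"
    "chim": "động vật",   # "bird"
    "chó": "thú",         # "dog"
    "mèo": "thú",         # "cat"
    "gà": "chim",         # "chicken"
}


def ancestors(concept):
    """Return the path [concept, parent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    ancestors_b = set(ancestors(b))
    for node in ancestors(a):  # walks upward from a, so the first hit is deepest
        if node in ancestors_b:
            return node
    raise ValueError("concepts share no common ancestor")


def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


def pearson(xs, ys):
    """Pearson product-moment correlation (undefined if either list is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# INVENTED ratings from three hypothetical raters per pair (0-4 scale).
human_ratings = {
    ("chó", "mèo"): [3, 4, 3],
    ("chó", "gà"): [1, 2, 1],
    ("chó", "chó"): [4, 4, 4],
}
pairs = list(human_ratings)
machine_scores = [wu_palmer(a, b) for a, b in pairs]
human_scores = [mean(human_ratings[p]) for p in pairs]  # average over raters
r = pearson(machine_scores, human_scores)
print(f"Pearson r = {r:.3f}")
```

Because the evaluation step only needs one score per pair, another measure (for example an information-content measure such as Resnik's or Lin's) could replace `wu_palmer` without changing the correlation code.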


Notes

  1. http://wordnet.princeton.edu/.
  2. https://vi.wikipedia.org.
  3. http://www.nlm.nih.gov/research/umls/.
  4. Accessed April 29, 2014.


Author information

Corresponding author

Correspondence to Hien T. Nguyen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, H.T. (2015). Computing Semantic Similarity for Vietnamese Concepts Using Wikipedia. In: Dang, Q., Nguyen, X., Le, H., Nguyen, V., Bao, V. (eds) Some Current Advanced Researches on Information and Computer Science in Vietnam. NAFOSTED 2014. Advances in Intelligent Systems and Computing, vol 341. Springer, Cham. https://doi.org/10.1007/978-3-319-14633-1_7

  • DOI: https://doi.org/10.1007/978-3-319-14633-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14632-4

  • Online ISBN: 978-3-319-14633-1

  • eBook Packages: Engineering, Engineering (R0)
