Research on Intelligent Retrieval Model of Multilingual Text Information in Corpus

  • Conference paper
Advanced Hybrid Information Processing (ADHIP 2021)

Abstract

Cross-language information retrieval (CLIR) studies how to use a query expressed in one language to search for information expressed in another language. One of its key problems is how to establish semantic correspondence between the two languages. In recent years, topic models have become an effective method in machine learning, information retrieval, and natural language processing. This paper systematically studies a cross-language retrieval model, a cross-language text classification method, and a cross-language text clustering method. Without relying on cross-language resources such as machine translation or bilingual dictionaries, the approach can effectively address the many-to-many problem of vocabulary translation in CLIR and the problem of partial decomposition of unknown words. Experimental results on the cross-language text classification evaluation corpus established in this paper show that cross-language and single-language text classification in the bilingual topic space constructed by this method perform close to or better than single-language classification in the original feature space, and that cross-language text clustering performs close to or better than single-language document clustering.

Funding

1. This paper is supported by the Postgraduate Research and Innovation Project of Northwest Minzu University, “A study on the names and descriptions of the things of Mongolian <qin ding li fan yuan ze li>” (project number Yxm202009), and is a phased achievement of that project.

2. This paper is supported by the Fundamental Research Funds for the Central Universities, “A Study on the Influence of Ecological Protection Policy of Qilian Mountains on Surrounding Herdsmen from the Perspective of Ecological Safety” (project number 31920190133).

Author information

Corresponding author

Correspondence to Ri-han Wu.

Copyright information

© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Wu, Rh., Cao, Yj. (2022). Research on Intelligent Retrieval Model of Multilingual Text Information in Corpus. In: Liu, S., Ma, X. (eds) Advanced Hybrid Information Processing. ADHIP 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 416. Springer, Cham. https://doi.org/10.1007/978-3-030-94551-0_3

  • DOI: https://doi.org/10.1007/978-3-030-94551-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-94550-3

  • Online ISBN: 978-3-030-94551-0

  • eBook Packages: Computer Science, Computer Science (R0)
