Abstract
Cross-language information retrieval (CLIR) studies how to use a query expressed in one language to search for information expressed in another language. A key problem is establishing bilingual semantic correspondence, and different methods can be adopted for this. In recent years, topic models have become an effective tool in machine learning, information retrieval and natural language processing. This paper systematically studies a cross-language retrieval model, a cross-language text classification method and a cross-language text clustering method. The approach does not rely on cross-language resources such as machine translation or bilingual dictionaries. It can still effectively address the many-to-many problem of vocabulary translation in CLIR and the problem of partial decomposition of unknown words. Experiments were run on the cross-language text classification evaluation corpus built in this paper, using the bilingual topic space constructed by this method. In that space, both cross-language and monolingual text classification perform close to or better than monolingual classification on the original feature space. Likewise, cross-language text clustering performs close to or better than monolingual document clustering.
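The abstract does not state how the bilingual topic space is built or which similarity measure is used in it. The sketch below therefore makes one assumption: every document, in either language, has already been mapped by some bilingual topic model to a probability distribution over the same K shared topics. Under that assumption, cross-language retrieval and classification reduce to comparing topic distributions, regardless of the language the text was written in.

The Hellinger distance and the nearest-centroid classifier are illustrative choices, not the authors' method. The function names and the toy topic vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# ASSUMPTION: each document is already represented by its distribution over K
# shared bilingual topics. Inferring these vectors (e.g. with a bilingual topic
# model) is outside the scope of this hypothetical sketch.
TopicVector = Sequence[float]


def hellinger(p: TopicVector, q: TopicVector) -> float:
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def rank_documents(query: TopicVector, docs: Dict[str, TopicVector]) -> List[str]:
    """Cross-language retrieval: rank target-language documents by topic distance to the query."""
    return sorted(docs, key=lambda doc_id: hellinger(query, docs[doc_id]))


def centroids(labelled: Dict[str, List[TopicVector]]) -> Dict[str, List[float]]:
    """Mean topic vector per class, computed from labelled source-language documents."""
    result: Dict[str, List[float]] = {}
    for label, vectors in labelled.items():
        k = len(vectors[0])
        result[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(k)]
    return result


def classify(doc: TopicVector, class_centroids: Dict[str, List[float]]) -> str:
    """Cross-language classification: return the label of the nearest class centroid."""
    return min(class_centroids, key=lambda label: hellinger(doc, class_centroids[label]))


if __name__ == "__main__":
    # Toy 3-topic space; all numbers are made up for illustration only.
    query_lang_a = [0.7, 0.2, 0.1]  # query written in language A
    docs_lang_b = {"b-1": [0.1, 0.1, 0.8], "b-2": [0.6, 0.3, 0.1]}
    print(rank_documents(query_lang_a, docs_lang_b))  # ['b-2', 'b-1']

    # Train on language-A documents, then classify a language-B document.
    train_lang_a = {
        "sports": [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]],
        "finance": [[0.1, 0.2, 0.7], [0.1, 0.1, 0.8]],
    }
    print(classify([0.2, 0.1, 0.7], centroids(train_lang_a)))  # finance
```

Because both languages share one topic space, a classifier trained only on language-A documents can be applied unchanged to language-B documents. This is the property the abstract's cross-language classification results rely on.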
Funding
1. This paper is supported by the Postgraduate Research and Innovation Project of Northwest Minzu University, "A study on the names and descriptions of the things of Mongolian <qin ding li fan yuan ze li>" (project no. Yxm202009), and is a phased achievement of that project.
2. This paper is supported by the Fundamental Research Funds for the Central Universities project "A Study on the Influence of Ecological Protection Policy of Qilian Mountains on Surrounding Herdsmen from the Perspective of Ecological Safety" (project no. 31920190133).
Copyright information
© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Wu, Rh., Cao, Yj. (2022). Research on Intelligent Retrieval Model of Multilingual Text Information in Corpus. In: Liu, S., Ma, X. (eds) Advanced Hybrid Information Processing. ADHIP 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 416. Springer, Cham. https://doi.org/10.1007/978-3-030-94551-0_3
DOI: https://doi.org/10.1007/978-3-030-94551-0_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-94550-3
Online ISBN: 978-3-030-94551-0
eBook Packages: Computer Science, Computer Science (R0)