Research on Intelligent Retrieval Model of Multilingual Text Information in Corpus

Wu, Ri-han; Cao, Yi-jie

doi:10.1007/978-3-030-94551-0_3

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 416)

Included in the following conference series:

International Conference on Advanced Hybrid Information Processing

Abstract

Cross-language information retrieval (CLIR) concerns how to use a query expressed in one language to search for information expressed in another. One of its key problems is how to establish bilingual semantic correspondence, for which different methods can be adopted. In recent years, topic models have become an effective method in machine learning, information retrieval and natural language processing. This paper systematically studies a cross-language retrieval model, a cross-language text classification method and a cross-language text clustering method. Without relying on cross-language resources such as machine translation or bilingual dictionaries, the approach can effectively address the many-to-many problem of vocabulary translation in CLIR and the problem of partial decomposition of unknown words. Experimental results on the cross-language text classification evaluation corpus established in this paper show that cross-language and single-language text classification on the bilingual topic space constructed by this method perform close to or better than single-language classification on the original feature space, and that cross-language text clustering performs close to or better than single-language document clustering.
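
As a rough illustration of retrieval in a shared bilingual topic space, the following minimal sketch maps an English query and German documents onto the same topics and ranks the documents by cosine similarity. It is not the chapter's model: the topic-word weights, the language pair, and the function names (`topic_vector`, `cosine`, `rank`) are hypothetical toy values chosen for the example. In practice such weights would be learned from a corpus by a topic model.

```python
import math

# Hypothetical toy topic-word weights (assumption, not from the chapter).
# Index i in both lists refers to the same shared topic.
TOPICS_EN = [
    {"bank": 0.5, "loan": 0.3, "money": 0.2},   # topic 0: finance
    {"river": 0.5, "water": 0.3, "fish": 0.2},  # topic 1: nature
]
TOPICS_DE = [
    {"bank": 0.3, "kredit": 0.4, "geld": 0.3},
    {"fluss": 0.5, "wasser": 0.3, "fisch": 0.2},
]


def topic_vector(tokens, topics):
    """Map a token list to a normalized vector over the shared topics."""
    scores = [sum(topic.get(tok, 0.0) for tok in tokens) for topic in topics]
    total = sum(scores)
    if total == 0:
        # No known word: fall back to a uniform distribution.
        return [1.0 / len(topics)] * len(topics)
    return [s / total for s in scores]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_tokens, docs, query_topics, doc_topics):
    """Return document indices ordered by similarity to the query."""
    q = topic_vector(query_tokens, query_topics)
    scored = [(cosine(q, topic_vector(d, doc_topics)), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


docs_de = [["fluss", "wasser"], ["kredit", "geld", "bank"]]
ranking = rank(["loan", "money"], docs_de, TOPICS_EN, TOPICS_DE)
print(ranking)
```

The query and the documents never share a vocabulary item here. They are compared only through their topic vectors, which is why no dictionary or translation step appears in the sketch.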

Funding

1. This paper is supported by the Postgraduate Research and Innovation Project of Northwest Minzu University, “A study on the names and descriptions of the things of Mongolian <qin ding li fan yuan ze li>” (project number Yxm202009), and is a phased achievement of that project.

2. This paper is supported by the Fundamental Research Funds for the Central Universities, “A Study on the Influence of Ecological Protection Policy of Qilian Mountains on Surrounding Herdsmen from the Perspective of Ecological Safety” (project number 31920190133).

Author information

Authors and Affiliations

School of Chinese Language and Literature, Northwest Minzu University, Lanzhou, 730030, China
Ri-han Wu
School of Ethnology and Sociology, Northwest Minzu University, Lanzhou, 730030, China
Yi-jie Cao

Corresponding author

Correspondence to Ri-han Wu.

Editor information

Editors and Affiliations

Hunan Normal University, Changsha, China
Shuai Liu
Harbin Engineering University, Harbin, China
Xuefei Ma

Cite this paper

Wu, Rh., Cao, Yj. (2022). Research on Intelligent Retrieval Model of Multilingual Text Information in Corpus. In: Liu, S., Ma, X. (eds) Advanced Hybrid Information Processing. ADHIP 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 416. Springer, Cham. https://doi.org/10.1007/978-3-030-94551-0_3

DOI: https://doi.org/10.1007/978-3-030-94551-0_3
Published: 18 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-94550-3
Online ISBN: 978-3-030-94551-0
eBook Packages: Computer Science, Computer Science (R0)
