With the development of the internet, the volume of available information has grown rapidly, and a growing number of scientific documents containing many mathematical expressions have been published. This influx has made it increasingly difficult to find useful information in them. Mathematical expressions are an important component of scientific documents for describing scientific content. Therefore, retrieving scientific documents by using mathematical expressions as queries has become a necessary way for researchers to find the scientific information they need [1]. However, traditional search engines designed for full-text retrieval do not work well on math queries because of the special characteristics of mathematical expressions. It is therefore necessary to treat mathematical expressions as the main subject of scientific document retrieval.
1.1. Related Work
At present, text-based scientific document retrieval technology is largely mature [2,3,4]. However, mathematical expression-based retrieval is still under development. In recent years, many researchers have made progress in retrieving mathematical expressions. Previous models have applied three main approaches: (a) Operator trees (OPTs), which capture the syntax of a mathematical expression [5]. For instance, Zhong et al. [6] proposed a dynamic pruning algorithm for the substructure retrieval of mathematical expressions. This algorithm represents each mathematical expression as an OPT, which improves retrieval efficiency. (b) Symbol layout trees (SLTs), which capture the appearance of a mathematical expression [7,8]. (c) Embedding models, which convert two-dimensional mathematical expressions into one-dimensional vectors by using word embedding models [9,10,11,12].
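To make the operator-tree idea concrete, the following is a minimal sketch that builds a nested operator tree for a small expression and checks whether a query tree occurs as a substructure of a candidate. It is an illustration only and does not reproduce the algorithm of any cited work: it assumes that Python's own expression grammar can stand in for mathematical notation, and the helper names `operator_tree` and `subtrees` are hypothetical.

```python
import ast

# Map Python AST operator classes to printable symbols (illustrative subset).
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/", ast.Pow: "^"}


def operator_tree(node):
    """Convert a parsed expression into a nested (operator, left, right) tuple."""
    if isinstance(node, ast.Expression):
        return operator_tree(node.body)
    if isinstance(node, ast.BinOp):
        return (_OPS[type(node.op)],
                operator_tree(node.left),
                operator_tree(node.right))
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported node: {type(node).__name__}")


def subtrees(tree):
    """Yield the tree itself and every subtree below it."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from subtrees(child)


# Assumption: expressions are written in Python syntax, so ** denotes a power.
query = operator_tree(ast.parse("a + b", mode="eval"))
candidate = operator_tree(ast.parse("(a + b) * c ** 2", mode="eval"))

print(candidate)
print(query in set(subtrees(candidate)))
```

An exact subtree match like this ignores commutativity and variable renaming, which a practical substructure search would need to handle.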
Mathematical expressions have a complex two-dimensional structure, so it is reasonable to introduce multi-criteria decision-making (MCDM) theory into their retrieval. MCDM theory has advanced considerably in recent years [13,14,15] and has been applied with good results to the location of a fleet [16], the selection of warships [17], and information security risk assessment in critical infrastructure [18]. Hesitant fuzzy sets (HFSs), an extension of fuzzy sets, are one of the tools used in MCDM and have produced results both in theory and in numerous applied fields [19,20]. HFSs have proven to be a promising structure for expressing uncertainty and vagueness [21,22]: they can measure the impact of each attribute on a decision in an integrated way and are flexible in expressing hesitant information. Given these advantages and the breadth of their applications, we consider them well suited to the retrieval of mathematical expressions with multiple attributes.
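As an illustration of how hesitant fuzzy information can be compared across several attributes, the sketch below uses the normalized Hamming distance between hesitant fuzzy elements, a standard definition from the HFS literature. It is not the procedure of any cited work: the attribute names, the membership degrees, and the unweighted averaging in `similarity` are assumptions made only for this example, and the sketch assumes that the two elements being compared contain the same number of membership degrees.

```python
def hamming_distance(h1, h2):
    """Normalized Hamming distance between two equal-length hesitant fuzzy elements."""
    if len(h1) != len(h2):
        raise ValueError("this sketch assumes equal-length elements")
    # Compare the membership degrees in sorted order, position by position.
    a, b = sorted(h1), sorted(h2)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def similarity(query, candidate):
    """Unweighted mean of (1 - distance) over the attributes of the query."""
    total = sum(1 - hamming_distance(query[k], candidate[k]) for k in query)
    return total / len(query)


# Hypothetical attributes and membership degrees, chosen only for illustration.
query = {"structure": [0.9, 0.8], "symbols": [0.7, 0.6]}
candidate = {"structure": [0.8, 0.7], "symbols": [0.7, 0.4]}

print(round(similarity(query, candidate), 3))
```

Each attribute keeps its whole set of possible membership degrees until the final aggregation, which is what lets the representation carry hesitation rather than a single crisp score.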
In terms of text similarity, Bromley et al. [23] first proposed the Siamese network in 1993. Its parameter-sharing property makes it well suited to calculating sentence similarity. Building on this architecture, Wang et al. [24] proposed a bilateral multi-perspective matching model that combines a bidirectional LSTM with the Siamese network. The sentences are bilaterally encoded and matched from multiple perspectives to obtain the final sentence similarity, which allows the model to better capture the semantic information of sentences. Liu et al. [25] proposed a sentence similarity model with multi-feature fusion that introduces syntactic structure and word order features and improves the accuracy of sentence similarity.
In the combined retrieval of mathematical expressions and text, Zhong et al. [26] used an improved OPT algorithm to retrieve mathematical expressions and mined potential keywords from the context as query expansions, which captured the semantics of mathematical expressions and enabled more accurate retrieval of relevant mathematical content. Because the semantics of the context of a mathematical expression are difficult to capture, Kristianto et al. [27] proposed a dependency graph method to enrich the semantic information of mathematical expressions; their experimental results showed that the accuracy of the mathematical search system could be improved by 13%. Tian et al. [28] proposed a scientific document retrieval method based on the hesitant fuzzy set and BERT. They first used the hesitant fuzzy set to retrieve mathematical expressions, then used BERT to encode keywords into word vectors, and finally used cosine similarity to calculate the similarity between two keywords. On this basis, Tian et al. [29] extracted full-text keywords, used the GBDT model to discretize and recombine the mathematical expression and text attributes, and finally used the LR model to train on these attributes to obtain the final retrieval results. The results showed that retrieval combining mathematical expressions with the context of the scientific document was more reasonable. Pathak et al. [30,31] designed a knowledge base (KB) of formula-context pairs, extracting a total of 12,573 pairs of formulas and their contexts, and considered the similarity between mathematical expressions, contexts, and documents. Because this method considers the relationship between a mathematical expression and its context, it makes the retrieval more credible. In 2019, Yuan et al. [32] proposed “MathSum”, a new summarization model based on mathematical content, which uses the pointer mechanism and the multi-head attention mechanism to extract the mathematical content of the text and enrich the semantics of mathematical expressions, respectively, providing new ideas for retrieving scientific documents. Also in 2019, Dhar et al. [33] proposed a signature-based hashing scheme and used it to build “SigMa”, a search engine based on mathematical expressions that retrieves documents by perceiving the high-level structure of mathematical expressions; this addresses the poor fit between scientific texts rich in mathematical expressions and traditional text retrieval systems. Scharpf et al. [34] applied mathematical expressions to a document recommendation system by annotating the variables and constants of mathematical expressions; the method disambiguates mathematical identifiers and achieves good results.
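Several of the methods above compare keywords by the cosine similarity of their vector encodings. The following is a minimal sketch of that final step only. The short vectors are toy values assumed for illustration and are not the output of BERT or of any cited system, and the function name `cosine_similarity` is our own.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch: a zero vector matches nothing
    return dot / (norm_u * norm_v)


# Toy keyword vectors (assumed values, not real embeddings).
keyword_a = [1.0, 0.0, 1.0]
keyword_b = [1.0, 0.0, 1.0]
keyword_c = [0.0, 1.0, 0.0]

print(cosine_similarity(keyword_a, keyword_b))
print(cosine_similarity(keyword_a, keyword_c))
```

Because the measure depends only on direction, two keyword vectors with different magnitudes but the same orientation are treated as identical.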
In conclusion, there are three main approaches to scientific document retrieval: those based on text, those based on mathematical expressions, and those based on the fusion of mathematical expressions and text. Neither mathematical expressions nor text alone can describe a scientific document completely, so most current methods fuse mathematical expressions with keywords from the text. However, keywords carry little information and are prone to inaccurate extraction, so obtaining more text information related to mathematical expressions remains a major open problem. In addition, all of these approaches ignore the ontology properties of the scientific document, which makes it difficult for the retrieval model to meet the needs of users.