Leveraging semantic resources in diversified query expansion

Published: 01 July 2018

Abstract

A search query, being a very concise expression of user intent, can have many possible interpretations. Search engines hedge their bets by diversifying top results to cover multiple such interpretations, so that the user is likely to be satisfied whatever her intended interpretation. Diversified Query Expansion is the problem of diversifying query expansion suggestions, so that the user can specialize the query to better suit her intent even before perusing search results. In this paper, we consider the use of semantic resources and tools to arrive at improved methods for diversified query expansion. In particular, we develop two methods that leverage Wikipedia and pre-learnt distributional word embeddings, respectively. Both approaches operate on a common three-phase framework: first selecting a set of informative terms from the search results of the initial query, then building a graph over those terms, and finally using a diversity-conscious node ranking to prioritize candidate terms for diversified query expansion. Our methods differ in the second phase: the first method, Select-Link-Rank (SLR), links terms with Wikipedia entities to accomplish graph construction, whereas the second method, Select-Embed-Rank (SER), constructs the graph using similarities between distributional word embeddings. Through an empirical analysis and a user study, we show that SLR outperforms state-of-the-art diversified query expansion methods, thus establishing that Wikipedia is an effective resource to aid diversified query expansion. Our empirical analysis also illustrates that SER outperforms the baselines convincingly, asserting that it is the best available method for those cases where SLR is not applicable; these include narrow-focus search systems where a relevant knowledge base is unavailable. Our SLR method is also seen to outperform a state-of-the-art method in the task of diversified entity ranking.
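
The three-phase framework described in the abstract can be illustrated with a small sketch in the spirit of SER. Everything below is an assumption made for illustration, not the paper's implementation: the function names (`select_terms`, `build_graph`, `diversified_rank`), the document-frequency term score, the similarity `threshold`, the toy two-dimensional embeddings, and in particular the greedy MMR-style selection, which is swapped in here for the paper's diversity-conscious node ranking because the abstract does not specify that ranking.

```python
import math


def select_terms(result_docs, query_terms, k):
    """Phase 1 (select): score each non-query term by the fraction of
    result documents containing it and keep the top k (assumed scoring)."""
    n = len(result_docs)
    df = {}
    for doc in result_docs:
        for term in set(doc) - set(query_terms):
            df[term] = df.get(term, 0) + 1
    ranked = sorted(df, key=lambda t: (-df[t], t))[:k]
    return {t: df[t] / n for t in ranked}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(terms, embeddings, threshold=0.5):
    """Phase 2 (embed): connect two terms when the cosine similarity of
    their embeddings reaches the (hypothetical) threshold."""
    terms = list(terms)
    graph = {t: {} for t in terms}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim >= threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph


def diversified_rank(graph, relevance, k, penalty=0.5):
    """Phase 3 (rank): greedy MMR-style stand-in. Each step picks the term
    with the best trade-off between relevance and similarity to the terms
    already picked."""
    picked = []
    candidates = sorted(graph)
    while candidates and len(picked) < k:
        def gain(term):
            redundancy = max(
                (graph[term].get(p, 0.0) for p in picked), default=0.0
            )
            return (1 - penalty) * relevance[term] - penalty * redundancy

        best = max(candidates, key=gain)
        picked.append(best)
        candidates.remove(best)
    return picked


# Toy data for the ambiguous query "jaguar" (made up for illustration).
docs = [
    ["jaguar", "car", "engine"],
    ["jaguar", "cat", "wildlife"],
    ["jaguar", "car", "cat"],
]
embeddings = {
    "car": (1.0, 0.0),
    "engine": (0.9, 0.1),
    "cat": (0.0, 1.0),
    "wildlife": (0.1, 0.9),
}

relevance = select_terms(docs, {"jaguar"}, k=4)
graph = build_graph(relevance, embeddings)
expansions = diversified_rank(graph, relevance, k=2)
print(expansions)  # ['car', 'cat']
```

In this toy run the two suggestions cover both senses of the query (the vehicle and the animal) instead of returning two near-synonyms from the same sense, which is the behaviour diversified query expansion aims for.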

Published In

World Wide Web, Volume 21, Issue 4
Jul 2018
401 pages

Publisher

Kluwer Academic Publishers, United States

Author Tags

1. Diversification
2. Entity ranking
3. Query expansion
4. Semantic search
5. Wikipedia

Cited By

• (2024) Identifying Large Structural Balanced Cliques in Signed Graphs. IEEE Transactions on Knowledge and Data Engineering 36:3 (1145-1160). DOI: 10.1109/TKDE.2023.3295803. Online publication date: 1-Mar-2024
• (2024) Event-Specific Document Ranking Through Multi-stage Query Expansion Using an Event Knowledge Graph. Advances in Information Retrieval (333-348). DOI: 10.1007/978-3-031-56060-6_22. Online publication date: 24-Mar-2024
• (2023) Balanced Clique Computation in Signed Networks: Concepts and Algorithms. IEEE Transactions on Knowledge and Data Engineering 35:11 (11079-11092). DOI: 10.1109/TKDE.2022.3225562. Online publication date: 1-Nov-2023
• (2022) An overview of cluster-based image search result organization: background, techniques, and ongoing challenges. Knowledge and Information Systems 64:3 (589-642). DOI: 10.1007/s10115-021-01650-9. Online publication date: 1-Mar-2022
• (2021) Semantics-based key concepts identification for documents indexing and retrieval on the web. International Journal of Innovative Computing and Applications 12:1 (1-12). DOI: 10.1504/ijica.2021.113608. Online publication date: 5-Mar-2021
• (2021) MS MARCO Chameleons: Challenging the MS MARCO Leaderboard with Extremely Obstinate Queries. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (4426-4435). DOI: 10.1145/3459637.3482011. Online publication date: 26-Oct-2021
• (2020) Unsupervised Fake News Detection. Proceedings of the 31st ACM Conference on Hypertext and Social Media (75-83). DOI: 10.1145/3372923.3404783. Online publication date: 13-Jul-2020
• (2020) Efficient Maximal Balanced Clique Enumeration in Signed Networks. Proceedings of The Web Conference 2020 (339-349). DOI: 10.1145/3366423.3380119. Online publication date: 20-Apr-2020
• (2018) Utilizing Knowledge Graphs for Text-Centric Information Retrieval. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (1387-1390). DOI: 10.1145/3209978.3210187. Online publication date: 27-Jun-2018
