research-article
DOI: 10.1145/2348283.2348397

Combining implicit and explicit topic representations for result diversification

Published: 12 August 2012

Abstract

Result diversification deals with ambiguous or multi-faceted queries by providing documents that cover as many subtopics of a query as possible. Various approaches to subtopic modeling have been proposed. Subtopics have been extracted internally, e.g., from retrieved documents, and externally, e.g., from Web resources such as query logs. Internally modeled subtopics are often implicitly represented, e.g., as latent topics, while externally modeled subtopics are often explicitly represented, e.g., as reformulated queries.
We propose a framework that: i) combines both implicitly and explicitly represented subtopics; and ii) allows flexible combination of multiple external resources in a transparent and unified manner. Specifically, we use a random-walk-based approach to estimate the similarities of the explicit subtopics mined from a number of heterogeneous resources: click logs, anchor text, and web n-grams. We then use these similarities to regularize the latent topics extracted from the top-ranked documents, i.e., the internal (implicit) subtopics. Empirical results show that regularization with explicit subtopics extracted from the right resource leads to improved diversification results, indicating that the proposed regularization with (explicit) external resources forms better (implicit) topic models. Click logs and anchor text are shown to be more effective resources than web n-grams under current experimental settings. Combining resources does not always lead to better results, but achieves a robust performance. This robustness is important for two reasons: it cannot be predicted which resources will be most effective for a given query, and it is not yet known how to reliably determine the optimal model parameters for building implicit topic models.
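
To make the random-walk idea above concrete, here is a minimal sketch of one common way to derive similarities between explicit subtopics from a click log: a two-step walk on a bipartite subtopic-URL graph. This is an illustration only and should not be read as the paper's actual formulation, which the abstract does not specify. The function name `two_step_walk_similarity`, the toy queries, and the URLs are all hypothetical, and every subtopic is assumed to have at least one click.

```python
from collections import defaultdict


def two_step_walk_similarity(clicks):
    """Estimate subtopic-to-subtopic similarity by a two-step random walk.

    clicks maps each subtopic (e.g. a reformulated query) to a dict of
    {url: click_count}. The walk goes subtopic -> url -> subtopic, and the
    similarity of s to t is the probability of ending at t when starting at s.
    Assumes every subtopic has at least one click (hypothetical input format).
    """
    # Reverse index: url -> {subtopic: click_count}
    by_url = defaultdict(dict)
    for subtopic, urls in clicks.items():
        for url, count in urls.items():
            by_url[url][subtopic] = count

    sim = {}
    for s, urls in clicks.items():
        row = defaultdict(float)
        s_total = sum(urls.values())
        for url, count in urls.items():
            p_forward = count / s_total            # P(url | s)
            url_total = sum(by_url[url].values())
            for t, back in by_url[url].items():
                row[t] += p_forward * back / url_total  # P(url | s) * P(t | url)
        sim[s] = dict(row)
    return sim


# Hypothetical toy click log for the ambiguous query "jaguar".
clicks = {
    "jaguar car": {"jaguar.com": 3, "cars.example": 1},
    "jaguar price": {"jaguar.com": 1, "cars.example": 1},
    "jaguar animal": {"zoo.example": 2},
}
sim = two_step_walk_similarity(clicks)
```

In this toy log, "jaguar car" and "jaguar price" share clicked URLs and so receive a non-zero similarity, while "jaguar animal" stays disconnected from both. A similarity matrix of this kind is the sort of signal that could then serve as a regularizer when fitting latent topics, though how the paper does so is not described in the abstract.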

Published In

SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
August 2012
1236 pages
ISBN: 9781450314725
DOI: 10.1145/2348283

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. diversity
  2. random walk
  3. subtopics

Qualifiers

  • Research-article

Conference

SIGIR '12

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 1
Reflects downloads up to 15 Oct 2024

Cited By

  • (2023) Search Result Diversification Using Query Aspects as Bottlenecks. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 3040-3051. DOI: 10.1145/3583780.3615050. Online publication date: 21-Oct-2023
  • (2022) Towards Explainable Search Results. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 669-680. DOI: 10.1145/3477495.3532067. Online publication date: 6-Jul-2022
  • (2020) Coverage-based query subtopic diversification leveraging semantic relevance. Knowledge and Information Systems, 62:7, pp. 2873-2891. DOI: 10.1007/s10115-020-01470-3. Online publication date: 27-Apr-2020
  • (2018) From Greedy Selection to Exploratory Decision-Making. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 125-134. DOI: 10.1145/3209978.3209979. Online publication date: 27-Jun-2018
  • (2018) Learning to Collaborate. Proceedings of the 2018 World Wide Web Conference, pp. 1939-1948. DOI: 10.1145/3178876.3186165. Online publication date: 10-Apr-2018
  • (2018) Scalable Aspects Learning for Intent-Aware Diversified Search on Social Networks. IEEE Access, 6, pp. 37124-37137. DOI: 10.1109/ACCESS.2018.2850935. Online publication date: 2018
  • (2018) Leveraging semantic resources in diversified query expansion. World Wide Web, 21:4, pp. 1041-1067. DOI: 10.1007/s11280-017-0468-7. Online publication date: 1-Jul-2018
  • (2018) Novel Approaches to Accelerating the Convergence Rate of Markov Decision Process for Search Result Diversification. Database Systems for Advanced Applications, pp. 184-200. DOI: 10.1007/978-3-319-91458-9_11. Online publication date: 12-May-2018
  • (2017) Displaying the Amount of Missed Information in Recall-Oriented Tasks. Transactions of the Japanese Society for Artificial Intelligence, 32:1, pp. WII-G_1-12. DOI: 10.1527/tjsai.WII-G. Online publication date: 2017
  • (2017) Adapting Markov Decision Process for Search Result Diversification. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 535-544. DOI: 10.1145/3077136.3080775. Online publication date: 7-Aug-2017
