Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/2600428.2609561acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

Fusion helps diversification

Published: 03 July 2014 Publication History

Abstract

A popular strategy for search result diversification is to first retrieve a set of documents utilizing a standard retrieval method and then rerank the results. We adopt a different perspective on the problem, based on data fusion. Starting from the hypothesis that data fusion can improve performance in terms of diversity metrics, we examine the impact of standard data fusion methods on result diversification. We take the output of a set of rankers, optimized for diversity or not, and find that data fusion can significantly improve state-of-the art diversification methods. We also introduce a new data fusion method, called diversified data fusion, which infers latent topics of a query using topic modeling, without leveraging outside information. Our experiments show that data fusion methods can enhance the performance of diversification and DDF significantly outperforms existing data fusion methods in terms of diversity metrics.

References

[1]
S. Abbar, S. Amer-Yahia, P. Indyk, and S. Mahabadi. Real-time recommendation of diverse related articles. In WWW, pages 1--12, 2013.
[2]
R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM, pages 5--14, 2009.
[3]
E. Aktolga and J. Allan. Sentiment diversification with different biases. In SIGIR, pages 593--600, 2013.
[4]
J. A. Aslam and M. Montague. Models for metasearch. In SIGIR'01, pages 276--284, 2001.
[5]
D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993--1022, 2003.
[6]
J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, pages 335--336, 1998.
[7]
H. Chen and D. R. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR, pages 429--436, 2006.
[8]
C. L. A. Clarke and N. Craswell. Overview of the TREC 2011 web track. In TREC, pages 1--9, 2011.
[9]
C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR, pages 659--666, 2008.
[10]
C. L. A. Clarke, N. Craswell, and I. Soboroff. Overview of the TREC 2009 web track. In TREC, pages 1--9, 2009.
[11]
C. L. A. Clarke, N. Craswell, I. Soboroff, and G. V. Cormack. Overview of the TREC 2010 web track. In TREC, pages 1--9, 2010.
[12]
C. L. A. Clarke, N. Craswell, and E. M. Voorhees. Overview of the TREC 2012 web track. In TREC, pages 1--8, 2012.
[13]
V. Dang and W. B. Croft. Diversity by proportionality: An election-based approach to search result diversification. In SIGIR, pages 65--74, 2012.
[14]
V. Dang and W. B. Croft. Term level search result diversification. In SIGIR, pages 603--612, 2013.
[15]
M. Efron. Information search and retrieval in microblogs. J. Am. Soc. for Inform. Sci. and Techn., 62(6):996--1008, 2011.
[16]
M. Farah and D. Vanderpooten. An outranking approach for rank aggregation in information retrieval. In SIGIR'07, 2007.
[17]
E. A. Fox and J. A. Shaw. Combination of multiple searches. In TREC-2, 1994.
[18]
T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 101:5228--5235, 2004.
[19]
D. He and D. Wu. Toward a robust data fusion for document retrieval. In IEEE NLP-KE'08, 2008.
[20]
J. He, V. Hollink, and A. de Vries. Combining implicit and explicit topic representations for result diversification. In SIGIR, pages 851--860, 2012.
[21]
T. Hofmann. Probabilistic latent semantic indexing. In SIGIR, pages 50--57, 1999.
[22]
O. Jin, N. N. Liu, K. Zhao, Y. Yu, and Q. Yang. Transferring topical knowledge from auxiliary long texts for short text clustering. In CIKM, pages 775--784, 2011.
[23]
A. K. Kozorovitsky and O. Kurland. Cluster-based fusion of retrieved lists. In SIGIR'11, pages 893--902, 2011.
[24]
T. Kurashima, T. Iwata, T. Hoshide, N. Takaya, and K. Fujimura. Geo topic model: joint modeling of user's activity area and interests for location recommendation. In WSDM, pages 375--384, 2013.
[25]
J. D. Lafferty and D. M. Blei. Correlated topic models. In NIPS'05, pages 147--154, 2005.
[26]
J. H. Lee. Combining multiple evidence from different properties of weighting schemes. In SIGIR'95, pages 180--188, 1995.
[27]
J. H. Lee. Analyses of multiple evidence combination. In SIGIR, 1997.
[28]
F. Li, M. Huang, and X. Zhu. Sentiment analysis with global topics and local dependency. In AAAI, 2010.
[29]
W. Li and A. McCallum. Pachinko allocation: Dag-structured mixture models of topic correlations. In ICML, pages 577--584. ACM, 2006.
[30]
S. Liang and M. de Rijke. Finding knowledgeable groups in enterprise corpora. In SIGIR'13, pages 1005--1008, 2013.
[31]
S. Liang, M. de Rijke, and M. Tsagkias. Late data fusion for microblog search. In ECIR'13, pages 743--746, 2013.
[32]
S. Liang, Z. Ren, and M. de Rijke. The impact of semantic document expansion on cluster-based fusion for microblog search. In ECIR'14, pages 493--499, 2014.
[33]
N. Limsopatham, R. McCreadie, and M.-D. Albakour. University of Glasgow at TREC 2012: Experiments with Terrier in medical records, microblog, and web tracks. In TREC, 2012.
[34]
J. S. Liu. The collapsed gibbs sampler in bayesian computations with applications to a gene regulation problem. J. Am. Stat. Assoc., 89(427):958--966, 1994.
[35]
C. Macdonald and I. Ounis. Voting for candidates: Adapting data fusion techniques for an expert search task. In CIKM, 2006.
[36]
Z. Ren, S. Liang, E. Meij, and M. de Rijke. Personalized time-aware tweets summarization. In SIGIR, 2013.
[37]
M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In UAI, pages 487--494, 2004.
[38]
T. Sakai, Z. Dou, and C. L. A. Clarke. The impact of intent selection on diversified search result. In SIGIR, 2013.
[39]
R. L. Santos, C. Macdonald, and I. Ounis. Exploiting query reformulations for web search result diversification. In WWW, pages 881--890, 2010.
[40]
R. L. Santos, C. Macdonald, and I. Ounis. Intent-aware search result diversification. In SIGIR, pages 595--604, 2011.
[41]
R. L. T. Santos, J. Peng, C. Macdonald, and I. Ounis. Explicit search result diversification through sub-queries. In ECIR, 2010.
[42]
J. A. Shaw and E. A. Fox. Combination of multiple searches. In TREC 1992, pages 243--252. NIST, 1993.
[43]
D. Sheldon, M. Shokouhi, M. Szummer, and N. Craswell. LambdaMerge: merging the results of query reformulations. In WSDM, pages 795--804, 2011.
[44]
I. Szpektor, Y. Maarek, and D. Pelleg. When relevance is not enough: promoting diversity and freshness in personalized question recommendation. In WWW '13, 2013.
[45]
S. Vargas, P. Castells, and D. Vallet. Explicit relevance models in intent-oriented information retrieval diversification. In SIGIR, pages 75--84, 2012.
[46]
X. Wang and A. McCallum. Topics over time: a non-markov continuous-time model of topical trends. In KDD'06, pages 424--433, 2006.
[47]
X. Wei and W. B. Croft. LDA-based document models for ad-hoc retrieval. In SIGIR, pages 178--185, 2006.
[48]
X. Wei, J. Sun, and X. Wang. Dynamic mixture models for multiple time-series. In IJCAI, pages 2909--2914, 2007.
[49]
S. Wu. Data fusion in information retrieval, volume 13 of Adaptation, Learning and Optimization. Springer, 2012.
[50]
Z. Xu, Y. Zhang, Y. Wu, and Q. Yang. Modeling user posting behavior on social media. In SIGIR, pages 545--554, 2012.
[51]
C. Zhai, W. W. Cohen, and J. Lafferty. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In SIGIR, pages 10--17, 2003.

Cited By

View all
  • (2023)Result Diversification for Legal case RetrievalProceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region10.1145/3624918.3625319(158-168)Online publication date: 26-Nov-2023
  • (2023)Predictive edge caching through deep mining of sequential patterns in user content retrievalsComputer Networks10.1016/j.comnet.2023.109866233(109866)Online publication date: Sep-2023
  • (2022)Online Rare Category Identification and Data Diversification for Edge ComputingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2021.308368141:5(1290-1301)Online publication date: May-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
July 2014
1330 pages
ISBN:9781450322577
DOI:10.1145/2600428
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 July 2014

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. ad hoc retrieval
  2. data fusion
  3. diversification
  4. rank aggregation

Qualifiers

  • Research-article

Funding Sources

Conference

SIGIR '14
Sponsor:

Acceptance Rates

SIGIR '14 Paper Acceptance Rate 82 of 387 submissions, 21%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)10
  • Downloads (Last 6 weeks)1
Reflects downloads up to 25 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Result Diversification for Legal case RetrievalProceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region10.1145/3624918.3625319(158-168)Online publication date: 26-Nov-2023
  • (2023)Predictive edge caching through deep mining of sequential patterns in user content retrievalsComputer Networks10.1016/j.comnet.2023.109866233(109866)Online publication date: Sep-2023
  • (2022)Online Rare Category Identification and Data Diversification for Edge ComputingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2021.308368141:5(1290-1301)Online publication date: May-2022
  • (2020)Measures of vote-seat disproportionality for incomplete dataParty Politics10.1177/135406882096838628:1(174-183)Online publication date: 29-Oct-2020
  • (2020)Cluster-Based Document Retrieval with Multiple QueriesProceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval10.1145/3409256.3409825(33-40)Online publication date: 14-Sep-2020
  • (2019)Boosting Search Performance Using Query VariationsACM Transactions on Information Systems10.1145/334500137:4(1-25)Online publication date: 4-Oct-2019
  • (2019)Unsupervised Semantic Generative Adversarial Networks for Expert RetrievalThe World Wide Web Conference10.1145/3308558.3313625(1039-1050)Online publication date: 13-May-2019
  • (2019)Learning Novelty-Aware Ranking of Answers to Complex QuestionsThe World Wide Web Conference10.1145/3308558.3313457(2799-2805)Online publication date: 13-May-2019
  • (2018)Fusion in Information RetrievalThe 41st International ACM SIGIR Conference on Research & Development in Information Retrieval10.1145/3209978.3210186(1383-1386)Online publication date: 27-Jun-2018
  • (2018)Manifold Learning for Rank AggregationProceedings of the 2018 World Wide Web Conference10.1145/3178876.3186085(1735-1744)Online publication date: 10-Apr-2018
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media