DOI: 10.1145/2063576.2063601

Query sampling for learning data fusion

Published: 24 October 2011

Abstract

Data fusion merges the results of multiple independent retrieval models into a single ranked list. Several earlier studies have shown that combining different models can yield better retrieval performance than any individual model alone. Although supervised fusion methods have produced many promising results, the sampling of training data has attracted little attention in previous work on data fusion. By examining evaluations on TREC and NTCIR datasets, we found that the performance of a model varied considerably from one training example to another, so not all training examples were equally effective. In this paper, we propose two novel approaches, a greedy approach and a boosting approach, which select effective training data by query sampling to improve the performance of supervised data fusion algorithms such as BayesFuse, probFuse, and MAPFuse. Extensive experiments were conducted on five datasets: TREC-3, TREC-4, TREC-5, NTCIR-3, and NTCIR-4. The results show that our sampling approaches can significantly improve the retrieval performance of these data fusion methods.
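The full algorithms are given in the paper itself; as a rough illustration of the greedy idea described in the abstract, the sketch below performs forward selection over candidate training queries, at each step keeping the query whose addition most improves a fusion model's score on a validation query set. The callables `train_fusion` and `evaluate` are hypothetical placeholders (e.g. fitting a probFuse-style model and scoring it with MAP), not the authors' actual interfaces.

```python
# A minimal sketch of greedy query sampling for supervised data fusion,
# based only on the abstract's description. Assumptions (hypothetical):
#   train_fusion(queries) -> a fusion model fitted on the given queries
#   evaluate(model, queries) -> mean effectiveness (e.g. MAP) on those queries
def greedy_query_sampling(candidate_queries, validation_queries,
                          train_fusion, evaluate, budget):
    """Greedily pick up to `budget` training queries, adding at each step
    the query whose inclusion most improves validation performance."""
    selected = []
    remaining = list(candidate_queries)
    best_score = float("-inf")

    while remaining and len(selected) < budget:
        step_best_query, step_best_score = None, best_score
        for q in remaining:
            model = train_fusion(selected + [q])
            score = evaluate(model, validation_queries)
            if score > step_best_score:
                step_best_query, step_best_score = q, score
        if step_best_query is None:
            break  # no remaining candidate improves the current selection
        selected.append(step_best_query)
        remaining.remove(step_best_query)
        best_score = step_best_score

    return selected
```

The sketch stops early once no candidate query improves validation performance, which reflects the abstract's observation that some training examples contribute little or even hurt the fused ranking.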



Published In

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
October 2011
2712 pages
ISBN: 9781450307178
DOI: 10.1145/2063576

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adafuse
  2. data fusion
  3. query sampling

Qualifiers

  • Research-article

Conference

CIKM '11

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
