DOI: 10.1145/1076034.1076050
Article

Server selection methods in hybrid portal search

Published: 15 August 2005

Abstract

The TREC .GOV collection makes a valuable web testbed for distributed information retrieval methods because it is naturally partitioned and includes 725 web-oriented queries with judged answers. It can usefully model aspects of government and large corporate portals. Analysis of the .gov data shows that a purely distributed approach would not be feasible for providing search on a .gov portal, because of the large number of web sites (17,000+) and the high proportion of them that do not provide a search interface. An alternative hybrid approach, combining distributed and centralized techniques, is proposed, and server selection methods are evaluated within this framework using a web-oriented evaluation methodology. A number of well-known algorithms are compared against two representatives, highest anchor ranked page (HARP) and anchor weighted sum (AWSUM), of a new family of selection methods that use link anchortext extracted from an auxiliary crawl to describe sites that are not themselves crawled. Of the previously published methods, ReDDE substantially outperformed three variants of CORI and also outperformed a method based on extended Kullback-Leibler (KL) divergence, except on topic distillation. HARP and AWSUM performed best overall but were outperformed on the topic distillation task by extended KL divergence.
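The abstract describes HARP and AWSUM only by name and by their use of anchortext from an auxiliary crawl. The Python sketch below is one hypothetical reading of those names, not the authors' definitions. It assumes that the anchortext pointing at each target page has already been gathered into a surrogate document and ranked for the query by some retrieval model, and that a site ("server") is identified by its host name. The helpers `server_of`, `harp` and `awsum`, the example URLs, and the scores are all illustrative assumptions (Python 3.9+).

```python
from collections import defaultdict
from urllib.parse import urlsplit


def server_of(url: str) -> str:
    """Hypothetical helper: identify a site ("server") by its host name."""
    return urlsplit(url).netloc.lower()


def harp(ranked_pages: list[tuple[str, float]]) -> list[str]:
    """HARP-style sketch: order servers by the rank of their best anchor document.

    `ranked_pages` holds (url, score) pairs for anchortext surrogate documents,
    assumed to be sorted by descending score by some anchortext retrieval model.
    """
    best_rank: dict[str, int] = {}
    for rank, (url, _score) in enumerate(ranked_pages):
        # The first time a server appears is the rank of its highest-ranked page.
        best_rank.setdefault(server_of(url), rank)
    return sorted(best_rank, key=best_rank.__getitem__)


def awsum(ranked_pages: list[tuple[str, float]]) -> list[str]:
    """AWSUM-style sketch: order servers by the summed scores of their anchor documents."""
    totals: dict[str, float] = defaultdict(float)
    for url, score in ranked_pages:
        totals[server_of(url)] += score
    # Highest total first; ties are broken alphabetically for a stable order.
    return sorted(totals, key=lambda server: (-totals[server], server))


if __name__ == "__main__":
    # Made-up anchor-document ranking for one query (scores are illustrative only).
    pages = [
        ("http://www.nasa.gov/missions/", 9.1),
        ("http://www.epa.gov/air/", 7.4),
        ("http://www.epa.gov/water/", 6.8),
        ("http://www.epa.gov/", 5.0),
        ("http://www.nasa.gov/", 2.0),
    ]
    print(harp(pages))   # ['www.nasa.gov', 'www.epa.gov']
    print(awsum(pages))  # ['www.epa.gov', 'www.nasa.gov']
```

With this toy ranking the two rules disagree. The HARP-style rule favors www.nasa.gov because that site owns the single top-ranked anchor document. The AWSUM-style rule favors www.epa.gov, which accumulates anchor evidence over several pages. The example shows how the two aggregation choices can rank sites differently even when they start from identical anchor evidence.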

Published In

ACM Conferences
SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN: 1595930345
DOI: 10.1145/1076034
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR '05

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

  • (2018) Visual Analysis of Distributed Search Traffic in a Peer-to-peer Network. 2018 10th International Conference on Communication Software and Networks (ICCSN), pp. 189-194. DOI: 10.1109/ICCSN.2018.8488222. Online publication date: Jul-2018
  • (2018) Visual Analysis of Distributed Search Traffic in a Peer-to-peer Network. 2018 13th APCA International Conference on Control and Soft Computing (CONTROLO), pp. 189-194. DOI: 10.1109/CONTROLO.2018.8439791. Online publication date: Jun-2018
  • (2018) Searching Digital Libraries. Encyclopedia of Database Systems, pp. 3333-3337. DOI: 10.1007/978-1-4614-8265-9_327. Online publication date: 7-Dec-2018
  • (2017) Distributed Search Efficiency and Robustness in Service oriented Multi-agent Networks. Proceedings of the 2017 International Conference on Management Engineering, Software Engineering and Service Sciences, pp. 9-18. DOI: 10.1145/3034950.3034975. Online publication date: 14-Jan-2017
  • (2016) Scalability analysis of distributed search in large peer-to-peer networks. 2016 IEEE International Conference on Big Data (Big Data), pp. 909-914. DOI: 10.1109/BigData.2016.7840686. Online publication date: Dec-2016
  • (2016) Searching Digital Libraries. Encyclopedia of Database Systems, pp. 1-4. DOI: 10.1007/978-1-4899-7993-3_327-2. Online publication date: 9-Dec-2016
  • (2015) Distributed Information Retrieval: Developments and Strategies. International Journal of Engineering Research in Africa, 16, pp. 110-144. DOI: 10.4028/www.scientific.net/JERA.16.110. Online publication date: Jun-2015
  • (2015) An Optimization Framework for Merging Multiple Result Lists. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 303-312. DOI: 10.1145/2806416.2806489. Online publication date: 17-Oct-2015
  • (2013) Which vertical search engines are relevant? Proceedings of the 22nd international conference on World Wide Web, pp. 1557-1568. DOI: 10.1145/2488388.2488524. Online publication date: 13-May-2013
  • (2013) Studying the clustering paradox and scalability of search in highly distributed environments. ACM Transactions on Information Systems, 31(2), pp. 1-36. DOI: 10.1145/2457465.2457468. Online publication date: 17-May-2013