
Sampling Search-Engine Results

Published: 01 December 2006

Abstract

We consider the problem of efficiently sampling Web search engine query results. In turn, using a small random sample instead of the full set of results leads to efficient approximate algorithms for several applications, such as:
  • Determining the set of categories in a given taxonomy spanned by the search results;
  • Finding the range of metadata values associated with the result set in order to enable "multi-faceted search";
  • Estimating the size of the result set;
  • Data mining associations to the query terms.

We present and analyze efficient algorithms for obtaining uniform random samples applicable to any search engine that is based on posting lists and document-at-a-time evaluation. (To our knowledge, all popular Web search engines, for example, Google, Yahoo Search, MSN Search, Ask, belong to this class.) Furthermore, our algorithm can be modified to follow the modern object-oriented approach whereby posting lists are viewed as streams equipped with a next method, and the next method for Boolean and other complex queries is built from the next method for primitive terms. In our case we show how to construct a basic sample-next(p) method that samples term posting lists with probability p, and show how to construct sample-next(p) methods for Boolean operators (AND, OR, WAND) from primitive methods. Finally, we test the efficiency and quality of our approach on both synthetic and real-world data.
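The stream view described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a swapped-in simplification in which a hash of the document ID acts as a shared coin, so every posting list makes the same keep-or-drop decision for a given document and the Boolean operators can be composed directly from the primitive method. The names `keep`, `Term`, `And`, `Or`, and `sample` are hypothetical, WAND is omitted, and the sketch scans postings one by one rather than skipping ahead, so it shows the composition idea and not the efficiency the paper analyzes.

```python
import hashlib
from bisect import bisect_left


def keep(doc_id, p, seed=0):
    """Shared deterministic coin (an assumption of this sketch):
    True for roughly a fraction p of document IDs."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # u in [0, 1)
    return u < p


class Term:
    """Posting list of a primitive term, viewed as a stream."""

    def __init__(self, postings):
        self.postings = sorted(set(postings))

    def next(self, min_id):
        """Smallest posted doc ID >= min_id, or None if exhausted."""
        i = bisect_left(self.postings, min_id)
        return self.postings[i] if i < len(self.postings) else None

    def sample_next(self, min_id, p, seed=0):
        """Smallest posted doc ID >= min_id that the coin keeps."""
        doc = self.next(min_id)
        while doc is not None and not keep(doc, p, seed):
            doc = self.next(doc + 1)
        return doc


class And:
    """Conjunction built only from the children's sample_next."""

    def __init__(self, *children):
        self.children = children

    def sample_next(self, min_id, p, seed=0):
        candidate = min_id
        while True:
            hits = [c.sample_next(candidate, p, seed) for c in self.children]
            if None in hits:
                return None
            if min(hits) == max(hits):  # every child agrees on this doc
                return hits[0]
            candidate = max(hits)       # leapfrog to the furthest hit


class Or:
    """Disjunction built only from the children's sample_next."""

    def __init__(self, *children):
        self.children = children

    def sample_next(self, min_id, p, seed=0):
        hits = [c.sample_next(min_id, p, seed) for c in self.children]
        hits = [h for h in hits if h is not None]
        return min(hits) if hits else None


def sample(query, p, seed=0):
    """Collect every sampled result of a query stream."""
    out, doc = [], query.sample_next(0, p, seed)
    while doc is not None:
        out.append(doc)
        doc = query.sample_next(doc + 1, p, seed)
    return out


# Usage: sample a conjunction and estimate the size of its result set.
a = Term(range(0, 1000, 2))
b = Term(range(0, 1000, 3))
picked = sample(And(a, b), p=0.25)
estimate = len(picked) / 0.25  # rough estimate; the true size here is 167
```

Because the coin is shared, intersecting or uniting the sampled term streams gives the same documents as sampling the full Boolean result with that coin, which is what lets the operators reuse the primitive method unchanged. Dividing the sample size by p, as in the last line, is one way to approach the result-set size application listed above.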



Published In

World Wide Web, Volume 9, Issue 4
December 2006
252 pages

Publisher

Kluwer Academic Publishers

United States


Qualifiers

  • Article


