poster

Exploiting site-level information to improve web search

Authors: Evgeniy Gabrilovich, Vanja Josifovski, George Mavromatis, Donald Metzler, Jane Wang

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management

Pages 1393 - 1396

https://doi.org/10.1145/1871437.1871630

Published: 26 October 2010

Abstract

Ranking Web search results has long evolved beyond simple bag-of-words retrieval models. Modern search engines routinely employ machine learning ranking that relies on exogenous relevance signals. Yet the majority of current methods still evaluate each Web page out of context. In this work, we introduce a novel source of relevance information for Web search by evaluating each page in the context of its host Web site. For this purpose, we devise two strategies for compactly representing entire Web sites. We formalize our approach by building two indices, a traditional page index and a new site index, where each "document" represents an entire Web site. At runtime, a query is first executed against both indices, and then the final page score for a given query is produced by combining the scores of the page and its site. Experiments carried out on a large-scale Web search test collection from a major commercial search engine confirm that the proposed approach leads to consistent and significant improvements in retrieval effectiveness.
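
The two-index scheme described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' method: the abstract does not say how sites are represented or how the two scores are combined, so the sketch represents a site as the concatenation of its pages' tokens, scores with a toy term-overlap function, and combines the two scores by linear interpolation with a hypothetical weight `lam`. The function names and example URLs are likewise invented.

```python
from collections import Counter
from urllib.parse import urlparse


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def overlap_score(query, doc_tokens):
    """Toy relevance score: share of document tokens that are query terms."""
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in set(tokenize(query))) / len(doc_tokens)


def build_indices(pages):
    """Build a page index and a site index from a dict of url -> text.

    Assumption: a site "document" is simply all tokens of its pages
    concatenated; the paper's compact site representations are not
    described in the abstract.
    """
    page_index = {url: tokenize(text) for url, text in pages.items()}
    site_index = {}
    for url, tokens in page_index.items():
        host = urlparse(url).netloc
        site_index.setdefault(host, []).extend(tokens)
    return page_index, site_index


def rank(query, pages, lam=0.7):
    """Rank pages by lam * page_score + (1 - lam) * site_score.

    The linear interpolation and the weight `lam` are assumptions.
    """
    page_index, site_index = build_indices(pages)
    scored = []
    for url, tokens in page_index.items():
        page_score = overlap_score(query, tokens)
        site_score = overlap_score(query, site_index[urlparse(url).netloc])
        scored.append((lam * page_score + (1 - lam) * site_score, url))
    return [url for _, url in sorted(scored, reverse=True)]
```

In this sketch, a page that does not mention the query terms itself can still be ranked above an equally non-matching page from an unrelated site, because its host site as a whole matches the query. Setting `lam=1.0` disables the site signal and falls back to page-only scoring.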

Index Terms

Exploiting site-level information to improve web search
1. Information systems
  1. Information retrieval

Published In

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management

October 2010

2036 pages

ISBN:9781450300995

DOI:10.1145/1871437

General Chair: Jimmy Huang (York University, Canada)

Program Chairs: Nick Koudas (University of Toronto, Canada), Gareth Jones (Dublin City University, Ireland), Xindong Wu (University of Vermont, USA), Kevyn Collins-Thompson (Microsoft Research, USA), Aijun An (York University, Canada)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

CIKM '10: International Conference on Information and Knowledge Management

October 26 - 30, 2010

Toronto, ON, Canada

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Total Citations: 8
Total Downloads: 202
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 0

Reflects downloads up to 30 Aug 2024

Citations

Cited By

Gallagher L, Chen R, Blanco R, Culpepper J, Culpepper J, Moffat A, Bennett P, Lerman K (2019). Joint Optimization of Cascade Ranking Models. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 15-23. doi:10.1145/3289600.3290986. Online publication date: 30-Jan-2019.
https://dl.acm.org/doi/10.1145/3289600.3290986
Balog K, Balog K (2018). Term-Based Models for Entity Ranking. Entity-Oriented Search, pp. 57-99. doi:10.1007/978-3-319-93935-3_3. Online publication date: 3-Oct-2018.
https://doi.org/10.1007/978-3-319-93935-3_3
Friedrich J, Lindemann C, Petrifke M (2017). Two-level dynamic index pruning. 2017 Twelfth International Conference on Digital Information Management (ICDIM), pp. 191-196. doi:10.1109/ICDIM.2017.8244656. Online publication date: Sep-2017.
https://doi.org/10.1109/ICDIM.2017.8244656
Jiang S, Hu Y, Kang C, Daly T, Yin D, Chang Y, Zhai C, Perego R, Sebastiani F, Aslam J, Ruthven I, Zobel J (2016). Learning Query and Document Relevance from a Web-scale Click Graph. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pp. 185-194. doi:10.1145/2911451.2911531. Online publication date: 7-Jul-2016.
https://dl.acm.org/doi/10.1145/2911451.2911531
Graus D, Tsagkias M, Weerkamp W, Meij E, de Rijke M, Bennett P, Josifovski V, Neville J, Radlinski F (2016). Dynamic Collective Entity Representations for Entity Ranking. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pp. 595-604. doi:10.1145/2835776.2835819. Online publication date: 8-Feb-2016.
https://dl.acm.org/doi/10.1145/2835776.2835819
Huurdeman H, Kamps J, Samar T, de Vries A, Ben-David A, Rogers R (2015). Lost but not forgotten: finding pages on the unarchived web. International Journal on Digital Libraries, 16:3-4, pp. 247-265. doi:10.1007/s00799-015-0153-3. Online publication date: 3-Jun-2015.
https://doi.org/10.1007/s00799-015-0153-3
Limsopatham N, Macdonald C, Ounis I (2013). Aggregating evidence from hospital departments to improve medical records search. Proceedings of the 35th European conference on Advances in Information Retrieval, pp. 279-291. doi:10.1007/978-3-642-36973-5_24. Online publication date: 24-Mar-2013.
https://dl.acm.org/doi/10.1007/978-3-642-36973-5_24
Tan C, Gabrilovich E, Pang B, Adar E, Teevan J, Agichtein E, Maarek Y (2012). To each his own. Proceedings of the fifth ACM international conference on Web search and data mining, pp. 233-242. doi:10.1145/2124295.2124325. Online publication date: 8-Feb-2012.
https://dl.acm.org/doi/10.1145/2124295.2124325
