DOI: 10.1145/1871437.1871630
poster

Exploiting site-level information to improve web search

Published: 26 October 2010

Abstract

Ranking Web search results has long evolved beyond simple bag-of-words retrieval models. Modern search engines routinely employ machine-learned ranking that relies on exogenous relevance signals. Yet the majority of current methods still evaluate each Web page out of context. In this work, we introduce a novel source of relevance information for Web search by evaluating each page in the context of its host Web site. For this purpose, we devise two strategies for compactly representing entire Web sites. We formalize our approach by building two indices, a traditional page index and a new site index, where each "document" represents an entire Web site. At runtime, a query is first executed against both indices, and then the final page score for a given query is produced by combining the scores of the page and its site. Experiments carried out on a large-scale Web search test collection from a major commercial search engine confirm that the proposed approach leads to consistent and significant improvements in retrieval effectiveness.
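
The two-index idea can be illustrated with a minimal sketch. The abstract does not say how sites are represented or how the page and site scores are combined, so everything specific below is an assumption made for illustration: the toy term-frequency scorer `tf_score`, the site "document" built by concatenating the text of all pages on a host, and the linear interpolation with a hypothetical weight `lam`. None of it should be read as the authors' actual method.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def tf_score(query, text):
    """Toy scorer (assumption): fraction of document tokens matching a query term."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in query.lower().split()) / len(tokens)


def build_site_index(page_index):
    """Group pages by host; each site 'document' concatenates its pages (assumption)."""
    sites = defaultdict(list)
    for url, text in page_index.items():
        sites[urlparse(url).netloc].append(text)
    return {host: " ".join(texts) for host, texts in sites.items()}


def rank(query, page_index, lam=0.7):
    """Score each page as lam * page_score + (1 - lam) * site_score (hypothetical)."""
    site_index = build_site_index(page_index)
    scored = []
    for url, text in page_index.items():
        page_score = tf_score(query, text)
        site_score = tf_score(query, site_index[urlparse(url).netloc])
        scored.append((url, lam * page_score + (1 - lam) * site_score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy collection: pages /a and /c have identical text but different hosts.
pages = {
    "http://jazz.example/a": "jazz festival tickets",
    "http://jazz.example/b": "jazz history and jazz records",
    "http://news.example/c": "jazz festival tickets",
}
ranking = rank("jazz", pages)
```

In this toy collection, pages `/a` and `/c` contain the same text, yet `/a` ranks above `/c` because its host site as a whole matches the query more strongly. Setting `lam=1.0` ignores the site signal, and the two pages then receive the same score.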

    Published In

    CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
    October 2010
    2036 pages
    ISBN:9781450300995
    DOI:10.1145/1871437

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. corpus structure
    2. textual features
    3. web search

    Qualifiers

    • Poster

    Conference

    CIKM '10

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2019) Joint Optimization of Cascade Ranking Models. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 15-23. DOI: 10.1145/3289600.3290986. Online publication date: 30-Jan-2019.
    • (2018) Term-Based Models for Entity Ranking. Entity-Oriented Search, pages 57-99. DOI: 10.1007/978-3-319-93935-3_3. Online publication date: 3-Oct-2018.
    • (2017) Two-level dynamic index pruning. 2017 Twelfth International Conference on Digital Information Management (ICDIM), pages 191-196. DOI: 10.1109/ICDIM.2017.8244656. Online publication date: Sep-2017.
    • (2016) Learning Query and Document Relevance from a Web-scale Click Graph. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 185-194. DOI: 10.1145/2911451.2911531. Online publication date: 7-Jul-2016.
    • (2016) Dynamic Collective Entity Representations for Entity Ranking. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 595-604. DOI: 10.1145/2835776.2835819. Online publication date: 8-Feb-2016.
    • (2015) Lost but not forgotten: finding pages on the unarchived web. International Journal on Digital Libraries, 16:3-4, pages 247-265. DOI: 10.1007/s00799-015-0153-3. Online publication date: 3-Jun-2015.
    • (2013) Aggregating evidence from hospital departments to improve medical records search. Proceedings of the 35th European conference on Advances in Information Retrieval, pages 279-291. DOI: 10.1007/978-3-642-36973-5_24. Online publication date: 24-Mar-2013.
    • (2012) To each his own. Proceedings of the fifth ACM international conference on Web search and data mining, pages 233-242. DOI: 10.1145/2124295.2124325. Online publication date: 8-Feb-2012.
