DOI: 10.1145/502932.502945

Automatically combining ranking heuristics for HTML documents

Published: 09 November 2001
    Abstract

    Current search engines use several criteria, or heuristics, to rank HTML documents. These heuristics must be combined into a ranking function that, given a text query, returns a ranked list of HTML documents. The standard approach is to build a weighted average by manually estimating the importance of each heuristic and assigning it a weight proportional to that estimate. In this paper we apply an automatic method for combining HTML ranking heuristics. Using recall/precision evaluations, we study the performance of the automatic method, and using collections of HTML documents with different characteristics, we show that it finds weights tailored to the specific characteristics of each document collection.
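
    The weighted-average combination and the recall/precision evaluation mentioned in the abstract can be illustrated with a minimal sketch. The heuristic names (`title_match`, `body_tf`, `anchor_text`), the weights, and the document scores below are hypothetical and are not taken from the paper. The code shows only the manual weighted-average baseline, not the automatic weight-fitting method the paper applies.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical heuristics with manually chosen weights (illustrative only).
WEIGHTS: Dict[str, float] = {"title_match": 3.0, "body_tf": 1.0, "anchor_text": 2.0}

# Hypothetical per-document heuristic scores for one query, each in [0, 1].
DOCS: Dict[str, Dict[str, float]] = {
    "a.html": {"title_match": 1.0, "body_tf": 0.2, "anchor_text": 0.0},
    "b.html": {"title_match": 0.0, "body_tf": 0.9, "anchor_text": 0.5},
    "c.html": {"title_match": 0.0, "body_tf": 0.1, "anchor_text": 0.0},
}


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of heuristic scores; a missing heuristic counts as 0."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


def rank(docs: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from highest to lowest score."""
    scored = [(doc_id, combined_score(s, weights)) for doc_id, s in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def precision_recall_at_k(ranked_ids: List[str], relevant: Set[str],
                          k: int) -> Tuple[float, float]:
    """Precision and recall over the top-k entries of a ranked list."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    ranking = rank(DOCS, WEIGHTS)
    for doc_id, score in ranking:
        print(f"{doc_id}: {score:.3f}")
    ids = [doc_id for doc_id, _ in ranking]
    # Relevance judgments are also hypothetical.
    print(precision_recall_at_k(ids, {"a.html", "c.html"}, k=2))
```

    Changing the values in `WEIGHTS` changes the ordering, which is why a method that fits the weights to each collection could outperform a single hand-tuned setting.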

    Published In

    WIDM '01: Proceedings of the 3rd international workshop on Web information and data management
    November 2001
    87 pages
    ISBN:1581134444
    DOI:10.1145/502932
    • Conference Chair: Ee-Peng Lim

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. HTML
    2. HTML ranking
    3. HTML ranking heuristics
    4. WWW
    5. automatic combination
    6. information retrieval

    Qualifiers

    • Article

    Conference

    CIKM01
    Cited By

    • (2020) Weighting Passages Enhances Accuracy. ACM Transactions on Information Systems 39:2 (1-11). DOI: 10.1145/3428687. Online publication date: 17-Dec-2020
    • (2018) An algorithm to cluster documents based on relevance. Information Processing and Management: an International Journal 41:5 (1035-1049). DOI: 10.1016/j.ipm.2004.05.003. Online publication date: 29-Dec-2018
    • (2018) Choosing document structure weights. Information Processing and Management: an International Journal 41:2 (243-264). DOI: 10.1016/j.ipm.2003.10.003. Online publication date: 29-Dec-2018
    • (2018) Using proximity and tag weights for focused retrieval in structured documents. Knowledge and Information Systems 44:1 (51-76). DOI: 10.1007/s10115-014-0767-6. Online publication date: 29-Dec-2018
    • (2018) BM25t: a BM25 extension for focused information retrieval. Knowledge and Information Systems 32:1 (217-241). DOI: 10.1007/s10115-011-0426-0. Online publication date: 29-Dec-2018
    • (2009) UJM at INEX 2008. Advances in Focused Retrieval (46-53). DOI: 10.1007/978-3-642-03761-0_5. Online publication date: 3-Sep-2009
    • (2008) Integrating Structure in the Probabilistic Model for Information Retrieval. Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01 (763-769). DOI: 10.1109/WIIAT.2008.346. Online publication date: 9-Dec-2008
