DOI: 10.1145/345508.345597

Topical locality in the Web

Published: 01 July 2000

    Abstract

    Most web pages are linked to others with related content. This idea, combined with the idea that text in, and possibly around, HTML anchors describes the pages to which they point, is the foundation for a usable World-Wide Web. In this paper, we examine to what extent these ideas hold by empirically testing whether topical locality mirrors spatial locality of pages on the Web. In particular, we find that linked pages are likely to have similar textual content; that the similarity of sibling pages increases when the links from the parent are close together; that titles, descriptions, and anchor text represent at least part of the target page; and that anchor text may be a useful discriminator among unseen child pages. These results confirm the foundations necessary for the success of many web systems, including search engines, focused crawlers, linkage analyzers, and intelligent web agents.
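
    As a rough illustration of what "similar textual content" can mean in practice, the sketch below compares toy pages by the cosine similarity of their raw term counts. It is not the paper's measurement procedure: the page texts, the anchor text, and the helper names `term_counts` and `cosine_similarity` are all hypothetical, and the sketch applies no stemming, stop-word removal, or term weighting.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy data: a parent page, a page it links to, and an unrelated page.
pages = {
    "parent": "web search engines crawl and index pages",
    "child": "a focused crawler visits pages to index for search",
    "unrelated": "recipes for baking sourdough bread at home",
}
# Hypothetical anchor text of the link from "parent" to "child".
anchor_text = "focused crawler for search"

vectors = {name: term_counts(text) for name, text in pages.items()}

# Similarity of a linked pair versus an unlinked pair.
linked = cosine_similarity(vectors["parent"], vectors["child"])
random_pair = cosine_similarity(vectors["parent"], vectors["unrelated"])

# Similarity of the anchor text to the page it points to.
anchor_vs_child = cosine_similarity(term_counts(anchor_text), vectors["child"])

print(f"linked pair:     {linked:.3f}")
print(f"unlinked pair:   {random_pair:.3f}")
print(f"anchor vs child: {anchor_vs_child:.3f}")
```

    In this toy data the linked pair scores higher than the unlinked pair, and the anchor text overlaps heavily with its target. An empirical study would have to establish such patterns over a large sample of real pages.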

    Information & Contributors

    Information

    Published In

    SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
    July 2000
    396 pages
    ISBN:1581132263
    DOI:10.1145/345508
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Conference

    SIGIR00
    Sponsor:
    • Greek Com Soc
    • SIGIR
    • Athens U of Econ & Business

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 51
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 11 Aug 2024

    Citations

    Cited By

    • (2023) Der Einfluss bundespolitischer Themen auf den Landtagswahlkampf in Bayern 2018 – Eine Untersuchung der Twitter-Kommunikation von Bundes- und Landespolitikern. Die Landtagswahl 2018 in Bayern, pages 417-452. DOI: 10.1007/978-3-658-41392-7_11. Online publication date: 3-Aug-2023
    • (2021) Discovering obscure looking glass sites on the web to facilitate internet measurement research. Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies, pages 426-439. DOI: 10.1145/3485983.3494857. Online publication date: 2-Dec-2021
    • (2021) Automated Support to Capture Environment Assertions for Requirements-Based Testing. 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI), pages 123-130. DOI: 10.1109/IRI51335.2021.00023. Online publication date: Aug-2021
    • (2020) Crawling the German Health Web: Exploratory Study and Graph Analysis. Journal of Medical Internet Research, 22:7 (e17853). DOI: 10.2196/17853. Online publication date: 24-Jul-2020
    • (2020) Predicting Labor Market Competition. Information Systems Research, 31:4, pages 1443-1466. DOI: 10.1287/isre.2020.0954. Online publication date: 1-Dec-2020
    • (2020) HPM. Scientific Programming, 2020. DOI: 10.1155/2020/8897244. Online publication date: 6-Nov-2020
    • (2020) TINB: a topical interaction network builder from WWW. Wireless Networks. DOI: 10.1007/s11276-020-02469-y. Online publication date: 6-Oct-2020
    • (2019) Hypergraph-of-entity. Open Computer Science, 9:1, pages 103-127. DOI: 10.1515/comp-2019-0006. Online publication date: 6-Jun-2019
    • (2018) Towards data-driven vulnerability prediction for requirements. Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 744-748. DOI: 10.1145/3236024.3264836. Online publication date: 26-Oct-2018
    • (2018) The colors of the national Web. International Journal on Digital Libraries, 19:1, pages 95-106. DOI: 10.1007/s00799-016-0202-6. Online publication date: 1-Mar-2018
