Poster
DOI: 10.1145/1871437.1871692

Mapping web pages to database records via link paths

Published: 26 October 2010

Abstract

In this paper we propose a new knowledge management task that aims to map Web pages to their corresponding records in a structured database. For example, the DBLP database contains records for many computer scientists, and most of these people have public Web pages; if we can map each database record to the appropriate Web page, the information on that page can be used to further describe the person's database record. To accomplish this goal we employ link paths, which contain the anchor texts along multiple paths through the Web that end at the Web page in question. We hypothesize that the information in these link paths can be used to generate an accurate mapping from Web pages to database records. Experiments on two large, real-world data sets, pairing DBLP records with computer science faculty members' Web pages and IMDB records with official movie homepages, show that our method provides an accurate mapping. We conclude with a call for further research on this promising new task.
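The abstract does not say how link paths are collected or how they are matched against database records, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It walks backwards from a target page over a hypothetical toy link graph, enumerates loopless link paths in order of hop count (a breadth-first stand-in for a k-shortest-paths search), pools their anchor texts with a hypothetical per-hop decay weight, and ranks candidate records by cosine similarity. The function names `link_paths_to` and `rank_records`, the parameters `max_len`, `max_paths`, and `decay`, and the sample pages and records are all invented for illustration.

```python
import math
import re
from collections import Counter, deque

# ASSUMPTION: links are (source_page, target_page, anchor_text) triples from a
# hypothetical crawl; records map a record id to its descriptive text.

def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_paths_to(target, links, max_len=3, max_paths=50):
    """Enumerate loopless link paths ending at `target`, shortest first.

    Each path is a list of anchor texts ordered from the farthest link to the
    link that points directly at `target`. Breadth-first search over reversed
    edges yields paths in nondecreasing hop count.
    """
    incoming = {}
    for src, dst, anchor in links:
        incoming.setdefault(dst, []).append((src, anchor))

    paths = []
    queue = deque([(target, [], {target})])
    while queue and len(paths) < max_paths:
        page, anchors, seen = queue.popleft()
        for src, anchor in incoming.get(page, []):
            if src in seen:  # skip pages already on this path (loopless)
                continue
            path = [anchor] + anchors
            paths.append(path)
            if len(path) < max_len:
                queue.append((src, path, seen | {src}))
    return paths[:max_paths]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_records(target, links, records, decay=0.5):
    """Rank records by similarity to the pooled link-path anchor text.

    ASSUMPTION: the anchor on the link k steps before `target` (k = 0 for the
    link pointing directly at it) contributes weight decay**k per term.
    """
    bag = Counter()
    for path in link_paths_to(target, links):
        for hops, anchor in enumerate(reversed(path)):
            for term in tokenize(anchor):
                bag[term] += decay ** hops
    scored = [(cosine(bag, Counter(tokenize(text))), rid)
              for rid, text in records.items()]
    return sorted(scored, reverse=True)


# Toy example with hypothetical pages and records.
links = [
    ("dept.example.edu/faculty", "example.edu/~jdoe", "Jane Doe"),
    ("dept.example.edu", "dept.example.edu/faculty", "Faculty"),
    ("lab.example.edu", "example.edu/~jdoe", "Prof. Jane Doe, databases"),
    ("dept.example.edu/faculty", "example.edu/~rroe", "Richard Roe"),
]
records = {
    "dblp:jdoe": "Jane Doe databases data mining",
    "dblp:rroe": "Richard Roe computer vision",
}
ranking = rank_records("example.edu/~jdoe", links, records)
print(ranking[0][1])  # dblp:jdoe
```

Breadth-first enumeration is used here only because it is short; on a weighted link graph, or when exactly the k shortest loopless paths are required, a dedicated k-shortest-paths algorithm such as Yen's would be the natural replacement.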




      Published In

      CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
      October 2010
      2036 pages
      ISBN:9781450300995
      DOI:10.1145/1871437

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. link paths
      2. mapping
      3. semi-structured data
      4. web

      Qualifiers

      • Poster

      Conference

      CIKM '10

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


      Cited By

      • (2013) The parallel path framework for entity discovery on the web. ACM Transactions on the Web 7(3):1-29. DOI: 10.1145/2516633.2516638. Online publication date: 30-Sep-2013
      • (2011) Construction and analysis of web-based computer science information networks. Proceedings of the 13th international conference on Rough sets, fuzzy sets, data mining and granular computing, pp. 1-2. DOI: 10.5555/2026782.2026784. Online publication date: 25-Jun-2011
      • (2011) WINACS. Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pp. 1255-1258. DOI: 10.1145/1989323.1989469. Online publication date: 12-Jun-2011
      • (2011) Growing parallel paths for entity-page discovery. Proceedings of the 20th international conference companion on World wide web, pp. 145-146. DOI: 10.1145/1963192.1963266. Online publication date: 28-Mar-2011
      • (2011) Construction and Analysis of Web-Based Computer Science Information Networks. Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, pp. 1-2. DOI: 10.1007/978-3-642-21881-1_1. Online publication date: 2011
