DOI: 10.1145/1062745.1062828

Mining web site's topic hierarchy

Published: 10 May 2005

Abstract

Searching and navigating a Web site is a tedious task, and hierarchical models such as site maps are frequently used to organize a Web site's content. In this work, we propose to model a Web site's content structure using a topic hierarchy: a directed tree rooted at the Web site's homepage in which the vertices and edges correspond to Web pages and hyperlinks, respectively. Our algorithm for mining a Web site's topic hierarchy uses three types of information associated with the site: link structure, directory structure, and the content of its Web pages.
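
The abstract does not say how the three signals are combined, so the following is only a minimal sketch of what a topic hierarchy looks like as a data structure, not the authors' algorithm. It assumes a small in-memory site given as a `links` adjacency map and a `text` map of page contents; the function names (`topic_hierarchy`, `shared_prefix`, `jaccard`) and the scoring rule (prefer the in-linking page that shares the longest directory prefix, then the one with the most similar word set) are hypothetical choices made for illustration.

```python
from collections import deque
from posixpath import dirname
from urllib.parse import urlparse


def dir_parts(url):
    """Directory components of a URL path, e.g. /a/b/c.html -> ['a', 'b']."""
    return [p for p in dirname(urlparse(url).path).split("/") if p]


def shared_prefix(a, b):
    """Number of leading directory components two URLs have in common."""
    n = 0
    for x, y in zip(dir_parts(a), dir_parts(b)):
        if x != y:
            break
        n += 1
    return n


def jaccard(a, b):
    """Word-set overlap between two page texts (a crude content signal)."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def topic_hierarchy(home, links, text):
    """Return {page: parent} describing a directed tree rooted at `home`."""
    # Link structure: breadth-first search gives each reachable page a depth.
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)

    # Place pages in a fixed order; a parent must already be placed,
    # which rules out cycles and keeps the result a tree.
    order = sorted(depth, key=lambda p: (depth[p], len(dir_parts(p)), p))
    parent, placed = {}, []
    for page in order:
        candidates = [src for src in placed if page in links.get(src, [])]
        if candidates:  # the homepage has no candidates and stays the root
            # Directory structure first, page content as the tie-breaker.
            parent[page] = max(
                candidates,
                key=lambda src: (
                    shared_prefix(src, page),
                    jaccard(text.get(src, ""), text.get(page, "")),
                ),
            )
        placed.append(page)
    return parent


# Hypothetical four-page site used only to exercise the sketch.
home = "http://ex.org/"
links = {
    home: ["http://ex.org/products/", "http://ex.org/about.html",
           "http://ex.org/products/tv.html"],
    "http://ex.org/products/": ["http://ex.org/products/tv.html", home],
    "http://ex.org/products/tv.html": [home],
    "http://ex.org/about.html": [home],
}
text = {
    home: "welcome to our shop",
    "http://ex.org/products/": "products tv radio",
    "http://ex.org/products/tv.html": "tv products details",
    "http://ex.org/about.html": "about our company",
}
tree = topic_hierarchy(home, links, text)
```

In this toy site the homepage links directly to the TV page, but the sketch attaches that page under `/products/` because the two URLs share a directory. This illustrates why link structure alone may not recover the content structure.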

Published In

WWW '05: Special interest tracks and posters of the 14th international conference on World Wide Web
May 2005
454 pages
ISBN:1595930515
DOI:10.1145/1062745

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. content structure
  2. topic hierarchy
  3. web mining

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2017) Mining the information architecture of the WWW using automated website boundary detection. Web Intelligence 15:4 (269-290). DOI: 10.3233/WEB-170365. Online publication date: 20-Nov-2017
  • (2012) Extracting a spatial ontology from a large Flickr tag dataset. 4th International Conference on Awareness Science and Technology (91-97). DOI: 10.1109/iCAwST.2012.6469595. Online publication date: Aug-2012
  • (2009) Keyphrase extraction for labeling a website topic hierarchy. Proceedings of the 11th International Conference on Electronic Commerce (81-88). DOI: 10.1145/1593254.1593266. Online publication date: 12-Aug-2009
  • (2009) Cross-Media Data Mining Using Associated Keyword Space. Proceedings of the 2009 Ninth IEEE International Conference on Computer and Information Technology - Volume 02 (289-294). DOI: 10.1109/CIT.2009.41. Online publication date: 11-Oct-2009
  • (2008) Analysis of Network Lifetime in Hybrid Sensor Networks with Wired Shortcut. 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing (1-4). DOI: 10.1109/WiCom.2008.862. Online publication date: Oct-2008
  • (2008) Extracting Structure of Web Site Based on Hyperlink Analysis. 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing (1-4). DOI: 10.1109/WiCom.2008.2538. Online publication date: Oct-2008
  • (2008) A Novel Agent-Based Model for Search in Distributed Networks. 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing (1-4). DOI: 10.1109/WiCom.2008.1326. Online publication date: Oct-2008
  • (2008) Web site topic-hierarchy generation based on link structure. Journal of the American Society for Information Science and Technology 60:3 (495-508). DOI: 10.1002/asi.20990. Online publication date: 8-Dec-2008
  • (2006) A Mining Method for Linked Web Pages Using Associated Keyword Space. Proceedings of the International Symposium on Applications on Internet (268-276). DOI: 10.1109/SAINT.2006.4. Online publication date: 23-Jan-2006
  • (2005) Extracting a website's content structure from its link structure. Proceedings of the 14th ACM international conference on Information and knowledge management (345-346). DOI: 10.1145/1099554.1099660. Online publication date: 31-Oct-2005
