DOI: 10.1145/1076034.1076068
Article

Exploiting the hierarchical structure for link analysis

Published: 15 August 2005

Abstract

Link analysis algorithms have been used extensively in Web information retrieval. However, current link analysis algorithms generally work on a flat link graph and ignore the hierarchical structure of the Web graph. They often suffer from two problems: the sparsity of the link graph and the biased ranking of newly emerging pages. In this paper, we propose a novel ranking algorithm called Hierarchical Rank as a solution to these two problems; it considers both the hierarchical structure and the link structure of the Web. In this algorithm, Web pages are first aggregated based on their hierarchical structure at the directory, host, or domain level, and link analysis is performed on the aggregated graph. Then, the importance of each node in the aggregated graph is distributed to the individual pages that belong to that node, based on the hierarchical structure. This algorithm allows the importance of linked Web pages to be distributed in the Web page space even when the space is sparse and contains new pages. Experimental results on the .GOV collection of TREC 2003 and 2004 show that the Hierarchical Rank algorithm consistently outperforms other well-known ranking algorithms, including PageRank, BlockRank, and LayerRank. In addition, experimental results show that link aggregation at the host level is much better than link aggregation at either the domain or directory level.
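
The two-stage idea in the abstract (aggregate, rank the aggregated graph, then distribute importance back to pages) can be illustrated with a minimal Python sketch. It is not the paper's actual method: it assumes host-level aggregation, a plain PageRank power iteration on the host graph, and a hypothetical distribution rule that gives shallower URLs a larger share of their host's score. The function names, the damping factor of 0.85, and the depth-based weighting are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host_of(url):
    """Return the host part of a URL; used here as the aggregation key."""
    return urlparse(url).netloc


def depth_of(url):
    """Number of non-empty path segments (0 for a host's root page)."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def aggregate(links):
    """Collapse page-level links into weighted host-level edges.

    Links inside a single host are dropped (an assumption of this sketch).
    """
    weights = defaultdict(float)
    for src, dst in links:
        hs, hd = host_of(src), host_of(dst)
        if hs != hd:
            weights[(hs, hd)] += 1.0
    return weights


def pagerank(nodes, weights, damping=0.85, iters=50):
    """Plain PageRank by power iteration on a weighted directed graph."""
    n = len(nodes)
    out_total = defaultdict(float)
    for (src, _), w in weights.items():
        out_total[src] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_total[v] == 0.0)
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in weights.items():
            nxt[dst] += damping * rank[src] * w / out_total[src]
        rank = nxt
    return rank


def hierarchical_rank(pages, links):
    """Rank hosts, then split each host's score among its pages.

    ASSUMPTION: a page's share is proportional to 1 / (1 + URL depth);
    this stands in for the paper's hierarchy-based distribution step.
    """
    page_set = set(pages)
    kept = [(s, d) for s, d in links if s in page_set and d in page_set]
    hosts = sorted({host_of(p) for p in pages})
    host_rank = pagerank(hosts, aggregate(kept))

    by_host = defaultdict(list)
    for p in pages:
        by_host[host_of(p)].append(p)

    scores = {}
    for host, members in by_host.items():
        share = {p: 1.0 / (1 + depth_of(p)) for p in members}
        total = sum(share.values())
        for p in members:
            scores[p] = host_rank[host] * share[p] / total
    return scores
```

In this sketch, a newly created page with no in-links still receives a non-zero score because it inherits part of its host's importance, which is the behavior the abstract describes for sparse graphs and new pages.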

    Published In

    SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
    August 2005
    708 pages
    ISBN:1595930345
    DOI:10.1145/1076034
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. hierarchical random walk model
    2. hierarchical web graph
    3. link analysis

    Qualifiers

    • Article

    Conference

    SIGIR05
    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 20
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 02 Feb 2025

    Cited By

    • (2020) Incremental Refinement of Page Ranking of Web Pages. International Journal of Information Retrieval Research 10:3 (57-73). DOI: 10.4018/IJIRR.2020070104. Online publication date: 1-Jul-2020
    • (2018) Unsupervised Domain Ranking in Large-Scale Web Crawls. ACM Transactions on the Web 12:4 (1-29). DOI: 10.1145/3182180. Online publication date: 27-Sep-2018
    • (2017) Product information extraction & analysis. Proceedings of the 3rd International Conference on Communication and Information Processing (261-267). DOI: 10.1145/3162957.3163028. Online publication date: 24-Nov-2017
    • (2017) Exploiting Guest Preferences with Aspect-Based Sentiment Analysis for Hotel Recommendation. Knowledge Discovery, Knowledge Engineering and Knowledge Management (31-46). DOI: 10.1007/978-3-319-52758-1_3. Online publication date: 22-Jan-2017
    • (2016) Learning hostname preference to enhance search relevance. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (3903-3909). DOI: 10.5555/3061053.3061165. Online publication date: 9-Jul-2016
    • (2016) A Ranking Approach on Large-Scale Graph With Multidimensional Heterogeneous Information. IEEE Transactions on Cybernetics 46:4 (930-944). DOI: 10.1109/TCYB.2015.2418233. Online publication date: Apr-2016
    • (2016) Random surfing on multipartite graphs. 2016 IEEE International Conference on Big Data (Big Data) (736-745). DOI: 10.1109/BigData.2016.7840666. Online publication date: Dec-2016
    • (2015) DEARank. Machine Learning 101:1-3 (415-435). DOI: 10.1007/s10994-014-5442-3. Online publication date: 1-Oct-2015
    • (2015) Fast PageRank approximation by adaptive sampling. Knowledge and Information Systems 42:1 (127-146). DOI: 10.1007/s10115-013-0691-1. Online publication date: 1-Jan-2015
    • (2015) Exploiting Guest Preferences with Aspect-based Sentiment Analysis for Hotel Recommendation. Knowledge Discovery, Knowledge Engineering and Knowledge Management (34-49). DOI: 10.1007/978-3-319-25840-9_3. Online publication date: 28-Oct-2015
