DOI: 10.5555/2227296.2227315

Intelligent web navigation

Published: 01 September 2009

Abstract

Virtual integration systems retrieve information that matches the user's interests from several web applications and present it uniformly, in an online process, so response time is a significant factor. An essential part of any information retrieval system is navigation through pages. Web pages usually contain a large number of links; some lead to interesting information, but most serve other purposes, such as advertising or internal site navigation. Traditional crawlers follow every link on each page in order to analyze the target page and classify it as interesting or irrelevant. This means retrieving, analyzing and classifying thousands of pages for every single site, which is costly. The problem can be solved by combining a web page classifier, which distinguishes interesting from irrelevant pages, with a link classifier, which automatically identifies links leading to interesting pages. This kind of navigation is more efficient and less costly than traditional crawling. Moreover, the navigation model is extracted automatically from the site instead of being handcrafted, which reduces the supervision required from the user.
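The contrast between exhaustive and classifier-guided navigation can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory `SITE` graph, the keyword rules in `is_interesting_page` and `is_promising_link`, and the `crawl` helper are all hypothetical stand-ins, since the abstract does not say how either classifier is built.

```python
from collections import deque

# Hypothetical toy site: URL -> (page text, [(anchor text, target URL), ...]).
SITE = {
    "/": ("home", [("Products", "/products"),
                   ("Advertise with us", "/ads"),
                   ("About", "/about")]),
    "/products": ("product list", [("Laptop details", "/p/1"),
                                   ("Phone details", "/p/2"),
                                   ("Home", "/")]),
    "/ads": ("sponsored banner", [("Home", "/")]),
    "/about": ("company history", [("Home", "/")]),
    "/p/1": ("item: laptop price 900", []),
    "/p/2": ("item: phone price 500", []),
}


def is_interesting_page(text):
    # Stand-in page classifier (assumption): a keyword rule, not a trained model.
    return text.startswith("item:")


def is_promising_link(anchor):
    # Stand-in link classifier (assumption): judges a link by its anchor text only.
    return any(word in anchor.lower() for word in ("product", "details"))


def crawl(start, follow_link):
    """Breadth-first crawl; follow_link decides which links are enqueued.

    Returns (number of pages fetched, URLs classified as interesting).
    """
    seen, queue = {start}, deque([start])
    fetched, hits = 0, []
    while queue:
        url = queue.popleft()
        text, links = SITE[url]  # simulates retrieving the page
        fetched += 1
        if is_interesting_page(text):
            hits.append(url)
        for anchor, target in links:
            if target not in seen and follow_link(anchor):
                seen.add(target)
                queue.append(target)
    return fetched, hits


# A traditional crawler follows every link; the guided one asks the link classifier.
exhaustive = crawl("/", lambda anchor: True)
guided = crawl("/", is_promising_link)
print(exhaustive)  # (6, ['/p/1', '/p/2'])
print(guided)      # (4, ['/p/1', '/p/2'])
```

On this toy site both strategies find the same two interesting pages, but the guided crawl retrieves four pages instead of six because the advertising and "About" links are never followed. On a real site the saving would depend on how accurate the link classifier is.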

Published In

FDIA'09: Proceedings of the Third BCS-IRSG conference on Future Directions in Information Access
September 2009
142 pages

Sponsors

• BCS-IRSG

Publisher

BCS Learning & Development Ltd.
Swindon, United Kingdom

Author Tags

1. information retrieval
2. navigation
3. virtual integration
