DOI: 10.1007/978-3-642-31753-8_29
Article

A statistical approach for efficient crawling of rich internet applications

Published: 23 July 2012

Abstract

Modern web technologies such as AJAX result in more responsive and usable web applications, sometimes called Rich Internet Applications (RIAs). Traditional crawling techniques are not sufficient for crawling RIAs. We present a new strategy for crawling RIAs. The strategy is based on the concept of "Model-Based Crawling" introduced in [3]; it uses statistics accumulated during the crawl to choose what to explore next, favoring actions with a high probability of uncovering new information. We compare its performance with that of our previous strategy and with the classical Breadth-First and Depth-First strategies on two real RIAs and two test RIAs. The results show that the new strategy is significantly better than Breadth-First and Depth-First (both widely used to crawl RIAs) and that it outperforms our previous strategy while being much simpler to implement.
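
The abstract does not say how the accumulated statistics are turned into a choice of what to explore next. As a rough illustration only, the sketch below assumes one plausible scheme: for each event, count how often executing it has led to a previously unseen state, smooth that ratio with pseudo-counts, and pick the unexplored (state, event) pair whose event has the highest estimate. The names `EventStats`, `choose_next`, `prior_successes`, and `prior_trials` are hypothetical and do not come from the paper, and the sketch ignores the cost of reaching the state where the event is available.

```python
class EventStats:
    """Per-event counts gathered during a crawl (hypothetical structure)."""

    def __init__(self, prior_successes=1.0, prior_trials=2.0):
        # Pseudo-counts are an assumption of this sketch: with no data,
        # an event is estimated at prior_successes / prior_trials.
        self.prior_successes = prior_successes
        self.prior_trials = prior_trials
        self.trials = {}     # event -> number of times it was executed
        self.successes = {}  # event -> number of times it found a new state

    def record(self, event, found_new_state):
        """Update the counts after executing `event` once."""
        self.trials[event] = self.trials.get(event, 0) + 1
        if found_new_state:
            self.successes[event] = self.successes.get(event, 0) + 1

    def probability(self, event):
        """Smoothed estimate that executing `event` uncovers a new state."""
        s = self.successes.get(event, 0)
        n = self.trials.get(event, 0)
        return (s + self.prior_successes) / (n + self.prior_trials)


def choose_next(stats, unexplored):
    """Return the (state, event) pair whose event has the highest estimate.

    `unexplored` is an iterable of (state, event) pairs not yet executed.
    Ties go to the first such pair encountered.
    """
    return max(unexplored, key=lambda pair: stats.probability(pair[1]))


stats = EventStats()
stats.record("next", True)
stats.record("next", True)
stats.record("help", False)
stats.record("help", False)
best = choose_next(stats, [("s1", "help"), ("s2", "next"), ("s3", "other")])
```

With these counts the estimates are 0.75 for `next`, 0.25 for `help`, and 0.5 for the never-executed `other`, so `best` is `("s2", "next")`. A fuller strategy would presumably also weigh how far the crawler must travel to reach each candidate state.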

References

[1] Bau, J., Bursztein, E., Gupta, D., Mitchell, J. C.: State of the Art: Automated Black-Box Web Application Vulnerability Testing. In: Proc. IEEE Symposium on Security and Privacy (2010)
[2] Benjamin, K., von Bochmann, G., Jourdan, G.-V., Onut, I. V.: Some Modeling Challenges when Testing Rich Internet Applications for Security. In: First International Workshop on Modeling and Detection of Vulnerabilities, Paris, France (2010)
[3] Benjamin, K., von Bochmann, G., Dincturk, M. E., Jourdan, G.-V., Onut, I. V.: A Strategy for Efficient Crawling of Rich Internet Applications. In: Auer, S., Díaz, O., Papadopoulos, G. A. (eds.) ICWE 2011. LNCS, vol. 6757, pp. 74-89. Springer, Heidelberg (2011)
[4] Carpaneto, G., Dell'Amico, M., Toth, P.: Exact solution of large-scale, asymmetric traveling salesman problems. ACM Trans. Math. Softw. 21(4) (1995)
[5] W3C: Document Object Model, DOM (2005), http://www.w3.org/DOM/
[6] Garrett, J. J.: Adaptive Path (2005), http://www.adaptivepath.com/publications/essays/archives/000385.php

Published In

ICWE'12: Proceedings of the 12th international conference on Web Engineering
July 2012
511 pages
ISBN: 9783642317521
Editors: Marco Brambilla, Takehiro Tokuda, Robert Tolksdorf

Sponsors

  • SparxSystems: SparxSystems Software GmbH
  • WebRatio: WebRatio

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

  1. rich internet applications
  2. web application modeling
  3. web crawling

Qualifiers

  • Article

Cited By

  • (2018) Browserless Web Data Extraction. Proceedings of the 2018 World Wide Web Conference, pp. 1095-1104. DOI: 10.1145/3178876.3186008. Online publication date: 10-Apr-2018
  • (2017) Searching for behavioural bugs with stateful test oracles in web crawlers. Proceedings of the 10th International Workshop on Search-Based Software Testing, pp. 7-13. DOI: 10.5555/3105427.3105430. Online publication date: 20-May-2017
  • (2014) Model-based rich internet applications crawling. Journal of Web Engineering 13:3-4, pp. 243-262. DOI: 10.5555/2685119.2685124. Online publication date: 1-Jul-2014
  • (2014) A Model-Based Approach for Crawling Rich Internet Applications. ACM Transactions on the Web 8:3, pp. 1-39. DOI: 10.1145/2626371. Online publication date: 8-Jul-2014
  • (2013) A brief history of web crawlers. Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research, pp. 40-54. DOI: 10.5555/2555523.2555529. Online publication date: 18-Nov-2013
  • (2013) Web object identification for web automation and meta-search. Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics, pp. 1-12. DOI: 10.1145/2479787.2479798. Online publication date: 12-Jun-2013
  • (2013) Building rich internet applications models. Proceedings of the 13th international conference on Web Engineering, pp. 291-305. DOI: 10.1007/978-3-642-39200-9_25. Online publication date: 8-Jul-2013
  • (2012) Crawling rich internet applications. Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research, pp. 146-160. DOI: 10.5555/2399776.2399790. Online publication date: 5-Nov-2012
