A Model-Based Approach for Crawling Rich Internet Applications

Published: 08 July 2014

Abstract

New Web technologies, such as AJAX, result in more responsive and interactive Web applications, sometimes called Rich Internet Applications (RIAs). Crawling techniques developed for traditional Web applications are not sufficient for crawling RIAs. The inability to crawl RIAs is a problem that needs to be addressed, at the very least to make RIAs searchable and testable. We present a new methodology, called “model-based crawling”, that can be used as a basis for designing efficient crawling strategies for RIAs. We illustrate model-based crawling with a sample strategy, called the “hypercube strategy”. The performance of our model-based crawling strategies is compared against that of existing standard crawling strategies, including breadth-first, depth-first, and a greedy strategy. Experimental results show that our model-based crawling approach is significantly more efficient than these standard strategies.

Published In

ACM Transactions on the Web  Volume 8, Issue 3
June 2014
256 pages
ISSN:1559-1131
EISSN:1559-114X
DOI:10.1145/2639948

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 July 2014
Accepted: 01 December 2013
Revised: 01 August 2013
Received: 01 May 2012
Published in TWEB Volume 8, Issue 3

Author Tags

  1. AJAX
  2. Crawling
  3. DOM
  4. dynamic analysis
  5. modeling
  6. rich Internet applications

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 16 Oct 2024

Cited By

  • (2024) Tree-Based Synthesis of Web Test Sequences from Manual Actions. Theoretical Aspects of Software Engineering, 242-260. DOI: 10.1007/978-3-031-64626-3_14. Online publication date: 29-Jul-2024
  • (2022) An Intensive review on implementation of Big Data in different applications and its associated issues and Challenges. 2022 5th International Conference on Contemporary Computing and Informatics (IC3I), 670-673. DOI: 10.1109/IC3I56241.2022.10072480. Online publication date: 14-Dec-2022
  • (2021) Model-Based Testing of Web Application: An SLR. VFAST Transactions on Software Engineering 9:4, 126-136. DOI: 10.21015/vtse.v9i4.948. Online publication date: 31-Dec-2021
  • (2021) Automatic Web Testing Using Curiosity-Driven Reinforcement Learning. Proceedings of the 43rd International Conference on Software Engineering, 423-435. DOI: 10.1109/ICSE43902.2021.00048. Online publication date: 22-May-2021
  • (2021) Towards various applications of Big Data and related issues and challenges. 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA), 1361-1365. DOI: 10.1109/ICECA52323.2021.9675990. Online publication date: 2-Dec-2021
  • (2019) Exploring the Intersections of Web Science and Accessibility. Human Systems Engineering and Design II, 483-488. DOI: 10.1007/978-3-030-27928-8_73. Online publication date: 14-Aug-2019
  • (2018) Smart Approach to Crawl Web Interfaces Using a Two Stage Framework of Crawler. 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), 1-6. DOI: 10.1109/ICCUBEA.2018.8697592. Online publication date: Aug-2018
  • (2018) Accuracy Crawler: An Accurate Crawler for Deep Web Data Extraction. 2018 International Conference on Control, Power, Communication and Computing Technologies (ICCPCCT), 25-29. DOI: 10.1109/ICCPCCT.2018.8574286. Online publication date: Mar-2018
  • (2018) Automatically Crawling Dynamic Web Applications via Proxy-Based JavaScript Injection and Runtime Analysis. 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC), 242-249. DOI: 10.1109/DSC.2018.00042. Online publication date: Jun-2018
  • (2018) GUIDE: an interactive and incremental approach for crawling Web applications. The Journal of Supercomputing. DOI: 10.1007/s11227-018-2335-4. Online publication date: 28-Mar-2018
