Sentiment-Focused Web Crawling

Published: 06 November 2014
  • Abstract

    Sentiments and opinions expressed in Web pages towards objects, entities, and products constitute an important portion of the textual content available on the Web. In the last decade, the analysis of such content has gained importance due to its high potential for monetization. Despite the vast interest in sentiment analysis, the discovery of sentimental or opinionated Web content has, somewhat surprisingly, been mostly ignored. This work aims to fill this gap by addressing the problem of quickly discovering and fetching the sentimental content present on the Web. To this end, we design a sentiment-focused Web crawling framework. In particular, we propose several sentiment-focused Web crawling strategies that prioritize discovered URLs based on their predicted sentiment scores. Simulations show that these strategies achieve considerable performance improvements over general-purpose Web crawling strategies in the discovery of sentimental Web content.
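
    The idea of prioritizing discovered URLs by predicted sentiment score can be illustrated with a best-first crawl frontier. The sketch below is not the authors' framework: the function name, the `fetch_links` and `predict_sentiment` callables, and the toy link graph and scores are all hypothetical stand-ins, since the abstract does not say how scores are predicted. It only shows how a priority queue lets a crawler download the most promising URLs first.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def crawl_by_predicted_sentiment(
    seeds: Iterable[str],
    fetch_links: Callable[[str], List[str]],      # hypothetical: returns out-links of a page
    predict_sentiment: Callable[[str], float],    # hypothetical: higher means more sentimental
    budget: int,
) -> List[str]:
    """Visit up to `budget` URLs, highest predicted sentiment score first."""
    frontier: List[Tuple[float, int, str]] = []   # min-heap keyed on the negated score
    seen = set()
    counter = 0                                   # insertion order breaks score ties

    def push(url: str) -> None:
        nonlocal counter
        if url not in seen:                       # each URL is enqueued at most once
            seen.add(url)
            heapq.heappush(frontier, (-predict_sentiment(url), counter, url))
            counter += 1

    for url in seeds:
        push(url)

    visited: List[str] = []
    while frontier and len(visited) < budget:
        _, _, url = heapq.heappop(frontier)       # most promising URL so far
        visited.append(url)
        for link in fetch_links(url):             # newly discovered URLs join the frontier
            push(link)
    return visited


# Toy in-memory "Web" (assumed data, for illustration only).
TOY_GRAPH = {
    "seed": ["news", "reviews"],
    "news": ["about"],
    "reviews": ["forum"],
    "forum": [],
    "about": [],
}
TOY_SCORES = {"seed": 0.0, "news": 0.2, "reviews": 0.9, "forum": 0.8, "about": 0.1}

order = crawl_by_predicted_sentiment(
    ["seed"],
    fetch_links=lambda u: TOY_GRAPH.get(u, []),
    predict_sentiment=lambda u: TOY_SCORES.get(u, 0.0),
    budget=4,
)
print(order)  # ['seed', 'reviews', 'forum', 'news']
```

    With a budget of four downloads, this toy crawl reaches the two highest-scoring pages ("reviews" and "forum") before the lower-scoring "news" page. A breadth-first crawler with the same budget would fetch "about" in place of "forum".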

      Published In

      ACM Transactions on the Web  Volume 8, Issue 4
      October 2014
      178 pages
      ISSN:1559-1131
      EISSN:1559-114X
      DOI:10.1145/2686863
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 November 2014
      Accepted: 01 July 2014
      Revised: 01 May 2014
      Received: 01 July 2013
      Published in TWEB Volume 8, Issue 4

      Author Tags

      1. Sentiment analysis
      2. Focused web crawling

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 22
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 06 Aug 2024

      Citations

      Cited By

      • (2024) Weakly supervised learning for an effective focused web crawler. Engineering Applications of Artificial Intelligence 132 (107944). DOI: 10.1016/j.engappai.2024.107944. Online publication date: Jun-2024
      • (2022) An Overview of Methodologies and Challenges in Sentiment Analysis on Social Networks. Research Anthology on Implementing Sentiment Analysis Across Multiple Disciplines (1590-1599). DOI: 10.4018/978-1-6684-6303-1.ch084. Online publication date: 10-Jun-2022
      • (2022) Building a Technology Recommender System Using Web Crawling and Natural Language Processing Technology. Algorithms 15:8 (272). DOI: 10.3390/a15080272. Online publication date: 3-Aug-2022
      • (2022) Amelioration of linguistic semantic classifier with sentiment classifier manacle for the focused web crawler. International Journal of Information Technology 15:2 (1137-1149). DOI: 10.1007/s41870-022-01139-w. Online publication date: 27-Dec-2022
      • (2021) A Critique Empirical Evaluation of Relevance Computation for Focused Web Crawlers. Brazilian Archives of Biology and Technology 64. DOI: 10.1590/1678-4324-2021210223. Online publication date: 2021
      • (2021) Identification of Metrics for the Purdue Index for Construction Using Latent Dirichlet Allocation. Journal of Management in Engineering 37:6. DOI: 10.1061/(ASCE)ME.1943-5479.0000968. Online publication date: Nov-2021
      • (2020) An Overview of Methodologies and Challenges in Sentiment Analysis on Social Networks. Handbook of Research on Big Data Clustering and Machine Learning (204-213). DOI: 10.4018/978-1-7998-0106-1.ch010. Online publication date: 2020
      • (2020) A Search for Optimal Feature in Political Sentiment Analysis. 2020 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE) (340-343). DOI: 10.1109/WIECON-ECE52138.2020.9397966. Online publication date: 26-Dec-2020
      • (2020) Review selection based on content quality. Knowledge and Information Systems. DOI: 10.1007/s10115-020-01474-z. Online publication date: 21-May-2020
      • (2017) Sentiment Analysis for the Social Media. Proceedings of the SouthEast Conference (215-218). DOI: 10.1145/3077286.3077569. Online publication date: 13-Apr-2017
