DOI: 10.1145/3178876.3186135

VizByWiki: Mining Data Visualizations from the Web to Enrich News Articles

Published: 23 April 2018 Publication History

Abstract

Data visualizations in news articles (e.g., maps, line graphs, bar charts) greatly enrich the content of news articles and result in well-established improvements to reader comprehension. However, existing systems that generate news data visualizations either require substantial manual effort or are limited to very specific types of data visualizations, thereby greatly restricting the number of news articles that can be enhanced. To address this issue, we define a new problem: given a news article, retrieve relevant visualizations that already exist on the web. We show that this problem is tractable through a new system, VizByWiki, that mines contextually relevant data visualizations from Wikimedia Commons, the central file repository for Wikipedia. Using a novel ground truth dataset, we show that VizByWiki can successfully augment as many as 48% of popular online news articles with news visualizations. We also demonstrate that VizByWiki can automatically rank visualizations according to their usefulness with reasonable accuracy (nDCG@5 of 0.82). To facilitate further advances on our "news visualization retrieval problem", we release our ground truth dataset and make our system and its source code publicly available.
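
For readers unfamiliar with the ranking metric quoted above, the following is a minimal sketch of how nDCG@k is commonly computed. It assumes the standard formulation with a linear gain and a log2 rank discount; the abstract does not state which variant the authors used, and the function names and the graded relevance scores below are hypothetical illustrations, not values from the paper.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items (linear gain, log2 discount)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items at all: define nDCG as 0
    return dcg_at_k(relevances, k) / ideal


if __name__ == "__main__":
    # Hypothetical usefulness ratings of six retrieved visualizations,
    # listed in the order a ranker returned them.
    ranked_ratings = [2, 3, 0, 1, 0, 3]
    print(f"nDCG@5 = {ndcg_at_k(ranked_ratings, 5):.3f}")
```

A score of 1.0 means the top five results appear in the best possible order given the ratings; lower scores indicate that more useful visualizations were ranked below less useful ones.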

Published In

WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. data visualizations
  2. news articles
  3. peer production
  4. user-generated content
  5. wikimedia commons
  6. wikipedia

Qualifiers

  • Research-article

Funding Sources

  • U.S. National Science Foundation

Conference

WWW '18
Sponsor:
  • IW3C2
WWW '18: The Web Conference 2018
April 23 - 27, 2018
Lyon, France

Acceptance Rates

WWW '18 Paper Acceptance Rate 170 of 1,155 submissions, 15%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
