VizByWiki: Mining Data Visualizations from the Web to Enrich News Articles

Authors: Allen Yilun Lin, Brent Hecht

WWW '18: Proceedings of the 2018 World Wide Web Conference

Pages 873 - 882

https://doi.org/10.1145/3178876.3186135

Published: 23 April 2018

Abstract

Data visualizations in news articles (e.g., maps, line graphs, bar charts) greatly enrich the content of news articles and result in well-established improvements to reader comprehension. However, existing systems that generate news data visualizations either require substantial manual effort or are limited to very specific types of data visualizations, thereby greatly restricting the number of news articles that can be enhanced. To address this issue, we define a new problem: given a news article, retrieve relevant visualizations that already exist on the web. We show that this problem is tractable through a new system, VizByWiki, that mines contextually relevant data visualizations from Wikimedia Commons, the central file repository for Wikipedia. Using a novel ground truth dataset, we show that VizByWiki can successfully augment as many as 48% of popular online news articles with news visualizations. We also demonstrate that VizByWiki can automatically rank visualizations according to their usefulness with reasonable accuracy (nDCG@5 of 0.82). To facilitate further advances on our "news visualization retrieval problem", we release our ground truth dataset and make our system and its source code publicly available.
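
For readers unfamiliar with the ranking metric quoted above, the following is a minimal sketch of how nDCG@k is commonly computed. It is not taken from the VizByWiki source code. The function names, the linear gain (some formulations use 2^rel - 1 instead), and the example ratings are all assumptions made for illustration.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k items of a ranked list.

    Assumption: linear gain (the raw relevance grade) with a log2 discount.
    """
    return sum(
        rel / math.log2(position + 2)  # position 0 gets discount log2(2) = 1
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k of the given ranking divided by DCG@k of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items at all; define the score as 0
    return dcg_at_k(relevances, k) / ideal


# Hypothetical usefulness ratings for visualizations, in the order a
# ranker returned them (higher means more useful).
example_ranking = [3, 1, 2, 0, 0]
score = ndcg_at_k(example_ranking, k=5)
print(f"nDCG@5 = {score:.3f}")
```

A score of 1.0 means the returned order matches the ideal order of the rated items within the top k, so a value such as 0.82 indicates a ranking that is close to, but not exactly, ideal.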

Index Terms

VizByWiki: Mining Data Visualizations from the Web to Enrich News Articles
1. Human-centered computing
  1. Collaborative and social computing
    1. Collaborative and social computing systems and tools
  2. Visualization
    1. Visualization application domains
      1. Information visualization

Published In

WWW '18: Proceedings of the 2018 World Wide Web Conference

April 2018

2000 pages

ISBN:9781450356398

General Chairs: Pierre-Antoine Champin (Université Claude Bernard Lyon 1, France), Fabien Gandon (Inria, Université Côte d'Azur, CNRS, I3S, France), Lionel Médini (Université Claude Bernard Lyon 1, France)

Program Chairs: Mounia Lalmas (Spotify, UK), Panagiotis G. Ipeirotis (New York University, USA)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Qualifiers

Research-article

Funding Sources

U.S. National Science Foundation

Conference

WWW '18

Sponsor:

IW3C2

WWW '18: The Web Conference 2018

April 23 - 27, 2018

Lyon, France

Acceptance Rates

WWW '18 Paper Acceptance Rate 170 of 1,155 submissions, 15%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
