DOI: 10.1145/2815833.2816956
short-paper

Automatic Extraction of Data from Bar Charts

Published: 07 October 2015

Abstract

Scientific charts are an effective tool for visualizing trends in numerical data. They appear in a wide range of contexts, from experimental results in scientific papers to statistical analyses in business reports. The abundance of scientific charts on the web has made it inevitable for search engines to include them as indexed content. However, queries based only on the textual data used to tag the images can limit the results returned. Many studies address the extraction of data from scientific diagrams in order to improve search results. In our approach to this goal, we attempt to enhance the semantic labeling of charts by using the original data values that the charts were designed to represent. In this paper, we describe a method to extract data values from one specific class of charts: bar charts. The extraction process is fully automated, using image processing and text recognition techniques combined with various heuristics derived from the graphical properties of bar charts. The extracted information can be used to enrich the indexing content for bar charts and improve search results. We evaluate the effectiveness of our method on bar charts drawn from the web as well as on charts embedded in digital documents.
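
The abstract does not spell out the individual steps, so the following is only an illustrative sketch of one step such a pipeline might contain: turning the pixel height of each bar into a data value once the axis tick labels have been read. It is not the authors' implementation. It assumes a vertical bar chart with a linear value axis, a binary mask in which bar pixels are already separated from the background, and tick labels already recognized by some text recognition step; the names `fit_axis`, `find_bars`, and `extract_bar_values` are hypothetical.

```python
from typing import List, Optional, Tuple

Tick = Tuple[int, float]  # (pixel row of a tick mark, numeric label read next to it)


def fit_axis(ticks: List[Tick]) -> Tuple[float, float]:
    """Least-squares fit of value = a * row + b over the recognized ticks.

    Assumes a linear axis; a logarithmic axis would need a different model.
    """
    n = len(ticks)
    if n < 2:
        raise ValueError("need at least two ticks to calibrate the axis")
    mean_row = sum(row for row, _ in ticks) / n
    mean_val = sum(val for _, val in ticks) / n
    sxx = sum((row - mean_row) ** 2 for row, _ in ticks)
    if sxx == 0:
        raise ValueError("ticks must lie on at least two distinct rows")
    sxy = sum((row - mean_row) * (val - mean_val) for row, val in ticks)
    a = sxy / sxx
    return a, mean_val - a * mean_row


def find_bars(mask: List[List[int]]) -> List[Tuple[int, int, int]]:
    """Return (first_column, last_column, top_row) for each bar in the mask.

    A bar is taken to be a maximal run of adjacent columns whose topmost
    foreground pixel lies on the same row. This is a simple heuristic that
    ignores noise, 3D effects, and stacked bars.
    """
    height, width = len(mask), len(mask[0])
    tops: List[Optional[int]] = []
    for x in range(width):
        tops.append(next((y for y in range(height) if mask[y][x]), None))

    bars = []
    x = 0
    while x < width:
        if tops[x] is None:  # background column, no bar here
            x += 1
            continue
        start = x
        while x < width and tops[x] == tops[start]:
            x += 1
        bars.append((start, x - 1, tops[start]))
    return bars


def extract_bar_values(mask: List[List[int]], ticks: List[Tick]) -> List[float]:
    """Map the top edge of every detected bar to a data value."""
    a, b = fit_axis(ticks)
    return [a * top + b for _, _, top in find_bars(mask)]
```

Fitting over all recognized ticks, rather than interpolating between just two, is one way to dampen the effect of a single misread label; whether the paper does this is not stated in the abstract.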



Information & Contributors

Information

Published In

K-CAP '15: Proceedings of the 8th International Conference on Knowledge Capture
October 2015
209 pages
ISBN:9781450338493
DOI:10.1145/2815833

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Information extraction
  2. scientific chart understanding
  3. web search

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

K-CAP 2015
K-CAP 2015: Knowledge Capture Conference
October 7 - 10, 2015
Palisades, NY, USA

Acceptance Rates

K-CAP '15 Paper Acceptance Rate 16 of 56 submissions, 29%;
Overall Acceptance Rate 55 of 198 submissions, 28%


