Poster (Public Access). DOI: 10.1145/3441852.3476545

SciA11y: Converting Scientific Papers to Accessible HTML

Published: 17 October 2021

    Abstract

    We present SciA11y, a system that renders inaccessible scientific paper PDFs into HTML. SciA11y uses machine learning models to extract and understand the content of scientific PDFs, and reorganizes the resulting paper components into a form that better supports skimming and scanning for blind and low vision (BLV) readers. SciA11y adds navigation features such as tagged headings, a table of contents, and bidirectional links between inline citations and references, which allow readers to resolve citations without losing their context. A set of 1.5 million open access papers has been processed and is available at https://scia11y.org/. This system is a first step in addressing scientific PDF accessibility, and may significantly improve the experience of paper reading for BLV users.
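
The navigation features named in the abstract (tagged headings, a table of contents, and bidirectional citation links) can be illustrated with a minimal Python sketch. This is not the SciA11y implementation: the function name `render_paper`, the input shapes, the `[n]` citation marker format, and the `sec-`, `cite-`, and `ref-` id schemes are all assumptions made for illustration, and the sketch assumes the paper content has already been extracted from the PDF.

```python
import html
import re


def render_paper(title, sections, references):
    """Render extracted paper content as HTML (hypothetical helper).

    sections:   list of (heading, body_text) pairs; body_text may hold "[n]" markers.
    references: dict mapping a citation number (int) to its reference text.
    """
    # reference number -> ids of the inline citations that point at it
    backlinks = {num: [] for num in references}

    def link_citation(match):
        num = int(match.group(1))
        if num not in references:
            return match.group(0)  # leave unknown markers untouched
        cite_id = f"cite-{num}-{len(backlinks[num]) + 1}"
        backlinks[num].append(cite_id)
        # forward link: inline citation -> reference entry
        return f'<a id="{cite_id}" href="#ref-{num}">[{num}]</a>'

    toc, body = [], []
    for index, (heading, text) in enumerate(sections, start=1):
        sec_id = f"sec-{index}"
        toc.append(f'<li><a href="#{sec_id}">{html.escape(heading)}</a></li>')
        linked = re.sub(r"\[(\d+)\]", link_citation, html.escape(text))
        # tagged heading so screen readers can jump between sections
        body.append(f'<h2 id="{sec_id}">{html.escape(heading)}</h2>\n<p>{linked}</p>')

    ref_items = []
    for num, ref_text in sorted(references.items()):
        # back links: reference entry -> each place it was cited
        back = " ".join(
            f'<a href="#{cid}">back to citation {i}</a>'
            for i, cid in enumerate(backlinks[num], start=1)
        )
        ref_items.append(f'<li id="ref-{num}">{html.escape(ref_text)} {back}</li>')

    return "\n".join([
        f"<h1>{html.escape(title)}</h1>",
        '<nav aria-label="Table of contents"><ol>', *toc, "</ol></nav>",
        *body,
        '<h2 id="references">References</h2>',
        "<ol>", *ref_items, "</ol>",
    ])
```

Calling `render_paper("Title", [("Introduction", "Prior work [1] ...")], {1: "Some reference"})` returns one HTML string. Each inline citation links to its reference entry, and each entry links back to every place it was cited, so a reader can follow a citation and return to the same spot in the text.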

    Supplementary Material

    VTT File (assets21b-sub1101-cam-i41.vtt)
    MP4 File (assets21b-sub1101-cam-i41.mp4)
    Presentation video

    Cited By

    • (2024) ACCSAMS: Automatic Conversion of Exam Documents to Accessible Learning Material for Blind and Visually Impaired. Computers Helping People with Special Needs, pp. 322-330. DOI: 10.1007/978-3-031-62846-7_39. Online publication date: 8-Jul-2024
    • (2023) A Comparative Evaluation of PDF-to-HTML Conversion Tools. 2023 International Research Conference on Smart Computing and Systems Engineering (SCSE), pp. 1-7. DOI: 10.1109/SCSE59836.2023.10214989. Online publication date: 29-Jun-2023
    • (2022) Overview of assets 2021. ACM SIGACCESS Accessibility and Computing, pp. 1-1. DOI: 10.1145/3523265.3523266. Online publication date: 1-Mar-2022

    Published In

    ASSETS '21: Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility
    October 2021
    730 pages
    ISBN: 9781450383066
    DOI: 10.1145/3441852
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Badges

    • Best Artifact

    Author Tags

    1. accessibility
    2. accessible reader
    3. blind and low vision users
    4. scientific documents

    Qualifiers

    • Poster
    • Research
    • Refereed limited

    Acceptance Rates

    ASSETS '21 Paper Acceptance Rate 36 of 134 submissions, 27%;
    Overall Acceptance Rate 436 of 1,556 submissions, 28%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 296
    • Downloads (last 6 weeks): 55
    Reflects downloads up to 26 Jul 2024
