DOI: 10.1145/3368308.3415380
research-article

A Case Study in Comparative Speech-to-Text Libraries for Use in Transcript Generation for Online Education Recordings

Published: 07 October 2020

Abstract

With the proliferation of Cloud-based Speech-to-Text services, it can be difficult to decide where to start and how to make use of these technologies. The options include the major Cloud providers as well as several Open Source Speech-to-Text projects. We set out to investigate a sample of the available libraries and their attributes as they relate to the recordings that are a by-product of Online Education.
Because so many resources are available, the computing and technical barriers to applying speech recognition algorithms have decreased to the point of being a non-factor in the decision to use Speech-to-Text services. New barriers such as price, compute time, and access to the services' source code (software freedom) can instead be factored into the decision of which platform to use.
This case study provides a starting point for developing a test suite and guide to compare Speech-to-Text libraries and their out-of-the-box accuracy. Our initial test suite employed two models: 1) a Cloud model that uses AWS S3 with AWS Transcribe, and 2) an on-premises Open Source model that relies on Mozilla's DeepSpeech [1]. We present our findings and recommendations based on the criteria discovered.
To deliver this test suite, we also researched the latest web development technologies, with an emphasis on security. This was done to produce a reliable and secure development process and to provide open access to this proof of concept for further testing and development.
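The abstract does not say how out-of-the-box accuracy was scored. As a minimal sketch, assuming a word-level Levenshtein (edit) distance of the kind cited in reference [8], two transcripts of the same recording could be compared against a human-made reference as follows. The function names and the sample transcripts are hypothetical and are not taken from the paper or its code release.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two word lists.

    Insertions, deletions, and substitutions each cost 1.
    """
    # prev[j] holds the distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference_text, hypothesis_text):
    """Word edit distance divided by the number of reference words."""
    ref = reference_text.lower().split()
    hyp = hypothesis_text.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return word_edit_distance(ref, hyp) / len(ref)


# Hypothetical transcripts, for illustration only.
reference = "welcome to the lecture on operating systems"
cloud_output = "welcome to the lecture on operating system"
on_prem_output = "welcome to lecture on operate systems"

for name, text in [("cloud", cloud_output), ("on-premises", on_prem_output)]:
    print(f"{name}: WER = {word_error_rate(reference, text):.3f}")
```

A lower score means the transcript is closer to the reference. Lowercasing and whitespace splitting are simplifying assumptions; a real test suite would also need to decide how to treat punctuation, numerals, and filler words.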

References

[1] Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, and Greg Diamos. 2014. DeepSpeech: Scaling up end-to-end Speech Recognition. arXiv:1412.5567 [cs.CL].
[2] Pablo Angel Alvarez Fernandez. 2020. pabloaaf/Factor-TranscriptionCaseStudy: v1.0.0 (Version 1.0.0). Zenodo, June 15, 2020. http://doi.org/10.5281/zenodo.3893988
[3] Mozilla. 2020. Why Common Voice? https://voice.mozilla.org/en/about
[4] Shimaa Ahmed, Amrita Roy Chowdhury, Kassem Fawaz, and Parmesh Ramanathan. 2020. A System for Privacy-Preserving Speech Transcription. arXiv:1909.04198v3 [cs.CR], 18 Feb 2020. https://arxiv.org/pdf/1909.04198.pdf
[5] Sonal Shetty, Vidya Nagar, Harish Hebballi, Moula Husain, Meena S M, and Shiddu Nagaralli. 2015. Content Based Audiobooks Indexing using Apache Hadoop Framework. In WCI '15, August 10-13, 2015, Kochi, India.
[6] Larwan Berke. 2017. Displaying Confidence From Imperfect Automatic Speech Recognition for Captioning. SIGACCESS Newsletter, Issue 117, January 2017.
[7] AWS Transcribe. 2020. AWS Transcribe About. https://aws.amazon.com/transcribe/pricing/
[8] Thierry Lavoie and Ettore Merlo. 2012. An accurate estimation of the Levenshtein distance using metric trees and Manhattan distance. IWSC, Zurich, Switzerland.
[9] Matthew F. Dabkowski, Samuel H. Huddleston, and Ian Kloo. 2019. Improving record linkage for counter-threat finance intelligence with dynamic Jaro-Winkler thresholds. WSC, Maryland, United States.

Cited By

  • (2023) 0.01 Cent per Second: Developing a Cloud-based Cost-effective Audio Transcription System for an Online Video Learning Platform. 2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE), 432-437. DOI: 10.1109/JCSSE58229.2023.10201942. Online publication date: 28-Jun-2023
  • (2021) Using IBM watson services to process video to streamline business processes and improve customer experience. Proceedings of the 31st Annual International Conference on Computer Science and Software Engineering, 262-267. DOI: 10.5555/3507788.3507830. Online publication date: 22-Nov-2021

    Published In

    SIGITE '20: Proceedings of the 21st Annual Conference on Information Technology Education
    October 2020
    446 pages
    ISBN:9781450370455
    DOI:10.1145/3368308
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. algorithms
    2. automatic speech recognition
    3. cloud
    4. deepspeech
    5. floss
    6. kubernetes
    7. speech-to-text
    8. subtitles
    9. test suites

    Qualifiers

    • Research-article

    Conference

    SIGITE '20

    Acceptance Rates

    Overall Acceptance Rate 176 of 429 submissions, 41%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 29
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 09 Jan 2025
