Synopsis

MIT School of Engineering
Department of Computer Science and Engineering
Mini Project Synopsis
Group ID:
Project Title: Speech to Text Recognition
Group Members:
Enrollment Number | Roll No. | Name of Student | Email ID | Contact Number
MITU18BTCS0214 | 2183098 | Jainam Shah | jainamshah948@gmail.com | 9359858153
MITU18BTCS0252 | 2183044 | Anupam Thackar | anupamthackar0786@gmail.com | 8962219884
Abstract:
Speech is the most common means of communication, and most of the world's population relies
on speech to communicate with one another. Speech recognition systems translate spoken language
into text. There are many real-life examples of speech recognition systems; for example, Apple's
Siri recognizes speech and transcribes it into text.
Literature Survey:
A Speech-to-Text (STT) system converts speech into text. As computerized STT technology
has become more advanced over the past several years, more students with and without
disabilities are using STT tools in the classroom and while taking assessments. STT tools are
often installed on school-provided computers or tablets and thus may also be widely available
to students for instructional use. A review of the available literature on STT in education
covers three areas. First, it highlights what is known about the characteristics of students
who used STT for instruction and assessment, along with the methodologies and outcome
variables of those studies. Second, it describes the implementation of STT tools (e.g.,
training in the use of STT, student attitudes toward the tools, and comparisons of different
types of tools). Third, it describes the effect of the technology on academic outcomes for
students with different types of disabilities.
Most modern speech recognition systems rely on what is known as a Hidden Markov
Model (HMM). This approach is based on the idea that a speech signal, when viewed on a
short enough timescale (say, 10 ms), can be reasonably approximated as a stationary
process, that is, a process whose statistical properties do not change over time.
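Treating the signal as stationary over roughly 10 ms means cutting it into short frames that are each analyzed on their own. The NumPy sketch below illustrates only that cutting step. It assumes a mono signal held in a NumPy array and a 16 kHz sampling rate; neither comes from the sources, and the function name `split_into_frames` is hypothetical. Practical front ends usually also overlap consecutive frames and taper each one with a window, which this sketch omits for clarity.

```python
import numpy as np

def split_into_frames(signal: np.ndarray, sample_rate: int, frame_ms: float = 10.0) -> np.ndarray:
    """Split a 1-D signal into non-overlapping frames of frame_ms milliseconds.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame, e.g. 160 at 16 kHz
    n_frames = len(signal) // frame_len
    # Trim the remainder and reshape so each row is one short, quasi-stationary frame.
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

# Usage: one second of a synthetic 440 Hz tone sampled at 16 kHz (hypothetical input).
sr_hz = 16_000
t = np.arange(sr_hz) / sr_hz
tone = np.sin(2 * np.pi * 440 * t)
frames = split_into_frames(tone, sr_hz)
print(frames.shape)  # (100, 160)
```

Non-overlapping frames keep the example short; with a hop smaller than the frame length, each row would share samples with its neighbours.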
In a typical HMM, the speech signal is divided into 10-millisecond fragments. The
power spectrum of each fragment, which is essentially a plot of the signal’s power as a
function of frequency, is mapped to a vector of real numbers known as cepstral
coefficients. The dimension of this vector is usually small, sometimes as low as 10,
although more accurate systems may have dimension 32 or more. The final output of the
HMM is a sequence of these vectors. To decode the speech into text, groups of vectors
are matched to one or more phonemes, the fundamental units of speech. This
calculation requires training, since the sound of a phoneme varies from speaker to speaker,
and even from one utterance to another by the same speaker. A special algorithm is then
applied to determine the most likely word (or words) that produce the given sequence of
phonemes. As one might imagine, this whole process can be computationally expensive.
In many modern speech recognition systems, neural networks are used to simplify the
speech signal through feature transformation and dimensionality reduction before HMM
recognition. Voice activity detectors (VADs) are also used to reduce an audio signal to
only the portions that are likely to contain speech. This prevents the recognizer from
wasting time analyzing unnecessary parts of the signal.
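The feature-extraction and voice-activity steps described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the method of any particular system: `cepstral_coefficients` computes the plain real cepstrum (inverse FFT of the log power spectrum), whereas most practical recognizers use mel-frequency cepstral coefficients (MFCCs), and `energy_vad` is a crude energy threshold rather than a trained detector. The function names, the Hamming window, the 13-coefficient default and the -40 dB threshold are all hypothetical choices. Frames are assumed to be a 2-D array with one 10 ms frame per row.

```python
import numpy as np

def cepstral_coefficients(frame: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """Real cepstrum of one frame: inverse FFT of its log power spectrum."""
    windowed = frame * np.hamming(len(frame))            # taper frame edges to reduce spectral leakage
    power_spectrum = np.abs(np.fft.rfft(windowed)) ** 2  # signal power as a function of frequency
    log_power = np.log(power_spectrum + 1e-10)           # small floor avoids log(0) on silent frames
    cepstrum = np.fft.irfft(log_power)                   # log spectrum back to the "quefrency" domain
    return cepstrum[:n_coeffs]                           # low-order terms summarize the spectral envelope

def energy_vad(frames: np.ndarray, threshold_db: float = -40.0) -> np.ndarray:
    """Boolean mask: True for frames within threshold_db of the loudest frame."""
    energy = np.sum(frames.astype(float) ** 2, axis=1) + 1e-10
    energy_db = 10 * np.log10(energy / energy.max())     # 0 dB for the loudest frame
    return energy_db > threshold_db

# Usage with synthetic data: 5 silent frames followed by 5 noise frames of 160 samples each.
rng = np.random.default_rng(0)
frames = np.vstack([np.zeros((5, 160)), rng.standard_normal((5, 160))])
mask = energy_vad(frames)
features = np.array([cepstral_coefficients(f) for f in frames[mask]])
print(mask.tolist())   # [False, False, False, False, False, True, True, True, True, True]
print(features.shape)  # (5, 13)
```

The silent frames sit roughly 120 dB below the loudest frame and are discarded by the VAD, so only the five noisy frames are turned into 13-dimensional feature vectors, mirroring how a VAD keeps the recognizer from analyzing empty stretches of audio.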
Fortunately, as a Python programmer, you don’t need to worry about any of this. Several
speech recognition services are available for use online through an API, and many of
these services offer Python SDKs.
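A minimal sketch of calling such a service from Python is shown below. It assumes the third-party SpeechRecognition package (installed with `pip install SpeechRecognition`, imported as `speech_recognition`) and its free Google Web Speech backend via `recognize_google`, which needs network access. The helper name `transcribe_wav` is hypothetical, and the sources do not say which service or SDK this project uses.

```python
from typing import Optional

def transcribe_wav(path: str, language: str = "en-US") -> Optional[str]:
    """Transcribe an audio file (WAV, AIFF or FLAC) with the SpeechRecognition package.

    Returns the recognized text, or None if no words could be recognized.
    """
    # Imported inside the function so this module can be loaded without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into an AudioData object
    try:
        # Sends the audio to Google's Web Speech API; requires network access.
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # audio was processed but nothing intelligible was found
    # sr.RequestError (e.g. no network connection) is left to propagate to the caller.
```

Calling `transcribe_wav("sample.wav")` (a hypothetical file) returns the transcript as a string, or `None` if no words could be recognized. Capturing live audio instead would use `sr.Microphone` as the source, which additionally requires PyAudio.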
Proposed System (Block Diagram):
A simplified overview of speech-to-text recognition using a speech recognition API in Python.
Block diagram of speech-to-text recognition using a Hidden Markov Model (HMM)
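The sources do not name the "special algorithm" that finds the most likely words in the HMM pipeline; a standard choice for recovering the most likely hidden-state sequence of an HMM is the Viterbi algorithm, and the sketch below assumes it. It is a toy under stated assumptions: observations are discrete codebook indices (as in vector-quantized HMMs) rather than the continuous cepstral vectors described earlier, the two states are labeled as hypothetical phoneme states, and all probabilities are made-up illustration values. The computation runs in the log domain to avoid numerical underflow on long utterances.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete-observation HMM.

    obs:     sequence of observation indices (e.g. vector-quantized frame labels)
    start_p: (S,) initial state probabilities
    trans_p: (S, S) array, trans_p[i, j] = P(state j at t+1 | state i at t)
    emit_p:  (S, O) array, emit_p[s, o] = P(observation o | state s)
    Returns (path, probability) where path is a list of state indices.
    """
    log_trans, log_emit = np.log(trans_p), np.log(emit_p)
    T, S = len(obs), len(start_p)
    score = np.empty((T, S))            # best log-probability of any path ending in each state
    back = np.zeros((T, S), dtype=int)  # back-pointers used to recover that path

    score[0] = np.log(start_p) + log_emit[:, obs[0]]
    for t in range(1, T):
        # candidates[i, j]: best path ending in state i at t-1, then moving to state j
        candidates = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(candidates, axis=0)
        score[t] = np.max(candidates, axis=0) + log_emit[:, obs[t]]

    # Trace back from the best final state.
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(np.exp(np.max(score[-1])))

# Toy example with made-up numbers: two hypothetical phoneme states, three codebook symbols.
states = ["sil", "ah"]
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],
                 [0.1, 0.3, 0.6]])
path, prob = viterbi([0, 1, 2], start, trans, emit)
print([states[s] for s in path], round(prob, 5))  # ['sil', 'sil', 'ah'] 0.01512
```

Because each step keeps only the best path into each state, decoding costs O(T * S^2) instead of enumerating all S^T possible state sequences, which is what makes HMM decoding tractable for real utterances.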
Flowchart
Conclusion:
Speech recognition (SR) has been in development for more than 60 years. Various speech
recognition methodologies and approaches are available to enhance recognition systems. The
fundamentals of SR systems and the various approaches that exist for developing an automatic
speech recognition (ASR) system are explained and compared in this synopsis. In recent years,
large-vocabulary, speaker-independent continuous speech recognition has improved considerably.
To further improve accuracy, other modeling techniques will be implemented in the future.
References:
Susanne Wagner (Halle), “Intralingual speech-to-text conversion in real-time: challenges and
opportunities.” Retrieved on 6 June 2005.
Preeti Saini, Parneet Kaur, “Automatic Speech Recognition: A Review,” International Journal
of Engineering Trends and Technology, Volume 4, Issue 2, 2013. http://www.ijetajournal.org/
