Synopsis
Group ID:
Project Title: Speech to Text Recognition
Group Members:
Literature Survey:
A speech-to-text (STT) system converts spoken language into written text. As computerized
STT technology has become more advanced over the past several years, more students, with
and without disabilities, are using STT tools in the classroom and while taking assessments.
STT tools are often installed on school-provided computers or tablets and may therefore be
widely available to students for instructional use. The available literature on STT in
education can be summarized along three lines. First, it describes the characteristics of
the students who used STT for instruction and assessment, as well as the methodologies and
outcome variables of those studies. Second, it describes how STT tools are implemented
(e.g., training in the use of STT, student attitudes toward the tools, and comparisons of
different types of tools). Third, it describes the effect of the technology on academic
outcomes for students with different types of disabilities.
Most modern speech recognition systems rely on what is known as a Hidden Markov Model
(HMM). This approach is based on the idea that a speech signal, when viewed on a short
enough timescale (say, 10 ms), can be reasonably approximated as a stationary process,
that is, a process whose statistical properties do not change over time.
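The stationarity assumption is what justifies cutting a recording into short frames and analyzing each frame separately. A minimal NumPy sketch of that framing step follows, assuming a mono signal held in a NumPy array; `frame_signal` is a hypothetical helper rather than part of any speech library, and the 16 kHz sample rate and synthetic tone are illustrative assumptions.

```python
import numpy as np


def frame_signal(signal: np.ndarray, sample_rate: int, frame_ms: float = 10.0) -> np.ndarray:
    """Split a 1-D signal into consecutive, non-overlapping frames of `frame_ms` milliseconds.

    Trailing samples that do not fill a whole frame are dropped.
    Returns an array of shape (n_frames, samples_per_frame).
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # e.g. 160 samples at 16 kHz
    n_frames = len(signal) // frame_len
    return np.asarray(signal[: n_frames * frame_len]).reshape(n_frames, frame_len)


if __name__ == "__main__":
    sample_rate = 16_000  # assumed sampling rate
    t = np.arange(sample_rate) / sample_rate  # one second of sample times
    tone = np.sin(2 * np.pi * 440 * t)  # synthetic 440 Hz tone standing in for speech
    frames = frame_signal(tone, sample_rate)
    print(frames.shape)  # (100, 160): 100 frames of 10 ms each
```

Many practical front ends use overlapping, windowed frames rather than the back-to-back split shown here; the non-overlapping version simply mirrors the 10 ms description in the text.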
In a typical HMM, the speech signal is split into 10-millisecond fragments. The power
spectrum of each fragment, which is a plot of the signal's power as a function of
frequency, is mapped to a vector of real numbers called cepstral coefficients. The
dimension of this vector is usually small, sometimes as low as 10, although more accurate
systems may use a dimension of 32 or more. The final output of the HMM is a sequence of
these vectors. To decode the speech into text, groups of vectors are matched to one or
more phonemes, the fundamental units of speech. This matching requires training, since the
sound of a phoneme varies from speaker to speaker, and even from one utterance to another
by the same speaker. A special algorithm is then applied to determine the most likely word
(or words) that produce the given sequence of phonemes. As one can imagine, this whole
process may be computationally expensive.

In many modern speech recognition systems, neural networks are used to simplify the speech
signal through feature transformation and dimensionality reduction before HMM recognition.
Voice activity detectors (VADs) are also used to reduce an audio signal to only the portions
that are likely to contain speech. This prevents the recognizer from spending time
analyzing unnecessary parts of the signal.
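The mapping from a fragment's power spectrum to a short vector of cepstral coefficients can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a plain (non-mel) cepstrum computed as the inverse FFT of the log power spectrum; widely used features such as MFCCs add a mel-scale filterbank and a discrete cosine transform, which are omitted here. The helper name `cepstral_coefficients` and the default of 13 coefficients (inside the 10 to 32 range mentioned above) are illustrative assumptions.

```python
import numpy as np


def cepstral_coefficients(frame: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """Map one speech frame to a short vector of plain (non-mel) cepstral coefficients."""
    windowed = frame * np.hamming(len(frame))  # taper frame edges to reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2  # power spectrum: power as a function of frequency
    log_power = np.log(power + 1e-10)  # small offset avoids log(0) on silent frames
    cepstrum = np.fft.irfft(log_power, n=len(frame))  # inverse FFT of the log spectrum
    return cepstrum[:n_coeffs]  # keep only the low-order coefficients


if __name__ == "__main__":
    sample_rate = 16_000  # assumed sampling rate
    frame = np.sin(2 * np.pi * 440 * np.arange(160) / sample_rate)  # one synthetic 10 ms frame
    print(cepstral_coefficients(frame).shape)  # (13,)
```

Applied to every frame, this turns a recording into the sequence of feature vectors that the HMM works with.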
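The text does not name the "special algorithm" used for decoding; the Viterbi algorithm is the standard dynamic-programming method for finding the most likely sequence of hidden states in an HMM, and it is used below only as an illustration. The sketch assumes a tiny, hypothetical two-state model whose states stand for the phonemes of the word "hi" (ARPAbet `HH` and `AY`), with each observation a discrete symbol, for example the index of the nearest entry in a codebook of cepstral vectors. All probabilities are made up for the example; a real recognizer learns them during training and combines phone models with a pronunciation lexicon and a language model to reach words.

```python
import numpy as np


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete-observation HMM.

    obs:     sequence of observation indices, length T
    start_p: (S,) initial state probabilities
    trans_p: (S, S) transition probabilities, trans_p[i, j] = P(next state j | state i)
    emit_p:  (S, O) emission probabilities, emit_p[s, o] = P(observation o | state s)
    Works in log space so long sequences do not underflow.
    """
    log_start, log_trans, log_emit = (
        np.log(np.asarray(p, dtype=float)) for p in (start_p, trans_p, emit_p)
    )
    T, S = len(obs), len(log_start)
    score = np.empty((T, S))  # best log-probability of any path ending in state s at time t
    back = np.zeros((T, S), dtype=int)  # back-pointers used to rebuild that best path
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans  # cand[i, j]: be in state i, move to state j
        back[t] = np.argmax(cand, axis=0)  # best predecessor for each state j
        score[t] = cand[back[t], np.arange(S)] + log_emit[:, obs[t]]
    path = [int(np.argmax(score[-1]))]  # best final state
    for t in range(T - 1, 0, -1):  # follow back-pointers to the start
        path.append(int(back[t, path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    phonemes = ["HH", "AY"]  # hypothetical two-phoneme model for "hi"
    start_p = [0.8, 0.2]
    trans_p = [[0.6, 0.4], [0.1, 0.9]]
    emit_p = [[0.7, 0.2, 0.1], [0.1, 0.35, 0.55]]  # 3 hypothetical observation symbols
    path = viterbi([0, 0, 1, 2, 2], start_p, trans_p, emit_p)
    print([phonemes[s] for s in path])  # ['HH', 'HH', 'AY', 'AY', 'AY']
```

Working with log-probabilities turns the products along a path into sums, which keeps the scores numerically stable without changing which path wins.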
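Voice activity detection can be as simple as thresholding the short-time energy of each frame. A minimal sketch under that assumption follows; the function name `energy_vad` and the -35 dB threshold (relative to the loudest frame) are hypothetical choices, and production VADs rely on more robust features to cope with background noise.

```python
import numpy as np


def energy_vad(signal: np.ndarray, sample_rate: int, frame_ms: float = 10.0,
               threshold_db: float = -35.0) -> tuple[np.ndarray, np.ndarray]:
    """Keep only frames whose energy is within `threshold_db` of the loudest frame.

    Returns (speech-only samples, per-frame boolean mask).
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)  # mean power of each frame
    rel_db = 10 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)  # dB relative to loudest frame
    mask = rel_db > threshold_db  # True where the frame is treated as speech
    return frames[mask].reshape(-1), mask


if __name__ == "__main__":
    sample_rate = 16_000  # assumed sampling rate
    t = np.arange(sample_rate) / sample_rate
    tone = 0.5 * np.sin(2 * np.pi * 440 * t)  # synthetic "speech" burst
    audio = np.concatenate([np.zeros(sample_rate), tone, np.zeros(sample_rate)])
    speech, mask = energy_vad(audio, sample_rate)
    print(int(mask.sum()), speech.shape)  # 100 (16000,): only the tone frames are kept
```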
Fortunately, as a Python programmer, you do not need to implement any of this yourself.
Several speech recognition services are available online through an API, and many of
these services offer Python SDKs.
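As an example of such an SDK, the third-party SpeechRecognition package (imported as `speech_recognition`) wraps several online and offline engines behind one `Recognizer` class. The sketch below assumes the package is installed and that network access is available, since `recognize_google` sends the audio to a Google web service; the helper name `transcribe_wav` and the command-line usage are hypothetical. The import sits inside the function so the module still loads where the package is missing.

```python
import sys
from typing import Optional


def transcribe_wav(path: str, language: str = "en-US") -> Optional[str]:
    """Transcribe a WAV, AIFF, or FLAC file; return None if the speech was unintelligible.

    Network or API failures raise speech_recognition.RequestError, which is left to the caller.
    """
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into an AudioData object
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the service could not make sense of the audio


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python transcribe.py recording.wav
    print(transcribe_wav(sys.argv[1]))
```

Swapping `recognize_google` for another backend offered by the package changes the engine without changing the surrounding capture code.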
Proposed System (Block Diagram):
A simplified overview of speech-to-text recognition using a speech recognition API in Python.
Block diagram of speech-to-text recognition using a Hidden Markov Model (HMM).
Flowchart
Conclusion:
Speech recognition has been in development for more than 60 years, and various
methodologies and approaches are available to improve recognition systems. The
fundamentals of speech recognition (SR) systems and the various approaches for developing
an automatic speech recognition (ASR) system are explained and compared in this paper. In
recent years, recognition of large-vocabulary, speaker-independent, continuous speech has
improved considerably. To further improve accuracy, other modeling techniques will be
implemented in future work.
References:
Susanne Wagner (Halle), “Intralingual speech-to-text conversion in real time: challenges
and opportunities.” Retrieved 6 June 2005.
Preeti Saini and Parneet Kaur, “Automatic Speech Recognition: A Review,” International
Journal of Engineering Trends and Technology, Volume 4, Issue 2, 2013.
http://www.ijetajournal.org/