b.tech It Batchno 136
at
(Deemed to be University)
38120022
DECEMBER 2020
SCHOOL OF COMPUTING
BONAFIDE CERTIFICATE
Internal Guide
Dr. Y. Bevish Jinila M.E., Ph.D.,
DECLARATION
I, Catherine Jenifer. R, hereby declare that the Project Report entitled “VOICE TO
TEXT CONVERTER” done by me under the guidance of Dr. Y. Bevish Jinila
M.E., Ph.D., at Sathyabama Institute of Science and Technology (Deemed to be
University), Jeppiaar Nagar, Rajiv Gandhi Salai, Chennai – 600119, is submitted in
partial fulfillment of the requirements for the award of Bachelor of Technology
degree in Information Technology.
DATE: 21/01/2021
ACKNOWLEDGEMENT
I would like to express my sincere and deep sense of gratitude to my Project
Guide, Dr. Y. Bevish Jinila M.E., Ph.D., for her valuable guidance, suggestions and
constant encouragement, which paved the way for the successful completion of my
project work.
I wish to express my thanks to all Teaching and Non-teaching staff members
of the Department of INFORMATION TECHNOLOGY who were helpful in many
ways for the completion of the project.
TRAINING CERTIFICATE
ABSTRACT
CHAPTER NO TITLE NAME
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 GENERAL
1.2 OUTLINE OF THE PROJECT
1.3 AVAILABLE TECHNOLOGY FOR SPEECH RECOGNITION
1.4 CLASSIFICATION OF VOICE RECOGNISING SYSTEM
2 AIM AND SCOPE
2.1 AIM
2.2 PROBLEM STATEMENT
2.3 SCOPE
3 SYSTEM ANALYSIS AND DESIGN
3.1 GENERAL
3.2 PROGRAMMING LANGUAGES USED
3.3 SYSTEM REQUIREMENTS
3.4 PROJECT DESCRIPTION
4 RESULTS AND DISCUSSION
4.1 ADVANTAGES AND DISADVANTAGES
4.2 RESULTS
5 CONCLUSION AND FUTURE WORK
5.1 CONCLUSION
5.2 FUTURE WORK
REFERENCES
APPENDIX
A. SCREENSHOTS
B. SOURCE CODE
FIGURE NO FIGURE NAME
4.1 OUTPUT 1
4.2 OUTPUT 2
4.3 OUTPUT 3
4.4 OUTPUT 4
LIST OF ABBREVIATIONS
ABBREVIATIONS EXPANSION
CHAPTER 1
INTRODUCTION
1.1 GENERAL
Speech recognition is a feature that lets us perform tasks using our
spoken words as input. It is gradually becoming a part of our lives
in the form of voice assistants such as Alexa, Google Assistant, and Siri. Whether
it’s dictating words to a device to compose a document, doing a web search by
voice, or controlling a computer using speech, speech-to-text conversion is
making our lives faster and more comfortable. It has the potential to replace
traditional human-to-machine input devices, such as keyboards. A future where
humans can interact with machines using just their speech and bodily
movements is not very far away.
1.2 OUTLINE OF THE PROJECT
Humans interact with each other in several ways, such as facial expression, eye
contact, gesture and, mainly, speech. Speech is the primary mode of communication
among human beings and also the most natural and efficient way of exchanging
information. Speech-to-text (STT) conversion systems are
widely used in many application areas. In the educational field, an STT or speech
recognition system is most effective for students who are deaf or have speech
impairments.
1.3 AVAILABLE TECHNOLOGY FOR SPEECH RECOGNITION
As part of a program of research on speech-to-speech translation, we review some
of the available technologies for speech recognition, the first component in any
voice-based MT system.
Microsoft Speech API
Microsoft Speech API (SAPI) allows access to Windows’ built-in speech recognition
and speech synthesis components. The API was released as part of the OS from
Windows 98 forward. The most recent release, Microsoft Speech API 5.4, supports
a small number of languages: American English, British English, Spanish, French,
German, simplified Chinese, and traditional Chinese. Because it is a native
Windows API, SAPI isn’t easy to use unless you’re an experienced C++ developer.
Microsoft Server-Related Technologies
The Microsoft Speech Platform provides access to speech recognition and
synthesis components that encourage the development of complex voice/telephony
server applications. This technology supports 26 different languages, although it
primarily just recognizes isolated words stored in a predefined grammar
(http://msdn.microsoft.com/en-us/library/hh361571(v=office.14).aspx). Microsoft also
provides the Microsoft Unified Communications API (UCMA 3.0), a target for server
application development that requires integration with technologies such as voice
over IP, instant messages, voice call, or video call. The UCMA API allows easy
integration with Microsoft Lync and enables developers to create middle-layer
applications.
1.4 CLASSIFICATION OF VOICE RECOGNISING SYSTEM
Speech recognition systems can be classified into several types according to
the type of speech utterance, the type of speaker model and the type of vocabulary
that they are able to recognize. These classifications are briefly explained below:
A. Types of speech utterance
Speech recognition systems are classified according to the type of utterance they
are able to recognize:
1) Isolated word: An isolated word recognizer usually requires each spoken word to
have quiet (lack of an audio signal) on both sides of the sample window. It accepts
a single word at a time.
2) Connected word: This is similar to isolated word recognition, but it allows
separate utterances to "run together" with a minimal pause between them.
3) Continuous speech: This allows users to speak naturally while the
computer determines the content in parallel.
4) Spontaneous speech: This is speech that is natural sounding and
not rehearsed.
B. Types of speaker model
Speech recognition systems fall broadly into two main categories based on speaker
models, namely speaker dependent and speaker independent.
1) Speaker dependent models: These systems are designed for a specific
speaker. They are easier to develop and more accurate, but they are not very flexible.
2) Speaker independent models: These systems are designed for a variety of
speakers. They are more difficult to develop and less accurate, but they are very
flexible.
C. Types of vocabulary
The vocabulary size of a speech recognition system affects its processing
requirements, accuracy and complexity. In a speech-to-text voice recognition system,
the types of vocabulary can be classified as follows:
1) Small vocabulary: single letters.
2) Medium vocabulary: two- or three-letter words.
3) Large vocabulary: words with more letters.
CHAPTER 2
AIM AND SCOPE
2.1 AIM
Speech recognition technology is one of the fastest-growing engineering
technologies. It has a number of applications in different areas and provides
potential benefits. Nearly 20% of the world's people suffer from various
disabilities; many of them are blind or unable to use their hands effectively.
Speech recognition systems provide significant help in those cases,
letting them share information with people by operating a computer through
voice input. This project is designed and developed with that in mind
and is a small effort towards that aim. It is capable of recognizing
speech and converting the input audio into text.
2.2 PROBLEM STATEMENT
There is a lot of open-source development happening in this field, with newer use
cases being envisioned for proper adoption. However, the lack of standardization
across speech recognition libraries, and the need for browsers to seek user
permission before listening to microphone input because of privacy concerns, are
holding it back.
2.3 SCOPE
Speech recognition can be implemented in the browser using the JavaScript Web
Speech API. The Web Speech API enables a web app to accept speech as input
through the device's microphone and convert the speech into text by matching the
words in the speech against the words in its vocabulary.
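As a minimal sketch of how a page might obtain and configure a recognizer, the snippet below checks for the Web Speech API before using it. The helper names getRecognitionConstructor and createRecognizer and the 'en-US' language tag are illustrative assumptions, not part of this project's code; browser support varies, and some browsers expose the interface only under the webkit prefix.

```javascript
// Returns the speech recognition constructor exposed by the given global
// object, or null when the Web Speech API is not available.
// (Hypothetical helper, for illustration only.)
function getRecognitionConstructor(globalObject) {
  return globalObject.SpeechRecognition ||
         globalObject.webkitSpeechRecognition ||
         null;
}

// Creates and configures a recognizer, or returns null if unsupported.
// (Hypothetical helper; the option values are assumptions.)
function createRecognizer(globalObject, language) {
  var Recognition = getRecognitionConstructor(globalObject);
  if (!Recognition) {
    return null;
  }
  var recognizer = new Recognition();
  recognizer.continuous = true;      // keep listening after each phrase
  recognizer.interimResults = true;  // report partial results while speaking
  recognizer.lang = language;        // language tag such as 'en-US'
  return recognizer;
}

// In a browser: var recognizer = createRecognizer(window, 'en-US');
```

Passing the global object in as a parameter keeps the check easy to exercise outside a browser; a page would simply pass window.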
CHAPTER 3
SYSTEM ANALYSIS
3.1 GENERAL
HTML5
3.3 SYSTEM REQUIREMENTS
Operating System: Windows/Mac
RAM: 4 GB
Processor: 64-bit, 1.0 GHz
ROM: 8 GB
3.4 PROJECT DESCRIPTION
This feature checks for words and phrases in the speech input and provides the
identified words as output text. It is implemented in the
browser using the JavaScript Web Speech API, which enables the web
app to accept speech as input through the device’s microphone and convert the
speech into text by matching the words in the speech against the words in its
vocabulary.
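The recognizer reports its output as a list of results, each marked final or interim and each holding one or more alternatives with a transcript. The following sketch, using a hypothetical helper named collectTranscripts, shows one way the identified words could be gathered into output text. It follows the same idea as the loop in the source code in the appendix, but it is only an illustration.

```javascript
// Splits recognition results into confirmed (final) text and provisional
// (interim) text, starting from the first result that changed.
// `results` is expected to look like the `results` list of a recognition
// result event: results[i][0].transcript and results[i].isFinal.
// (Hypothetical helper, for illustration only.)
function collectTranscripts(results, startIndex) {
  var finalText = '';
  var interimText = '';
  for (var i = startIndex; i < results.length; i++) {
    var text = results[i][0].transcript; // best alternative for this result
    if (results[i].isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// In a browser, assuming `recognizer` is a speech recognition object:
// recognizer.onresult = function (event) {
//   var parts = collectTranscripts(event.results, event.resultIndex);
// };
```

Keeping interim text separate lets the page show it in a lighter colour until the recognizer confirms it.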
The speech recognition feature in its current form is free to use, highly developed,
and gives reasonably accurate results. It needs better adoption, and more devices
and browsers need to support it, for wider acceptance. There is a lot of open-source
development happening in this field, with newer use cases being envisioned for
proper adoption. However, the lack of standardization across speech recognition
libraries, and the need for browsers to seek user permission before listening to
microphone input because of privacy concerns, are holding it back.
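Because the browser asks the user for microphone permission, a page should be prepared for that request to be refused. The sketch below maps some of the error codes that the Web Speech API reports through its error event to readable messages. The helper name describeRecognitionError and the message wording are assumptions made for illustration.

```javascript
// Maps an error code reported by the recognizer's error event to a message.
// (Hypothetical helper; the message wording is illustrative.)
function describeRecognitionError(code) {
  switch (code) {
    case 'not-allowed':
    case 'service-not-allowed':
      return 'Microphone access was not permitted.';
    case 'audio-capture':
      return 'No microphone could be used for audio capture.';
    case 'no-speech':
      return 'No speech was detected.';
    case 'network':
      return 'A network problem interrupted recognition.';
    default:
      return 'Speech recognition failed (' + code + ').';
  }
}

// In a browser, assuming `recognizer` is a speech recognition object and
// `statusElement` is an element on the page:
// recognizer.onerror = function (event) {
//   statusElement.textContent = describeRecognitionError(event.error);
// };
```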
CHAPTER 4
RESULTS AND DISCUSSION
4.1 ADVANTAGES AND DISADVANTAGES
4.1.1 ADVANTAGES
The advantages of converting voice to text are hard to ignore:
Improved information accuracy.
Enhanced focus.
Text can be written through both keyboard and voice input.
Less time is needed to write text.
Significant help for people with disabilities.
Lower operational costs.
4.1.2 DISADVANTAGES
Low accuracy.
Poor performance in noisy environments.
Risk of misinterpretation.
Voice recognition software won't always put your words on the screen
completely accurately.
Time costs and reduced productivity.
Difficulty with accents.
Background noise interference.
Physical side effects.
4.2 RESULTS
CHAPTER 5
CONCLUSION AND FUTURE WORK
5.1 CONCLUSION
In an era where voice assistants are more popular than ever, an API like this gives
you a quick shortcut to building bots that understand and speak human language.
Adding voice control to your apps can also be a great form of accessibility
enhancement. Users with visual impairment can benefit from both speech-to-text
and text-to-speech user interfaces.
5.2 FUTURE WORK
This work can be taken further, and more can be done on the project
to bring modifications and additional features. The current software doesn’t
support a large vocabulary; work will be done to accumulate a greater
number of samples and increase the efficiency of the software. The current version
of the software supports only a few areas of the notepad, but more areas can be
covered, and effort will be made in this regard.
REFERENCES
1. https://www.w3schools.com/html/default.asp
2. https://www.w3schools.com/html/html_css.asp
APPENDIX
A. SCREENSHOTS
Fig 4.2 OUTPUT 2
Fig 4.3 OUTPUT 3
Fig 4.4 OUTPUT 4
B. SOURCE CODE
<!DOCTYPE html>
<html>
<head>
<title>Speech to text conversion</title>
<link rel="stylesheet" type="text/css"
href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.1/css/font-awesome.min.css"/>
<style type="text/css">
body
{
font-family:verdana;
text-align:center;
background-position: center center;
}
#result
{
height: 100px;
border: 1px solid #ccc;
padding: 10px;
box-shadow:0 0 10px 0 #bbb;
margin-top:35px;
margin-bottom: 35px;
font-size: 14px;
line-height: 25px;
}
button
{
font-size: 20px;
position:absolute;
top: 240px;
left: 50%;
}
</style>
</head>
<body>
<h4 align="center">VOICE TO TEXT CONVERTER</h4>
<div id="result"></div>
<!-- Start button: assumed, since the listing styles a button but never calls startConverting() -->
<button onclick="startConverting();"><i class="fa fa-microphone"></i></button>
<script type="text/javascript">
var r=document.getElementById('result');
function startConverting()
{
if('webkitSpeechRecognition' in window)
{
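// Create and configure the recognizer (the language tag is an assumed value).
var speechRecognizer=new webkitSpeechRecognition();
speechRecognizer.continuous=true; // keep listening between phrases
speechRecognizer.interimResults=true; // report partial results while speaking
speechRecognizer.lang='en-US';
speechRecognizer.start();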
var finalTranscripts='';
speechRecognizer.onresult=function(event)
{
var interimTranscripts=' ';
for(var i=event.resultIndex; i<event.results.length; i++)
{
var transcript=event.results[i][0].transcript;
transcript=transcript.replace(/\n/g,"<br>"); // replace() returns a new string, so assign it back
if(event.results[i].isFinal)
{
finalTranscripts+=transcript;
}
else
{
interimTranscripts+=transcript;
}
}
r.innerHTML=finalTranscripts+'<span style="color:#999">'+interimTranscripts+'</span>';
};
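speechRecognizer.onerror=function(event){
// Show the error code instead of failing silently.
r.innerHTML='Speech recognition error: '+event.error;
};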
}
else
{
r.innerHTML='Your browser is not supported.';
}
}
</script>
</body>
</html>