Visual Assist
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
Navigating through the world poses significant challenges for individuals with
visual impairments, often requiring reliance on external assistance or memorized
routes. Traditional navigation aids fall short in providing real-time, context-aware
guidance, limiting the independence and mobility of visually impaired individuals.
However, advancements in technology, particularly in the field of computer vision
and smartphone capabilities, offer promising solutions to address these challenges.
1.3 SCOPE OF THE PROJECT
The video processing module is responsible for extracting the audio from the
video recording file, while the speech recognition module converts the extracted
audio into a text document. To adapt the speech recognition system to the speaker's
voice and speech patterns, this system will use language model adaptation
techniques. These techniques will enhance the system's ability to recognize the
speaker's words and phrases accurately.
1.5 ORGANIZATION OF THE REPORT
CHAPTER 2
LITERATURE SURVEY
In the present scenario, video plays a vital role in helping people understand
and comprehend information, for example songs, movies, video lectures, or any
other multimedia data relevant to the user. Hence, it becomes important to make
videos accessible to people with auditory problems, and even more so to bridge
the gaps of a viewer's native language. This can best be done through subtitles
for the video. However, downloading subtitles for a video from the internet is a
monotonous process. Consequently, generating subtitles automatically through the
software itself, without the use of the internet, is a valid subject of
research. Hence, this research paper resolves the above issue through three
distinct modules: Audio Extraction, which converts an input file of any format
supported by MPEG standards to .wav format (here, a 24% reduction in the size of
the song was achieved after extraction); Speech Recognition of the extracted
.wav file; and finally Subtitle Generation, in which a .txt/.srt file is
generated that is synchronized with the input file.
of-the-art technology in personalized assistant development. JARVIS
incorporates the power of AIML together with the industry-leading Google
platform for text-to-speech conversion, using the male-pitch voice of the gTTS
library, inspired by the Marvel universe. This is the result of adopting
Python's dynamic pyttsx library in adjacent phases with gTTS and AIML,
facilitating considerably smooth dialogues between the assistant and its users.
It is a unique result of the contributions of several components, such as the
feasible use of AIML and its dynamic fusion with platforms like Python (pyttsx)
and gTTS (Google Text-to-Speech), resulting in a consistent and modular
structure for JARVIS that exposes widespread reusability and negligible
maintenance.
Speech synthesis has come a long way, as current text-to-speech (TTS) models
can now generate natural human-sounding speech. However, most TTS research
focuses on using adult speech data, and there has been very limited work done
on child speech synthesis. This study developed and validated a training
pipeline for fine-tuning state-of-the-art (SOTA) neural TTS models using child
speech datasets. The approach adopts a multi-speaker TTS retuning workflow to
provide a transfer-learning pipeline. A publicly available child speech dataset
was cleaned to provide a smaller subset of approximately 19 hours, which formed
the basis of the fine-tuning experiments. Both subjective and objective
evaluations were performed, using a retrained MOSNet for objective evaluation
and a novel subjective framework for mean opinion score (MOS) evaluations.
Subjective evaluations achieved a MOS of 3.95 for speech intelligibility, 3.89
for voice naturalness, and 3.96 for voice consistency.
Objective evaluation using a retrained MOSNet showed a strong correlation
between real and synthetic child voices. Speaker similarity was also verified
by calculating the cosine similarity between the embeddings of utterances. An
automatic speech recognition (ASR) model is also used to provide a word error
rate (WER) comparison between the real and synthetic child voices. The final
trained TTS model was able to synthesize child-like speech from reference audio
samples as short as 5 seconds.
Voice control is a major growing feature that changes the way people live.
Voice assistants are commonly used in smartphones and laptops. AI-based voice
assistants are software systems that can recognize the human voice and respond
via integrated voices. Such a voice assistant gathers audio from the microphone
and converts it into text; the text is then sent through gTTS (Google
Text-to-Speech). The gTTS engine converts the text into an audio file in the
English language, and that audio is played using the playsound package of the
Python programming language.
In this work, the authors address the problem of audio-based near-duplicate
video retrieval. They propose the Audio Similarity Learning (AuSiL) approach,
which effectively captures temporal patterns of audio similarity between video
pairs. For a robust similarity calculation between two videos, they first
extract representative audio-based video descriptors by leveraging transfer
learning based on a Convolutional Neural Network (CNN) trained on a large-scale
dataset of audio events, and then calculate the similarity matrix derived from
the pairwise similarity of these descriptors. The similarity matrix is
subsequently fed to a CNN that captures the temporal structures within its
content. The network is trained following a triplet generation process,
optimizing the triplet loss function. To evaluate the effectiveness of the
proposed approach, the authors manually annotated two publicly available video
datasets based on the audio duplicity between their videos. The proposed
approach achieves very competitive results compared to three state-of-the-art
methods. Also, unlike the competing methods, it is very robust to the retrieval
of audio duplicates generated with speed transformations.
AI-powered navigation is very useful for blind people, helping them identify
objects as well as known and unknown faces. The device can provide audio
descriptions, such as the presence of a tree or a park bench, recognize and
announce the pre-programmed faces of friends and family, and provide both color
and object recognition.
CHAPTER 3
SYSTEM ANALYSIS
The existing methods for extracting text from video recordings involve manual
transcription or captioning, which is a time-consuming and expensive process. In
manual transcription, a human transcriber listens to the audio portion of the video
recording and types out the speech into a text document. Captioning, on the other
hand, involves adding subtitles to the video recording manually. While this method
can provide a convenient way for hearing-impaired individuals to access video
content, it requires a significant amount of time and effort to produce captions.
Disadvantages:
3.2 PROPOSED SYSTEM
In our proposed system, we aim to automate the process of speech recognition
and text extraction from video recordings using artificial intelligence
techniques. The system will use the MoviePy library to extract the audio
portion of the video recording file and pass it through Google's speech
recognition library for text conversion. The system will use pre-processing
techniques, such as noise reduction and normalization, to improve the accuracy
of the speech recognition. It will also use language model adaptation
techniques to adapt the speech recognition system to the speaker's voice and
speech patterns.
The proposed system has several advantages over the existing methods. Firstly,
it eliminates the need for manual transcription or captioning, making the
process more efficient and scalable. Secondly, it provides a convenient way for
hearing-impaired individuals to access video content by automatically
generating subtitles. Thirdly, it can be used to extract speeches from recorded
lectures or meetings, providing a convenient way to review and analyze the
content of these recordings. Finally, it can be used in content-based video
retrieval systems to enable users to search for specific videos based on the
spoken content. gTTS (Google Text-to-Speech) is a Python library and CLI tool
that interfaces with the Google Translate text-to-speech API. It writes spoken
MP3 data to a file, a file-like object (byte string) for further audio
manipulation, or stdout, and features flexible pre-processing and tokenizing.
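As a minimal sketch of this pipeline (file names are hypothetical, and the third-party moviepy and SpeechRecognition packages are assumed to be installed):

```python
# Sketch of the proposed pipeline: extract audio with MoviePy, then
# transcribe it with Google's free recognizer. All file names are
# illustrative; moviepy and SpeechRecognition are third-party packages.

def wav_name(video_path: str) -> str:
    """Derive a .wav path next to the input video file."""
    return video_path.rsplit(".", 1)[0] + ".wav"

def extract_audio(video_path: str, wav_path: str) -> None:
    """Extract the audio track of a video to WAV using MoviePy."""
    from moviepy.editor import VideoFileClip  # third-party
    clip = VideoFileClip(video_path)
    clip.audio.write_audiofile(wav_path)
    clip.close()

def transcribe(wav_path: str) -> str:
    """Convert the extracted WAV audio to text."""
    import speech_recognition as sr  # third-party
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        recognizer.adjust_for_ambient_noise(source)  # basic noise handling
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)

def video_to_text(video_path: str) -> str:
    wav = wav_name(video_path)
    extract_audio(video_path, wav)
    return transcribe(wav)
```

Calling `video_to_text("lecture.mp4")` would then return the recognized transcript as a string.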
Fig.3.1 Proposed Diagram
MoviePy is a Python module for video editing, which can be used for basic
operations on videos and GIFs. A video is formed from frames; a combination of
frames creates a video, and each frame is an individual image.
Overall, our proposed system can significantly improve the accessibility and
convenience of video content for everyone, regardless of their hearing ability or
preference for reading over watching videos.
CHAPTER 4
SYSTEM SPECIFICATION
syntactical constructions than other languages.
Data science and machine learning with Python
Sophisticated data analysis has become one of the fastest-moving areas of IT
and one of Python's star use cases. The vast majority of the libraries used for
data science or machine learning have Python interfaces, making the language
the most popular high-level command interface for machine learning libraries
and other numerical algorithms.
Web services and RESTful APIs in Python
Python's native libraries and third-party web frameworks provide fast and
convenient ways to create everything from simple REST APIs in a few lines of
code to full-blown, data-driven sites. Python's latest versions have strong
support for asynchronous operations, letting sites handle tens of thousands of
requests per second with the right libraries.
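As a small standard-library illustration of that asynchronous style (the fetch coroutine here is a made-up stand-in for real I/O):

```python
import asyncio

async def fetch(i: int) -> int:
    # Stand-in for a real asynchronous I/O call (network, disk, etc.).
    await asyncio.sleep(0)
    return i * 2

async def main() -> list:
    # Handle many "requests" concurrently; gather preserves order.
    return list(await asyncio.gather(*(fetch(i) for i in range(5))))

print(asyncio.run(main()))  # → [0, 2, 4, 6, 8]
```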
Metaprogramming and code generation in Python
In Python, everything in the language is an object, including Python modules and
libraries themselves. This lets Python work as a highly efficient code generator,
making it possible to write applications that manipulate their own functions and
have the kind of extensibility that would be difficult or impossible to pull off in
other languages.
Python can also be used to drive code-generation systems, such as LLVM, to
efficiently create code in other languages.
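A tiny illustration of this dynamism (every name here is invented for the example): because classes and functions are ordinary objects, methods can be generated and attached at runtime instead of written by hand.

```python
# Illustration only: Plugin, make_greeter, and greet_* are made-up names.
def make_greeter(name):
    def greeter(self):
        return f"Hi from {name}"
    return greeter

class Plugin:
    pass

# Generate methods dynamically and attach them to the class at runtime.
for n in ("alpha", "beta"):
    setattr(Plugin, f"greet_{n}", make_greeter(n))

print(Plugin().greet_alpha())  # → Hi from alpha
```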
“Glue code” in Python
Python is often described as a “glue language,” meaning it can let disparate code
(typically libraries with C language interfaces) interoperate. Its use in data science
and machine learning is in this vein, but that’s just one incarnation of the general
idea. If you have applications or program domains that you would like to hitch
up but that cannot talk to each other directly, you can use Python to connect
them.
Python 2 vs. Python 3
Python is available in two versions, which are different enough to trip up many
new users. Python 2.x, the older “legacy” branch, will continue to be supported
(that is, receive official updates) through 2020, and it might persist unofficially
after that. Python 3.x, the current and future incarnation of the language, has many
useful and important features not found in Python 2.x, such as new syntax features
(e.g., the “walrus operator”), better concurrency controls, and a more efficient
interpreter.
Python 3 adoption was slowed for the longest time by the relative lack of
third-party library support. Many Python libraries supported only Python 2,
making it difficult to switch.
But over the last couple of years, the number of libraries supporting only Python 2
has dwindled; all of the most popular libraries are now compatible with both
Python 2 and Python 3. Today, Python 3 is the best choice for new projects; there
is no reason to pick Python 2 unless you have no choice.
Python’s libraries
The success of Python rests on a rich ecosystem of first- and third-party software.
Python benefits from both a strong standard library and a generous assortment of
easily obtained and readily used libraries from third-party developers. Python has
been enriched by decades of expansion and contribution.
Python’s standard library provides modules for common programming tasks
—math, string handling, file and directory access, networking, asynchronous
operations, threading, multiprocess management, and so on. But it also includes
modules that manage common, high-level programming tasks needed by modern
applications: reading and writing structured file formats like JSON and XML,
manipulating compressed files, working with internet protocols and data formats
(web pages, URLs, email). Most any external code that exposes a C-compatible
foreign function interface can be accessed with Python’s ctypes module.
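A minimal sketch of that foreign-function facility, using only the standard-library ctypes module (POSIX-style example; `CDLL(None)` exposes the C-library symbols already linked into the interpreter):

```python
import ctypes

# Call a C standard-library function directly from Python.
libc = ctypes.CDLL(None)           # symbols linked into the running process
libc.abs.restype = ctypes.c_int    # declare the C signature explicitly
libc.abs.argtypes = [ctypes.c_int]
print(libc.abs(-7))  # → 7
```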
The default Python distribution also provides a rudimentary, but useful, cross-
platform GUI library via Tkinter, and an embedded copy of the SQLite 3 database.
The thousands of third-party libraries, available through the Python Package
Index (PyPI), constitute the strongest showcase for Python’s popularity and
versatility.
For example:
The Beautiful Soup library provides an all-in-one toolbox for scraping HTML—
even tricky, broken HTML—and extracting data from it.
Requests makes working with HTTP requests at scale painless and simple.
Frameworks like Flask and Django allow rapid development of web services that
encompass both simple and advanced use cases.
Like C#, Java, and Go, Python has garbage-collected memory management,
meaning the programmer doesn’t have to implement code to track and release
objects. Normally, garbage collection happens automatically in the background, but
if that poses a performance problem, you can trigger it manually or disable it
entirely, or declare whole regions of objects exempt from garbage collection as a
performance enhancement.
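For example, the standard-library gc module exposes these controls directly:

```python
import gc

# Garbage collection normally runs automatically in the background,
# but it can be paused, triggered, and resumed explicitly.
gc.disable()                 # pause automatic collection for a hot section
unreachable = gc.collect()   # force a full collection manually
gc.enable()                  # resume automatic collection
# gc.freeze() (Python 3.7+) can additionally exempt all currently live
# objects from future collections, as a performance enhancement.
print(unreachable)           # number of unreachable objects found
```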
An important aspect of Python is its dynamism. Everything in the language,
including functions and modules themselves, is handled as an object. This comes at
the expense of speed (more on that later), but makes it far easier to write high-level
code.
Developers can perform complex object manipulations with only a few
instructions, and even treat parts of an application as abstractions that can be
altered if needed.
Python’s use of significant whitespace has been cited as both one of
Python’s best and worst attributes. The indentation on the second line below isn’t
just for readability; it is part of Python’s syntax.
Python interpreters will reject programs that don’t use proper indentation to
indicate control flow.
with open('myfile.txt') as my_file:
    file_lines = [x.strip('\n') for x in my_file]
Syntactical white space might cause noses to wrinkle, and some people do
reject Python for this reason. But strict indentation rules are far less obtrusive in
practice than they might seem in theory, even with the most minimal of code
editors, and the result is code that is cleaner and more readable.
Another potential turnoff, especially for those coming from languages like C
or Java, is how Python handles variable typing. By default, Python uses dynamic or
“duck” typing—great for quick coding, but potentially problematic in large code
bases. That said, Python has recently added support for optional compile-time type
hinting, so projects that might benefit from static typing can use it.
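A minimal sketch of such type hints (function and names are illustrative; the hints are checked by external tools like mypy rather than enforced by the interpreter at runtime):

```python
# Optional type hints: annotations document the expected types but do
# not change runtime behavior.
def greet(name: str, excited: bool = False) -> str:
    suffix = "!" if excited else "."
    return f"Hello, {name}{suffix}"

print(greet("Ada", excited=True))  # → Hello, Ada!
```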
4.3.2 MySQL
What is MySQL? – An Introduction to Database Management Systems
Database management is the most important part when you have humongous data
around you. MySQL is one of the most famous relational databases used to store
and handle your data. In this section, you will go through the following
topics:
What are Data & Database?
Database Management System & Types of DBMS
Structured Query Language (SQL)
MySQL & its features
MySQL Data Types
What are Data & Database?
Suppose a company needs to store the names of hundreds of employees working in
the company in such a way that all the employees can be individually identified.
Then, the company collects the data of all those employees. Now, when I say data,
I mean that the company collects distinct pieces of information about an object. So,
that object could be a real-world entity such as people, or any object such as a
mouse, laptop etc.
Database Management System & Types of DBMS
A Database Management System (DBMS) is a software application that interacts
with the user, applications and the database itself to capture and analyze data. The
data stored in the database can be modified, retrieved and deleted, and can be of
any type like strings, numbers, images etc.
Types of DBMS
There are mainly 4 types of DBMS, which are Hierarchical, Relational, Network,
and Object-Oriented DBMS.
Hierarchical DBMS: As the name suggests, this type of DBMS has a
style of predecessor-successor type of relationship. So, it has a structure
similar to that of a tree, wherein the nodes represent records and the
branches of the tree represent fields.
Relational DBMS (RDBMS): This type of DBMS uses a structure that
allows the users to identify and access data in relation to another piece of
data in the database.
Network DBMS: This type of DBMS supports many-to-many relations,
wherein multiple member records can be linked.
Object-oriented DBMS: This type of DBMS uses
small individual software called objects. Each object contains a piece of
data, and the instructions for the actions to be done with the data.
Structured Query Language (SQL)
SQL is the core of a relational database which is used for accessing and managing
the database. By using SQL, you can add, update or delete rows of data, retrieve
subsets of information, modify databases and perform many actions. The different
subsets of SQL are as follows:
DDL (Data Definition Language) – It allows you to perform various
operations on the database, such as CREATE, ALTER, and DROP
objects.
DML (Data Manipulation Language) – It allows you to access and
manipulate data. It helps you to insert, update, delete and retrieve data
from the database.
DCL (Data Control Language) – It allows you to control access to the
database. Example – Grant or Revoke access permissions.
TCL (Transaction Control Language) – It allows you to deal with the
transaction of the database. Example – Commit, Rollback, save point, Set
Transaction.
4.3.3 Using MySQL
There’s not a lot of point to being able to change HTML output dynamically
unless you also have a means to track the changes that users make as they use
your website. In the early days of the web, many sites used “flat” text files
to store data such as usernames and passwords. But this approach could cause
problems if the file wasn’t correctly locked against corruption from multiple
simultaneous accesses. Also, a flat file can get only so big before it becomes
unwieldy to manage—not to mention the difficulty of trying to merge files and
perform complex searches in any kind of reasonable time. That’s where
relational databases with structured querying become essential. And MySQL,
being free to use and installed on vast numbers of Internet web servers, rises
superbly to the occasion.
The highest level of MySQL structure is a database, within which you can have
one or more tables that contain your data. For example, let’s suppose you are
working on a table called users, within which you have created columns for
surname, firstname, and email, and you now wish to add another user. One
command that you might use to do this is: INSERT INTO users VALUES ('Smith',
'John', 'jsmith@mysite.com'); Of course, as mentioned earlier, you will have
issued other commands to create the database and table and to set up all the
correct fields, but the INSERT command here shows how simple it can be to add
new data to a database. The INSERT command is an example of SQL (which stands
for Structured Query Language), a language designed in the early 1970s and
reminiscent of one of the oldest programming languages, COBOL. It is well
suited, however, to database queries, which is why it is still in use after all
this time. It’s equally easy to look up data. Let’s assume that you have an
email address for a user and you need to look up that person’s name. To do
this, you could issue a MySQL query such as: SELECT surname, firstname FROM
users WHERE email='jsmith@mysite.com'; MySQL will then return Smith, John and
any other pairs of names that may be associated with that email address in the
database. As you’d expect, there’s quite a bit more that you can do with MySQL
than just simple INSERT and SELECT commands. For example, you can join
multiple tables according to various criteria, ask for results in a variety of
different orders, make partial matches when you know only part of the string
that you are searching for, return only the nth result, and a lot more.
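These INSERT and SELECT statements can be tried without a MySQL server by using Python's built-in sqlite3 module, which accepts the same core SQL (the users table here mirrors the hypothetical example above):

```python
import sqlite3

# In-memory database; the same INSERT/SELECT statements work in MySQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (surname TEXT, firstname TEXT, email TEXT)")
cur.execute("INSERT INTO users VALUES ('Smith', 'John', 'jsmith@mysite.com')")
cur.execute("SELECT surname, firstname FROM users "
            "WHERE email='jsmith@mysite.com'")
rows = cur.fetchall()
print(rows)  # → [('Smith', 'John')]
conn.close()
```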
The Apache Web Server
In addition to PHP, MySQL, JavaScript, and CSS, there’s actually a fifth hero
in the dynamic web: the web server. In the case of this book, that means the
Apache web server. We’ve discussed a little of what a web server does during
the HTTP server/client exchange, but it actually does much more behind the
scenes.
But these objects don’t have to be static files, such as GIF images. They can all be
generated by programs such as PHP scripts. That’s right: PHP can even create
images and other files for you, either on the fly or in advance to serve up later. To
do this, you normally have modules either precompiled into Apache or PHP or
called up at runtime. One such module is the GD library (short for Graphics Draw),
which PHP uses to create and handle graphics.
Apache also supports a huge range of modules of its own. In addition to the
PHP module, the most important for your purposes as a web programmer
are the modules that handle security. Other examples are the Rewrite module,
which enables the web server to handle a varying range of URL types and
rewrite them to its own internal requirements, and the Proxy module, which you
can use to serve up often-requested pages from a cache to ease the load on the
server.
Later in the book, you’ll see how to actually use some of these modules to
enhance the features provided by the core technologies we cover.
About Open Source
Whether or not being open source is the reason these technologies are so
popular has often been debated, but PHP, MySQL, and Apache are the three most
commonly used tools in their categories. What can be said, though, is that
being open source means that they have been developed in the community by teams
of programmers writing the features they themselves want and need, with the
original code available for all to see and change.
What Is a WAMP, MAMP, or LAMP?
WAMP, MAMP, and LAMP are abbreviations for “Windows, Apache, MySQL,
and PHP,” “Mac, Apache, MySQL, and PHP,” and “Linux, Apache, MySQL, and
PHP,” respectively. These abbreviations describe a fully functioning setup used
for developing dynamic Internet web pages. WAMPs, MAMPs, and LAMPs come in the
form of a package that binds the bundled programs together so that you don’t
have to install and set them up separately. This means you can simply download
and install a single program and follow a few easy prompts to get your web
development server up and running in the quickest time with the minimum hassle.
During installation, several default settings are created for you. The security
configurations of such an installation will not be as tight as on a production
web server, because it is optimized for local use. For these reasons, you
should never install such a setup as a production server. However, for
developing and testing websites and applications, one of these installations
should be entirely sufficient.
Using an IDE
As good as dedicated program editors can be for your programming productivity,
their utility pales into insignificance when compared to Integrated Development
Environments (IDEs), which offer many additional features such as in-editor
debugging and program testing, as well as function descriptions and much more.
framework uses Werkzeug as one of its bases. Plus, Flask gives you so much more
control over the development stage of your project. It follows the principles
of minimalism and lets you decide how you will build your application.
Flask has a lightweight and modular design, so it is easy to transform it into
the web framework you need with a few extensions, without weighing it down.
Flask documentation is comprehensive, full of examples, and well structured.
You can even try out some sample applications to really get a feel for Flask.
CHAPTER 5
SYSTEM DESIGN
5.1 SYSTEM ARCHITECTURE
The video to speech to text system architecture consists of several components
working together to extract and convert speech from a video recording into a text
document.
Fig 5.1 SYSTEM ARCHITECTURE
The first component is the video processing module, which is responsible for
extracting the audio from the video recording file. This module typically uses a
library such as MoviePy or OpenCV to extract the audio from the video file. The
second component is the speech recognition module, which converts the extracted
audio into text. This module typically uses a speech recognition library such as
Google's Speech-to-Text API or IBM's Watson Speech-to-Text API to perform the
speech recognition. The third component is the pre-processing module, which
prepares the audio for speech recognition by applying techniques such as noise
reduction, normalization, and segmentation. These techniques help to improve the
accuracy of the speech recognition and ensure that the text generated is of high
quality.
Finally, the output of the speech recognition module is a text document, which can
be further processed by natural language processing (NLP) techniques to extract
key phrases, identify entities, and perform other text analysis tasks.
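As a toy illustration of the key-phrase step (a real system would use a full NLP pipeline; the stop-word list here is deliberately tiny and illustrative):

```python
import re
from collections import Counter

# Deliberately tiny stop-word list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for"}

def key_phrases(text: str, top: int = 3) -> list:
    """Rank the most frequent non-stop-words as candidate key phrases."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(top)]

print(key_phrases("speech to text and text analysis of speech recordings"))
```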
Overall, the video to speech to text system architecture is designed to automate the
process of speech recognition and text extraction from video recordings. By
integrating these components into a single system, the architecture enables the
efficient and scalable conversion of speech to text, making video content more
accessible and convenient for a wide range of users.
5.2 DATA FLOW DIAGRAM
A two-dimensional diagram explains how data is processed and transferred in a
system. The graphical depiction identifies each source of data and how it
interacts with other data sources to reach a common output. Individuals seeking
to draft a data flow diagram must identify external inputs and outputs,
determine how the inputs and outputs relate to each other, and explain with
graphics how these connections relate and what they result in.
Data flow diagrams can be divided into logical and physical. The logical data
flow diagram describes the flow of data through a system to perform certain
functionality of a business. The physical data flow diagram describes the
implementation of the logical data flow. This type of diagram helps business
development and design teams visualize how data is processed and identify or
improve certain aspects.
DATA FLOW SYMBOLS:
Fig 5.2 Data Flow Diagram
5.3 ER-DIAGRAM
5.3.1 INTRODUCTION:
This is the highest-level ER model in that it contains the least granular
detail but establishes the overall scope of what is to be included within the
model set. The conceptual ER model normally defines the master reference data
entities that are commonly used by the organization. Developing an
enterprise-wide conceptual ER model is useful to support documenting the data
architecture for an organization.
5.3.3 LOGICAL DATA MODEL:
A logical ER model does not require a conceptual ER model, especially if
the scope of the logical ER model includes only the development of a distinct
information system. The logical ER model contains more detail than the
conceptual ER model.
Fig 5.4.1 Use Case Diagram
CHAPTER 6
SYSTEM IMPLEMENTATION
6.1 VIDEO PROCESSING
MoviePy is a Python module for video editing, which can be used for basic
operations on videos and GIFs. A video is formed from frames; a combination of
frames creates a video, and each frame is an individual image. An audio file
format is a file format for storing digital audio data on a computer system.
The bit layout of the audio data is called the audio coding format and can be
uncompressed, or compressed to reduce the file size, often using lossy
compression. The system can load the audio file with the help of the
AudioFileClip method.
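A possible sketch of this step (assuming the third-party moviepy package is installed; file names are up to the caller):

```python
def reencode_audio(media_path: str, out_wav: str) -> float:
    """Load a file's audio with AudioFileClip and write it out as WAV.

    Sketch only: assumes the third-party moviepy package is installed.
    Returns the clip duration in seconds.
    """
    from moviepy.editor import AudioFileClip  # third-party
    audio = AudioFileClip(media_path)  # accepts video or audio files
    audio.write_audiofile(out_wav)     # output format inferred from extension
    duration = audio.duration
    audio.close()
    return duration
```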
6.2 SPEECH RECOGNITION
There are several APIs available to convert text to speech in Python. One
such API is the Google Text-to-Speech API, commonly known as the gTTS API.
gTTS is a very easy-to-use tool which converts the entered text into audio
that can be saved as an MP3 file. The gTTS API supports several languages
including English, Hindi, Tamil, French, German, and many more. The speech can
be delivered at either of the two available audio speeds, fast or slow.
However, as of the latest update, it is not possible to change the voice of
the generated audio.
The technology works by analyzing written text and converting it into an audio
file that can be played back through a speaker or headphones. The process involves
several steps, including natural language processing, linguistic analysis, and
audio synthesis. First, the text is analyzed using natural language processing
algorithms
to identify the words and their meaning. The software then applies linguistic
analysis to the text, determining the pronunciation of each word and the appropriate
intonation, stress, and rhythm for the sentence.
Finally, the software uses audio synthesis techniques to create a digital audio
file that replicates the human voice, which can then be played back through a
speaker or headphones.
The Google Text-to-Speech (gTTS) library in Python is a Python wrapper for the
Google Text-to-Speech API. It allows developers to easily convert written text
into natural-sounding audio speech in a variety of languages and voices using
the power of Google’s neural network. With the gTTS library, developers can
simply import the package, pass in the text to be spoken and the language, and
then save the audio output to a file or play it directly. The library also
supports customizable settings such as the speech rate and volume, as well as
the option to save the audio output in various audio formats.
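A minimal usage sketch (assumes the third-party gtts package is installed and an internet connection is available at call time):

```python
def save_speech(text: str, mp3_path: str, lang: str = "en",
                slow: bool = False) -> None:
    """Write spoken MP3 audio for `text` using gTTS.

    Sketch only: gtts is a third-party package and needs network access
    to reach the Google Text-to-Speech API.
    """
    from gtts import gTTS  # third-party
    tts = gTTS(text=text, lang=lang, slow=slow)
    tts.save(mp3_path)
```

For example, `save_speech("Hello world", "hello.mp3")` would produce a playable MP3 file.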
To use the gTTS library, developers will need to have an internet connection,
as the library requires access to the Google Text-to-Speech API to convert the
text to speech. The library is compatible with Python 2 and 3, and can be
installed using pip.
One of the primary tasks of the post-processing module is to perform text
normalization, which involves standardizing the text to ensure consistency in
spelling, grammar, and punctuation. This step is particularly important when
dealing with large amounts of text generated by the system, as it helps to improve
the readability and accuracy of the text output.
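A toy sketch of such normalization using only the standard library (the rules shown are illustrative, not the system's actual ones):

```python
import re

def normalize_text(raw: str) -> str:
    """Toy normalization: collapse whitespace, fix spacing before
    punctuation, and capitalize the start of each sentence."""
    text = re.sub(r"\s+", " ", raw).strip()
    text = re.sub(r"\s+([,.!?])", r"\1", text)   # no space before punctuation
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(s[:1].upper() + s[1:] for s in sentences if s)

print(normalize_text("hello   world .  this is   a test."))
# → Hello world. This is a test.
```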
Another important task of the post-processing module is named entity
recognition (NER), which involves identifying and categorizing named entities in
the text, such as people, organizations, and locations. This task is useful in
applications such as automatic subtitling or video indexing, where the identification
of important keywords and phrases can be used to improve the searchability and
accessibility of the video content.
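In practice NER is done with a trained model (e.g. spaCy or NLTK); as a toy stand-in, a capitalized-phrase extractor illustrates the idea of pulling candidate keywords out of a transcript:

```python
import re


def toy_named_entities(text):
    """Crude stand-in for a real NER model: collect runs of
    capitalized words. A real model would also classify them
    (person, organization, location) and skip sentence-initial
    false positives such as 'The'."""
    return re.findall(r"(?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*", text)


print(toy_named_entities("The lecture by Alan Turing was recorded in New York."))
# → ['The', 'Alan Turing', 'New York']
```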
In addition to text normalization and NER, the post-processing module may
also perform other NLP tasks, such as sentiment analysis or topic modeling, to
extract further insights from the text generated by the system. These techniques can
be particularly useful in applications such as market research or social media
analysis, where the extraction of meaningful information from large amounts of
text data is essential.
Overall, the post-processing module plays a crucial role in the video-to-text
system architecture, enabling the extraction of useful information from the
text generated by the system.
CHAPTER 7
SYSTEM TESTING
Application testing is a software testing technique adopted to test applications
hosted on the web, in which the application's interfaces and other functionalities
are tested.
With all the impending issues, web app testing holds more importance
than ever. However, testing a web application is not an
ordinary task; it depends on several factors such as compatibility across various
browsers, application performance, user experience, user acceptance, and
proper security.
Enterprises must deploy skilled testers to assess all aspects of the
website across platforms, browsers, and devices. Testers must always
follow web application testing best practices in order to produce
accurate and reliable test results without increasing testing times.
The most common types of testing involved in the development process are:
Functionality Test
Usability Test
Interface Test
Compatibility Test
Performance Test
7.1 FUNCTIONALITY TESTING
This step verifies whether the functionalities of the application work as
intended. Functional testing is carried out against the application's source code.
Functionality testing includes:
Determining the data input and entry
Test case execution
Functions need to be properly identified, because the software runs
effectively through the integration of functions
Actual results must be analyzed
Verify there are no dead pages or invalid redirects
First check all the validations on each field
Use wrong inputs to perform negative testing
Verify the workflow of the system
Verify the data integrity.
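As an illustration of these checks, a functional test for the upload form's input validation might look like the following (the `validate_upload` function and its accepted extensions are assumptions for the example, not the project's actual code):

```python
import unittest


def validate_upload(filename):
    """Hypothetical input validation for the upload form:
    accept only common video container extensions."""
    allowed = (".mp4", ".avi", ".mkv", ".mov")
    return filename.lower().endswith(allowed)


class UploadValidationTest(unittest.TestCase):
    def test_valid_input(self):
        self.assertTrue(validate_upload("lecture.MP4"))

    def test_negative_input(self):
        # Negative testing: deliberately wrong inputs must be rejected.
        self.assertFalse(validate_upload("notes.txt"))
        self.assertFalse(validate_upload(""))
```

Run with `python -m unittest` to execute both the positive and negative cases.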
correctly
Verify whether all linked documents can be supported/opened on all
platforms
Verify the security requirements and use of encryption when
communication happens between systems
The final and most important step of testing an application is security testing.
When an application is built, a lot of data is used and stored. Some of this data
can be sensitive and must be protected at any cost; failure to do so can prevent
the application from functioning properly. Security testing is implemented to
fully secure this kind of mission-critical data.
CHAPTER 8
EXPERIMENTAL RESULT
[Figure: comparison of the existing system and the proposed system]
Based on the research that has been done, it can be concluded that the first stage
is to extract the audio from the raw video using the MoviePy library. The core
components are a video processing module and a speech recognition module. The
video processing module is responsible for extracting the audio from the video
recording file, while the speech recognition module converts the extracted audio
into a text document. The system uses the MoviePy library to extract the audio
portion of the video recording file and Google's speech recognition library for
text conversion.
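A minimal sketch of that pipeline, assuming the third-party `moviepy` (1.x import path) and `SpeechRecognition` packages are installed (`pip install moviepy SpeechRecognition`); `to_wav_name` is our own small helper:

```python
import os


def to_wav_name(video_path):
    """Derive a .wav filename next to the input video (our own helper)."""
    base, _ = os.path.splitext(video_path)
    return base + ".wav"


def video_to_text(video_path):
    """Extract the audio track with MoviePy, then transcribe it with
    Google's free speech recognition endpoint (needs internet)."""
    from moviepy.editor import VideoFileClip  # third-party (MoviePy 1.x)
    import speech_recognition as sr           # third-party

    wav_path = to_wav_name(video_path)
    VideoFileClip(video_path).audio.write_audiofile(wav_path)

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)
```

`recognize_google` raises `sr.UnknownValueError` when the audio is unintelligible, so production code should wrap the call in a try/except.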
CHAPTER 9
CONCLUSION AND FUTURE ENHANCEMENT
9.1 CONCLUSION
In conclusion, the text-from-video system we have proposed can
significantly improve the accessibility and convenience of video content for
everyone, regardless of their hearing ability or preference for reading over
watching videos. The system automates the process of speech recognition and
text extraction from video recordings, eliminating the need for manual
transcription or captioning and making the process more efficient and scalable.
The system's core components are a video processing module and a speech
recognition module, which use the MoviePy library to extract the audio portion
of the video recording file and Google's speech recognition library for text
conversion. We have also implemented pre-processing techniques such as noise
reduction and normalization to improve the accuracy of the speech recognition.
In the future, the system could also generate captions for live lectures and
other live broadcasts. This could significantly improve the accessibility and
inclusivity of such events, as it would enable individuals who are deaf or hard
of hearing to follow along with the spoken content.
Overall, the text-from-video system we have proposed has the
potential to significantly improve the accessibility and convenience of video
content for a wide range of users.
There are several potential future enhancements for this system. Firstly, we
could explore different speech recognition libraries and compare their
performance against Google's speech recognition library. This could potentially
improve the accuracy of the system and make it more robust. Secondly, we could
explore different pre-processing techniques and language model adaptation
techniques to improve the system's accuracy further. Thirdly, we could
develop a more advanced user interface that allows users to interact with the
system and edit the generated text. This could improve the usability of the system
and make it more accessible to a broader range of users.
CHAPTER 10
APPENDICES
10.1 SOURCE CODE
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
<style>
@import url('https://fonts.googleapis.com/css?family=Montserrat:400,800');
*{
box-sizing: border-box;
}
body {
background: #f6f5f7;
display: flex;
justify-content: center;
align-items: center;
flex-direction: column;
font-family: 'Montserrat', sans-serif;
height: 100vh;
margin: -20px 0 50px;
}
h1 {
font-weight: bold;
margin: 0;
}
h2 {
text-align: center;
}
p{
font-size: 14px;
font-weight: 100;
line-height: 20px;
letter-spacing: 0.5px;
margin: 20px 0 30px;
}
span {
font-size: 12px;
}
a{
color: #333;
font-size: 14px;
text-decoration: none;
margin: 15px 0;
}
.button {
border-radius: 20px;
border: 1px solid #FF4B2B;
background-color: #FF4B2B;
color: #FFFFFF;
font-size: 12px;
font-weight: bold;
padding: 12px 45px;
letter-spacing: 1px;
text-transform: uppercase;
transition: transform 80ms ease-in;
}
.button:active {
transform: scale(0.95);
}
.button:focus {
outline: none;
}
.button.ghost {
background-color: transparent;
border-color: #FFFFFF;
}
form {
background-color: #FFFFFF;
display: flex;
align-items: center;
justify-content: center;
flex-direction: column;
padding: 0 50px;
height: 100%;
text-align: center;
}
input {
background-color: #eee;
border: none;
padding: 12px 15px;
margin: 8px 0;
width: 100%;
}
.container {
background-color: #fff;
border-radius: 10px;
box-shadow: 0 14px 28px rgba(0,0,0,0.25),
0 10px 10px rgba(0,0,0,0.22);
position: relative;
overflow: hidden;
width: 768px;
max-width: 100%;
min-height: 480px;
}
.form-container {
position: absolute;
top: 0;
height: 100%;
transition: all 0.6s ease-in-out;
}
.sign-in-container {
left: 0;
width: 50%;
z-index: 2;
}
.container.right-panel-active .sign-in-container {
transform: translateX(100%);
}
.sign-up-container {
left: 0;
width: 50%;
opacity: 0;
z-index: 1;
}
.container.right-panel-active .sign-up-container {
transform: translateX(100%);
opacity: 1;
z-index: 5;
animation: show 0.6s;
}
@keyframes show {
0%, 49.99% {
opacity: 0;
z-index: 1;
}
50%, 100% {
opacity: 1;
z-index: 5;
}
}
.overlay-container {
position: absolute;
top: 0;
left: 50%;
width: 50%;
height: 100%;
overflow: hidden;
transition: transform 0.6s ease-in-out;
z-index: 100;
}
.container.right-panel-active .overlay-container{
transform: translateX(-100%);
}
.overlay {
background: #FF416C;
background: -webkit-linear-gradient(to right, #FF4B2B,
#FF416C);
background: linear-gradient(to right, #FF4B2B, #FF416C);
background-repeat: no-repeat;
background-size: cover;
background-position: 0 0;
color: #FFFFFF;
position: relative;
left: -100%;
height: 100%;
width: 200%;
transform: translateX(0);
transition: transform 0.6s ease-in-out;
}
.container.right-panel-active .overlay {
transform: translateX(50%);
}
.overlay-panel {
position: absolute;
display: flex;
align-items: center;
justify-content: center;
flex-direction: column;
padding: 0 40px;
text-align: center;
top: 0;
height: 100%;
width: 50%;
transform: translateX(0);
transition: transform 0.6s ease-in-out;
}
.overlay-left {
transform: translateX(-20%);
}
.container.right-panel-active .overlay-left {
transform: translateX(0);
}
.overlay-right {
right: 0;
transform: translateX(0);
}
.container.right-panel-active .overlay-right {
transform: translateX(20%);
}
.social-container {
margin: 20px 0;
}
.social-container a {
border: 1px solid #DDDDDD;
border-radius: 50%;
display: inline-flex;
justify-content: center;
align-items: center;
margin: 0 5px;
height: 40px;
width: 40px;
}
footer {
background-color: #222;
color: #fff;
font-size: 14px;
bottom: 0;
position: fixed;
left: 0;
right: 0;
text-align: center;
z-index: 999;
}
footer p {
margin: 10px 0;
}
footer i {
color: red;
}
footer a {
color: #3c97bf;
text-decoration: none;
}
</style>
</head>
<body>
<h2>KALAIARASAN & TEAM PROJECT</h2>
<div class="container" id="container">
<div class="form-container sign-up-container">
<form action="#">
<h1>Create Account</h1>
<div class="social-container">
<a href="#" class="social"><i class="fab fa-facebook-f"></i></a>
<a href="#" class="social"><i class="fab fa-google-plus-g"></i></a>
<a href="#" class="social"><i class="fab fa-linkedin-in"></i></a>
</div>
<span>or use your email for registration</span>
<input type="text" placeholder="Name" />
<input type="email" placeholder="Email" />
<input type="password" placeholder="Password" />
<button>Sign Up</button>
</form>
</div>
<div class="form-container sign-in-container">
<form action="#">
<h1>Sign in</h1>
<input type="email" placeholder="Email" />
<input type="password" placeholder="Password" />
<a href="#">Forgot your password?</a>
<a class="button" href="/upload">Login</a>
</form>
</div>
<div class="overlay-container">
<div class="overlay">
<div class="overlay-panel overlay-left">
<h1>Welcome Back!</h1>
<p>To keep connected with us please login with your
personal info</p>
<button class="ghost" id="signIn">Sign In</button>
</div>
<div class="overlay-panel overlay-right">
<h1>Hello!</h1>
<p>Click Login</p>
<!-- <a><button
class="ghost" id="signUp">Login</button></a>-->
</div>
</div>
</div>
</div>
</body>
</html>
10.2 SCREENSHOTS
10.2.3 TEXT GENERATOR MESSAGE