
e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:06/Issue:04/April-2024 Impact Factor- 7.868 www.irjmets.com
VIDEO TO TEXT CONVERTER
Ajay N. Tembhare*1, Aastha P. Godange*2, Mohit S. Tondre*3, Pritam J. Satpute*4,
Sameer S. Selokar*5, Vaibhav Tembhurkar*6
*1,2,3,4,5,6Students, Department of Computer Science and Engineering, Guru Nanak Institute of Technology,
Nagpur, Maharashtra, India.
DOI : https://www.doi.org/10.56726/IRJMETS54631
ABSTRACT
The 'Video to Text Converter' project aims to develop an automated system capable of converting spoken
words in video content into textual transcripts efficiently. Leveraging advanced speech recognition and natural
language processing technologies, the system processes video content to extract audio tracks, which are then
transcribed into text using deep learning models. Post-processing techniques, including punctuation insertion
and spell checking, enhance transcription accuracy.
The system supports English transcription and finds applications in education, law enforcement, and content
creation industries. Overall, the project addresses the growing demand for tools that make video content more
accessible and searchable, offering valuable benefits across various domains.
Keywords: Video processing, Speech recognition, Natural language processing, Automated transcription, Deep learning models.
I. INTRODUCTION
A video to text converter is a software tool or system that automatically transcribes spoken audio content from
a video file into written text. This technology utilizes speech recognition algorithms to analyze the audio track
of the video and convert it into a textual format.
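As a rough sketch of how such a pipeline can be wired together, the example below extracts the audio track from a video and passes it to an off-the-shelf speech recognition backend. It assumes the third-party packages moviepy and SpeechRecognition are installed; the file name "lecture.mp4" and the helper video_to_text are illustrative placeholders, not part of any published implementation.

```python
# Minimal sketch: extract the audio track from a video, then transcribe it.
# Assumes `pip install moviepy SpeechRecognition`; names are placeholders.
from moviepy.editor import VideoFileClip
import speech_recognition as sr


def video_to_text(video_path: str, audio_path: str = "extracted_audio.wav") -> str:
    # Step 1: pull the audio track out of the video container and save it as WAV.
    clip = VideoFileClip(video_path)
    clip.audio.write_audiofile(audio_path)
    clip.close()

    # Step 2: run speech recognition over the extracted audio (English).
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio, language="en-US")


if __name__ == "__main__":
    print(video_to_text("lecture.mp4"))
```

The recognizer backend shown here (Google's free web speech API) could equally be replaced by a locally hosted deep learning model; the surrounding extract-then-transcribe structure stays the same.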
The resulting text can then be edited, searched, indexed, or used for various purposes such as creating subtitles,
generating transcripts for accessibility purposes, or extracting information from video content for analysis or
documentation. The importance of video to text converters extends across various domains and applications. In
educational settings, these converters facilitate the creation of transcripts for instructional videos and lectures,
enhancing learning outcomes by providing searchable and indexed textual content.
In the realm of digital marketing, they play a vital role in improving search engine optimization (SEO) efforts by
making video content more discoverable through indexed transcripts. Moreover, video to text converters are
invaluable tools for content analysis, allowing researchers, marketers, and content creators to extract valuable
insights from video content. Techniques such as sentiment analysis, keyword extraction, and topic modeling
can be applied to video transcripts to glean actionable information and trends. In addition to their utility in
accessibility and content analysis, video to text converters also have significant implications in legal and
compliance contexts. Transcripts generated by these converters serve as official records in legal proceedings,
compliance audits, and regulatory requirements, ensuring accuracy and accountability.
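As a toy illustration of the keyword-extraction idea mentioned above, the snippet below simply counts the most frequent non-stop-words in a made-up transcript; production systems would typically use TF-IDF weighting or topic models, but the principle is similar.

```python
# Toy keyword extraction over a transcript using only the standard library.
# The transcript text and stop-word list are invented for illustration.
from collections import Counter
import re

transcript = (
    "welcome to this lecture on neural networks today we will cover how "
    "neural networks are trained and how neural networks are evaluated"
)

STOP_WORDS = {"to", "this", "on", "we", "will", "how", "and", "are", "today"}

words = re.findall(r"[a-z']+", transcript.lower())
keyword_counts = Counter(w for w in words if w not in STOP_WORDS)

# The most frequent remaining words act as crude keywords for the video.
print(keyword_counts.most_common(3))  # e.g. [('neural', 3), ('networks', 3), ...]
```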
II. METHODOLOGY
For a 'Video to Text Converter' project, you'll need a combination of tools and platforms for various tasks such
as data preprocessing, model development, training, evaluation, and deployment. Here's a list of commonly
used tools and platforms for each stage of the project:
1. Data Collection and Preprocessing: LibriSpeech, a popular dataset for training speech recognition models, containing read speech from audiobooks.
2. Model Development and Training: TensorFlow, an open-source machine learning framework developed by Google, widely used for building and training deep learning models, including speech recognition models.
3. Model Evaluation and Testing: Various open-source tools are available for calculating Word Error Rate (WER) and other evaluation metrics for assessing the performance of speech recognition models (see the sketch after this list).
4. Deployment and Integration: TensorFlow Serving, a system for serving machine learning models in production, including speech recognition models trained using TensorFlow.

5. Development Environments: Jupyter Notebook, an interactive computing environment for developing and prototyping machine learning models, including speech recognition models.
6. Version Control and Collaboration: Git, a distributed version control system for tracking changes to code and collaborating with team members on the development of speech recognition models.
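As a concrete example of the evaluation step in item 3, the snippet below computes WER with the open-source jiwer package; the reference and hypothesis strings are made-up stand-ins for a ground-truth transcript and a model output, and jiwer itself is only one of several possible tool choices.

```python
# Illustrative Word Error Rate (WER) computation with the `jiwer` package.
# Install with `pip install jiwer`; the two strings below are toy examples.
from jiwer import wer

reference = "the quick brown fox jumps over the lazy dog"    # ground-truth transcript
hypothesis = "the quick brown fox jumped over a lazy dog"    # recognizer output

error_rate = wer(reference, hypothesis)
print(f"Word Error Rate: {error_rate:.2%}")  # two substitutions in nine words, about 22%
```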
III. MODELING AND ANALYSIS
Flow-Chart: Video to Text Converter.

IV. RESULTS AND DISCUSSION


1. First, we run the Python file, which launches a window that looks just like the UI we created.
2. When a user clicks on the menu item, we launch a file dialog box for the user to select the appropriate file.
3. Once the video is converted into audio format, we can start the transcription. Here we get the filename from the text the user entered in the output file name text box.
4. The image below shows the final version. The progress bar updates to show the transcription progress. When complete, we load the text file contents into the text area, which automatically adds scroll bars if needed (a minimal code sketch of this workflow follows the figure).

Figure 1: Video To Text Converter
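A minimal sketch of the workflow in steps 1-4, built with Tkinter from the Python standard library, is shown below. The widget layout and the transcribe_video() placeholder are assumptions made for illustration; they are not the authors' actual code.

```python
# Hedged sketch of the GUI workflow above using Tkinter (standard library).
# Widget names and transcribe_video() are illustrative placeholders.
import tkinter as tk
from tkinter import filedialog, scrolledtext, ttk


def transcribe_video(video_path: str, output_path: str) -> None:
    """Stand-in for the real audio-extraction and speech-recognition step."""
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(f"(transcript of {video_path} would be written here)")


class ConverterUI(tk.Tk):
    def __init__(self):
        super().__init__()
        self.title("Video To Text Converter")

        # Menu item that opens the file dialog (step 2).
        menubar = tk.Menu(self)
        file_menu = tk.Menu(menubar, tearoff=0)
        file_menu.add_command(label="Open Video...", command=self.open_video)
        menubar.add_cascade(label="File", menu=file_menu)
        self.config(menu=menubar)

        # Output file name text box (step 3).
        tk.Label(self, text="Output file name:").pack(anchor="w")
        self.output_name = tk.Entry(self)
        self.output_name.insert(0, "transcript.txt")
        self.output_name.pack(fill="x")

        # Progress bar and scrollable text area (step 4).
        self.progress = ttk.Progressbar(self, mode="determinate", maximum=100)
        self.progress.pack(fill="x")
        self.text_area = scrolledtext.ScrolledText(self, height=15)
        self.text_area.pack(fill="both", expand=True)

    def open_video(self):
        # Step 2: let the user pick a video file.
        video_path = filedialog.askopenfilename(
            filetypes=[("Video files", "*.mp4 *.avi *.mkv"), ("All files", "*.*")]
        )
        if not video_path:
            return
        # Step 3: read the output file name and run the (placeholder) transcription.
        output_path = self.output_name.get()
        self.progress["value"] = 50
        self.update_idletasks()
        transcribe_video(video_path, output_path)
        self.progress["value"] = 100
        # Step 4: load the finished transcript into the scrollable text area.
        with open(output_path, encoding="utf-8") as f:
            self.text_area.delete("1.0", tk.END)
            self.text_area.insert(tk.END, f.read())


if __name__ == "__main__":
    ConverterUI().mainloop()
```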


V. CONCLUSION
In conclusion, this research presents a comprehensive study on the development and evaluation of a video-to-text converter using the Python programming language. The implemented converter leverages advanced machine
learning and natural language processing techniques to accurately transcribe spoken dialogue and extract
meaningful textual representations of visual content from videos.
The experimental results obtained from evaluating the converter demonstrate its effectiveness, reliability, and
real-world applicability in accurately converting videos to textual format. The achieved transcription accuracy,
object detection performance, text summarization quality, and processing efficiency validate the suitability of
the converter for various practical applications in digital media analysis, multimedia content management,
educational technology, and accessibility enhancement.
Looking ahead, future research directions include [potential areas for improvement or extension]. By addressing these challenges and advancing the state of the art in video-to-text conversion technology, we can further enhance the accessibility, usability, and utility of digital video content for diverse user populations and applications.
ACKNOWLEDGEMENTS
We would like to take this opportunity to thank all the people who were part of this project in numerous ways and who gave unending support right from the initial stage.
We wish to thank Prof. Trupti Ghate, our internal project guide, for their timely co-operation and precious guidance, without which this project would not have been a success. We thank them for reviewing the entire project with painstaking effort and for their uncanny ability to spot mistakes. We would also like to thank Prof. Jagruti Ghatole (HoD) for her continuous encouragement, support, and guidance at every stage of the project.
Finally, we would like to thank all our friends who were associated with us and helped us in preparing the project. The project, “Video to Text Converter”, would not have been possible without the extensive support of everyone who was directly or indirectly involved in its successful execution.
