International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:06/Issue:04/April-2024 Impact Factor- 7.868 e-ISSN: 2582-5208 www.irjmets.com

VIDEO TO TEXT CONVERTER

Ajay N. Tembhare*1, Aastha P. Godange*2, Mohit S. Tondre*3, Pritam J. Satpute*4, Sameer S. Selokar*5, Vaibhav Tembhurkar*6
*1,2,3,4,5,6Students, Department of Computer Science and Engineering, Guru Nanak Institute of Technology, Nagpur, Maharashtra, India.
DOI : https://www.doi.org/10.56726/IRJMETS54631

ABSTRACT
The 'Video to Text Converter' project aims to develop an automated system capable of efficiently converting spoken words in video content into textual transcripts. Leveraging speech recognition and natural language processing technologies, the system extracts the audio track from video content and transcribes it into text using deep learning models. Post-processing techniques, including punctuation insertion and spell checking, enhance transcription accuracy. The system supports English transcription and finds applications in the education, law enforcement, and content creation industries. Overall, the project addresses the growing demand for tools that make video content more accessible and searchable, offering benefits across various domains.
Keywords: Video processing, Speech recognition, Natural language processing, Automated transcription, Deep learning models.

I. INTRODUCTION
A video to text converter is a software tool or system that automatically transcribes spoken audio content from a video file into written text. This technology uses speech recognition algorithms to analyze the audio track of the video and convert it into a textual format. The resulting text can then be edited, searched, indexed, or used for purposes such as creating subtitles, generating transcripts for accessibility, or extracting information from video content for analysis or documentation.
The importance of video to text converters extends across various domains and applications. In educational settings, these converters facilitate the creation of transcripts for instructional videos and lectures, enhancing learning outcomes by providing searchable and indexed textual content. In digital marketing, they support search engine optimization (SEO) efforts by making video content more discoverable through indexed transcripts. Moreover, video to text converters are valuable tools for content analysis, allowing researchers, marketers, and content creators to extract insights from video content. Techniques such as sentiment analysis, keyword extraction, and topic modeling can be applied to video transcripts to identify actionable information and trends. In addition to their utility in accessibility and content analysis, video to text converters also have significant implications in legal and compliance contexts. Transcripts generated by these converters can serve as records in legal proceedings, compliance audits, and regulatory reporting, supporting accuracy and accountability.

II. METHODOLOGY
A 'Video to Text Converter' project requires a combination of tools and platforms for tasks such as data preprocessing, model development, training, evaluation, and deployment. The tools and platforms commonly used at each stage of the project are listed below:
1. Data Collection and Preprocessing: LibriSpeech: A popular dataset for training speech recognition models, containing read speech from audiobooks.
2. Model Development and Training: TensorFlow: An open-source machine learning framework developed by Google, widely used for building and training deep learning models, including speech recognition models.
3. Model Evaluation and Testing: Various open-source tools are available for calculating Word Error Rate (WER) and other evaluation metrics for assessing the performance of speech recognition models.
4. Deployment and Integration: A system for serving machine learning models in production, including speech recognition models trained using TensorFlow.
5. Development Environments: An interactive computing environment for developing and prototyping machine learning models, including speech recognition models.
6. Version Control and Collaboration: A distributed version control system for tracking changes to code and collaborating with team members on the development of speech recognition models.

III. MODELING AND ANALYSIS
Flow-Chart: Video to Text Converter.
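
The Word Error Rate (WER) named in the evaluation stage of the methodology can be illustrated with a minimal sketch. The paper does not state which evaluation tool was used, so the function below is only an assumption of how such a metric is typically computed: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The name `word_error_rate` and the lower-casing step are illustrative choices, not part of the original project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" involves one substitution and one deletion over six reference words, giving a WER of 2/6. A lower value indicates a more accurate transcript.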
IV. RESULTS AND DISCUSSION
1. First, we run the Python file, which launches a window containing the user interface we created.
2. When the user clicks on the menu item, we open a file dialog box so that the user can select the appropriate video file.
3. Once the selected file has been converted into audio format, we can start the transcription. The output filename is taken from the text the user entered in the output file name text box.
4. The image below shows the final version. The progress bar updates to show the transcription progress. When transcription is complete, we load the contents of the text file into the text area, which automatically adds scroll bars if needed.
Figure 1: Video To Text Converter
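
The steps above can be sketched in Python without the graphical interface. The paper does not name the libraries it uses, so everything here is an assumption for illustration: audio extraction is shown as an `ffmpeg` command line, `transcribe_chunk` is a hypothetical stand-in for whatever speech recognizer is plugged in, and `on_progress` stands in for the callback that would update the progress bar. The helper names `build_ffmpeg_command`, `output_path_for`, and `transcribe_chunks` are likewise hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def build_ffmpeg_command(video_path: str, audio_path: str) -> List[str]:
    """Build an ffmpeg command that extracts a mono 16 kHz WAV track."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,        # input video selected by the user
        "-vn",                   # discard the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM audio
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # single (mono) channel
        audio_path,
    ]


def output_path_for(name: str) -> Path:
    """Turn the name typed in the output text box into a .txt path."""
    name = name.strip()
    return Path(name if name.lower().endswith(".txt") else name + ".txt")


def transcribe_chunks(
    chunks: Iterable[object],
    transcribe_chunk: Callable[[object], str],
    on_progress: Callable[[float], None],
) -> str:
    """Transcribe audio chunks in order, reporting progress as a 0..1 fraction."""
    chunk_list = list(chunks)
    pieces = []
    for index, chunk in enumerate(chunk_list, start=1):
        pieces.append(transcribe_chunk(chunk))  # hypothetical recognizer call
        on_progress(index / len(chunk_list))    # e.g. update a progress bar
    return " ".join(piece for piece in pieces if piece)
```

In a complete application, the list returned by `build_ffmpeg_command` could be passed to `subprocess.run(..., check=True)` on a machine where `ffmpeg` is installed, and the string returned by `transcribe_chunks` could be written to `output_path_for(name)` before being loaded into the text area. Reporting progress per chunk keeps the interface responsive to long videos, at the cost of choosing a chunking strategy for the audio.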
V. CONCLUSION
In conclusion, this research presents a study on the development and evaluation of a video-to-text converter written in the Python programming language. The implemented converter leverages machine learning and natural language processing techniques to transcribe spoken dialogue and extract meaningful textual representations of visual content from videos. The experimental results obtained from evaluating the converter demonstrate its effectiveness, reliability, and real-world applicability in converting videos to textual format. The achieved transcription accuracy, object detection performance, text summarization quality, and processing efficiency validate the suitability of the converter for practical applications in digital media analysis, multimedia content management, educational technology, and accessibility enhancement. Looking ahead, future research directions include [potential areas for improvement or extension], such as [list of future research directions]. By addressing these challenges and advancing the state of the art in video-to-text conversion technology, we can further enhance the accessibility, usability, and utility of digital video content for diverse user populations and applications.

ACKNOWLEDGEMENTS
We would like to take this opportunity to thank all the people who were part of this seminar in numerous ways, people who gave unending support right from the initial stage. We wish to thank Prof.
Trupti Ghate, our internal project guide, for their co-operation and timely, precious guidance, without which this project would not have been a success. We thank them for painstakingly reviewing the entire project and, above all, for their uncanny ability to spot mistakes. We would like to thank Prof. Jagruti Ghatole (HoD) for her continuous encouragement, support, and guidance at every stage of the project. Finally, we would like to thank all our friends who were associated with us and helped us in preparing our project. The project named "Video to Text Converter" would not have been possible without the extensive support of the people who were directly or indirectly involved in its successful execution.