NEWS APPLICATION USING VOICE PROMPT
Submitted by
KUMAR ABHISHEK [Reg. No.: RA2011003011171]
SHRIMAYI MATANHELIA [Reg. No.: RA2011003011141]
Under the Guidance of
Mr. G. Manoj Kumar
Assistant Professor, Department of Computing Technologies
Certified that this B.Tech. Minor project report titled “NEWS APPLICATION USING VOICE
PROMPT” is the bonafide work of Kumar Abhishek (RA2011003011171) and Shrimayi Matanhelia
(RA2011003011141) who carried out the project work under my supervision. Certified further, that to
the best of my knowledge the work reported herein does not form part of any other thesis or dissertation
based on which a degree or award was conferred on an earlier occasion for this or any other candidate.
Dr. M. PUSHPALATHA
HEAD OF THE DEPARTMENT
Department of Computing Technologies
SRM Institute of Science and Technology
Own Work Declaration Form
I / We hereby certify that this assessment complies with the University's Rules and Regulations relating
to academic misconduct and plagiarism, as listed on the University website, Regulations, and the
Education Committee guidelines.
I / We confirm that all the work contained in this assessment is my / our own except where indicated, and that
I / We have met the following conditions:
I / We understand that any false claim for this work will be penalized in accordance with the University
policies and regulations.
DECLARATION:
I/We am/are aware of and understand the University’s policy on Academic misconduct and
plagiarism and I /we certify that this assessment is my / our own work, except where indicated
by referencing, and that I/we have followed the good academic practices noted above.
If you are working in a group, please write your registration numbers and sign with the date for
every student in your group.
ACKNOWLEDGEMENT
We extend our sincere thanks to Dean-CET, SRM Institute of Science and Technology, Dr.
T. V. Gopal, for his invaluable support.
We wish to thank Dr. Revathi Venkataraman, Professor and Chairperson, School of
Computing, SRM Institute of Science and Technology, for her support throughout the
project work.
We want to convey our thanks to our Project Coordinators, Dr. S. Godfrey Winster, Dr. S.
Nalini, Mr. G. Manoj Kumar, Dr. M. Kandan and Dr. A. Arul Murugan, Department of
Computing Technologies, SRM Institute of Science and Technology, for their inputs
during the project reviews and support.
Our inexpressible respect and thanks to our guide, Mr. G. Manoj Kumar, Assistant
Professor, Department of Computing Technologies, SRM Institute of Science and
Technology, for providing us with an opportunity to pursue our project under his
mentorship. He provided us with the freedom and support to explore the research topics
of our interest. His passion for solving problems and making a difference in the world has
always been inspiring.
We sincerely thank all the staff and students of the Computing Technologies Department,
School of Computing, SRM Institute of Science and Technology, for their help during
our project. Finally, we would like to thank our parents, family members, and friends for
their unconditional love, constant support and encouragement.
TABLE OF CONTENTS

Abstract i
List of Tables ii
List of Figures ii
List of Symbols and Abbreviations iii
1 INTRODUCTION 1
1.1 General 1
1.2 Purpose 2
1.3 Scope 2
1.4 Text To Speech (TTS) And Speech To Text (STT) 3
1.5 Natural Language Processing 5
1.6 ALAN AI 6
1.7 Motivation 7
2 LITERATURE REVIEW 8
3 PROPOSED METHODOLOGY 15
3.1 Revolutionizing Accessibility: The Voice-Powered News Portal 15
3.2 A User-Centric Approach 15
3.3 The Power of Voice Interaction 15
3.4 Real-Time News Retrieval 16
3.5 Effective Architecture Over Traditional Systems 16
3.6 Modules Used 20
4 RESULT 21
5 CONCLUSION 25
6 FUTURE SCOPE 26
7 REFERENCES 28
APPENDIX 32
ABSTRACT
The Voice-Powered News Portal stands as a pioneering endeavour, dedicated to bridging the
information chasm for visually challenged individuals, thereby endowing them with
unrestrained entry to the most recent news, thus empowering their active participation in the
news realm. It leverages state-of-the-art voice recognition and synthesis technologies, proffering
an instinctive platform for tailored news dissemination. The amalgamation of Speech-to-Text
(STT) and Text-to-Speech (TTS) technologies, fortified with robust external Application
Programming Interfaces (APIs), facilitates precise command interpretation, real-time news
retrieval, and seamless audio narration. Employing advanced web scraping techniques, it
aggregates the most up-to-the-minute news data from a myriad of sources, ensuring unwavering
user enlightenment. This technology ushers in a more engaging approach to information
acquisition while markedly reducing the manual efforts hitherto required from users. The user
interface provided by this system is dynamic, user-friendly, and didactic. By empowering users
to comment on news stories and share them on social platforms, the platform actively fosters
involvement. Its mission is to advance equity and inclusivity in the digital age, envisioning a
future where visual impairment ceases to be a hindrance to critical information access. The
Voice-Powered News Portal symbolizes a more egalitarian and accessible world, where
everyone can stay informed and be part of society's broader discourse. A key advantage of this
proposed system lies in its adaptability, as voice recognition can be applied to a multitude of
devices that consumers interact with, spanning from smart TVs and smartwatches to laptops,
extending beyond the confines of mobile phones and PCs.
Keywords: voice-powered news portal, voice recognition, speech to text (STT), text to
speech (TTS), API, news
LIST OF TABLES

LIST OF FIGURES

LIST OF SYMBOLS AND ABBREVIATIONS
1. AI – Artificial Intelligence
2. STT – Speech to Text
3. TTS – Text to Speech
4. ASR – Automatic Speech Recognition
5. MFCC – Mel-frequency cepstral coefficients
6. CNN – Convolutional neural networks
7. RNN – Recurrent neural networks
8. HMM – Hidden Markov Models
9. NLP – Natural Language Processing
10. MT-KD – Multi-teacher knowledge distillation
11. HLC – Hybrid lightweight convolution
12. SC-CNN – Speaker-conditioning convolutional neural networks
13. GUI – Graphical User Interface
14. API – Application Programming Interface
15. UI – User Interface
16. UX – User Experience
17. JSON – JavaScript Object Notation
CHAPTER 1
INTRODUCTION
1.1 General
This project revolves around the conception of a "Voice-powered News Web Platform"
designed to provide a speech-driven or text-based operational interface for a personal
assistant. The primary objective is to directly engage with the audience, disseminating
real-time updates and insights on various aspects of people's endeavours. Moreover, it
promises to optimize daily productivity by conserving valuable work hours and furnishing
tailored news alerts in alignment with users' subscribed news categories. With the utterance
of words, it allows users to navigate the EC website and access articles accompanied by
corresponding videos.
A pivotal technology gaining traction across an expanding array of devices is voice control.
Within this research, we propose a sophisticated deep learning-powered voice assistant for
news updates that is proficient in discerning human behaviour. Voice, fundamentally a form
of interpersonal communication, is handled in this system through Automatic Speech Recognition. In our
fast-paced lives, staying informed about global events demands more than reading
traditional print media like newspapers and magazines. Every generation faces increasingly
demanding survival challenges.
Through the synergy of natural language processing and established methodologies, this
system can analyse audio input and furnish articulate responses in electronic vocal form.
The program transmits audio data to the Alan AI Cloud servers, where it undergoes
comprehensive evaluation to generate precise outcomes. Our voice assistant streamlines the
process of accessing news and headlines, while our web-based platform transforms news
consumption into an engaging and immersive experience. Being aware of current events
necessitates sifting through news content available in various formats, including traditional
print media, online news outlets, e-commerce websites, multimedia sources, and even
games. Nonetheless, none of these approaches offer the convenience and accessibility that
Voice-Enhanced News provides for retrieving your news through voice-directed
interactions.
1.2 Purpose
The Voice-Powered News Portal project embarks on an audacious journey to obliterate the
formidable barriers that have encumbered visually challenged individuals in their quest for
information and engagement. Its paramount objective is to bestow upon them an egalitarian
entrance into the realm of knowledge, untethered by the limitations of their visual faculties.
This initiative seeks not merely to bestow passive consumption of news but to actively
embolden participation in the discourses, deliberations, and determinations that shape our
societal tapestry. It is steeped in the tenets of inclusivity, accessibility, and knowledge
equity, an unflinching commitment to manifest a digital sphere where every individual's
voice reverberates not as a faint whisper, but as a resounding celebration of their intrinsic
worth. The project stands as a poignant embodiment of the notion that technology, when
harnessed for the collective betterment, possesses the alchemical ability to dismantle the
entrenched citadels of disparity, thus unfurling novel avenues for those who have long
languished on the peripheries of societal attention. Through its user-centric architectural
finesse, the orchestration of advanced technological symphony, and the perennial dedication
to user assistance and insight assimilation, the Voice-Powered News Portal aspires to
radiate as a luminary of empowerment, an advocate for the unheard, and an architect of
transformative social change for the visually challenged community. The project aims to
develop an intuitive user interface, which, though devoid of the visual cues prevalent in
mainstream applications, shall be meticulously crafted to prioritize auditory and tactile
interactions. Through this interface, visually impaired users can comfortably register,
authenticate, and personalize their news preferences, including topics of interest and
language choices.
1.3 Scope
User-Centric Accessibility: The project aims to create a user-centric platform that is highly
accessible to visually challenged individuals, ensuring that they can effortlessly access,
navigate, and interact with news content using voice commands and synthesized audio. The
design prioritizes inclusivity, ease of use, and user empowerment.
Data Quality and Reliability: A foundational pillar of the project is the creation of an
elaborate data management citadel, wherein real-time news content is meticulously curated
from reputable sources.
Future Scopes: The project anticipates future expansion and enhancement, including
features such as advanced personalization, multilingual support, integration with IoT
devices, and collaboration with educational institutions, among other possibilities.
Social Impact: Beyond technical functionality, the project aims to foster social inclusion,
knowledge equity, and empowerment for visually challenged individuals. It will actively
advocate for the rights and needs of this community, striving to create a positive societal
impact.
1.4 Text to Speech (TTS) and Speech to Text (STT)

TTS, also known as speech synthesis, is a technology that converts written text into audible
speech. It plays a crucial role in making digital content more accessible to individuals with
visual impairments and those who prefer auditory interfaces. The process of converting text
to speech involves several key components: text analysis, where the input text is analyzed
for sentence boundaries and linguistic structure; text preprocessing, which includes
cleaning, formatting, and enhancing text quality through tasks like tokenization and
part-of-speech tagging; phonetic and linguistic analysis that determines word pronunciation
based on linguistic rules and prosodic elements; acoustic modeling, essential for generating
sound waves based on recorded human speech; and synthesis, where the TTS system
combines linguistic and acoustic information to produce speech, utilizing methods like
concatenative synthesis or parametric synthesis, with the naturalness of the output varying
based on system sophistication and acoustic model quality.
A few applications of TTS include accessibility, where it serves as a crucial tool for people with
visual impairments by letting them hear written content read aloud, and voice assistants, where
popular assistants rely on TTS to provide human-like responses to user queries.
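To make the synthesis step concrete, the sketch below uses the browser's standard Web Speech API (speechSynthesis) to narrate a string. This is a generic, minimal example rather than the portal's own implementation, which delegates speech handling to Alan AI.

// A minimal, generic sketch of text-to-speech in the browser using the
// standard Web Speech API (speechSynthesis); illustrative only.
function readAloud(text, lang = 'en-US') {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = lang;   // language/accent of the synthesized voice
  utterance.rate = 1.0;    // speaking rate (1.0 = normal)
  utterance.pitch = 1.0;   // voice pitch
  window.speechSynthesis.speak(utterance);
}

// Example: narrate a headline fetched elsewhere in the application.
readAloud('Here are the top headlines for today.');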
STT, also known as automatic speech recognition (ASR), is the counterpart of TTS
technology. STT converts spoken language into written text, enabling machines to
understand and process human speech.
The STT process encompasses several critical steps: it begins with audio input, where STT
systems receive spoken language from sources like microphones, phone calls, or audio
recordings. Subsequently, acoustic feature extraction is employed to process the audio
signal, extracting relevant acoustic characteristics such as spectrograms or mel-frequency
cepstral coefficients (MFCCs) to represent speech. An acoustic model, typically utilizing
deep learning techniques like convolutional neural networks (CNNs) and recurrent neural
networks (RNNs), is then employed to recognize phonemes, words, or other speech units
based on these acoustic features. Additionally, a language model factors in linguistic
context and grammar to decode recognized phonemes or words into coherent sentences.
Language models can be built on n-grams, Hidden Markov Models (HMMs), or more
advanced methods such as transformer models. The final result of the STT process is
transcribed text, providing a written representation of the originally spoken content.
A few applications of STT include voice search, where search engines and mobile devices rely
on STT technology to understand spoken queries from any user, and voice assistants, which use
STT to transcribe user commands and queries for further processing.
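For illustration, the sketch below captures a voice command with the Web Speech API's SpeechRecognition interface. Again, this is a generic example, since the portal itself sends audio to the Alan AI cloud service described later in this chapter.

// A minimal, generic sketch of speech-to-text in the browser using the
// Web Speech API (SpeechRecognition); illustrative only.
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new Recognition();
recognition.lang = 'en-US';
recognition.interimResults = false; // deliver only final transcriptions

recognition.onresult = (event) => {
  const transcript = event.results[0][0].transcript;
  console.log('Recognized command:', transcript);
  // The transcript would then be handed to the command interpreter.
};

recognition.onerror = (event) => console.error('STT error:', event.error);

recognition.start(); // begin listening on the user's microphone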
Both TTS and STT technologies encounter a variety of challenges. Achieving naturalness
and accuracy in speech synthesis and recognition remains a persistent challenge,
encompassing the complexity of capturing human speech nuances like intonation, emotion,
and dialects. Moreover, linguistic variation across different languages and dialects presents
difficulties in developing TTS and STT systems that are effective on a global scale, given
the unique phonetic and prosodic patterns. Additionally, the presence of background noise
and acoustic variations in real-world scenarios can compromise the accuracy of STT
systems. Furthermore, enhancing emotional expressiveness in synthesized speech and
recognizing emotions in spoken language continue to be active areas of research. Lastly, in
applications featuring avatars and virtual characters, synchronizing lip movement and facial
expressions with synthesized speech poses a complex task.
1.5 Natural Language Processing

Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the
interaction between computers and human language. It encompasses a range of techniques
and technologies used to enable machines to understand, interpret, and generate human
language in a valuable way. NLP involves various tasks, such as text and speech
recognition, language translation, sentiment analysis, and chatbots.
NLP applications are pervasive in our daily lives, from voice assistants like Siri and
chatbots in customer service to language translation services and social media sentiment
analysis. NLP systems use machine learning algorithms and linguistic rules to analyze and
process natural language data. Its advancements have been driven by the growth of big data,
more powerful computing, and sophisticated deep learning models like transformers. NLP
continues to evolve, holding great promise for improving communication between humans
and machines, automating content analysis, and facilitating better decision-making in
numerous industries.
1.6 ALAN AI
Alan AI is a conversational voice AI platform for adding voice interfaces to applications.
Distinguished by its prowess, the company's technology adeptly comprehends and responds
to user commands, ushering in a more organic and instinctual user experience. Alan AI's
platform transcends industry boundaries, finding utility in healthcare, e-commerce,
customer service, and the realms of entertainment, while underscored by a steadfast
commitment to user-friendliness, developer-centricity, and scalability. Its visionary solutions
portend a paradigm shift, ushering in a more seamless and efficient era of human-machine
interactions, ultimately enhancing productivity and accessibility across a multitude of
sectors.
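As a hedged illustration, the sketch below shows the typical way the Alan AI web SDK (@alan-ai/alan-sdk-web) is wired into a React component. The project key is a placeholder, and the 'newHeadlines' command with its articles payload is a hypothetical example of data that would be defined in the project's Alan voice script (see alan_code.txt in the Appendix).

// Sketch: connecting the Alan AI web SDK to a React component.
import { useEffect, useState } from 'react';
import alanBtn from '@alan-ai/alan-sdk-web';

export default function VoiceAssistant() {
  const [articles, setArticles] = useState([]);

  useEffect(() => {
    alanBtn({
      key: 'YOUR_ALAN_SDK_KEY', // placeholder project key from Alan AI Studio
      onCommand: (commandData) => {
        // The Alan voice script sends structured data back to the web app.
        if (commandData.command === 'newHeadlines') {
          setArticles(commandData.articles || []);
        }
      },
    });
  }, []);

  return <p>{articles.length} articles loaded by voice command.</p>;
}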
1.7 Motivation
Social Inclusion: Staying informed about current events is a fundamental aspect of social
inclusion. When visually impaired individuals have the same access to news as their sighted
counterparts, they can actively participate in conversations, discussions, and social
interactions. This project fosters social inclusion and a sense of belonging.
Advocacy and Empowerment: This project serves as an advocate for the visually impaired
community, demonstrating that technological solutions can break down barriers. It
empowers users to advocate for their rights and needs, promoting a more inclusive and
understanding society.
Positive Social Impact: Beyond the immediate benefits to visually impaired users, the
Voice-Powered News Portal has the potential to create a positive ripple effect by raising
awareness about accessibility and inspiring similar initiatives. The motivation is to catalyze
positive social change and foster a more inclusive world.
CHAPTER 2
LITERATURE SURVEY
In [1], the authors investigate knowledge-transfer strategies and propose a multi-teacher
knowledge distillation (MT-KD) framework to confront the issue of exposure bias in neural TTS.
The findings underscore that MT-KD surpasses data augmentation and adversarial training in
terms of effectiveness. Furthermore, a two-teacher knowledge distillation approach outperforms
conventional single-teacher techniques. The MT-KD technique displays its prowess in the context
of the GST-Tacotron network design, with the objective of transferring knowledge from a
previously trained teacher to a fledgling student model, thus mitigating the influence of exposure
bias. Prospective undertakings will delve into the utilization of pre-trained models in multi-round
decoding.
In [3], The IBM Expressive Speech Synthesis System has been amplified through the
incorporation of two auspicious methodologies. Auditory assessments demonstrate that the
corpus-oriented strategy exhibits potential in particular forms of expressions, thereby
igniting deeper scrutiny of each algorithm. A direct juxtaposition of the corpus-driven and
prosodic phonological techniques across an extensive gamut of expressions holds the
promise of assessing the significance of incorporating voice quality modeling into
emotional expressions. However, the results from such comparisons hinge on the scale of
the expressive databases and the volume of ToBI-labeled data employed. The amalgamation
of these two methodologies could furnish a more holistic comprehension of their efficacy.
In [4], Within this document, the hybrid lightweight convolution (HLC) is introduced, with
a specific focus on the interdependencies within a confined context scope. Integration into
the standard Transformer network compensates for the shortage of local data in the domain
of TTS. A comprehensive array of experiments serves to substantiate the viability of the
proposed technique, resulting in noticeable enhancements in the performance of TTS
systems. HLC exhibits its adaptability in multi-tier TTS systems, entailing external
attention alignments and tasks associated with sequence modeling. These aspects are slated
for exploration in upcoming research endeavors.
In [5], The SC-CNN stands as a novel technique for conditioning speakers in ZSM-TTS,
harnessing 1-D convolutional processes to depict the phonetic micro-environment, rooted in
speaker embeddings. Its superiority over established methodologies for speaker
conditioning within contemporary systems is evident. Subsequent research will concentrate
on the synthesis of emotional speech and diverse expressive manners to evaluate its efficacy
in managing the nuances of emotional styles.
In [6], an HTTP-CCN gateway is proposed to adapt the HTTP-based delivery methods adopted by
online video providers. Notably, the demonstration highlights the
enhancements brought to a major Chinese online video platform. The primary
objective is to exhibit the tangible enhancements in user experience delivered by the
HTTP-CCN gateway, thereby fostering an upsurge in CCN traffic and lending support to
advanced CCN research.
In [7], In this document, the challenge of maintaining fairness between HTTP/1.1 and
HTTP/2 sessions in real-world network scenarios is examined. Despite HTTP/2's capacity
to improve web page performance, it lacks the ability to deliver even-handed throughput for
concurrent sessions.
In [8], This document unveils algorithms and techniques tailored to enhance the scalability
of REST APIs within hypertext-driven navigation systems. It introduces the Petri-Net-based
REST Chart framework, a collection of design patterns governed by hypertext, and an
innovative differential caching mechanism. These strategies found successful application in
the development of a RESTful interface for a northbound SDN API in cloud computing
with OpenStack. They effectively address prior limitations in design and performance. The
proposed hypertext-driven REST API methodology facilitates seamless migration between
RESTful SDN APIs without interrupting service execution, a pivotal trait for extensive
distributed systems. The differential stratified cache mechanism contributes to heightened
system efficiency, as demonstrated by performance evaluations showcasing a 66%
reduction in overhead related to hypertext-driven navigation and response times under 20
ms in tested networking applications.
In [9], The anticipated outcomes of the remedy displayed a notable 33% reduction in
retrieval duration following the incorporation of an extra API key, and this variance gains
prominence as the volume of inquiries mounts. The practical application underwent
examination across datasets encompassing information for one to twenty participants out of
a multitude of players, potentially numbering in the thousands.
In [10], The assessment prototype for REST web services enables comprehensive validation
of APIs across multiple dimensions, employing both assertion and script-based
methodologies. It extends its support to test suites of varying complexity levels, executing
test cases in tandem with the program's operation. Furthermore, it facilitates data
transmission and conditional execution by aligning the test tool model with the program
execution language. The descriptive language utilized by the model exhibits a high degree
of expressiveness, enabling the automatic generation of test cases through the interface.
In [11], The REST API presents a substitute approach for facilitating data interchange
spanning diverse platforms. This investigation seeks to undertake a future analysis, with the
objective of contrasting REST APIs utilizing varied data structures such as XML and
TEXT.
In [12], In this research, two microservice structures were formulated employing REST and
GraphQL technologies. The REST-compatible Ocelot gateway established connections with
REST services, whereas the GraphQL-driven system integrated Hot Chocolate via schema
stitching. The REST gateway demonstrated its superiority in amalgamating responses,
leading to swifter reaction times and enhanced data transfer capacity. Nonetheless, its
implementation necessitated more extensive labor. The constraints of this study stem from
the confinement of testing within a localized environment, thereby influencing the observed
outcomes.
In [13], The primary aim of this research was to pinpoint features of Graphic User Interface
(GUI) design that had undergone scrutiny among non-medical professionals and to gauge
their relevance in the context of GUI prerequisites for physicians. As per the accessible
data, medical practitioners exhibit a preference for GUI configurations that diverge from the
ones commonly provided by medical software applications. The research suggests that
integrating medical symbols, favored list structures, and screen intricacy could potentially
facilitate the adoption of PDAs by physicians for medical purposes.
In [14], The graphical user interface opens up an array of opportunities for individuals,
obviating the necessity to commit lengthy commands to memory and then input them into
the computer system devoid of errors. The significance of semantics stands out as a key
factor contributing to the success of the interface. The utilization of semantics is far from a
recent concept, given its integration into human existence dating back to the inception of
computing. In their work "Metaphors We Live By," Lakoff and Johnson posit that the
languages employed and our comprehension of the world are inexorably linked to the
fundamental semantic framework derived from the tangible world that surrounds us.
In [15], The objective of this investigation was to juxtapose the efficacy of explicit
feedback against user profile alteration paired with implicit feedback. Nevertheless, the
experimental system did not exhibit a notable superiority over the implicit feedback
mechanism, signifying that the implicit feedback system suffices in terms of efficiency to
rival the explicit counterpart. Furthermore, it came to light that altering the user model had
an impact on the system's operation. The performance of the system remained consistent
among users who introduced minimal adjustments to their user profiles but deteriorated
significantly when substantial alterations were made. This finding implies that user model
manipulation should be approached with caution to avert performance degradation. In sum,
the study posits that modifying user profiles can indeed serve as a viable means to enhance
system performance.
In [16], This study delves into the perceived cognitive burden experienced by news
consumers as they engage with news content employing diverse typographic styles and text
hues. The findings reveal substantial associations between font presentation techniques and
the cognitive load reported by users. For instance, employing italics and employing red
lettering for key terms can alleviate cognitive strain and enhance reading efficiency. These
discoveries hold relevance for intelligent media interfaces, enabling the automatic
adaptation of news text presentation modes in response to users' cognitive workload,
thereby amplifying the efficacy of news communication.
In [17], This research advocates for the incorporation of CoAP, a pivotal application
protocol within the Internet of Things, into web-centric applications. CoAP encounters
limitations with conventional web browsers stemming from its inherent design,
characterized by its reliance on UDP socket associations and two-way communication. An
alternative approach involves the adoption of a novel bidirectional web protocol akin to
HTML5 Web Sockets, which facilitates genuine CoAP interactions within web browsers.
The experimental outcomes vividly underscore the pronounced benefits concerning network
traffic and computational requirements when compared to the conventional HTTP/CoAP
proxy.
Future efforts will be dedicated to training TTS models across an array of noise categories
and implementing few-shot learning approaches for speakers dealing with noise.
CHAPTER 3
PROPOSED METHODOLOGY
3.2 A User-Centric Approach

The core essence of the Voice-Powered News Portal is its unwavering commitment to a
user-centric approach. The platform is a testament to inclusive design, with an intuitive
interface that champions accessibility and ease of use. It transcends mere functionality to
create an immersive and empowering user experience. Users can register, browse through a
wide range of categories, and personalize their news preferences according to their interests.
This unprecedented level of customization ensures that users
are not just recipients of news but active participants in their information journey. The
user-centric design isn't a mere feature; it's the driving force behind the platform's
transformational impact. It transforms the visually challenged user into an empowered,
informed individual, enhancing their autonomy and their sense of belonging in the digital
world.
3.3 The Power of Voice Interaction

Central to the mechanism of the Voice-Powered News Portal is the revolutionary integration
of Speech-to-Text (STT) and Text-to-Speech (TTS) technologies, the twin pillars of voice
interaction. This technological synergy is not just a feature; it's the lifeline of the platform.
3.4 Real-Time News Retrieval

At the heart of the platform's mechanism is the ability to provide real-time news retrieval.
This goes beyond being just a function; it is the engine of transformation.
transformation. The platform acts as a conduit to the latest developments in the world. It
sources news data from a rich tapestry of credible and reliable sources, employing
cutting-edge web scraping techniques and harnessing the capabilities of Application
Programming Interfaces (APIs). The platform's content pool is a dynamic reflection of the
world's evolving narratives, providing users with a real-time window to current events. Yet,
what truly sets the platform apart is its unwavering commitment to quality. A meticulous
data verification process stands as a sentinel against inaccuracies and unreliable
information. It upholds the platform's mission to be not just a source of news but a source of
reliable and high-quality knowledge. The mechanism for real-time news retrieval is not just
a convenience; it's a statement of credibility and reliability. It instills in users the confidence
that they are accessing news of the highest standard, an assurance that empowers them to be
active participants in the global conversation.
3.5 Effective Architecture Over Traditional Systems

The architecture of the Voice-Powered News Portal is a stark departure from traditional
systems. Traditional systems often prioritize visuals and rely on screen readers, which can
be cumbersome and less intuitive for the visually challenged. In contrast, the
Voice-Powered News Portal's architecture is anchored in a user-centric design that
transcends mere accessibility; it's a platform that's inherently inclusive and easy to use. The
integration of Speech-to-Text (STT) and Text-to-Speech (TTS) technologies is a
monumental leap over traditional systems that rely on static text-to-speech conversions. The
Voice-Powered News Portal's dynamic TTS system delivers news in a coherent and
human-like manner, enhancing comprehension and making the user experience more
engaging. Moreover, the mechanism for real-time news retrieval isn't
merely an upgrade; it's a revolution. Traditional systems often rely on static news databases,
whereas the Voice-Powered News Portal's mechanism keeps users connected to the pulse of
global events in real time. Moreover, the lightweight nature of the project, and the fact that
few to no dependencies are required to run the application, make it even more
approachable and help overcome the hindrance faced by less-aware and tech-illiterate
audience segments.
The Voice-Powered News Portal embodies the values of inclusivity, empowerment, and
credibility. The user-centric approach, the transformative power of voice interaction, and
the real-time news retrieval mechanism, coupled with its architecture, align to revolutionize
accessibility for visually challenged individuals. It does not just provide news; it provides a
voice, a sense of autonomy, and a gateway to knowledge. The platform's commitment is not
limited to technology; it's a commitment to inclusivity and a brighter future for all.
Command Interpreter:
• Receives the text command from the STT API.
• Maps the command to specific actions based on predefined patterns.
• Triggers the appropriate functionality, such as news retrieval or article reading.
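A minimal sketch of such a command interpreter is given below; the phrases, pattern matching, and action names are hypothetical placeholders, not the portal's actual modules.

// Sketch: mapping recognized transcripts to predefined actions.
function createCommandInterpreter(actions) {
  // actions: { fetchHeadlines, readArticle, speak } supplied by the application
  const patterns = [
    { match: /latest news/i,       run: () => actions.fetchHeadlines('general') },
    { match: /sports news/i,       run: () => actions.fetchHeadlines('sports') },
    { match: /read (the )?first/i, run: () => actions.readArticle(0) },
  ];

  return function interpret(transcript) {
    const found = patterns.find((p) => p.match.test(transcript));
    if (found) return found.run(); // trigger the mapped functionality
    return actions.speak('Sorry, I did not understand that command.'); // TTS fallback
  };
}

// Example usage with console-based stand-ins for the real modules:
const interpret = createCommandInterpreter({
  fetchHeadlines: (category) => console.log('fetching', category, 'headlines'),
  readArticle:    (index)    => console.log('reading article', index),
  speak:          (text)     => console.log('speaking:', text),
});
interpret('give me the latest news');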
The platform's layered architecture, spanning a presentation tier, a middleware layer, and a data
management layer, reflects this commitment to user-centric design. The presentation tier serves
as the user's point of interaction with the platform. It boasts an intuitive, accessible, and
responsive user interface
designed to empower users with visual impairments. This tier is meticulously crafted to
prioritize accessibility, ensuring that users can seamlessly register, authenticate, and
personalize their news preferences. With elements that adhere to accessibility standards
such as WCAG (Web Content Accessibility Guidelines), it forms the portal's gateway for
visually challenged individuals to the digital news ecosystem.
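As an illustration of this accessibility-first presentation tier, the sketch below shows a hypothetical category button with a screen-reader label, a large touch target, and a visible keyboard focus ring, following common WCAG practices; the component and class names are assumptions, not the portal's exact markup.

// Sketch: an accessibility-minded element in the presentation tier.
export function CategoryButton({ category, onSelect }) {
  // aria-label gives screen readers a descriptive announcement, while the
  // Tailwind classes provide a large touch target and a visible focus ring.
  return (
    <button
      aria-label={`Read ${category} news`}
      className="p-4 text-xl focus:ring-4"
      onClick={() => onSelect(category)}
    >
      {category}
    </button>
  );
}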
The TTS system in the middleware is not just another automated reader; it's a sophisticated
synthesis of audio that mirrors human speech. It goes beyond the mechanical enunciation of
text to present content in a manner that is coherent, natural, and engaging. The TTS system
is equipped with a rich linguistic database, enabling it to pronounce words and phrases with
nuanced inflections, adding depth and clarity to the user experience. It's an architectural
marvel that bridges the gap between textual information and auditory engagement. This
middleware's integration of STT and TTS is the architectural cornerstone, transforming the
Voice-Powered News Portal into an inclusive auditory gateway to the world of news.
The architectural essence of the Voice-Powered News Portal extends to the data
management layer. This is where real-time news retrieval is achieved, an intricate
mechanism that defines the platform's effectiveness. Advanced web scraping techniques are
deployed to gather data from a multitude of credible and reliable sources, forming a
real-time content pool. These sources are dynamic, ensuring that users have access to the
latest news developments from around the world. It's a direct channel to current affairs
that is unparalleled in the accessibility landscape.
However, what truly sets this data management layer apart is its stringent data verification
process. This is not a static database; it's a living repository that upholds the highest
standards of quality and reliability. In an era rife with misinformation, this process acts as a
sentinel, eliminating inaccuracies and unreliable information, guaranteeing the content's
credibility. This architectural rigor ensures that users don't just receive news; they receive
accurate, credible, and high-quality information. It's not just a data management system; it's
a quality assurance mechanism that places the user's trust at the forefront.
3.6 Modules Used

UI/UX:
Responsible for initiating the data flow (UX) and displaying the news cards; built in JavaScript
using the ReactJS framework, with styling done in Tailwind CSS.
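A simplified sketch of such a news card component is shown below. The article fields follow the JSON shape returned by News API, while the markup and Tailwind classes are illustrative assumptions rather than the project's exact code (the actual NewsCard.js is listed in the Appendix).

// Sketch: a simplified news card in the ReactJS + Tailwind CSS UI layer.
export default function NewsCard({ article }) {
  return (
    <article className="m-2 rounded-lg bg-white p-4 shadow">
      <h2 className="text-lg font-bold">{article.title}</h2>
      <p className="text-sm text-gray-700">{article.description}</p>
      <a href={article.url} aria-label={`Open article: ${article.title}`}>
        Read more
      </a>
    </article>
  );
}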
System API:
The voice interface is implemented with Alan AI, a conversational voice platform providing
text-to-speech (TTS) and speech-to-text (STT) capabilities, which is responsible for voice
navigation of the page.
News content is retrieved through News API, a web API that provides news articles in JSON
format and supports a wide range of query parameters, which helps ensure a great user experience.
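The sketch below illustrates how headlines might be fetched from News API's top-headlines endpoint; the API key is a placeholder, and the category and country parameters are examples of the query options mentioned above, not necessarily the exact query used by the portal.

// Sketch: retrieving headlines as JSON from News API (newsapi.org).
const NEWS_API_KEY = 'YOUR_NEWS_API_KEY'; // placeholder

async function fetchHeadlines(category = 'general', country = 'us') {
  const url =
    `https://newsapi.org/v2/top-headlines?country=${country}` +
    `&category=${category}&apiKey=${NEWS_API_KEY}`;
  const response = await fetch(url);
  const data = await response.json(); // { status, totalResults, articles }
  return data.articles;               // [{ title, description, url, ... }]
}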
CHAPTER 4
RESULTS
Through the seamless amalgamation of cutting-edge technologies, the project has actualized
an exceptionally user-centric platform. The core objective of this initiative was to provide a
dynamic, intuitive, and accessible environment for visually challenged users. This has been
magnificently achieved, as the platform now exhibits remarkable accessibility features,
including screen reader compatibility and tactile-friendly design.
The focal point of this venture was to offer an intuitive, human-like interaction experience.
The system, having undergone intensive development, now thrives as a beacon of natural
language interaction. This accomplishment is chiefly attributed to the integration of
state-of-the-art Speech-to-Text (STT) and Text-to-Speech (TTS) technologies. The STT
component exhibits a commendable accuracy rate, facilitating seamless voice commands.
Meanwhile, the TTS engine has been fine-tuned to deliver a remarkably lifelike auditory
rendition of news articles. Users can now engage in articulate dialogues with the system,
enjoying a reading experience that is indistinguishable from human narration.
News Experiences:
The confluence of user preferences and artificial intelligence has given rise to the crux of
personalization. The system, as part of its integral design, now offers users the capability to
curate their news experiences. It employs collaborative filtering algorithms to suggest news
articles based on past preferences, thus ensuring that each user receives a tailor-made
newsfeed.
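As a hedged illustration of this preference-based suggestion step, the sketch below ranks articles by overlap with the categories a user has previously read. The field names and scoring are hypothetical simplifications, not the portal's actual collaborative filtering implementation, which would compare many users' reading histories.

// Sketch: a highly simplified preference-based ranking of articles.
function rankByPreference(articles, readCategories) {
  const weight = {};
  readCategories.forEach((c) => { weight[c] = (weight[c] || 0) + 1; });
  return [...articles].sort(
    (a, b) => (weight[b.category] || 0) - (weight[a.category] || 0)
  );
}

// Example: a user who mostly reads technology news sees technology first.
const ranked = rankByPreference(
  [{ title: 'Match report', category: 'sports' },
   { title: 'New chip launched', category: 'technology' }],
  ['technology', 'technology', 'sports']
);
console.log(ranked[0].title); // "New chip launched"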
Device and Network Adaptability:
4.2 Comparison between voice powered news portal and existing system
Aspect: Accessibility
Voice-Powered News Portal: Highly accessible, designed with visually challenged users in mind, prioritizing usability and inclusivity.
Existing System: May lack comprehensive accessibility features, potentially limiting use by visually challenged individuals.
Table 4.2.1 – Comparison between voice powered news portal and existing system
This comparison underscores how the proposed Voice-Powered News Portal project is
tailored to address the unique needs and challenges of visually challenged users,
emphasizing accessibility, security, and user support. In contrast, the existing system may
have limitations in these areas, potentially hindering its effectiveness and inclusivity.
The implementation of the voice-powered news portal is shown in the figures below
(Figure 4.2.1 and Figure 4.2.2). The first figure (Figure 4.2.1) shows the homepage of the
news portal; when a voice command is given to the system, it interprets the speech and
converts it into text. The second figure shows the response of the portal when it is asked
to provide news.
Figure 4.2.2: News cards shown by the voice-powered news portal
CHAPTER 5
CONCLUSION
The Voice-Powered News Portal project represents an innovative and transformative step
toward ensuring accessibility, inclusivity, and empowerment for visually challenged
individuals in the digital age. With a laser focus on bridging the informational divide, this
project leverages cutting-edge voice recognition and synthesis technologies to deliver an
intuitive and user-friendly platform where visually impaired users can independently access
real-time news.
The objectives of the project, which include news curation, voice interaction, and the
integration of Speech-to-Text (STT) and Text-to-Speech (TTS) technologies, have been
meticulously realized. Users can effortlessly tailor their news experience, from selecting
topics of interest to choosing their preferred language, thereby ensuring that the news
content they receive is both relevant and accessible. Through the seamless integration of
STT and TTS, natural voice interactions are enabled, making news retrieval a
straightforward and interactive experience.
What makes this project truly significant is its potential to break down long-standing
barriers and empower the visually challenged community. It not only grants them access to
news but also facilitates their active engagement in the broader societal discourse. The
project's social sharing and feedback features further enhance user participation, fostering
inclusivity and knowledge equity.
In essence, the Voice-Powered News Portal is more than just a news platform; it is a
testament to the power of technology to level the playing field and open up new avenues for
those who have long been underserved. It envisions a future where visual impairment is not
a hindrance to staying informed, active participation, and societal integration. This project
sets the stage for a more accessible and equitable digital world, where everyone's voice is
not just heard but celebrated. It is a testament to the potential of technology to drive positive
social change and promote inclusivity.
CHAPTER 6
FUTURE SCOPE
Enhanced User Personalization: The platform's future lies in more intricate user
personalization. By incorporating machine learning algorithms, it can discern and adapt to
individual preferences, curating news content with remarkable precision. Users will benefit
from a tailor-made news experience, receiving content that aligns closely with their
interests, thus elevating engagement and relevance.
Multilingual Support: The project's global impact hinges on its capacity to transcend
linguistic boundaries. Expanding the platform's language support opens doors to a diverse
user base worldwide. By accommodating multiple languages, it becomes an accessible
resource for visually challenged individuals from different corners of the globe, fostering a
truly inclusive digital environment.
Integration with IoT Devices: The project's reach can extend into the realms of the Internet
of Things (IoT) by seamlessly integrating with smart devices like speakers, headphones,
and connected appliances. Users can access news conveniently through these devices,
enhancing the platform's accessibility and adaptability in diverse technological ecosystems.
Expanded Content Types: Beyond news articles, diversifying content types enriches the
user experience. The inclusion of audio versions of magazines, blogs, and educational
materials broadens the platform's utility. Users can delve into a broader spectrum of content,
enhancing its role as an educational and informative resource.
Community Features: Fostering a sense of community is pivotal. Features that enable users
to connect, share experiences, and discuss news topics empower social interaction. This not
only enhances engagement but also creates a supportive ecosystem where users can connect
with peers, furthering the platform's societal impact.
Voice Assistant Integration: Integration with popular voice assistants enhances accessibility.
It broadens the platform's reach, enabling users to interact with it through well-known voice
assistant systems like Siri, Google Assistant, or Amazon Alexa. This expansion amplifies
its user base and accessibility.
Content Creation Tools: User-generated content fosters inclusivity. Developing tools that
enable visually challenged users to create and publish their content, such as blogs or
podcasts, promotes a more inclusive digital environment. Users can become content
creators, adding their voices to the digital discourse.
Global Outreach: The project's global impact is anchored in its capacity to transcend
boundaries. By expanding outreach to more countries and regions, tailoring the platform to
local needs, and collaborating with local advocacy groups, it can further its mission of
global inclusivity and accessibility.
Social and Policy Advocacy: Advocating for societal change is a noble pursuit. Leveraging
the platform's influence to drive policy changes, enhance accessibility, and foster social
inclusion for visually challenged individuals is a potent scope for the future. It positions the
platform as a catalyst for broader societal transformation.
Research and Innovation Hub: Beyond functionality, the project can serve as a nucleus of
research and innovation in accessibility technology. It can actively engage with developers
and researchers, driving advances in the field and disseminating knowledge to foster
continuous improvement in digital accessibility.
CHAPTER 7
REFERENCES
[1] R. Liu, B. Sisman, G. Gao and H. Li, "Decoding Knowledge Transfer for Neural ...," doi: 10.1109/TASLP.2022.3171974.
[2] "... Learning based Text-To-Speech Device for Visually Impaired People," 2023 2nd International ...
[3] "...," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1099-1108, July 2006, doi: 10.1109/TASL.2006.876123.
[4] "... Text-to-Speech via Hybrid Lightweight Convolution," in IEEE Access, vol. 9, pp. ...
[5] H. Yoon, C. Kim, S. Um, H.-W. Yoon and H.-G. Kang, "SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems," in IEEE Signal Processing Letters, vol. 30, pp. 593-597, 2023, doi: 10.1109/LSP.2023.3277786.
[6] Zhaogeng Li, Jun Bi and Sen Wang, "HTTP-CCN gateway: Adapting HTTP protocol to Content Centric Network," 2013 21st IEEE International Conference on Network Protocols ...
[8] L. Li, W. Chou, W. Zhou and M. Luo, "Design Patterns and Extensibility of REST API for Networking Applications," in IEEE Transactions on Network and Service Management, vol. ...
[9] "... REST requests," 2017 12th International Conference for Internet Technology and Secured Transactions ..., doi: 10.23919/ICITST.2017.8356445.
[10] H. Wenhui, H. Yu, L. Xueyang and X. Chen, "Study on REST API Test Model Supporting Web Service Integration," 2017 IEEE 3rd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), Beijing, ...
[11] M. K. Yusof, M. Man and A. Ismail, "Design and Implement of REST API for Data ...," doi: 10.1109/ICIMTech55957.2022.9915098.
[13] P. Alafaireet, "Graphic User Interface: Needed Design Characteristics for Successful ...," doi: 10.1109/ITICT.2006.358261.
[14] Cai Xinyuan, "Semantic transformation in user interface design," 2008 9th International ...
[15] C. Wongchokprasitti and P. Brusilovsky, "NewsMe: A Case Study for Adaptive News Systems with Open User Model," Third International Conference on Autonomic and ..., doi: 10.1109/CONIELECOMP.2007.88.
[16] J. Zhou, X. Miao, F. He and Y. Miao, "Effects of Font Style and Font Color in News Text on User Cognitive Load in Intelligent User Interfaces," in IEEE Access, vol. 10, ...
doi: 10.1109/GLOCOM.2013.6831474.
doi: 10.1109/WiMOB.2019.8923157.
[19] R. Luo et al., "Lightspeech: Lightweight and Fast Text to Speech with Neural ...," doi: 10.1109/ICASSP39728.2021.9414403.
[20] C. Zhang et al., "Denoispeech: Denoising Text to Speech with Frame-Level Noise ...," 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 2021, pp. 7063-7067, doi: 10.1109/ICASSP39728.2021.9413934.
[21] A. Acero, "An overview of text-to-speech synthesis," 2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421), ...
"... GUI for Text-to-Speech Recognition using Natural Language Processing," 2018 2nd ...
APPENDIX
Source code listings (file names):
index.html
manifest.json
InfoCards.js
styles.js
NewsCard.js
styles.js
NewsCards.js
styles.js
App.js
index.css
index.js
styles.js
alan_code.txt
package-lock.json
package.json
ORIGINALITY REPORT

Similarity Index: 3%
Internet Sources: 2%
Publications: 3%
Student Papers: 1%

PRIMARY SOURCES
1. V. Madhusudhana Reddy, T. Vaishnavi, K. Pavan Kumar. "Speech-to-Text and Text-to-Speech Recognition Using Deep Learning", 2023 2nd International Conference on Edge Computing and Applications (ICECAA), 2023 (Publication) - 1%
2. Submitted to Queensland University of Technology (Student Paper) - 1%
3. Lecture Notes in Computer Science, 2006 (Publication) - <1%
4. Xie, Jingming. "The design of a mobile English learning system for higher vocational students", International Journal of Information Technology and Management, 2014 (Publication) - <1%
5. Submitted to Manipal University (Student Paper) - <1%
6. docplayer.net (Internet Source) - <1%
7. www.geeksforgeeks.org (Internet Source) - <1%
8. Submitted to University of Surrey (Student Paper) - <1%
9. Dimitrichka Nikolaeva. "An Elementary Emulator Based on Speech-To-Text and Text-to-Speech Technologies for Educational Purposes", 2023 XXXII International Scientific Conference Electronics (ET), 2023 (Publication) - <1%
10. apkflash.com (Internet Source) - <1%
11. publications.polymtl.ca (Internet Source) - <1%