Survey on Voice Assistant Using Python

Prof. S. A. Nalawade (Guide)
Vaishnavi Wayal (Student, BE-IT)
Pradnya Kute (Student, BE-IT)
Gautamyi Chougule (Student, BE-IT)
Jayesh Chikane (Student, BE-IT)
Department of Information Technology, Dr. D. Y. Patil Institute of Technology, Pimpri, Pune, India

1. Abstract

In this modern era, living has become smarter and easier with the help of technologies used in home automation, such as Artificial Intelligence (AI), cloud computing, mobile computing, voice assistants, and image and voice processing. This project takes voice input and produces voice and text output. It is user friendly and assists the user in performing tasks on a PC. At a time when advancement is one of the main agendas of development, our voice assistant interacts with the user in a regional language such as Marathi, with English as the default language, to access files and websites from the system. The machine takes commands as voice input and performs the operations on the Windows operating system. AI technology, Python and APIs are used to build this application.

2. Introduction

In recent times, automation has been booming, with the interaction between machine and human initiated through a voice assistant. The voice assistant is one of the newer inventions in the field of Artificial Intelligence (AI); examples include Alexa by Amazon and Siri by Apple. Speech processing such as text-to-speech, speech-to-text and speech-to-speech can be done using APIs. The implementation of such smart automation systems is constantly increasing, and the systems themselves are evolving.

Voice search has come to dominate text search. It is a hands-free operation: there is no need to use the hands. A highlight of a voice automation system is that it can be operated from within a certain range of the user. Some of the types of voice assistants are:

1. Intelligent Personal Assistant
2. Automated Personal Assistant
3. Virtual Digital Assistant
4. Chat bot

Our voice assistant is designed specifically for the Windows operating system using Python. It can perform operations such as opening a file from the file manager, conversing in regional languages and opening websites. It also has a graphical representation, i.e. an animated
presentation for the voice assistant. It is designed to work more efficiently and improve the way of interaction with the user.

3. Literature Survey

Voice assistants have a long history. They have been evolving since 1880.

Beginning in 1880, Alexander Graham Bell worked on improvements to Edison's phonograph, which his Volta Graphophone Company patented in 1886 as the graphophone. It used wax instead of foil, which allowed for longer recordings and higher-quality playback. Edison also developed a wax version of the phonograph, and both devices were used primarily for dictating letters and other documents.

In 1961 IBM introduced the IBM Shoebox, the first digital speech recognition tool. It recognized 16 words, including the digits 0 to 9, and could perform mathematical functions from spoken input.

In 1972 Carnegie Mellon completed the Harpy program. It could understand about 1000 words. Harpy processed speech that followed pre-programmed vocabulary, pronunciation and grammar structures.

In 1990 Dragon launched Dragon Dictate, the first speech recognition product for consumers, for $6,000 (approximately 496.27 thousand Indian rupees at the time of writing).

In 1996 Microsoft introduced Clippy. Clippy, also known as Clippit and officially named Office Assistant, was an intelligent user interface for Microsoft Office. It assisted users in a number of interactive ways by appearing as an animated character in the Office applications and offering help related to various operations of the Office software. It was made available in Microsoft Office for Windows in 1997 and was discontinued in 2003.

In 2011 Apple introduced Siri. Siri uses voice queries, gesture-based control, focus-tracking and a natural language user interface to answer questions, make recommendations and perform operations by passing requests on to a set of internet services. With continued use, it adapts to users' individual language usage, searches and preferences, returning individualized results.

In 2012, Google launched Google Now. Google Now proactively delivered information that users might need, in the form of informational cards based on the users' search habits and other factors. On Android and iOS, Google Now was a feature of Google Search embedded in the Google app. Its functionality is still used today in the Google app and its Discover tab, although the Google Now branding is no longer used.

In 2013, at the annual BUILD developer conference, Cortana was introduced by Microsoft. Cortana is a virtual assistant that uses the Bing search engine to perform tasks such as setting reminders and answering questions for the user. Depending on the software platform and the region in which it is used, Cortana is currently available in English, Chinese, French, German, Italian, Portuguese, Spanish and Japanese language editions.

In 2014, Amazon introduced Alexa and the Amazon Echo, which were available to Prime members only. Amazon Alexa, also known simply as Alexa, is a virtual assistant technology largely based on a Polish speech synthesizer named Ivona. Its capabilities include voice interaction, music playback, playing audiobooks, setting alarms, streaming podcasts, making to-do lists, and providing weather forecasts, traffic news, sports news and other real-time information. Acting as a home automation system, it can also control numerous smart devices.
In 2015, Microsoft introduced Cortana on Windows 10 desktops and mobile devices. In the US, Amazon officially launched the Amazon Echo and introduced the Alexa Skills Kit.

In 2016, SoundHound introduced a voice-powered virtual assistant app, Hound. SoundHound is widely known for its music recognition app, which listens to songs and identifies them. Amazon launched the Amazon Echo Dot and Amazon Tap. Google introduced the Google Assistant as part of the messaging app Allo. In the same year, the virtual assistant startup Viv was acquired by Samsung. Google launched Google Home and the Google Pixel smartphone. The Chinese manufacturer Linglong also launched an Echo competitor, DingDong.

In 2017, Samsung introduced Bixby alongside the Galaxy S8 devices. Google Home was then launched in the UK, and Google introduced multi-user support for Google Home, which recognizes six different voices. At the same time, Amazon introduced the Echo Look. In China, Baidu unveiled its first consumer AI device, Xiaoyu. Amazon introduced a calling and messaging feature for Echo devices, while Apple introduced the HomePod and Alibaba launched the Genie X1 smart speaker.

These voice assistants were introduced not only for smartphones and smart homes but were also integrated into cars, such as those from BMW. AI-based voice assistants are evolving in many ways; for example, they can be developed using various languages such as Java and Python.

A Vision and Speech Enabled, Customizable, Virtual Assistant for Smart Environments (2018) by Giancarlo Iannizzotto, Lucia Lo Bello, Andrea Nucita and Giorgio Mario Grasso describes a software architecture for building lightweight, vision- and speech-enabled virtual assistants for smart home and automation applications. A complete prototype application was built, featuring a realistic graphical assistant able to show facial expressions and enabled with speech synthesis and recognition. The proposed assistant is effective, resource efficient, interactive and customizable, and the realized prototype runs on a low-cost, small-sized Raspberry Pi 3 device.

AI based Voice Assistant Using Python (2019) by Deepak Shende, Ria Umahiya, Monika Raghorte, Aishwarya Bhisikar and Anup Bhange presents new insights into natural human-machine interaction, in which the machine learns to understand human language. It also describes the principles by which voice assistants function, their main shortcomings and limitations, and a method of creating a local voice assistant without using cloud services.

Artificial Intelligent-Based Voice Assistant (2020) by Subhas S, Prajwal N, Siddesh S, Ullas A and Santhosh B describes a voice assistant that gathers audio from the microphone and converts it into text, which is later sent through gTTS (Google Text-to-Speech). The gTTS engine converts the text into an audio file in English, and that audio is then played using the playsound package of the Python programming language.

Voice Assistant Using Python (2021) by Nivedita Singh, Dr. Diwakar Yagyasen, Mr. Surya Vikram Singh, Gaurav Kumar and Harshit Agarwal describes a voice assistant written in Python that allows the user to run any type of command in Linux without interacting with the keyboard. It performs basic tasks such as giving weather updates, streaming music, searching Wikipedia and opening desktop applications.

Survey On Smart Virtual Voice Assistant (2022) by Manjusha Jadhav, Krushna Kalyankar, Ganesh Narkhede and Swapnil Kharose describes a natural language processing algorithm that helps machines communicate using natural human language in many forms. The assistant also connects to the World Wide Web to provide the results that the user requires.

A Research Paper On Desktop Voice Assistant (2022) by Vishal Kumar Dhanraj, Lokesh Kriplani and Semal Mahajan describes the working of a voice assistant without using
cloud services, which will allow the expansion of devices in the future. According to the authors, it can perform any kind of task in response to commands given by the user without any error, and it will listen to the user's voice only and will not be activated by environmental noise.

4. Aim

Voice assistants are used globally to perform tasks on laptops, smartphones, PCs, etc. These tasks are performed over the internet using assistants introduced by software firms such as Google and Microsoft. Having a voice assistant on board makes it easier to carry out tasks without the keyboard, and it can help physically challenged people operate mobile devices with ease.

Our aim is to develop a voice assistant with a graphical user interface (UI) for Windows, using the Python programming language at the backend, to carry out operations such as accessing files and applications on the user's PC as well as web surfing, where access to files is independent of the internet connection. It can communicate in a regional language (Hindi), with English as the default language.

5. System Architecture

Fig. System Architecture

6. Methodology

1] API - SAPI5

API is an abbreviation for "Application Programming Interface". It is a software intermediary that allows two applications to communicate with each other: software that other software can use to communicate with further software or even hardware. It acts as a link between different software and devices. Many applications built with different technologies and programming languages use APIs to interact with each other.

This model uses the SAPI5 API to communicate. SAPI, the "Speech Application Programming Interface", is an API developed by Microsoft that allows the use of speech recognition and speech synthesis within Windows applications. Its main features are: shared recognizer, in-proc recognizer, grammar objects, voice object, audio interfaces, user lexicon object and object tokens.

We import the SAPI5 API for communication between the user and the assistant.

2] Speech Recognition module

Speech recognition is an important feature embedded in various applications, such as AI automation. This module obtains text output from the voice input of the user. The recognized text is then used to identify whether the command is an API call, a system call or content extraction.

3] Python Backend

The whole backend is written in Python. Using the speech recognition module, the Python backend works on producing the output in response to the voice input provided by the user.
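
As a minimal sketch of how components 1] to 3] might fit together in Python, the following assumes the third-party packages pyttsx3 (which can drive SAPI5 on Windows) and SpeechRecognition are installed. The source does not name the libraries it uses, so these choices, the helper names and the language codes are assumptions made for illustration only.

```python
# Hypothetical sketch: voice in via SpeechRecognition, voice out via SAPI5 (pyttsx3).
# The language mapping below is an assumption, not taken from the paper.
LANGUAGE_CODES = {"english": "en-IN", "hindi": "hi-IN", "marathi": "mr-IN"}


def language_code(name="english"):
    """Return the recognizer language code, falling back to English (the default)."""
    return LANGUAGE_CODES.get(name.strip().lower(), LANGUAGE_CODES["english"])


def speak(text):
    """Speak text through the Windows SAPI5 driver (requires pyttsx3 on Windows)."""
    import pyttsx3  # imported lazily so this module loads without the package

    engine = pyttsx3.init("sapi5")
    engine.say(text)
    engine.runAndWait()


def listen(language="english"):
    """Record one utterance from the microphone and return it as lowercase text."""
    import speech_recognition as sr  # imported lazily; the microphone needs PyAudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    try:
        # recognize_google sends the audio to a web service, so it needs internet.
        text = recognizer.recognize_google(audio, language=language_code(language))
        return text.lower()
    except sr.UnknownValueError:
        return ""  # the speech could not be understood
```

A backend loop could then call `listen()` for each command and `speak()` for each reply. Network failures (`sr.RequestError`) are not handled in this sketch.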

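The routing step described under 2], deciding whether a recognized command is an API call, a system call or content extraction, could look like the following sketch. The keyword lists and the function name `classify_command` are hypothetical; the source does not specify how commands are matched.

```python
# Hypothetical keyword rules; the paper does not list the actual trigger words.
SYSTEM_KEYWORDS = ("open file", "open folder", "launch", "shutdown")
API_KEYWORDS = ("weather", "search", "website", "wikipedia")


def classify_command(text):
    """Label recognized text as 'system_call', 'api_call' or 'content_extraction'."""
    command = " ".join(text.lower().split())  # normalise case and whitespace
    if any(keyword in command for keyword in SYSTEM_KEYWORDS):
        return "system_call"
    if any(keyword in command for keyword in API_KEYWORDS):
        return "api_call"
    # Anything else is treated as a request to extract and read back content.
    return "content_extraction"
```

Simple substring matching keeps the sketch short; a real assistant would likely need more robust keyword marking, since a phrase such as "search" can appear in commands of any type.
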
4] System Call

System calls are the programmatic way in which a computer program requests a service from the kernel of its operating system. They provide an essential interface between the operating system and a process.

5] Motion UI

Good design is one of the most important aspects of an app or website. It helps to make the app more interactive and innovative. A good user interface helps to build better communication between the user and the application.

Motion UI is a Sass library that helps in creating flexible UI animations and transitions. It controls the transition effects used by Foundation components and provides a set of pre-made effects as a CSS package.

We have used Motion UI to build an effective interaction that runs in the background while the instructions or commands given by the user are being carried out.

7. Algorithm

1] Take voice commands from the user
2] Display the commands using the speech-to-text module
3] Mark the keywords
4] Make API calls or system calls
5] Display the results in the form of text, speech or an operation.

8. References

1. Giancarlo Iannizzotto, Lucia Lo Bello, Andrea Nucita, Giorgio Mario Grasso, A Vision and Speech Enabled, Customizable, Virtual Assistant for Smart Environments, 978-1-5386-5024-0/18/$31.00 ©2018 IEEE
2. Deepak Shende, Ria Umahiya, Monika Raghorte, Aishwarya Bhisikar, Anup Bhange, AI based Voice Assistant Using Python, Volume 6 Issue 2 | JETIR | ©2019
3. Subhas S, Prajwal N, Siddesh S, Ullas A, Santhosh B, Artificial Intelligent-Based Voice Assistant, 978-1-7281-6823-4/20/$31.00 ©2020 IEEE
4. Nivedita Singh, Dr. Diwakar Yagyasen, Mr. Surya Vikram Singh, Gaurav Kumar, Harshit Agarwal, Voice Assistant Using Python, Volume 8 Issue 2 | IJIRT | © July 2021
5. Manjusha Jadhav, Krushna Kalyankar, Ganesh Narkhede, Swapnil Kharose, Survey On Smart Virtual Voice Assistant, Volume:09 Issue:01|Jan
6. Vishal Kumar Dhanraj, Lokesh Kriplani, Semal Mahajan, Research Paper on Desktop Voice Assistant, Volume 10 Issue

