An Application To Control Media Player With Voice Commands: Journal of Polytechnic January 2020

DOI: 10.2339/politeknik.646675



ISSN: 1302-0900 (PRINT), ISSN: 2147-9429 (ONLINE)

URL: http://dergipark.org.tr/politeknik

An application to control media player with voice commands
Ses komutları ile media player kontrolü için bir uygulama
Author(s): Emre AVUÇLU1, Ayhan ÖZÇİFÇİ2, Abdullah ELEN3

ORCID1: 0000-0002-1622-9059
ORCID2: 0000-0001-7733-9959
ORCID3: 0000-0003-1644-0476

To cite this article: Avuçlu E., Özçifçi A. and Elen A., “Ses komutları ile media player kontrolü için bir uygulama”, Politeknik Dergisi, 23(4): 1311-1315, (2020).

To link to this article: http://dergipark.org.tr/politeknik/archive

• In the developed application, operations normally performed with the keyboard and mouse can be carried out with voice commands.
• Voice commands can be sent through a wireless headset from anywhere within its reception range.

Graphical Abstract
The following Figure shows a general voice recognition process.

Figure. Voice recognition process

This application was developed to address the needs of people who cannot listen to music on their own due to any disability.

Design & Methodology

In order to manage the media player with voice commands, voice recognition libraries were first used.

In this study, an application that provides media player control with voice commands was developed.

In this study, test procedures were performed with 20 people. In some word tests, more than one test was performed
over the same person's voice.

100% accurate recognition can be achieved by using short words and words with full pronunciation when making
voice definitions.

Declaration of Ethical Standards

The author(s) of this article declare that the materials and methods used in this study do not require ethical
committee permission and/or legal-special permission.
Politeknik Dergisi, 2020; 23(4) : 1311-1315 Journal of Polytechnic, 2020; 23 (4): 1311-1315

Ses Komutları ile Media Player Kontrolü İçin Bir Uygulama

Research Article
Emre AVUÇLU1*, Ayhan ÖZÇİFÇİ2, Abdullah ELEN3
1Vocational School of Technical Sciences, Department of Computer Technologies, Aksaray University, Turkey
2Faculty of Engineering, Department of Industrial Engineering, Aksaray University, Turkey
3TOBB Vocational School, Department of Computer Technologies, Karabük University, Turkey

(Received: 14.11.2019; Accepted: 07.01.2020)


An Application to Control Media Player with Voice Commands

Using technology is of great importance today in making people's lives easier, and running many applications has become very easy with it. In this study, an application that provides media player control with voice commands was developed. It is intended to address the needs of people who cannot listen to music on their own because of a disability. The application was implemented in the C# programming language. To manage the media player with voice commands, voice recognition libraries were used. In the developed application, media player operations normally performed with the keyboard and mouse can be carried out with voice commands. Voice commands can be sent through a wireless headset from anywhere within its reception range.
Keywords: Voice recognition, media player control, disabled individual.

1. INTRODUCTION

Today, it is nearly impossible for people to live and carry out many tasks without technology. People develop and use technology every day for their own benefit. Many applications can now be controlled easily through software so that people can live more comfortably, and examples of such practices can be seen in every aspect of life.

Several studies in the literature aim to facilitate people's social lives in this field. Different voice recognition algorithms and command sets were used on MATLAB [1]. With different voice recognition algorithms, the command sets “On TV”, “Off TV”, “Volume Up”, “Volume Down” and “Channel One” were tried separately for male and female users [2]. Different algorithms were tried on a phone simulation, and the results were found to vary according to the way the voice is spoken [3]. Over 80% success was achieved in voice recognition on the letters “a”, “e” and “i” [4]. In a different study, separate tests were performed on male and female users with different algorithms [5]. Using artificial intelligence techniques, a text- and speaker-independent voice recognition system was developed for the Turkish language [6]. Syllable-based Turkish word recognition systems were developed using different voice recognition algorithms [7, 8]. In a simulation environment built on MATLAB, the successful recognition rate for 10 people was found to be 99% [9]. Music and speech recognition was performed in [10]. Successful results were obtained in a study that used 40 commands [11]. A remote controlled car was operated by voice commands [12]. An attempt was made to recognize the English pronunciation of the digits 0-9 [13].

In this study, an application was developed to control a media player with voice commands on the computer in order to listen to music. The application was implemented using the SpeechRecognitionEngine class of the System.Speech library in the C# .NET Framework. If a spoken command matches a defined voice command, the corresponding operation that would otherwise be done with the mouse and keyboard is executed.

*Corresponding Author, e-mail: emreavuclu@aksaray.edu.tr


2. MATERIAL and METHOD

The application was programmed in the C# programming language. This section describes how the voice recognition process is performed.

2.1. Voice Recognition Process

In the first stage, the voice is recorded in the system. Once recorded, it can be passed through various processing steps. Figure 1 shows a general voice recognition process.

The voice wave that forms the sound has two important properties: amplitude and frequency [15]. Frequency determines the pitch and vibration characteristics of the voice, while amplitude determines the intensity of the voice and the energy it carries. Equation 1 gives the Total Amplitude (TG) calculation.

$$ TG = \sum_{t=1}^{n} x(t) \tag{1} $$

In this equation, x(t) is the amplitude at time t; in other words, it expresses the energy carried by the voice wave at moment t. If the total amplitude calculated in this way is above a certain threshold, the sound is taken to be meaningful, that is, speech is considered to have started.
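The threshold test built on Equation 1 can be illustrated with a short, library-free Python sketch. It is not the paper's implementation: the function names, the sample values and the threshold are hypothetical, and each amplitude is assumed to be the magnitude |x(t)| of a sample so that positive and negative swings do not cancel.

```python
from typing import Sequence


def total_amplitude(frame: Sequence[float]) -> float:
    """Equation 1: sum of the amplitudes x(t) for t = 1..n.

    Assumption: the amplitude of a sample is its magnitude |x(t)|.
    """
    return sum(abs(sample) for sample in frame)


def speech_started(frame: Sequence[float], threshold: float) -> bool:
    """Return True when the total amplitude of the frame exceeds the threshold."""
    return total_amplitude(frame) > threshold


if __name__ == "__main__":
    # Hypothetical frames: near-silence versus a louder, speech-like burst.
    silence = [0.01, -0.02, 0.01, 0.00]
    speech = [0.40, -0.55, 0.62, -0.30]
    print(speech_started(silence, threshold=1.0))
    print(speech_started(speech, threshold=1.0))
```

In practice the threshold would have to be tuned to the microphone and the background noise level; the value 1.0 here is only a placeholder.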
Figure 1. Voice recognition process.

The voice is digitized to perform these operations. It is first filtered and then sampled for digitization. Figure 2 shows an example of the digitization function.

Filters are used for two purposes in voice processing: separating the voice signal and correcting the voice signal. Digital filters are of two kinds, the FIR (Finite Impulse Response) filter and the IIR (Infinite Impulse Response) filter. In a FIR filter, the output $y_n$ is the weighted sum of the current input $x_n$ and the previous inputs. The mathematical expression of this filter is given by Equation 2.

$$ y_n = b_0 x_n + b_1 x_{n-1} + b_2 x_{n-2} + \cdots + b_q x_{n-q} \tag{2} $$

In this equation, $y_n$ is the filter output. In an IIR filter, the output $y_n$ is the weighted sum of the current and previous inputs together with the weighted sum of the previous p outputs. After the voice is digitized, it is encoded and the voice recognition process is completed. The following library and declarations should first be added to the system for voice recognition.

    using System.Diagnostics;

    // SAPI (SpeechLib) objects: shared recognition context, grammar and grammar rule
    private SpeechLib.SpSharedRecoContext objRecoContext = null;
    private SpeechLib.ISpeechRecoGrammar grammar = null;
    private SpeechLib.ISpeechGrammarRule menuRule = null;
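To make the FIR filter of Equation 2 and the IIR description more concrete, the following Python sketch implements both directly from their definitions. It is a minimal illustration, not the filtering used by the speech library. The IIR form assumed here is y_n = (b_0 x_n + ... + b_q x_{n-q}) + (a_1 y_{n-1} + ... + a_p y_{n-p}); the coefficient name a, the plus sign on the feedback terms and the treatment of samples before the start of the signal as zero are assumptions.

```python
from typing import List, Sequence


def fir_filter(b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Equation 2: y[n] = b0*x[n] + b1*x[n-1] + ... + bq*x[n-q].

    Samples before the start of the signal are treated as zero (assumption).
    """
    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        for i, coeff in enumerate(b):
            if n - i >= 0:
                acc += coeff * x[n - i]
        y.append(acc)
    return y


def iir_filter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """IIR form: the FIR sum plus a weighted sum of the previous p outputs.

    a[j-1] weights y[n-j] for j = 1..p (illustrative naming and sign convention).
    """
    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        for i, coeff in enumerate(b):
            if n - i >= 0:
                acc += coeff * x[n - i]
        for j, coeff in enumerate(a, start=1):
            if n - j >= 0:
                acc += coeff * y[n - j]
        y.append(acc)
    return y


if __name__ == "__main__":
    # Two-tap moving average applied to a constant signal.
    print(fir_filter([0.5, 0.5], [1.0, 1.0, 1.0, 1.0]))
    # One feedback term: a single impulse keeps producing decaying output.
    print(iir_filter([1.0], [0.5], [1.0, 0.0, 0.0, 0.0]))
```

The impulse example shows where the names come from: the FIR output stops q samples after the input ends, while the IIR output keeps decaying because each output feeds into the next.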

Figure 2. Example of digitization function.

Here x(t) is the analog signal and x(nT) is the digitized signal. In the digitizing stage, the filter shown in Figure 2 is the analog filter; analog filtering and sampling are performed while the voice is being recorded.

In order to use digital signal processing techniques, the analog signal must be represented as a series of numbers [14]. After sampling, the voice signals are analyzed and separated in order to detect the voice.

The design of the application consists of certain stages. A number of operations are carried out between the recognition of a voice command and its execution on the media player. The flow diagram of the developed application is given in Figure 3.

Figure 3. Flow diagram of the system.

First, the media player component must be added to the application, as shown in Figure 4.

Figure 5. Interface of the developed application.

The following library must be included in the system.

    using System.Speech.Recognition;
    using System.Speech.Synthesis; // SpeechSynthesizer and PromptBuilder are defined in this namespace

Voice detection can be performed with the methods of the “System.Speech” library in the .NET Framework. The following declarations are used in the system for spoken feedback after voice recognition.

    // The constructor calls are cut off in the source; parameterless constructors are assumed.
    SpeechSynthesizer Speech = new SpeechSynthesizer();
    PromptBuilder Builder = new PromptBuilder();
    SpeechRecognitionEngine Recognition = new SpeechRecognitionEngine();
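Once a phrase has been recognized, it has to be matched against the defined commands, executed on the player and echoed back to the user. The Python sketch below illustrates that matching and feedback step in isolation, without any speech library. FakePlayer, build_commands and handle_phrase are hypothetical names; the command words are the playback commands listed in the paper's Table 1, and the speak callback stands in for the speech synthesizer.

```python
from typing import Callable, Dict, List, Optional


class FakePlayer:
    """Stand-in for the media player control; it only records the calls made."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def play(self) -> None:
        self.calls.append("play")

    def pause(self) -> None:
        self.calls.append("pause")

    def next(self) -> None:
        self.calls.append("next")

    def previous(self) -> None:
        self.calls.append("previous")

    def stop(self) -> None:
        self.calls.append("stop")


def build_commands(player: FakePlayer) -> Dict[str, Callable[[], None]]:
    """Map each command word to the player action it triggers."""
    return {
        "play": player.play,
        "pause": player.pause,
        "next": player.next,
        "previous": player.previous,
        "stop": player.stop,
    }


def handle_phrase(
    phrase: str,
    commands: Dict[str, Callable[[], None]],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Run the action for a recognized phrase and echo it back.

    Returns the matched command, or None when nothing matches (no action taken).
    """
    key = phrase.strip().lower()  # tolerate stray spaces and capitalisation
    action = commands.get(key)
    if action is None:
        return None
    action()
    speak(key)
    return key
```

A lookup table keeps the command set in one place, so adding a command means adding one entry rather than another if block. An unrecognized phrase leaves the player untouched, which matches the behaviour the paper reports when a command cannot be identified.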

Figure 4. General structure of the system.

The general form design of the application to be managed by voice commands is shown in Figure 5. The application can be activated or deactivated at any time.

First, the “player” command is given; this starts the media player application.

    if (avuclu.Text == "player")
    {
        var mediaPlayer = "C:\\Program Files\\Windows Media Player\\wmplayer.exe";
        // Assumed launch step: the source shows only the path, not the call that starts it.
        System.Diagnostics.Process.Start(mediaPlayer);
    }

The code block required to activate or deactivate the application is as follows.


    avuclu.Text = Result.PhraseInfo.GetText(0, -1, true); // activate
    objRecoContext = null; // deactivate

The command definitions used to control the media player and their functions are shown in Table 1.

Table 1. Commands and functions.
Voice command    Feature
Player           Open the media player
Open             Add mp3 list to media player
Active           Media player active
Passive          Media player passive
Play             Mp3 Play
Pause            Mp3 Pause
Next             Mp3 Next in list
Previous         Mp3 previous in list
Stop             Mp3 Stop

After the required definitions are made and the voice command is verified, the command is passed on to the media player with the following code block.

    // Text of the recognized phrase
    avuclu.Text = Result.PhraseInfo.GetText(0, -1, true);
    if (avuclu.Text == "play")
    {
        axWindowsMediaPlayer1.Ctlcontrols.play();
        SpeechSynth.Speak("play");
    }
    if (avuclu.Text == "pause")
    {
        axWindowsMediaPlayer1.Ctlcontrols.pause();
        SpeechSynth.Speak("pause");
    }
    if (avuclu.Text == "next")
    {
        axWindowsMediaPlayer1.Ctlcontrols.next();
        SpeechSynth.Speak("next");
    }
    if (avuclu.Text == "previous")
    {
        axWindowsMediaPlayer1.Ctlcontrols.previous();
        SpeechSynth.Speak("previous");
    }
    if (avuclu.Text == "stop")
    {
        // Body cut off in the source; completed by analogy with the blocks above.
        axWindowsMediaPlayer1.Ctlcontrols.stop();
        SpeechSynth.Speak("stop");
    }

3. CONCLUSION

As the pronunciation of a voice command becomes more difficult and the number of letters in it increases, the level of accurate voice recognition decreases. 100% accurate recognition can be achieved by using short, clearly pronounced words when defining voice commands. When a command is misrecognized, the function linked to the closest-sounding voice command is performed. When a command cannot be identified, no action is taken. In this study, tests were performed with 20 people. For some words, more than one test was performed on the same person's voice. Table 2 shows the results of the experimental studies.

Table 2. Experimental results.
Words      Number of trials   Accurate recognition   False recognition   Error rate
Player     20                 18                     2                   10%
Open       20                 17                     2                   15%
Active     15                 14                     1                   6.66%
Passive    15                 13                     2                   13.33%
Play       10                 10                     0                   0%
Pause      12                 10                     1                   16.66%
Next       10                 10                     0                   0%
Previous   25                 18                     5                   28%
Stop       25                 23                     1                   8%

In every row, the error rate equals the share of trials that were not accurately recognized. Where accurate and false recognitions do not add up to the number of trials (Open, Pause, Previous and Stop), the remaining trials are presumably those in which no command could be identified.

As can be seen from the results, words that have more letters and are harder to pronounce were more difficult to identify. The application is coded so that everyday media player operations can all be performed with voice commands. It is expected to be useful for people who cannot use a computer for any reason (bedridden, elderly, disabled, etc.), and it was developed especially to facilitate the lives of the visually impaired. With the application, users can meet their daily music listening needs without depending on anyone.

In this study, a media player was controlled by remote voice commands in order to listen to music. Voice commands can be sent from any point with a wireless or wired headset. The media player was managed with voice commands, without using the mouse or keyboard. In addition, this study will enable people with disabilities and elderly or bedridden patients to meet their listening needs. The application, developed in C# using the Speech.dll library, was tested with different voice commands.

ACKNOWLEDGEMENT

This study was supported by Aksaray University Scientific Research Projects Coordinatorship, Aksaray, Turkey. Project Number: 2018-061.

DECLARATION OF ETHICAL STANDARDS

The author(s) of this article declare that the materials and methods used in this study do not require ethical committee permission and/or legal-special permission.
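As a cross-check on Table 2, the reported error rates can be recomputed from the trial counts. The Python sketch below assumes that the error rate is the share of trials not accurately recognized, which reproduces every reported value. The row with 12 trials has lost its label in the extracted table; "Pause" is assumed from the order of Table 1. The function names are hypothetical.

```python
from typing import List, Tuple

# (word, trials, accurate, false) as listed in Table 2.
# "Pause" is an assumed label for the row whose name is missing.
TABLE_2: List[Tuple[str, int, int, int]] = [
    ("Player", 20, 18, 2),
    ("Open", 20, 17, 2),
    ("Active", 15, 14, 1),
    ("Passive", 15, 13, 2),
    ("Play", 10, 10, 0),
    ("Pause", 12, 10, 1),
    ("Next", 10, 10, 0),
    ("Previous", 25, 18, 5),
    ("Stop", 25, 23, 1),
]


def error_rate(trials: int, accurate: int) -> float:
    """Percentage of trials that were not accurately recognized."""
    return 100.0 * (trials - accurate) / trials


def unidentified(trials: int, accurate: int, false: int) -> int:
    """Trials counted neither as accurate nor as false recognition."""
    return trials - accurate - false


def overall_error_rate(rows: List[Tuple[str, int, int, int]]) -> float:
    """Error rate over all trials of all words taken together."""
    total_trials = sum(row[1] for row in rows)
    total_accurate = sum(row[2] for row in rows)
    return error_rate(total_trials, total_accurate)


if __name__ == "__main__":
    for word, trials, accurate, false in TABLE_2:
        print(word, round(error_rate(trials, accurate), 2),
              unidentified(trials, accurate, false))
    print("overall", overall_error_rate(TABLE_2))
```

Summed over all rows, 133 of the 152 trials were recognized accurately, which corresponds to an overall error rate of 12.5%.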


REFERENCES

[1]. Karakaş M., “Computer Based Control Using Voice Input”, Master Thesis, Dokuz Eylül University, (2010).
[2]. Muda L., Begam M., Elamvazuthi I., “Voice Recognition Using Mel Frequency Cepstral Coefficient and Dynamic Time Warping Techniques”, Journal of Computing, 2(3): 138-143, (2010).
[3]. Baygün M. K., Yaldır A. K., “Linear Predictive Coding ve Dynamic Time Warping Teknikleri Kullanılarak Ses Tanıma Sistemi Geliştirilmesi”, Pamukkale University, (2009).
[4]. Öztürk B., Çakar T., “Gerçek Zamanlı Ses Tanıma”, Graduation Project, Istanbul University Faculty of Engineering Department of Electrical/Electronics Engineering, (2007).
[5]. Demirci M. D., “Bilgisayar Destekli Ses Tanıma Sistemi Tasarımı”, Master Thesis, Istanbul University Institute of Science and Technology, (2005).
[6]. “Bilgisayar destekli bir dil programı” [Online]. Available: https://docplayer.biz.tr/16256021-Bilgisayar-destekli-bir-dil-programi-turkce-konusma-tanima-sistemi.html [Accessed: 19-Sep-2019].
[7]. Aşlıyan R., Günel K., Yakhno T., “Dinamik Zaman Bükmesi Yöntemiyle Hece Tabanlı Konuşma Tanıma Sistemi”, Çanakkale Onsekiz Mart University, Academic Informatics, (2008).
[8]. Meral O., “Doğrusal Öngörülü Kodlama ve Adaptif Algoritma Tabanlı Konuşmacı Tanıma”, Master Thesis, Istanbul University Institute of Science and Technology,
[9]. Dede G., Sazlı M.H., “Biyometrik Sistemlerin Örüntü Tanıma Perspektifinden İncelenmesi ve Ses Tanıma Modülü Simülasyonu”, Institute of Defense Sciences.
[10]. Bolat B., Küçük Ü., Yıldırım T., “Aktif Öğrenen PNN ile Konuşma/Müzik Sınıflandırma”, Akıllı Sistemlerde Yenilikler ve Uygulamaları Sempozyumu, (2004).
[11]. Asyalı M.H., Yılmaz M., Tokmakçı M., Sedef K., Aksebzeci B.H., Mittal R., “Design and Implementation of a Voice Controlled Prosthetic Hand”, Turk J. Elec. Eng. and Comp., 19(1): (2011).
[12]. Leechor P., Pornpanomchai C., Sukklay P., “Operation of a Radio Controlled Car by Voice Commands”, 2nd International Conference on Mechanical and Electronics Engineering (ICMEE 2010), (2010).
[13]. Abushariah A.A.M., Gunawan T.S., Khalifa O.O., “English Digits Speech Based on Hidden Markov Models”, International Conference on Computer and Communication Engineering (ICCCE 2010), (2010).
[14]. Rabiner L., Schafer R. W., “Digital Processing of Speech Signals”, Prentice Hall PTR, 512, (1978).
[15]. Huang X., Acero A., Hon H. W., “Spoken Language Processing: A Guide to Theory, Algorithm and System Development (1st Ed.)”, Prentice Hall PTR, New Jersey, 980, (2001).


