
ISSN No. 0976-5697
Volume 8, No. 5, May-June 2017
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info

Raga Identification Using MFCC and Chroma Features


Kavita M. Deshmukh
PG Student, Department of E&TC Engineering,
SES’s RCPIT, Shirpur, India

Prof. Dr. Pramod J. Deore
HOD, Department of E&TC Engineering,
SES’s RCPIT, Shirpur, India

Abstract: Ragas are the heart of Indian classical music, and the raga is one of its most basic concepts. A raga is a collection of swaras with many characteristic features, and it is described as a melodic concept that the musical artist brings to life in performance. In this work, we propose a methodology to identify the raga of an Indian music signal, which has several interesting applications in digital music indexing, recommendation and retrieval. We address raga classification in a non-linear SVM (support vector machine) framework, using a combination of two relevant features that capture the similarities of music signals: MFCC (Mel Frequency Cepstral Coefficients) and the chromagram. We applied this approach to three ragas: Darbari, Khamaj and Malhar. Evaluated on our own raga dataset, the proposed method achieves an accuracy of 96.79% by combining the information from these two features relevant to Indian music.

Keywords: swaras; raga identification; SVM; MFCC and chromagram.

I. INTRODUCTION

Western music and Indian classical music differ from each other with respect to their timing, their notes, and the characteristics associated with a raga. A note of Western classical music is similar to a swara of Indian classical music. A raga is a unique combination of swaras and their substrings, and it plays a vital role in Indian classical music. Indian music has seven basic swaras (notes), namely Sa, Ri, Ga, Ma, Pa, Dha and Ni (Shadja, Rishab, Gandhar, Madhyam, Pancham, Dhaivat and Nishad). Indian classical music has characteristics associated with each particular raga that are not easily identified using approaches developed for Western music. A raga is a set of specific notes that may have a few unique properties (e.g. arohana, avarohana, pakad, taal). The notes of a raga arranged in ascending order are called its arohana, and the notes arranged in descending order are called its avarohana. These specific notes are referred to as swaras in Indian classical music. Raga identification comprises techniques that find the notes in a piece of music and classify it into the proper raga.

Ragas form a very important concept in Hindustani classical music and capture the mood and emotion of performances [1]. Automatic raga identification can also help novice musicians who find it hard to differentiate ragas that are very similar to each other, and it is useful for beginners who study this art form. For automatic identification, some of the characteristics of ragas have to be converted into appropriate features. This is very difficult for Indian music for the following reasons, which need to be addressed when converting a music piece into swara strings: (i) a music piece may be performed by multiple instruments; (ii) unlike in Western music, the notes in Indian music are not on an absolute scale but on a relative scale; (iii) there is no fixed starting swara in a raga; (iv) notes in Indian music do not have a fixed frequency but rather a band of frequencies (oscillations) around a note; and (v) the sequence of swaras in a raga is not fixed, and various improvisations are allowed [2].

In this work, we attempt the raga classification problem using a non-linear SVM and a combination of two different features, MFCC and the chromagram: we augment the MFCC features with chroma features to improve the results. The chromagram features were extracted with the MIR Toolbox [3], an open-source toolbox for musical feature extraction.

II. RELATED WORK

There have been various efforts to identify the raga of a piece of Indian music. One technique for raga classification is to transcribe the music directly into swaras at every time interval and then classify the raga with a classifier such as k-NN or SVM. In [4], Vijay Kumar, Harit Pandya and C.V. Jawahar studied the problem of raga identification in Indian Carnatic music. Observing that contemporary methods are based either on pitch-class profiles or on n-gram histograms of notes, but not both, they combined the two in a multi-class SVM framework by linearly combining two kernels, which capture the similarities of ragas based on pitch-class profiles and on n-gram histograms of notes, respectively. Chordia and Rae [5] described the results of the first large-scale raga recognition experiment. Ragas are the central structure of Indian classical music, each consisting of a unique set of complex melodic gestures. They constructed a system to recognize ragas based on pitch-class distributions (PCDs) and pitch-class dyad distributions (PCDDs) calculated directly from the audio signal. A large, diverse database of 20 hours of recorded performances in 31 different ragas by 19 different performers was assembled to train and test the system, and classification was performed using support vector machines (SVM).
The authors of [6] investigated the problem of scale-independent automatic raga identification, achieving state-of-the-art results with Gaussian mixture model (GMM) based hidden Markov models (HMMs) over a combination of three features: chromagram patterns, mel-cepstrum coefficients and timbre features. They also performed the same task using 1) discrete HMMs and 2) classification trees over swara-based features built from chromagrams using the concept of the vadi of a raga. They evaluated their approach on four

© 2015-19, IJARCS All Rights Reserved 725


Kavita. M.Deshmukh et al, International Journal of Advanced Research in Computer Science, 8 (5), May-June 2017,725-729

ragas (Darbari, Khamaj, Malhar and Sohini) and reported a median accuracy of 97%. Sridhar and Geetha [7] proposed a method to identify the raga of a Carnatic music signal. The main motivation behind their raga identification is that it can serve as a good basis for music information retrieval of Carnatic music melodies, or of film songs based on Carnatic music. The input polyphonic music signal was analyzed and passed through a signal separation algorithm to separate the instrument and vocal signals. Using their proposed artist identification algorithm, they identified the singer from the fundamental frequency of the artist. The frequency components of the signal were then determined and mapped onto a swara sequence, thereby identifying the raga of the particular melody: the raga in the database that matches this sequence is the output of the system. Their test data consisted of 30 samples/tunes in 3 melakarta ragas sung by four artists, 175 Talam, and a raga database containing the raga name and the arohana and avarohana in swara form. Pandey, Mishra and Ipe [8] proposed the system 'Tansen', which is based on a hidden Markov model (HMM) and string (pakad) matching. They report results on just 2 ragas, Yaman Kalyan and Bhupali. They chose an HMM because the notes are few in number and the note sequence of a raga is quite well characterized. The Baum-Welch learning algorithm is used to estimate the transition and initial state probabilities of the HMM. To further improve performance over the HMM, pakad matching is used to incorporate knowledge into the framework. Prashanth T R and Radhika Venugopala [9] proposed a method for note identification in Carnatic music from the frequency spectrum. Instead of a full note transcription, their method only identifies the notes present in the input music. The system accepts '.wav' files as input, analyzes the characteristics of the frequency spectrum, and maps notes based on them. Their test data consists of 15 raga alaps, 3-8 minute clips by different artists, and they achieved up to 90% accuracy.

III. CHARACTERISTICS OF A RAGA

A raga is a collection of swaras and consists of a sequential arrangement of swaras, or notes; the different notes are called swaras in Indian classical music. The seven fundamental notes (swaras, or symbols) in classical music are S (Sa), R (Re or Ri), G (Ga), M (Ma), P (Pa), D (Dha) and N (Ni). A raga is a blend of various swaras with some exceptional properties (e.g. arohana, avarohana, gamakas, pakad, taal) [10].
A. Arohana and Avarohana: A raga consists of a group of swaras, or notes. The sequence of its notes, i.e. the arohana and avarohana of the raga, gives it its identity. The arohana is the set of notes arranged in ascending order; the avarohana is the set of notes arranged in descending order.
B. Gamakas: Each note in the swara sequence has a specific frequency value. The notes of a raga are rendered with a continuous oscillatory movement about each note; such ornamentation of notes is referred to as gamakas.
D. Pakad: A pakad is a characteristic phrase, or set of swaras, that uniquely identifies a raga; each raga has its own pakad, distinct from those of other ragas. The pakad is a short sequence of swaras that acts as a signature for the raga, and an artist often visits and revisits it over a performance. It is a major clue for human raga recognition. For some ragas, the pakad may simply be the avarohan wrapped over the arohan; for others, it may be a totally different pattern of the constituent swaras.
E. Vadi: In both Hindustani and Carnatic music, the vadi is the root swara (musical note) of a given raga (musical scale). The vadi is the most sonant, or most important, note of a raga. It does not refer to the most frequently played note but rather to a note of special significance. It is usually the swara that is repeated the greatest number of times, and often the swara on which the singer can pause for a significant time. The vadi is the most important swara of a raga: the character of any raga depends on its vadi swara, and for this reason the vadi is also called the Jeeva swara or the Ansha swara. An expert artist uses the vadi swara in special ways, such as singing it again and again, beginning or ending the raga with it, singing it often at important places alongside other swaras, or sometimes sustaining it for a long time in one breath.
F. Tala: Tala refers to a fixed time cycle, set for a particular composition, that is built from groupings of beats. Talas have cycles of a defined number of beats and rarely change within a song. They have specific components which, in combination, give rise to the variety of talas, allowing different compositions to have different rhythms. Carnatic music singers usually keep the beat by moving their hands up and down in specified patterns, using their fingers simultaneously to keep time.

IV. EXTRACTION OF FEATURES

A. Mel Frequency Cepstral Coefficients (MFCCs):
The most commonly used speech features are the Mel Frequency Cepstral Coefficients (MFCCs), owing to their accurate estimate of the speech parameters and their effective results for speech [9]. MFCCs are the most widely used features in the majority of speaker and speech recognition applications. The typical feature extraction process can be seen in the figure, assuming that the signal has been processed digitally and properly quantized. Feature extraction refers to the procedure of transforming the speech signal into a number of parameters, while pattern matching is the task of retrieving from memory the parameter sets that most closely match the parameter set extracted from the input speech signal. In simple words, the essence of a speech recognizer is to provide a powerful and accurate mechanism to transcribe speech into text [4]. Feature extraction is a crucial step of the raga identification process, and the MFCC is the best feature extraction method introduced in [4].

Fig. 1. Sample speech spectrum
• Peaks denote the dominant frequency components of the speech signal, as shown in Fig. 1.
• Peaks are referred to as formants.
• Formants convey the identity of the sound.

• The formants can be joined by a smooth curve.
• This smooth curve is known as the spectral envelope.
• Our aim is to separate the spectral envelope from the spectral details of the spectrum, as shown in Fig. 2.

Given $\log X[k]$, we want to obtain $\log H[k]$ and $\log E[k]$ such that $\log X[k] = \log H[k] + \log E[k]$. The spectrum is the product of the envelope and the details, $X[k] = H[k]\,E[k]$, so taking the log of the magnitude on both sides gives

$$\log|X[k]| = \log|H[k]| + \log|E[k]|$$

Taking the inverse FFT on both sides gives

$$x[k] = h[k] + e[k]$$

where $x[k]$, $h[k]$ and $e[k]$ are the inverse FFTs of $\log|X[k]|$, $\log|H[k]|$ and $\log|E[k]|$, i.e. the cepstra of the spectrum, the envelope and the details.

• Mel-frequency analysis of speech is based on human perception experiments.
• It has been observed that the human ear acts as a filter: it concentrates on only certain frequency components.
• These filters are non-uniformly spaced on the frequency axis, with more filters in the low-frequency regions and fewer filters in the high-frequency regions.

The cepstral coefficients h[k] obtained for the Mel spectrum are referred to as Mel-Frequency Cepstral Coefficients, commonly denoted MFCC.

Fig. 3. Chromagram for raga Darbari
Fig. 4. Chromagram for raga Khamaj
Fig. 5. Chromagram for raga Malhar
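The cepstral and mel-filterbank steps described above can be made concrete with a short sketch. This is a minimal illustration under our own assumptions, not the exact procedure used for the experiments in this paper: it is pure Python (a naive DFT instead of an FFT), uses a Hamming window, the common 2595·log10(1 + f/700) mel formula, 26 triangular filters and 13 coefficients, and applies a DCT-II to the log filterbank energies, which is the usual practical substitute for the inverse FFT in the derivation. The function names `power_spectrum`, `mel_filterbank` and `mfcc_frame` are hypothetical.

```python
import cmath
import math
from typing import List


def hz_to_mel(f: float) -> float:
    # HTK-style mel formula (an assumption; other mel variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: List[float]) -> List[float]:
    """|X[k]|^2 for k = 0..N/2 of a Hamming-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    return [abs(sum(w[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> List[List[float]]:
    """Triangular filters equally spaced on the mel axis, so they are dense
    at low frequencies and sparse at high frequencies."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sr / 2.0)
    edges = [lo + i * (hi - lo) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in edges]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc_frame(frame: List[float], sr: int,
               n_filters: int = 26, n_coeffs: int = 13) -> List[float]:
    """MFCCs of a single frame: log mel-filterbank energies, then a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sr)
    # Log energies; the floor avoids log(0) for empty filters or silence.
    log_e = [math.log(max(sum(wt * p for wt, p in zip(row, spec)), 1e-10))
             for row in bank]
    # The DCT-II stands in for the inverse FFT of the derivation; keeping only
    # the first n_coeffs retains the smooth envelope h[k] and drops fine detail.
    return [sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for c in range(n_coeffs)]


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2.0 * math.pi * 440.0 * t / sr) for t in range(256)]
    coeffs = mfcc_frame(tone, sr)
    print(len(coeffs))  # 13
```

For a whole recording, one would typically split the signal into short overlapping frames, call `mfcc_frame` on each, and stack the resulting vectors; a library FFT would replace the naive DFT for speed.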

Fig. 2. Spectral envelope

B. Chromagram: In the music context, the chroma feature, or chromagram, closely relates to the twelve different pitch classes. Chroma-based features, also referred to as pitch class profiles, are a powerful tool for analyzing music whose pitches can be meaningfully categorized (often into twelve categories) and whose tuning approximates the equal-tempered scale. One main property of chroma features is that they capture the harmonic and melodic characteristics of music [11] while being robust to changes in timbre and instrumentation. A chromagram [6] is a visual representation of the energies in the 12 semitones (or chromas) of the Western musical octave, namely C, C#, D, D#, E, F, F#, G, G#, A, A# and B; it thus depicts the distribution of energy across the twelve pitch classes. The Western semitones are fixed with respect to absolute frequency values, and the musical octave has the property that the same semitone one octave below or above is equivalent to the current semitone. Figures 3, 4 and 5 show the chromagram features of ragas Darbari, Khamaj and Malhar.
For example, if one note has a frequency of 440 Hz, the note an octave above it is at 880 Hz, and the note an octave below it is at 220 Hz. Since the semitones repeat in every octave above and below, the energy in the chromagram for each semitone (chroma) is computed by wrapping and adding it up over the different octaves. The chromagrams of ragas Darbari, Khamaj and Malhar are generated from the arohan of each raga. The arohan for raga Bhairavi is: Sa Re(komal) Ga(komal) Ma Pa Dha(komal) Ni(komal) Sa. The Sa swara of Darbari coincides with the semitone G, and the rest of the swaras in the arohan align with the semitone pattern in the chromagram. These observations, and the previous use of chromagrams in audio analysis and chord recognition [12], motivated us to use the chromagram to extract information about swaras. From the chromagram, we extract the semitone with maximum energy in each frame, which gives a sequence of semitones for the raga. Although this semitone sequence may carry some identifying information about the raga, using it directly for raga identification is not appropriate, since ragas are defined over swaras [13]. In our approach, we assume that we do not know the tonic frequency of the raga performance. We must therefore find the mapping from the absolute frequency scale employed by the chromagram to the relative scale of the musical piece, so that the swara sequence can be identified. To do so, we use the concept of the vadi, discussed earlier, to convert the semitone sequence into a swara sequence. We compute the most frequently occurring semitone in the semitone sequence and associate it with the vadi of the raga, which is known for each raga. For example, the vadi of raga Khamaj is the swara "Ga". In an audio clip of Khamaj, if the semitone C# is the most prominent, we label the swara "Ga" at semitone C# and then convert the rest of the semitone sequence into a swara sequence. The above procedure is raga
specific, i.e. the conversion from the semitone sequence to a swara sequence uses the identity of the raga-specific vadi swara. If we are building a system for n ragas and the actual raga of the test audio is not known, then for the given audio we must compute a separate swara transcription for each of the n ragas [14].

V. CLASSIFICATION

We identify a raga by combining the information from two different and relevant features, MFCC and the chromagram, and we incorporate this systematically into an SVM framework. In machine learning, the (Gaussian) radial basis function (RBF) kernel is a popular kernel function used in various kernelized learning algorithms; in particular, it is commonly used in support vector machine classification [4]. Support vector machines (SVMs, also called support vector networks [4]) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. In addition to performing linear classification, SVMs can efficiently perform non-linear classification using the so-called kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
Kernel trick: The main idea behind the kernel trick is to map the data into a different space, called the feature space, and to construct a linear classifier in this space. It can also be seen as a way to construct non-linear classifiers in the original space. The classifier function f(x) can be expressed as a sum of inner products with the support vectors. An important result, called Mercer's theorem, states that any symmetric positive semi-definite function K(x, z) is an inner product in some space (and vice versa). In other words, any such function K(x, z) implicitly defines a mapping $\phi: x \mapsto \phi(x)$ into a so-called feature space such that

$$K(x, z) = \langle \phi(x), \phi(z) \rangle$$

Such functions K are called kernels. The original maximum-margin hyperplane algorithm proposed by Vapnik in 1963 constructed a linear classifier. However, in 1992, Bernhard E. Boser, Isabelle M. Guyon and Vladimir N. Vapnik suggested a way to create non-linear classifiers by applying the kernel trick (originally proposed by Aizerman et al. [15]) to maximum-margin hyperplanes. The resulting algorithm is formally similar, except that every dot product is replaced by a non-linear kernel function. This allows the algorithm to fit the maximum-margin hyperplane in a transformed feature space. The transformation may be non-linear and the transformed space high-dimensional; although the classifier is a hyperplane in the transformed feature space, it may be non-linear in the original input space.

VI. RESULT

We evaluate the performance of our proposed approach on our own dataset, which is small: it consists of only 3 ragas with a limited set of instruments. To evaluate our method, we created a dataset comprising 3 ragas, namely Darbari, Khamaj and Malhar. All audio files are instrumental recordings of flute, harmonium and sitar. We initially conducted experiments on this dataset. We implemented the feature extraction procedure for ragas as described in [4]: polyphonic audio signals are converted to the predominant melody using melody extraction software, and pitch-class profiles and n-grams are extracted as explained in [4]. We created the dataset randomly and used the resulting database both for training and for evaluating the selected ragas. The database contains approximately 90 tunes; we tested the ragas against it and report the accuracy. We compare our approach with the approach proposed by Vijay Kumar et al. [4], and Table I shows the results of both methods. Our approach, which combines the MFCC and chromagram feature extraction methods, performs better than [4], where only pitch-class profiles and n-gram histograms of notes are used: for every test set in Table I, the proposed method identifies at least as many ragas correctly as the existing one. The best accuracy obtained by our approach is 96.79%, which is higher than their best reported accuracy of 91.20%.

Ragas in database | Ragas checked | Correctly identified (existing) | Correctly identified (proposed) | Accuracy % (existing) | Accuracy % (proposed) | Delay (existing) | Delay (proposed)
5  | 4  | 4  | 4  | 100   | 100   | 12 | 14
5  | 5  | 5  | 5  | 100   | 100   | 12 | 14
10 | 6  | 6  | 6  | 100   | 100   | 18 | 20
10 | 8  | 7  | 8  | 87.5  | 100   | 18 | 20
10 | 10 | 9  | 10 | 90    | 100   | 18 | 20
15 | 8  | 7  | 8  | 87.5  | 100   | 21 | 24
15 | 12 | 11 | 11 | 91.66 | 91.66 | 21 | 24
15 | 14 | 12 | 13 | 85.71 | 92.85 | 21 | 24
15 | 15 | 13 | 14 | 86.66 | 93.33 | 21 | 24
20 | 8  | 7  | 8  | 87.5  | 100   | 30 | 35
20 | 12 | 11 | 11 | 91.66 | 91.66 | 30 | 35
20 | 16 | 14 | 15 | 87.5  | 93.75 | 30 | 35
20 | 20 | 18 | 19 | 90    | 95    | 30 | 35

Table I: Comparison of performance of approach [2] with our approach on our dataset. Accuracy is the number of correctly identified ragas divided by the number of ragas checked, in percent.

Fig. 6. Final delay for test ragas
Fig. 7. Accuracy for test ragas

VII. CONCLUSION

The feature analysis component of a raga identification system plays a crucial role in the overall performance of the system. Many feature extraction techniques are available, but ultimately we want to maximize the performance of these
systems. The objective of this work is to investigate the results that can be obtained by combining Mel-Frequency Cepstral Coefficients (MFCC) and chromagram features as feature components for the front-end processing of a raga identification system. Combining the MFCC and chroma feature components is proposed as a way to improve the reliability of a raga identification system. MFCCs are typically the de facto standard for raga identification systems because of their high accuracy and low complexity; however, they are not very robust in the presence of additive noise. In recent studies, chroma features have shown very good robustness against noise and acoustic change. The main idea is therefore to integrate MFCC and chroma features to improve the overall raga identification performance in low signal-to-noise ratio (SNR) conditions. We achieved an average accuracy of 98.30%. These are the best current results for scale-independent raga identification and compare closely with the results of [4]. Overall, the combined feature set consisting of chroma and MFCC gives the best results.

VIII. ACKNOWLEDGEMENT

I would like to thank Prof. Dr. P. J. Deore for his invaluable support, guidance and availability throughout the course of this research. My deepest thanks go to my husband for his love, understanding and support. Finally, my thanks go to all the people who have supported me, directly or indirectly, in completing this research work.

IX. REFERENCES

[1] H. G. Ranjani, S. Arthi and T. V. Sreenivas, "Carnatic Music Analysis: Shadja, Swara Identification and Raga Verification in AlApana Using Stochastic Models", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 16-19, 2011.
[2] Bruno Nettl and Melinda Russell, In the Course of Performance: Studies in the World of Musical Improvisation, Chapter 10, p. 219.
[3] Bruno Nettl and Melinda Russell, In the Course of Performance: Studies in the World of Musical Improvisation, Chapter 10.
[4] Vijay Kumar, Harit Pandya and C. V. Jawahar, "Identifying Ragas in Indian Music", International Institute of Information Technology, Hyderabad, India, IEEE International Conference, 2014.
[5] Parag Chordia and Alex Rae, "Raag Recognition Using Pitch-Class and Pitch-Class Dyad Distributions", International Symposium/Conference on Music Information Retrieval, 2007.
[6] Pranay Dighe, Parul Agrawal, Harish Karnick, Siddartha Thota and Bhiksha Raj, "Scale Independent Raga Identification Using Chromagram Patterns and Swara Based Features", IEEE International Conference on Multimedia and Expo Workshops, 2013.
[7] Rajeswari Sridhar and T. V. Geetha, "Raga Identification of Carnatic Music for Music Information Retrieval", International Journal of Recent Trends in Engineering, Vol. 1, May 2009.
[8] Gaurav Pandey, Chaitanya Mishra and Paul Ipe, "Tansen: A System for Automatic Raga Identification", in Proc. 1st Indian Intl. Conf. on Artificial Intelligence, pp. 1-9, 2003.
[9] Prashanth T R and Radhika Venugopala, "Note Identification in Carnatic Music from Frequency Spectrum", in Proc. of IEEE, p. 87, 2011.
[10] Pranay Dighe, Harish Karnick and Bhiksha Raj, Indian Institute of Technology, Kanpur, India, and Carnegie Mellon University, Pittsburgh, PA, USA.
[11] Rajshri Pendekar, S. P. Mahajan, Rasika Mujumdar and Pranjali Ganoo, "Harmonium Raga Recognition", International Journal of Machine Learning and Computing, Vol. 3, No. 4, August 2013.
[12] D. A. Reynolds, "An Overview of Automatic Speaker Recognition", Springer, 2011.
[13] H. Sahasrabuddhe and R. Upadhye, "On the Computational Model of Raag Music of India", in Workshop on AI and Music: European Conference on AI, 1992.
[14] N. A. Jairazbhoy, The Rags of North Indian Music: Their Structure & Evolution, Popular Prakashan, Bombay, 1995.
[15] Olivier Lartillot and Petri Toiviainen, "A Matlab Toolbox for Musical Feature Extraction from Audio", International Conference on Digital Audio Effects, Bordeaux, 2007.