KNIF 2015
Artificial Neural Network Application on Determining
Chord Composition for Melody Accompaniment
Mochammad Dikra Prasetya
Institut Teknologi Bandung
dikraprasetya@yahoo.co.id
Abstract: A melody is a succession of tones and is the major
part of a song composition. To accompany the melody, a chord
composition is prepared in accordance with the harmonization
of the tones within it. Composing chords is an unquantifiable
process that can only be rated by subjective judgement.
Variations are welcome, but each person has their own
preference, so making a generally likeable composition has
long been a challenge for musicians. Simon et al. (2008)
proposed a solution for composing chord accompaniment for a
melody in real time, as an application named MySong. Machine
learning is a main part of that application, which suggests
that other machine learning variations may also be applicable
to the problem and may produce better results. An artificial
neural network is considered a potential alternative that may
have advantages in parameter customization and in handling
time-series input. The test results show that an artificial
neural network solution is able to produce a chord
composition that is generally both listenable and likeable.
Keywords: Artificial Neural Network, Machine Learning,
Melody, Chord Composition, Song Accompaniment.
I. INTRODUCTION
The experiment is conducted mainly to examine artificial
chord composition for melody accompaniment and its general
acceptance by an audience. The artificial neural network is
configured in accordance with music theory, and the result
is assessed through cross-validation testing and a survey of
sample audiences. Other methods, such as greedy alternatives
and manual composition by experts, serve as comparison
methods in the evaluation.
As this is a complex problem, several limitations are
imposed on the experiment. The input of the prototype is a
music melody transcribed in MusicXML format, and its chord
composition is generated in MIDI format. Unfortunately, no
downloadable source of melody-only MusicXML is available, so
the whole song bank was transcribed manually for this
experiment. The songs used in both training and testing are
national marches and hymns, due to the limited availability
of melody-chord sources. The songs contain no transposition
of the root scale, and the chords used are limited to simple
chord forms (major, minor, diminished). The examination is
also limited to major-themed songs, since major and minor
songs should be processed in parallel (according to the
literature study)
Konferensi Nasional Informatika 2015
Dr. Ir. Rinaldi Munir, M.T.
Institut Teknologi Bandung
rinaldi@informatika.org
and examining only one theme is applicable to the other.
This means that, by examining only major-themed songs, the
proposed solution is also applicable to minor-themed songs
through a parallel, similar model. In spite of these
limitations, the proposed solution is configured as generally
as possible to allow for further development in the future.
Git repository: https://github.com/dkrprasetya/Chordman.
II. MUSIC THEORY AND RELATED RESEARCH
A literature study is conducted on music theory and on
research related to chord composing. [10-11] Simon et al.
(2008) contributed to solving this problem with their
product, MySong, whose implementation uses a Hidden Markov
Model. Other studies do not share the objective of the
current experiment, but lie in the same domain and are
adaptable to the chord composing problem. [12] Yaremchuk et
al. (2008) proposed a solution for guessing the chords played
in an MP3 using an artificial neural network, while [9] Mauch
et al. (2009) proposed that segmenting a song beforehand
could significantly improve chord guessing. From this
research, a few things are adopted in the implementation of
the artificial neural network.
As [9] Mauch et al. (2009) suggest, it is preferable to
break down the whole melody beforehand. However, since the
identification of parts (prelude, riff, elude, etc.) is not
given, the segmentation should be done on a different
relevant basis. [10-11] Simon et al. (2008) processed the
song as a function of time, which is adaptable to this
segmentation process. The measure beat is a preferable
parameter for segmenting the melody. These segments are
translated into input parameters of the artificial neural
network. The chords resulting from each segment are parts of
the whole composition, which is reconstructed at the end.
[10-11] Simon et al. (2008) also suggest several adaptable
processes that are necessary to improve the quality of
processing. A melody, as a sequence of tones, should be
transposed into a uniform root scale before being used as
training data. This increases prediction accuracy, since a
shared root scale makes the similarity matching that
determines the probability of playing a chord in a segment
more relevant. Major and minor songs should also be processed
in parallel. However, this observation is no longer relevant
here, since only major-themed songs are used in the
experiment.
Institut Teknologi Bandung, Jl. Ganesha 10 Bandung 40132, Indonesia
156 of 286
Observation parameters are an important feature of a
predictor. The tones in each segment are obviously a deciding
parameter in chord composing. Their occurrence and density
are different measures: occurrence is a true-false variable,
while density is calculated as the tone's frequency divided
by the total number of tones in the segment. There are a
total of 12 tones in music (C, C#, D, D#, E, F, F#, G, G#, A,
A#, B), and each reserves two input nodes (occurrence and
density) in the designed network.
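The two measures can be illustrated with a minimal sketch. It assumes that a segment is given as a list of pitch classes (0 = C, ..., 11 = B) and that density counts notes rather than note durations; the function name `segment_features` is hypothetical and not taken from the original implementation.

```python
TONES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def segment_features(pitch_classes):
    """Return 12 occurrence flags followed by 12 density values.

    pitch_classes: list of ints in 0..11 (0 = C), one per note in the segment.
    Assumption: density is a note count ratio, not a duration ratio.
    """
    counts = [0] * 12
    for pc in pitch_classes:
        counts[pc % 12] += 1
    total = len(pitch_classes)
    occurrence = [1.0 if c > 0 else 0.0 for c in counts]
    density = [c / total if total else 0.0 for c in counts]
    return occurrence + density


# A segment containing the notes C, E, G, E.
features = segment_features([0, 4, 7, 4])
```

Here the occurrence flags for C, E and G are 1, while the density of E is 0.5 because E accounts for two of the four notes.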
A chord composition is a harmonious sequence, so it is
natural to consider observing the previous chord as
necessary. This hypothesis is supported by the study of chord
progression in music theory, which states that each chord
tends to move to only a few candidate chords as the next
played chord. In the implementation, this can be translated
into a time-series input, where previous results are observed
for the current observation. However, the best range of the
time series for the network is not known. This limitation
suggests that the experiment should include an examination of
the time-series range.
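One possible way to build such a time-series input is sketched below. It assumes that each previous chord is encoded as a one-hot vector over the 12 tones and that missing history at the start of a song is padded with zeros; both choices, and the names `one_hot_chord` and `build_input`, are assumptions made for illustration.

```python
def one_hot_chord(chord_index):
    """Encode a chord (0..11) as 12 values; None means no previous chord."""
    vec = [0.0] * 12
    if chord_index is not None:
        vec[chord_index % 12] = 1.0
    return vec


def build_input(features, previous_chords, n_range):
    """Append the n_range most recent chords (most recent first) to the features."""
    history = list(reversed(previous_chords))[:n_range]
    # Pad with None when fewer than n_range segments precede the current one.
    history += [None] * (n_range - len(history))
    vector = list(features)
    for chord in history:
        vector += one_hot_chord(chord)
    return vector


# 24 segment features plus a range of 2 previous chords (C, then G).
example_input = build_input([0.0] * 24, [0, 7], 2)
```

With a range of 2 the vector has 24 + 12 x 2 = 48 values, and a range of 0 reduces to the plain 24 segment features.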
III. CHORD COMPOSING PROCESS DESIGN
A. Artificial Neural Network Design
The designed artificial neural network should consider all
of the suggested features as part of the observation. The
melody, as the input, is processed beforehand to produce
segments. These segments are mapped into values for the input
nodes of the designed network. From the analysis, the
observation candidates are: (1) occurrence of each tone; (2)
density of each tone; and (3) previous chords as time-series
input, whose range is examined through the experiment. With
this configuration, a feed-forward multilayer perceptron
architecture is used. The backpropagation algorithm and the
sigmoid activation function are applied to the network with a
fixed number of iterations. Training until the error falls
below some threshold is not considered, as it is not
applicable to such a non-deterministic problem (one without a
necessarily exact match). However, the number of fixed
iterations is not defined in advance, so it is also examined
through the experiment.
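The training scheme described above (feed-forward multilayer perceptron, sigmoid activation, backpropagation, fixed number of iterations) can be sketched in plain Python as follows. This is only an illustrative toy, not the implementation used in the experiment: the class name `TinyMLP`, the learning rate, the weight initialisation range, the online (per-sample) update and the toy data are all assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One-hidden-layer perceptron trained by online backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One row of weights per neuron; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, list(x) + [1.0])))
                  for row in self.w_hidden]
        out = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
               for row in self.w_out]
        return hidden, out

    def train(self, samples, iterations, rate=0.5):
        # Fixed number of passes over the data; no error-threshold stopping rule.
        for _ in range(iterations):
            for x, target in samples:
                hidden, out = self.forward(x)
                # Error terms for sigmoid units under a squared-error loss.
                d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
                d_hidden = [h * (1.0 - h)
                            * sum(d * row[j] for d, row in zip(d_out, self.w_out))
                            for j, h in enumerate(hidden)]
                for d, row in zip(d_out, self.w_out):
                    for j, v in enumerate(hidden + [1.0]):
                        row[j] += rate * d * v
                for d, row in zip(d_hidden, self.w_hidden):
                    for i, v in enumerate(list(x) + [1.0]):
                        row[i] += rate * d * v


def squared_error(net, samples):
    return sum((t - o) ** 2
               for x, target in samples
               for t, o in zip(target, net.forward(x)[1]))


# Toy data only: two 2-value inputs mapped to two targets.
toy = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
net = TinyMLP(n_in=2, n_hidden=3, n_out=2)
error_before = squared_error(net, toy)
net.train(toy, iterations=1000)
error_after = squared_error(net, toy)
```

For the chord problem the same structure would be instantiated with the input, hidden and output sizes of the designed network, and `iterations` would be one of the values examined in the experiment.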
The observed parameters are mapped into 24 main input nodes
(12 occurrence nodes and 12 density nodes) and 12 output
nodes. The time-series parameter adds 12 x N input nodes,
where N is the range of the time series. One hidden layer is
present, with a number of nodes equal to ¾ of the sum of the
input and output nodes. This number is taken from empirical
study references on artificial neural networks. The designed
model as a whole is illustrated in Figure 1.
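The node counts implied by this description can be worked out with a short sketch. The function name `layer_sizes` is hypothetical, the hidden-layer fraction is left as a parameter so that other fractions can be tried, and rounding to the nearest integer is an assumption (it does not matter for the three ranges examined here, which give whole numbers).

```python
def layer_sizes(n_range, hidden_fraction=0.75):
    """Return (input, hidden, output) node counts for a time-series range."""
    n_in = 24 + 12 * n_range  # 12 occurrence + 12 density + 12 per previous chord
    n_out = 12                # one confidence value per tone
    n_hidden = round(hidden_fraction * (n_in + n_out))
    return n_in, n_hidden, n_out


sizes = {n_range: layer_sizes(n_range) for n_range in (0, 1, 2)}
```

Under these assumptions, no time-series gives 24 inputs and 27 hidden nodes, a range of 1 gives 36 inputs and 36 hidden nodes, and a range of 2 gives 48 inputs and 45 hidden nodes.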
Each of the 12 output nodes represents the confidence level
of that tone being the suggested chord for the current
segment. The result for the segment is decided by treating
each of these values as the probability that its chord is
taken. This algorithm is described in Formula 1, where the
confidence value of each chord tone is denoted v_i.
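One plausible reading of this selection rule is a draw in which chord i is picked with probability v_i divided by the sum of all confidence values. The sketch below implements that reading; the normalisation by the sum, the uniform fallback when all values are zero, and the name `pick_chord` are assumptions, since the exact form of the formula is not reproduced here.

```python
import random


def pick_chord(confidences, rng=random):
    """Pick an index with probability proportional to its confidence value."""
    total = sum(confidences)
    if total <= 0:
        # Assumed fallback: uniform choice when the network gives no preference.
        return rng.randrange(len(confidences))
    threshold = rng.random() * total
    cumulative = 0.0
    for index, value in enumerate(confidences):
        cumulative += value
        if threshold < cumulative:
            return index
    return len(confidences) - 1  # guard against floating-point round-off


# When only one tone has non-zero confidence, that tone is always chosen.
only_g = [0.0] * 7 + [1.0] + [0.0] * 4
chosen = pick_chord(only_g)
```

A deterministic alternative would be to take the index of the largest confidence value, but the text describes the values as probabilities, so the sampled version is shown.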
(1)
B. Process Scheme
The whole process consists of three stages: pre-process,
training process, and testing process.
The pre-process prepares the input melody as class models,
so that further processing can be conducted with more
flexibility in tweaking its variables. This includes
transposing the tones into C major as the uniform root scale,
segmenting the melody into parts, and mapping the segments to
the network's design; the whole scheme is illustrated in
Figure 2. As discussed in the analysis, segmentation is done
per half measure. Therefore, if the song uses 4 beats per
measure, each measure is split into 2 segments (of 2 beats
each).
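The transposition and segmentation steps can be sketched as follows, assuming that notes are given as (start beat, MIDI pitch) pairs and that the key root of the song is known. The helper names are hypothetical, and as a simplification a note is assigned to the segment in which it starts, even if it sounds across a segment boundary.

```python
def transpose_to_c(pitches, key_root):
    """Map MIDI pitches to pitch classes relative to the key root (root -> C = 0)."""
    return [(p - key_root) % 12 for p in pitches]


def segment_by_half_measure(notes, beats_per_measure):
    """Group (start_beat, pitch_class) notes into half-measure segments."""
    half = beats_per_measure / 2
    segments = {}
    for start_beat, pitch_class in notes:
        segments.setdefault(int(start_beat // half), []).append(pitch_class)
    if not segments:
        return []
    # Keep empty lists for half measures that contain no note onsets.
    return [segments.get(i, []) for i in range(max(segments) + 1)]


# One 4/4 measure in G major (root pitch class 7): G4, B4, D5, G4 on beats 0..3.
pitch_classes = transpose_to_c([67, 71, 74, 67], key_root=7)
segments = segment_by_half_measure(list(zip([0, 1, 2, 3], pitch_classes)), 4)
```

After transposition the G major tones read as C, E, G, C, and the 4-beat measure is split into two segments of two beats each.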
Figure 2. Segmentation Result
Figure 1. Artificial Neural Network Model
The song bank is divided into a training data set and a
testing data set at a ratio of 10:3, respectively. The data
sets are pre-processed to output the segments used in further
processing. In the training process, the implemented
artificial neural network is trained on each provided data
set. The weights resulting from training are stored in a file
to be used in the testing process. The testing process
consists of processing each pre-processed segment to obtain
the chord for its position, then the final chord composition
is constructed from those. The whole main process scheme is
illustrated in Figure 3.
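A 10:3 division of a song bank could be done as in the sketch below. The text does not state how songs were assigned to each set, so the seeded random shuffle and the function name `split_song_bank` are assumptions.

```python
import random


def split_song_bank(songs, train_parts=10, test_parts=3, seed=0):
    """Split a list of songs into training and testing sets by the given ratio."""
    shuffled = list(songs)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_parts / (train_parts + test_parts))
    return shuffled[:n_train], shuffled[n_train:]


# Thirteen placeholder song identifiers split 10:3.
train_songs, test_songs = split_song_bank(range(13))
```

Fixing the seed keeps the split reproducible between the training run that stores the weights and the later testing run.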
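\[
f(a, b) =
\begin{cases}
1 & \text{if } a = b \\
0.75 & \text{if } a \text{ and } b \text{ are relative major/minor} \\
\sum_{i=1}^{3} \tfrac{1}{3}\, g(x_i, b) & \text{otherwise}
\end{cases}
\qquad
\text{accuracy} = \frac{1}{N} \sum_{k=1}^{N} f(a_k, b_k)
\tag{2}
\]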
Figure 3. Process Scheme
IV. IMPLEMENTATION AND EVALUATION
The previous design of the artificial neural network has
been implemented successfully and is able to produce chord
compositions for melody accompaniment. From the song bank,
600 data sets are produced for training and testing. Each
process is executed on each data set and experiment
configuration. In Figure 4, the composition produced by the
implemented artificial neural network (as chord A) and the
chords manually generated by an expert (as chord B) are
presented side by side. The two compositions assign different
chords in some parts; however, both fit their segments well
and are generally acceptable to the audience, as evaluated in
a later section.
Throughout this accuracy evaluation, all of the
configuration candidates and alternative methods are tested.
The configuration with the best accuracy in this evaluation
is used in further testing. The parameters tested are the
range of the time series and the number of fixed iterations
for the training process. The alternative methods compared
with the artificial neural network solution are: (1) random
function method; (2) greedy by tone occurrence; and (3)
greedy by applying chord progression theory. The random
function method picks one of the existing tones (12
candidates) randomly with a uniform probability of 1/12.
Greedy by tone occurrence assigns the most frequently
occurring tone in the segment as its chord. The last method
applies chord progression theory: it randomly picks one of
the candidates determined by the current chord and the chords
it tends to move to. These alternative methods are compared
with the best configuration of the artificial neural network.
The accuracy of each method can be compared in Table 2.
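The three comparison methods can be sketched as below. This is a hedged illustration only: the tie-breaking rule, the fallback for an empty segment, and especially the `EXAMPLE_TENDENCIES` table (a tiny I, IV, V progression map) are hypothetical, since the candidate chords used in the experiment are not listed in the text.

```python
import random
from collections import Counter


def random_chord(rng=random):
    """Random function method: each of the 12 tones has probability 1/12."""
    return rng.randrange(12)


def greedy_by_occurrence(pitch_classes):
    """Pick the most frequent tone of the segment (ties go to the lowest tone)."""
    if not pitch_classes:
        return 0  # assumed fallback for an empty segment: the tonic
    counts = Counter(pitch_classes)
    best = max(counts.values())
    return min(pc for pc, c in counts.items() if c == best)


# Hypothetical tendencies in C major: C -> F or G, F -> G or C, G -> C.
EXAMPLE_TENDENCIES = {0: [5, 7], 5: [7, 0], 7: [0]}


def greedy_by_progression(current_chord, tendencies, rng=random):
    """Pick randomly among the chords the current chord tends to move to."""
    candidates = tendencies.get(current_chord, [current_chord])
    return rng.choice(candidates)


most_frequent = greedy_by_occurrence([0, 4, 4, 7])
after_g = greedy_by_progression(7, EXAMPLE_TENDENCIES)
```

In this example the segment C, E, E, G yields E under the occurrence rule, and a G chord can only move to C under the hypothetical tendency table.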
Table 1. Cross-Validation Test on Experiments
Figure 4. Composed Chords Sample
The first evaluation is a cross-validation accuracy test on
the result. It should be noted that the chord matching
equivalence condition is modified to reflect how closely two
different chords are related, since chords do not need to
match exactly to be judged as well or badly predicted. The
equivalence score follows Formula 2. If both are exactly the
same chord, the score is 1, or exactly equivalent. The second
check is whether one chord is the relative major/minor of the
other, which, if true, scores 0.75. Otherwise, the chords are
matched tone by tone: each tone that appears in both chords
scores 1/3. The number of segments is denoted by N. The two
chords being matched are denoted a and b in the formula. The
notation f(a, b) represents the whole equivalence function.
The notation g(x_i, b) tests whether tone x_i, which occurs
in chord a, also occurs in chord b. The result of this
accuracy test is shown in Table 1.
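The scoring rule described above can be sketched in code. The sketch assumes that a chord is represented as a (root pitch class, quality) pair and that accuracy is the mean score over the N segments; this representation and the function names are assumptions made for illustration.

```python
INTERVALS = {"major": (0, 4, 7), "minor": (0, 3, 7), "diminished": (0, 3, 6)}


def chord_tones(chord):
    """Return the set of pitch classes of a (root, quality) triad."""
    root, quality = chord
    return {(root + step) % 12 for step in INTERVALS[quality]}


def is_relative(a, b):
    """True when one chord is the relative major/minor of the other."""
    (root_a, quality_a), (root_b, quality_b) = a, b
    if quality_a == "major" and quality_b == "minor":
        return (root_a + 9) % 12 == root_b
    if quality_a == "minor" and quality_b == "major":
        return (root_b + 9) % 12 == root_a
    return False


def equivalence(a, b):
    """Score 1 for identical chords, 0.75 for relatives, else 1/3 per shared tone."""
    if a == b:
        return 1.0
    if is_relative(a, b):
        return 0.75
    return len(chord_tones(a) & chord_tones(b)) / 3


def accuracy(predicted, reference):
    """Mean equivalence score over all segments."""
    return sum(equivalence(a, b) for a, b in zip(predicted, reference)) / len(reference)


C_MAJOR, A_MINOR, E_MINOR, F_MAJOR = (0, "major"), (9, "minor"), (4, "minor"), (5, "major")
example_accuracy = accuracy([C_MAJOR, C_MAJOR], [C_MAJOR, A_MINOR])
```

For example, C major against A minor scores 0.75 as a relative pair, C major against E minor scores 2/3 (shared tones E and G), and C major against F major scores 1/3 (shared tone C).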
Num of      No            1 Segment     2 Segment     Average
Iteration   Time-series   Time-series   Time-series
10          61.520 %      53.510 %      35.182 %      50.070 %
100         74.002 %      72.534 %      70.122 %      72.219 %
1000        74.755 %      68.420 %      68.727 %      70.634 %
10000       73.232 %      73.226 %      70.525 %      72.327 %
100000      74.815 %      70.593 %      66.925 %      70.777 %
Table 2. Accuracy of Methods
Method                        Accuracy
No time-series                74.815 %
1 Segment time-series         73.226 %
2 Segment time-series         70.525 %
Random function               31.247 %
Greedy by tone occurrence     37.897 %
Greedy by chord progression   65.859 %
Based on the above scores, the best accuracy achieved by the
artificial neural network is 74.815%, with the no time-series
configuration and 100000 training iterations. It can be seen
that after 100 iterations the network shows a stable
accuracy, fluctuating within about ± 2%. Even so, it is
concluded that the combination of 100000 iterations and no
time-series is the best configuration in this experiment.
Compared with the alternative methods, its best competitor is
greedy by chord progression, with an
accuracy of 65.859%.
Although observing previous chords is naturally necessary to
build a harmonious chord composition, the no time-series
configuration produced the best accuracy. This is somewhat
contradictory, since chord progression theory is well known
to state that the currently played chord limits the
candidates for the next chord. This leads to two alternative
explanations: 1) the time-series parameter is an excessive
observation feature; or 2) the limited data set hampers the
pattern learning that should have strengthened the
prediction. Neither explanation is confirmed yet, and further
experiments are needed to decide. Despite this uncertainty,
it is concluded that the implemented artificial neural
network predicts with satisfactory accuracy.
The second and last evaluation is a sample audience survey.
It is conducted to gather the audience's subjective judgement
of the composed melody accompaniment. The quality of a
composition is scored by averaging the audience's ratings.
Three selected sample melodies are accompanied by the
artificial neural network, then given to the audience to be
rated. To filter the responses, respondents are grouped into
three categories: expert, semi-expert, and non-expert. Expert
respondents are musicians who are capable of determining
chords to accompany a melody. Semi-experts are musicians who
are not well experienced in determining chords. All other
respondents are non-experts. Responses from the expert
category are given high priority as an evaluation reference,
since experts are assumed to score more objectively than the
other categories.
There are 51 respondents, with a ratio of expert,
semi-expert, and non-expert of 1:4:2, respectively. For each
of the three songs, three versions of the chord composition
are presented: two produced by parallel artificial neural
networks and one manually assigned by experts. In this
experiment, the terms ANN-1 and ANN-2 are used to
differentiate the two artificial neural networks. The
summarized survey result is shown in Table 3. Ratings are
scored on a scale of 1 to 5.
Table 3. Survey Result
Respondent     Rating
Category       ANN-1     ANN-2     Manual
Expert         3.2       3.067     4.2
Semi-expert    2.94      2.714     4
Non-expert     3.361     3.25      4.222
Average        3.167     3.01      4.141
Based on the above result, the manually assigned chords
scored highest, with an average rating of 4.141. This is
expected, since the manually composed chords are the ones
used as training data. However, it also shows that the upper
bound is 4.141 rather than a perfect 5. It is not surprising
that a perfect score is not achieved. Since it is widely
assumed that music is a very subjective matter, even chords
manually assigned by experts may not sound good enough to
some audiences. But there is also a possibility that the
manually assigned chords are actually not good enough.
Despite this imperfect upper bound, the artificial neural
networks reached scores of 3.167 and 3.01, which can be
considered generally good enough for audiences; ANN-1
achieved more than 75% of the upper bound (3.167 / 4.141 ≈
76.5%), and ANN-2 about 73%.
There is still room for further development. One of the
biggest limitations is the size of the data set. The current
data sets are all manually transcribed. It is believed that
if more varied data sets can be provided, the result could be
improved significantly. The design of the artificial neural
network is good enough; however, there are still
configuration options left to experiment on, e.g. a recurrent
network architecture. Judging from audience responses, it is
also possible that the quality of the MIDI samples used to
make the compositions audible reduces the aesthetics of the
chord composition. Even though the instrumentation for
playing the chords is not part of the evaluation, it could
affect the songs' ratings. If possible, future experiments
could address this problem to obtain higher-quality results.
V. CONCLUSION
It is concluded that an artificial neural network is capable
of producing a solid chord composition for melody
accompaniment that is generally acceptable to listeners. The
best configuration found in the experiment is a feed-forward
multilayer perceptron architecture with 12 input nodes for
tone occurrence, 12 input nodes for tone density, 12 output
nodes for chord confidence, and one hidden layer with 2/3 of
the total nodes, trained with the backpropagation algorithm,
100000 fixed iterations, and the sigmoid activation function.
This configuration achieved a score of 3.167 against an upper
bound of 4.141 (out of 5).
ACKNOWLEDGEMENT
We would like to express our special gratitude to Dr. Ir.
Gusti Ayu Putri Saptawati, M.Comm and Dr. Masayu Leyla
Khodra, S.T. M.T., who provided evaluations and suggestions
that greatly assisted this research. We are very grateful for
their comments on earlier versions, which significantly
improved both the execution and the writing of this research.
REFERENCES
[1] Berkeley, I. S. N., Dawson, M. R. W., Medler, D. A., Schopflocher,
D. P., & Hornsby, L. (1995). Density plots of hidden value unit
activations reveal interpretable bands. Connection Science, 7,
167-186.
[2] Curtis, M. E., Bharucha, J. J. (2010). The minor third communicates
sadness in speech, mirroring its use in music. Emotion, 10, 335-348.
[3] Demuth, H. B., et al. (2014). Neural Network Design 2nd Edition.
Paperback.
[4] Good, M. (2001). MusicXML for Notation and Analysis. MIT Press.
[5] Graupe, D. (1997). Principles of Artificial Neural Networks 2nd
Edition. World Scientific Publishing.
[6] Hermawan, Arief. (2006). Jaringan Saraf Tiruan: Teori dan
Aplikasi. Penerbit ANDI.
[7] Huron, D. (2006). Sweet Anticipation: Music and the Psychology of
Expectation. MIT Press.
[8] Laden, B., Keefe, B. H. (1989). The representation of pitch in a
neural net model of pitch classification. Computer Music Journal,
13, 12-26.
[9] Mauch, M., et al. (2009). Using Musical Structure to Enhance
Automatic Chord Transcription. International Society for Music
Information Retrieval Conference.
[10] Simon, I., et al. (2008). MySong: Automatic Accompaniment
Generation for Vocal Melodies. ACM CHI Conference on Human
Factors in Computing Systems.
[11] Simon, I., et al. (2008). Exposing Parameters of a Trained Dynamic
Model for Interactive Music Creation. Association for the
Advancement of Artificial Intelligence.
[12] Yaremchuk, V., et al. (2008). Artificial Neural Networks that
Classify Musical Chords. International Journal of Cognitive
Informatics and Natural Intelligence.