1. Introduction
Variability and similarity in the structure of echolocation calls produced by bats has hindered the development of acoustic species classification as a monitoring tool. Individual bats within a species may vary their echolocation calls with location [
1], habitat [
2], stage of foraging [
3], and presence of and proximity to conspecifics [
4]. Call structure can vary by gender [
5], and may also change as an individual bat ages [
5,
6,
7]. Furthermore, the recording equipment used by researchers and the effects of the physical environment, for example frequency-dependent atmospheric attenuation, may introduce additional variability [
6]. Individuals and species sharing a habitat or ecological niche may produce echolocation calls that are similar both temporally and spectrally to deal with similar sensorial challenges.
One method that researchers employ to identify bats from their echolocation calls relies on subjective analysis of call structure, often supplemented by direct observation of the bat as it flies [
8]. While such methods can provide results comparable to some computer-based machine learning techniques [
9], they are of limited use where repeatable (over time and between researchers), quantifiable measures of species identity are required [
10]. Removal of bias associated with researcher experience and enhancement of repeatability of results requires more objective quantifiable methods. Multivariate discriminant functions have been used to identify calls of bats recorded in south-east England and Italy for habitat-use assessment [
11,
12] while a decision tree was used to classify zero-crossed echolocation call recordings from eight Australian species [
13]. Neural networks have also been used to identify species of British bats flying over organic and conventional farms [
14]. Finally, machine learning techniques commonly used in automated (human) speech recognition have been used to detect and classify calls from five North American bat species [
15]. Although these methods allow satisfactory identification of several species, others are so similar that the development of new methods may be required to separate them. Convergence in call structure among species makes acoustic identification of
Myotis bats especially challenging.
In this study we aimed to test the ability of two classifiers, support vector machines (SVM) and ensembles of artificial neural networks (ENN), to classify the echolocation calls of bats with the hope that they would produce consistently high correct classification rates. A third technique, discriminant function analysis, was used as a benchmark against which the performance of these classifiers could be compared. A relatively recent advance in machine learning techniques, SVMs work by creating separation boundaries between classes by training iteratively on a dataset. The separation boundaries between classes are modified for each case (in this study, a case is the parameters derived from a single call) in the dataset. Modifications are based on data points that lie on the boundary of each class, i.e., support vectors in the dataset [
16]. SVMs have been used successfully for multi-dimensional classification tasks such as cancer tissue classification from microarray data [
17]. They have also been applied in acoustic research for tasks such as classification of musical audio [
18]. Ensembles use a group of classifiers and a voting system to decide on the classification of an input [
19]. They work by presenting the inputs to each classifier within the ensemble and collating the results to reach a consensus classification. Provided the members of the ensemble have a classification rate greater than 50%, the ensemble will achieve a higher classification rate than any one member alone [
19]. Classifiers in an ensemble can also be arranged into a hierarchy where the ensemble input is initially classified into a number of general classes before being classified to a more detailed class. This allows members of the ensemble to be trained on specific tasks, rather than the entire classification task [
20,
21]. Prudent selection of hierarchical classification branches can enhance overall classification performance.
Here we describe the ability of SVMs and ENNs to correctly classify 14 species of bats resident in the United Kingdom based on acoustic parameters measured automatically from echolocation calls recorded in the field. Using these techniques we achieved unprecedented high correct classification rates for all but three species found in the United Kingdom, including five species from the Myotis genus.
2. Methods
We used the recordings from a previously published study [
21] as a source of calls to train and test the SVMs and ENNs; methods used for the collection, recording, digitization and selection of calls can be found in that paper. From these recordings we obtained a library of 713 search-phase echolocation calls from 14 species of bat resident in Britain (see
Table 2 and
Table 4 for species and sample sizes, respectively). Only one call was used from each individual bat recorded.
We used an automated algorithm implemented in Matlab (v7, Mathworks, Nattick, MA) to extract calls from the background noise and measure parameters. Prior to the extraction of a call, a 10
th order Butterworth high-pass filter (cut-off frequency of 15 kHz) was applied to improve the signal-noise ratio. Extraction of a call from a recording began by finding the approximate centre of the echolocation call by applying a 100:1 moving average filter to the absolute value of the signal and finding the position of highest amplitude within the recording. The algorithm then iterated forward through the recording, calculating power spectra containing 256 data points (representing 5 μs). The data were then zero-padded to 1024 points and a power spectrum of the same size was calculated. The algorithm iterated through the recording in steps of 6 points, giving 98% overlap between successive power spectra (final resolution 78 Hz). The power spectrum was then smoothed using a 20:1 moving average filter and normalised between 0 and 1. The algorithm would stop iterating through the recording if one of two criteria were met: 1) the frequency with highest energy of the power spectrum was more than 8 kHz different from that of the previous iteration, or 2) the average signal to noise ratio of the normalised power spectrum exceeded a value of 0.01 (Equation 1). Both criteria were indicative of the algorithm reaching the end of a call and moving into the background noise. Once the algorithm found the end of the call, it repeated the search but this time moving from the point with highest energy in the call back towards its beginning.
Equation 1: Calculation of the hill climbing threshold parameter for the noise estimation call isolation algorithm. The threshold parameter (T) is obtained dynamically for each given signal. (S = absolute value of the power spectrum data points, k = number of samples in the power spectrum).
Using deterministic algorithms, twelve parameters were extracted automatically from each call. Five of these parameters: call duration (ms), frequency at the start of the call (kHz), frequency at the end of the call (kHz), frequency at half the duration of the call (kHz) and frequency at the maximum energy of the call (kHz) have been used in previous studies [
11,
21,
34], and are referred to here as the ‘base’ parameters. Once isolated from the background signal, the call duration was measured by dividing the number of samples in the isolated call by the sample rate. Frequency at the start of the extracted call was obtained by taking a 1024-point power spectrum of the first 256 μs of the call (resolution 3.9 kHz) and noting the frequency with most energy; the signal was zero padded to 1024 samples before the FFT was applied. This procedure was repeated for the centre and final 256 μs of the call to obtain the frequency at half the duration of the call and the frequency at the end of the call. The frequency with maximum energy was obtained by applying a FFT to the entire isolated call. As calls varied greatly in length, and therefore number of samples, each call was first zero padded to increase its length to the nearest power of two before the FFT was applied. The resulting power spectrum was then reduced to 1024-samples using a moving average filter and the frequency at the maximum energy of the call measured.
The sixth call parameter estimated the rate of change (i.e. second order derivative) of the frequency-time course of each call (RoC, expressed at kHz/ms2). The frequency-time course was calculated by measuring the frequency with peak amplitude from successive power spectra taken throughout the call. Each power spectrum was calculated from a zero-padded 1024-point FFT of successive 128 μs portions of the call. Each successive power spectra overlapped the previous one by 80% (102.4 μs) giving a resolution in the frequency-domain of 1.6 kHz. The frequency with maximum amplitude of each power spectrum was used to mark out the frequency-time course of the call.
The seventh parameter was a standardised bandwidth measurement, taken from a normalised version of the power spectrum calculated to measure frequency with maximum energy of the call. Standardised bandwidth was defined and measured as the bandwidth of the call at 80% of the maximum amplitude. A high bandwidth measurement denotes a relatively broad spread of energy across the echolocation call while a low value energy focused into a narrow range of frequencies.
The next four parameters measured the distribution of energy across an echolocation call. Each call was subjected to a Hilbert transform [
35] and multiplied by its conjugate; the output was normalised between zero and one. The resulting signal was separated into four quartiles, and the sum of each quartile’s amplitude was divided by the sum of the signal’s total amplitude. This gave the energy each quartile contained expressed as a proportion of the call’s total energy.
The final parameter was a discrete variable that categorised calls into one of three previously described types [
36] and one devised by us: constant frequency (CF), frequency modulated (FM), quasi-constant frequency (QCF), and kinked (K). Kinked calls were characterised by having a sudden decrease in frequency in the final quarter of the call. This degree of frequency modulation was calculated by comparing the initial peak frequency, and the final peak frequency of the quarter. If the degree of modulation was above a threshold of 6 kHz, and the first quarter of the call did not contain a positive frequency modulation (indicative of a call from
Rhinolophus sp
.), then the echolocation call was classed as kinked. If a kink in the last part of the call was not present, the call was checked for constant frequencies. This was done by checking if the start and end frequency of the call were both less than the centre frequency of the call (indicative of a call from
Rhinolophus sp
.). If this category was rejected, the call was then tested for frequency modulation. FM calls were defined as having a relatively constant rate of change in frequency. Rate of change was calculated as the mean second-order derivative of a 1
st order polynomial fitted to the start, centre, and end frequencies of the call. Calls whose rate of change was less than 0.2 were classified as FM. Calls not fitting any of the above categories were classified as being QCF.
The extracted parameters were tested for normality in SPSS (SPSS Inc., Chicago IL) using a Shapiro-Wilk’s W test and were not normally distributed. Therefore all extracted parameters are expressed as medians ± interquartile ranges. For the purpose of comparison with other studies, the five base parameters were also summarised as mean ± one standard error (SE).
Call parameters were classified to genus and species using three classification algorithms: DFA (quadratic with cross-validation), SVM, and ENN. DFA is relatively robust to deviations from normality, which are likely to reduce performance slightly [
37]. Examination of the covariance matrices showed that they were heterogeneous and transformation of the data did not reduce heterogeneity, nor did it reduce deviation of the data from normality. Therefore, quadratic discriminant functions were calculated in all analyses [
37], using untransformed data. The neural networks used in the ENN and the SVMs were trained on half of the dataset, and then tested on the remaining half to give an indication of classification performance on unseen data. DFA was carried out using SPSS and calls classified to genus and species. The SVM classifier was trained and tested using Matlab. The kernel functions of the support vector machines used a radial basis function with a gamma parameter ranging from 0.005 to 0.03 in steps of 0.005, from 0.03 to 0.1 in steps of 0.01, and from 0.1 to 0.5 in steps of 0.1. For each value of gamma, the support vector machine was reinitialised 20-times to increase the chance of obtaining an optimal classifier. A support vector machine was trained for every target case in the dataset (each genus, or species) to classify an instance as either belonging to that case, or not. All classifiers were then combined and categorised each echolocation call as either belonging to a specific class (genus or species), or not. A call was classified correctly only if a single support vector machine classified it, and that classification was correct.
Neural networks were trained and tested using Matlab. Neural networks used a sigmoidal activation function and varied in learning rate, momentum, number of hidden layers, and number of neurons per hidden layer used (
Table 1). Because overtraining can lead to a decrease in accuracy [
38], each neural network configuration recorded its best epoch and associated weights. Once 2500 epochs had been completed, the weights of the best epoch were saved to file. The best epoch was identified as the one that produced the highest minimum accuracy over all targets, and then the highest average accuracy over all targets. When all permutations of architectures were exhausted, the top 50 performing neural networks were refined by retraining with randomly initialised weights 30 more times, as the values of the initial weights affect the accuracy of the neural network [
39]. Performance was measured by sorting the networks first by highest minimum classification rate, and then highest average classification rate. Two types of ENNs were generated. The first utilized the top performing 21 neural networks (of the 50 retrained networks) trained to classify calls to species. The second type used a hierarchy of neural networks [
21]. The first layer of the hierarchy used an ensemble of the top 21 performing neural networks (of the 50 retrained networks trained to genus level) trained to classify calls only to genus. Within each genus, ensembles of 21 neural networks (of the 50 retrained networks trained for each species) were trained to classify calls only from species within a particular genus.
The performance of the SVMs and ENNs were ranked first in order of minimum classification rate, and then by average classification rate. This prevented a classifier that performed poorly on one or two genera or species to be selected over one that performed slightly worse on average, but better in general.
Table 1.
Ranges of values for neural network configuration. The top performing 50 permutations of the parameters were refined to increase the probability of finding an optimal neural network by retraining with randomly initialised weights 30 times.
Table 1.
Ranges of values for neural network configuration. The top performing 50 permutations of the parameters were refined to increase the probability of finding an optimal neural network by retraining with randomly initialised weights 30 times.
Parameter | Values |
Learning Rate | 0.01, 0.02, 0.05, 0.08, 0.1, 0.2, 0.3, 0.4, 0.5 |
Momentum | 0.0, 0.1, 0.2, 0.3, 0.4, 0.5 |
Hidden Layers | 1, 2 |
Neurons per hidden layer | 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 |
3. Results
All parameters measured (
Table 2 and
Table 3), except those from calls from
Plecotus auritus, agreed with those already published for the same dataset [
18]. The duration of
P. auritus calls was approximately half that reported previously, and this may account for the higher center and end frequency measurements of this study.
Discriminant function analysis classified calls to genus with an overall accuracy of 81% (
Table 4). Calls of
Rhinolophus were all correctly identified, and more than 90% of the calls from
Pipistrellus,
Myotis, and
Barbastella were identified correctly. Discriminant function analysis identified correctly 71% of the calls from
Plecotus, which were mainly confused with
Barbastella calls (21%). Calls from
Nyctalus were identified with the lowest accuracy (58%) and were confused predominantly with calls from
Eptesicus (37%).
Discriminant function analysis correctly classified 73% of the calls to species (
Table 5). Calls from both species of
Rhinolophus were all classified correctly, and calls from
Barbastella were classified with more than 90% accuracy. Calls from
M. mystacinus were least accurately classified (42%), and were mainly misclassified as being from other
Myotis species (28% of calls were confused as
M. brandtii, 8% as
M. daubentonii, and 8% as
M. bechsteinii). Calls from
N. leisleri were also poorly classified (45%) and were misclassified mostly as
E. serotinus (33%). Just 56% of calls from
E. serotinus were classified correctly, and were mainly misclassified as calls of
N. leisleri.
We assessed the relative importance of each parameter to the DFA classification of calls using Wilks’ lamba values (
Table 6). Parameters that score a lambda close to zero indicate a greater importance for species differentiation, whereas those that have a lambda close to one are less important. The most useful parameters for discriminanting between different species were the five base parameters, followed by the standardised bandwidth measurement, call type classification, and quarterly amplitude measurements. The spectrogram rate of change estimate was the least important parameter. A Kruskal-Wallis test was applied to each parameter to test if they were significantly different between species. After corrections for multiple comparisons, all parameters had a χ
2 value much greater than the critical value.
Table 2.
Parameters measured from echolocation calls. Values are median ± IQR (Range). Call type values are stated as percentage of calls marked as quasi constant frequency (QCF), constant frequency (CF), frequency modulated (FM), and kinked (K). Parameters are duration (ms), start frequency (Fstart, kHz), end frequency (Fend, kHz), center frequency (Fcenter, kHz), frequency with maximum energy (Emax, kHz), standardised bandwidth (SBW, kHz), Rate of change (kHz/ms2), and quartile amplitude (Q1 – Q4, % of total energy in call). Parameters presented here for Plecotus auritus should not be taken as indicative of the species.
Table 2.
Parameters measured from echolocation calls. Values are median ± IQR (Range). Call type values are stated as percentage of calls marked as quasi constant frequency (QCF), constant frequency (CF), frequency modulated (FM), and kinked (K). Parameters are duration (ms), start frequency (Fstart, kHz), end frequency (Fend, kHz), center frequency (Fcenter, kHz), frequency with maximum energy (Emax, kHz), standardised bandwidth (SBW, kHz), Rate of change (kHz/ms2), and quartile amplitude (Q1 – Q4, % of total energy in call). Parameters presented here for Plecotus auritus should not be taken as indicative of the species.
Species | Duration | FStart | FEnd | FCenter | EMax | SBW | RoC | Q1 | Q2 | Q3 | Q4 | Type |
Barbastella barbastellus | 2.91±1.22 | 38.76±9.04 | 27.99±6.46 | 34.45±6.89 | 34.24±6.89 | 1.29±0.65 | 0.00±0.02 | 0.16±0.14 | 0.38±0.07 | 0.35±0.17 | 0.11±0.14 | 68.5% QCF, 6% CF, |
| (7.95) | (23.26) | (15.93) | (18.95) | (19.6) | (1.72) | (0.36) | (0.81) | (0.49) | (0.48) | (0.69) | 8.5% FM, 17% K |
Eptesicus serotinus | 6.83±3.83 | 64.45±8.78 | 26.37±4.39 | 34.67±5.18 | 33.20±5.20 | 2.58±2.17 | 0.01±0.04 | 0.07±0.14 | 0.36±0.13 | 0.37±0.20 | 0.14±0.11 | 84.21% QCF, 0% CF, |
| (18.25) | (27.65) | (22.46) | (25.39) | (24.19) | (5.62) | (0.3) | (0.36) | (0.62) | (0.69) | (0.34) | 5.26% FM, 10.53% K |
Myotis bechsteinii | 1.84±0.48 | 117.57±13.78 | 37.47±3.45 | 75.37±7.5 | 71.92±4.74 | 5.81±3.01 | 0.00±0.02 | 0.07±0.05 | 0.28±0.14 | 0.39±0.15 | 0.18±0.13 | 8% QCF, 0% CF, |
| (1.61) | (52.54) | (12.06) | (23.26) | (19.38) | (14.21) | (0.37) | (0.32) | (0.39) | (0.56) | (0.39) | 92% FM, 0% K |
Myotis brandtii | 4.15±1.19 | 106.93±10.01 | 31.13±7.39 | 60.06±7.8 | 53.96±3.36 | 5.49±2.81 | 0.01±0.08 | 0.03±0.03 | 0.16±0.10 | 0.53±0.07 | 0.27±0.1 | 60% QCF, 2% CF, |
| (3.61) | (41.5) | (22.95) | (40.53) | (34.48) | (11.37) | (1.66) | (0.17) | (0.4) | (0.41) | (0.74) | 38% FM, 0% K |
Myotis daubentonii | 2.72±1.28 | 87.86±6.89 | 31.87±4.31 | 56.42±4.31 | 55.13±6.24 | 3.23±3.66 | 0.00±0.01 | 0.10±0.13 | 0.31±0.13 | 0.34±0.12 | 0.21±0.09 | 8% QCF, 0% CF, |
| (2.54) | (18.52) | (12.49) | (11.2) | (18.73) | (12.06) | (0.09) | (0.34) | (0.38) | (0.37) | (0.38) | 80% FM, 12% K |
Myotis mystacinus | 2.64±0.83 | 106.2±19.28 | 33.16±8.66 | 63.79±9.4 | 54.32±7.87 | 4.39±3.78 | 0.02±0.04 | 0.02±0.04 | 0.13±0.14 | 0.39±0.26 | 0.41±0.25 | 36.11% QCF, 0% CF, |
| (3.57) | (55.66) | (24.51) | (50.29) | (31.56) | (18.8) | (0.92) | (0.20) | (0.91) | (0.64) | (0.74) | 44.44% FM, 19.44% K |
Myotis nattereri | 3.17±3.5 | 125.49±20.97 | 24.9±2.46 | 73.24±27.88 | 60.29±27.85 | 5.17±3.2 | -0.03±0.06 | 0.01±0.02 | 0.13±0.11 | 0.45±0.17 | 0.38±0.23 | 34.94% QCF, 0% CF, |
| (6.37) | (80.1) | (25.63) | (59.76) | (64.15) | (11.41) | (1.09) | (0.38) | (0.49) | (0.61) | (0.83) | 60.24% FM, 4.82% K |
Nyctalus leisleri | 7.37±3.34 | 63.48±16.84 | 26.86±4.48 | 32.23±4.88 | 31.01±3.91 | 1.61±1.01 | 0.02±0.05 | 0.17±0.16 | 0.39±0.11 | 0.32±0.14 | 0.10±0.07 | 78.57% QCF, 1.19% CF, |
| (13.63) | (57.63) | (28.32) | (24.11) | (24.51) | (5.67) | (1.18) | (0.71) | (0.47) | (0.58) | (0.27) | 3.57% FM, 16.67% K |
Nyctalus noctula | 15.56±10.91 | 36.4±20.46 | 21.75±4.03 | 24.55±5.24 | 23.68±4.62 | 0.8±0.57 | 0.01±0.02 | 0.30±0.40 | 0.32±0.10 | 0.24±0.33 | 0.10±0.17 | 78.41% QCF, 3.41% CF, |
| (37.17) | (45.93) | (15.43) | (19.74) | (13.33) | (2.5) | (0.06) | (0.9) | (0.76) | (0.57) | (0.92) | 5.68% FM,12.5% K |
Plecotus auritus | 0.82±0.53 | 52.73±17.72 | 37.4±19.88 | 51.17±18.01 | 50.47±19.70 | 3.42±0.84 | 0.00±0.13 | 0.17±0.18 | 0.39±0.26 | 0.28±0.12 | 0.10±0.24 | 41.18% QCF, 17.65% CF, |
| (3.24) | (26.27) | (31.31) | (35.64) | (37.11) | (4.57) | (2.63) | (0.46) | (0.60) | (0.40) | (0.52) | 41.18% FM, 0% K |
Pipistrellus pipistrellus | 5.23±2.94 | 81.31±15.11 | 44.92±3.15 | 47.61±3.47 | 46.63±2.67 | 0.73±0.49 | 0.02±0.07 | 0.15±0.18 | 0.4±0.13 | 0.3±0.16 | 0.08±0.13 | 94.44% QCF, 0% CF, |
| (6.59) | (58.24) | (11.53) | (6.44) | (7.98) | (1.45) | (0.23) | (0.46) | (0.51) | (0.56) | (0.27) | 0% FM, 5.56% K |
Pipistrellus pygmaeus | 7.47±2.59 | 95.7±20.51 | 50.82±3.32 | 53.22±2.27 | 52.73±2.08 | 0.65±0.24 | 0.03±0.03 | 0.25±0.14 | 0.46±0.16 | 0.25±0.14 | 0±0.1 | 97.98% QCF, 0% CF, |
| (15.44) | (61.52) | (13.18) | (10.91) | (10.42) | (2.58) | (0.16) | (0.84) | (0.62) | (0.51) | (0.21) | 0% FM, 2.02% K |
Rhinolophus ferrumequinum | 55.93±20.69 | 67.4±2.37 | 63.95±3.66 | 82.69±0.43 | 82.26±0.27 | 0.43±0 | -0.01±0.01 | 0.17±0.09 | 0.3±0.07 | 0.34±0.1 | 0.17±0.08 | 0% QCF, 100% CF, |
| (36.99) | (8.61) | (12.92) | (2.15) | (2.58) | (0.22) | (0.03) | (0.43) | (0.34) | (0.36) | (0.33) | 0% FM, 0% K |
Rhinolophus hipposideros | 45±10.46 | 93.88±3.42 | 91.31±4.88 | 111.82±1.46 | 111.57±0.98 | 0.49±0.06 | -0.01±0.01 | 0.23±0.11 | 0.34±0.15 | 0.27±0.15 | 0.09±0.13 | 0% QCF, 97.3% CF |
| (43.94) | (20.21) | (25.94) | (6.33) | (6.14) | (0.3) | (0.02) | (0.6) | (0.54) | (0.47) | (0.62) | 0% FM, 2.7% K |
Table 3.
Base parameters measured from automatically extracted echolocation calls. Values are mean ± s.e.m. (CV). Parameters are duration (ms), start frequency (Fstart, kHz), end frequency (Fend, kHz), centre frequency (Fcentre, kHz), and frequency of maximum energy (Emax, kHz). Parameters presented here for Plecotus auritus should not be taken as indicative of the species.
Table 3.
Base parameters measured from automatically extracted echolocation calls. Values are mean ± s.e.m. (CV). Parameters are duration (ms), start frequency (Fstart, kHz), end frequency (Fend, kHz), centre frequency (Fcentre, kHz), and frequency of maximum energy (Emax, kHz). Parameters presented here for Plecotus auritus should not be taken as indicative of the species.
Species | Duration | Fstart | Fend | Fcenter | Emax |
Barbastella barbastellus | 3.17±0.77 | 39.19±0.85 | 28.6±0.79 | 35.84±0.72 | 35.63±0.74 |
| (0.43) | (0.14) | (0.15) | (0.12) | (0.12) |
Eptesicus serotinus | 7.44±1.25 | 63.26±0.8 | 28.18±0.92 | 35.03±0.88 | 33.88±0.82 |
| (0.46) | (0.1) | (0.17) | (0.15) | (0.14) |
Myotis bechsteinii | 1.9±0.28 | 115.25±1.13 | 37.95±0.45 | 74.24±0.67 | 71.52±0.58 |
| (0.21) | (0.11) | (0.07) | (0.08) | (0.07) |
Myotis brandtii | 3.99±0.46 | 106.9±0.83 | 33.16±0.93 | 61.53±0.95 | 55.16±0.84 |
| (0.23) | (0.08) | (0.16) | (0.12) | (0.11) |
Myotis daubentonii | 2.44±0.52 | 87.53±0.47 | 33.04±0.54 | 57.45±0.45 | 55.57±0.59 |
| (0.33) | (0.05) | (0.09) | (0.06) | (0.08) |
Myotis mystacinus | 2.64±0.53 | 107.81±1.22 | 35.36±1.07 | 64.39±1.29 | 56.59±1.02 |
| (0.32) | (0.12) | (0.18) | (0.16) | (0.14) |
Myotis nattereri | 3.41±1.04 | 123.55±1.57 | 25.65±0.84 | 76.48±1.88 | 66.21±2.08 |
| (0.56) | (0.14) | (0.17) | (0.21) | (0.26) |
Nyctalus leisleri | 7.94±0.99 | 58.93±1.81 | 27.67±0.94 | 32.09±0.78 | 31.02±0.72 |
| (0.35) | (0.24) | (0.18) | (0.14) | (0.13) |
Nyctalus noctula | 17.34±1.92 | 38.94±1.96 | 21.94±0.7 | 25.21±0.77 | 24.3±0.65 |
| (0.46) | (0.31) | (0.15) | (0.15) | (0.13) |
Plecotus auritus | 1.12±0.72 | 58.87±1.25 | 40.47±1.65 | 51.43±1.48 | 51.28±1.53 |
| (0.68) | (0.16) | (0.26) | (0.21) | (0.21) |
Pipistrellus pipistrellus | 5.39±0.77 | 81.16±1.51 | 45.3±0.35 | 47.5±0.29 | 46.81±0.28 |
| (0.33) | (0.17) | (0.05) | (0.04) | (0.04) |
Pipistrellus pygmaeus | 7.56±0.95 | 93.47±1.45 | 51.2±0.37 | 53.34±0.26 | 52.79±0.25 |
| (0.35) | (0.15) | (0.05) | (0.04) | (0.03) |
Rhinolophus ferrumequinum | 51.35±1.76 | 67.47±0.25 | 64.92±0.4 | 82.54±0.06 | 82.22±0.06 |
| (0.25) | (0.03) | (0.05) | (0.01) | (0.01) |
Rhinolophus hipposideros | 42.58±1.52 | 94.49±0.39 | 92.33±0.66 | 111.36±0.14 | 111.06±0.13 |
| (0.23) | (0.04) | (0.07) | (0.01) | (0.01) |
Table 4.
Correct identification rates of echolocation calls by DFA, SVM and ENN classifying to genus. Rates are rounded to the closest integer.
Table 4.
Correct identification rates of echolocation calls by DFA, SVM and ENN classifying to genus. Rates are rounded to the closest integer.
| | Correct Classification Rate (%) |
---|
Genus | Calls | DFA | SVM | ENN |
---|
Barbastella | 35 | 97 | 100 | 100 |
Eptesicus | 57 | 74 | 79 | 98 |
Myotis | 219 | 92 | 99 | 100 |
Nyctalus | 172 | 58 | 88 | 95 |
Pipistrellus | 34 | 96 | 91 | 97 |
Plecotus | 135 | 71 | 97 | 100 |
Rhinolophus | 61 | 100 | 98 | 100 |
Mean | | 81 | 93 | 98 |
Table 5.
Correct identification rates of echolocation calls to species level achieved by DFA, SVM, ENN and hierarchical ensembles (Hier. Ensembles) of networks. Rates are rounded to the closest integer.
Table 5.
Correct identification rates of echolocation calls to species level achieved by DFA, SVM, ENN and hierarchical ensembles (Hier. Ensembles) of networks. Rates are rounded to the closest integer.
| | Correct Classification Rate (%) |
---|
Species | Calls | DFA | SVM | ENN | Hier. Ensembles |
---|
B. barbastellus | 35 | 97 | 97 | 100 | 100 |
E. serotinus | 57 | 56 | 75 | 98 | 98 |
M. bechsteinii | 25 | 88 | 88 | 100 | 100 |
M. brandtii | 50 | 68 | 80 | 94 | 98 |
M. daubentonii | 25 | 80 | 80 | 100 | 96 |
M. mystacinus | 36 | 42 | 64 | 97 | 97 |
M. nattereri | 83 | 71 | 94 | 99 | 100 |
N. leisleri | 84 | 45 | 71 | 91 | 88 |
N. noctula | 88 | 73 | 86 | 99 | 98 |
P. auritus | 34 | 62 | 94 | 97 | 97 |
P. pipistrellus | 36 | 81 | 92 | 100 | 100 |
P. pygmaeus | 99 | 80 | 99 | 100 | 100 |
R. ferrumequinum | 24 | 100 | 100 | 100 | 100 |
R. hipposideros | 37 | 100 | 97 | 100 | 100 |
Mean | | 73 | 87 | 98 | 98 |
Table 6.
Comparison of Wilks' lambda values between this study (1) and a previous study [
18] (2) for each call parameter used in the DFA to classify calls to genus and species. A lower Lambda indicates a higher importance of a parameter for classification.
Table 6.
Comparison of Wilks' lambda values between this study (1) and a previous study [18] (2) for each call parameter used in the DFA to classify calls to genus and species. A lower Lambda indicates a higher importance of a parameter for classification.
Parameter | Genus(1) | Genus(2) | Species(1) | Species(2) |
---|
End frequency | 0.197 | 0.213 | 0.147 | 0.126 |
Start frequency | 0.035 | 0.221 | 0.153 | 0.139 |
Centre frequency | 0.514 | 0.214 | 0.153 | 0.147 |
Duration | 0.002 | 0.345 | 0.163 | 0.221 |
Frequency of maximum energy | 0.835 | 0.262 | 0.18 | 0.2 |
Standardised Bandwidth | 0.97 | | 0.44 | |
Call type | 1 | | 0.505 | |
Quartile 4 | 1 | | 0.551 | |
Quartile 2 | 1 | | 0.579 | |
Quartile 1 | 1 | | 0.64 | |
Quartile 3 | 1 | | 0.699 | |
RoC | 1 | | 0.982 | |
Unlike DFA and artificial neural networks, the SVMs in this study did not have to classify a call as belonging to an output group and can leave it unclassified. The optimal (i.e. achieving the highest correct identification rate) SVM for classifying calls to genus had a correct classification rate of 93% (
Table 4). All calls from
Barbastella were correctly identified, and calls from
Myotis,
Pipistrellus,
Plecotus, and
Rhinolophus were all identified correctly with more than 90% accuracy. Calls from
Eptesicus were classified relatively poorly (79%) and were most often confused with calls from
Nyctalus species (16% of calls). Out of all calls, only one call from
Rhinolophus was unclassified. The measured duration of this call was lower than the actual call duration (as determined by visual inspection of a spectrogram of the call), which caused the start frequency of the call to be measured higher than the expected value. The call type parameter also classed this call as kinked.
The best SVM for classifying calls to species achieved an 87% correct classification rate (
Table 5), with calls from
R. ferrumequinum being classified correctly 100% of the time. Calls from
B. barbastellus,
M. nattereri,
P. auritus,
P. pipistrellus,
P. pygmaeus, and
R. hipposideros were all classified with greater than 90% accuracy. Calls from
M. mystacinus were the least accurately identified with just 64% classified correctly. Calls from this species were misclassified as being from
M. brandtii (17% of calls) and
M. nattereri (8% of calls). Calls from
N. leisleri were also identified poorly with 71% correct classification. Calls were misclassified as being from
E. serotinus (11% of calls) and
N. noctula (10% of calls). Eighty percent of calls from
M. daubentonii were classified correctly while 16% were unable to be classified. A single call from
R. hipposideros was unclassified. This was the same call that could not be classified by the support vector machine classifying calls to genus.
All ENNs consisted of 21 neural networks, as there were sufficient networks that classified with greater than 50% accuracy. The ENN trained to classify calls to genus yielded a 98% correct classification rate (
Table 4). Calls from
Barbastella,
Myotis,
Pipistrellus, and
Rhinolophus were all classified with 100% accuracy. Calls from all other genera were classified with more than 90% accuracy. The lowest rate of correct classification was for calls from the genus
Nyctalus (95%); which were confused with calls of
Eptesicus.
Classification of calls to species achieved 98% accuracy (
Table 5). Calls from
B. barbastellus,
M. bechsteinii,
M. daubentonii, both
Pipistrellus species and both
Rhinolophus species were all classified with 100% accuracy. Calls from all other species were classified with more than 90% accuracy. Calls from
N. leisleri were 91% classified correctly, and were misclassified as calls of
E. serotinus (5%), and
N. noctula (4%).
The hierarchical ensemble of the best performing neural networks achieved an overall accuracy of 98% (
Table 5). Each ensemble of species classifiers within the
Myotis,
Nyctalus,
Pipistrellus, and
Rhinolophus genera consisted of 21 neural networks. Echolocation calls from
B. barbastellus,
M. bechsteinii,
M. nattereri,
P. pipistrellus,
P. pygmaeus,
R. ferrumequinum, and
R. hipposideros were all classified with 100% accuracy. Echolocation calls from
E. serotinus,
M. brandtii,
M. daubentonii,
M. mystacinus,
N. noctula and
Pl. auritus were all classified with 90% accuracy or greater. Calls from
N. leisleri were classified with the least accuracy (88%), and confused with calls from
E. serotinus (12% of calls).
4. Discussion
Parameters measured from calls of
P. auritus were different from those published using the same dataset [
21]. This discrepancy likely resulted from a difference in the parameter measurement algorithm used in the two studies. In the previous study, when a jump in the frequency of the fundamental harmonic was detected, indicative of a switch in energy to a higher harmonic, the algorithm continued mapping the frequency-time course of the fundamental. In this study, such a jump would cause the call isolation algorithm to stop prematurely and return parameters representing only a part of the call. Regardless of the true values of call parameters, as long as values are measured consistently and differ from those of other species’ calls, then the classifiers will be able to correctly identify this species. Such was the case here with more than 95% of
P. auritus calls classified correctly. However, parameters presented here should not be seen as accurately describing the calls of
P. auritus.In addition to the five base parameters [
21], this study extracted seven previously undescribed descriptors from the calls. Since the five base parameters measured from all but one species’ calls were in agreement with those published previously [
21], the increase in classification performance can be attributed to either the new parameters or the new classification algorithms (ENNs and SVM). Although the DFA in this study misclassified more calls than the previous study [
21], a DFA using only the five base parameters resulted in a classification rate for species and genus 1% and 4% lower respectively (Redgwell, unpubl. data) when compared with the DFA when 9 parameters were included. This suggests that although the new parameters do help increase correct identification rates, the use of new classification algorithms had a greater effect.
To provide consistent objective call measurements, and prevent the user adding variation to echolocation call parameters by subjective measurement, all parameter extraction algorithms used in this study were deterministic and return the same result from the same echolocation call recording regardless of the user. This is important when the classification techniques are trained on parameters extracted in a particular way with their own inherent biases. By using a common method of parameter extraction, the bias of parameter measurements remains the same as those parameters used to train the classifiers [
22]. This consistency can increase the likelihood of obtaining an accurate classification of unseen calls [
15].
Although often used in classification of calls from echolocating bats, we chose to not use interpulse interval as an additional parameter because it relies on knowing that two adjacent calls in a signal are from the same individual. This would require classification of the calls before measuring the interpulse interval. An extension to the algorithms used in this study could potentially use this measurement as a confirmation or augmentation of classification.
The classification of echolocation calls is a complex task. Recordings of echolocation calls are often only partially accurate representations of the original call produced by the bat [
23]. A multitude of environmental factors can cause variability and distortion in call recordings [
23]. Robust machine learning techniques tolerant of noisy datasets are therefore desirable. More established techniques such as DFA have been shown to perform relatively poorly when classifying the calls of some species of British bats [
21]. However more recent machine learning techniques such as SVM, neural networks, and ENN have demonstrated superior performance on highly noisy datasets [
17,
24,
25,
26,
27]. Nevertheless, the performance of these classifiers still depends on both the quality and quantity of available data.
The classification rates of calls achieved in this study using DFA was less accurate than those published previously [
21]. While both studies used the same dataset, this study extracted the parameters from the calls using a different automated method, and this likely accounts for the disparity. Any extraction method is likely to contain biases that may cause parameters measured from different species or genera to overlap, resulting in already problematic genera or species becoming more so. Alternatively, features that may help to separate cryptic (sibling) species may become more apparent in the dataset when using alternate algorithms to extract parameters, and the value of these features may be enhanced by the use of more complex classification algorithms [
17,
27].
The correct classification rates of this study exceed that of other studies that have attempted to classify bat echolocation calls. A study that used multivariate analysis to classify calls of bats in south-west England was able to reliably identify the species of bat in 83% of 5519 bat passes [
11] while a study on Swiss bats classified a wide range of species with varying degrees of accuracy using DFA and a synergetic pattern recognition algorithm [
28]. Classification rates achieved by the Swiss study for species in common with the UK were 65% using DFA and 75% using synergetic pattern recognition (after filtering, with random selection). Previous research [
21] using the same library of calls analysed in this study was able to achieve 79% correct classification rate using DFA and 87% using a neural network.
Despite the promising results of this study, the confusion between calls of species in the
Myotis genus is well known [
21,
28] and remains troublesome. This illustrates the difficulty of the task of separating calls having many dimensions of variability, and the perhaps unavoidable problem that some call recordings lack sufficient information to enable correct classification.
Of the top five parameters measured, the most important for the separation of bats to genus in this study was duration, which was the least important parameter in the previous study [
21] for classification to both genus and species. Previous Wilks’ lambda values [
21] were all below 0.4, which suggests that further parameters would be beneficial in separation of the classes. In this study, genus classification was done predominantly by duration, start frequency, end frequency and center frequency. Species classification was achieved mainly with the five base parameters. However, call type and the four quarter parameters also assisted. The rate of change parameter provided little assistance classifying species or genera.
Our SVMs and ENNs outperformed hierarchical networks [
21] and showed similar identification rates for non-
Myotis species to those obtained using speech recognition algorithms [
15]; the only other published study to use machine learning techniques to classify calls of bats. The SVM classifier achieved higher classification rates for most species and genera. The minimum classification rate achieved by the SVM at genus and species level was higher when compared with the neural networks [
21] and overall, the SVMs performed marginally better for genus classification (~1%) and species classification (2%). Support vector machines will always find the global minimum of the configuration [
29] unlike the other classifiers used in this study, since the classification algorithm is absent of local minima [
30]. Application of the minimised function onto unseen data may be less effective than other techniques however [
29], which is likely to explain why the ENNs performed better. Specifically, the SVM confused around 13% of calls from
E. serotinus with calls from both species of
Nyctalus, and about 25% of calls from
Nyctalus species with
E. serotinus. Up to 14% of calls from species within the
Myotis genus were also confused with other
Myotis species, or remained unclassified.
ENNs trained to classify calls to genus and species performed well with both achieving correct classification rates exceeding 97%. All species and genera were classified with higher rates than the neural network trained in the previous study [
21]. Importantly, the lowest correct classification rates achieved by the genus- and species-level networks were 95% and 91% respectively. Despite the hierarchical ENN containing classifiers trained for a more specific task, it produced a slightly lower overall classification rate. This contrasted with our expectation that classification rates would be increased by including classifiers in the ENN that were trained to classify calls belonging to a single genus, rather than classifiers trained to classify calls from all species. The initial weights and configuration parameters (learning rate, momentum constant, architecture etc) all predetermine the resulting classification abilities of a neural network [
19]. We speculate that even with the extensive amount of network training we carried out, the optimal configuration was not obtained. Given the number of possible configurations of the neural networks, and the amount of time to train each network, we could not test all possible network configurations.
The ENNs we trained used 21 neural networks and fortunately always provided a majority vote for a particular target. However, with 21 neural networks, it is possible to obtain a result where an input is classified as three separate species with equal votes. A simple work around for such a case would be to output the result as ambiguous. However, using a prime number of neural networks will always prevent a tie vote.
Other studies that have used classification techniques such as neural networks or support vector machines have shown similar promise in other areas of biology. For example, hidden Markov models have been used to classify bird song elements from zebra finches and indigo buntings [
31]. The task of bird song classification has similar complexity to bat call classification in that it requires a complex classifier to discriminate small differences between classes. The Markov model was able to classify more than 90% of syllables from both bird species correctly, and in many cases, correct classification exceeded 95%.
While the results from this study show promise for automated classification of bat calls, there are several caveats associated with this system. The calls in this study were all high quality recordings with relatively low amounts of background noise. In practice, field recording often produces many lower quality recordings that would confound the automated parameter extraction of this study and yield poor results. However, experienced researchers are unable to classify such poor quality recordings any better than automated algorithms [
9]. This system is also biased towards high intensity calls, which are more likely to be recorded with higher quality than low intensity calls. Relatively quiet calls are also often recorded with underestimated bandwidth [
23]. The difference in call intensity can be substantial;
P. auritus produces echolocation calls as quiet as 76 dB peSPL at 10 cm [
32], whereas
Eptesicus bottae can produce calls with an intensity of 133 dB peSPL at 10 cm [
33]. This highlights the system’s reliance on high quality recordings to accurately identify bat species. Systems have been developed that can better isolate calls from high-noise recordings [
15]. However, information lost from calls due to atmospheric attenuation will remain lost to the researcher no matter what the recording equipment or field conditions. The system used in this study has only been applied to calls from the United Kingdom to date, and other species assemblages may present calls that are inseparable by this system due to similarity. Furthermore, the number of species in this study is relatively low and adding more species would significantly increase the complexity of the classification, and therefore impact the classification rates.
The adoption of a quantitative approach to the identification of echolocation calls offers consistent and repeatable classification of unknown calls. In general, the ENN classifier approach provides a method to classify calls from a dataset that can encompass the strengths of a wide range of different classifiers, that is overall more robust to noise, and yielding more accurate classification. Classification rates of machine learning techniques should be quoted as the minimum classification rate (i.e. species with the lowest correct identification rate), since this is the most important value for researchers wishing to use an automated classifier.