Abstract
Research into automated systems for detecting and classifying marine mammals in acoustic recordings is expanding internationally, driven by the need to analyze large collections of data for conservation purposes. In this work, we present a Convolutional Neural Network (CNN) that classifies acoustic recordings into five classes: the vocalizations of three whale species, non-biological noise sources, and ambient noise. The classifier can therefore detect both the presence and the absence of whale vocalizations in a recording. Through transfer learning, we show that the classifier learns high-level representations and can generalize to additional species. We also propose a novel representation of acoustic signals that builds upon the commonly used spectrogram by interpolating and stacking multiple spectrograms produced with different Short-time Fourier Transform (STFT) parameters. The proposed representation is particularly effective for marine mammal species classification, where the acoustic events to be classified are sensitive to the STFT parameters.
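The stacked representation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the window lengths, hop size, output grid, log compression, and the function names `spectrogram`, `resize_linear`, and `stacked_spectrograms` are all assumptions chosen for illustration. The sketch computes one magnitude spectrogram per STFT window length, linearly interpolates each onto a common frequency-time grid, and stacks the results as channels, much like the colour channels of an image.

```python
import numpy as np


def spectrogram(signal, n_fft, hop):
    """Magnitude spectrogram from a Hann-windowed STFT, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def resize_linear(image, out_shape):
    """Resample a 2-D array onto out_shape using separable linear interpolation."""
    rows, cols = image.shape
    new_r = np.linspace(0, rows - 1, out_shape[0])
    new_c = np.linspace(0, cols - 1, out_shape[1])
    # Interpolate along the frequency axis first, then along the time axis.
    tmp = np.stack([np.interp(new_r, np.arange(rows), image[:, c]) for c in range(cols)], axis=1)
    return np.stack([np.interp(new_c, np.arange(cols), tmp[r, :]) for r in range(out_shape[0])], axis=0)


def stacked_spectrograms(signal, n_ffts=(256, 1024, 4096), out_shape=(128, 128)):
    """Stack spectrograms from several window lengths into shape (len(n_ffts), *out_shape)."""
    channels = []
    for n_fft in n_ffts:
        # Hop of n_fft // 4 (75% overlap) is an illustrative choice, not taken from the paper.
        spec = spectrogram(signal, n_fft, hop=n_fft // 4)
        # log1p compresses the dynamic range; also an assumption of this sketch.
        channels.append(resize_linear(np.log1p(spec), out_shape))
    return np.stack(channels)


# Synthetic two-second upsweep standing in for a recording (hypothetical data).
fs = 8000
t = np.arange(2 * fs) / fs
chirp = np.sin(2 * np.pi * (100 * t + 50 * t ** 2))
features = stacked_spectrograms(chirp)
print(features.shape)  # (3, 128, 128)
```

Short windows give fine time resolution and long windows give fine frequency resolution, so each channel emphasises a different view of the same acoustic event. Every channel spans 0 to fs/2 in frequency, which makes the frequency axes line up after interpolation; the time axes are only approximately aligned because frames are not centred in this sketch.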
Stan Matwin’s research is supported by the Natural Sciences and Engineering Research Council and by the Canada Research Chairs program.
Acknowledgements
Collaboration between researchers at JASCO Applied Sciences and Dalhousie University was made possible through a Natural Sciences and Engineering Research Council Engage Grant. The acoustic recordings described in this paper were collected by JASCO Applied Sciences under a contribution agreement with the Environmental Studies Research Fund.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Thomas, M., Martin, B., Kowarski, K., Gaudet, B., Matwin, S. (2020). Marine Mammal Species Classification Using Convolutional Neural Networks and a Novel Acoustic Representation. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11908. Springer, Cham. https://doi.org/10.1007/978-3-030-46133-1_18
DOI: https://doi.org/10.1007/978-3-030-46133-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46132-4
Online ISBN: 978-3-030-46133-1
eBook Packages: Computer Science (R0)