research-article

Pattern Matching Techniques for Replacing Missing Sections of Audio Streamed across Wireless Networks

Authors:

Jonathan Doherty, Paul McKevitt

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 6, Issue 2

Article No.: 25, Pages 1 - 38

https://doi.org/10.1145/2663358

Published: 31 March 2015

Abstract

Streaming media on the Internet can be unreliable. Services such as audio-on-demand drastically increase the load on networks; therefore, new, robust, and highly efficient coding algorithms are necessary. One approach overlooked to date, which can work alongside existing audio compression schemes, is to take into account the semantics and natural repetition of music. Similarity detection within polyphonic audio has presented difficult challenges within the field of music information retrieval. One way to deal with bursty errors is to use self-similarity to replace missing segments. Many existing systems handle packet loss and replacement at the network level, but none attempts to repair large dropouts of 5 seconds or more. Music exhibits standard structures that can be used as a forward error correction (FEC) mechanism. FEC addresses the issue of packet loss by placing the onus of repair as much as possible on the listener's device. We have developed a server-client framework (SoFI) for the automatic detection and replacement of large packet losses on wireless networks when receiving time-dependent streamed audio. Whenever a dropout occurs, SoFI switches the audio presented to the listener between the live stream and previous sections of the audio stored locally. Objective and subjective evaluations of SoFI, in which subjects were presented with other simulated approaches to audio repair as well as simulated replacements of varying lengths, give positive results.
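The abstract describes the self-similarity idea only at a high level. As a rough illustration, and explicitly not the authors' SoFI implementation, the following minimal Python sketch shows one way a client could pick locally stored audio to play during a dropout. It assumes the client keeps a history of per-frame feature vectors (for example, chroma vectors) for everything received so far. The function names `best_match_start` and `conceal_dropout` and the parameters `context_len` and `gap_len` are hypothetical. When a dropout begins, the frames heard just before it are compared with every earlier window of the same length, and the frames that followed the closest match are returned as the replacement.

```python
import math
from typing import List, Optional, Sequence

# Assumption: one feature vector per analysis frame (e.g. a chroma vector).
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match_start(history: List[Frame], context_len: int, gap_len: int) -> int:
    """Start index of the earlier window most similar to the audio heard just
    before the dropout, or -1 if the stored history is too short.

    The query is the last `context_len` frames of `history`. A candidate window
    starting at `s` is eligible only if the `gap_len` frames that followed it
    are also stored, so they can be played in place of the missing audio.
    """
    if context_len <= 0 or gap_len <= 0:
        raise ValueError("context_len and gap_len must be positive")
    last_start = len(history) - context_len - gap_len  # latest eligible start
    if last_start < 0:
        return -1
    query = history[-context_len:]
    best_start, best_cost = -1, math.inf
    # Brute-force scan; strict "<" keeps the earliest window on ties.
    for s in range(last_start + 1):
        window = history[s:s + context_len]
        cost = sum(frame_distance(w, q) for w, q in zip(window, query))
        if cost < best_cost:
            best_start, best_cost = s, cost
    return best_start


def conceal_dropout(history: List[Frame], context_len: int,
                    gap_len: int) -> Optional[List[Frame]]:
    """Frames to play in place of a `gap_len`-frame dropout, copied from the
    continuation of the best-matching earlier section, or None if unavailable."""
    start = best_match_start(history, context_len, gap_len)
    if start < 0:
        return None
    cont = start + context_len
    return list(history[cont:cont + gap_len])


if __name__ == "__main__":
    # One-dimensional toy "features": the pattern 0, 1 recurs, so the frames
    # that followed its first occurrence (2, 3) are chosen to fill the gap.
    history = [[5.0], [6.0], [0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [0.0], [1.0]]
    print(conceal_dropout(history, context_len=2, gap_len=2))  # [[2.0], [3.0]]
```

The exhaustive scan is chosen for clarity; its cost grows with the amount of stored audio, so a practical system would likely restrict or index the search. A real client would also splice in the stored audio samples corresponding to the matched frames rather than the feature vectors themselves.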

Cited By

P. Cong, N. Tam, H. Yin, B. Zheng, B. Stantic, and N. Hung. 2019. Efficient User Guidance for Validating Participatory Sensing Data. ACM Transactions on Intelligent Systems and Technology 10, 4, 1--30. https://dl.acm.org/doi/10.1145/3326164 (online publication date: 17 July 2019).

D. Tao, D. Tao, X. Li, and X. Gao. 2017. Large Sparse Cone Non-negative Matrix Factorization for Image Annotation. ACM Transactions on Intelligent Systems and Technology 8, 3, 1--21. https://dl.acm.org/doi/10.1145/2987379 (online publication date: 20 April 2017).

Index Terms

Pattern Matching Techniques for Replacing Missing Sections of Audio Streamed across Wireless Networks
1. Mathematics of computing
  1. Information theory
    1. Coding theory
2. Security and privacy
  1. Cryptography
    1. Mathematical foundations of cryptography


Information

Published In

ACM Transactions on Intelligent Systems and Technology Volume 6, Issue 2

Special Section on Visual Understanding with RGB-D Sensors

May 2015

381 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/2753829

Editor:
Huan Liu
Arizona State University


Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 March 2015

Accepted: 01 August 2014

Revised: 01 August 2014

Received: 01 July 2013

Published in TIST Volume 6, Issue 2

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total citations: 2
Total downloads: 210
Downloads (last 12 months): 2
Downloads (last 6 weeks): 2

Reflects downloads up to 17 Jan 2025.
