Pattern Matching Techniques for Replacing Missing Sections of Audio Streamed across Wireless Networks

Published: 31 March 2015

Abstract

Streaming media on the Internet can be unreliable. Services such as audio-on-demand drastically increase the load on networks; therefore, new, robust, and highly efficient coding algorithms are necessary. One method overlooked to date, which can work alongside existing audio compression schemes, takes into account the semantics and natural repetition of music. Similarity detection within polyphonic audio has posed difficult challenges within the field of music information retrieval. One approach to dealing with bursty errors is to use self-similarity to replace missing segments. Many existing systems address packet loss and replacement at the network level, but none attempt to repair large dropouts of 5 seconds or more. Music exhibits standard structures that can be used as a forward error correction (FEC) mechanism. FEC is an area that addresses the issue of packet loss with the onus of repair placed as much as possible on the listener's device. We have developed a server-client-based framework (SoFI) for automatic detection and replacement of large packet losses on wireless networks when receiving time-dependent streamed audio. Whenever dropouts occur, SoFI swaps the audio presented to the listener between a live stream and previous sections of the audio stored locally. Objective and subjective evaluations of SoFI give positive results; in these evaluations, subjects were presented with other simulated approaches to audio repair, together with simulated replacements in which the length of the repair was varied.
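The idea of using self-similarity to fill a dropout from locally stored audio can be illustrated with a minimal sketch. This is not the SoFI implementation, which the abstract does not specify. It assumes the client keeps one feature vector per received audio frame (the feature type is left open), and it assumes cosine distance as the similarity measure. The names `cosine_distance`, `find_replacement_start`, `context_len`, and `gap_len` are hypothetical. The sketch compares the frames heard just before the dropout with every earlier position in the stored history and returns the index from which replacement frames could be played.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two feature vectors (0.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat silent (all-zero) frames as dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def find_replacement_start(history, context_len, gap_len):
    """Return the index in `history` where replacement frames begin, or None.

    history     -- feature vectors of frames received before the dropout
    context_len -- number of frames before the dropout used as the query
    gap_len     -- number of missing frames that need replacing
    """
    n = len(history)
    if n < context_len:
        return None
    context = history[n - context_len:]
    # A candidate must leave room for gap_len stored frames after its context.
    last_start = n - context_len - gap_len
    best_start, best_cost = None, float("inf")
    for s in range(0, last_start + 1):
        cost = sum(
            cosine_distance(history[s + i], context[i])
            for i in range(context_len)
        ) / context_len
        if cost < best_cost:
            best_cost, best_start = cost, s + context_len
    return best_start


# Toy stream: a four-frame "verse", a two-frame "bridge", then the verse
# starts again and the stream drops out after its second frame.
verse = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
bridge = [[0, 1, 1], [1, 0, 1]]
history = verse + bridge + verse[:2]

start = find_replacement_start(history, context_len=2, gap_len=2)
replacement = history[start:start + 2]
```

In this toy stream the frames before the dropout match the opening of the first verse, so the replacement is taken from the remainder of that verse. A real client would also need a similarity threshold below which no repair is attempted, and some smoothing at the splice points; both are omitted here.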

      Published In

      ACM Transactions on Intelligent Systems and Technology  Volume 6, Issue 2
      Special Section on Visual Understanding with RGB-D Sensors
      May 2015
      381 pages
      ISSN:2157-6904
      EISSN:2157-6912
      DOI:10.1145/2753829
      • Editor:
      • Huan Liu

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 31 March 2015
      Accepted: 01 August 2014
      Revised: 01 August 2014
      Received: 01 July 2013
      Published in TIST Volume 6, Issue 2

      Author Tags

      1. Streaming media
      2. audio repair
      3. data compaction and compression
      4. forward error correction

      Qualifiers

      • Research-article
      • Research
      • Refereed
