
Embedded coding using a mixed speech and audio coding paradigm

International Journal of Speech Technology

Abstract

A two stage hybrid embedded speech/audio coding structure and algorithm are proposed. The first stage of the structure consists of a core speech coder which provides a minimum output bit rate and acceptable performance on clean speech inputs. The second stage is a perceptual/transform based coder which provides a separate optional bitstream for the enhancement of the core stage output.
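The layering described above can be sketched with a toy example. This is only an illustration of the embedded idea, not the coder proposed in the article: the "core" below is a coarse uniform quantizer standing in for a real speech coder, the "enhancement" is a finer quantizer applied to the core's residual rather than a perceptual/transform coder, and all names and step sizes are hypothetical.

```python
# Toy illustration of a two stage embedded structure.
# Assumptions: both stages are plain uniform scalar quantizers, and the
# step sizes below are arbitrary. A real system would use a speech coder
# as the core and a perceptual/transform coder as the enhancement stage.

CORE_STEP = 0.5   # hypothetical coarse step for the stand-in core stage
ENH_STEP = 0.05   # hypothetical finer step for the stand-in enhancement stage


def quantize(samples, step):
    """Map each sample to the index of the nearest multiple of step."""
    return [round(v / step) for v in samples]


def dequantize(indices, step):
    """Reconstruct sample values from quantizer indices."""
    return [i * step for i in indices]


def encode(samples):
    """Return two separate streams: core indices and enhancement indices."""
    core_idx = quantize(samples, CORE_STEP)
    core_out = dequantize(core_idx, CORE_STEP)
    # The enhancement stage only sees what the core failed to represent.
    residual = [a - b for a, b in zip(samples, core_out)]
    enh_idx = quantize(residual, ENH_STEP)
    return core_idx, enh_idx


def decode(core_idx, enh_idx=None):
    """Decode from the core stream alone, or refine it if enh_idx is given."""
    out = dequantize(core_idx, CORE_STEP)
    if enh_idx is not None:
        refinement = dequantize(enh_idx, ENH_STEP)
        out = [a + b for a, b in zip(out, refinement)]
    return out


def max_error(reference, decoded):
    """Largest absolute sample error between two equal-length signals."""
    return max(abs(a - b) for a, b in zip(reference, decoded))
```

The point of the sketch is that `decode(core_idx)` is a valid output on its own, so the enhancement stream can be dropped anywhere in the network without touching the core stream or the core algorithm.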

The two stage structure can be used to enhance the quality of an existing codec without modification of the original coding algorithm. In this regard it can be considered a value added option that can be used with a standard (existing) system. The structure can also be used in systems in which many users/systems force the coding algorithm to work simultaneously under multiple constraints of bitrate, complexity, delay, and coding quality.

Informal testing of the algorithm has been done using ITU-T standard G.723.1 at 5.3 kb/s as a core coder. The maximum combined bitrate from the core and enhancement stages for the tests is 16 kb/s. The tests show that the second stage significantly improves the quality of the core output in the cases of music and speech with background noise. Compared to the non-embedded fixed rate standard LD-CELP G.728 at 16 kb/s, the quality of the two stage structure is generally lower on these inputs; the embedded feature does affect quality. On clean speech, the quality of the two stage structure at 16 kb/s is close to, if not better than, that of G.728 at 16 kb/s.
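The rates quoted above imply a simple bit budget, which the following sketch works out. The 5.3 kb/s and 16 kb/s figures are from the abstract and are treated as nominal values; the 30 ms frame is the G.723.1 frame length, and applying that same framing to the enhancement stage is an assumption made here for illustration only. The function names are hypothetical.

```python
# Nominal bit budget for the tested configuration.
# Assumption: the enhancement stage is budgeted on the same 30 ms frames
# as the G.723.1 core; the abstract does not state the enhancement framing.

CORE_RATE_BPS = 5300    # nominal G.723.1 low rate, from the abstract
MAX_TOTAL_BPS = 16000   # maximum combined rate used in the tests
FRAME_MS = 30           # G.723.1 frame length


def enhancement_rate_bps(total_bps, core_bps):
    """Rate left for the optional enhancement bitstream."""
    return total_bps - core_bps


def bits_per_frame(rate_bps, frame_ms):
    """Whole bits available per frame at a given nominal rate."""
    return rate_bps * frame_ms // 1000


enh_rate = enhancement_rate_bps(MAX_TOTAL_BPS, CORE_RATE_BPS)
print("enhancement rate (b/s):", enh_rate)
print("core bits per frame:", bits_per_frame(CORE_RATE_BPS, FRAME_MS))
print("enhancement bits per frame:", bits_per_frame(enh_rate, FRAME_MS))
```

Under these assumptions, roughly two thirds of the 16 kb/s total is available to the enhancement stage, which is the portion a receiver or network node may discard.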




Cite this article

Ramprashad, S.A. Embedded coding using a mixed speech and audio coding paradigm. Int J Speech Technol 2, 359–372 (1999). https://doi.org/10.1007/BF02108650

