Embedded coding using a mixed speech and audio coding paradigm

Ramprashad, Sean A.

doi:10.1007/BF02108650

Published: May 1999

Volume 2, pages 359–372, (1999)

International Journal of Speech Technology

Abstract

A two stage hybrid embedded speech/audio coding structure and algorithm are proposed. The first stage of the structure consists of a core speech coder which provides a minimum output bit rate and acceptable performance on clean speech inputs. The second stage is a perceptual/transform based coder which provides a separate optional bitstream for the enhancement of the core stage output.

The two stage structure can be used to enhance the quality of an existing codec without modification of the original coding algorithm. In this regard it can be considered a value added option that can be used with a standard (existing) system. The structure can also be used in systems in which many users/systems force the coding algorithm to work simultaneously under multiple constraints of bitrate, complexity, delay, and coding quality.

Informal testing of the algorithm has been done using ITU-T standard G.723.1 at 5.3 kb/s as the core coder. The maximum combined bitrate from the core and enhancement stages in the tests is 16 kb/s. The tests show that the second stage significantly improves the quality of the core output for music and for speech with background noise. Compared to the non-embedded, fixed-rate standard LD-CELP G.728 at 16 kb/s, the quality of the two-stage structure is generally lower on these inputs; the embedded feature does affect quality. On clean speech, the quality of the two-stage structure at 16 kb/s is close to, if not better than, that of G.728 at 16 kb/s.
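
The embedded property described in the abstract can be illustrated with a toy sketch: a coarse first stage produces a core bitstream that decodes on its own, and a second stage codes the residual into a separate, optional bitstream. This is not the algorithm of the article. Uniform scalar quantizers stand in for the G.723.1 core and the perceptual/transform enhancement coder, and the step sizes, function names, and test signal are assumptions chosen for illustration. The only figures taken from the abstract are the 5.3 kb/s core rate and the 16 kb/s combined maximum, which together leave at most 10.7 kb/s for the enhancement stream.

```python
import math

# Rates quoted in the abstract (kb/s).
CORE_KBPS = 5.3
TOTAL_KBPS = 16.0
# Upper bound on the enhancement stream implied by those two figures.
enhancement_budget_kbps = TOTAL_KBPS - CORE_KBPS

# Hypothetical step sizes for the stand-in quantizers (assumptions).
CORE_STEP = 0.25
ENH_STEP = 0.03125


def quantize(values, step):
    """Uniform scalar quantizer: map each value to an integer index."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Map integer indices back to reconstruction values."""
    return [i * step for i in indices]


def encode(frame):
    """Return two separate streams: core indices and enhancement indices."""
    core = quantize(frame, CORE_STEP)
    core_out = dequantize(core, CORE_STEP)
    # The second stage sees only what the core stage failed to represent.
    residual = [x - c for x, c in zip(frame, core_out)]
    enhancement = quantize(residual, ENH_STEP)
    return core, enhancement


def decode(core, enhancement=None):
    """Decode the core alone, or refine it when the optional stream is present."""
    out = dequantize(core, CORE_STEP)
    if enhancement is not None:
        refinement = dequantize(enhancement, ENH_STEP)
        out = [c + e for c, e in zip(out, refinement)]
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy input frame: three cycles of a sine wave over 64 samples.
frame = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
core_stream, enh_stream = encode(frame)
core_only = decode(core_stream)
enhanced = decode(core_stream, enh_stream)
```

Because the core stream is produced and decoded without reference to the enhancement stream, the first stage can remain an unmodified existing codec, which is the "value added" use the abstract describes. Dropping `enh_stream` in transit degrades the output to `core_only` rather than breaking decoding.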

Author information

Authors and Affiliations

Lucent Technologies, Bell Laboratories, USA
Sean A. Ramprashad

About this article

Cite this article

Ramprashad, S.A. Embedded coding using a mixed speech and audio coding paradigm. Int J Speech Technol 2, 359–372 (1999). https://doi.org/10.1007/BF02108650

Received: 15 September 1998
Revised: 15 October 1998
Issue Date: May 1999
DOI: https://doi.org/10.1007/BF02108650
