Abstract
A two-stage hybrid embedded speech/audio coding structure and algorithm are proposed. The first stage is a core speech coder that provides a minimum output bit rate and acceptable performance on clean speech inputs. The second stage is a perceptual, transform-based coder that provides a separate, optional bitstream for enhancing the core-stage output.
The two-stage structure can be used to enhance the quality of an existing codec without modifying the original coding algorithm. In this regard it can be considered a value-added option for a standard (existing) system. The structure can also be used where many users or systems force the coding algorithm to operate simultaneously under multiple constraints on bit rate, complexity, delay, and coding quality.
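The embedded property described above can be illustrated with a toy sketch. This is not the proposed algorithm: two uniform scalar quantizers stand in for the core speech coder and the perceptual/transform enhancement coder, and every name and step size here (`EmbeddedFrame`, `CORE_STEP`, `ENH_STEP`, `encode`, `decode`) is a hypothetical chosen for illustration. The only point shown is the layering: the second stage codes what the first stage left behind, and the decoder still produces output when the enhancement layer is missing.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical step sizes; a real core coder (e.g. a CELP coder) and a real
# transform coder are far more elaborate than a uniform quantizer.
CORE_STEP = 0.25      # coarse stage, stands in for the core speech coder
ENH_STEP = 0.03125    # fine stage, stands in for the enhancement coder


@dataclass
class EmbeddedFrame:
    core: List[int]                           # always transmitted
    enhancement: Optional[List[int]] = None   # optional, may be dropped


def quantize(values: List[float], step: float) -> List[int]:
    return [round(v / step) for v in values]


def dequantize(indices: List[int], step: float) -> List[float]:
    return [i * step for i in indices]


def encode(samples: List[float]) -> EmbeddedFrame:
    # Stage 1: core layer, coded without any knowledge of stage 2.
    core_idx = quantize(samples, CORE_STEP)
    core_out = dequantize(core_idx, CORE_STEP)
    # Stage 2: code the residual left by the core stage as a separate layer.
    residual = [s - c for s, c in zip(samples, core_out)]
    return EmbeddedFrame(core=core_idx, enhancement=quantize(residual, ENH_STEP))


def decode(frame: EmbeddedFrame) -> List[float]:
    out = dequantize(frame.core, CORE_STEP)
    if frame.enhancement is not None:
        refinement = dequantize(frame.enhancement, ENH_STEP)
        out = [c + r for c, r in zip(out, refinement)]
    return out


def max_abs_error(a: List[float], b: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))
```

Decoding `EmbeddedFrame(core=frame.core)` corresponds to a receiver that only sees the core bitstream, while decoding the full frame corresponds to a receiver that also gets the enhancement bitstream. Because the core layer is produced independently of the second stage, an existing core codec would not need to change.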
Informal testing of the algorithm has been done using ITU-T standard G.723.1 at 5.3 kb/s as the core coder. The maximum combined bit rate of the core and enhancement stages in the tests is 16 kb/s. The tests show that the second stage significantly improves the quality of the core output for music and for speech with background noise. Compared with the non-embedded, fixed-rate LD-CELP standard G.728 at 16 kb/s, the quality of the two-stage structure is generally lower on these inputs; the embedded feature does come at a cost in quality. On clean speech, the quality of the two-stage structure at 16 kb/s is close to, if not better than, that of G.728 at 16 kb/s.
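The bit rates quoted above imply a simple budget split between the two layers. The sketch below works it out per frame, assuming the enhancement stage is framed on the same 30 ms frame that G.723.1 uses; the source does not state the enhancement framing, so `FRAME_MS` and the helper `bits_per_frame` are illustrative assumptions, and the rates are the nominal figures from the text.

```python
CORE_KBPS = 5.3     # nominal core rate from the text (G.723.1 low-rate mode)
TOTAL_KBPS = 16.0   # maximum combined rate used in the tests
FRAME_MS = 30.0     # assumption: enhancement shares the 30 ms G.723.1 frame


def bits_per_frame(kbps: float, frame_ms: float) -> float:
    # kbit/s times ms gives bits: (1000 bit/s) * (0.001 s) = 1 bit
    return kbps * frame_ms


# Rate left over for the enhancement layer at the maximum combined rate.
enh_kbps = TOTAL_KBPS - CORE_KBPS

core_bits = bits_per_frame(CORE_KBPS, FRAME_MS)
total_bits = bits_per_frame(TOTAL_KBPS, FRAME_MS)
enh_bits = total_bits - core_bits

if __name__ == "__main__":
    print(f"enhancement rate: up to {enh_kbps:.1f} kb/s")
    print(f"per {FRAME_MS:.0f} ms frame: core {core_bits:.0f} bits, "
          f"enhancement up to {enh_bits:.0f} bits, total {total_bits:.0f} bits")
```

Under these assumptions, roughly two thirds of the 16 kb/s budget is available to the enhancement stage, which fits the report that the second stage matters most on music and noisy speech, where the core speech coder is weakest.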
Cite this article
Ramprashad, S.A. Embedded coding using a mixed speech and audio coding paradigm. Int J Speech Technol 2, 359–372 (1999). https://doi.org/10.1007/BF02108650