Detecting stress in spoken English using Decision Trees and Support Vector Machines

Authors:

Mengjie Zhang, and

Paul Warren

ACSW Frontiers '04: Proceedings of the second workshop on Australasian information security, Data Mining and Web Intelligence, and Software Internationalisation - Volume 32

January 2004

Pages 145 - 150

Published: 01 January 2004

Abstract

This paper describes an approach to the detection of stress in spoken New Zealand English. After identifying the vowel segments of the speech signal, the approach extracts two different sets of features - prosodic features and vowel quality features - from the vowel segments. These features are then normalised and scaled to obtain speaker-independent feature values that can be used to classify each vowel segment as stressed or unstressed. We used Decision Trees (C4.5) and Support Vector Machines (LIBSVM) to learn stress-detecting classifiers with various combinations of the features. The approach was evaluated on 60 adult female utterances containing 703 vowels, and a maximum accuracy of 84.72% was achieved. The results showed that a combination of features derived from duration and amplitude achieved the best performance, but the vowel quality features also achieved reasonable results.
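The normalise, scale, and classify steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy vowel data, the per-speaker z-score normalisation, the min-max scaling, and all function names are assumptions made for illustration, and a single-split decision stump stands in for the much richer C4.5 and LIBSVM learners used in the paper.

```python
from statistics import mean, pstdev

# Hypothetical toy data: (speaker, duration_ms, amplitude_db, is_stressed).
# Speaker s2 is slower and louder overall, so raw values overlap across speakers.
VOWELS = [
    ("s1", 140.0, 72.0, True),
    ("s1", 60.0, 61.0, False),
    ("s1", 120.0, 70.0, True),
    ("s1", 70.0, 63.0, False),
    ("s2", 210.0, 80.0, True),
    ("s2", 130.0, 69.0, False),
    ("s2", 190.0, 78.0, True),
    ("s2", 125.0, 71.0, False),
]


def normalise_per_speaker(rows):
    """Z-score each feature within a speaker (an assumed normalisation)."""
    by_speaker = {}
    for speaker, dur, amp, label in rows:
        by_speaker.setdefault(speaker, []).append((dur, amp, label))
    out = []
    for items in by_speaker.values():
        durs = [d for d, _, _ in items]
        amps = [a for _, a, _ in items]
        d_mu, d_sd = mean(durs), pstdev(durs) or 1.0
        a_mu, a_sd = mean(amps), pstdev(amps) or 1.0
        for dur, amp, label in items:
            out.append(((dur - d_mu) / d_sd, (amp - a_mu) / a_sd, label))
    return out


def scale_unit_range(rows):
    """Min-max scale each feature column to [0, 1] across all speakers."""
    n_features = 2
    lows = [min(r[i] for r in rows) for i in range(n_features)]
    highs = [max(r[i] for r in rows) for i in range(n_features)]
    scaled = []
    for r in rows:
        feats = tuple(
            (r[i] - lows[i]) / ((highs[i] - lows[i]) or 1.0)
            for i in range(n_features)
        )
        scaled.append((feats, r[2]))
    return scaled


def train_stump(samples):
    """Choose the (feature, threshold) split with the fewest training errors."""
    best = None
    for i in range(len(samples[0][0])):
        for feats, _ in samples:
            threshold = feats[i]
            errors = sum((f[i] >= threshold) != label for f, label in samples)
            if best is None or errors < best[0]:
                best = (errors, i, threshold)
    return best[1], best[2]


def predict(stump, feats):
    """Label a vowel as stressed when its chosen feature reaches the threshold."""
    feature, threshold = stump
    return feats[feature] >= threshold


samples = scale_unit_range(normalise_per_speaker(VOWELS))
stump = train_stump(samples)
accuracy = sum(predict(stump, f) == y for f, y in samples) / len(samples)
print(f"split on feature {stump[0]} at {stump[1]:.2f}, accuracy {accuracy:.2%}")
```

In this toy data no single raw duration threshold separates the two classes across both speakers, which is the kind of speaker dependence that normalisation is meant to remove. The accuracy printed here is measured on the training samples only and says nothing about the results reported in the paper.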

Cited By

Nagy, P. & Németh, G. (2016), 'Improving HMM speech synthesis of interrogative sentences by pitch track transformations', Speech Communication 82(C), 97--112. Online publication date: 1-Sep-2016. https://dl.acm.org/doi/10.1016/j.specom.2016.06.005

Xie, H., Zhang, M. & Andreae, P. (2006), 'Genetic programming for automatic stress detection in spoken english', in Proceedings of the 2006 international conference on Applications of Evolutionary Computing, pp. 460--471. Online publication date: 10-Apr-2006. https://dl.acm.org/doi/10.1007/11732242_41

van Dalen, R., Wiggers, P. & Rothkrantz, L. (2005), 'Modelling lexical stress', in Proceedings of the 8th international conference on Text, Speech and Dialogue, pp. 211--218. Online publication date: 12-Sep-2005. https://dl.acm.org/doi/10.1007/11551874_27

Index Terms

Detecting stress in spoken English using Decision Trees and Support Vector Machines
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition

Information

Published in: ACSW Frontiers '04 proceedings, Volume 32 (192 pages)

Publisher: Australian Computer Society, Inc., Australia

Conference: ACSW Frontiers '04, Dunedin, New Zealand, 01 January 2004

Acceptance Rates

Overall Acceptance Rate 204 of 424 submissions, 48%
