Joseph Picone

Papers by Joseph Picone

A cluster-based power-efficient MAC scheme for event-driven sensing applications

Ad Hoc Networks, 2007

In developing an architecture for wireless sensor networks (WSNs) that is extensible to hundreds of thousands of heterogeneous nodes, fundamental advances in energy-efficient communication protocols must occur. In this paper, we first propose an energy-efficient and robust intra-cluster communication bit-map assisted (BMA) MAC protocol for large-scale cluster-based WSNs and then derive energy models for BMA, conventional TDMA, and energy-efficient TDMA (E-TDMA) using two different approaches. We use simulation to validate these analytical models. BMA is intended for event-driven sensing applications; that is, sensor nodes forward data to the cluster head only if significant events are observed. It has low complexity and utilizes a dynamic scheduling scheme. Clustering is a promising distributing technique used in large-scale WSNs, and when combined with an appropriate MAC scheme, high energy efficiency can be achieved. The results indicate that BMA can improve the performance of wireless sensor networks by reducing energy expenditure and packet latency. The performance of BMA as an intra-cluster MAC scheme relative to E-TDMA depends on the offered traffic load of the sensor nodes and several other key system parameters. For most sensor-based applications, the values of these parameters can be constrained such that BMA provides enhanced performance.
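The abstract does not spell out how the bit-map drives the dynamic schedule, so the following is only a toy illustration in Python under stated assumptions: each node sets one bit per round to say whether it has data, and the cluster head assigns data slots only to nodes whose bit is set. The function names, the slot-numbering rule, and the comparison with a fixed TDMA frame are hypothetical and are not taken from the paper; the sketch counts slots only and does not model energy.

```python
from typing import Dict, List


def bitmap_schedule(has_data: List[bool]) -> Dict[int, int]:
    """Map node id -> data slot index for nodes whose bit is set.

    Assumption: slots are handed out in node-id order. The paper's
    actual scheduling rule is not given in the abstract.
    """
    schedule: Dict[int, int] = {}
    for node_id, flag in enumerate(has_data):
        if flag:
            schedule[node_id] = len(schedule)  # next free slot
    return schedule


def data_slots_used(has_data: List[bool]) -> Dict[str, int]:
    """Data slots per round: fixed TDMA frame vs. bit-map schedule."""
    return {
        "tdma": len(has_data),                      # one slot per node, always
        "bitmap": len(bitmap_schedule(has_data)),   # slots only for active nodes
    }


if __name__ == "__main__":
    bitmap = [False, True, False, False, True, False]
    print(bitmap_schedule(bitmap))
    print(data_slots_used(bitmap))
```

With sparse events the bit-map schedule is much shorter than the fixed frame, which is one plausible reading of why the abstract ties the benefit to the offered traffic load.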

A phonetic vocoder

Syllable-a promising recognition unit for LVCSR

... Aravind Ganapathiraju, Vaibhava Goel, Joseph Picone, Andres Corrada, George Doddington, Katri...

Visualization of signal processing concepts

The goal for any engineering course is to convey to student engineers the knowledge and skills whic...

Continuous speech recognition using hidden Markov models

IEEE ASSP Magazine, 1990

On modeling duration in context in speech recognition

Duration in context clustering for speech recognition

Speech Communication, 1990

Applications of support vector machines to speech recognition

IEEE Transactions on Signal Processing, 2004

Syllable-based large vocabulary continuous speech recognition

IEEE Transactions on Speech and Audio Processing, 2001

Automated generation of N-best pronunciations of proper nouns

The problem of proper noun recognition is key to developing pervasive voice interfaces i...

Hybrid SVM/HMM architectures for speech recognition

... (1) $R_{emp}(\alpha) = \frac{1}{2l} \sum_{i=1}^{l} \left| y_i - f(x_i, \alpha) \right|$ Aravind Ganapathiraju and Joseph Picone Dept...
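Equation (1) in the excerpt above appears to be the empirical risk from statistical learning theory, R_emp(α) = (1/(2l)) Σ |y_i − f(x_i, α)|, summed over l training pairs. Assuming that reading, a minimal Python sketch of the computation follows; the function name and the toy sign classifier are illustrative assumptions, not part of the paper.

```python
from typing import Callable, Sequence


def empirical_risk(
    xs: Sequence[float],
    ys: Sequence[int],
    f: Callable[[float], int],
) -> float:
    """Return (1 / (2l)) * sum(|y_i - f(x_i)|) over l training pairs.

    With labels and predictions in {-1, +1}, each error contributes 2
    to the sum, so the result equals the training error rate.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    total = sum(abs(y - f(x)) for x, y in zip(xs, ys))
    return total / (2 * len(xs))


def sign_classifier(x: float) -> int:
    """Hypothetical classifier: +1 for non-negative input, else -1."""
    return 1 if x >= 0 else -1


if __name__ == "__main__":
    xs = [-2.0, -1.0, 1.0, 2.0]
    ys = [-1, 1, 1, 1]  # the second point is misclassified by the sign rule
    print(empirical_risk(xs, ys, sign_classifier))
```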

A public domain speech-to-text system

The lack of freely available state-of-the-art Speech-to-Text (STT) software has been a major hindrance to the development of new audio information processing technology. The high cost of the infrastructure required to conduct state-of-the-art speech recognition research prevents many small research groups from evaluating new ideas on large-scale tasks. The Institute for Signal and Information Processing (ISIP) has been committed to providing the research community with free software tools for digital information processing via the Internet to facilitate worldwide synergistic development of speech recognition technology. In this paper, we present the core components of an available state-of-the-art Speech-to-Text system: an acoustic processor which converts the speech signal into a sequence of feature vectors; a training module which estimates the parameters for a Hidden Markov Model; a linguistic processor which predicts the next word given a sequence of previously recognized words; and a search engine which finds the most probable word sequence given a set of feature vectors. By far, the most important component of a Speech-to-Text system is the search engine or decoder. The decoder was designed to be modular and extensible in order to be able to handle a wide variety of speech recognition problems (connected digits, studio-quality read speech and spontaneous telephone conversations) in a transparent fashion. The process of moving from a well-defined task to a less rigorously defined recognition problem (spontaneous speech recognition, e.g., Switchboard) requires the decoder to have a sophisticated control structure. Hence very few good decoders exist and the best decoders are always considered proprietary.
The ISIP decoder has the capability to compile network grammars, efficiently decode n-gram language models, generate and rescore lattices, generate N-best lists, and perform forced alignments. The decoder is based on a hierarchical Viterbi, breadth-first search tree which will support cross-word triphone acoustic models. The decoder uses lexical trees to represent the pronunciations of all words. The decoder uses beam pruning at the state, phone and word levels and limits the number of active model instances per frame to prevent the evaluation of low-scoring hypotheses. A benchmark evaluation (which does not include MLLR or vocal tract normalization) conducted on a subset of the Switchboard corpus yielded a WER of 46.1% at 30xRT. This is competitive with commercially available Speech-to-Text systems. The ISIP Speech-to-Text system currently produces mel-frequency scaled cepstral coefficients and is capable of estimating the mixture densities using Viterbi training. The design of the acoustic processor will allow other feature sets to be easily incorporated into the ISIP Speech-to-Text system. Finally, some experimental results of the complete system will be presented in this paper. For further information on the ISIP Speech-to-Text system, see http://WWW.ISIP.MsState.Edu/resources/technology/projects/speech_recognition/
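The pruning strategy described in the abstract (a beam relative to the best score, plus a cap on the number of active instances per frame) can be sketched generically in Python. This is not the ISIP decoder's code: the function name, the use of log-probability scores, and the single pruning level are assumptions made for illustration, whereas the abstract describes pruning at the state, phone and word levels.

```python
from typing import Dict


def prune(
    scores: Dict[str, float],
    beam_width: float,
    max_active: int,
) -> Dict[str, float]:
    """Keep hypotheses that survive both pruning rules for one frame.

    scores: hypothesis id -> log-probability score (higher is better).
    beam_width: drop anything scoring more than this below the best.
    max_active: hard cap on the number of surviving hypotheses.
    """
    if not scores:
        return {}
    best = max(scores.values())
    # Beam pruning: keep only hypotheses close enough to the best score.
    in_beam = {h: s for h, s in scores.items() if s >= best - beam_width}
    # Instance limit: keep at most max_active of the best survivors.
    ranked = sorted(in_beam.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:max_active])


if __name__ == "__main__":
    frame_scores = {"a": -10.0, "b": -12.0, "c": -30.0, "d": -11.0}
    print(prune(frame_scores, beam_width=5.0, max_active=2))
```

A wider beam or a larger cap retains more hypotheses per frame, which generally trades decoding speed for a lower risk of discarding the eventual best path.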

Resegmentation of SWITCHBOARD

Low rate speech coding using contour quantization

Speech recognition in a unification grammar framework

Benchmarking of FFT algorithms

Advances in alphadigit recognition using syllables

Phone-mediated word alignment for speech recognition evaluation

IEEE Transactions on Acoustics, Speech, and Signal Processing, 1990

Support vector machines for speech recognition

... non-linear transformations [16]. Schemes like linear discriminant analysis (LDA), ... Support...

Linear discriminant analysis for signal processing problems
