In developing an architecture for wireless sensor networks (WSNs) that is extensible to hundreds of thousands of heterogeneous nodes, fundamental advances in energy-efficient communication protocols must occur. In this paper, we first propose an energy-efficient and robust bit-map assisted (BMA) MAC protocol for intra-cluster communication in large-scale cluster-based WSNs and then derive energy models for BMA, conventional TDMA, and energy-efficient TDMA (E-TDMA) using two different approaches. We use simulation to validate these analytical models. BMA is intended for event-driven sensing applications; that is, sensor nodes forward data to the cluster head only if significant events are observed. It has low complexity and uses a dynamic scheduling scheme. Clustering is a promising distributed technique used in large-scale WSNs, and when combined with an appropriate MAC scheme, it can achieve high energy efficiency. The results indicate that BMA can improve the performance of wireless sensor networks by reducing energy expenditure and packet latency. The performance of BMA as an intra-cluster MAC scheme relative to E-TDMA depends on the offered traffic load of the sensor nodes and several other key system parameters. For most sensor-based applications, the values of these parameters can be constrained such that BMA provides enhanced performance.
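The difference between fixed and event-driven slot assignment can be illustrated with a minimal sketch. The abstract does not spell out the frame structure, so the following is an assumption made purely for illustration: each node sets one bit in a bitmap when it has observed an event, and the cluster head grants data slots only to those nodes. The function names `bma_schedule` and `data_slots` are hypothetical, and the sketch counts slots only; it does not reproduce the paper's energy models.

```python
from typing import List


def bma_schedule(has_data: List[bool]) -> List[int]:
    """Return the indices of nodes granted a data slot, in node order.

    Assumption: has_data[i] is the one-bit flag node i reported to the
    cluster head for the current frame.
    """
    return [node for node, flag in enumerate(has_data) if flag]


def data_slots(has_data: List[bool], scheme: str) -> int:
    """Count data slots reserved in one frame under each scheme."""
    if scheme == "tdma":
        # Conventional TDMA reserves a slot for every node, busy or idle.
        return len(has_data)
    if scheme == "bma":
        # The bitmap-driven schedule reserves slots only for nodes with data.
        return len(bma_schedule(has_data))
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    bitmap = [False, True, False, True]
    print(bma_schedule(bitmap))        # nodes that transmit this frame
    print(data_slots(bitmap, "tdma"))  # slots reserved under fixed TDMA
    print(data_slots(bitmap, "bma"))   # slots reserved under the bitmap schedule
```

The sketch ignores the cost of collecting the bitmap itself, which is one reason the abstract notes that the comparison with E-TDMA depends on traffic load and other system parameters.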
... Aravind Ganapathiraju, Vaibhava Goel, Joseph Picone, Andres Corrada, George Doddington, Katrin Kirchhoff, Mark Ordowski, Barbara Wheatley ... Four passes of Baum-Welch reestimation were used to generate single-component mixture distributions for the triphone models. ...
The goal for any engineering course is to convey to student engineers the knowledge and skills which will make them valuable to industry. The job market calls for engineers with a broad understanding of the mathematics and theory underlying engineering concepts, but also, engineers who ...
The problem of proper noun recognition is key to developing pervasive voice interfaces in applications such as directory assistance and data entry for telecommunications. Recognition of such words requires an ability to generate reasonably ...
... $R_{emp}(\alpha) = \frac{1}{2l} \sum_{i=1}^{l} \left| y_i - f(x_i, \alpha) \right|$ (1) ... Aravind Ganapathiraju and Joseph Picone, Dept. of Elec. and Computer Engr. ... [5] B. Schölkopf et al., Advances in Kernel Methods: Support Vector Machines, MIT Press, Cambridge, MA, USA, December 1998. ...
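As a small illustration of equation (1), the empirical risk can be computed directly from labeled samples. This sketch assumes the usual reading of the formula, in which labels and predictions take values in {-1, +1} and the summand is the absolute difference between them; the function name `empirical_risk` and the toy classifier are hypothetical and not taken from the paper.

```python
from typing import Callable, Sequence


def empirical_risk(
    xs: Sequence[float],
    ys: Sequence[int],
    f: Callable[[float], int],
) -> float:
    """Mean of |y_i - f(x_i)| over l samples, scaled by 1/2.

    With labels and predictions in {-1, +1}, each error contributes
    |y - f(x)| = 2, so the result equals the fraction of misclassified
    samples.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    l = len(xs)
    return sum(abs(y - f(x)) for x, y in zip(xs, ys)) / (2 * l)


if __name__ == "__main__":
    # Toy classifier (an assumption for illustration): the sign of the input.
    sign = lambda x: 1 if x >= 0 else -1
    xs = [-2.0, -1.0, 1.0, 2.0]
    ys = [-1, -1, 1, -1]  # the last sample is mislabeled relative to sign()
    print(empirical_risk(xs, ys, sign))
```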
The lack of freely available state-of-the-art Speech-to-Text (STT) software has been a major hindrance to the development of new audio information processing technology. The high cost of the infrastructure required to conduct state-of-the-art speech recognition research prevents many small research groups from evaluating new ideas on large-scale tasks. The Institute for Signal and Information Processing (ISIP) has been committed to providing the research community with free software tools for digital information processing via the Internet to facilitate worldwide synergistic development of speech recognition technology. In this paper, we present the core components of an available state-of-the-art Speech-to-Text system: an acoustic processor which converts the speech signal into a sequence of feature vectors; a training module which estimates the parameters for a Hidden Markov Model; a linguistic processor which predicts the next word given a sequence of previously recognized words; and a search engine which finds the most probable word sequence given a set of feature vectors. By far, the most important component of a Speech-to-Text system is the search engine or decoder. The decoder was designed to be modular and extensible in order to handle a wide variety of speech recognition problems (connected digits, studio-quality read speech and spontaneous telephone conversations) in a transparent fashion. The process of moving from a well-defined task to a less rigorously defined recognition problem (spontaneous speech recognition, e.g., Switchboard) requires the decoder to have a sophisticated control structure. Hence very few good decoders exist and the best decoders are always considered proprietary.
The ISIP decoder has the capability to compile network grammars, efficiently decode n-gram language models, generate and rescore lattices, generate N-best lists, and perform forced alignments. The decoder is based on a hierarchical Viterbi, breadth-first search tree which will support cross-word triphone acoustic models. The decoder uses lexical trees to represent the pronunciations of all words. The decoder uses beam pruning at the state, phone and word levels and limits the number of active model instances per frame to prevent the evaluation of low-scoring hypotheses. A benchmark evaluation (which does not include MLLR or vocal tract normalization) conducted on a subset of the Switchboard corpus yielded a WER of 46.1% at 30xRT. This is competitive with commercially available Speech-to-Text systems. The ISIP Speech-to-Text system currently produces mel-frequency scaled cepstral coefficients and is capable of estimating the mixture densities using Viterbi training. The design of the acoustic processor will allow other feature sets to be easily incorporated into the ISIP Speech-to-Text system. Finally, some experimental results of the complete system will be presented in this paper. Further information on the ISIP Speech-to-Text system is available at the following URL: http://WWW.ISIP.MsState.Edu/resources/technology/projects/speech_recognition/ .
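The two pruning ideas mentioned above, a beam relative to the best score and a cap on the number of active instances per frame, can be sketched generically as follows. This is not the ISIP decoder's code: the function name `prune`, the dictionary of log scores, and the single pruning level are assumptions made for illustration, whereas the abstract describes pruning at the state, phone and word levels.

```python
import heapq
from typing import Dict


def prune(
    hyps: Dict[str, float],
    beam_width: float,
    max_active: int,
) -> Dict[str, float]:
    """Keep hypotheses that survive both pruning steps for one frame.

    hyps maps a hypothesis label to its log score (higher is better).
    Step 1 (beam): drop anything more than beam_width below the best score.
    Step 2 (cap): keep at most max_active of the survivors, best first.
    """
    if not hyps:
        return {}
    best = max(hyps.values())
    within_beam = {h: s for h, s in hyps.items() if s >= best - beam_width}
    top = heapq.nlargest(max_active, within_beam.items(), key=lambda kv: kv[1])
    return dict(top)


if __name__ == "__main__":
    frame = {"a": -10.0, "b": -12.0, "c": -30.0, "d": -11.0}
    # "c" falls outside the beam; the cap then keeps the two best survivors.
    print(sorted(prune(frame, beam_width=5.0, max_active=2)))
```

A tighter beam or a smaller cap reduces the work per frame at the risk of discarding the hypothesis that would eventually have scored best, which is the usual speed and accuracy trade-off in this kind of search.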
... non-linear transformations [16]. Schemes like linear discriminant analysis (LDA), ... Support Vector Machines (SVMs), a discriminative machine learning technique which is the basis of this dissertation, also falls ... related to this discrimination quantity. ...
Papers by Joseph Picone