In this paper, we describe ongoing research towards building an automatic reading assessment system that emulates a human expert in a spoken language learning scenario. Audio recordings of English stories read aloud by children in grades 6-8 are acquired with an available tablet application that facilitates guided oral reading and recording. The recordings are uploaded to a web-based ratings panel, where they are currently evaluated by human experts on four relevant dimensions. Observations of typical learner progress patterns will form the basis of a system that applies Automatic Speech Recognition (ASR) techniques to obtain robust automatic predictions of reading fluency and word decoding accuracy.
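As a rough illustration of how an ASR transcript might be turned into decoding-accuracy and fluency numbers, the sketch below aligns the recognized words against the story text with a word-level edit distance and counts exact matches. It then reports the fraction of story words read correctly and words correct per minute (WCPM). WCPM is a common oral-reading fluency measure, but it is our assumption here and not necessarily one of the four dimensions the human experts rate. The function names `align_correct_words` and `reading_scores` are hypothetical.

```python
def align_correct_words(reference, hypothesis):
    """Count reference words matched exactly by the ASR hypothesis, using a
    Levenshtein (edit-distance) alignment over word tokens."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edits needed to align ref[:i] with hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion: word skipped
                           dp[i][j - 1] + 1,        # insertion: extra word
                           dp[i - 1][j - 1] + sub)  # match or substitution
    # Backtrack along one optimal path, counting exact matches
    i, j, correct = n, m, 0
    while i > 0 and j > 0:
        sub = 0 if ref[i - 1] == hyp[j - 1] else 1
        if dp[i][j] == dp[i - 1][j - 1] + sub:
            correct += 1 - sub
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return correct, n


def reading_scores(reference, hypothesis, duration_s):
    """Hypothetical scores: word accuracy and words correct per minute."""
    correct, total = align_correct_words(reference, hypothesis)
    accuracy = correct / total if total else 0.0
    wcpm = correct * 60.0 / duration_s
    return accuracy, wcpm
```

For example, `reading_scores("the cat sat on the mat", "the cat sit on mat", 6.0)` counts four correctly read words out of six. The substitution and the skipped word both count against accuracy. A real system would also have to handle ASR errors that are not the reader's fault, which this sketch ignores.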
where $p_1$ and $z'_1$ are, respectively, the new homogeneous coordinates of the pixel and its new depth after projection onto frame 2, and $K$ is the camera matrix. The above equation combines the scene depth obtained from the rigid motion of the scene with the additional changes due to the motions of individually movable objects. Note that the motion mask is applied only to regions of potentially movable objects, $m_1(i, j)$, as determined by the semantic segmentation model. The movable mask $m_1(i, j)$ of frame 1 thus restricts object motion relative to the scene to pixels that belong to movable objects.
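The projection equation this passage refers to did not survive extraction. The following is therefore only a sketch of a standard pinhole reprojection under stated assumptions, not the paper's exact formula. We assume a zero-skew $K$, a hypothetical rigid camera motion given by rotation `R` and translation `t`, and a hypothetical per-pixel 3D object translation `obj_motion`. Following the text, the object translation is added only where the movable mask $m_1(i, j)$ is set. The pixel is back-projected with its depth $z_1$, moved, and re-projected with $K$. The third component before normalization gives the new depth $z'_1$.

```python
def matvec(M, v):
    """3x3 matrix times 3-vector."""
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]


def invert_intrinsics(K):
    # Assumes K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (zero skew)
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return [[1.0 / fx, 0.0, -cx / fx],
            [0.0, 1.0 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]


def project_pixel(i, j, z1, K, R, t, obj_motion, movable_mask):
    """Hypothetical reprojection of pixel (row i, column j) of frame 1 into frame 2.

    Returns (new homogeneous pixel coordinates, new depth z'_1).
    """
    p1 = [float(j), float(i), 1.0]           # homogeneous pixel (x = column, y = row)
    ray = matvec(invert_intrinsics(K), p1)   # K^{-1} p1
    X = [z1 * r for r in ray]                # back-project to 3D using depth z1
    X = [a + b for a, b in zip(matvec(R, X), t)]  # rigid motion of the scene/camera
    if movable_mask[i][j]:
        # Extra object motion, applied only on pixels of movable objects
        X = [a + b for a, b in zip(X, obj_motion[i][j])]
    q = matvec(K, X)                         # project back with the camera matrix
    z_new = q[2]                             # new depth z'_1
    return [q[0] / z_new, q[1] / z_new, 1.0], z_new
```

Gating the object term with the mask mirrors the constraint described above. A pixel that the segmentation model does not label as movable can only move with the rigid scene.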
2017 Twenty-third National Conference on Communications (NCC)
Recordings of read-aloud stories by children in a school setting can be used to assess reading skills via automatic speech recognition (ASR). ASR, however, is known to be highly susceptible to background noise. The unusual variety of foreground (breath release, mic pops, etc.) and background (children playing, a distinct background talker, wind, etc.) non-speech sounds makes this application particularly challenging. Motivated by the observation that close to 50% of the recorded real-world audio consists purely of non-speech activity, we investigate robust approaches to voice activity detection that eliminate as many non-speech segments as possible prior to ASR. We exploit energy-based and harmonicity-based features, coupled with suitable temporal smoothing constraints, in a two-pass noise preprocessing system. The voice activity detection performance of the system is discussed with reference to the characteristics of the noise types.
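To make the combination of energy, harmonicity and temporal smoothing concrete, here is a minimal single-pass sketch. It is not the two-pass system described above. The assumptions are as follows:

- Harmonicity is approximated by the peak normalized autocorrelation over an assumed 80-400 Hz pitch range.
- Frames are 30 ms long with a 10 ms hop.
- The thresholds (-30 dB, 0.5) are illustrative and not taken from the paper.
- Smoothing is a simple majority (median) filter over frame decisions.

All function names are hypothetical.

```python
import math


def frame_signal(x, frame_len, hop):
    """Split a sample list into overlapping frames."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def log_energy(frame):
    """Mean-square energy in dB (relative to full scale for signals in [-1, 1])."""
    return 10.0 * math.log10(sum(v * v for v in frame) / len(frame) + 1e-12)


def harmonicity(frame, min_lag, max_lag):
    """Peak normalized autocorrelation over plausible pitch lags (a simple proxy)."""
    e0 = sum(v * v for v in frame)
    if e0 == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        best = max(best, r / e0)
    return best


def majority_smooth(flags, width):
    """Median filter on boolean decisions: keep a frame if most neighbours agree."""
    half = width // 2
    out = []
    for k in range(len(flags)):
        window = flags[max(0, k - half): k + half + 1]
        out.append(sum(window) * 2 > len(window))
    return out


def simple_vad(x, sr, energy_thresh_db=-30.0, harm_thresh=0.5, smooth=5):
    """Hypothetical frame-level speech/non-speech decisions for samples x at rate sr."""
    frame_len, hop = int(round(0.030 * sr)), int(round(0.010 * sr))
    min_lag, max_lag = int(sr / 400), int(sr / 80)  # assumed 80-400 Hz pitch range
    flags = []
    for f in frame_signal(x, frame_len, hop):
        # A frame counts as speech only if it is both loud enough and periodic
        speech = (log_energy(f) > energy_thresh_db
                  and harmonicity(f, min_lag, max_lag) > harm_thresh)
        flags.append(speech)
    return majority_smooth(flags, smooth)
```

Requiring both cues is one simple way to reject loud but aperiodic events such as mic pops or wind, as well as quiet periodic hum. The smoothing step then removes isolated single-frame flips.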
Papers by Ankita Pasad