Ankita Pasad

Papers by Ankita Pasad

SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech

arXiv (Cornell University), Nov 19, 2021

On the Use of External Data for Spoken Named Entity Recognition

arXiv (Cornell University), Dec 14, 2021

On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval

arXiv (Cornell University), Apr 24, 2019

What do self-supervised speech models know about words?

arXiv (Cornell University), Jun 30, 2023

Layer-wise Analysis of a Self-supervised Speech Representation Model

arXiv (Cornell University), Jul 9, 2021

Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning

Findings of the Association for Computational Linguistics: EMNLP 2022

SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Comparative Layer-Wise Analysis of Self-Supervised Speech Models

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks

arXiv (Cornell University), Dec 20, 2022

On the Use of External Data for Spoken Named Entity Recognition

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

SLUE: New Benchmark Tasks For Spoken Language Understanding Evaluation on Natural Speech

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Layer-Wise Analysis of a Self-Supervised Speech Representation Model

2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021

Automatic Assessment of Reading with Speech Recognition Technology

In this paper, we describe ongoing research towards building an automatic reading assessment system that emulates a human expert in a spoken language learning scenario. Audio recordings of read-aloud English stories by children in grades 6-8 are acquired on an available tablet application that facilitates guided oral reading and recording. The created recordings, uploaded to a web-based ratings panel, are currently evaluated by human experts on four relevant dimensions. Observations of typical learner progress patterns will form the basis of a system that applies Automatic Speech Recognition (ASR) techniques to obtain robust automatic predictions of reading fluency and word decoding accuracy.

Taskology: Utilizing Task Relations at Scale (Supplemental Material)

where p1 and z′1 are respectively the new homogeneous coordinates of the pixel and the new depth, projected onto frame 2, and K is the camera matrix. The above equation consists of the scene depth, as obtained by rigid motion of the scene and the additional changes obtained from the motions of the individually movable objects. Note that the motion mask is only applied to regions of potentially movable objects m1(i, j), determined by the semantic segmentation model. The movable mask m1(i, j) (of frame 1) restricts motion of objects relative to the scene to occur only at pixels that belong to movable objects.

Taskology: Utilizing Task Relations at Scale

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos

On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval

Interspeech 2019

Voice activity detection for children's read speech recognition in noisy conditions

2017 Twenty-third National Conference on Communications (NCC)

Recordings of read-aloud stories by children in a school setting can be used to provide an assessment of reading skills via automatic speech recognition (ASR). ASR, however, is known to be highly susceptible to background noise. The unusual variety of foreground (breath release, mic pops, etc.) and background (children playing, distinct background talker, wind, etc.) non-speech sounds makes this application particularly challenging. Motivated by the observation on real-world data that close to 50% of the recorded audio comprises purely non-speech activity, we investigate robust approaches to voice activity detection to eliminate non-speech segments to the extent possible prior to ASR. We have exploited energy-based and harmonicity-based features coupled with suitable temporal smoothing constraints in a two-pass noise preprocessing system. A discussion of the voice activity detection performance of the system is presented with reference to the characteristics of the noise types.
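
As a rough illustration of the energy-plus-smoothing idea mentioned in the abstract above, the following is a minimal sketch only, not the paper's system. It omits the harmonicity features and the two-pass structure entirely. The function names, the frame sizes (25 ms windows with a 10 ms hop at an assumed 16 kHz sampling rate), the 30 dB margin below the peak frame energy, and the 5-frame majority vote are all hypothetical choices made for this example.

```python
import numpy as np


def frame_energy_db(signal, frame_len=400, hop=160):
    """Short-time log energy (dB) of each frame; sizes are illustrative."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        # Small constant avoids log(0) on digitally silent frames.
        energies[i] = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
    return energies


def smooth_majority(decisions, window=5):
    """Temporal smoothing: majority vote over an odd-length sliding window.

    Frames beyond the edges count as non-speech (zero padding).
    """
    kernel = np.ones(window) / window
    votes = np.convolve(decisions.astype(float), kernel, mode="same")
    return votes > 0.5


def energy_vad(signal, margin_db=30.0, window=5):
    """Mark a frame as speech if its energy is within margin_db of the peak.

    The threshold rule is an assumption for this sketch, not the paper's.
    """
    energies = frame_energy_db(signal)
    raw = energies > (energies.max() - margin_db)
    return smooth_majority(raw, window)
```

Calling `energy_vad` on a mono waveform stored as a float NumPy array returns one boolean per frame. A threshold relative to the peak keeps the sketch independent of recording level, but it would likely struggle with the loud non-speech events (mic pops, background talkers) that the abstract highlights, which is presumably why the authors combine energy with harmonicity cues.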