Professor Diane Brentari is one of three directors of the Center for Gesture, Sign, and Language. Her current work addresses cross-linguistic variation among sign languages, particularly in the parameters of handshape and movement. She is also interested in how the mental lexicon emerges in historical time, including the relationship between gesture, homesign systems, and well-established sign languages. In addition, Brentari has developed the Prosodic Model of sign language phonology, and her work addresses the prosodic structure of signed and spoken languages.

Phone: 773-702-5725
Address: Linguistics Department, University of Chicago, 1115 E. 58th Street, Chicago, IL 60637
Responses to stimulus vignettes with an agent (video example 1) and without an agent (video example 2).
How does the language ecology affect the speed of the emergence of phonology?
BACKGROUND: [1, 2]
METHODS:
Participants: 25 signers
- 12 signers of Central Taurus Sign Language (CTSL): CTSL cohorts 1, 2, and 3 (4 signers each)
- 13 signers from Nicaragua: homesigners (4), and Nicaraguan Sign Language (NSL): NSL cohort 1 (4), NSL cohort 2 (5)
Types of interaction/input:
- ±Horizontal contact: does the person sign with other signers?
- ±Vertical contact: does the person have a language model from the previous cohort?
- Esogenic: homogeneous community membership
- Exogenic: heterogeneous community membership
Does knowledge of language transfer across language modalities? For example, can speakers who have had no sign language experience spontaneously project grammatical principles of English to American Sign Language (ASL) signs? To address this question, here, we explore a grammatical illusion. Using spoken language, we first show that a single word with doubling (e.g., trafraf) can elicit conflicting linguistic responses, depending on the level of linguistic analysis (phonology vs. morphology). We next show that speakers with no command of a sign language extend these same principles to novel ASL signs. Remarkably, the morphological analysis of ASL signs depends on the morphology of participants' spoken language. Speakers of Malayalam (a language with rich reduplicative morphology) prefer XX signs when doubling signals morphological plurality, whereas no such preference is seen in speakers of Mandarin (a language with no productive plural morphology). Our conclusions open up the p...
Publisher Summary: This chapter presents an analysis of the phonology of ASL, the signed language of the American Deaf community, which focuses on the information structure of the sign. General phonological theory should operate in a uniform fashion across modalities and provide theoretical units that play similar, though not identical, roles in the various modalities that may underlie human language. The chapter focuses on the analogy that should be established between the coda of a spoken-language syllable and the second (weak) hand of a two-handed sign. If the proposal is correct, then the cross-modality analogies to be made by the theory of phonology are, in a sense, more abstract than previous researchers had been led to believe. If there is in sign language something analogous to the syllable of spoken language, it does not carry over the sequential character of the spoken-language syllable.
Over the history of research on sign languages, much scholarship has highlighted the pervasive presence of signs whose forms relate to their meaning in a non-arbitrary way. The presence of these forms suggests that sign language vocabularies are shaped, at least in part, by a pressure toward maintaining a link between form and meaning in wordforms. We use a vector space approach to test the ways this pressure might shape sign language vocabularies, examining how non-arbitrary forms are distributed within the lexicons of two unrelated sign languages. Vector space models situate the representations of words in a multi-dimensional space where the distance between words indexes their relatedness in meaning. Using phonological information from the vocabularies of American Sign Language (ASL) and British Sign Language (BSL), we tested whether increased similarity between the semantic representations of signs corresponds to increased phonological similarity. The results of the computationa...
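To make the comparison concrete, here is a minimal sketch of the kind of analysis the abstract describes: encode each sign's phonological form as a feature vector, pair it with a semantic vector for the corresponding word, and test whether pairwise similarity in the two spaces is correlated. The data, dimensions, and similarity measures below are illustrative assumptions, not the study's actual coding scheme or models.

```python
# Toy form-meaning correlation test: random stand-ins replace real
# semantic embeddings and phonological feature codings.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_signs = 50
semantic = rng.normal(size=(n_signs, 300))             # stand-in semantic vectors
phonological = rng.integers(0, 2, size=(n_signs, 40))  # binary phonological features

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

sem_sims, phon_sims = [], []
for i, j in combinations(range(n_signs), 2):
    sem_sims.append(cosine(semantic[i], semantic[j]))
    # Jaccard similarity is one common choice for binary feature codes.
    inter = np.logical_and(phonological[i], phonological[j]).sum()
    union = np.logical_or(phonological[i], phonological[j]).sum()
    phon_sims.append(inter / max(union, 1))

# Because pairs share items, a Mantel-style permutation test (not shown)
# is the usual way to assess significance, not a plain parametric test.
r = np.corrcoef(sem_sims, phon_sims)[0, 1]
print(f"form-meaning similarity correlation: r = {r:.3f}")
```

With real lexicons, a positive correlation would indicate that signs closer in meaning also tend to be closer in form, the pressure the abstract describes.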
In this article, we analyze the grammatical incorporation of demonstratives in a tactile language, emerging in communities of DeafBlind signers in the US who communicate via reciprocal, tactile channels—a practice known as “protactile.” In the first part of the paper, we report on a synchronic analysis of recent data, identifying four types of “taps,” which have taken on different functions in protactile language and communication. In the second part of the paper, we report on a diachronic analysis of data collected over the past 8 years. This analysis reveals the emergence of a new kind of “propriotactic” tap, which has been co-opted by the emerging phonological system of protactile language. We link the emergence of this unit to both demonstrative taps and backchanneling taps, both of which emerged earlier. We show how these forms are all undergirded by an attention-modulation function, more or less backgrounded, and operating across different semiotic systems. In doing so, we co...
Existing work on sign language translation, that is, translation from sign language videos into sentences in a written language, has focused mainly on (1) data collected in a controlled environment or (2) data in a specific domain, which limits the applicability to real-world settings. In this paper, we introduce OpenASL, a large-scale American Sign Language (ASL)-English dataset collected from online video sites (e.g., YouTube). OpenASL contains 288 hours of ASL videos in multiple domains from over 200 signers and is the largest publicly available ASL translation dataset to date. To tackle the challenges of sign language translation in realistic settings and without glosses, we propose a set of techniques including sign search as a pretext task for pre-training and fusion of mouthing and handshape features. The proposed techniques produce consistent and large improvements in translation quality over baseline models based on prior work.
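As a rough illustration of the feature-fusion idea, the sketch below concatenates per-frame mouthing and handshape streams with a global visual stream and passes the result through a shared encoder. The class name, dimensions, and architecture are invented for this example; they are not the OpenASL implementation.

```python
# Hypothetical fusion module: concatenate three per-frame feature streams,
# project to a common width, and encode the sequence.
import torch
import torch.nn as nn

class FusionEncoder(nn.Module):
    def __init__(self, d_visual=512, d_mouth=128, d_hand=128, d_model=256):
        super().__init__()
        self.proj = nn.Linear(d_visual + d_mouth + d_hand, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, visual, mouth, hand):
        # visual: (B, T, d_visual); mouth: (B, T, d_mouth); hand: (B, T, d_hand)
        fused = torch.cat([visual, mouth, hand], dim=-1)
        return self.encoder(self.proj(fused))

out = FusionEncoder()(torch.randn(2, 75, 512),
                      torch.randn(2, 75, 128),
                      torch.randn(2, 75, 128))
print(out.shape)  # torch.Size([2, 75, 256])
```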
2012 IEEE Spoken Language Technology Workshop (SLT), 2012
We study the recognition of fingerspelling sequences in American Sign Language from video using tandem-style models, in which the outputs of multilayer perceptron (MLP) classifiers are used as observations in a hidden Markov model (HMM)-based recognizer. We compare a baseline HMM-based recognizer, a tandem recognizer using MLP letter classifiers, and a tandem recognizer using MLP classifiers of phonological features. We present experiments on a database of fingerspelling videos. We find that the tandem approaches outperform an HMM-based baseline, and that phonological feature-based tandem models outperform letter-based tandem models.
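The tandem recipe is compact enough to sketch end to end: train a frame-level MLP, take its posterior probabilities, compress and decorrelate them, and hand the result to a conventional Gaussian-HMM recognizer as its observation vectors. Everything below (the data, dimensions, and the log-plus-PCA post-processing) is an assumed, illustrative configuration rather than the paper's exact setup.

```python
# Tandem feature extraction: MLP posteriors -> log -> PCA -> HMM observations.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 64))     # placeholder per-frame visual features
labels = rng.integers(0, 26, size=1000)  # placeholder letter labels (A-Z)

mlp = MLPClassifier(hidden_layer_sizes=(128,), max_iter=50)
mlp.fit(frames, labels)

posteriors = mlp.predict_proba(frames)                 # (1000, 26)
log_post = np.log(posteriors + 1e-8)                   # compress dynamic range
tandem = PCA(n_components=20).fit_transform(log_post)  # decorrelate

# Each row of `tandem` now plays the role of an acoustic-style observation,
# fed to the HMM in place of (or appended to) the raw visual features.
print(tandem.shape)  # (1000, 20)
```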
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Natural language processing for sign language video, including tasks like recognition, translation, and search, is crucial for making artificial intelligence technologies accessible to deaf individuals, and has gained research interest in recent years. In this paper, we address the problem of searching for fingerspelled keywords or key phrases in raw sign language videos. This is an important task, since significant content in sign language is often conveyed via fingerspelling, and to our knowledge the task has not been studied before. We propose an end-to-end model for this task, FSS-Net, that jointly detects fingerspelling and matches it to a text sequence. Our experiments, done on a large public dataset of ASL fingerspelling in the wild, show the importance of fingerspelling detection as a component of a search and retrieval model. Our model significantly outperforms baseline methods adapted from prior work on related tasks.
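The general shape of such a joint detect-and-match model can be sketched as follows: a shared video encoder feeds both a per-frame fingerspelling detector and a segment embedding that is scored against a text-query embedding in a shared space. This is an assumed toy architecture, not the released FSS-Net.

```python
# Toy joint model: per-frame detection scores also weight the pooling that
# produces the segment embedding matched against the text query.
import torch
import torch.nn as nn

class FingerspellingSearch(nn.Module):
    def __init__(self, d_feat=512, d_embed=256, vocab=30):
        super().__init__()
        self.video_enc = nn.GRU(d_feat, d_embed, batch_first=True)
        self.detect = nn.Linear(d_embed, 1)       # fingerspelling vs. not
        self.text_enc = nn.Embedding(vocab, d_embed)

    def forward(self, video, query):
        h, _ = self.video_enc(video)                     # (B, T, d_embed)
        det = torch.sigmoid(self.detect(h)).squeeze(-1)  # (B, T)
        # Detection-weighted pooling -> one embedding per video.
        seg = (h * det.unsqueeze(-1)).sum(1) / (det.sum(1, keepdim=True) + 1e-6)
        txt = self.text_enc(query).mean(1)               # (B, d_embed)
        score = nn.functional.cosine_similarity(seg, txt)
        return det, score  # detection and matching heads trained jointly

det, score = FingerspellingSearch()(torch.randn(2, 120, 512),
                                    torch.randint(0, 30, (2, 7)))
print(det.shape, score.shape)  # torch.Size([2, 120]) torch.Size([2])
```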
Table 1 provides the numbers of clips and of fingerspelling segments in the datasets used in our work. Note that the number of fingerspelling segments is not exactly the same as in [7, 8] due to the 75-frame overlap when we split raw video into 300-frame clips. On average there are 1.9/1.8 fingerspelling segments per clip for ChicagoFSWild/ChicagoFSWild+. The distributions of durations are shown in Figure 1.
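The splitting scheme is easy to reproduce. Assuming the 75-frame overlap means consecutive 300-frame clips start 225 frames apart, a sketch of the clip boundaries is:

```python
# Split an n_frames-long video into 300-frame clips with 75-frame overlap
# (stride = 300 - 75 = 225); the final clip may be shorter.
def split_into_clips(n_frames, clip_len=300, overlap=75):
    stride = clip_len - overlap
    starts = range(0, max(n_frames - overlap, 1), stride)
    return [(s, min(s + clip_len, n_frames)) for s in starts]

print(split_into_clips(700))
# [(0, 300), (225, 525), (450, 700)]
```

A fingerspelling segment that falls inside an overlap region appears in two clips, which is presumably why the segment counts differ from those in [7, 8].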
2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019
Sign language recognition is a challenging gesture sequence recognition problem, characterized by quick and highly coarticulated motion. In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media. Most previous work on sign language recognition has focused on controlled settings where the data is recorded in a studio environment and the number of signers is limited. Our work aims to address the challenges of real-life data, reducing the need for detection or segmentation modules commonly used in this domain. We propose an end-to-end model based on an iterative attention mechanism, without explicit hand detection or segmentation. Our approach dynamically focuses on increasingly high-resolution regions of interest. It outperforms prior work by a large margin. We also introduce a newly collected data set of crowdsourced annotations of fingerspelling in the wild, and show that performance can be further improved with this additional data set.
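A toy version of the iterative-attention idea looks like the sketch below: compute a crude attention map, crop around its peak, and re-sample the crop at the model's input resolution, so later iterations effectively see the signing hand at higher resolution. This is an illustrative loop under invented parameters, not the paper's model.

```python
# Iterative zoom: attention peak -> crop -> re-encode, repeated.
import torch
import torch.nn.functional as F

def iterative_zoom(frame, conv, steps=3, crop_frac=0.6, out_size=224):
    # frame: (1, 3, H, W) input image
    x = F.interpolate(frame, size=(out_size, out_size),
                      mode="bilinear", align_corners=False)
    for _ in range(steps):
        attn = conv(x).mean(1)            # (1, h, w) crude spatial attention
        h, w = attn.shape[-2:]
        idx = attn.flatten(1).argmax(1)   # location of the attention peak
        cy, cx = (idx // w).item() / h, (idx % w).item() / w
        H, W = frame.shape[-2:]
        ch, cw = int(H * crop_frac), int(W * crop_frac)
        top = min(max(int(cy * H) - ch // 2, 0), H - ch)
        left = min(max(int(cx * W) - cw // 2, 0), W - cw)
        frame = frame[..., top:top + ch, left:left + cw]  # zoom in
        x = F.interpolate(frame, size=(out_size, out_size),
                          mode="bilinear", align_corners=False)
    return x

conv = torch.nn.Conv2d(3, 16, 3, padding=1)  # stand-in feature extractor
roi = iterative_zoom(torch.randn(1, 3, 480, 640), conv)
print(roi.shape)  # torch.Size([1, 3, 224, 224])
```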
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
Figure: Fingerspelling detection and recognition in video of American Sign Language. The goal of detection is to find intervals corresponding to fingerspelling (here indicated by open/close parentheses), and the goal of recognition is to transcribe each of those intervals into letter sequences. Our focus in this paper is on detection that enables accurate recognition. In this example (with downsampled frames), the fingerspelled words are PIRATES and PATRICK, shown along with their canonical handshapes aligned roughly with the most-canonical corresponding frames. Non-fingerspelled signs are labeled with their glosses. The English translation is "Moving furtively, pirates steal the boy Patrick."
We study the problem of recognizing video sequences of fingerspelled letters in American Sign Language (ASL). Fingerspelling comprises a significant but relatively understudied part of ASL. Recognizing fingerspelling is challenging for a number of reasons: It involves quick, small motions that are often highly coarticulated; it exhibits significant variation between signers; and there has been a dearth of continuous fingerspelling data collected. In this work we collect and annotate a new data set of continuous fingerspelling videos, compare several types of recognizers, and explore the problem of signer variation. Our best-performing models are segmental (semi-Markov) conditional random fields using deep neural network-based features. In the signer-dependent setting, our recognizers achieve up to about 92% letter accuracy. The multi-signer setting is much more challenging, but with neural network adaptation we achieve up to 83% letter accuracy in this setting.
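For reference, "letter accuracy" in fingerspelling recognition is commonly computed from the edit distance between the recognized and reference letter sequences; the exact metric in the paper may differ, so treat the definition below as an assumption.

```python
# Letter accuracy as 1 - (Levenshtein distance / reference length).
def edit_distance(ref, hyp):
    # Standard dynamic program over prefix pairs.
    d = [[i + j if i * j == 0 else 0 for j in range(len(hyp) + 1)]
         for i in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j] + 1,    # deletion
                          d[i][j - 1] + 1,    # insertion
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    return d[-1][-1]

def letter_accuracy(ref, hyp):
    return 1.0 - edit_distance(ref, hyp) / max(len(ref), 1)

print(letter_accuracy("PATRICK", "PATRCK"))  # ~0.857: one deleted letter
```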