Kola Tubosun (born Kolawole Olatubosun) has a Master's in Linguistics/Teaching English as a Second Language from Southern Illinois University Edwardsville, and a BA in Linguistics from the Department of Linguistics and African Languages, University of Ibadan, Nigeria.
The Yorùbá language materials at the British Library (BL) span the years between 1843, when the first item was published, and the present day, providing an impressive catalogue of the history of Yorùbá writing through the early days of 19th-century missionary writings with a yet undeveloped orthography, the boom of anthropological literature of the later 19th century, the creative fervour of the early and mid-20th century, and the later experimentations of 21st-century monolingual and bilingual writings. From September 2019 to September 2020, I was Chevening Research Fellow at the BL working on this wide-ranging collection with the BL’s Africa Curator. In this article, I present an analysis of what the records contain and what is missing, along with a record of some challenges I faced cataloguing the work – ranging from technological limitations to issues of orthography.
Yorùbá is a widely spoken West African language with a writing system rich in orthographic and tonal diacritics. These diacritics provide morphological information, are crucial for lexical disambiguation and pronunciation, and are vital for any computational speech or natural language processing task. However, diacritic marks are commonly excluded from electronic texts due to limited device and application support and limited general education on proper usage. We report on recent efforts at dataset cultivation. By aggregating and improving disparate texts from the web and various personal libraries, we were able to significantly grow our clean Yorùbá dataset from a majority-Biblical text corpus with three sources to millions of tokens from over a dozen sources. We evaluate updated diacritic restoration models on a new, general-purpose, public-domain Yorùbá evaluation dataset of modern journalistic news text, selected to be multi-purpose and to reflect contemporary usage. All pre-trained models, datasets and source code have been released as an open-source project to advance efforts on Yorùbá language technology.
This thesis documents the nature and character of initial tonal strategies native English speakers employ when learning Yoruba as a second language, using elicited imitation and other computer tonal analysis of errors and strategies on 40 human subjects of various selected backgrounds. This research also provides initial insight into the detail of tonal acquisition in second language learning in general, allowing for future acquisition and learning procedures to be better understood and better prescribed.