Connected Speech-Based Cognitive Assessment in Chinese and English

Authors: Saturnino Luz, Sofia De La Fuente Garcia, Fasih Haider, Davida Fromm, Brian MacWhinney, Alyssa Lanzi, Ya-Ning Chang, Chia-Ju Chou, Yi-Chien Liu

Abstract: We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment as well as individuals with normal cognition. These data have been carefully matched by age and sex by propensity score analysis to ensure balance and representativity in model training. The prediction tasks encompass mild cognitive impairment diagnosis and cognitive test score prediction. This framework was designed to encourage the development of approaches to speech-based cognitive assessment which generalise across languages. We illustrate it by presenting baseline prediction models that employ language-agnostic and comparable features for diagnosis and cognitive test score prediction. The models achieved an unweighted average recall of 59.2% in diagnosis and a root mean squared error of 2.89 in score prediction.
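
The two evaluation metrics named in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions: unweighted average recall (UAR) as the plain mean of per-class recalls, and root mean squared error (RMSE) over predicted test scores. The function names and any labels passed in are hypothetical and are not taken from the paper.

```python
import math


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally regardless of size."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Positions of the samples that truly belong to this class.
        positions = [i for i, label in enumerate(y_true) if label == cls]
        hits = sum(1 for i in positions if y_pred[i] == cls)
        recalls.append(hits / len(positions))
    return sum(recalls) / len(recalls)


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted scores."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because UAR averages per-class recall, a classifier that always predicts the majority class scores only 50% on a two-class task, which is why it is often preferred to accuracy when classes are imbalanced.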

Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: To appear in Proceedings of Interspeech 2024

ACM Class: J.3; I.5.4

arXiv:2406.03138 [pdf, other]

A Frame-based Attention Interpretation Method for Relevant Acoustic Feature Extraction in Long Speech Depression Detection

Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Abstract: Speech-based depression detection tools could help early screening of depression. Here, we address two issues that may hinder the clinical practicality of such tools: segment-level labelling noise and a lack of model interpretability. We propose a speech-level Audio Spectrogram Transformer to avoid segment-level labelling. We observe that the proposed model significantly outperforms a segment-level model, providing evidence for the presence of segment-level labelling noise in audio modality and the advantage of longer-duration speech analysis for depression detection. We introduce a frame-based attention interpretation method to extract acoustic features from prediction-relevant waveform signals for interpretation by clinicians. Through interpretation, we observe that the proposed model identifies reduced loudness and F0 as relevant signals of depression, which aligns with the speech characteristics of depressed patients documented in clinical studies.

Submitted 7 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2309.13476

arXiv:2404.15367 [pdf, other]

Leveraging Visibility Graphs for Enhanced Arrhythmia Classification with Graph Convolutional Networks

Authors: Rafael F. Oliveira, Gladston J. P. Moreira, Vander L. S. Freitas, Eduardo J. S. Luz

Abstract: Arrhythmias, detectable via electrocardiograms (ECGs), pose significant health risks, emphasizing the need for robust automated identification techniques. Although traditional deep learning methods have shown potential, recent advances in graph-based strategies are aimed at enhancing arrhythmia detection performance. However, effectively representing ECG signals as graphs remains a challenge. This study explores graph representations of ECG signals using Visibility Graph (VG) and Vector Visibility Graph (VVG), coupled with Graph Convolutional Networks (GCNs) for arrhythmia classification. Through experiments on the MIT-BIH dataset, we investigated various GCN architectures and preprocessing parameters. The results reveal that GCNs, when integrated with VG and VVG for signal graph mapping, can classify arrhythmias without the need for preprocessing or noise removal from ECG signals. While both VG and VVG methods show promise, VG is notably more efficient. The proposed approach was competitive compared to baseline methods, although classifying the S class remains challenging, especially under the inter-patient paradigm. Computational complexity, particularly with the VVG method, required data balancing and sophisticated implementation strategies. The source code is publicly available for further research and development at https://github.com/raffoliveira/VG_for_arrhythmia_classification_with_GCN.
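
As an illustration of how a signal can be mapped to a graph, the sketch below builds a natural visibility graph from a short series. It assumes the standard natural visibility criterion (two samples are linked if the straight line between them passes above every sample in between); whether the paper uses exactly this variant, and how it builds the VVG, is not stated in the abstract. The function name is hypothetical, and the naive triple loop is cubic in the series length, so it suits only short segments.

```python
def natural_visibility_graph(series):
    """Return the set of undirected edges (a, b), a < b, of the natural visibility graph."""
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            # Samples a and b see each other if every intermediate sample c
            # lies strictly below the line joining (a, series[a]) and (b, series[b]).
            visible = all(
                series[c] < series[b] + (series[a] - series[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges
```

Adjacent samples are always connected, since there is nothing between them to block the line of sight; the resulting edge set could then be passed to a graph library as the input to a GCN.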

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.13002 [pdf, other]

Towards Robust Ferrous Scrap Material Classification with Deep Learning and Conformal Prediction

Authors: Paulo Henrique dos Santos, Valéria de Carvalho Santos, Eduardo José da Silva Luz

Abstract: In the steel production domain, recycling ferrous scrap is essential for environmental and economic sustainability, as it reduces both energy consumption and greenhouse gas emissions. However, the classification of scrap materials poses a significant challenge, requiring advancements in automation technology. Additionally, building trust among human operators is a major obstacle. Traditional approaches often fail to quantify uncertainty and lack clarity in model decision-making, which complicates acceptance. In this article, we describe how conformal prediction can be employed to quantify uncertainty and add robustness in scrap classification. We have adapted the Split Conformal Prediction technique to seamlessly integrate with state-of-the-art computer vision models, such as the Vision Transformer (ViT), Swin Transformer, and ResNet-50, while also incorporating Explainable Artificial Intelligence (XAI) methods. We evaluate the approach using a comprehensive dataset of 8147 images spanning nine ferrous scrap classes. The application of the Split Conformal Prediction method allowed for the quantification of each model's uncertainties, which enhanced the understanding of predictions and increased the reliability of the results. Specifically, the Swin Transformer model demonstrated more reliable outcomes than the others, as evidenced by its smaller average prediction-set size and an average classification accuracy exceeding 95%. Furthermore, the Score-CAM method proved highly effective in clarifying visual features, significantly enhancing the explainability of the classification decisions.
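
A minimal sketch of split conformal prediction for classification follows. It assumes the common nonconformity score of one minus the model's probability for the true class, calibrated on a held-out set; the adaptation described in the paper may differ. The function names and the probability lists are hypothetical, and any classifier that outputs per-class probabilities could supply them.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from a calibration set for target miscoverage alpha."""
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All class indices whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]
```

Smaller prediction sets at the same coverage level indicate a more confident model, which is the sense in which the abstract compares average set sizes across architectures.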

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2402.14563 [pdf, other]

doi 10.1093/iwc/iwu016

Wizard of Oz Experimentation for Language Technology Applications: Challenges and Tools

Authors: Stephan Schlögl, Gavin Doherty, Saturnino Luz

Abstract: Wizard of Oz (WOZ) is a well-established method for simulating the functionality and user experience of future systems. Using a human wizard to mimic certain operations of a potential system is particularly useful in situations where extensive engineering effort would otherwise be needed to explore the design possibilities offered by such operations. The WOZ method has been widely used in connection with speech and language technologies, but advances in sensor technology and pattern recognition as well as new application areas such as human-robot interaction have made it increasingly relevant to the design of a wider range of interactive systems. In such cases achieving acceptable performance at the user interface level often hinges on resource intensive improvements such as domain tuning, which are better done once the overall design is relatively stable. While WOZ is recognised as a valuable prototyping technique, surprisingly little effort has been put into exploring it from a methodological point of view. Starting from a survey of the literature, this paper presents a systematic investigation and analysis of the design space for WOZ for language technology applications, and proposes a generic architecture for tool support that supports the integration of components for speech recognition and synthesis as well as for machine translation. This architecture is instantiated in WebWOZ - a new web-based open-source WOZ prototyping platform. The viability of generic support is explored empirically through a series of evaluations. Researchers from a variety of backgrounds were able to create experiments, independent of their previous experience with WOZ. The approach was further validated through a number of real experiments, which also helped to identify a number of possibilities for additional support, and flagged potential issues relating to consistency in Wizard performance.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 28 pages

Journal ref: Schlogl, S., Doherty, G., & Luz, S. (2015). Wizard of Oz Experimentation for Language Technology Applications: Challenges and Tools. Interacting with Computers 27(6), pp. 592-615

arXiv:2309.13476 [pdf, other]

Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection

Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Abstract: Depression is a common mental disorder. Automatic depression detection tools using speech, enabled by machine learning, help early screening of depression. This paper addresses two limitations that may hinder the clinical implementations of such tools: noise resulting from segment-level labelling and a lack of model interpretability. We propose a bi-modal speech-level transformer to avoid segment-level labelling and introduce a hierarchical interpretation approach to provide both speech-level and sentence-level interpretations, based on gradient-weighted attention maps derived from all attention layers to track interactions between input features. We show that the proposed model outperforms a model that learns at a segment level ($p$=0.854, $r$=0.947, $F1$=0.897 compared to $p$=0.732, $r$=0.808, $F1$=0.768). For model interpretation, using one true positive sample, we show which sentences within a given speech are most relevant to depression detection; and which text tokens and Mel-spectrogram regions within these sentences are most relevant to depression detection. These interpretations allow clinicians to verify the validity of predictions made by depression detection tools, promoting their clinical implementations.

Submitted 6 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: 5 pages, 3 figures, submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing

ACM Class: F.2.2; I.2.7

arXiv:2308.08556 [pdf, other]

Causally Linking Health Application Data and Personal Information Management Tools

Authors: Saturnino Luz, Masood Masoodian

Abstract: The proliferation of consumer health devices such as smart watches, sleep monitors, smart scales, etc., in many countries, has not only led to growing interest in health monitoring, but also to the development of a countless number of "smart" applications to support the exploration of such data by members of the general public, sometimes with integration into professional health services. While a variety of health data streams has been made available by such devices to users, these streams are often presented as separate time-series visualizations, in which the potential relationships between health variables are not explicitly made visible. Furthermore, despite the fact that other aspects of life, such as work and social connectivity, have become increasingly digitised, health and well-being applications make little use of the potentially useful contextual information provided by widely used personal information management tools, such as shared calendar and email systems. This paper presents a framework for the integration of these diverse data sources, analytic and visualization tools, with inference methods and graphical user interfaces to help users by highlighting causal connections among such time-series.

Submitted 11 August, 2023; originally announced August 2023.

MSC Class: 62D20 (Primary); 92C50 (Secondary) ACM Class: I.2.1; J.3

arXiv:2305.13447 [pdf, other]

Regularization Through Simultaneous Learning: A Case Study on Plant Classification

Authors: Pedro Henrique Nascimento Castro, Gabriel Cássia Fortuna, Rafael Alves Bonfim de Queiroz, Gladston Juliano Prates Moreira, Eduardo José da Silva Luz

Abstract: In response to the prevalent challenge of overfitting in deep neural networks, this paper introduces Simultaneous Learning, a regularization approach drawing on principles of Transfer Learning and Multi-task Learning. We leverage auxiliary datasets with the target dataset, the UFOP-HVD, to facilitate simultaneous classification guided by a customized loss function featuring an inter-group penalty. This experimental configuration allows for a detailed examination of model performance across similar (PlantNet) and dissimilar (ImageNet) domains, thereby enriching the generalizability of Convolutional Neural Network models. Remarkably, our approach demonstrates superior performance over models without regularization and those applying dropout regularization exclusively, enhancing accuracy by 5 to 22 percentage points. Moreover, when combined with dropout, the proposed approach improves generalization, securing state-of-the-art results for the UFOP-HVD challenge. The method also showcases efficiency with significantly smaller sample sizes, suggesting its broad applicability across a spectrum of related tasks. In addition, an interpretability approach is deployed to evaluate feature quality by analyzing class feature correlations within the network's convolutional layers. The findings of this study provide deeper insights into the efficacy of Simultaneous Learning, particularly concerning its interaction with the auxiliary and target datasets.

Submitted 20 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

arXiv:2301.05562 [pdf, ps, other]

Multilingual Alzheimer's Dementia Recognition through Spontaneous Speech: a Signal Processing Grand Challenge

Authors: Saturnino Luz, Fasih Haider, Davida Fromm, Ioulietta Lazarou, Ioannis Kompatsiaris, Brian MacWhinney

Abstract: This Signal Processing Grand Challenge (SPGC) targets a difficult automatic prediction problem of societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD). Participants were invited to employ signal processing and machine learning methods to create predictive models based on spontaneous speech data. The Challenge has been designed to assess the extent to which predictive models built based on speech in one language (English) generalise to another language (Greek). To the best of our knowledge no work has investigated acoustic features of the speech signal in multilingual AD detection. Our baseline system used conventional machine learning algorithms with Active Data Representation of acoustic features, achieving accuracy of 73.91% on AD detection, and 4.95 root mean squared error on cognitive score prediction.

Submitted 13 January, 2023; originally announced January 2023.

Comments: ICASSP 2023 SPGC description

MSC Class: 68T10 (Primary) 92C55 (Secondary) ACM Class: J.3; I.2.6; I.5.1

arXiv:2206.07026 [pdf, other]

doi 10.4324/9781315158945

Computational linguistics and Natural Language Processing

Authors: Saturnino Luz

Abstract: This chapter provides an introduction to computational linguistics methods, with focus on their applications to the practice and study of translation. It covers computational models, methods and tools for collection, storage, indexing and analysis of linguistic data in the context of translation, and discusses the main methodological issues and challenges in this field. While an exhaustive review of existing computational linguistics methods and tools is beyond the scope of this chapter, we describe the most representative approaches, and illustrate them with descriptions of typical applications.

Submitted 14 June, 2022; originally announced June 2022.

Comments: This is the unedited author's copy of a text which appeared as a chapter in "The Routledge Handbook of Translation and Methodology", edited by F Zanettin and C Rundle (2022)

ACM Class: I.2.7

Journal ref: In Zanettin F, Rundle C, editors, The Routledge Handbook of Translation and Methodology. New York: Routledge. 2022

arXiv:2206.06048 [pdf, other]

Temporal and Spatial Elements in Interactive Epidemiological Maps

Authors: Saturnino Luz, Masood Masoodian

Abstract: Maps have played an important role in epidemiology and public health since the beginnings of these disciplines. With the advent of geographical information systems and advanced information visualization techniques, interactive maps have become essential tools for the analysis of geographical patterns of disease incidence and prevalence, as well as communication of public health knowledge, as dramatically illustrated by the proliferation of web-based maps and disease surveillance "dashboards" during the COVID-19 pandemic. While such interactive maps are usually effective in supporting static spatial analysis, support for spatial epidemiological visualization and modelling involving distributed and dynamic data sources, and support for analysis of temporal aspects of disease spread have proved more challenging. Combining these two aspects can be crucial in applications of interactive maps in epidemiology and public health work. In this paper, we discuss these issues in the context of support for disease surveillance in remote regions, including tools for distributed data collection, simulation and analysis, and enabling multidisciplinary collaboration.

Submitted 13 June, 2022; originally announced June 2022.

Comments: Presented at the Map-based Interfaces and Interactions (MAPII) Workshop, at AVI'22

MSC Class: 92D30 (primary); 68U01 (secondary) ACM Class: J.3; H.4.3

arXiv:2203.12262 [pdf, other]

Multi-Mosaics: Corpus Summarizing and Exploration using multiple Concordance Mosaic Visualisations

Authors: Shane Sheehan, Saturnino Luz, Masood Masoodian

Abstract: Researchers working in areas such as lexicography, translation studies, and computational linguistics, use a combination of automated and semi-automated tools to analyze the content of text corpora. Keywords, named entities, and events are often extracted automatically as the first step in the analysis. Concordancing -- or the arranging of passages of a textual corpus in alphabetical order according to user-defined keywords -- is one of the oldest and still most widely used forms of text analysis. This paper describes Multi-Mosaics, a tool for corpus analysis using multiple implicitly linked Concordance Mosaic visualisations. Multi-Mosaics supports examining linguistic relationships within the context windows surrounding extracted keywords.

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2104.09356 [pdf, other]

Detecting cognitive decline using speech only: The ADReSSo Challenge

Authors: Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, Brian MacWhinney

Abstract: Building on the success of the ADReSS Challenge at Interspeech 2020, which attracted the participation of 34 teams from across the world, the ADReSSo Challenge targets three difficult automatic prediction problems of societal and medical relevance, namely: detection of Alzheimer's Dementia, inference of cognitive testing scores, and prediction of cognitive decline. This paper presents these prediction tasks in detail, describes the datasets used, and reports the results of the baseline classification and regression models we developed for each task. A combination of acoustic and linguistic features extracted directly from audio recordings, without human intervention, yielded a baseline accuracy of 78.87% for the AD classification task, an MMSE prediction root mean squared error (RMSE) of 5.28, and 68.75% accuracy for the cognitive decline prediction task.

Submitted 22 March, 2021; originally announced April 2021.

arXiv:2010.06047 [pdf, other]

Artificial Intelligence, speech and language processing approaches to monitoring Alzheimer's Disease: a systematic review

Authors: Sofia de la Fuente Garcia, Craig Ritchie, Saturnino Luz

Abstract: Language is a valuable source of clinical information in Alzheimer's Disease, as it declines concurrently with neurodegeneration. Consequently, speech and language data have been extensively studied in connection with its diagnosis. This paper summarises current findings on the use of artificial intelligence, speech and language processing to predict cognitive decline in the context of Alzheimer's Disease, detailing current research procedures, highlighting their limitations and suggesting strategies to address them. We conducted a systematic review of original research between 2000 and 2019, registered in PROSPERO (reference CRD42018116606). An interdisciplinary search covered six databases on engineering (ACM and IEEE), psychology (PsycINFO), medicine (PubMed and Embase) and Web of Science. Bibliographies of relevant papers were screened until December 2019. From 3,654 search results, 51 articles were selected against the eligibility criteria. Four tables summarise their findings: study details (aim, population, interventions, comparisons, methods and outcomes), data details (size, type, modalities, annotation, balance, availability and language of study), methodology (pre-processing, feature generation, machine learning, evaluation and results) and clinical applicability (research implications, clinical potential, risk of bias and strengths/limitations). While promising results are reported across nearly all 51 studies, very few have been implemented in clinical research or practice. We concluded that the main limitations of the field are poor standardisation, limited comparability of results, and a degree of disconnect between study aims and clinical applications. Attempts to close these gaps should support translation of future research into clinical practice.

Submitted 12 October, 2020; originally announced October 2020.

Comments: Pre-print submitted to the Journal of Alzheimer's Disease

ACM Class: J.3; I.2.7; I.2.6; I.5.4

arXiv:2005.05716 [pdf, other]

AttViz: Online exploration of self-attention for transparent neural language modeling

Authors: Blaž Škrlj, Nika Eržen, Shane Sheehan, Saturnino Luz, Marko Robnik-Šikonja, Senja Pollak

Abstract: Neural language models are becoming the prevailing methodology for the tasks of query answering, text classification, disambiguation, completion and translation. Commonly comprised of hundreds of millions of parameters, these neural network models offer state-of-the-art performance at the cost of interpretability; humans are no longer capable of tracing and understanding how decisions are being made. The attention mechanism, introduced initially for the task of translation, has been successfully adopted for other language-related tasks. We propose AttViz, an online toolkit for exploration of self-attention (real values associated with individual text tokens). We show how existing deep learning pipelines can produce outputs suitable for AttViz, offering novel visualizations of the attention heads and their aggregations with minimal effort, online. We show on examples of news segments how the proposed system can be used to inspect and potentially better understand what a model has learned (or emphasized).

Submitted 12 May, 2020; originally announced May 2020.

arXiv:2004.06833 [pdf, ps, other]

Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge

Authors: Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, Brian MacWhinney

Abstract: The ADReSS Challenge at INTERSPEECH 2020 defines a shared task through which different approaches to the automated recognition of Alzheimer's dementia based on spontaneous speech can be compared. ADReSS provides researchers with a benchmark speech dataset which has been acoustically pre-processed and balanced in terms of age and gender, defining two cognitive assessment tasks, namely: the Alzheimer's speech classification task and the neuropsychological score regression task. In the Alzheimer's speech classification task, ADReSS challenge participants create models for classifying speech as dementia or healthy control speech. In the neuropsychological score regression task, participants create models to predict mini-mental state examination scores. This paper describes the ADReSS Challenge in detail and presents a baseline for both tasks, including feature extraction procedures and results for classification and regression models. ADReSS aims to provide the speech and language Alzheimer's research community with a platform for comprehensive methodological comparisons. This will hopefully contribute to addressing the lack of standardisation that currently affects the field and shed light on avenues for future research and clinical applicability.

Submitted 5 August, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

Comments: To appear in the Proceedings of INTERSPEECH 2020, Oct 2020, Shanghai, China

arXiv:1911.00914 [pdf, ps, other]

Potential Applications of Machine Learning at Multidisciplinary Medical Team Meetings

Authors: Bridget Kane, Jing Su, Saturnino Luz

Abstract: While machine learning (ML) systems have produced great advances in several domains, their use in support of complex cooperative work remains a research challenge. A particularly challenging setting, and one that may benefit from ML support, is the work of multidisciplinary medical teams (MDTs). This paper focuses on the activities performed during the multidisciplinary medical team meeting (MDTM), reviewing their main characteristics in light of a longitudinal analysis of several MDTs in a large teaching hospital over a period of ten years and of our development of ML methods to support MDTMs, and identifying opportunities and possible pitfalls for the use of ML to support MDTMs.

Submitted 3 November, 2019; originally announced November 2019.

Comments: 5 pages

arXiv:1908.10623 [pdf, ps, other]

Emotion Recognition in Low-Resource Settings: An Evaluation of Automatic Feature Selection Methods

Authors: Fasih Haider, Senja Pollak, Pierre Albert, Saturnino Luz

Abstract: Research in automatic affect recognition has seldom addressed the issue of computational resource utilization. With the advent of ambient intelligence technology which employs a variety of low-power, resource-constrained devices, this issue is increasingly gaining interest. This is especially the case in the context of health and elderly care technologies, where interventions may rely on monitoring of emotional status to provide support or alert carers as appropriate. This paper focuses on emotion recognition from speech data, in settings where it is desirable to minimize memory and computational requirements. Reducing the number of features for inductive inference is a route towards this goal. In this study, we evaluate three different state-of-the-art feature selection methods: Infinite Latent Feature Selection (ILFS), ReliefF and Fisher (generalized Fisher score), and compare them to our recently proposed feature selection method named 'Active Feature Selection' (AFS). The evaluation is performed on three emotion recognition data sets (EmoDB, SAVEE and EMOVO) using two standard acoustic paralinguistic feature sets (i.e. eGeMAPS and emobase). The results show that similar or better accuracy can be achieved using subsets of features substantially smaller than the entire feature set. A machine learning model trained on a smaller feature set will reduce the memory and computational resources of an emotion recognition system, which can result in lowering the barriers for use of health monitoring technology.

Submitted 29 May, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

arXiv:1811.09919 [pdf, other]

A Method for Analysis of Patient Speech in Dialogue for Dementia Detection

Authors: Saturnino Luz, Sofia de la Fuente, Pierre Albert

Abstract: We present an approach to automatic detection of Alzheimer's type dementia based on characteristics of spontaneous spoken language dialogue consisting of interviews recorded in natural settings. The proposed method employs additive logistic regression (a machine learning boosting method) on content-free features extracted from dialogical interaction to build a predictive model. The model training data consisted of 21 dialogues between patients with Alzheimer's and interviewers, and 17 dialogues between patients with other health conditions and interviewers. Features analysed included speech rate, turn-taking patterns and other speech parameters. Despite relying solely on content-free features, our method obtains overall accuracy of 86.5%, a result comparable to those of state-of-the-art methods that employ more complex lexical, syntactic and semantic features. While further investigation is needed, the fact that we were able to obtain promising results using only features that can be easily extracted from spontaneous dialogues suggests the possibility of designing non-invasive and low-cost mental health monitoring tools for use at scale.

Submitted 24 November, 2018; originally announced November 2018.

Comments: 8 pages, Resources and ProcessIng of linguistic, paralinguistic and extra-linguistic Data from people with various forms of cognitive impairment, LREC 2018

arXiv:1711.03065 [pdf, other]

An Application of Mosaic Diagrams to the Visualization of Set Relationships

Authors: Saturnino Luz, Masood Masoodian

Abstract: We present an application of mosaic diagrams to the visualisation of set relations. Venn and Euler diagrams are the best known visual representations of sets and their relationships (intersections, containment or subsets, exclusion or disjointness). In recent years, alternative forms of visualisation have been proposed. Among them, linear diagrams have been shown to compare favourably to Venn and Euler diagrams, in supporting non-interactive assessment of set relationships. Recent studies that compared several variants of linear diagrams have demonstrated that users perform best at tasks involving identification of intersections, disjointness and subsets when using a horizontally drawn linear diagram with thin lines representing sets, and employing vertical lines as guide lines. The essential visual task the user needs to perform in order to interpret this kind of diagram is vertical alignment of parallel lines and detection of overlaps. Space-filling mosaic diagrams which support this same visual task have been used in other applications, such as the visualization of schedules of activities, where they have been shown to be superior to linear Gantt charts. In this paper, we present an application of mosaic diagrams for visualization of set relationships, and compare it to linear diagrams in terms of accuracy, time-to-answer, and subjective ratings of perceived task difficulty. The study participants exhibited similar performance on both visualisations, suggesting that mosaic diagrams are a good alternative to Venn and Euler diagrams, and that the choice between linear diagrams and mosaics may be solely guided by visual design considerations.

Submitted 8 November, 2017; originally announced November 2017.
