Natural language processing-driven framework for the early detection of
language and cognitive decline
Kulvinder Panesar a, María Beatriz Pérez Cabello de Alba b, *
a University of Bradford, Bradford, UK
b Universidad Nacional de Educación a Distancia, Madrid, Spain
A R T I C L E I N F O

Keywords:
Language production
Memory concerns
Pre-screening model
Role and reference grammar
Speech assessment
Natural language processing

A B S T R A C T
Natural Language Processing (NLP) technology has the potential to provide a non-invasive, cost-effective method, offering timely intervention, for detecting early-stage language and cognitive decline in individuals concerned
about their memory. The proposed pre-screening language and cognition assessment model (PST-LCAM) is based
on the functional linguistic model Role and Reference Grammar (RRG) to analyse and represent the structure and
meaning of utterances, via a set of language production and cognition parameters. The model is trained on a
Dementia TalkBank dataset with markers of cognitive decline aligned to the global deterioration scale (GDS). A
hybrid approach of qualitative linguistic analysis and assessment is applied, which includes the mapping of participants' speech utterances and words from the tasks to RRG phenomena. It uses metric-based scoring with
resulting quantitative scores and qualitative indicators as pre-screening results. This model is to be deployed in a
user-centred conversational assessment platform.
1. Introduction
There is continuing research into timely diagnosis and the early detection of cognitive decline, to help reduce dementia rates and to provide the best treatment, support and plans promptly (Adhikari et al., 2022; DementiaUK, 2021). Dementia is a complex and progressive neurological disorder that leads to a decline in cognitive abilities such as language, visuospatial skills, memory, judgment, and mental agility. Our focus is the language and cognitive impairments and semantic memory deficits which may affect the understanding and production of speech in everyday life (McKhann et al., 1984). Each affected person will experience dementia symptoms differently. The symptoms are categorised differently by several authors, such as Förstl and Kurz (1999); our focus is the global deterioration scale (GDS) and its stages (Stage 1 - no cognitive decline to Stage 7 - very severe cognitive decline) (Reisberg et al., 1982). Under the umbrella of dementia, individuals are characterised as being afflicted with a loss in cognitive and communicative functionality (Bucks et al., 2000). As noted by Thompson (1987), 88-95% of affected people demonstrate some degree of aphasia (language disability) and cognitive failure, including the inability to grasp concepts or events of their past, or to recognise individuals (Guinn & Habash, 2012). These mild impairments could arise from one or more word-level, sentence-level, and discourse-level features, as stated in the AphasiaTalkbank (2021), and are expanded in the
next section. The evidence of impairments can be seen when a person experiences notable changes in their short-term memory, forgetting things such as an immediate task to do ('turn off the cooker', for example), which may have health implications. Furthermore, work colleagues may have detected mistakes in their performance over a period and have flagged this, leaving the individual with a feeling of despair and anxiety. In another situation the individual may demonstrate confusion over location, time, and activity requirements (such as taking medicine or switching off cooking, heating or electrical appliances), which may lead to personal risk and incurred risks to the people around them. These real-life, person-centred scenarios could be critical and are manifested by people with undiagnosed cognitive decline. Hence, cognitive assessment is a critical clinical diagnostic tool for neurodegenerative diseases, especially AD, and one of the most valuable predictors of its further progression (NIA, 2018). A series of cognitive tests is used to diagnose any cognitive impairment. We will invoke the tasks of the Mini Mental State Examination (MMSE) (Cockrell & Folstein, 2002). As noted above, individuals may have a range of symptoms underpinned by communication, health, employment and risk triggers, most often evidenced in a regular conversation. Further, other people such as family,
friends or acquaintances may identify some mild impairment with
speech or a feeling of something going wrong and recommend the need
to investigate. This investigative task is the major motivation for our
work. The goal is to embed a pre-screening trained language and
cognition assessment model (PST-LCAM) as an intervention into a
conversational agent interface as an application for investigating the
early detection of language and cognitive decline.
The second motivation for exploring a pre-screening model is as a
response to two out of four grand challenges of healthcare technologies
that UKRI had proposed in 2021 (EPRScUKRIOrg, 2021). Here, our
pre-screening model involves: (1) a new method of recognising
abnormal data patterns of spoken utterances. (2) We investigate the
technique of reducing the progression of the disease – by identification
of at-risk individuals through early pre-screening. The third motivation
for a pre-screening tool is derived from a commercial/technical need
acknowledged at various events, such as a healthcare panel discussion at the AI Business Digital Symposium. Here, panellists unanimously agreed on the need for a proactive, evidence-based approach that will engage patients, monitor their health journey and measure outcomes, rather than a reactive mindset to diagnosis (BabylonHealthCom, 2021). This message is shared by the wider healthcare sector and professional community. Our motivation and investigative work also concur with a recent BBC article stating that a brain check-up tool has the potential to cut risk at any age (AlzheimersResearchUkOrg, 2023; Roxby, 2023).
We focus on the participant's language production and its assessment by both manual and computational approaches, where understanding of what is said (meaning) is critical for communication. Natural language processing (NLP) is an overall term for how computers interpret, understand, and use human language. Our pre-screening model will be NLP driven. NLP is defined 'as a theoretically motivated range of computational techniques for analysing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications'
(Liddy, 2001). NLP involves natural language understanding (NLU), that
is, enabling computers to derive meaning from human or natural language input. NLU is challenging because: (a) humans make mistakes; (b) human speech requires context - to ask "how was lunch?" and receive the reply "I spent the entire time waiting at the doctor's" is clear to a person (lunch was bad) but not necessarily to a computer trained to search for negative words (for example, no, not); (c) human language is irregular due to variances within the same language (American English vs British English, etc.), and this can lead to a lack of context, spelling errors, or dialectal differences; (d) language has ambiguity, which means it could be understood in two or more possible senses or ways. Ambiguity can be of several types - morphological, lexical, syntactic, semantic dependency, referential, scope, and pragmatic (Maurya, Gupta, & Choudhary, 2015). Further, NLP and NLU systems require knowledge of the domain, the use case and the special nuances of how the language is expressed, such as: (a) different word - same meaning; (b) different grammar - same meaning; (c) different expression - same meaning; (d) same word - different context (Linguamatics, 2021).
From a technical development perspective, a review of conversational agents (CAs) in healthcare by Car et al. (2020) identified that they are still in their infancy and proposed robust investigation into their potential use in diagnosis rather than just health service support. Our proposal supports a proactive strategy and the potential of a CA intervention to aid diagnosis based on a range of tasks.
Statistically, the NHS states that there are over 850,000 people in the UK affected by dementia, of which 7% are over the age of 65 and 17% over 80. It is estimated that more than one million people will have dementia by 2030, and this will increase to more than 1.6 million by 2050 (AlzheimersResearchUk, 2020). Alzheimer's disease (AD) is a chronic, progressive neurodegenerative disease that affects more than 35 million people worldwide, a prevalence similar to that in the UK, and this number is expected to triple worldwide by 2050
(WorldAlzReport2015Org, 2015). As for the care perspective, over 11
million Americans provide unpaid care for people with Alzheimer’s or
other dementias. In 2022, unpaid caregivers provided an estimated 18
billion hours of care valued at $339.5 billion. From a pre-screening view,
only 4 in 10 Americans would talk to their doctor right away when
experiencing early memory or cognitive loss, and 7 in 10 Americans
would want to know early if they have AD if it could allow for earlier
treatment (Alzheimer's Association, 2023). From a social perspective, a study by the Alzheimer's Society (AS) stated that 56% of patients wait for up to a year before getting help because they feel afraid of their condition (AlzheimersOrgUk, 2018). Also in the UK, under the Equality
Act 2010, a person who is living with dementia is recognised as having a
disability (a protected characteristic) necessitating a person-centred
care approach with patient safety.
From a psychological and health perspective, the pre-screening tool is grounded in the COM-B (capability, opportunity, motivation) model (Michie, Atkins, & West, 2014). Here the underlying social problem relates to concerns about memory; the pre-screening tool facilitates the need for a target behaviour change, and the outcomes from our model will provide indicators of what needs to change. This aspect of participant change is outside the remit of this paper. As highlighted above, the
importance of the model is derived foremost from a social problem. As
AlzheimersOrgUk (2020) reported during the pandemic “referral
numbers are increasing; a sustained and proactive effort must be made
to support access to timely diagnosis” (AlzheimersOrgUk, 2020). To take
a small step in pro-active early detection, our CA intervention will
provide indicators of any potential issues with language production and
cognition and help to support any patient-centred care plans (Mannonen, Kaipio, & Nieminen, 2017).
Linguistically, our model will use Role and Reference Grammar
(RRG)´s functional model (Van Valin, 2005a, 2005b) for a grammatical
analysis, and use an ontology for a cognitive assessment, to ascertain the
symptomatic changes of language production in people. RRG can adequately explain, describe, and embed the communication-cognitive function in conversation in a computational form. RRG enables language to be comprehended and produced, supports deep understanding of and interfacing with knowledge, and provides logical representations of the utterance (Van Valin, 2005a, 2005b). An ontology/knowledge
base/corpus will help us assess the linguistic capacity of a subject to
locate categories in a cognitive dimension and to produce instances of a
given category. For example, to list words related to transportation. In
this way, by analysing the oral production of participants with an
ontology, we will be able to (1) check which attributes of a given
category are present and which ones are missing, and (2) establish
enriched conceptual networks which would reflect the hierarchical
chain found in the production of a category. This information will be
useful as part of the pre-diagnosis of cognitive decline.
We deploy RRG’s functional model to analyse the utterances for the
lexical and grammatical complexity, word order, and represent their
structure and meaning. The utterances are sourced from the Dementia TalkBank (MacWhinney, 2017) conversation dataset and associated audio/video transcripts of participants conducting various tasks with an investigator. The tasks were linguistically mapped to language and cognition parameters - lexical, syntax, semantic, discourse, and pragmatic (Ntracha et al., 2020). Further factors we assess are the ontology, which constitutes the lexicon, and parameters of word production, contextual new words, repetition (echolalia), involuntary words (palilalia), retention of language, speech pauses, timings, and interruptions. With respect to cognitive issues, RRG adopts the criterion of
psychological adequacy formulated in Dik (1991), which states that a
theory should be "compatible with the results of psycholinguistic
research on the acquisition, processing, production, interpretation and
memorization of linguistic expressions". It also accepts the related criterion put forth in Bresnan et al. (1982) that theories of linguistic
structure should be directly relatable to testable theories of language
production and comprehension (Van Valin, 2000, p. 48). Henceforth,
psycholinguistic and cognitive adequacy (PCA) refers to psychological
structures, principles and strategies which determine the way in which
the linguistic expressions are acquired, generated, understood, processed, produced, interpreted, and stored in our mind (Mairal and Pérez,
2019).
In summary the aim of the pre-screening language and cognition
assessment model (PST-LCAM) will be to assess speech, language and
cognition and present results with indicators that can be further validated with a clinical professional. The concept of PST-LCAM is that it
utilises the Dementia TalkBank dataset for the training and testing of
utterances and assessment. The linguistic mapping of utterances for
assessment tasks will invoke PCA and linguistic assessment will be
achieved by Role and Reference Grammar to ascertain structure and
meaning of an utterance. Further the concept involves the NLP challenges, conversational practices and clinical processes identified and
addressed as part of the assessment, assessment analysis and outcome.
The design of the PST-LCAM comprises: (i) devising the language and cognition (LC) tasks for the two strands (MMSE tasks and an interview task) using the Dementia TalkBank dataset; (ii) creating an RRG mapping for the LC tasks, scoring and a GDS-based matrix; (iii) conducting the task and strand assessments; (iv) conducting the strand scoring and merging the strand analysis; and (v) presenting the participant dashboard.
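To make the flow of these five design steps concrete, the outline below sketches them as a Python pipeline. It is illustrative only: all class and function names are our own placeholders, and the published model was realised manually in Microsoft Excel rather than as code.

from dataclasses import dataclass, field

@dataclass
class ParticipantResult:
    participant_id: str
    strand_scores: dict = field(default_factory=dict)   # e.g. {"St1-CLA": 41.5, "St2-IAA": 38.0}
    gds_band: int | None = None                         # 1 (no decline) .. 7 (very severe decline)

def devise_tasks():
    """(i) Devise the language and cognition (LC) tasks for the two strands."""
    return {"St1-CLA": ["story_telling", "generative_naming", "story_retelling", "picture_description"],
            "St2-IAA": ["interview"]}

def build_rrg_mapping(tasks):
    """(ii) Create an RRG mapping for the LC tasks, plus scoring and a GDS-based matrix."""
    return {task: {"rrg_mapping": None, "scoring": None, "gds_matrix": None}
            for strand in tasks.values() for task in strand}

def assess_strands(transcripts, mapping):
    """(iii) Conduct the task and strand assessments over participant transcripts."""
    return {task: 0.0 for task in mapping}               # placeholder per-task scores

def merge_strand_scores(task_scores):
    """(iv) Conduct the strand scoring and merge the strand analysis."""
    return sum(task_scores.values())

def present_dashboard(result: ParticipantResult):
    """(v) Present the participant dashboard."""
    print(result)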
The research hypothesis for this work is that, for participants placed on the Global Deterioration Scale (GDS), our RRG-based language production and cognition assessment correlates positively with, and presents similar indicators to, the investigators' GDS scores. Our research questions are:

1. What is the relationship between the concepts of language, cognition, and speech production in participants' task utterances, RRG's functional model, and the linguistic phenomenon of psychological adequacy? How will these concepts be mapped?
2. How will the mapping be implemented in a model for the language and cognition assessment, with summative outcomes linked to the global deterioration scale?
3. What are the challenges that need to be considered in terms of computational natural language processing of speech, language production and cognition assessment?
4. How will the PST-LCAM consider individual variables and performance to complete the language and cognition assessment, and how will this be evaluated?

The novelty of this work lies in the introduction of deep linguistic analysis in a pre-screening trained language and cognition assessment model (PST-LCAM) for people concerned about their memory. The main contributions of the paper are:

A. A novel linguistic mapping is presented which is based on the underpinning theoretical stance of psycholinguistic and cognitive adequacy (PCA) of RRG. This mapping of the Dementia TalkBank conversation dataset and participant tasks is analysed to understand the psycholinguistic and cognitive determinants of dementia.
B. A manually trained pre-screening model for the language and cognition assessment is presented. This is based on two strands of participant tasks using a PCA-based linguistic mapping of 22 language and cognition parameters, model statistical scoring with qualitative and quantitative indicators, and a summative participant GDS score demonstrating consistency with the clinical investigator's outcome.
C. An intensive statistical evaluation model involves (i) utterance analysis; (ii) speech disfluencies; (iii) parameter assessment scoring; (iv) utterance grouping analysis; (v) normalisation of task scoring against each strand and the complete scoring results; (vi) tuning of results based on inclusion and exclusion sets (non-linguistic/visual cues); (vii) consideration of the types of utterances; (viii) flagging of the completion of tasks; (ix) lexical and cognitive significance (such as acknowledgements); (x) completion of tasks and final assessment.

The focus of this paper is to present the model, its underlying principles and concepts, dataset use, model design, and final analysis. Section II gives a summary of the literature review and related works. Section III describes the methodology used to develop the model and presents the linguistic mapping for the model experiment and participant tasks. Section IV discusses the results and includes the analysis methods, results from statistical analysis and summative outcomes for participants and the model itself. Section V discusses the advantages of the PST-LCAM model and its future directions, and finally Section VI presents some concluding remarks.
2. Literature review
The literature review and related works section will discuss the data sources, the nature of mild impairments, underpinning technology and
opportunities for neurological problems, person-centred speech and
memory problems, and the consideration of the range of diagnostic
assessment for cognitive decline.
2.1. Dementia datasets
Dementia TalkBank is part of DementiaBank, one of the largest
available datasets of audio recordings and transcripts, and is selected for our work. It is convenient, available 24/7, and accessible via membership and permissions granted by the University of Edinburgh (DementiaTalkbankOrg, 2017). MacWhinney (2000) manually transcribed the recordings using the CHAT (Codes for the Human Analysis of Transcripts) protocol. It has been a shared database of multimedia interactions for the study of communication in dementia since 2007, drawing on heterogeneous sources and specialists, and has prestige worldwide. Dementia
TalkBank (MacWhinney, 2017) continues to be used for extensive
research projects and teaching goals, and has led the way forward to
research projects such as easy indicators of dementia (Padhee et al.,
2020). Other dementia conversation dataset research include: (1)
conversational profiling of video and audio recordings of personal information and working memory (Jones et al., 2016); (2) conversational
analysis of the pause to speech ratio and measures of linguistic
complexity (O'Malley et al., 2021; O'Malley et al., 2020). Other dementia research is explored with open-access data repositories for dementia (Miah et al., 2021, p. 98; AlzheimersResearchUkOrg, 2021) and via various other project initiatives (e.g., JAIN, 2021).
2.2. Mild impairments related to aphasia
As noted in the introduction the mild impairments could be from one
or more word level, sentence level, and discourse level features (AphasiaTalkbank, 2021). These mild impairments are discussed here in detail
covering both definition and example as they are central to the approach
used in the participant task analysis. Word level features include
anomia, circumlocution, conduit d′approche, jargon, neologism,
perseveration, phonemic paraphasia and semantic paraphasia. Anomia
is word finding problems manifested via long pauses, word fragments,
fillers, trailed-off or unfinished utterances, sighs, and other signs of frustration, whereas circumlocution refers to indirect, roundabout language to describe a word or concept, e.g., "that thing to cut my fish and chips". Conduit d'approche is successive attempts at a target word; these attempts approximate the target phonetically and the final production may or may not be successful, e.g., uttering 'ife', 'knif' for "knife". Another key word-level feature is jargon, which refers to fluent, prosodically correct output, resembling English syntax and inflection, but containing largely meaningless speech, sometimes intelligible (and transcribable), sometimes unintelligible. Neologism is another word-level feature: a non-word substitution for a target word, usually with less than 50% overlap of phonemes between error and target, where the target word may be known or unknown. Perseveration is the repetition of a previously
used word or phrase that is no longer appropriate to the context. Phonemic paraphasia is the substitution, insertion, deletion, or transposition of phonemes (usually with at least 50% overlap of phonemes between the error production and the target, although definitions differ). Here the error production may be a word or non-word, and the error may or may not be self-corrected. Semantic paraphasia is the substitution of a real word for a target word; the error may be related or unrelated to the target and it may or may not be self-corrected. Stereotypy is the frequent repetition of a syllable, word, or phrase throughout the sample; these may be words or non-words. Sentence-level features include agrammatism, where speech is reduced in length and/or complexity and function words and morphemes may be missing, for example, uttering 'tree', 'dog' without 'the'. Empty speech is speech that contains general, vague, unspecific referents but is semantically and syntactically intact, for example, retelling a telephone conversation while including
non-relevant ideas. At discourse-level we find interactions demonstrating successful communication despite language filled with neologisms and jargon (Wernicke’s aphasia) and very limited language output
(Broca’s aphasia).
2.3. Diagnostic assessment for cognitive decline

Traditionally, manually administered cognitive tests have been used to help measure mental functions such as memory and language, among others. The most frequently used cognitive tests for orientation, memory, attention, concentration, naming, repetition, writing and comprehension are the Mini-Mental State Examination (MMSE) (Cockrell & Folstein, 2002), the Montreal Cognitive Assessment (MoCA) (Nasreddine et al., 2005), the Mini Cognitive Assessment (Mini-Cog) (Borson et al., 2003) and the Boston Naming Test (Kaplan, Goodglass, & Weintraub, 2001). Language has gained growing interest in cognitive screening, in particular the analysis of speech production, owing to its inexpensive and ecological approach to identifying changes in cognitive function (Bertini et al., 2022). Nevertheless, the authors note that this approach requires manual activities such as transcription, annotation and correction, which may result in a biased outcome. In recent years, some attempts to automate cognitive assessment have been made. Cognospeak (O'Malley et al., 2021; O'Malley et al., 2020) is a fully automated tool based on automatic speech recognition and diarisation that classifies participants/patients into mild cognitive impairment (MCI), Alzheimer's disease (AD), functional cognitive disorder (FCD) and healthy controls (HC).
O'Malley et al. (2020) used their automated cognitive assessment tool to explore whether responses to questions which examine recent and remote memory could help in distinguishing between patients with early neurodegenerative disorders and those with functional cognitive disorders (FCD), who have non-progressive cognitive complaints. The findings were that linguistic measures of differences in pause-to-speech ratio and measures of linguistic complexity did not help with that task but could nevertheless distinguish patients with mild cognitive impairment (MCI) and Alzheimer's disease (AD) from healthy controls. As a result, the authors noted the utility of incorporating additional measures of lexical and grammatical complexity (word frequency, sentence structure). Their follow-up work (O'Malley et al., 2021) used a fully automated version of their tool which automatically analyses the audio and speech data, involving speaker segmentation, automatic speech recognition and ML classification. Here their tool could distinguish between participants in the AD or MCI groups and those in the FCD or healthy control groups with levels of accuracy comparable to manually administered assessments. Nevertheless, they state that greater accuracy should be achievable through further system training with a greater number of users, the inclusion of verbal fluency tasks and repeat assessments. After reviewing O'Malley et al. (2020) and O'Malley et al. (2021), our model takes a different approach to addressing similar improvement needs by deploying participant tasks from traditional cognitive tests, automatic speech recognition, RRG manipulations and AI. We are aiming at a pre-diagnosis of apparently healthy subjects who may demonstrate language and cognitive decline, by assessing them through automatic speech recognition, considering grammatical, phonological, and cognitive indicators that will help to assess any signs of cognitive impairment, invoking Reisberg's global deterioration scale.
2.4. Technology drivers, progress of diagnosis and NLP driven models
The rise of AI technologies, including machine learning, deep
learning, natural language processing (NLP), smart robots, and conversational agents, has undoubtedly had a significant impact on various
aspects of daily life, particularly in the healthcare sector. There is an
optimistic outlook that AI-based solutions can greatly enhance healthcare by augmenting the decision-making process of doctors, from diagnostics to treatment, leading to significant improvements in various
healthcare areas (Bohr & Memarzadeh, 2020; Lee & Yoon, 2021). This
potential for transformative innovation has attracted the attention of researchers, physicians, and technology and program developers, with enormous investment in AI-related technologies and substantial annual savings in healthcare. Focusing on diagnostic assistance for certain diseases (such as cancer and eye or paediatric conditions), Taylor (2019), drawing on a report from the National Academy of Science, Engineering and Medicine, reported that diagnostic errors accounted for 60% of all medical errors and, unfortunately, for 40,000-80,000 deaths a year in US hospitals, attributed to human judgement. AI-based chatbots for nursing patients have been effective for engaging in conversation with patients and family members in hospital (Palanica et al., 2019). As part of a cross-sectional web-based survey of physicians' perceptions of chatbots in healthcare, the authors further indicated that healthcare chatbots could put patients at risk of self-diagnosing too often (74%) and that patients may not be able to fully understand the diagnosis outputs. This is an important consideration for our development.
From an NLP perspective, language understanding of AI systems has
improved rapidly since 2020 with the use of large language models
(LLM) and the recency of generative AI technologies in 2023 and
transformation of large complex data sources into useable information.
Despite the massive opportunities in the healthcare domain, adoption has been slow, with varying and reduced maturity levels across the life science market, patient-facing, clinical-facing (diagnosis), admin and analytics, and AI areas (Norden, Wang, & Bhattacharyya, 2023). Technically, AI technologies are able to present highly accurate results, exceeding human performance on numerous benchmarks such as the 99th percentile on the Biology Olympiad, and to demonstrate advanced reasoning capabilities (OpenAI, 2023). This is further supported by linguistics researchers who, in recent years, have primarily described the performance of specific language models in their ability to accomplish several sophisticated tasks such as question answering, content summarization, sentence prediction, and so on. However, this leads to the question of whether it is cognition: understanding vs. simulating (Verizon, 2023).
Subsequently, diagnosing and identifying diseases is challenging in terms of human-centric individual variations and the potential ethical harm from automated diagnosis. There is a critical need for a granular understanding of the language and meaning of speech production, including speech disfluencies, for the pre-screening assessment of cognitive decline, with aspects of explainability of the results. For this critical reason a clinician with judgement ability is required as a human in the loop during/after the outcome (Mosqueira-Rey et al., 2023). This is necessary for any AI/hybrid linguistic-based intervention, and for adhering to responsible and ethical alignment (Dastani & Yazdanpanah, 2023). Hence, AI will currently be used as decision support or an analytical/alerting aid (Norden, Wang, & Bhattacharyya, 2023).
Our rationale for PST-LCAM model thinking is based upon a robust,
optimized, validated system that has a prime functionality of understanding, analysing, and assessing utterances, both at an individual,
task, strand, and holistic level. This will involve automatic speech
recognition (ASR) using deep learning models, computational linguistics
(RRG) and behaviours of adopting user centred design and experience,
person-centred approach, ethical alignment, conversation design, language and speech analysis, and dialogue management strategy. Our
embedded model PST-LCAM for pre-screening will be different, as it is based on a hybridisation of model preparation involving ASR and lean machine learning methods with predominantly Role and Reference Grammar (RRG) based manipulation and grammatical testing to support pre-screening.
Our goal is for the early detection of cognitive decline and our methods
are based on a linguistic phenomenon – PCA to map language production and cognition tasks to levels of language, cognitive ontological information, and speech production variables. Further, we develop a novel
language and cognition assessment addressing memory concerns with
resulting qualitative and quantitative indicators of possible cognitive
issues. This is explained in the next section. The data, analysis and
assessment tool are available at source with further details found in
Appendix F.
Other related works entail using NLP deep learning pre-trained models on large corpora of speech transcripts, which can be instrumental in learning the patterns of speech narratives, as in the case of the speech production of Alzheimer's disease (Adhikari et al., 2022). Similarly, 'CognoSpeak' is a fully automated system which analyses audio and speech data, involving speaker segmentation, automatic speech recognition (ASR) and machine learning classification, for the diagnosis of AD, Mild Cognitive Impairment (MCI), Functional Memory Disorder and healthy controls (O'Malley et al., 2021); there are also other approaches such
as the NLP annotation tool (Noori et al., 2022), NL user interface
(Ntracha et al., 2020) and AI-based semantic measurement model (Foltz
et al., 2022; Penfold et al., 2022). Similar works are from Yeung et al.
(2021), with the goal for the early identification of markers. Here, they
analyse variables extracted through NLP and automated speech analysis
with correlation to language impairments identified by a clinician. In
summary, these related works focus on AI-enabled results with post
clinician decision making, while our PST-LCAM focuses on the understanding and meaning that are required for a thorough language and
cognition assessment.
Fig. 1. Pre-screening language and cognition assessment model (PST-LCAM) Framework.

3. Methodology

3.1. Methods

The proposed work is based on the investigation of a hypothesis and experimentation of PST-LCAM for people concerned about their memory. An overview of the proposed framework is shown in Fig. 1.
This model is underpinned by an experiment to test a hypothesis against participants from the Dementia TalkBank, who are placed on the Global Deterioration Scale (GDS). Our proposed RRG-based language production and cognition assessment provides a positive correlation and presents indicators and results similar to those of the investigators. Our model is
based on a language and cognitive assessment invoked from the MMSE
(Mini-mental State Examination) and refers to the Global Deterioration
Scale (GDS) with 7 stages and indicators as: (1) No cognitive decline, (2)
very mild cognitive decline, (3) mild cognitive decline, (4) moderate
cognitive decline, (5) moderate severe cognitive decline, (6) severe
cognitive decline, (7) very severe cognitive decline. (Reisberg et al.,
1982).
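For reference, the seven GDS stages listed above can be held as a simple lookup table. The snippet below is a minimal illustration (the stage labels follow Reisberg et al., 1982, as quoted above; the dictionary itself is not part of the published model).

GDS_STAGES = {
    1: "No cognitive decline",
    2: "Very mild cognitive decline",
    3: "Mild cognitive decline",
    4: "Moderate cognitive decline",
    5: "Moderately severe cognitive decline",
    6: "Severe cognitive decline",
    7: "Very severe cognitive decline",
}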
This methodology was achieved via three stages: (1) using the underpinning theoretical basis of psycholinguistic and cognitive adequacy
(PCA) of Role and Reference Grammar (RRG) to map the speech production to the language and cognition parameters. We used RRG’s levels
of language – lexical, syntax, semantic, discourse and pragmatic (Van
Valin, 2005a, 2005b), and annotations of speech disfluencies found in
CLAN (Computerized Language ANalysis) manual (MacWhinney, 2017,
2019, 2021). (2) Experiment with the participant data based on the
Dementia TalkBank’s tasks with associated transcripts, utterances, and
speech. (3) Model assessment stages involving selection, utterance
assessment, normalisation for language and cognition scoring; normalisation for the whole experiment and scoring; and finally, assessment
and measurement against the Global Deterioration Scale (GDS). The
proof of concept is a trained and tested model based on a spoken corpus
– Dementia TalkBank and language and cognition assessment with indicators of language production and cognition impairment.
The PST-LCAM experiment uses a mixed-methods approach of qualitative linguistic analysis and assessment, with resulting quantitative scores and qualitative recommendations. We have devised our own pre-screening assessment composed of two strand assessments: Strand 1 - Cognitive and linguistic assessment (St1-CLA), with 4 participant tasks, and Strand 2 - Interview analysis assessment (St2-IAA). This was inspired by TalkBank (DementiaTalkbankOrg, 2017), a shared database and platform for the study of communication in dementia and human spoken communication, with associated tools, manuals and transcripts (MacWhinney, 2017, 2021). Our dataset is based on a range of investigators (Holland, Kempler, Lanzi, PPA, and Pitt), their participant groups and the specific techniques they used. We started with an
initial set of 6 participants (transcripts and audio) with GDS scores
ranging from band score 3–7, who had completed both assessment
strands and there was significant and reliable information available with
the definitive investigator’s clinical decisions. Due to some inconsistencies with the corpus, only 4 participants were considered for
the experiment, as shown in Appendix C.
Strand 1 has four cognitive and linguistic assessment tasks. The
investigator is assessing the participant on a range of tasks via conversational style. In Task 1, the investigator will start off with telling a story
and the participant will have to repeat the story immediately after. The
story goes - “While a lady was shopping her wallet fell out of her purse,
but she did not see it fall. When she got to the checkout counter, she had
no way to pay for her groceries. So, she put the groceries away and went
home. Just as she opened the door to her house the phone rang, and a
little girl told her that she had found her wallet. The lady was very
relieved”. Task 2 is a generative naming task where the participant has
to name things related to transportation within one minute. In Task 3,
the investigator will ask the participant to retell the story related to Task
1. In Task 4, the investigator will present a picture to the participant, and they will need to describe what is going on in the picture (Fig. 2). An interpretation is: "A man (Father) is reading a newspaper whilst his wife and family are ready to go to Church. There is no communication between the woman (Mother) and children and the man (Father)."
In a separate session the investigator will conduct Strand 2, an
interview with the participant. Here the narrative prompts are 'tell me something about your family', 'tell me about your job', 'tell me about Little Red Riding Hood', 'tell me about Goldilocks and the three bears', and 'tell me about your country'. Our test strand was the topic 'tell me about your family'. Exploring the strands from a lexical-semantic perspective, together with variability in cognitive behaviour, will provide an effective predictor of cognitive status. As Ostrand and Gunstad (2021) suggest, the story-retelling task imposes a higher memory demand than the picture description task, as the participant must recall the events without external memory support. They need to draw on both semantic memory, to retrieve appropriate words, and episodic memory, for the story, together with attentional and executive function controls for the passage of the story.
3.2. Experiments

Focusing on the lexical-semantic work and developing the PST-LCAM, Role and Reference Grammar (RRG)'s conceptual use of PCA is invoked via a two-stage process. Stage 1 is to create a linguistic mapping of the strands and tasks, and Stage 2 is the individual utterance assessment per task.
For Stage 1, the first process requires mapping to RRG concepts. This is achieved via Step 1: each task was analysed in terms of its specific and relevant attributes, such as keyword recognition, events and timing, people, objects, "who did what to whom", and any question-and-answer requirement. Step 2: each attribute or grouped attributes were analysed for their language production and cognition links as part of the task.
task. For instance, picking keywords, memory recall, velocity of
retrieval, making references to people, objects, location, time, manner
and mental lexicon, and reference to knowledge, relationship between
entities, understanding the questions, confirmation/acknowledge of
question, fluency of response, and number of responses, number of
pauses, length of pauses (short, long, and very long). Step 3: each participant task and its attributes were considered firstly for their mapping to language production and cognition aspects, and secondly assessed for their RRG relevance in terms of cognition and the lexicon (picking out words and word production). Further considerations
include: (i) are there any semantic macroroles (actors and undergoers),
this will help to understand for example the relationships between the
family members?; (ii) Semantic – is there meaning in the response –
identified by the Logical structure of the clause (LSc) (Van Valin, 2005a,
2005b)?; (iii) is there a lexical representation to describe the linguistic phenomena, e.g., Aktionsart classes such as states or active accomplishments (for example, buying and eating)?; (iv) is there syntax, i.e., a structured explanation of the answer (Subject-Verb-Object) following the layered structure of the clause (LSC)?; (v) is there discourse, i.e., linking of the answer to the question/previous utterance/event?; (vi) is there a pragmatic context of the
event? Fig. 3 outlines the Model experiment 1 for Task 1– Story Telling
Mapping Attributes to RRG, which identified the Steps 1–3 as the
mapping framework to support Stage 2 and the quantitative and qualitative assessment. Further Step 4 is regarding the actual speech production for Task 1 which requires a mapping to the linguistic expression
considering pauses and timing using a speech protocol based on speech
disfluencies.
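The Step 3 considerations (i)-(vi) amount to a per-utterance checklist of RRG phenomena. The record below is a minimal sketch of such a checklist; the field names are our own and the published mapping is qualitative and manual, not code.

from dataclasses import dataclass

@dataclass
class RRGChecklist:
    """Per-utterance record of the Step 3 RRG relevance checks (i)-(vi)."""
    has_macroroles: bool          # (i) semantic macroroles (actor/undergoer) present?
    has_logical_structure: bool   # (ii) meaning identifiable via the logical structure of the clause?
    aktionsart: str | None        # (iii) lexical representation, e.g. state, active accomplishment
    follows_lsc_syntax: bool      # (iv) structured Subject-Verb-Object answer per the layered structure of the clause?
    discourse_linked: bool        # (v) linked to the question / previous utterance or event?
    pragmatic_context: bool       # (vi) pragmatic context of the event present?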
For the remaining participant tasks, Strand 1 Tasks 2-4 and the Strand 2 interview analysis assessment, the mappings (using Steps 1-3) are found in Appendices A.1, A.2 and A.3, reflecting Task 2, Task 3, and the Strand 2 interview task respectively. All the tasks use the same
speech protocol as in Step 4 above. Task 4 is the retelling task and hence
the same analysis as Task 1- story telling but accounted for differently as
discussed in the later assessment adjustments.
In Stage 2, the utterance assessment will be specified by a range of
respective task related language production, cognition and speech parameters, and indicators (linguistic markers) and corresponding scores.
Here the language production parameters include lexical (diversity, content), syntax (clause SVO, noun SVO), semantic (repetition, pronoun frequency), discourse (reference, context, speech act, change of anaphoric resolution (AR), correctness of AR) and pragmatic (focus) parameters. On the other hand, the language cognition parameters contain ontology (ontology placement) and lexicon (word production, contextual new word, repetition (echolalia), involuntary (palilalia), retain language) parameters. Speech parameters comprise the nature of pauses, the number of pauses and the timing of pauses, which are impactful as speech disfluencies. Pauses can be short (.), long (..) or very long (...). Interruptions are considered as pauses and exist at word, sentence, and discourse levels of communication, and thus may change the grammar. For example, in the sentence 'so it's [//] must be the Sunday newspaper', the [//] would be considered a pause, as prescribed and adapted from CLAN (MacWhinney, 2021) and found in the table in Appendix B, which identifies speech notation, phonological fragments, speech disfluency, and our qualitative scoring indicators. The indicators were selected on the premise of the assessment parameters and the nature of correctness. Indicators included: complete, correct, fluent, new, yes, lower, default, same, reduced, some, incomplete, upper, partial, poor, no, and N/A.
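As an illustration of how the pause and retracing notation above can be counted automatically, the sketch below uses simple regular expressions over a CHAT-style utterance. It assumes the notation quoted in the text ((.), (..), (...), [//], &-um); the published analysis relies on the CLAN tools and the Appendix B protocol rather than this code.

import re

PAUSE_PATTERNS = {
    "short_pause": r"\(\.\)",          # (.)
    "long_pause": r"\(\.\.\)",         # (..)
    "very_long_pause": r"\(\.\.\.\)",  # (...)
    "retracing": r"\[//\]",            # [//] treated as a pause with a change of grammar
    "filled_pause": r"&-\w+",          # filled sounds such as &-um
}

def count_disfluencies(utterance: str) -> dict:
    """Return a count per disfluency type for one transcribed utterance."""
    return {name: len(re.findall(pat, utterance)) for name, pat in PAUSE_PATTERNS.items()}

print(count_disfluencies("so it's [//] must be the Sunday newspaper (.) &-um"))
# {'short_pause': 1, 'long_pause': 0, 'very_long_pause': 0, 'retracing': 1, 'filled_pause': 1}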
As highlighted, the experiment is to create a model based on a dataset from the Dementia TalkBank, constituting participant tasks mapped to RRG phenomena and corresponding questions addressing the 15 aspects of language production found in Table 1 and the 7 aspects of language cognition and 2 speech parameters found in Table 2. This has an options and point scoring system. For example, a language production parameter such as lexical diversity - which refers to the different lexical words used by the participant - is assessed with 4 possible options and corresponding scores: LexDivNew (3); LexDivSame (2); LexDivPoor (1); LexDivNA* (0), as shown in Table 1. The outcome of the model will provide the participant with an integrated score from the assessment of both strands of tasks. This will be compared against an internal Global Deterioration Scale band matrix.
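The option-to-score mapping for a single parameter, using lexical diversity as the example quoted above, can be expressed as a small lookup. The values (3, 2, 1, 0) are taken from the text; the function wrapper is illustrative only.

LEX_DIV_SCORES = {"LexDivNew": 3, "LexDivSame": 2, "LexDivPoor": 1, "LexDivNA*": 0}

def score_parameter(indicator: str, scale: dict = LEX_DIV_SCORES) -> int:
    """Map a qualitative indicator for one assessment parameter to its point score."""
    return scale.get(indicator, 0)

assert score_parameter("LexDivSame") == 2   # 'same' lexical diversity scores 2 points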
Fig. 2. Picture Naming Task ("nhs.uk", 2019) from the MMSE.
Fig. 3. Model experiment 1 - Task 1 - Story Telling Mapping Attributes to RRG. (The figure maps, for the story-telling task, Step 1 attributes such as events (start, middle, end), timing, and "who did what to whom (when)"; Step 2 language production and cognition links such as picking keywords, memory recall, and reference to participants, objects, location, time, manner and reason; Step 3 RRG phenomena (Van Valin, 2005b) such as semantic macroroles (actors and undergoers), the logical structure of the clause, lexical representation of events (Aktionsart verbs, e.g., active accomplishments such as melted, melting), syntax via the layered structure of the clause, discourse links to previous events and topics, and the pragmatic context of the event; and Step 4 speech production, i.e., speech disfluencies per the CHAT manual (MacWhinney, 2021).)
3.3. Experimental challenges
The experiment had linguistic, technical, and clinical challenges.
Linguistically, initially we had to ensure the accuracy of the mapping
framework from a participant task to specific language production and
cognition parameters in an RRG context. Further, it was necessary to ensure balanced linguistic decisions relating to the levels of language in the creation of the model. Technically, it was necessary to ensure a systematic approach in conceptually devising and training the model, with linked concepts based on semantic knowledge graphs. Further, it was important to implicitly consider the underpinning descriptive, observational, explanatory, and computational linguistic adequacies of the RRG model. From a clinical perspective, it was critical to ensure a normalised and balanced interim assessment of participant tasks to derive the summative score, indicators, and outcome. Furthermore, access to specialist personnel was needed to critique the feasibility of the proof of concept of the PST-LCAM.
Our experimental approach posed a range of challenges. (1) Inconsistent recording of participant information: for instance, a participant's age, sex, and education information is missing, or alternatively there is a declaration of the patient's current dementia health status. (2) Impact –
limited comparative information to help train the model for linguistic
and cognitive scoring. (3) Interview topics are varied as expected – but
modelling is more complex, however, the same language and cognition
parameters can be used. (4) Limitation – some transcripts have not been
reviewed by a second transcriber – so were avoided.
3.4. Experimental setup and design

This model is conceptually designed in Microsoft Excel, using various lookups, data validations, and group functions. Assessment design was based on data collected from questionnaires sent to individuals and carers, distributed to various care homes (UK and Spain), and hosted on the Dementia Voices website (InnovationsInDementia, 2016) for completion via the DEEP (Dementia Engagement & Empowerment Project) group - 'Taking Part' - in December 2021 (InnovationsInDementiaOrgUk, 2021). The model experiment development involves two phases. Phase 1 looks at 3 participants involved in Strand 1 (Tasks 1-4), providing a PST-LCAM 1.0 model. Phase 2 is an interview assessment based on a revised model containing CLAN-based analysis of speech disfluency for timing and pauses, creating a PST-LCAM 2.0 model. In each case, manual analysis and allocation of points was conducted by a two-person team, taking one parameter at a time and hiding the previous parameter (a set of colour-coded columns in Microsoft Excel) to remain neutral/impartial/unbiased with respect to the previous lexical categories, which have existing embedded automatic metrics. The model development has the following phases: 1) model assessment - training and testing; 2) model assessment techniques; 3) model analysis techniques; and 4) model tuning techniques. Model analysis and tuning techniques have been discussed in detail in the above sections.

3.4.1. Model assessment training and creating a baseline model
A baseline model was created from 3 iterations of sampling. Iteration
1 was from a sample of 50 utterances without speech indicators, based
on 1 investigator, 1 participant (low GDS) and strand 1 tasks. This
involved the continuous refinement of categories and parameter
assessment and scores to establish proof of concept and baseline model.
Iteration 2 added 3 other participants (higher GDS scores) and 100 utterances from the same investigator, with their audio and transcripts including speech pauses and the timing of pauses, to the latest model. Iteration 3
involved Strand 2 – interview assessment from other investigators –
audio and transcript with over 300 utterances with speech indicators of
disfluency to form a baseline model with an utterance score.
3.4.2. Model assessment methods used in testing of all tasks
For testing the utterances, an expanded set of assessment steps is performed. These included (a) utterance scoring based on the type of utterance: Type A - one-word utterance, e.g. 'bus'; Type B - simple (SVC), e.g. 'A lady lost her purse'; Type C - simple with adjuncts (adjuncts/adverbs/manner/location), e.g. 'I saw Mary in Madrid'; Type D - complex utterance, with subordination and coordination, e.g. 'I will go if it does not rain'; Type E - short attempt at lexical attributes; Type F - long attempt at lexical attributes.
Table 1
Scoring framework for 15 Language Production Parameters.

Levels of Language | Assessment Parameters | Explanation | Indicators | Tasks | Max Score
Lexical | Length | Number of words | Correct, Partial, Poor, N/A | All | 3
Lexical | Diversity | Richness of words | New, Same, Poor, N/A | All | 3
Lexical | Content | Content word | Correct, Partial, Poor, N/A | All | 3
Syntax | Clause SVO | Word order / clause level | Correct, Partial, Poor, N/A | All | 3
Syntax | Noun SVO | Word order / noun phrase | Correct, Partial, Poor, N/A | All | 3
Semantic | Repetition | Repetition of words | Correct, Partial, Poor, N/A | All | 3
Semantic | Pronoun Freq | Frequency of pronoun use | Correct, Partial, Poor, N/A | All | 3
Semantic | Sentence Completion | Completeness of sentence in context (use of morphological and grammatical terms) | Correct, Partial, Poor, N/A | All | 3
Discourse | Reference | Correct reference | Correct, Partial, Poor, N/A | All | 3
Discourse | Context | Context accuracy (use of the correct topic) | Correct, Partial, Poor, N/A | All | 3
Discourse | Speech Act | Type of speech act (interrogative, assertive, directive) (Searle, 1969) | Correct, Partial, Poor, N/A | All | 3
Discourse | Change of AR | Change of anaphoric resolution | Yes, Some, No, N/A | All | 3
Discourse | Correctness of AR | Correctness of anaphoric resolution | Yes, Some, No, N/A | All | 3
Pragmatic | Focus | Completeness of sentence | Correct, Partial, Poor, N/A | All | 3

Table 2
Scoring framework for 7 Language cognition and 2 speech parameters.

Levels of Language | Assessment Parameters | Explanation | Indicators | Tasks | Max Score
Ontology | Ontology placement | Conceptual placement across hierarchical levels - Upper (generalising), Same (discourse aligned), Lower (more specific) | Lower, Same, Upper, N/A | All | 3
Lexicon | Word Production | Semantic memory | New, Same, Poor, N/A | All | 3
Lexicon | Contextual new word | Semantic memory - word in context | New, Same, Poor, N/A | All | 3
Lexicon | Picking words | Semantic memory - picking words | Correct, Partial, Poor, N/A | All | 3
Lexicon | Repetition | Echolalia - repetition of previous words/sounds from the investigator | Correct, Partial, Poor, N/A | All | 3
Lexicon | Word/Phrase Revision | Changing words based on audio transcript annotation, e.g., inclusion of < > [//] | Correct, Partial, Poor, N/A | All | 3
Lexicon | Retain Language | Level 1 language use | Correct, Partial, Poor, N/A | All | 3
Speech | Pauses | Filled sound pauses (e.g., &-um) | Fluent, Partial, Poor, N/A* | All | 3
Speech | Timing of pauses | Short, medium, and long pauses | Fluent, Partial, Poor, N/A* | All | 3

Table 2 identifies language cognition parameters of the ontology and lexicon together with speech dysfluency (MacWhinney, 2021). These parameters align to the language and cognitive assessment invoked from the MMSE, and to the language and memory cognition domains (Kulkarni & Moningi, 2015).

Table 3
Scoring for the Generative Naming Task found in the MMSE.

Group | Levels of Language | Assessment Parameters | Options | Score
Language production | Lexical | Diversity | New, Default, Poor, N/A | 3, 2, 1, 0
Language production | Lexical | Content | Correct, Partial, Poor, N/A | 3, 2, 1, 0
Language production | Pragmatic | Focus | Correct, Partial, Poor, N/A | 3, 2, 1, 0
Language cognition | Ontology | Ontology placement | Lower, Same, Upper, N/A | 3, 2, 1, 0
Language cognition | Lexicon | Word Production | New, Same, Poor, N/A | 3, 2, 1, 0
Language cognition | Lexicon | Retain Language | Correct, Partial, Poor, N/A | 3, 2, 1, 0
Speech | All | Cognition no pauses | Fluent, Partial, Poor, N/A* | 3, 2, 1, 0
Speech | All | Cognition timing pauses | Fluent, Partial, Poor, N/A* | 3, 2, 1, 0
Max score for the generative naming task: 9 + 15 = 24; Average score: 6 + 10 = 16; Poor score: 3 + 5 = 8; N/A: 0
The language production, cognition and speech parameters are assessed by a metric of qualitative allocation and quantitative scoring; for example, the assessment parameter SyntaxClauseSVO can be correct, partially correct, or poor in communication.
At this point any investigator utterances will be marked as 'INV', to be excluded. (b) Normalised utterance scoring is based on the framework for participant tasks, with exclusion sets, inclusion sets, types of utterances, and lexical and cognitive significance, which are discussed further in the next section. For instance, a complete (successful) utterance will score a maximum of 66. A complete (successful) set of words in the generative naming task will score a maximum of 24, as shown in Table 3. This score of 24 is normalised to the standard utterance assessment score of 66.
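A minimal sketch of the normalisation step described above is given below, assuming a simple linear rescaling from the 24-point generative-naming maximum (Table 3) onto the standard 66-point utterance scale; the paper does not spell out the exact rescaling formula, so this is an assumption.

STANDARD_UTTERANCE_MAX = 66
GENERATIVE_NAMING_MAX = 24   # 9 (language production) + 15 (cognition and speech), per Table 3

def normalise_naming_score(raw_score: float) -> float:
    """Rescale a generative-naming score (out of 24) onto the 66-point utterance scale."""
    return raw_score * STANDARD_UTTERANCE_MAX / GENERATIVE_NAMING_MAX

print(normalise_naming_score(24))   # 66.0 - a complete (successful) set of words
print(normalise_naming_score(16))   # 44.0 - the 'average' band from Table 3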
Here the assessment parameters of language production level: lexical
(and sub parameters) and pragmatic; language cognition: ontology,
lexicon, and speech (pauses) have a range supporting considerations for
a person-centred assessment. These include: (1) Utterance complexity of the category groups A to F is considered implicitly in the scoring. (2) If the content in an utterance is missing, as per the discourse and pragmatic phenomena, this is considered a content-poor utterance with underlying language and cognition issues. (3) Acknowledgements are part of the discourse but are not computed in the model; they are important for conversational design but do not actually provide a response to the participant task. (4) Synonyms are treated as 'same' by default, to ease model parameter analysis. (5) An N/A count is used to differentiate utterances and indicators that are not relevant, and also to aid verification. (6) Use of different linguistic patterns (PAR1 - T1 - 'she lost it') to express the same idea is a positive indicator of cognitive ability. (7) Results are scored interim, from task to task. (8) After generative naming, in a conversational style the participant would declare when they are done, and this utterance is ignored in the data collection, to differentiate the assessed utterances from the ones used as acknowledgements. (9) Self-interruptions are considered as pauses, with a change of grammar, as in [//] 'no that [//] that's', or as in the case of '&hm' when a participant is pausing with a sound. (10) Transcript coding is translated into language- and cognition-contributing elements. For instance, phonological fragments and disfluencies such as fillers, phonological fragments, and repeated segments are all coded by a preceding & (MacWhinney, 2021). (11) Non-linguistic cues/events such as a laugh, cries or hisses noted in the transcript, such as '&=laughs', are not considered in the model. Also, visual cues (pointing) are not considered, though they are helpful for the participant.
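Several of the considerations above (exclusion of investigator 'INV' turns, acknowledgements not being computed, stripping of non-linguistic cues such as &=laughs) can be pictured as a small classification routine. The sketch below is illustrative: the acknowledgement word list and the rule names are our own assumptions, not part of the published model.

import re

ACK_WORDS = {"yeah", "yes", "okay", "mhm", "right"}    # assumed acknowledgement tokens

def classify_utterance(speaker: str, text: str) -> str:
    if speaker == "INV":
        return "exclude_investigator"          # investigator utterances marked 'INV'
    if re.search(r"&=\w+", text):              # non-linguistic cues, e.g. &=laughs
        text = re.sub(r"&=\w+", "", text)      # stripped, not scored
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    if tokens and all(t in ACK_WORDS for t in tokens):
        return "acknowledgement_not_computed"  # kept for conversation design only
    return "assess"

print(classify_utterance("PAR", "yeah okay"))      # acknowledgement_not_computed
print(classify_utterance("PAR", "she lost it"))    # assess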
Sandwich). From this transcript analysis an extract of the first 124 utterances was collated for our analysis, containing 9 investigator utterances (to be excluded). Utterances 1–23 involved the general talking topic, and 24–124 were illness-related. These utterances carry the speech disfluency symbols from the CHAT automatic annotation, excluding the word-level error codes, but have a high level of accuracy (MacWhinney, 2021). Our extract had a couple of errors, such as ‘‡’ for ‘I’ and ‘i_mean’ for ‘I mean’, which were cleaned for inclusion and analysis. Further automatic time and speech disfluency metrics were created to assess, for example, the number of pauses (medium pause: =COUNTIF(G9, "*(.)*")).
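Outside Excel, the same kind of count can be sketched in Python; the pause markers follow the CHAT notation quoted above, and the function name and sample data are ours:

# Illustrative sketch: count CHAT pause markers across a list of utterances,
# mirroring the spreadsheet COUNTIF over a transcript column.
import re

utterances = [
    "oh I &-um back in the summer of two thousand seven",
    "no that [//] that's (.) what I meant",
    "(...) I useta know what it was called",
]

def count_pauses(utts, marker=r"\(\.\)"):
    """Count utterances containing a given CHAT pause marker (default '(.)')."""
    pattern = re.compile(marker)
    return sum(1 for u in utts if pattern.search(u))

print(count_pauses(utterances))                  # utterances containing '(.)'
print(count_pauses(utterances, r"\(\.\.\.\)"))   # utterances containing '(...)'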
4.1. Model rescaling
The next stage of scoring is the rescaling, which involves the inclusions, penalties, the exclusion set (non-linguistic/visual cues), types of utterances, flagging of completion, lexical and cognitive significance (such as ACKs), and completion of tasks. Rescaling is based on a series of nested if-then-else statements in Microsoft Excel to establish additions to or subtractions from the normalised score. These include: (a) an addition for completing the generative naming task (>=10 nouns within a minute), with the decision rule =IF(cell>=10,1.5, IF(cell>=7,1, IF(cell>=5,0.5, IF(AND(cell>=1, cell<=4), 0.25, IF(cell=0,0,0))))); (b) subtracting scores for not completing the story retelling (SRT) task, with the decision rule =IF(cell=0,5,0); (c) added scores for a good use of acknowledgements in conversation (based on a threshold >5), but excluding non-linguistic/visual cues, using the decision rule =IF(cell>=5,1.5, IF(cell>=4,1.2, IF(cell>=3,0.9, IF(cell>=2,0.6, IF(cell=1,0.3, 0))))) – note that for the interview task the acknowledgements are not excluded, as they are part of the interviewing responses; (d) added scores for very good language production and cognition group totals (threshold >20 for max scoring of all qualitative attributes), with the decision rule =IF(cell>=20,0.5,0); and (e) for the interview task, consideration of the use of repetitive words, filler words and vague expression words (Guinn & Habash, 2012). In this model these words are not credited, as they reduce the lexical richness from a semantic point of view, and thus the SemSenCorrect score is replaced by a SemSenPartial score, dropping the individual parameter score of the utterance from 3 to 2. A Python routine for the top 10 words across 115 utterances of text (dataset) identified: (1) ‘whatever’ (28); (2) ‘things’ (23); (3) ‘like’ (22); etc. This list excluded filler words such as ’th’, ’um’ or ’uh’ (which are considered as part of the speech disfluencies parameter assessment) and excluded the visual cues identified in the annotated transcription, such as ‘points’ and ‘ges’ (gesture).
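A minimal sketch of such a word-frequency routine (the filler list and visual-cue pattern shown here are illustrative simplifications, not the exact exclusion lists used in our analysis):

# Illustrative sketch: top content words across participant utterances,
# excluding CHAT filler codes and visual-cue annotations.
from collections import Counter
import re

FILLERS = {"&-um", "&-uh", "th", "um", "uh"}          # illustrative filler set
VISUAL_CUES = re.compile(r"&=(points|ges)\S*")        # e.g. '&=points:forward', '&=ges:circling'

def top_words(utterances, n=10):
    counts = Counter()
    for utt in utterances:
        utt = VISUAL_CUES.sub(" ", utt)               # drop visual-cue annotations
        for token in utt.lower().split():
            token = token.strip(".,?!'\"")
            if token and token not in FILLERS:
                counts[token] += 1
    return counts.most_common(n)

sample = ["whatever like that &-um whatever", "things &=points:forward whatever things"]
print(top_words(sample))   # e.g. [('whatever', 3), ('things', 2), ...]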
For lexical-semantic analysis and cognitive variability, utterances 26–28 are reviewed. Starting with utterance (26), the investigator asks, ‘so when you started feeling like all of the things you just described started happening’. The participant responds with utterance (27), ‘oh I &-um back in the summer of two thousand seven after my mom had passed away there was one little tiny thing in Chicago?’, and utterance (28), ‘&=points:forward that I [/] &+kn I [/] &+kn I useta know what it was called’. Utterances 27 and 28 both gain maximum scores for the lexical diversity, semantic, syntax and pragmatic parameters. However, utterance 27 communicates appropriate discourse references and cognitive information, whereas utterance 28 shows a loss of context, difficulty picking out words, and the inclusion of filler words. This pattern of variability in cognitive status also appears in utterance 51, ‘I would hafta go back in there &=ges:putting_in to find them, whatever, which I did a little bit’, and utterance 52, ‘but <then I just> [//] it’s like, no it’s getting whacky’.
All utterances of a strand are grouped by the collective qualitative attributes derived from the scoring and assessment. For example, a participant may have more category A scores, some category B, and a few category C scores, demonstrating their linguistic and cognitive behaviour. See Table 4 for all qualitative attributes, the quantitative score per parameter, and the indicators with reference to the mini-mental state examination (Cockrell & Folstein, 2002; NHSuk, 2019).
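A minimal sketch of this grouping step, assuming the Table 4 mapping of per-parameter quantitative scores (3/2/1) to ability categories A/B/C (the function name is ours):

# Illustrative sketch: group per-parameter qualitative scores (3/2/1) into
# the Table 4 ability categories A (good), B (some), C (poor).
from collections import Counter

CATEGORY = {3: "A", 2: "B", 1: "C"}   # Table 4: 3 = good, 2 = some, 1 = poor ability

def category_totals(parameter_scores):
    """Count how many parameter scores fall into each ability category; 0/N/A scores are skipped."""
    return Counter(CATEGORY[s] for s in parameter_scores if s in CATEGORY)

# e.g. one utterance scored over several parameters (illustrative values)
print(category_totals([3, 3, 2, 3, 1, 3, 2]))   # Counter({'A': 4, 'B': 2, 'C': 1})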
This category group scoring minimum and maximum value of
4. Results
In this section, we examine our statistical approach to model assessment analysis and tuning techniques, which are based on the range of factors discussed in the previous section. These factors and further rescaling provide a clinically orientated, person-centred adjustment to the pre-screening cognitive assessment (Pendrill, 2018).
Initial scoring for Strand 1 with 4 tasks includes: (1) the scoring of all utterances against a maximum score of 66; (2) the scoring of the generative naming task (single-word responses) to provide a normalised score to be used for comparison and completeness; (3) a normalised mean without the picture naming task, to see the difference in variation, and a normalised mean based on all 4 tasks (used for the main analysis); (4) category grouping totals for Group A, Group B and Group C for the range of qualitative indicators, as seen in Table 4; and (5) applying statistical metrics (ratios, min, max, mean, standard deviation (SD) and variance) to the language and cognition groupings (lexical, syntactic, semantic, pragmatic, discourse, cognition, speech) for the 22 language production/cognition parameters – for example, minSyn, maxSyn, meanSyn, sdSyn.
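As an illustration of step (5), the per-grouping statistics can be derived directly from the per-utterance parameter scores; the values and the ratio definition below are illustrative rather than our exact spreadsheet formulas:

# Illustrative sketch: descriptive statistics for one language-production
# grouping (syntax) over normalised utterances.
from statistics import mean, pstdev, pvariance

syntax_scores = [3, 3, 2, 3, 3, 2, 3]   # illustrative per-utterance syntax scores (0-3)

minSyn  = min(syntax_scores)
maxSyn  = max(syntax_scores)
meanSyn = mean(syntax_scores)
sdSyn   = pstdev(syntax_scores)         # population standard deviation
varSyn  = pvariance(syntax_scores)
# illustrative ratio definition (share of utterances at the maximum score);
# the model's exact ratio may be defined differently
synRatio = sum(1 for s in syntax_scores if s == 3) / len(syntax_scores)

print(minSyn, maxSyn, round(meanSyn, 2), round(sdSyn, 2), round(varSyn, 2), round(synRatio, 2))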
For strand 2, the interview task with participant 5, there are 313 participant utterances across 8 topics (@G: Illness; @G: Important_Event; @G: Window; @G: Umbrella; @G: Cat; @G: Cinderella_Intro; @G: Cinderella; @G:
Table 4
All Language and Cognition parameters and Assessment Attribute Qualitative Indicators.
Category of Group Totals | Qualitative Attribute | Quantitative Scoring per parameter | Indicators – Reference from MMSE
Good Ability (A) | Complete, New, Correct, Lower, Fluent, Yes | 3 | These terms signify the ability to demonstrate good communication and understanding
Some Ability (B) | Reduced, default, partial, same | 2 | These terms signify some ability to communicate and understand
Poor Ability (C) | Incomplete, Upper, poor, No | 1 | These terms signify a poor ability to communicate and understand
category ability is used as the basis of the GDS analysis matrix (GDSAM) of a GDS score and associated GDS band score descriptor. This is based on language and cognition insight and group thresholds (minMean and maxMean of each GDS band), variability, indicators and markers, Dementia TalkBank training, the MMSE and clinical insight. For example, an SD threshold (e.g., 0–0.3) indicates a GDS band score of 1–3, ranging from no cognitive decline (1) to mild cognitive decline (3) on the GDS. Finally, the participants’ rescaled, integrated scores and statistics are queried against the GDSAM for allocation of a GDS band score for the participant, in the form of a dashboard. This has a GDS band type and descriptor (Reisberg et al., 1982). For example, our model has a GDS band 1, which denotes “No cognitive decline based on our Language and Cognition assessment”, and further a Language Cognition Parameter Grouping – Lexical SD with a score of 0 has the indicator ‘Able to provide lexical diverse words and information.’ As another example, for the syntax grouping, the individual quantitative metric MeanSyn with a value of 1 denotes ‘no problem structuring the word order of the utterance’. The complete GDSAM table is found in Appendix D.
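The GDS band allocation itself reduces to a range lookup; a minimal sketch, using only the mean-utterance-score boundaries from the Appendix D matrix (the full GDSAM also combines the language-level and group-total thresholds):

# Illustrative sketch: allocate a GDS band from a participant's rescaled
# mean utterance score using the Appendix D band boundaries.
GDS_BANDS = [
    # (min mean utterance score, max, GDS band, descriptor)
    (68.00, 69.00, 1, "No cognitive decline"),
    (66.00, 67.99, 2, "Very mild cognitive decline"),
    (63.00, 65.99, 3, "Mild cognitive decline"),
    (53.00, 62.99, 4, "Moderate cognitive decline"),
    (26.00, 52.99, 5, "Moderately severe cognitive decline"),
    (19.00, 25.99, 6, "Severe cognitive decline"),
    (0.00,  18.99, 7, "Very severe cognitive decline"),
]

def allocate_gds_band(mean_utt_score: float):
    """Return (band, descriptor) for a rescaled mean utterance score."""
    for lo, hi, band, descriptor in GDS_BANDS:
        if lo <= mean_utt_score <= hi:
            return band, descriptor
    raise ValueError("score outside the GDSAM range")

print(allocate_gds_band(65.49))   # (3, 'Mild cognitive decline'), cf. Participant 1 in Table 5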
During the model training, several findings emerged: (1) the use of synonyms by participants to express the same idea, which reflects a positive indicator of cognitive ability through the introduction of new words; (2) participants generally do better on the earlier tasks than on the later tasks, such as story retelling, which reflects issues in long-term memory; (3) if there is missing information, RRG theory refers to prior contextual dialogue to retrieve that information (Van Valin, 2005a, 2005b) – for example, ‘can’t do it’ refers to ‘I can’t do it’, where the pronoun is expected but implicit through context; (4) the repetition of an investigator’s utterance, echolalia, is recorded as repetition in the language cognition – echolalia assessment; and (5) the use of acknowledgements by the participant is a contributing factor to the final assessment.
with scoring from a qualitative drop-down list and an equivalent quantitative score assignment. Each colour-coded group was hidden to prevent bias and remain objective, and the next colour-coded columns (that is, the next parameter category) were then selected and assessed. For instance, the utterance category is assessed first and hidden, followed by the lexical category, which is then hidden, and so on until all categories are assessed.
Validity is achieved by normalising for statistical analysis, with additions to and subtractions from an utterance score based on linguistic phenomena at word level/sentence level/discourse level/pragmatic level for the strand 1 tasks. This is achieved by taking the mean utterance score through a series of sequential adjustments based on completion/incompletion, use of acknowledgements and demonstration of good ability, to form a rescaled mean score. For completing the generative naming task, the lookup table and additions are as follows: a) 10 or more nouns, add 1.5; b) between 7 and 9, add 1; c) between 5 and 6, add 0.5; d) between 1 and 4, add 0.25; e) 0 nouns uttered, no adjustment. Similarly, if there was no attempt at the story re-telling task, a value of 5 is subtracted from the mean utterance score. However, the use of acknowledgements by the participant in sessions is a powerful compensatory strategy (Kindell et al., 2013; Pilnick et al., 2021) for conversational interaction, and hence an addition to the mean utterance score was made as follows: a) 5 or more acknowledgements, add 1.5; b) 4 acknowledgements, add 1.2; c) 3 acknowledgements, add 0.9; d) 2 acknowledgements, add 0.6; e) 1 acknowledgement, add 0.2. Finally, 0.5 is added to the mean utterance score if more than 20 qualitative ‘good ability’ attributes are assessed.
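Read as a small rule set, these sequential adjustments could be sketched as follows (thresholds and increments follow the listing above; the Section 4.1 decision rule gives 0.3 rather than 0.2 for a single acknowledgement):

# Illustrative sketch: sequential adjustments to a normalised mean
# utterance score, following the additions/subtractions listed above.
def rescale_mean_score(mean_score, gn_nouns, srt_attempted, ack_count, good_ability_count):
    score = mean_score

    # (a) generative naming bonus, by number of nouns produced within a minute
    if gn_nouns >= 10:
        score += 1.5
    elif gn_nouns >= 7:
        score += 1.0
    elif gn_nouns >= 5:
        score += 0.5
    elif gn_nouns >= 1:
        score += 0.25

    # (b) penalty if the story re-telling task was not attempted
    if not srt_attempted:
        score -= 5

    # (c) acknowledgement credit (excluding non-linguistic/visual cues)
    ack_bonus = {1: 0.2, 2: 0.6, 3: 0.9, 4: 1.2}
    score += 1.5 if ack_count >= 5 else ack_bonus.get(ack_count, 0)

    # (d) credit for more than 20 'good ability' qualitative attributes
    if good_ability_count > 20:
        score += 0.5

    return score

# illustrative inputs: completed generative naming, no story re-telling,
# 5 acknowledgements, more than 20 'good ability' attributes
print(rescale_mean_score(63.99, 12, False, 5, 21))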
Validity for the strand 2 interview task looks at the same linguistic phenomena at word level/sentence level/discourse level/pragmatic level, but also includes the acknowledgements in conversation as part of the initial analysis and assessment, and hence they contribute to the mean utterance score. Appendix C presents the participant data (Dementia TalkBank) and our hypothesis testing baseline on the investigator’s allocation. Here, participant 5 has been allocated a GDS score of Stage 6 for the Strand 1 tasks and Strand 2. However, our results derive a Stage 4 result for the Strand 2 interview task. This task is mainly about recalling information about a topic and about themselves. There were no files available evidencing participant 5 undergoing Strand 1, which, if available, might allude more towards a Stage 6 outcome. Fig. 4 presents a line graph comparison of the utterance behaviour of participants 1, 2, 3 and 5, which correlates with each participant’s stage outcome.
The accuracy measure is collectively based on the reliability of the original extracted transcript data from the Dementia TalkBank, followed by some cleaning of the data, the model design, and the intuition of language experts as part of the manual analysis. See Table 5 for the dashboard for Participant 1.
4.2. Quality of the model
In terms of the objectivity, reliability, validity, and accuracy of our model, it can be assessed on two levels. The model itself (details and link) and the use of the dataset (utterances) are found in Appendix F. In the initial dataset usage of the Dementia TalkBank (MacWhinney, 2017), we have inherited an annotated transcription with an implicit error factor, derived from the CHAT manual (MacWhinney, 2021). The transcription has been identified as having a high level of accuracy, but with a couple of transcription annotation anomalies, which were addressed in Section 3. For example: (1) transcript utterance 113, ‘&-um I went this last Saturday &-um with my son and my sister in law and his [/] his wife and so on like that’; (2) transcript utterance 98, ‘because I_mean I useta like all sorts of stuff and big stuff &=hands:spread’. At the linguistic analysis stage, the reliability can be explained via the discussion of the use of the dataset in the analysis, and the automatic analysis grouping controls and assessment in Microsoft Excel. There is a series of automatic and group controls (counts), computed in the following order: (1) the number of participant utterances; (2) the number of utterance types based on a qualitative utterance category; (3) the total for the language production category; (4) the total for the language cognition category; (5) the total utterance score; (6) the normalised score adjustment if it is a generative naming task; (7) group totals of the qualitative scoring for the categories of ability (good/partial/poor); (8) the number of speech disfluencies in the utterance via 13 speech checks – for example, 4 of the 13 are: a) revision – count of ‘&+’; b) word repetition – count of ‘[x 3]’; c) filled pause – count of ‘&=’; d) timing pause – count of ‘.’; and (9) a final holistic utterance score subtracting the speech disfluencies.
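As a rough sketch, such counts can be expressed as pattern searches over a CHAT-coded utterance; the regular expressions below are our own illustrative approximations of four of the thirteen checks, not the exact spreadsheet formulas:

# Illustrative sketch: four speech-disfluency checks as pattern counts
# over a CHAT-coded utterance string.
import re

SPEECH_CHECKS = {
    "pf_revision":     r"&\+\S+",       # phonological fragment, e.g. '&+kn'
    "word_repetition": r"\[x \d+\]",    # repetition marker, e.g. '[x 3]'
    "filled_pause":    r"&-\S+",        # filler, e.g. '&-um' (coded with a preceding &)
    "timing_pause":    r"\(\.+\)",      # pause markers such as '(.)' or '(..)'
}

def disfluency_counts(utterance: str) -> dict:
    return {name: len(re.findall(pattern, utterance))
            for name, pattern in SPEECH_CHECKS.items()}

utt = "that I [/] &+kn I [/] &+kn I useta know (.) what it was called &-um [x 3]"
print(disfluency_counts(utt))
# {'pf_revision': 2, 'word_repetition': 1, 'filled_pause': 1, 'timing_pause': 1}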
Here, we carried out the manual analysis and initial scoring of two strands: strand 1, with four tasks for 4 participants (total utterances of…), and strand 2, with an interview task for participant 5 and an extract of 122 utterances. Utterance scoring was achieved by simultaneous verification and assessment by a two-person team, with systematic selection of a colour-coded group (to reflect the different parameter groups)
5. Discussion
5.1. Advantages of the PST-LCAM model
The goal of the experiment was to identify, in regular conversation, mild impairment/speech issues and/or indicators, or a feeling of something going wrong, and the need for further investigation from a clinical perspective. The variability of the language and cognitive behaviour of participants can be identified through the participants’ qualitative and quantitative results, with their resulting different GDS bands (1–5) and indicators. Our PST-LCAM concept and development is based on the investigator session with the participant via audio files/transcripts and final outcomes. As noted earlier, we have a true hypothesis result. Our outcomes correlate with problems at word level/sentence level/discourse level communication and specific conditions, as in Table 6, with examples from our model analysis, implementation, and assessment. The explanations of these conditions were presented in the literature review earlier (AphasiaTalkbank, 2021). For each level of language, Table 6 identifies a specific condition with an explanation and selected
Fig. 4. Line graph comparison of utterance behaviour of participants 1, 2, 3 and 5.
Table 5
Participant 1 – Dashboard of Strand 1 of 4 MMSE tasks and Utterances’ (Utts) Analysis and GDS outcome.
Participant: P1 | No of Utts.: 23 | Normalised Utts.: 14 | ACK (more than 5 ACK – credit given): 5 | GN (√) extra points if done: √ | SRT (√) penalty if not done: X | Gp A (>20 credit given): 20.86 | Max Utterance Value: 69 | Normalised Mean Score: 65.49
Parameter | Mean (over Normalised Utts) | Ratio of Normalised Utts | SD | Comments
Lexical | MeanLex = 2.77 | LexRat = 0.95 | 0.17 | Able to provide lexical diverse words and information
Syntactic | MeanSyn = 3 | SynRatio = 1 | 0 | Able to structure an utterance appropriately
Semantic | MeanSem = 2.74 | SemRatio = 0.91 | 0.15 | Utterances and words provided are adequate to the context and give meaning, and little variation in semantic representation of utterances
Pragmatic | MeanPrag = 3 | PragRatio = 1 | 0 | They demonstrate a very good world perspective
Discourse | MeanDis = 3 | DisRatio = 1 | 0 | Very good at producing sentences and words in the right context and reference
Lang Cognition (LG) | MeanLG = 2.87 | LGRat = 0.66 | 0.32 | No repetition or involuntary words; however, there is the displacement of wrong/more generalised words used for complex Utt: category D
Speech | MeanSpeech = 2.85 | SpeechRat = 0.45 | 0.36 | Occasional speech interruptions and pauses
Summary | meanAll = 2.89 | meanRat = 0.85 | 0.14 | There is a consistency between the utterances produced in relation to all the utterances and little difference in variation
Based on our GDS band scoring table, P1 = Stage 3 – Mild Cognitive Decline. This participant has failed to re-tell the story, but is capable of producing responses in relation to the investigator’s context.
example(s) from our model analysis, implementation, and assessment. Table 6 further provides early evidence and correlates with the findings of Ostrand and Gunstad (2021) that early-stage dementia
(cognitive decline) reduces the amount of specific content information
conveyed during speech, while maintaining contextual relevance and
grammaticality. They further note that other levels of linguistic processing, including articulatory production, phonetic retrieval, and syntax, remain largely unimpaired until much more advanced stages of the
disease.
Table 6
Example of language issues identified from the proposed model assessment.
Type of feature | Condition | Example from our model results | Participant (P) Source
Word level | anomia | Use of whatever, things, //, pointing, gestures: “Where my mom and dad and so on had been &=ges:circling where they had gone into these places where &-um nursing home that whatever like that” | P3, P5
Word level | circumlocution | that’s [/] that’s what’s whatever, ‘so on like that’ | P5
Word level | jargon | Whatever, thing | P2
Word level | perseveration | – | P3, P5
Word level | semantic paraphasia | No matched evidence | None
Word level | stereotypy | Whatever, thing, ‘useta’ (e.g., &-um) | P3, P5
Sentence level | agrammatism | No matched evidence | None
Sentence level | empty speech | ‘Stuff, like I was going there, going there, going there’ | P5
Discourse level | communication vs language | ‘He got a funny paper down there’ | P1
5.2. Future directions and clinical implications
The PST-LCAM model will be embedded into a conceptual architecture as shown in Appendix E. This will involve: (1) implementing the model design and the cognitive assessment process of language and cognition by deploying a Role and Reference Grammar language engine and the psycholinguistic and cognitive adequacy (PCA) assessment protocol discussed earlier; (2) transcribing the user’s voice input with a bespoke automatic speech recognition (ASR) annotation framework to create a set of annotated utterances – our work and plans concur with Boletsis (2020), who reviewed 9–15 studies taking place from 2017 to 2020 on automated speech-based interaction for cognitive screening; and (3) validating the use and next stage of our model development to be embedded in an intervention for early dementia detection. It will provide indicators, pre-diagnosis results, and recommendations for the
cognition assessment model (PST-LCAM) as an intervention into a conversational agent interface, as an application for pre-screening people for the early detection of language and cognitive decline.
Our model uses the Dementia TalkBank dataset of investigators’ interviews with participants – cognitive assessment sessions transcribed by the CHAT system, taking into account the speech disfluencies of the participants. These transcriptions constitute the input for training our language and cognition assessment model in Microsoft Excel. Our model is aligned with a GDS band score descriptor via the GDS analysis matrix (GDSAM), providing language and cognition insight and group thresholds (minMean and maxMean of each GDS band), variability, indicators, and various statistical markers. For example: (1) an SD threshold (e.g., 0–0.3) indicates a GDS band score of 1–3, ranging from no cognitive decline (1) to mild cognitive decline (3) on the GDS; (2) GDS band 1 denotes “No cognitive decline based on our Language and Cognition assessment”, and further a Language Cognition Parameter Grouping – Lexical SD with a score of 0 has the indicator ‘Able to provide lexical diverse words and information.’ As another example, for the syntax grouping, the individual quantitative metric MeanSyn with a value of 1 denotes ‘no problem structuring the word order of the utterance’. To summarise, our model is successful, demonstrates a proof of concept, and is fit for purpose. The novelty of our PST-LCAM lies in the use of a functional grammatical model to elicit the understanding and meaning that are required for a thorough language and cognition assessment. We are aware of the limitation of our model in terms of low participant numbers; however, our future plans for testing with larger groups will enrich our pre-screening trained language and cognition assessment model and support validation to engage in further dementia-related research collaboration.
participant and/or carer, as precursors to a definitive final clinical diagnosis. These qualitative recommendations will form the outputs of the model via: (1) acquiring the latest MCI recommendations from dementia and speech specialists, dementia care providers and health organisational documents (NHSorg, 2020; TheAdultsSeechTherapyWorkbook.com, 2022); (2) creating a recommendation mapping protocol (RMP) of language categories and a mapping of recommendation activities – for instance, for a syntax indicator, a range of options: (i) physical activity – out and about, walk and talk; (ii) mental activity – word game app/audio book/reading puzzles; (iii) social activity – creative singing following a lyric; (iv) creative activity – writing a to-do list; (v) individual/group/direct/indirect activities – reminiscence work, a time in your life, in a social setting; and (3) creating a personalised recommendations activity plan (PRAP) based on the RMP, the participant’s GDS results, and the latest personal dashboard. This PST-LCAM model builds on the innovations in dementia, with the aim of triggering and inspiring this line of thinking and development for a remote pre-screening application at the patient’s convenience, via a conversational interface (see Appendix E). This will improve patients’ experience, support pre-diagnosis processes, help to reduce costs in NHS dementia diagnosis and social care, and contribute to wider ambient assisted living (AAL) practices (AlzheimersResearchUkOrg, 2023; Demir et al., 2017). More recent considerations and studies have identified that dementia only causes about 41% of cognitive decline and that there are other predictors, such as lifestyle factors, that can impact cognitive decline (Pelc, 2023).
Our PST-LCAM proof of concept will be implemented as highlighted above, with ethical alignment. To validate the computational model, a cognitively healthy older group of 200 participants will be recruited to test and validate the model. They will undergo a battery of cognitive tests administered by a pair of neuropsychologists. This will be followed by a protocol-based comparative evaluation of our implemented PST-LCAM model results against the clinical results, with appropriate refinement made from the feedback, and subsequently a larger testing cohort. The ultimate goal is to test with a control group of participants with mild impairments via clinical collaborative arrangements.
Declaration of Competing Interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence
the work reported in this paper.
Data Availability

The dataset link has been shared in the appendix and methods section.

6. Conclusion
Our goal was to embed a pre-screening trained language and
Appendix A1 – Model experiment 2, participant task 2 – generative naming task mapping attributes to RRG

Table 7
Model experiment 2, Participant Task 2 – Generative Naming Task: mapping attributes to RRG.
Attributes: Consecutive list of words within a minute.
Explanation of using it: Domain dependent – picking out words (memory recall); velocity of retrieval.
RRG relevance: Ontology – placement from the correct semantic classes; Lexicon – lexical word retrieval and lexical word category; Pragmatic – linked to the topic/question.
Appendix A.2 - Model experiment 2 participant Task 3 – picture description task mapping attributes to RRG
Table 8
Model experiment 2, Participant Task 3 – Picture Description Task: mapping attributes to RRG.
Attributes:
• Degree of interpretation of the picture
• Captured event with participating objects
• Who did what to whom?
Language Production & Cognition Links – Mapping framework:
• Describe the captured event in stages (Cognition – picking of keywords)
• Who did what to whom? This requires stating the relevant participating objects in the picture (memory recall).
• Where, how, when, and why of the captured event. This requires reference to participants, objects, location, time, manner, and reason (mental lexicon and representing knowledge in order).
RRG relevance:
• Semantic macroroles – actors and undergoers
• Semantic – present the meaning of the event – logical structure of the clause (LSc)
• Lexical representation of events (aktionsarten verbs), e.g., static, active accomplishment (e.g., slowed, slowly, finished)
• Syntax – structured explanation of the event (SVO); layered structure of the clause (LSC)
• Discourse – linking the events
• Pragmatic context of the event to other events
Appendix A3 – Strand 3 - participant interview task – interview sets about your family (Q and A) attributes to RRG concepts
Interview sets are based on topics such as ‘tell me something about your family’, ‘tell me about your job’, ‘tell me about Little Red Riding Hood’, ‘Goldilocks and the 3 bears’, and ‘tell me about the country you live in’. A sample model experiment is ‘tell me about your family’. Q stands for ‘Question’.
Table 9
Strand 3 – Participant Interview Task: interview sets and attributes to RRG concepts.
Language Production and Cognition Links:
• Understand the initial Q, with an initial response to the Q
• Understanding the confirmation (investigator) – acts as a trigger
• Turn taking and response – understanding of the interim Q in relation to the initial Q
• Understanding the leading Qs, and immediate response – and the nature of the response
• Analyse the non-ability to understand the question
• Range of participant responses
• Q and knowledge domain dependent – picking out words (memory recall)
• Velocity of retrieval
• How many people are in your family? Who is related to whom? What does a particular family member do?
• This requires remembering family members (memory recall) via leading Qs.
• When, where, how and why – can reflect the adverbial.
• This requires reference to participants, objects, location, time, manner, and reason (mental lexicon and representing knowledge in order).
RRG relevance:
• Fluency of response
• Cognitive – lexicon – pick out key words and word production
• Semantic macroroles – actors and undergoers – relation between the family members
• Semantic – present a response to the question – logical structure of the clause (LSc)
• Lexical representation to describe linguistic phenomena (aktionsarten verbs), e.g., poorly, energetic, depressed
• Syntax – structured explanation of the answer (SVO); layered structure of the clause (LSC)
• Discourse – linking of the answer to the question
• Pragmatic context of the answer to the question
• Speech fluency
Appendix B – Adapted - speech notation (SN), phonological fragments (PF), speech disfluency (MacWhinney, 2021), qualitative scoring
(QS)
Table 10
Speech notation (SN), Phonological fragments (PF), Speech disfluency, and qualitative scoring (QS).
Speech disfluency | SN (QS: Fluent) | SN (QS: Partial) | SN (QS: Poor) | SN (QS: Critical)
PF revision | &+kn (e.g., &+m, &+s, &+r, &+ha) (or n/a) | &+kn (again) twice | &+kn (again) three times | &+kn (again) more than 3 times
Word revision | No [//] | [//] | [//] twice | [//] three or more
(< > Phrase) revision | No < > [//] | < > [//] | < > [//] twice | < > [//] three or more
Filled pause | &-uh (e.g., &-um) | &-uh (again) twice | &-uh (again) three and more | &-uh (again) four and more
Repetition (mono words) | No [x]* ** | [x 2] or [/] | [x 3] | [x > 3]
Pause | No pause | (.) <short> | (.) <medium> | (…) <long>
Pause within a word blocking | ^ | ^2 | ^3 | ^>3
Appendix C- Participant (Par) data (Dementia TalkBank) and our hypothesis testing
Table 11
Participant (Par) Data (Dementia TalkBank) and our Hypothesis Testing.
Par | Age | Sex (M/F) | Ed years | Occupation | Comment | Experiment Strands | File (INV) | Model Train–Test | GDS
PAR1 | 62 | M | 15 | Manager – private water co | designated as early confusional stage | CLA (4 tasks) | /tele01a, /tele01b | Train | Stage 3
PAR2 | 76 | F | 18 | MSc degree | | 4 tasks | /tele01a, /tele01b | Train | Stage 4
PAR3 | 78 | F | 16 | History teacher | | 4 tasks | /PPA | Train | Stage 5
PAR5 | 66 | F | | Graduate Textile designer | | Interview | /depaul2A | Test | Stage 6
Appendix D – GDS Matrix and bands (also found in the Excel file worksheet GDS Band Matrix) (23 refers to the maximum total for the group totals, derived from the 23 parameter assessments)

Table 12
GDS Matrix and bands.
Scale | Indicator | Mean Utt Score (Min–Max) | Mean Language level (Min–Max) | Grp A Good (Min–Max) | Grp B Partial (Min–Max) | Grp C Poor (Min–Max)
1 | No cognitive decline | 68–69 | 3–3 | 23–23 | 0–0 | 0–0
2 | Very mild cognitive decline | 66–67.99 | 2.80–2.99 | 22–23 | 0–1 | 0–0
3 | Mild cognitive decline | 63–65.99 | 2.60–2.79 | 15–21 | 1–2 | 1–2
4 | Moderate cognitive decline | 53–62.99 | 2.40–2.59 | 11–14 | 3–5 | 3–11
5 | Moderately severe cognitive decline | 26–52.99 | 2.2–2.39 | 8–10 | 6–8 | 7–15
6 | Severe cognitive decline | 19–25.99 | 1.0–2.19 | 5–7 | 9–11 | 8–22
7 | Very severe cognitive decline | 0–18.99 | 0–0.99 | 0–4 | 9–11 | 8–23
Appendix E: Conceptual architecture
Fig. 5. Conceptual Architecture.
Appendix F - Pre-screening trained language and cognition assessment model (PST-LCAM)
Please see the Microsoft Excel file here; it includes the following worksheets:
1. Instructions for use
2. Worksheet 1 - Lookup list
3. Worksheet 2 - GDS Band Matrix
4. Worksheet 3 - Participant 1 (Strand 1) utterances, PST-LCAM analysis, assessment, and dashboard
5. Worksheet 4 - Participant 2 (Strand 1) utterances, PST-LCAM analysis, assessment, and dashboard
6. Worksheet 5 - Participant 3 (Strand 1) utterances, PST-LCAM analysis, assessment, and dashboard
7. Worksheet 6 - Participant 5 (Strand 2) utterances, PST-LCAM analysis, assessment, and dashboard
8. Worksheet 7 - Utterance Variations (line graph) comparing participants’ utterances
9. Worksheet 8 - Utterance Extract – Top 20 words (Participant 5 utterances) and visualisation
References
Jones, D., et al. (2016). Conversational assessment in memory clinic encounters:
Interactional profiling for differentiating dementia from functional memory
disorders. Aging & Mental Health, 20(5), 500–509. https://doi.org/10.1080/
13607863.2015.1021753
Kaplan, E., Goodglass, H.,Weintraub, S. (2001). Boston naming test.
Kindell, J., et al. (2013). Adapting to conversation with semantic dementia: Using
enactment as a compensatory strategy in everyday social interaction. International
Journal of Language & Communication Disorders, 48(5), 497–507. https://doi.org/
10.1111/1460-6984.12023
Kulkarni, D. K., & Moningi, S. (2015). Neurocognitive function monitoring. Journal of
Neuroanaesthesiology and Critical Care, 2(03), 246–256.
Lee, D., & Yoon, S. N. (2021). Application of artificial intelligence-based technologies in
the healthcare industry: Opportunities and challenges. International Journal of
Environmental Research and Public Health, 18(1), 271. https://doi.org/10.3390/
ijerph18010271
Liddy, E. D. (2001). Natural language processing. Encyclopedia of Library and Information
Science (second ed.). NY: Marcel Decker, Inc.
Linguamatics. (2021). How does Natural Language Processing (NLP) work? Retrieved
from 〈https://www.linguamatics.com/how-does-nlp-work〉. ’Accessed 12 December
2021.
MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk: Volume I:
Transcription Format and Programs, Volume II: The Database (third ed.). Mahwah, NJ:
Lawrence Erlbaum Associates.
MacWhinney, B. (2017). Dementia.TalkBank. Retrieved from 〈https://dementia.talkban
k.org/〉. ’Accessed 12 February 2021.
MacWhinney, B. (2019). Understanding spoken language through TalkBank. Behavior
Research Methods, 51(4), 1919–1927.
MacWhinney, B. (2021). Tools for Analyzing Talk Part 1: The CHAT Transcription Format.
Retrieved from 〈https://talkbank.org/manuals/CHAT.pdf〉.
Mairal, R., Perez, M.-B.A.,et-al (2019). Teorías lingüísticas: Editorial UNED.
Mannonen, P., Kaipio, J., & Nieminen, M. P. (2017). Patient-centred design of healthcare
services: Meaningful events as basis for patient experiences of families. Stud Health
Technol Inform, 234, 206–210.
Maurya, H. C., Gupta, P., & Choudhary, N. (2015). Natural language ambiguity and its
effect on machine learning. International Journal of Modern Engineering Research, 5,
25–30.
McKhann, G., et al. (1984). Clinical diagnosis of Alzheimer’s disease. Neurology, 34(7),
939. https://doi.org/10.1212/WNL.34.7.939
Miah, Y., et al. (2021), 2021//. Performance Comparison of Machine Learning Techniques in
Identifying Dementia from Open Access Clinical Datasets. Singapore: Paper presented at
the Advances on Smart and Soft Computing.
Michie, S., Atkins, L., West, R. (2014). The behaviour change wheel: a guide to designing
interventions.
Mosqueira-Rey, E., et al. (2023). Human-in-the-loop machine learning: A state of the art. Artificial Intelligence Review, 56(4), 3005–3054. https://doi.org/10.1007/s10462-022-10246-w
Nasreddine, Z. S., et al. (2005). The Montreal Cognitive Assessment, MoCA: A brief
screening tool for mild cognitive impairment. Journal of the American Geriatrics
Society, 53(4), 695–699.
NHSorg. (2020). Activities for dementia - Dementia guide. Retrieved from 〈https://www.
nhs.uk/conditions/dementia/activities/?tabname=symptoms-and-diagnosis〉.
’Accessed 1 July 2020.
NHSuk. (2019). Standardized Mini-Mental State Examination (SMME). Retrieved from
〈https://www.swlstg.nhs.uk/images/Useful_docs_for_healthcare_professionals/min
i-mental_state_examination_form.pdf〉. ’Accessed 12 December 2020.
Noori, A., et al. (2022). Development and evaluation of a NLP annotation tool to
facilitate phenotyping of cognitive status in electronic health records: Diagnostic
study. Journal of Medical Internet Research, 24(8). https://doi.org/10.1002/
alz.068929
Norden, J., Wang, J., & Bhattacharyya, A. (2023). Where Generative. AI Meets
Healthcare: Updating The Healthcare AI Landscape. Retrieved from https://aichecku
p.substack.com/p/where-generative-ai-meets-healthcare.
Ntracha, A., et al. (2020). Detection of mild cognitive impairment through natural
language and touchscreen typing processing. Frontiers in Digital Health, 2, Article
567158.
O’Malley, R., et al. (2020). Can an automated assessment of language help distinguish
between Functional Cognitive Disorder and early neurodegeneration? Journal of
Neurology, Neurosurgery Psychiatry, 91(8), e18–e19. https://doi.org/10.1136/jnnp2020-BNPA.43
O’Malley, R. P. D., et al. (2021). Fully automated cognitive screening tool based on
assessment of speech and language. Journal of Neurology, Neurosurgery & Psychiatry,
92(1), 12–15. https://doi.org/10.1136/jnnp-2019-322517
OpenAI. (2023). GPT-4 is OpenAI’s most advanced system, producing safer and more
useful responses. Retrieved from https://openai.com/gpt-4. ’Accessed 15 March
2023’
Ostrand, R., & Gunstad, J. (2021). Using automatic assessment of speech production to
predict current and future cognitive function in older adults. Journal of Geriatric
Psychiatry and Neurology, 34(5), 357–369. https://doi.org/10.1177/
089198872093335
Padhee, S.et al. (2020). Identifying Easy Indicators of Dementia.
Palanica, A., et al. (2019). Physicians’ perceptions of chatbots in health care: Crosssectional web-based survey. J Med Internet Res, 21(4), Article e12887. doi:https://
www.jmir.org/2019/4/e12887/.
Pelc, C. (2023). Dementia only causes about 41% of cognitive decline: Study identifies
other predictors. Retrieved from 〈https://www.medicalnewstoday.com/articles
/cognitive-decline-predictors-besides-dementia〉. ’Accessed 16 April 2023.
Adhikari, S., et al. (2022). Exploiting linguistic information from Nepali transcripts for
early detection of Alzheimer’s disease using natural language processing and
machine learning techniques. International Journal of Human-computer Studies, 160,
Article 102761. https://doi.org/10.1016/j.ijhcs.2021.102761
Alzheimer’sAssociation. (2023). Alzheimer’s Disease Facts and Figures. Retrieved from
〈https://www.alz.org/alzheimers-dementia/facts-figures〉. ’Accessed 12 July 2023.
AlzheimersOrgUk. (2018). Over half of people fear dementia diagnosis, 62 per cent think
it means ’life is over’. Retrieved from 〈https://www.alzheimers.org.uk/news/201
8–05-29/over-half-people-fear-dementia-diagnosis-62-cent-think-it-means-life-ove
r〉. ’Accessed 12 March 2021.
AlzheimersOrgUk. (2020). Alzheimer’s Society comment on how coronavirus is affecting
dementia assessment and diagnosis. Retrieved from 〈https://www.alzheimers.org.
uk/news/2020–08-10/coronavirus-affecting-dementia-assessment-diagnosis〉.
’Accessed 9 March 2021.
AlzheimersResearchUk. (2020). Statstics about dementia - prevalence. Retrieved from
〈https://dementiastatistics.org/about-dementia/prevalence-and-incidence/〉.
’Accessed 12 July 2020.
AlzheimersResearchUkOrg. (2021). Alzheimer’s Research UK. Retrieved from
〈https://www.alzheimersresearchuk.org/research/〉. ’Accessed 20 September 2022.
AlzheimersResearchUkOrg. (2023). Think Brain Health Check-in. Retrieved from 〈https
://www.alzheimersresearchuk.org/brain-health/check-in/〉. ’Accessed 23 January
2023.
AphasiaTalkbank. (2021). AphasiaBank Example. Retrieved from 〈https://aphasia.talkb
ank.org/education/examples/〉. ’Accessed 12 February 2021.
BabylonHealthCom. (2021). Creating Better Health and Panel Discussion. Paper presented
at the AI Business Week Digital Symposium February 22–25 2021.
Bertini, F., et al. (2022). An automatic Alzheimer’s disease classifier based on
spontaneous spoken English. Computer Speech & Language, 72, Article 101298.
https://doi.org/10.1016/j.csl.2021.101298
Bohr, A., & Memarzadeh, K. (2020). Chapter 2 - the rise of artificial intelligence in
healthcare applications. In A. Bohr, & K. Memarzadeh (Eds.), Artificial Intelligence in
Healthcare (pp. 25–60). Academic Press. https://doi.org/10.1016/B978-0-12818438-7.00002-2.
Boletsis, C. (2020). A review of automated speech-based interaction for cognitive
screening. Multimodal Technologies and Interaction, 4(4), 93. https://doi.org/
10.3390/mti4040093
Borson, S., et al. (2003). The Mini-Cog as a screen for dementia: Validation in a
population-based sample. Journal of the American Geriatrics Society, 51(10),
1451–1454. https://doi.org/10.1046/j.1532-5415.2003.51465.x
Bresnan, J., et al. (1982). Cross-serial dependencies in Dutch. The Formal Complexity of
Natural Language, 33, 286–319.
Bucks, R. S., et al. (2000). Analysis of spontaneous, conversational speech in dementia of
Alzheimer type: Evaluation of an objective technique for analysing lexical
performance. Aphasiology, 14(1), 71–91. https://doi.org/10.1080/
026870300401603
Car, L. T., et al. (2020). Conversational agents in health care: Scoping review and
conceptual analysis. Journal of medical Internet research, 22(8), Article e17158. doi:
https://www.jmir.org/2020/8/e17158.
Cockrell, J. R., & Folstein, M. F. (2002). Mini-mental state examination. In Principles and
practice of geriatric psychiatry, 140–141. https://doi.org/10.1002/0470846410.ch27
(ii)
Dastani, M., & Yazdanpanah, V. (2023). Responsibility of AI systems. Ai & Society, 38(2),
843–852.
DementiaTalkbankOrg. (2017). TalkBank and DementiaBank. Retrieved from 〈htt
ps://dementia.talkbank.org/〉. ’Accessed 2 January 2021.
DementiaUK. (2021). Getting a diagnosis. Retrieved from 〈https://www.dementiauk.or
g/get-support/diagnosis-and-specialist-support/getting-a-diagnosis-of-dementia/〉.
’Accessed 12 June 2023.
Demir, E., et al. (2017). Smart home assistant for ambient assisted living of elderly
people with dementia. Procedia Computer Science, 113, 609–614. https://doi.org/
10.1016/j.procs.2017.08.302
Dik, S. (1991). Functional grammar. In Linguistic Theory and Grammatical Description (Vol.
75, pp. 247–274). John Benjamins Publishing Company.
EPRScUKRIOrg. (2021). Healthcare Technologies Grand Challenges. Retrieved from
〈https://www.ukri.org/what-we-do/our-main-funds-and-areas-of-support/browse
-our-areas-of-investment-and-support/healthcare-technologies-theme/〉. ’Accessed
15 December 2021.
Foltz, P. W., et al. (2022). Reflections on the nature of measurement in language-based
automated assessments of patients’ mental state and cognitive function.
Schizophrenia Research. https://doi.org/10.1016/j.schres.2022.07.011
Förstl, H., & Kurz, A. (1999). Clinical features of Alzheimer’s disease. European Archives
of Psychiatry and Clinical Neuroscience, 249, 288–290.
Guinn, C. I., & Habash, A. (2012). Language analysis of speakers with dementia of the
Alzheimer’s type. Paper presented at the 2012 AAAI Fall Symposium Series.
InnovationsInDementia. (2016). Making an Impact Together - Sharing the learning on
dementia activism from and across the DEEP network. Retrieved from The UK Network
of Dementia Voices 〈https://www.dementiavoices.org.uk/wp-content/uploads/
2016/11/Making-An-Impact-Together.pdf〉.
InnovationsInDementiaOrgUk. (2021). Learning about your cognitive state using
language and memory – a questionnaire. Retrieved from 〈https://www.dementiavo
ices.org.uk/deep-groups-news/learning-about-your-cognitive-state-using-languageand-memory-a-questionnaire/〉. ’Accessed 03 September 2021.
JAIN. (2021). Assisting people with memory loss. Retrieved from 〈https://www.jain
projects.com/〉. ’Accessed 12 July 2021.
Pendrill, L. (2018). Assuring measurement quality in person-centred healthcare.
Measurement Science and Technology, 29(3), Article 034003. https://doi.org/
10.1088/1361-6501/aa9cd2
Penfold, R. B., et al. (2022). Development of a machine learning model to predict mild
cognitive impairment using natural language processing in the absence of screening.
BMC Medical Informatics and Decision Making, 22(1), 1–13.
Pilnick, A., et al. (2021). Avoiding repair, maintaining face: Responding to hard-tointerpret talk from people living with dementia in the acute hospital. Social Science &
Medicine, 282, Article 114156. https://doi.org/10.1016/j.socscimed.2021.114156
Reisberg, B., et al. (1982). The Global Deterioration Scale for assessment of primary
degenerative dementia. The American journal of psychiatry. https://doi.org/10.1176/
ajp.139.9.1136
Roxby, P. (2023). Dementia: Brain check-up tool aims to cut risk at any age. Retrieved
from 〈https://www.bbc.co.uk/news/health-64308997〉. ’Accessed 18 January 2023.
Searle, J. R. (1969). Speech Acts: An Essay in the Philosophy of Language (Vol. 626).
Cambridge: Cambridge University Press.
Taylor, N. (2019). Duke Report Identifies Barriers to Adoption of AI Healthcare Systems.
Retrieved from 〈https://www.medtechdive.com/news/duke-report-identifies
-barriers-to-adoption-of-ai-healthcare-systems/546739/〉. ’Accessed 1 November
2021.
TheAdultsSeechTherapyWorkbook.com. (2022). THE ADULT SPEECH THERAPY
WORKBOOK - Everything you need to assess, treat, and document. Retrieved from
〈https://theadultspeechtherapyworkbook.com/speech-therapy-memory-activitiesfor-adults/〉. ’Accessed 1 July 2022.
Thompson, I. (1987). Language in dementia: I. A review. International Journal of Geriatric
Psychiatry. https://doi.org/10.1002/gps.930020304
Van Valin, R. D., Jr (2000). A concise introduction to role and reference grammar.
FLUMINENSIA: časopis za filološka istraživanja, 12(1–2), 47–78.
Van Valin, R. D., Jr (2005a). Exploring the syntax-semantics interface. Cambridge:
Cambridge Univ Press.
Van Valin Jr, R.D. (2005b). A summary of Role and reference Grammar. Role and
Reference Grammar Web Page, University of Buffalo .
Verizon. (2023). Do LLMs really understand human language? Verizon experts offer a
critical perspective on language understanding by large language models. Retrieved
from https://inform.tmforum.org/features-and-opinion/do-llms-really-understand-h
uman-language/. ’Accessed 1 June 2023’
WorldAlzReport2015Org. (2015). Prevalence of dementia around the world, along with
forecasts for 2030 and 2050. In 〈https://www.researchgate.net/figure/Prevalence-o
f-dementia-around-the-world-along-with-forecasts-for-2030-and-2050_fig1_33880
1466〉 (Ed.). Research Gate.
Yeung, A., et al. (2021). Correlating natural language processing and automated speech
analysis with clinician assessment to quantify speech-language changes in mild
cognitive impairment and Alzheimer’s dementia. Alzheimer’s Research & therapy, 13
(1), 109.