
Natural language processing-driven framework for the early detection of language and cognitive decline

2023, Journal of Language and Health

Natural Language Processing (NLP) technology has the potential to provide a non-invasive, cost-effective method for the timely detection of early-stage language and cognitive decline in individuals concerned about their memory. The proposed pre-screening language and cognition assessment model (PST-LCAM) is based on the functional linguistic model Role and Reference Grammar (RRG), which is used to analyse and represent the structure and meaning of utterances via a set of language production and cognition parameters. The model is trained on a Dementia TalkBank dataset with markers of cognitive decline aligned to the Global Deterioration Scale (GDS). A hybrid approach of qualitative linguistic analysis and assessment is applied, which includes the mapping of participants' task speech utterances and words to RRG phenomena. It uses metric-based scoring, with the resulting quantitative scores and qualitative indicators serving as pre-screening results. The model is to be deployed in a user-centred conversational assessment platform.

Kulvinder Panesar (University of Bradford, Bradford, UK) and María Beatriz Pérez Cabello de Alba (Universidad Nacional de Educación a Distancia, Madrid, Spain)

Keywords: Language production; Memory concerns; Pre-screening model; Role and reference grammar; Speech assessment; Natural language processing

1. Introduction

There is continuing research into timely diagnosis and the early detection of cognitive decline, both to help reduce dementia rates and to provide the best treatment, support and plans promptly (Adhikari et al., 2022; DementiaUK, 2021). Dementia is a complex and progressive neurological disorder that leads to a decline in cognitive abilities such as language, visuospatial skills, memory, judgment, and mental agility. Our focus is the language and cognitive impairments and semantic memory deficits which may affect the understanding and production of speech in everyday life (McKhann et al., 1984). Each affected person will experience dementia symptoms differently. The symptoms are categorised differently by several authors, such as Förstl and Kurz (1999); our focus is the Global Deterioration Scale (GDS) and its stages (Stage 1, no cognitive decline, to Stage 7, very severe cognitive decline) (Reisberg et al., 1982). Under the umbrella of dementia, individuals are characterised as being afflicted with a loss in cognitive and communicative functionality (Bucks et al., 2000). This observation is seen in the 88–95% of people noted by Thompson (1987) as demonstrating some degree of aphasia (language disability) and cognitive failure, including the inability to grasp concepts, recall events of their past, or recognise individuals (Guinn & Habash, 2012). These mild impairments may involve one or more word-level, sentence-level, and discourse-level features, as stated in the AphasiaTalkbank (2021), and this is expanded in the next section. The evidence of impairment can be seen when a person experiences notable changes in short-term memory and forgets an immediate task, such as turning off the cooker, which may have health implications.
Further, work colleagues may have detected mistakes in the individual's performance over a period and flagged this, leaving the individual with a feeling of despair and anxiety. In another situation the individual may demonstrate confusion over location, time, and activity requirements (such as taking medicine or switching off cooking, heating or electrical appliances), which may lead to personal risk and incurred risks to the people around them. These real-life, person-centred scenarios could be critical and are manifested by people with undiagnosed cognitive decline. Hence, cognitive assessment is a critical clinical diagnostic tool for neurodegenerative diseases, especially AD, and one of the most valuable predictors of its further progression (NIA, 2018). A series of cognitive tests is used to diagnose any cognitive impairment. We will invoke the tasks of the Mini-Mental State Examination (MMSE) (Cockrell & Folstein, 2002). As noted above, individuals may have a range of symptoms underpinned by communication, health, employment and risk triggers, most often evidenced in regular conversation. Further, other people such as family, friends or acquaintances may identify some mild impairment in speech, or a feeling of something going wrong, and recommend the need to investigate. This investigative task is the major motivation for our work. The goal is to embed a pre-screening trained language and cognition assessment model (PST-LCAM) as an intervention into a conversational agent interface, as an application for investigating the early detection of language and cognitive decline. The second motivation for exploring a pre-screening model is as a response to two of the four grand challenges of healthcare technologies that UKRI proposed in 2021 (EPRScUKRIOrg, 2021). Here, our pre-screening model involves: (1) a new method of recognising abnormal data patterns in spoken utterances; and (2) investigating a technique for reducing the progression of the disease by identifying at-risk individuals through early pre-screening. The third motivation for a pre-screening tool is derived from a commercial/technical need acknowledged at various events, such as a healthcare panel discussion at the AI Business Digital Symposium. There the panel unanimously agreed on the need for a proactive, evidence-based approach that will engage patients, monitor their health journey and measure outcomes, rather than a reactive mindset to diagnosis (BabylonHealthCom, 2021). This message is shared by the wider healthcare sector and professional community. Our motivation and investigative work also concur with a recent BBC article stating that a brain check-up tool has the potential to cut risk at any age (AlzheimersResearchUkOrg, 2023; Roxby, 2023).
We focus on the participant's language production and its assessment by both manual and computational approaches, where understanding what is said (meaning) is critical for communication. Natural language processing (NLP) is an overall term for how computers interpret, understand, and use human language. Our pre-screening model will be NLP driven. NLP is defined 'as a theoretically motivated range of computational techniques for analysing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications' (Liddy, 2001). NLP involves natural language understanding (NLU), that is, enabling computers to derive meaning from human or natural language input. NLU is challenging because: (a) humans make mistakes; (b) human speech requires context: to ask "how was lunch?" and receive the reply "I spent the entire time waiting at the doctor" is clear to you (lunch was bad) but not necessarily to a computer trained to search for negative words ('no', 'not', for example); (c) human language is irregular due to variances within the same language (American English vs British English, for instance), which can lead to a lack of context, spelling errors, or dialectal differences; (d) language has ambiguity, meaning it can be understood in two or more possible senses or ways, and this can be of several types: morphological, lexical, syntactic, semantic dependency, referential, scope, and pragmatic ambiguity (Maurya, Gupta, & Choudhary, 2015). Further, NLP and NLU systems require knowledge of the domain, the use case and the special nuances of how the language is expressed, such as: (a) different word, same meaning; (b) different grammar, same meaning; (c) different expression, same meaning; (d) same word, different context (Linguamatics, 2021). From a technical development perspective, a review of conversational agents (CAs) in healthcare by Car et al. (2020) identified their infancy and proposed a robust investigation into their potential use in diagnosis rather than just health service support. Our proposal supports a proactive strategy and the potential of a CA intervention to aid diagnosis based on a range of tasks.

Statistically, the NHS states there are over 850,000 people in the UK affected by dementia, including 7% of those over the age of 65 and 17% of those over 80. It is estimated that more than one million people will have dementia by 2030, increasing to more than 1.6 million by 2050 (AlzheimersResearchUk, 2020). Alzheimer's disease (AD) is a chronic, progressive neurodegenerative disease that affects more than 35 million people worldwide, with a prevalence similar to that in the UK, and this number is expected to triple worldwide by 2050 (WorldAlzReport2015Org, 2015). From a care perspective, over 11 million Americans provide unpaid care for people with Alzheimer's or other dementias. In 2022, unpaid caregivers provided an estimated 18 billion hours of care valued at $339.5 billion. From a pre-screening view, only 4 in 10 Americans would talk to their doctor right away when experiencing early memory or cognitive loss, and 7 in 10 Americans would want to know early if they have AD if it could allow for earlier treatment (Alzheimer'sAssociation, 2023). From a social perspective, a study by the Alzheimer's Society (AS) stated that 56% of patients wait for up to a year before getting help because they feel afraid of their condition (AlzheimersOrgUk, 2018).
Also in the UK, under the Equality Act 2010, a person who is living with dementia is recognised as having a disability (a protected characteristic), necessitating a person-centred care approach with patient safety. From a psychological and health perspective, the pre-screening tool is grounded in the COM-B (capability, opportunity, motivation) model (Michie, Atkins, & West, 2014). Here the underlying social problem relates to concerns about memory; the pre-screening tool facilitates the need for a target behaviour change, and the outcomes from our model will provide indicators of what needs to change. This aspect of participant change is outside the remit of this paper. As highlighted above, the importance of the model is derived foremost from a social problem. As AlzheimersOrgUk (2020) reported during the pandemic, "referral numbers are increasing; a sustained and proactive effort must be made to support access to timely diagnosis" (AlzheimersOrgUk, 2020). To take a small step towards proactive early detection, our CA intervention will provide indicators of any potential issues with language production and cognition and help to support any patient-centred care plans (Mannonen, Kaipio, & Nieminen, 2017).

Linguistically, our model will use Role and Reference Grammar (RRG)'s functional model (Van Valin, 2005a, 2005b) for grammatical analysis, and an ontology for cognitive assessment, to ascertain the symptomatic changes in people's language production. RRG can adequately explain, describe, and embed the communicative-cognitive function of conversation in a computational form. RRG enables language to be comprehended and produced, supports a deep understanding and interfacing with knowledge, and provides logical representations of the utterance (Van Valin, 2005a, 2005b). An ontology/knowledge base/corpus will help us assess the linguistic capacity of a subject to locate categories in a cognitive dimension and to produce instances of a given category, for example, to list words related to transportation. In this way, by analysing the oral production of participants against an ontology, we will be able to (1) check which attributes of a given category are present and which ones are missing, and (2) establish enriched conceptual networks which reflect the hierarchical chain found in the production of a category. This information will be useful as part of the pre-diagnosis of cognitive decline. We deploy RRG's functional model to analyse the utterances for lexical and grammatical complexity and word order, and to represent their structure and meaning. The utterances are sourced from the Dementia TalkBank (MacWhinney, 2017) conversation dataset and the associated audio/video transcripts of participants conducting various tasks with an investigator. The tasks were linguistically mapped to language and cognition parameters: lexical, syntax, semantic, discourse, and pragmatic (Ntracha et al., 2020). Further factors we assess are the ontology, which constitutes the lexicon, and parameters of word production, contextual new words, repetition (echolalia), involuntary words (palilalia), retention of language, speech pauses, timings, and interruptions. With respect to cognitive issues, RRG adopts the criterion of psychological adequacy formulated in Dik (1991), which states that a theory should be "compatible with the results of psycholinguistic research on the acquisition, processing, production, interpretation and memorization of linguistic expressions".
It also accepts the related criterion put forth in Bresnan et al. (1982) that theories of linguistic structure should be directly relatable to testable theories of language production and comprehension (Van Valin, 2000, p. 48). Henceforth, psycholinguistic and cognitive adequacy (PCA) refers to the psychological structures, principles and strategies which determine the way in which linguistic expressions are acquired, generated, understood, processed, produced, interpreted, and stored in our minds (Mairal and Pérez, 2019).

In summary, the aim of the pre-screening language and cognition assessment model (PST-LCAM) is to assess speech, language and cognition and to present results with indicators that can be further validated with a clinical professional. The concept of PST-LCAM is that it utilises the Dementia TalkBank dataset for the training and testing of utterances and assessment. The linguistic mapping of utterances for the assessment tasks invokes PCA, and the linguistic assessment is achieved by Role and Reference Grammar to ascertain the structure and meaning of an utterance. Further, the concept involves the NLP challenges, conversational practices and clinical processes identified and addressed as part of the assessment, the assessment analysis and the outcome. The design of the PST-LCAM comprises: (i) devising the language and cognition (LC) tasks for two strands (MMSE tasks) and an interview task using the Dementia TalkBank dataset; (ii) creating an RRG mapping for the LC tasks, the scoring and a GDS-based matrix; (iii) conducting the task and strand assessments; (iv) conducting the strand scoring and merging the strand analyses; and (v) presenting the participant dashboard.

Our research hypothesis for this work is that, for participants placed on the Global Deterioration Scale (GDS), our RRG-based language production and cognition assessment provides a positive correlation and presents similar indicators. Our research questions are:
1. What is the relationship between the concepts of language, cognition, and speech production of participants' task utterances, RRG's functional model and the linguistic phenomenon of psychological adequacy? How will these concepts be mapped?
2. How will the mapping be implemented in a model for the language and cognition assessment, with summative outcomes linked to the Global Deterioration Scale?
3. What are the challenges that need to be considered in terms of the computational natural language processing of speech, language production and cognition assessment?
4. How will the PST-LCAM consider individual variables and performance to complete the language and cognition assessment, and how will this be evaluated?

The novelty of this work lies in the introduction of deep linguistic analysis in a pre-screening trained language and cognition assessment model (PST-LCAM) for people concerned about their memory. The main contributions of the paper are:
A. A novel linguistic mapping based on the underpinning theoretical stance of psycholinguistic and cognitive adequacy (PCA) of RRG. This mapping of the Dementia TalkBank conversation dataset and participant tasks is analysed to understand the psycholinguistic and cognitive determinants of dementia.
B. A manually trained pre-screening model for the language and cognition assessment. This is based on two strands of participant tasks using a PCA-based linguistic mapping of 22 language and cognition parameters, model statistical scoring with qualitative and quantitative indicators, and a summative participant GDS score demonstrating consistency with the clinical investigator's outcome.
C. An intensive statistical evaluation model involving (i) utterance analysis; (ii) speech disfluencies; (iii) parameter assessment scoring; (iv) utterance grouping analysis; (v) normalisation of task scoring against each strand and the complete scoring results; (vi) tuning of results based on inclusion and exclusion sets (non-linguistic/visual cues); (vii) consideration of the types of utterances; (viii) flagging of the completion of tasks; (ix) lexical and cognitive significance (such as acknowledgements); (x) completion of tasks and final assessment.

The focus of this paper is to present the model, its underlying principles and concepts, dataset use, model design, and final analysis. Section II gives a summary of the literature review and related works. Section III describes the methodology used to develop the model and presents the linguistic mapping for the model experiment and participant tasks. Section IV discusses the results and includes the analysis methods, results from the statistical analysis and summative outcomes for the participants and the model itself. Section V discusses the advantages of the PST-LCAM model and its future directions, and finally Section VI presents some concluding remarks.

2. Literature review

The literature review and related works section discusses the data sources, the nature of mild impairments, underpinning technology and opportunities for neurological problems, person-centred speech and memory problems, and the range of diagnostic assessments for cognitive decline.

2.1. Dementia datasets

Dementia TalkBank is part of DementiaBank, one of the largest available datasets of audio recordings and transcripts, and was selected for our work. It is convenient, available 24/7, and accessible via membership and permissions granted by the University of Edinburgh (DementiaTalkbankOrg, 2017). MacWhinney (2000) manually transcribed the recordings using the CHAT (Codes for the Human Analysis of Transcripts) protocol. It is a shared database of multimedia interactions for the study of communication in dementia, running since 2007, drawing on heterogeneous sources and specialists, and with worldwide prestige. Dementia TalkBank (MacWhinney, 2017) continues to be used for extensive research projects and teaching goals, and has led the way to research projects such as early indicators of dementia (Padhee et al., 2020). Other dementia conversation dataset research includes: (1) conversational profiling of video and audio recordings of personal information and working memory (Jones et al., 2016); (2) conversational analysis of the pause-to-speech ratio and measures of linguistic complexity (O'Malley et al., 2021; O'Malley et al., 2020). Other dementia research is explored via open-access data repositories for dementia (Miah et al., 2021, p. 98; AlzheimersResearchUkOrg, 2021) and via various other project initiatives (JAIN, 2021), among others.

2.2. Mild impairments related to aphasia

As noted in the introduction, the mild impairments could involve one or more word-level, sentence-level, and discourse-level features (AphasiaTalkbank, 2021). These mild impairments are discussed here in detail, covering both definition and example, as they are central to the approach used in the participant task analysis. Word-level features include anomia, circumlocution, conduite d'approche, jargon, neologism, perseveration, phonemic paraphasia and semantic paraphasia. Anomia is a word-finding problem manifested via long pauses, word fragments, fillers, trailed-off or unfinished utterances, sighs, and other signs of frustration, whereas circumlocution refers to indirect, roundabout language used to describe a word or concept, e.g., "that thing to cut my fish and chips". Conduite d'approche is successive attempts at a target word; these attempts approximate the target phonetically and the final production may or may not be successful, e.g., uttering 'ife', 'knif' for "knife". Another key word-level feature is jargon, which refers to fluent, prosodically correct output resembling English syntax and inflection but containing largely meaningless speech; sometimes it is intelligible and can be transcribed, sometimes it is unintelligible. Neologism is a non-word substitution for a target word, usually with less than 50% overlap of phonemes between error and target, and the target word may be known or unknown. Perseveration is the repetition of a previously used word or phrase that is no longer appropriate to the context. Phonemic paraphasia is the substitution, insertion, deletion, or transposition of phonemes (usually with at least 50% overlap of phonemes between error production and target, though definitions differ); here the error production may be a word or non-word, and the error may or may not be self-corrected. Semantic paraphasia is the substitution of a real word for a target word; the error may be related or unrelated to the target, and it may or may not be self-corrected. Stereotypy is the frequent repetition of a syllable, word, or phrase throughout the sample; these may be words or non-words. Sentence-level features include agrammatism, where speech is reduced in length and/or complexity and function words and morphemes may be missing (for example, uttering 'tree', 'dog' without 'the'), and empty speech, which contains general, vague, unspecific referents but is semantically and syntactically intact (for example, retelling a telephone conversation and including non-relevant ideas). At discourse level we find interactions demonstrating successful communication despite language filled with neologisms and jargon (Wernicke's aphasia) and very limited language output (Broca's aphasia).

2.3. Diagnostic assessment for cognitive decline

Traditionally, manually administered cognitive tests have been used to help measure mental functions such as memory and language, among others. The most frequently used cognitive tests for orientation, memory, attention, concentration, naming, repetition, writing and comprehension are the Mini-Mental State Examination (MMSE) (Cockrell & Folstein, 2002), the Montreal Cognitive Assessment (MoCA) (Nasreddine et al., 2005), the Mini Cognitive Assessment (Mini-Cog) (Borson et al., 2003) and the Boston Naming Test (Kaplan, Goodglass, & Weintraub, 2001). Language has gained growing interest in cognitive screening, and in particular the analysis of speech production, due to its inexpensive and ecological approach to identifying changes in cognitive function (Bertini et al., 2022). Nevertheless, they state that this approach requires manual activities such as transcription, annotation and correction, which may result in a biased outcome. In recent years, some attempts to automate cognitive assessment have been made. CognoSpeak (O'Malley et al., 2021; O'Malley et al., 2020) is a fully automated tool based on automatic speech recognition that classifies participants/patients into mild cognitive impairment (MCI), Alzheimer's disease (AD), functional cognitive disorder (FCD) and healthy controls (HC) via automatic speech recognition and diarisation. O'Malley et al. (2020) used their automated cognitive assessment tool to explore whether responses to questions which examine recent and remote memory could help in distinguishing between patients with early neurodegenerative disorders and those with functional cognitive disorders (FCD), who have non-progressive cognitive complaints. The findings were that the application of linguistic measures of differences in pause-to-speech ratio and measures of linguistic complexity did not help with that task but could nevertheless distinguish patients with mild cognitive impairment (MCI) and Alzheimer's disease (AD) from healthy controls. As a result, the authors stated the utility of incorporating additional measures of lexical and grammatical complexity (word frequency, sentence structure). Their follow-up work, O'Malley et al. (2021), used a fully automated version of their tool which automatically analyses the audio and speech data, involving speaker segmentation, automatic speech recognition and ML classification. Here their tool could distinguish between participants in the AD or MCI groups and those in the FCD or healthy control groups with levels of accuracy comparable to manually administered assessments. Nevertheless, they state that greater accuracy should be achievable through further system training with a greater number of users, the inclusion of verbal fluency tasks and repeat assessments. After reviewing O'Malley et al. (2020) and O'Malley et al. (2021), our model takes a different approach to addressing similar improvement needs by deploying participant tasks from traditional cognitive tests, automatic speech recognition, RRG manipulations and AI. We are aiming at a pre-diagnosis of apparently healthy subjects who may demonstrate language and cognitive decline, by assessing them through automatic speech recognition, considering grammatical, phonological, and cognitive indicators that will help to assess any signs of cognitive impairment, invoking Reisberg's Global Deterioration Scale.

2.4. Technology drivers, progress of diagnosis and NLP driven models

The rise of AI technologies, including machine learning, deep learning, natural language processing (NLP), smart robots, and conversational agents, has undoubtedly had a significant impact on various aspects of daily life, particularly in the healthcare sector. There is an optimistic outlook that AI-based solutions can greatly enhance healthcare by augmenting the decision-making process of doctors, from diagnostics to treatment, leading to significant improvements in various healthcare areas (Bohr & Memarzadeh, 2020; Lee & Yoon, 2021). This potential for transformative innovation and attention is shared by researchers, physicians, and technology and program developers, with enormous investment in AI-related technologies and substantial annual savings in healthcare. Focusing on diagnostic assistance for certain diseases such as cancer, eye or paediatric conditions, Taylor (2019) reported, from a report by the National Academy of Science, Engineering and Medicine, that diagnostic errors accounted for 60% of all medical errors and for 40,000–80,000 deaths a year in US hospitals, attributed to human judgement. AI-based chatbots for nursing patients have been effective for engaging in conversation with patients and family members in hospital (Palanica et al., 2019). As part of a cross-sectional web-based survey of physicians' perceptions of chatbots in healthcare, the same authors further indicated that healthcare chatbots carry the risk that patients self-diagnose too often (74%) and may not be able to fully understand the diagnostic outputs. This is an important consideration for our development. From an NLP perspective, the language understanding of AI systems has improved rapidly since 2020 with the use of large language models (LLMs), the recency of generative AI technologies in 2023, and the transformation of large, complex data sources into usable information. Despite these massive opportunities, the healthcare domain has been slow to adopt them, with varying, reduced maturity levels across the life-science market, patient-facing, clinical-facing (diagnosis), admin and analytics, and AI (Norden, Wang, & Bhattacharyya, 2023). Technically, AI technologies are able to present highly accurate results, higher than human performance on numerous benchmarks, such as the 99th percentile on the Biology Olympiad, and demonstrate advanced reasoning capabilities (OpenAI, 2023). This is further supported by linguistics researchers who have, in recent years, primarily described the performance of specific language models in their ability to accomplish several sophisticated tasks such as question answering, content summarisation, sentence prediction, and so on. However, this leads to the question of whether it is cognition: understanding vs. simulating (Verizon, 2023). Subsequently, the diagnosis of diseases is challenging in terms of human-centric individual variations and the potential ethical harm from automated diagnosis. There is a critical need for a granular understanding of language and the meaning of speech production, with speech disfluencies, for the pre-screening assessment of cognitive decline, together with explainability of the results. For this critical reason a clinician with judgement ability is required as a human in the loop during and after the outcome (Mosqueira-Rey et al., 2023). This is necessary for any AI/hybrid linguistic-based intervention and for adhering to responsible and ethical alignment (Dastani & Yazdanpanah, 2023). Hence, such systems will currently be used as decision support or analytical/alerting aids (Norden, Wang, & Bhattacharyya, 2023).

Our rationale for the PST-LCAM model is based upon a robust, optimised, validated system whose prime functionality is understanding, analysing, and assessing utterances at an individual, task, strand, and holistic level. This will involve automatic speech recognition (ASR) using deep learning models, computational linguistics (RRG) and the behaviours of adopting user-centred design and experience, a person-centred approach, ethical alignment, conversation design, language and speech analysis, and a dialogue management strategy. Our embedded model PST-LCAM for pre-screening will be different, as it is based on a hybridisation of model preparation involving ASR and lean machine learning methods with major Role and Reference Grammar (RRG) based manipulation and grammatical testing to support pre-screening. Our goal is the early detection of cognitive decline, and our methods are based on a linguistic phenomenon, PCA, to map language production and cognition tasks to levels of language, cognitive ontological information, and speech production variables. Further, we develop a novel language and cognition assessment addressing memory concerns, with resulting qualitative and quantitative indicators of possible cognitive issues. This is explained in the next section. The data, analysis and assessment tool are available at source, with further details found in Appendix F. Other related works entail using NLP deep-learning pre-trained models on large corpora of speech transcripts, which can be instrumental in learning the patterns of speech narratives, as in the case of the speech production of Alzheimer's disease (Adhikari et al., 2022). Similarly, CognoSpeak is a fully automated system which analyses audio and speech data, involving speaker segmentation, automatic speech recognition (ASR) and machine learning classification for the diagnosis of AD, mild cognitive impairment (MCI), functional memory disorder and healthy controls (O'Malley et al., 2021); other approaches include an NLP annotation tool (Noori et al., 2022), an NL user interface (Ntracha et al., 2020) and an AI-based semantic measurement model (Foltz et al., 2022; Penfold et al., 2022). Similar work is that of Yeung et al. (2021), with the goal of early identification of markers; here, they analyse variables extracted through NLP and automated speech analysis, correlated with language impairments identified by a clinician. In summary, these related works focus on AI-enabled results with subsequent clinician decision making, while our PST-LCAM focuses on the understanding and meaning that are required for a thorough language and cognition assessment.

3. Methodology

3.1. Methods

The proposed work is based on the investigation of a hypothesis and experimentation with PST-LCAM for people concerned about their memory. An overview of the proposed framework is shown in Fig. 1.

Fig. 1. Pre-screening language and cognition assessment model (PST-LCAM) Framework.

This model is underpinned by an experiment to test a hypothesis against participants from the Dementia TalkBank who are placed on the Global Deterioration Scale (GDS). Our proposed RRG-based language production and cognition assessment provides a positive correlation and presents indicators and results similar to those of the investigators. Our model is based on a language and cognitive assessment invoked from the MMSE (Mini-Mental State Examination) and refers to the Global Deterioration Scale (GDS), with 7 stages and indicators: (1) no cognitive decline, (2) very mild cognitive decline, (3) mild cognitive decline, (4) moderate cognitive decline, (5) moderately severe cognitive decline, (6) severe cognitive decline, (7) very severe cognitive decline (Reisberg et al., 1982). This methodology was achieved via three stages: (1) using the underpinning theoretical basis of psycholinguistic and cognitive adequacy (PCA) of Role and Reference Grammar (RRG) to map the speech production to the language and cognition parameters.
We used RRG's levels of language (lexical, syntax, semantic, discourse and pragmatic) (Van Valin, 2005a, 2005b) and the annotations of speech disfluencies found in the CLAN (Computerized Language ANalysis) manual (MacWhinney, 2017, 2019, 2021). (2) Experimenting with the participant data based on the Dementia TalkBank's tasks, with associated transcripts, utterances, and speech. (3) Model assessment stages involving selection, utterance assessment, normalisation for language and cognition scoring, normalisation for the whole experiment and scoring, and, finally, assessment and measurement against the Global Deterioration Scale (GDS). The proof of concept is a trained and tested model based on a spoken corpus, Dementia TalkBank, and a language and cognition assessment with indicators of language production and cognition impairment. The PST-LCAM experiment uses a mixed-method approach of qualitative linguistic analysis and assessment with resulting quantitative scores and qualitative recommendations.

We have devised our own pre-screening assessment composed of two strand assessments: Strand 1, a cognitive and linguistic assessment (St1-CLA) with 4 participant tasks, and Strand 2, an interview analysis assessment (St2-IAA). This was inspired by TalkBank (DementiaTalkbankOrg, 2017), a shared database and platform for the study of communication in dementia and human spoken communication, with associated tools, manuals and transcripts (MacWhinney, 2017, 2021). Our dataset is based on a range of investigators (Holland, Kempler, Lanzi, PPA, and Pitt), their participant groups and the specific techniques they used. We started with an initial set of 6 participants (transcripts and audio) with GDS scores ranging from band 3 to 7 who had completed both assessment strands and for whom significant and reliable information was available, together with the definitive investigator's clinical decisions. Due to some inconsistencies with the corpus, only 4 participants were considered for the experiment, as shown in Appendix C.

Strand 1 has four cognitive and linguistic assessment tasks. The investigator assesses the participant on a range of tasks in a conversational style. In Task 1, the investigator starts off by telling a story and the participant has to repeat the story immediately afterwards. The story goes: "While a lady was shopping her wallet fell out of her purse, but she did not see it fall. When she got to the checkout counter, she had no way to pay for her groceries. So, she put the groceries away and went home. Just as she opened the door to her house the phone rang, and a little girl told her that she had found her wallet. The lady was very relieved". Task 2 is a generative naming task, where the participant has to name things related to transportation within one minute. In Task 3, the investigator asks the participant to retell the story from Task 1. In Task 4, the investigator presents the participant with the picture in Fig. 2, and they need to describe what is going on in it. An interpretation is: "A man (Father) is reading a newspaper whilst his wife and family are ready to go to Church. There is no communication between the woman (Mother) and children and the man (Father)".

Fig. 2. Picture Naming Task ("nhs.uk", 2019) from the MMSE.

In a separate session the investigator conducts Strand 2, an interview with the participant. Here the narrative prompts are 'tell me something about your family', 'tell me about your job', 'tell me about Little Red Riding Hood', 'tell me about Goldilocks and the three bears', and 'tell me about your country'. Our test strand was the topic 'tell me about your family'. Exploring the strands from a lexical-semantic perspective, and the variability in cognitive behaviour, will provide an effective predictor of cognitive status. As Ostrand and Gunstad (2021) suggest, a story-retelling task imposes a higher memory demand than the picture description task, as the participant must recall the events without external memory support. They need to draw on semantic memory to retrieve appropriate words, on episodic memory for the story, and on attentional and executive function controls for the passage of the story.
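For orientation, the strand and task inventory just described can be captured as plain data that later scoring steps iterate over. The sketch below is purely illustrative (the authors' model is implemented in Microsoft Excel), and every identifier in it is our own assumption rather than part of the published framework.

```python
# Illustrative only: the two assessment strands, their tasks, and the GDS
# bands referenced throughout the paper, expressed as plain Python data.
STRANDS = {
    "St1-CLA": {  # Strand 1 - cognitive and linguistic assessment
        "task1_story_repeat": "Repeat the investigator's story immediately",
        "task2_generative_naming": "Name transportation-related words within one minute",
        "task3_story_retell": "Retell the Task 1 story later in the session",
        "task4_picture_description": "Describe the MMSE picture (Fig. 2)",
    },
    "St2-IAA": {  # Strand 2 - interview analysis assessment
        "interview_family": "Tell me something about your family",
    },
}

GDS_BANDS = {  # Global Deterioration Scale (Reisberg et al., 1982)
    1: "No cognitive decline",
    2: "Very mild cognitive decline",
    3: "Mild cognitive decline",
    4: "Moderate cognitive decline",
    5: "Moderately severe cognitive decline",
    6: "Severe cognitive decline",
    7: "Very severe cognitive decline",
}

for strand, tasks in STRANDS.items():
    print(strand, "->", len(tasks), "task(s)")
```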
3.2. Experiments

Focusing on the lexical-semantic work and developing the PST-LCAM, Role and Reference Grammar (RRG)'s conceptual use of PCA is invoked via a two-stage process. Stage 1 creates a linguistic mapping of the strands and tasks, and Stage 2 is the individual utterance assessment per task.

For Stage 1, the first process requires mapping to RRG concepts, achieved via Step 1: each task was analysed in terms of its specific and relevant attributes, such as keyword recognition, events and timing, people, objects, "who did what to whom", and any question-and-answer requirement. In Step 2, each attribute or group of attributes was analysed for its language production and cognition links as part of the task: for instance, picking keywords, memory recall, velocity of retrieval, making references to people, objects, location, time, manner and the mental lexicon, reference to knowledge, relationships between entities, understanding the questions, confirmation/acknowledgement of the question, fluency of response, number of responses, number of pauses, and length of pauses (short, long, and very long). In Step 3, each participant task and its attributes were considered firstly for their mapping to language production and cognition aspects and secondly for their RRG relevance in terms of cognition and the lexicon (picking out words and word production). Further considerations include: (i) are there any semantic macroroles (actors and undergoers)? This will help to understand, for example, the relationships between the family members; (ii) semantic: is there meaning in the response, identified by the logical structure of the clause (Van Valin, 2005a, 2005b)?; (iii) is there a lexical representation to describe the linguistic phenomena, e.g., statives and active accomplishments such as buying and eaten?; (iv) is there syntax, i.e., a structured explanation of the answer (Subject-Verb-Object) following the layered structure of the clause (LSC)?; (v) is there discourse, linking the answer to the question/previous utterance/event?; (vi) is there a pragmatic context of the event?

Fig. 3 outlines Model experiment 1 for Task 1 (Story Telling Mapping Attributes to RRG), which identifies Steps 1–3 as the mapping framework to support Stage 2 and the quantitative and qualitative assessment. A further Step 4 concerns the actual speech production for Task 1, which requires a mapping to the linguistic expression, considering pauses and timing, using a speech protocol based on speech disfluencies. The mappings (using Steps 1–3) for the remaining participant tasks, tasks 2–4 in Strand 1 and the Strand 2 interview analysis assessment, are found in Appendices A.1, A.2 and A.3, reflecting Task 2, Task 3, and the Strand 2 interview task respectively. All the tasks use the same speech protocol as in Step 4 above. Task 4 is the retelling task and hence uses the same analysis as the Task 1 story telling, but is accounted for differently, as discussed in the later assessment adjustments.

Fig. 3. Model experiment 1 - Task 1 – Story Telling Mapping Attributes to RRG. (Step 1: task attributes, e.g., events, timing, "who did what to whom"; Step 2: language production and cognition links, e.g., keyword picking, memory recall; Step 3: RRG phenomena (Van Valin, 2005b), e.g., semantic macroroles, logical structure of the clause, Aktionsart lexical representation, layered structure of the clause, discourse and pragmatic context; Step 4: speech production using CHAT speech disfluencies (MacWhinney, 2021).)

In Stage 2, the utterance assessment is specified by a range of task-related language production, cognition and speech parameters, together with indicators (linguistic markers) and corresponding scores. Here the language production parameters include lexical (diversity, content), syntax (clause SVO, noun SVO), semantic (repetition, pronoun frequency), discourse (reference, context, speech act, change of anaphoric resolution (AR), correctness of AR) and pragmatic (focus). The language cognition parameters comprise ontology (ontology placement) and lexicon (word production, contextual new word, repetition (echolalia), involuntary words (palilalia), retained language). Speech parameters comprise the nature, number and timing of pauses, which are impactful as speech disfluencies. Pauses can be short (.), long (..) or very long (...). Interruptions are treated as pauses; they exist at the word, sentence, and discourse levels of communication and thus may change the grammar. For example, in the sentence 'so it's [//] must be the Sunday newspaper', [//] is treated as a pause, as prescribed and adapted from CLAN (MacWhinney, 2021); the table in Appendix B identifies the speech notation, phonological fragments, speech disfluencies, and our qualitative scoring indicators. The indicators were selected on the premise of the assessment parameters and the nature of correctness; they included complete, correct, fluent, new, yes, lower, default, same, reduced, some, incomplete, upper, partial, poor, no, and N/A.

As highlighted, the experiment is to create a model based on a dataset from the Dementia TalkBank, constituting participant tasks mapped to RRG phenomena and corresponding questions addressing the 5 aspects of language production found in Table 1 and the 2 aspects of language cognition and 2 speech parameters found in Table 2. This has an options and point scoring system. For example, a language production parameter such as lexical diversity, which refers to the different lexical words used by the participant, is assessed with 4 possible options and corresponding scores: LexDivNew (3); LexDivSame (2); LexDivPoor (1); LexDivNA* (0), as shown in Table 1. The outcome of the model will provide the participant with an integrated score from the assessment of both strands of tasks. This will be compared against an internal Global Deterioration Scale band matrix.
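To make the options-and-points scheme concrete, the following is a minimal Python sketch of how an utterance's qualitative indicators could be totalled. It is not the authors' implementation (which uses Microsoft Excel); the parameter subset is abbreviated and the dictionary and function names are our own assumptions, while the option labels and point values follow the examples given above (e.g., LexDivNew = 3, LexDivSame = 2, LexDivPoor = 1, LexDivNA = 0).

```python
# Illustrative options-and-points scheme in the style of Table 1/Table 2:
# each qualitative indicator maps to a score of 3, 2, 1 or 0.
PARAMETER_OPTIONS = {
    "LexDiv": {"New": 3, "Same": 2, "Poor": 1, "NA": 0},          # lexical diversity
    "SynClauseSVO": {"Correct": 3, "Partial": 2, "Poor": 1, "NA": 0},
    "SpeechPauses": {"Fluent": 3, "Partial": 2, "Poor": 1, "NA": 0},
    # ... the remaining parameters of the 22 would be declared the same way
}

def score_utterance(assessment: dict) -> int:
    """Sum the points for one utterance given its qualitative indicators.

    `assessment` maps a parameter name to the indicator chosen by the
    assessor, e.g. {"LexDiv": "New", "SynClauseSVO": "Partial"}.
    """
    total = 0
    for parameter, indicator in assessment.items():
        total += PARAMETER_OPTIONS[parameter][indicator]
    return total

# With all 22 parameters declared, a fully correct utterance would reach the
# maximum utterance score of 66 (22 parameters x 3 points).
print(score_utterance({"LexDiv": "New", "SynClauseSVO": "Correct", "SpeechPauses": "Fluent"}))  # 9
```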
3.3. Experimental challenges

The experiment had linguistic, technical, and clinical challenges. Linguistically, we initially had to ensure the accuracy of the mapping framework from a participant task to the specific language production and cognition parameters in an RRG context. Further, it was necessary to ensure balanced linguistic decisions relating to the levels of language in the creation of the model. Technically, we had to ensure a systematic approach to conceptually devising and training the model with linked concepts based on semantic knowledge graphs; it was also important to implicitly consider the underpinning descriptive, observational, explanatory, and computational linguistic adequacies of the RRG model. From a clinical perspective, it was critical to ensure a normalised and balanced interim assessment of participant tasks to derive the summative score, indicators, and outcome, and to have access to specialist personnel to critique the feasibility of the proof of concept of the PST-LCAM. Our experimental approach also posed a range of data challenges: (1) inconsistent recording of participant information; for instance, a participant's age, sex, and education information is missing or, alternatively, there is a declaration of the patient's current dementia health status; (2) impact: limited comparative information to help train the model for linguistic and cognitive scoring; (3) interview topics are varied, as expected, making the modelling more complex, although the same language and cognition parameters can be used; (4) limitation: some transcripts have not been reviewed by a second transcriber and so were avoided.

3.4. Experimental setup and design

This model is conceptually designed in Microsoft Excel, using various lookups, data validations, and group functions. The assessment design was based on data collected from questionnaires sent to individuals and carers, distributed to various care homes (UK and Spain), and hosted on the Dementia Voices website (InnovationsInDementia, 2016) for completion via the DEEP (Dementia Engagement & Empowerment Project) group 'Taking Part' in December 2021 (InnovationsInDementiaOrgUk, 2021). The model experiment development involves two phases. Phase 1 looks at 3 participants involved in Strand 1 (tasks 1–4), providing the PST-LCAM 1.0 model. Phase 2 is an interview assessment based on a revised model containing a CLAN-based analysis of speech disfluency for the timing and pauses, creating the PST-LCAM 2.0 model. In each case, manual analysis and allocation of points was conducted by a two-person team, taking one parameter at a time and hiding the previous parameter (a set of colour-coded columns in Microsoft Excel) to remain neutral/impartial/unbiased with respect to the previous lexical categories, which have existing embedded automatic metrics. The model development has the following phases: 1) model assessment, training and testing; 2) model assessment techniques; 3) model analysis techniques; and 4) model tuning techniques. Model analysis and tuning techniques have been discussed in detail in the above sections.

3.4.1. Model assessment training and creating a baseline model

A baseline model was created from 3 iterations of sampling. Iteration 1 was from a sample of 50 utterances without speech indicators, based on 1 investigator, 1 participant (low GDS) and the Strand 1 tasks. This involved the continuous refinement of the categories, parameter assessment and scores to establish the proof of concept and baseline model. Iteration 2 added 3 other participants (higher GDS scores) and 100 utterances from the same investigator, with their audio and transcript including speech pauses and pause timing, to the latest model. Iteration 3 involved Strand 2, the interview assessment, from other investigators, with audio and transcripts of over 300 utterances with speech indicators of disfluency, to form a baseline model with an utterance score.

3.4.2. Model assessment methods used in testing of all tasks

For testing the utterances, an expanded set of assessment steps is performed. These include: (a) utterance scoring based on the type of utterance: Type A, one-word utterance, e.g. 'bus'; Type B, simple (SVC), e.g. 'A lady lost her purse'; Type C, simple with adjuncts (adjuncts/adverbs/manner/location), e.g. 'I saw Mary in Madrid'; Type D, complex utterance, with subordination and coordination, e.g. 'I will go if it does not rain'; Type E, short attempt at lexical attributes; Type F, long attempt at lexical attributes. The language production, cognition and speech parameters are assessed by a metric of qualitative allocation and quantitative scoring; for example, the assessment parameter SyntaxClauseSVO can be correct, partially correct, or poor in communication. At this point any investigator utterances are marked as 'INV' to be excluded. (b) Normalised utterance scoring based on the framework for participant tasks, using exclusion sets, inclusion sets, types of utterances, and lexical and cognitive significance, is discussed further in the next section. For instance, a complete (successful) utterance will score a maximum of 66, and a complete (successful) set of generative naming words will score a maximum of 24, as shown in Table 3; this score of 24 is normalised to the standard utterance assessment score of 66.

Table 1. Scoring framework for the 15 language production parameters (for each level of language: lexical, syntax, semantic, discourse and pragmatic, the assessment parameter, its explanation, the qualitative indicators, e.g., new, same, correct, partial, poor, N/A, the tasks to which it applies, and a maximum score of 3 per parameter).

Table 2. Scoring framework for the 7 language cognition and 2 speech parameters (ontology placement; lexicon: word production, contextual new word, picking words, repetition (echolalia), word/phrase revision, retain language; speech: pauses and timing of pauses; with indicators such as lower/same/upper, new/same/poor, correct/partial/poor, fluent/partial/poor and N/A, applied to all tasks, with a maximum score of 3 per parameter).

Table 2 identifies the language cognition parameters of the ontology and lexicon, together with speech dysfluency (MacWhinney, 2021). These parameters align to the language and cognitive assessment invoked from the MMSE, and to the language and memory cognition domains (Kulkarni & Moningi, 2015).

Table 3. Scoring for the generative naming task found in the MMSE (language production: lexical diversity, lexical content and pragmatic focus; language cognition: ontology placement, word production and retain language; speech: pauses and timing of pauses; each parameter scored 3, 2, 1 or 0; maximum score 9 + 15 = 24, average score 6 + 10 = 16, poor score 3 + 5 = 8, N/A 0).

Here the assessment parameters (language production: lexical and its sub-parameters, and pragmatic; language cognition: ontology and lexicon; and speech: pauses) have a range supporting considerations for a person-centred assessment. These include: (1) the utterance complexity of the category groups A to F is considered implicitly in the scoring; (2) if content in an utterance is missing, as per the discourse and pragmatic phenomena, it is considered a content-poor utterance with underlying language and cognition issues; (3) acknowledgements are part of the discourse but are not computed in the model; they are important for conversational design but do not actually provide a response to the participant task; (4) synonyms are used ('same' as 'default') to ease model parameter analysis; (5) an NA count is used to differentiate utterances and indicators that are not relevant and also to aid verification; (6) the use of different linguistic patterns (PAR1, T1, 'she lost it') to express the same idea is a positive indicator of cognitive ability; (7) interim scoring of results is carried out from task to task; (8) after generative naming, in a conversational style, a participant would declare when they are done, and this utterance is ignored in the data collection to differentiate the assessed tasks from the utterances used as acknowledgements; (9) self-interruptions are considered as pauses, with a change of grammar, as in [//] 'no that [//] that's', or as in '&hm +' when a participant is pausing with a sound; (10) transcript coding is translated into language- and cognition-contributing elements; for instance, phonological fragments and disfluencies such as fillers, phonological fragments, and repeated segments are all coded by a preceding & (MacWhinney, 2021); (11) non-linguistic cues/events such as a laugh, cries or hisses noted in the transcript, such as '&=laughs', are not considered in the model; visual cues (pointing) are also not considered, although they are helpful for the participant.
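The normalisation described in Section 3.4.2 (a generative-naming maximum of 24 rescaled to the standard utterance maximum of 66) can be illustrated as follows, assuming a simple linear rescaling. This is our sketch rather than the authors' Excel workbook, and the constant and function names are our own.

```python
# Illustrative sketch: rescale a task score onto the standard utterance scale
# so that both strands can be merged on one scale.
STANDARD_UTTERANCE_MAX = 66   # 22 parameters x 3 points
GENERATIVE_NAMING_MAX = 24    # 8 parameters x 3 points (Table 3)

def normalise_task_score(raw_score: float, task_max: float,
                         target_max: float = STANDARD_UTTERANCE_MAX) -> float:
    """Linearly rescale a raw task score onto the standard utterance scale."""
    if task_max <= 0:
        raise ValueError("task_max must be positive")
    return raw_score / task_max * target_max

# Example: an average generative-naming score of 16 out of 24 maps to 44 out of 66.
print(normalise_task_score(16, GENERATIVE_NAMING_MAX))  # 44.0
```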
4.1. Model rescaling

The next stage of scoring is rescaling, which involves the inclusion sets, penalties, exclusion sets (non-linguistic/visual cues), types of utterances, flagging of completion, lexical and cognitive significance (such as ACKs), and completion of tasks. Rescaling is based on a series of nested if-then-else statements in Microsoft Excel that establish additions to or subtractions from the normalised score. These include: (a) an addition for completing the Generative Naming task (>=10 nouns within a minute), with the decision rule =IF(cell>=10,1.5, IF(cell>=7,1, IF(cell>=5,0.5, IF(AND(cell>=1, cell<=4), 0.25, IF(cell=0,0,0))))); (b) a subtraction for not completing the Story Retelling (SRT) task, with the decision rule =IF(cell=0,5,0) (a subtraction of 5); (c) an addition for good use of acknowledgements in conversation (based on a threshold of 5 or more), excluding non-linguistic/visual cues, with the decision rule =IF(cell>=5,1.5, IF(cell>=4,1.2, IF(cell>=3,0.9, IF(cell>=2,0.6, IF(cell=1,0.3,0))))) – note that for the interview task the acknowledgements are not excluded, as they are part of the interviewing responses; (d) an addition for very good language production and cognition group totals (threshold >20 for maximum scoring of all qualitative attributes), with the decision rule =IF(cell>=20,0.5,0); and (e) for the interview task, consideration of the use of repetitive words, filler words and vague expression words (Guinn & Habash, 2012): in this model these are not credited, as they reduce lexical richness from a semantic point of view, and thus the SemSenCorrect score is replaced by a SemSenPartial score, i.e., the individual parameter score of the utterance drops from 3 to 2. (These decision rules are sketched in code at the end of this subsection.)

A Python code routine for the top 10 words of the 115 utterances of text (dataset) identified: (1) 'whatever' (28); (2) 'things' (23); (3) 'like' (22); etc. This list excluded filler words such as 'th', 'um' or 'uh' (which are considered as part of the speech disfluency parameter assessment) and the visual cues identified in the annotated transcription, such as 'points' and 'ges' (gesture).

For lexical-semantic analysis and cognitive variability, utterances 26–28 are reviewed. In utterance (26), the investigator asks, 'so when you started feeling like all of the things you just described started happening'. The participant responds with utterance (27), 'oh I &-um back in the summer of two thousand seven after my mom had passed away there was one little tiny thing in Chicago ?', and utterance (28), '&=points:forward that I [/] &+kn I [/] &+kn I useta know what it was called'. Utterances 27 and 28 gain maximum scores for the lexical diversity, semantic, syntax and pragmatic parameters. However, utterance 27 communicates appropriate discourse references and cognitive information, whereas utterance 28 shows a loss of context, difficulty picking out words, and the inclusion of filler words. This pattern of variability in cognitive status also appears in utterance 51, 'I would hafta go back in there &=ges:putting_in to find them, whatever, which I did a little bit', and utterance 52, 'but <then I just> [//] it's like, no it's getting whacky'. All utterances of a strand are grouped by the collective qualitative attributes derived from the scoring and assessment. For example, a participant may have more category A scores, some category B, and a few category C scores, demonstrating their linguistic and cognitive behaviour.
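The nested Excel IF statements above amount to a handful of threshold rules. The sketch below restates them in Python, assuming the noun count, acknowledgement count, SRT completion flag and Group A total have already been extracted from the worksheet; the function names are illustrative rather than the model's own.

```python
def generative_naming_credit(nouns_in_minute: int) -> float:
    # =IF(cell>=10,1.5, IF(cell>=7,1, IF(cell>=5,0.5, IF(AND(cell>=1,cell<=4),0.25, 0))))
    for threshold, credit in [(10, 1.5), (7, 1.0), (5, 0.5), (1, 0.25)]:
        if nouns_in_minute >= threshold:
            return credit
    return 0.0

def story_retelling_penalty(srt_attempted: bool) -> float:
    # =IF(cell=0,5,0): a subtraction of 5 when the SRT task was not attempted.
    return -5.0 if not srt_attempted else 0.0

def acknowledgement_credit(ack_count: int) -> float:
    # =IF(cell>=5,1.5, IF(cell>=4,1.2, IF(cell>=3,0.9, IF(cell>=2,0.6, IF(cell=1,0.3,0)))))
    for threshold, credit in [(5, 1.5), (4, 1.2), (3, 0.9), (2, 0.6), (1, 0.3)]:
        if ack_count >= threshold:
            return credit
    return 0.0

def good_ability_credit(group_a_total: int) -> float:
    # =IF(cell>=20,0.5,0): credit for a very good language production/cognition group total.
    return 0.5 if group_a_total >= 20 else 0.0

def rescaled_score(normalised_mean: float, nouns: int, srt_attempted: bool,
                   acks: int, group_a_total: int) -> float:
    """Sequential adjustments applied to the participant's normalised mean utterance score."""
    return (normalised_mean + generative_naming_credit(nouns)
            + story_retelling_penalty(srt_attempted)
            + acknowledgement_credit(acks) + good_ability_credit(group_a_total))
```

For example, a hypothetical participant with a normalised mean of 63, eleven nouns named, no SRT attempt, five acknowledgements and more than 20 Group A attributes would be rescaled to 63 + 1.5 − 5 + 1.5 + 0.5 = 61.5.

The top-word count referred to above can be reproduced with a short routine of this kind. This is only a sketch of such a routine, assuming the participant utterances are held as a list of strings and that the filler tokens and visual-cue codes named in the text are the ones to exclude.

```python
from collections import Counter
import re

# Tokens excluded from the lexical count: fillers (scored under speech disfluency)
# and visual-cue annotation codes, as described in the text.
EXCLUDED = {"th", "um", "uh", "points", "ges"}

def top_words(utterances: list[str], n: int = 10) -> list[tuple[str, int]]:
    counts: Counter[str] = Counter()
    for utterance in utterances:
        for token in re.findall(r"[a-zA-Z']+", utterance.lower()):
            if token not in EXCLUDED:
                counts[token] += 1
    return counts.most_common(n)

# top_words(participant_utterances) would yield pairs such as ('whatever', 28),
# ('things', 23) and ('like', 22) for the 115-utterance extract analysed here.
```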
See Table 4 for all qualitative attributes, the quantitative score per parameter, and the indicators with reference to the mini-mental state examination (Cockrell & Folstein, 2002; NHSuk, 2019). This category group scoring has minimum and maximum values.

4. Results

In this section, we examine our statistical approach to model assessment analysis and tuning techniques, which are based on the range of factors considered in the previous section. These factors and the further rescaling provide a clinically orientated, person-centred adjustment to the pre-screening cognitive assessment (Pendrill, 2018). Initial scoring for Strand 1 with 4 tasks includes: (1) the scoring of all utterances against a maximum score of 66; (2) the scoring of the generative naming task (single-word responses) to provide a normalised score to be used for comparison and completeness; (3) a normalised mean without the picture naming task, to see the difference in variation, and a normalised mean based on all 4 tasks (used for the main analysis); (4) category grouping totals for Group A, Group B and Group C for the range of qualitative indicators, as seen in Table 4; and (5) application of statistical metrics (ratios, min, max, mean, standard deviation (SD) and variance) to the language and cognition groupings (lexical, syntactic, semantic, pragmatic, discourse, cognition, speech) for the 22 language production/cognition parameters, e.g., minSyn, maxSyn, meanSyn, sdSyn. For Strand 2, the interview task with participant 5 comprises 313 participant utterances across 8 topics (@G: Illness; @G: Important_Event; @G: Window; @G: Umbrella; @G: Cat; @G: Cinderella_Intro; @G: Cinderella; @G: Sandwich).

Table 4
All Language and Cognition parameters and Assessment Attribute Qualitative Indicators.
Category of Group Totals | Qualitative Attribute | Quantitative Scoring per parameter | Indicators – Reference from MMSE
Good Ability (A) | Complete, New, Correct, Lower, Fluent, Yes | 3 | These terms signify the ability to demonstrate good communication and understanding
Some Ability (B) | Reduced, default, partial, same | 2 | These terms signify some ability to communicate and understand
Poor Ability (C) | Incomplete, Upper, poor, No | 1 | These terms signify a poor ability to communicate and understand

The category ability is used as the basis for the GDS analysis matrix (GDSAM), which maps to a GDS score and associated GDS band score descriptor. This is based on language and cognition insight and group thresholds (minMean and maxMean of each GDS band), variability, indicators and markers, Dementia TalkBank training, the MMSE and clinical insight. For example, an SD threshold of 0–0.3 indicates a GDS band score of 1–3, from no cognitive decline (1) to mild cognitive decline (3) on the GDS. Finally, the participant's rescaled, integrated scores and statistics are queried against the GDSAM for allocation of a GDS band score for the participant, in the form of a dashboard. This has a GDS band type and descriptor (Reisberg et al., 1982). For example, our model has a GDS band 1, which denotes "No cognitive decline based on our Language and Cognition assessment", and further, a Language Cognition Parameter Grouping of Lexical SD with a score of 0 has the indicator 'Able to provide lexical diverse words and information.' As another example, for the syntax grouping, the individual quantitative metric MeanSyn with a value of 1 denotes 'no problem structuring the word order of the utterance'.
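As a minimal sketch of step (5) and of the SD-based reading of the GDSAM, assuming the per-utterance scores for one grouping are available as a list, the metrics and the single threshold example quoted above could be computed as follows (the full GDSAM in Appendix D also uses the mean utterance score, mean language level and the Group A/B/C totals):

```python
from statistics import mean, pstdev

def grouping_metrics(scores: list[int]) -> dict[str, float]:
    """min/max/mean/SD/variance for one language or cognition grouping (e.g., syntax)."""
    sd = pstdev(scores)
    return {"min": min(scores), "max": max(scores), "mean": mean(scores),
            "sd": sd, "variance": sd ** 2}

# Hypothetical per-utterance syntax scores on the 0-3 scale (not real participant data).
syntax = grouping_metrics([3, 3, 3, 3, 3, 3, 3, 3, 3, 2])

# The single rule quoted in the text: an SD threshold of 0-0.3 indicates GDS bands 1-3,
# i.e. no cognitive decline (1) up to mild cognitive decline (3).
if syntax["sd"] <= 0.3:
    candidate_bands = range(1, 4)
else:
    candidate_bands = None  # resolved against the full GDS analysis matrix instead
```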
A complete GDSAM table is found in Appendix D.

During the model training, several findings emerged: (1) the use of synonyms by participants to express the same idea, which reflects a positive indicator of cognitive ability through the introduction of new words; (2) participants generally do better on the earlier tasks than on the later task, such as story retelling, which reflects issues in long-term memory; (3) if there is missing information, RRG theory refers to this as 'prior contextual dialogue to retrieve that information' (Van Valin, 2005a, 2005b) – for example, 'can't do it' refers to 'I can't do it', where the pronoun is expected but implicit through context; (4) the repetition of an investigator's utterance, echolalia, is recorded as repetition in the language cognition – Echolalia assessment; and (5) the use of acknowledgements by the participant is a contributing factor to the final assessment.

Validity is achieved by normalising for statistical analysis, with additions to and subtractions from an utterance score based on linguistic phenomena at word level, sentence level, discourse level and pragmatic level for the Strand 1 tasks. This is done by taking the mean utterance score through a series of sequential adjustments based on completion/incompletion, use of acknowledgements, and demonstration of good ability, to form a rescaled mean score. For completing the generative naming task, the lookup table gives the following additions: a) 10 or more nouns, add 1.5; b) between 7 and 9, add 1; c) between 5 and 6, add 0.5; d) between 1 and 4, add 0.25; e) 0 nouns uttered, no adjustment. Similarly, if there was no attempt at the story re-telling task, a value of 5 is subtracted from the mean utterance score. The use of acknowledgements by the participant in sessions is a powerful compensatory strategy for conversational interaction (Kindell et al., 2013; Pilnick et al., 2021), and hence an addition to the mean utterance score is made as follows: a) 5 or more acknowledgements, add 1.5; b) 4 acknowledgements, add 1.2; c) 3 acknowledgements, add 0.9; d) 2 acknowledgements, add 0.6; e) 1 acknowledgement, add 0.2. Finally, an addition of 0.5 is made to the mean utterance score if more than 20 qualitative 'good ability' attributes are assessed.

Validity for the Strand 2 interview task looks at the same linguistic phenomena at word level, sentence level, discourse level and pragmatic level, but also includes the acknowledgements in conversation as part of the initial analysis and assessment, which hence contribute to the mean utterance score.

Appendix C refers to the participant data (Dementia TalkBank) and our hypothesis-testing baseline of the investigator's allocation. Here, participant 5 had been allocated a GDS score of Stage 6 for the Strand 1 tasks and Strand 2. However, our results derive a Stage 4 result for the Strand 2 interview task. This task is mainly about recalling information about a topic and about themselves. There was no evidence/files available of participant 5 undergoing Strand 1, which, if available, might allude more towards a Stage 6 outcome.
Fig. 4 presents a line graph comparison of the utterance behaviour of participants 1, 2, 3 and 5, which correlates with each participant's stage outcome. The accuracy measure is collectively based on the reliability of the original extracted transcript data from the Dementia TalkBank, followed by some cleaning of the data, the model design, and the intuition of language experts as part of the manual analysis. See Table 5, the dashboard for Participant 1.

4.2. Quality of the model

In terms of the objectivity, reliability, validity, and accuracy of our model, it can be assessed on two levels. The model itself, its details (link) and the use of the dataset (utterances) are found in Appendix F. At the initial dataset usage of the Dementia TalkBank (MacWhinney, 2017), we inherited an annotated transcription with an implicit error factor, produced according to the CHAT manual (MacWhinney, 2021). The transcription has been identified as having a high level of accuracy, but with a couple of transcription annotations/anomalies, which were addressed in Section 3. For example: (1) transcript utterance 113, '&-um I went this last Saturday &-um with my son and my sister in law and his [/] his wife and so on like that'; (2) transcript utterance 98, 'because I_mean I useta like all sorts of stuff and big stuff &=hands:spread'.

At the linguistic analysis stage, reliability can be explained via the use of the dataset in analysis, and the automatic analysis grouping controls and assessment in Microsoft Excel. There is a series of automatic and group controls (counts), computed in the following order: (1) the number of participant utterances; (2) the number of utterance types based on a qualitative utterance category; (3) the total for the language production category; (4) the total for the language cognition category; (5) the total utterance score; (6) the normalised score adjustment if it is a generative naming task; (7) the group totals of the qualitative scoring for the categories of ability (good/partial/poor); (8) the number of speech disfluencies in the utterance via 13 speech checks – for example, 4 of the 13 are: a) revision – count of '&+'; b) word repetition – count of '[x 3]'; c) filled pause – count of '&='; d) timing pause – count of '(.)'; and (9) a final holistic utterance score subtracting the speech disfluencies.

Here, we carried out the manual analysis and initial scoring of two strands: strand 1, with four tasks for 4 participants (total utterances of …), and strand 2, with an interview task for participant 5 and an extract of 122 utterances. Utterance scoring was achieved by simultaneous verification and assessment by a two-person team, with systematic selection of a colour-coded group (reflecting the different parameter groups) with scoring from a qualitative drop-down list and an equivalent quantitative score assignment. This colour-coded group was then hidden to prevent bias and remain objective, and the next colour-coded columns (that is, the next parameter category) were selected and assessed. For instance, the utterance category is assessed first and hidden, followed by the lexical category, and so on until all categories are assessed.

5. Discussion

5.1. Advantages of the PST-LCAM model

The goal of the experiment was to identify, in regular conversation, mild impairment/speech issues and/or indicators of a feeling of something going wrong and the need for further investigation from a clinical perspective. The variability of the language and cognitive behaviour of participants can be identified from each participant's qualitative and quantitative results, with their resulting GDS band (1–5) and indicators. Our PST-LCAM concept and development is based on the investigator's session with the participant, via audio files/transcripts, and the final outcomes. As noted earlier, we have a true hypothesis result.
Our outcomes correlate with problems at word-level, sentence-level and discourse-level communication and with specific conditions, as in Table 6, with examples from our model analysis, implementation, and assessment. The explanations of these conditions were presented in the literature review earlier (AphasiaTalkbank, 2021). For each level of language, Table 6 identifies a specific condition with an explanation and selected example(s) from our model analysis, implementation, and assessment.

Fig. 4. Line graph comparison of utterance behaviour of participants 1, 2, 3 and 5 (utterance score, approximately 35–65, plotted across utterances 1–115 for each participant).

Table 5
Participant 1 – Dashboard of Strand 1 of 4 MMSE tasks and Utterances' (Utts) Analysis and GDS outcome.
Participant: P1 | No of Utts: 23 | Normalised Utts: 14 | ACK (more than 5 ACK, credit given): 5 | GN (√, extra points if done): √ | SRT (√, penalty if not done): X | Gp A (>20, credit given): 20.86 | Max Utterance Value: 69 | Normalised Mean Score: 65.49
Parameter | Mean (over Normalised Utts) | Ratio of Normalised Utts | SD | Comments
Lexical | MeanLex 2.77 | LexRat 0.95 | 0.17 | Able to provide lexical diverse words and information
Syntactic | MeanSyn 3 | SynRatio 1 | 0 | Able to structure an utterance appropriately
Semantic | MeanSem 2.74 | SemRatio 0.91 | 0.15 | Utterances and words provided are adequate to the context and give meaning, with little variation in the semantic representation of utterances
Pragmatic | MeanPrag 3 | PragRatio 1 | 0 | They demonstrate a very good world perspective
Discourse | MeanDis 3 | DisRatio 1 | 0 | Very good at producing sentences and words in the right context and reference
Lang Cognition (LG) | MeanLG 2.87 | LGRat 0.66 | 0.32 | No repetition or involuntary words; however, there is displacement of wrong/more generalised words used for complex Utt: category D
Speech | MeanSpeech 2.85 | SpeechRat 0.45 | 0.36 | Occasional speech interruptions and pauses
Summary | meanAll 2.89 | meanRat 0.85 | 0.14 | There is consistency between the utterances produced in relation to all the utterances, and little difference in variation
Based on our GDS band scoring table, P1 = Stage 3 – Mild Cognitive Decline. This participant has failed to re-tell the story, but is capable of producing responses in relation to the investigator's context.

Table 6 further provides early evidence and correlates with the discussion by Ostrand and Gunstad (2021) that early-stage dementia (cognitive decline) reduces the amount of specific content information conveyed during speech, while maintaining contextual relevance and grammaticality. They further note that other levels of linguistic processing, including articulatory production, phonetic retrieval, and syntax, remain largely unimpaired until much more advanced stages of the disease.

Table 6
Example of language issues identified from the proposed model assessment.
Type of feature | Condition | Example from our model results | Participant (P) Source
Word level | anomia | Use of 'whatever', 'things', [//], pointing, gestures; e.g., 'Where my mom and dad and so on had been &=ges:circling where they had gone into these places where &-um nursing home that whatever like that' | P3, P5
Word level | circumlocution | 'that's [/] that's what's whatever', 'so on like that' | P5
Word level | jargon | 'Whatever', 'thing' | P2
Word level | perseveration | 'Stuff, like I was going there, going there, going there' | P3, P5
Word level | semantic paraphasia | No matched evidence | None
Word level | stereotypy | '(e.g., &-um)' | P3, P5
Sentence level | agrammatism | No matched evidence | None
Sentence level | empty speech | 'Whatever', 'thing', 'useta' | P5
Discourse level | communication vs language | 'He got a funny paper down there' | P1

5.2. Future directions and clinical implications

The PST-LCAM model will be embedded into a conceptual architecture as shown in Appendix E. This will involve: (1) implementing the model design and the cognitive assessment process of language and cognition by deploying a Role and Reference Grammar language engine and the psycholinguistic and cognitive adequacy (PCA) assessment protocol discussed earlier; (2) transcribing the user's voice input with a bespoke automatic speech recognition (ASR) annotation framework to create a set of annotated utterances – our work and plans concur with Boletsis (2020), who reviewed 9–15 studies taking place from 2017 to 2020 on automated speech-based interaction for cognitive screening; and (3) validating the use and next stage of our model development to be embedded in an intervention for early dementia detection. It will provide indicators, pre-diagnosis results, and recommendations for the participant and/or carer, as precursors to a definitive final clinical diagnosis. These qualitative recommendations will form the outputs of the model via: (1) acquiring the latest MCI recommendations from dementia and speech specialists, dementia care providers and health organisational documents (NHSorg, 2020; TheAdultsSeechTherapyWorkbook.com, 2022); (2) creating a recommendation mapping protocol (RMP) of language categories and a mapping of recommendation activities – for instance, for a syntax indicator, a range of options: (i) physical activity – out and about, walk and talk; (ii) mental activity – word game app/audio book/reading puzzles; (iii) social activity – creative singing following a lyric; (iv) creative activity – writing a to-do list; (v) individual/group/direct/indirect activities – reminiscence work, a time in your life in a social setting; and (3) creating a personalised recommendations activity plan (PRAP) based on the RMP, the participant's GDS results, and the latest personal dashboard.

This PST-LCAM model builds on the innovations in dementia, with the aim of triggering and inspiring this line of thinking and development for a remote pre-screening application at the patient's convenience, via a conversational interface (see Appendix E). This will improve the patient's experience, support pre-diagnosis processes, help to reduce costs in NHS dementia diagnosis and social care, and contribute to wider ambient assisted living (AAL) practices (AlzheimersResearchUkOrg, 2023; Demir et al., 2017). More recent considerations and studies have identified that dementia only causes about 41% of cognitive decline, and that there are other predictors, such as lifestyle factors, that can impact cognitive decline (Pelc, 2023). Our PST-LCAM proof of concept will be implemented as highlighted above with ethical alignment: to validate the computational model, a cognitively healthy older group of 200 participants will be recruited to test and validate the model. They will undergo a battery of cognitive tests administered by a pair of neuropsychologists. This will be followed by a protocol-based comparative evaluation of our implemented PST-LCAM model results against the clinical results, with appropriate refinement made from feedback, and subsequently a larger testing cohort. The ultimate goal is to test with a control group of participants with mild impairments via clinical collaborative arrangements.

6. Conclusion

Our goal was to embed a pre-screening trained language and cognition assessment model (PST-LCAM), as an intervention, into a conversational agent interface as an application for pre-screening people for the early detection of language and cognitive decline. Our model uses the Dementia TalkBank dataset of investigators' interviews with participants – cognitive assessment sessions transcribed by the CHAT system, taking into account the speech disfluencies of the participants. These transcriptions constitute the input for training our language and cognition assessment model in Microsoft Excel. Our model is aligned with a GDS band score descriptor via the GDS analysis matrix (GDSAM), providing language and cognition insight and group thresholds (minMean and maxMean of each GDS band), variability, indicators, and various statistical markers. For example: (1) an SD threshold of 0–0.3 indicates a GDS band score of 1–3, from no cognitive decline (1) to mild cognitive decline (3) on the GDS; (2) GDS band 1 denotes "No cognitive decline based on our Language and Cognition assessment", and further, a Language Cognition Parameter Grouping of Lexical SD with a score of 0 has the indicator 'Able to provide lexical diverse words and information.' As another example, for the syntax grouping, the individual quantitative metric MeanSyn with a value of 1 denotes 'no problem structuring the word order of the utterance'. To summarise, our model is successful, demonstrates a proof of concept, and is fit for purpose. The novelty of our PST-LCAM lies in the use of a functional grammatical model to elicit the understanding and meaning that are required for a thorough language and cognition assessment. We are aware of the limitation of our model in terms of low participant numbers; however, our future plans of testing with larger groups will enrich our pre-screening trained language and cognition assessment model and support validation, so as to engage in further dementia-related research collaboration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

The dataset link has been shared in the appendix and methods section.

Appendix A.1 – Model experiment 2, Participant Task 2 – Generative Naming Task mapping attributes to RRG

Table 7
Model experiment 2 Participant Task 2 – Generative Naming Task Mapping Attributes to RRG.
Attributes | Explanation of using it | RRG relevance
Consecutive list of words within a minute | Domain dependent – picking out words (memory recall); velocity of retrieval | Ontology – placement from the correct semantic classes; Lexicon – lexical word retrieval and lexical word category; Pragmatic – linked to the topic/question

Appendix A.2 – Model experiment 2, Participant Task 3 – Picture Description Task mapping attributes to RRG

Table 8
Model experiment 2 Participant Task 3 – Picture Description Task Mapping Attributes to RRG.
Attributes: • Degree of interpretation of the picture • Captured event with participating objects • Who did what to whom?
Language Production & Cognition Links – Mapping framework: • Describe the captured event in stages (Cognition – picking of keywords) • Who did what to whom? This requires stating the relevant participating objects in the picture (memory recall). • Where, how, when, and why of the captured event. This requires reference to participants, objects, location, time, manner, and reason (mental lexicon and representing knowledge in order).
RRG relevance: • Semantic macroroles – actors and undergoers • Semantic – present the meaning of the event – logical structure of the clause (LSc) • Lexical representation of events (aktionsart verbs), e.g., static, active accomplishment – e.g., slowed, slowly, finished • Syntax – structured explanation of the event (SVO); layered structure of the clause (LSC) • Discourse – linking the events • Pragmatic context of the event to other events

Appendix A.3 – Strand 3 – Participant Interview Task – interview sets about your family (Q and A) attributes to RRG concepts

Interview sets are based on topics such as 'tell me something about your family', 'tell me about your job', 'tell me about Little Red Riding Hood', 'Goldilocks and the 3 bears', and 'tell me about the country you live in'. A sample model experiment is 'tell me about your family'. Q stands for 'Question'.

Table 9
Strand 3 – Participant Interview Task – Interview sets and attributes to RRG concepts.
Language Production and Cognition Links: • Understand the initial Q, with an initial response to the Q • Understanding the confirmation (investigator) – acts as a trigger • Turn taking and response – understanding of the interim Q in relation to the initial Q • Understanding the leading Qs, and the immediate response – and the nature of the response • Analyse the non-ability to understand the question • Range of participant responses • Q and knowledge domain dependent – picking out words (memory recall) • Velocity of retrieval • How many people are in your family? Who is related to whom? What does a particular family member do? This requires remembering family members (memory recall) via leading Qs. • When, where, how and why – can reflect the adverbial. This requires reference to participants, objects, location, time, manner, and reason (mental lexicon and representing knowledge in order). • Fluency of response • Cognitive – lexicon – pick out key words and word production
RRG relevance: • Cognitive – lexicon – pick out key words and word production • Semantic macroroles – actors and undergoers – relation between the family members • Semantic – present a response to the question – logical structure of the clause (LSc) • Lexical representation to describe linguistic phenomena (aktionsart verbs), e.g., poorly, energetic, depressed • Syntax – structured explanation of the answer (SVO)
• Layered structure of the clause (LSC) • Discourse – linking of the answer to the question • Pragmatic context of the answer to the question • Speech fluency

Appendix B – Adapted speech notation (SN), phonological fragments (PF), speech disfluency (MacWhinney, 2021), and qualitative scoring (QS)

Table 10
Speech notation (SN), phonological fragments (PF), speech disfluency, and qualitative scoring (QS).
Speech disfluency | Fluent | Partial | Poor | Critical
PF Revision (&+kn, e.g., &+m, &+s, &+r, &+ha) | once (or n/a) | twice | three times | more than 3 times
Word revision ([//]) | no [//] | [//] | [//] twice | [//] three or more
Phrase revision (< > [//]) | no < > [//] | < > [//] | < > [//] twice | < > [//] three or more
Filled pause (&-uh, e.g., &-um) | once | twice | three and more | four and more
Repetition (mono words) | no [x] | [x 2] or [/] | [x 3] | [x > 3]
Pause | no pause | (.) <short> | (.) <medium> | (…) <long>
Pause within a word (blocking) | ^ | ^2 | ^3 | ^>3

Appendix C – Participant (Par) data (Dementia TalkBank) and our hypothesis testing

Table 11
Participant (Par) data (Dementia TalkBank) and our hypothesis testing.
Par | Age | Sex (M/F) | Ed years | Occupation | Comment | Experiment Strands | File (INV) | Model Train–Test | GDS
PAR1 | 62 | M | 15 | Manager – private water co. | designated as early confusional stage | CLA (4 tasks) | /tele01a, /tele01a | Train | Stage 3
PAR2 | 76 | F | 18 | MSc degree, History teacher | | 4 tasks | /tele01b, /tele01b | Train | Stage 4
PAR3 | 78 | F | 16 | Graduate | | 4 tasks | /PPA | Train | Stage 5
PAR5 | 66 | F | | Textile designer | | Interview | /depaul2A | Test | Stage 6

Appendix D – GDS Matrix and bands (also found in the Excel file worksheet GDS Band Matrix; 23 refers to the maximum total for the group totals, derived from 23 parameter assessments)

Table 12
GDS Matrix and bands.
Scale | Indicator | Mean Utt Score Min | Mean Utt Score Max | Mean Language level Min | Mean Language level Max | Grp A (Good) Min | Grp A (Good) Max | Grp B (Partial) Min | Grp B (Partial) Max | Grp C (Poor) Min | Grp C (Poor) Max
1 | No cognitive decline | 68 | 69 | 3 | 3 | 23 | 23 | 0 | 0 | 0 | 0
2 | Very mild cognitive decline | 66 | 67.99 | 2.80 | 2.99 | 22 | 23 | 0 | 1 | 0 | 0
3 | Mild cognitive decline | 63 | 65.99 | 2.60 | 2.79 | 15 | 21 | 1 | 2 | 1 | 2
4 | Moderate cognitive decline | 53 | 62.99 | 2.40 | 2.59 | 11 | 14 | 3 | 5 | 3 | 11
5 | Moderately severe cognitive decline | 26 | 52.99 | 2.2 | 2.39 | 8 | 10 | 6 | 8 | 7 | 15
6 | Severe cognitive decline | 19 | 25.99 | 1.0 | 2.19 | 5 | 7 | 9 | 11 | 8 | 22
7 | Very severe cognitive decline | 0 | 18.99 | 0 | 0.99 | 0 | 4 | 9 | 11 | 8 | 23
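Read as a lookup table, the mean-utterance-score column of the GDS matrix can be queried directly. The following minimal Python sketch uses only that single column of Table 12, whereas the full GDSAM also checks the mean language level and the Group A/B/C totals; the function name is illustrative.

```python
# Mean utterance score ranges per GDS band, taken from Table 12 (min, max).
MEAN_UTT_SCORE_BANDS = {
    1: (68, 69),      # No cognitive decline
    2: (66, 67.99),   # Very mild cognitive decline
    3: (63, 65.99),   # Mild cognitive decline
    4: (53, 62.99),   # Moderate cognitive decline
    5: (26, 52.99),   # Moderately severe cognitive decline
    6: (19, 25.99),   # Severe cognitive decline
    7: (0, 18.99),    # Very severe cognitive decline
}

def gds_band_from_mean_utt_score(score: float) -> int | None:
    """Look up the GDS band whose mean-utterance-score range contains the rescaled score."""
    for band, (low, high) in MEAN_UTT_SCORE_BANDS.items():
        if low <= score <= high:
            return band
    return None

# Participant 1's normalised mean score of 65.49 (Table 5) falls in band 3,
# consistent with the Stage 3 (mild cognitive decline) outcome on the dashboard.
print(gds_band_from_mean_utt_score(65.49))  # 3
```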
Appendix E – Conceptual architecture

Fig. 5. Conceptual Architecture.

Appendix F – Pre-screening trained language and cognition assessment model (PST-LCAM)

Please see the Microsoft Excel file here: it includes the following worksheets:
1. Instructions for use
2. Worksheet 1 – Lookup list
3. Worksheet 2 – GDS Band Matrix
4. Worksheet 3 – Participant 1 (Strand 1) utterances, PST-LCAM analysis, assessment, and dashboard
5. Worksheet 4 – Participant 2 (Strand 1) utterances, PST-LCAM analysis, assessment, and dashboard
6. Worksheet 5 – Participant 3 (Strand 1) utterances, PST-LCAM analysis, assessment, and dashboard
7. Worksheet 6 – Participant 5 (Strand 2) utterances, PST-LCAM analysis, assessment, and dashboard
8. Worksheet 7 – Utterance Variations (line graph) comparing participants' utterances
9. Worksheet 8 – Utterance Extract – Top 20 words (Participant 5 utterances) and visualisation

References

Jones, D., et al. (2016). Conversational assessment in memory clinic encounters: Interactional profiling for differentiating dementia from functional memory disorders. Aging & Mental Health, 20(5), 500–509. https://doi.org/10.1080/13607863.2015.1021753
Kaplan, E., Goodglass, H., & Weintraub, S. (2001). Boston naming test.
Kindell, J., et al. (2013). Adapting to conversation with semantic dementia: Using enactment as a compensatory strategy in everyday social interaction. International Journal of Language & Communication Disorders, 48(5), 497–507. https://doi.org/10.1111/1460-6984.12023
Kulkarni, D. K., & Moningi, S. (2015). Neurocognitive function monitoring. Journal of Neuroanaesthesiology and Critical Care, 2(03), 246–256.
Lee, D., & Yoon, S. N. (2021). Application of artificial intelligence-based technologies in the healthcare industry: Opportunities and challenges. International Journal of Environmental Research and Public Health, 18(1), 271. https://doi.org/10.3390/ijerph18010271
Liddy, E. D. (2001). Natural language processing. Encyclopedia of Library and Information Science (second ed.). NY: Marcel Decker, Inc.
Linguamatics. (2021). How does Natural Language Processing (NLP) work? Retrieved from 〈https://www.linguamatics.com/how-does-nlp-work〉. Accessed 12 December 2021.
MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk: Volume I: Transcription Format and Programs, Volume II: The Database (third ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
MacWhinney, B. (2017). Dementia.TalkBank. Retrieved from 〈https://dementia.talkbank.org/〉. Accessed 12 February 2021.
MacWhinney, B. (2019). Understanding spoken language through TalkBank. Behavior Research Methods, 51(4), 1919–1927.
MacWhinney, B. (2021). Tools for Analyzing Talk Part 1: The CHAT Transcription Format. Retrieved from 〈https://talkbank.org/manuals/CHAT.pdf〉.
Mairal, R., Pérez, M.-B. A., et al. (2019). Teorías lingüísticas. Editorial UNED.
Mannonen, P., Kaipio, J., & Nieminen, M. P. (2017). Patient-centred design of healthcare services: Meaningful events as basis for patient experiences of families. Stud Health Technol Inform, 234, 206–210.
Maurya, H. C., Gupta, P., & Choudhary, N. (2015). Natural language ambiguity and its effect on machine learning. International Journal of Modern Engineering Research, 5, 25–30.
McKhann, G., et al. (1984). Clinical diagnosis of Alzheimer's disease. Neurology, 34(7), 939. https://doi.org/10.1212/WNL.34.7.939
Miah, Y., et al. (2021). Performance comparison of machine learning techniques in identifying dementia from open access clinical datasets. Paper presented at Advances on Smart and Soft Computing, Singapore.
Michie, S., Atkins, L., & West, R. (2014).
The behaviour change wheel: a guide to designing interventions. Mosqueira-Rey, E., et al. (2023). Human-in-the-loop machine learning: A state of the art. Artificial Intelligence Review, 56(4), 3005–3054. https://doi.org/10.1007/s10462022-10246-w Nasreddine, Z. S., et al. (2005). The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. Journal of the American Geriatrics Society, 53(4), 695–699. NHSorg. (2020). Activities for dementia - Dementia guide. Retrieved from 〈https://www. nhs.uk/conditions/dementia/activities/?tabname=symptoms-and-diagnosis〉. ’Accessed 1 July 2020. NHSuk. (2019). Standardized Mini-Mental State Examination (SMME). Retrieved from 〈https://www.swlstg.nhs.uk/images/Useful_docs_for_healthcare_professionals/min i-mental_state_examination_form.pdf〉. ’Accessed 12 December 2020. Noori, A., et al. (2022). Development and evaluation of a NLP annotation tool to facilitate phenotyping of cognitive status in electronic health records: Diagnostic study. Journal of Medical Internet Research, 24(8). https://doi.org/10.1002/ alz.068929 Norden, J., Wang, J., & Bhattacharyya, A. (2023). Where Generative. AI Meets Healthcare: Updating The Healthcare AI Landscape. Retrieved from https://aichecku p.substack.com/p/where-generative-ai-meets-healthcare. Ntracha, A., et al. (2020). Detection of mild cognitive impairment through natural language and touchscreen typing processing. Frontiers in Digital Health, 2, Article 567158. O’Malley, R., et al. (2020). Can an automated assessment of language help distinguish between Functional Cognitive Disorder and early neurodegeneration? Journal of Neurology, Neurosurgery Psychiatry, 91(8), e18–e19. https://doi.org/10.1136/jnnp2020-BNPA.43 O’Malley, R. P. D., et al. (2021). Fully automated cognitive screening tool based on assessment of speech and language. Journal of Neurology, Neurosurgery & Psychiatry, 92(1), 12–15. https://doi.org/10.1136/jnnp-2019-322517 OpenAI. (2023). GPT-4 is OpenAI’s most advanced system, producing safer and more useful responses. Retrieved from https://openai.com/gpt-4. ’Accessed 15 March 2023’ Ostrand, R., & Gunstad, J. (2021). Using automatic assessment of speech production to predict current and future cognitive function in older adults. Journal of Geriatric Psychiatry and Neurology, 34(5), 357–369. https://doi.org/10.1177/ 089198872093335 Padhee, S.et al. (2020). Identifying Easy Indicators of Dementia. Palanica, A., et al. (2019). Physicians’ perceptions of chatbots in health care: Crosssectional web-based survey. J Med Internet Res, 21(4), Article e12887. doi:https:// www.jmir.org/2019/4/e12887/. Pelc, C. (2023). Dementia only causes about 41% of cognitive decline: Study identifies other predictors. Retrieved from 〈https://www.medicalnewstoday.com/articles /cognitive-decline-predictors-besides-dementia〉. ’Accessed 16 April 2023. Adhikari, S., et al. (2022). Exploiting linguistic information from Nepali transcripts for early detection of Alzheimer’s disease using natural language processing and machine learning techniques. International Journal of Human-computer Studies, 160, Article 102761. https://doi.org/10.1016/j.ijhcs.2021.102761 Alzheimer’sAssociation. (2023). Alzheimer’s Disease Facts and Figures. Retrieved from 〈https://www.alz.org/alzheimers-dementia/facts-figures〉. ’Accessed 12 July 2023. AlzheimersOrgUk. (2018). Over half of people fear dementia diagnosis, 62 per cent think it means ’life is over’. 
Retrieved from 〈https://www.alzheimers.org.uk/news/201 8–05-29/over-half-people-fear-dementia-diagnosis-62-cent-think-it-means-life-ove r〉. ’Accessed 12 March 2021. AlzheimersOrgUk. (2020). Alzheimer’s Society comment on how coronavirus is affecting dementia assessment and diagnosis. Retrieved from 〈https://www.alzheimers.org. uk/news/2020–08-10/coronavirus-affecting-dementia-assessment-diagnosis〉. ’Accessed 9 March 2021. AlzheimersResearchUk. (2020). Statstics about dementia - prevalence. Retrieved from 〈https://dementiastatistics.org/about-dementia/prevalence-and-incidence/〉. ’Accessed 12 July 2020. AlzheimersResearchUkOrg. (2021). Alzheimer’s Research UK. Retrieved from 〈https://www.alzheimersresearchuk.org/research/〉. ’Accessed 20 September 2022. AlzheimersResearchUkOrg. (2023). Think Brain Health Check-in. Retrieved from 〈https ://www.alzheimersresearchuk.org/brain-health/check-in/〉. ’Accessed 23 January 2023. AphasiaTalkbank. (2021). AphasiaBank Example. Retrieved from 〈https://aphasia.talkb ank.org/education/examples/〉. ’Accessed 12 February 2021. BabylonHealthCom. (2021). Creating Better Health and Panel Discussion. Paper presented at the AI Business Week Digital Symposium February 22–25 2021. Bertini, F., et al. (2022). An automatic Alzheimer’s disease classifier based on spontaneous spoken English. Computer Speech & Language, 72, Article 101298. https://doi.org/10.1016/j.csl.2021.101298 Bohr, A., & Memarzadeh, K. (2020). Chapter 2 - the rise of artificial intelligence in healthcare applications. In A. Bohr, & K. Memarzadeh (Eds.), Artificial Intelligence in Healthcare (pp. 25–60). Academic Press. https://doi.org/10.1016/B978-0-12818438-7.00002-2. Boletsis, C. (2020). A review of automated speech-based interaction for cognitive screening. Multimodal Technologies and Interaction, 4(4), 93. https://doi.org/ 10.3390/mti4040093 Borson, S., et al. (2003). The Mini-Cog as a screen for dementia: Validation in a population-based sample. Journal of the American Geriatrics Society, 51(10), 1451–1454. https://doi.org/10.1046/j.1532-5415.2003.51465.x Bresnan, J., et al. (1982). Cross-serial dependencies in Dutch. The Formal Complexity of Natural Language, 33, 286–319. Bucks, R. S., et al. (2000). Analysis of spontaneous, conversational speech in dementia of Alzheimer type: Evaluation of an objective technique for analysing lexical performance. Aphasiology, 14(1), 71–91. https://doi.org/10.1080/ 026870300401603 Car, L. T., et al. (2020). Conversational agents in health care: Scoping review and conceptual analysis. Journal of medical Internet research, 22(8), Article e17158. doi: https://www.jmir.org/2020/8/e17158. Cockrell, J. R., & Folstein, M. F. (2002). Mini-mental state examination. In Principles and practice of geriatric psychiatry, 140–141. https://doi.org/10.1002/0470846410.ch27 (ii) Dastani, M., & Yazdanpanah, V. (2023). Responsibility of AI systems. Ai & Society, 38(2), 843–852. DementiaTalkbankOrg. (2017). TalkBank and DementiaBank. Retrieved from 〈htt ps://dementia.talkbank.org/〉. ’Accessed 2 January 2021. DementiaUK. (2021). Getting a diagnosis. Retrieved from 〈https://www.dementiauk.or g/get-support/diagnosis-and-specialist-support/getting-a-diagnosis-of-dementia/〉. ’Accessed 12 June 2023. Demir, E., et al. (2017). Smart home assistant for ambient assisted living of elderly people with dementia. Procedia Computer Science, 113, 609–614. https://doi.org/ 10.1016/j.procs.2017.08.302 Dik, S. (1991). Functional grammar. In Linguistic Theory and Grammatical Description (Vol. 75, pp. 
247–274). John Benjamins Publishing Company.
EPRScUKRIOrg. (2021). Healthcare Technologies Grand Challenges. Retrieved from 〈https://www.ukri.org/what-we-do/our-main-funds-and-areas-of-support/browse-our-areas-of-investment-and-support/healthcare-technologies-theme/〉. Accessed 15 December 2021.
Foltz, P. W., et al. (2022). Reflections on the nature of measurement in language-based automated assessments of patients' mental state and cognitive function. Schizophrenia Research. https://doi.org/10.1016/j.schres.2022.07.011
Förstl, H., & Kurz, A. (1999). Clinical features of Alzheimer's disease. European Archives of Psychiatry and Clinical Neuroscience, 249, 288–290.
Guinn, C. I., & Habash, A. (2012). Language analysis of speakers with dementia of the Alzheimer's type. Paper presented at the 2012 AAAI Fall Symposium Series.
InnovationsInDementia. (2016). Making an Impact Together - Sharing the learning on dementia activism from and across the DEEP network. Retrieved from The UK Network of Dementia Voices 〈https://www.dementiavoices.org.uk/wp-content/uploads/2016/11/Making-An-Impact-Together.pdf〉.
InnovationsInDementiaOrgUk. (2021). Learning about your cognitive state using language and memory – a questionnaire. Retrieved from 〈https://www.dementiavoices.org.uk/deep-groups-news/learning-about-your-cognitive-state-using-language-and-memory-a-questionnaire/〉. Accessed 03 September 2021.
JAIN. (2021). Assisting people with memory loss. Retrieved from 〈https://www.jainprojects.com/〉. Accessed 12 July 2021.
Pendrill, L. (2018). Assuring measurement quality in person-centred healthcare. Measurement Science and Technology, 29(3), Article 034003. https://doi.org/10.1088/1361-6501/aa9cd2
Penfold, R. B., et al. (2022). Development of a machine learning model to predict mild cognitive impairment using natural language processing in the absence of screening. BMC Medical Informatics and Decision Making, 22(1), 1–13.
Pilnick, A., et al. (2021). Avoiding repair, maintaining face: Responding to hard-to-interpret talk from people living with dementia in the acute hospital. Social Science & Medicine, 282, Article 114156. https://doi.org/10.1016/j.socscimed.2021.114156
Reisberg, B., et al. (1982). The Global Deterioration Scale for assessment of primary degenerative dementia. The American Journal of Psychiatry. https://doi.org/10.1176/ajp.139.9.1136
Roxby, P. (2023). Dementia: Brain check-up tool aims to cut risk at any age. Retrieved from 〈https://www.bbc.co.uk/news/health-64308997〉. Accessed 18 January 2023.
Searle, J. R. (1969). Speech Acts: An Essay in the Philosophy of Language (Vol. 626). Cambridge: Cambridge University Press.
Taylor, N. (2019). Duke Report Identifies Barriers to Adoption of AI Healthcare Systems. Retrieved from 〈https://www.medtechdive.com/news/duke-report-identifies-barriers-to-adoption-of-ai-healthcare-systems/546739/〉. Accessed 1 November 2021.
TheAdultsSeechTherapyWorkbook.com. (2022). THE ADULT SPEECH THERAPY WORKBOOK - Everything you need to assess, treat, and document. Retrieved from 〈https://theadultspeechtherapyworkbook.com/speech-therapy-memory-activities-for-adults/〉. Accessed 1 July 2022.
Thompson, I. (1987). Language in dementia: I. A review. International Journal of Geriatric Psychiatry. https://doi.org/10.1002/gps.930020304
Van Valin, R. D., Jr. (2000). A concise introduction to role and reference grammar. FLUMINENSIA: časopis za filološka istraživanja, 12(1–2), 47–78.
Van Valin, R. D., Jr. (2005a). Exploring the syntax-semantics interface. Cambridge: Cambridge University Press.
Van Valin, R. D., Jr. (2005b). A summary of Role and Reference Grammar. Role and Reference Grammar Web Page, University of Buffalo.
Verizon. (2023). Do LLMs really understand human language? Verizon experts offer a critical perspective on language understanding by large language models. Retrieved from 〈https://inform.tmforum.org/features-and-opinion/do-llms-really-understand-human-language/〉. Accessed 1 June 2023.
WorldAlzReport2015Org. (2015). Prevalence of dementia around the world, along with forecasts for 2030 and 2050. In 〈https://www.researchgate.net/figure/Prevalence-of-dementia-around-the-world-along-with-forecasts-for-2030-and-2050_fig1_338801466〉 (Ed.). Research Gate.
Yeung, A., et al. (2021). Correlating natural language processing and automated speech analysis with clinician assessment to quantify speech-language changes in mild cognitive impairment and Alzheimer's dementia. Alzheimer's Research & Therapy, 13(1), 109.