Journal of Neurology, Neurosurgery & Psychiatry
Background: Extracting data from healthcare records is essential for clinical and research purposes but can be labour and time intensive. We developed a natural language processing (NLP) application to automatically extract meaningful data from routinely-generated multiple sclerosis (MS) clinic letters. Method: We developed the system using the open source platform GATE (General Architecture for Text Engineering) and a training set of 100 manually annotated MS clinic letters. The system extracts information from each clinic letter including: MS diagnosis and type, Expanded Disability Status Scale (EDSS) score, current and previous Disease Modifying Therapies (DMT), walking distance, and MRI information. For initial validation, we used 250 MS clinic letters. We compared the system's performance in extracting MS diagnosis, EDSS score and current and previous DMTs with human annotation. We recorded precision (proportion of extracted items that are accurate), recall (proportion of items that a...
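The precision and recall measures named in the abstract above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name, the tuple layout and the sample items are hypothetical, and because the abstract is cut off mid-definition, recall is taken here in its standard sense (the proportion of human-annotated items that the system extracted).

```python
def precision_recall(extracted, annotated):
    """Compare system-extracted items with human-annotated items.

    Both arguments are collections of hashable items, for example
    (letter_id, field, value) tuples. This layout is an assumption
    made for illustration only.
    """
    extracted, annotated = set(extracted), set(annotated)
    true_positives = len(extracted & annotated)
    # Precision: share of extracted items that match the annotation.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall (standard definition, assumed): share of annotated items
    # that the system managed to extract.
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical items, not data from the study.
system_items = {
    ("letter-1", "diagnosis", "MS"),
    ("letter-1", "EDSS", "6.0"),
    ("letter-2", "diagnosis", "MS"),
    ("letter-2", "EDSS", "3.5"),   # disagrees with the annotation below
}
human_items = {
    ("letter-1", "diagnosis", "MS"),
    ("letter-1", "EDSS", "6.0"),
    ("letter-2", "diagnosis", "MS"),
    ("letter-2", "EDSS", "4.0"),
    ("letter-2", "current DMT", "example-drug"),
}

precision, recall = precision_recall(system_items, human_items)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Treating each item as an exact-match tuple is a deliberate simplification; a real evaluation would need rules for partial matches and value normalisation.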
Background: To develop a population-based cohort of people with ankylosing spondylitis (AS) in Wales using (1) secondary care clinical datasets, (2) patient-derived questionnaire data and (3) routinely-collected information in order to examine disease history and the health economic cost of AS. Methods: This data model will include and link (1) secondary care clinician datasets (i.e. electronic patient notes from the rheumatologist), (2) patient-completed questionnaires (giving information on disease activity, medication, function, quality of life, work limitations and health service utilisation) and (3) a broad range of routinely collected data (including GP records, in-patient hospital admission data, emergency department data, laboratory/pathology data and social services databases). The protocol involves the use of a unique and powerful data linkage system which allows datasets to be interlinked and to complement each other. Discussion: This cohort can integrate patient supplied...
International Journal of Population Data Science, 2018
Introduction: Modern team science requires effective sharing of data and skills. The DPUK Data Portal is a collection of tools, datasets and networks that allows epidemiologists and specialist researchers alike to access, analyse and investigate cohort data and different modalities of routine data across UK and international sources. Objectives and Approach: The Portal is housed on an instance of UKSeRP (UK Secure eResearch Platform), which allows customisable infrastructure to be used for multi-modal research (thus far live in genetics, imaging and clinical data) by researchers across the world using remote access technology whilst allowing governance to remain with the data provider. A central team at Swansea University is responsible for data curation and processing, and runs an access procedure for researchers to apply to use data from multiple sources to be analysed in a central analysis environment. Other modalities are similarly hosted, with input from partner sites in Cardiff and...
International Journal of Population Data Science, 2018
DPUK relaunched the Data Portal in November 2017 to present openly available information on the data availability and technical capability of the Data Portal, which supports multi-modal research studies with various objectives from disease model validation to observation investigation. DPUK not only brings clinical data together from cohorts, but is now supporting multi-modal studies in genetics and imaging, as well as linkage opportunities to routine data using world-leading technical solutions to data sharing. The capacity, adaptability and sophistication of the UK Secure eResearch Platform, on which the Portal is housed, allow for unprecedented levels of centralised access to rich cohort and routine data, which is consequently leading to international collaboration and development ambition within epidemiology, bioinformatics, research methodology and technical research solutions. As of March 2018, DPUK is supporting 50 cohorts, 41 from the UK and 9 from across the rest of the wor...
Abstract. Background: Coronavirus disease 2019 (COVID-19), caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is estimated to have caused more than 18 million deaths worldwide as of end-May 2022. Methods: COVIDENCE UK is a longitudinal population-based study that investigates risk factors for, and impacts of, COVID-19 in UK residents aged ≥16 years. A unique feature is the capacity to support trial-within-cohort studies to evaluate interventions for prevention of COVID-19 and other acute respiratory illnesses. Participants complete a detailed online baseline questionnaire capturing self-reported information relating to their socio-demographic characteristics, occupation, lifestyle, quality of life, weight, height, longstanding medical conditions, medication use, vaccination status, diet and supplemental micronutrient intake. Follow-up online questionnaires capturing incident symptoms of COVID-19 and other acute respiratory infections, incident swab test-conf...
Background: Call detail records (CDRs) are collected by mobile network operators in the course of providing their service. CDRs are increasingly being used in research along with other forms of big data and represent an emerging data type with potential for public good. Many jurisdictions have infrastructures for health data research that could benefit from the integration of CDRs with health data. Objective: The objective of this study was to review how CDRs have been used in health research and to identify challenges and potential opportunities for their wider use in conjunction with health data. Methods: A literature review was conducted using structured search terms making use of major search engines. Initially, 4066 items were identified. Following screening, 46 full text articles were included in the qualitative synthesis. Information extracted included research topic area, population of study, datasets used, information governance and ethical considerations, study findings, and ...
Background: Mobile phone call detail records (CDRs) are increasingly being used in health research. The location element in CDRs is used in various health geographic studies, for example, to track population movement and infectious disease transmission. Vast volumes of CDRs are held by multinational organizations, which may make them available for research under various data governance regimes. However, there is an identified lack of public engagement on using CDRs for health research to contribute to an ethically founded framework. Objective: This study aimed to explore public views on the use of call detail records in health research. Methods: Views on using CDRs in health research were gained via a series of three public workshops (N=61) informed by a pilot workshop of 25 people. The workshops included an initial questionnaire to gauge participants’ prior views, discussion on health research using CDRs, and a final questionnaire to record workshop outcome views. The resulting data w...
Introduction: The COVID-19 pandemic has resulted in significant morbidity and mortality and devastated economies globally. Among groups at increased risk are healthcare workers (HCWs) and ethnic minority groups. Emerging evidence suggests that HCWs from ethnic minority groups are at increased risk of adverse COVID-19-related outcomes. To date, there has been no large-scale analysis of these risks in UK HCWs or ancillary workers in healthcare settings, stratified by ethnicity or occupation, and adjusted for confounders. This paper reports the protocol for a prospective longitudinal questionnaire study of UK HCWs, as part of the UK-REACH programme (The United Kingdom Research study into Ethnicity And COVID-19 outcomes in Healthcare workers). Methods and analysis: A baseline questionnaire will be administered to a national cohort of UK HCWs and ancillary workers in healthcare settings, and those registered with UK healthcare regulators, with follow-up questionnaires administered at 4 and 8 ...
International Journal of Population Data Science, 2020
Background: The SAIL Databank is a data safe haven established in 2007 at Swansea University (Wales). It was set up to create new opportunities for research using routinely-collected health and other public service datasets in linkable anonymised form. SAIL forms the bedrock of other Population Data Science initiatives made possible by the data and safe haven environment. Aim: The aim of this paper is to provide an overview of public involvement and engagement in connection with the SAIL Databank and related Population Data Science initiatives. Approach: We have a public involvement and engagement policy for SAIL in the context of Population Data Science. We established a Consumer Panel to provide advice on the work of SAIL and associated initiatives, including on proposed uses of SAIL data. We reviewed the topics discussed and provide examples of advice to researchers. We carried out a survey with members on their experiences of being on the Panel and their perceptions of the work ...
International Journal of Population Data Science, 2018
Introduction: The Dementias Platform UK (DPUK) Data Portal is a secure, accessible environment facilitating provision of rich data to the largest community of dementia, cognition and ageing cohort studies in the world. DPUK is also providing services for cohort studies and researchers to maximise the research potential of the programme’s community. Objectives and Approach: As part of the engagement of DPUK cohorts with the Data Portal, cohorts will upload data onto the DPUK instance of UK Secure eResearch Platform infrastructure. The Data Portal gives access to a collaborative working space that allows cohorts to enrich their own data, perform their own analysis, and enhance the research potential of their data whilst making use of expertise at various DPUK sites, such as data linking, curation and multi-modal specialism. Cohort data divided into ontologies allows researchers to access data specific to their study needs and can be requested from multiple cohorts simultaneously. Re...
International Journal of Population Data Science, 2018
Introduction: Sets of clinical codes that define conditions and events of interest are a key knowledge product in health data research. Documenting such lists is essential for transparency and repeatability, and there is great potential benefit in their sharing and reuse. We designed and implemented software to address these goals. Objectives and Approach: Our goals were threefold: (1) provide a graphical user interface (GUI) to allow easier creation of code lists for less technical users; (2) allow clear documentation of code lists, preserving the history of their creation and capturing metadata about their meaning, provenance, and use; (3) facilitate programmatic access, so that the software is not just documentation but can be integrated into data preparation and analysis. To these ends, we developed a web application using Python and PostgreSQL that allows code lists to be created, edited, and accessed via a GUI, as well as a REST API for integration into SQL, R, and other environments. Results: The software ...
International Journal for Population Data Science, 2017
Objectives: The UK MS Register is a research project that aims to capture real world data about living with Multiple Sclerosis (MS) in the UK. When it launched in 2011, the identified data sources were: directly from People with MS (PwMS) via the internet, from NHS treatment centers via ‘traditional’ database capture, and by linkage to routine datasets from the SAIL databank. Data received from the NHS, though ‘gold standard’ in terms of diagnosis, is dependent on clinical staff finding both time and information to enter into a clinical system. System implementations across the NHS are variable, as is clinical time. Therefore, we looked to other complementary methodologies. Approach: The Clix enrich natural language processing (NLP) software was chosen to see if it could capture a portion of the MS Register minimum clinical dataset; the software matches clinical phrases against SNOMED-CT. 40 letters, from 2 NHS Trusts, from 28 patients were loaded. The letters were a mix of MS patients with differing ...
International Journal of Healthcare Information Systems and Informatics, 2012
Internet registers are playing an increasing role in healthcare informatics. Understanding the motivations and expectations of people choosing to use such registers is important, and these aspects were investigated among people with MS who registered on the UK MS Register. An objective was to explore relationships between these factors and the source from which participants first learned about this Register, as this is relevant to how registers are publicised. The responses from a large number of participants (N = 2,675) to questions about the source by which they discovered the Register, why they registered, and how they thought it should be used, were qualitatively analysed using a ‘word cloud’ technique and a traditional content analysis strategy to provide a more detailed analysis. The significant trends that emerged from these analyses were the importance to the participants of: studying MS; raising awareness about MS; improving and developing services and policies regarding M...
Papers by David Ford