PURPOSE Variation in risk of adverse clinical outcomes in patients with cancer and COVID-19 has been reported from relatively small cohorts. The NCATS’ National COVID Cohort Collaborative (N3C) is a centralized data resource representing the largest multicenter cohort of COVID-19 cases and controls nationwide. We aimed to construct and characterize the cancer cohort within N3C and identify risk factors for all-cause mortality from COVID-19. METHODS We used data from 4,382,085 patients at 50 US medical centers to construct a cohort of patients with cancer. We restricted analyses to adults ≥ 18 years old with a COVID-19–positive or COVID-19–negative diagnosis between January 1, 2020, and March 25, 2021. We followed the N3C selection of one index encounter per patient for analyses. All analyses were performed on the N3C Data Enclave Palantir platform. RESULTS A total of 398,579 adult patients with cancer were identified from the N3C cohort; 63,413 (15.9%) were COVID-19–positive. Most common represe...
Since late 2019, the novel coronavirus SARS-CoV-2 has introduced a wide array of health challenges globally. In addition to a complex acute presentation that can affect multiple organ systems, increasing evidence points to long-term sequelae being common and impactful. The worldwide scientific community is forging ahead to characterize a wide range of outcomes associated with SARS-CoV-2 infection; however, the underlying assumptions in these studies have varied so widely that the resulting data are difficult to compare. Formal definitions are needed in order to design robust and consistent studies of Long COVID that reliably capture variation in long-term outcomes. Even the condition itself goes by three terms: most widely “Long COVID”, but also “post-acute COVID-19 syndrome (PACS)” or “post-acute sequelae of SARS-CoV-2 infection (PASC)”. In the present study, we investigate the definitions used in the literature published to date and compare them against data available from electronic healt...
BACKGROUND Bleeding events are common and critical, and may cause significant morbidity and mortality. Studies show high incidences of bleeding events among cardiovascular disease (CVD) patients on anticoagulant therapy. Prompt and accurate detection of bleeding events is essential for preventing serious consequences. As bleeding events are often described in clinical notes, automatic detection of bleeding events from Electronic Health Record (EHR) narratives has the potential to improve drug safety surveillance and pharmacovigilance. OBJECTIVE We developed a natural language processing (NLP) system to automatically classify whether an EHR note sentence contains a bleeding event. METHODS We expert-annotated 878 EHR notes (76,577 sentences and 562,630 word tokens) to identify bleeding events at the sentence level. This annotated corpus was then used to train and validate our NLP systems. We developed an innovative hybrid CNN and LSTM Autoencoder model (HCLA), which integrates a convolutional neural network (CNN) architecture with a bidirectional long short-term memory (BiLSTM) autoencoder model to leverage large amounts of unlabeled EHR data. RESULTS HCLA achieved an F-score of 93.79% for identifying whether a sentence contains a bleeding event, surpassing a strong SVM baseline and other CNN models. CONCLUSIONS By combining a supervised CNN model with a pre-trained unsupervised BiLSTM autoencoder, HCLA achieved high performance in detecting bleeding events.
Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. To unlock ADE-related information from EHR narratives, relevant entities must be extracted and the relations among them identified. In this study, we focus on relation identification. We aimed to evaluate natural language processing and machine learning approaches using expert-annotated medical entities and relations in the context of drug safety surveillance, and to investigate how different learning approaches perform under different configurations. We manually annotated 791 EHR notes with 9 named entity types (eg, medication, indication, severity, and ADEs) and 7 different types of relations (eg, medication-dosage, medication-...
ABSTRACT Chunking is a useful step in natural language processing. The paper puts forward a definition of co-chunks for Chinese-English spoken-language translation, based on both the characteristics of spoken language and the differences between Chinese and English. An algorithm is proposed to identify co-chunks automatically; it integrates rules into a statistical method so that each co-chunk has both a syntactic structure and a complete meaning. Using the co-chunk alignment corpus, we present the framework of our ...
Studies in Health Technology and Informatics, 2013
Physicians are increasingly using the Internet to find medical information related to patient care. Wikipedia is a valuable online medical resource that could be integrated into existing clinical question answering (QA) systems. On the other hand, Wikipedia covers the full spectrum of the world's knowledge and therefore contains a large proportion of non-health-related content. This makes disambiguation more challenging and imposes a large overhead on existing systems that must filter out irrelevant information. To overcome this, we developed both unsupervised and supervised approaches to identify health-related articles as well as clinically relevant articles. Furthermore, we explored novel features by extracting a health-related hierarchy from the Wikipedia category network, from which a variety of features were derived and evaluated. Our experiments show promising results and also demonstrate that employing the category hierarchy can effectively improve the system p...
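A small graph search can illustrate the category-hierarchy features described above. This is a minimal sketch, not the authors' feature set. It assumes the category network is available as a dict that maps each category to its parent categories. It computes one hypothetical feature: the fewest hops from an article's categories up to a chosen health root category. The category names, the `parents` dict, and the `min_depth_to_root` helper are all hypothetical.

```python
from collections import deque


def min_depth_to_root(article_categories, parents, root):
    """Breadth-first search upward through a category graph (hypothetical helper).

    Returns the fewest parent-category hops from any of the article's
    categories to `root`, or None if `root` is unreachable. The `seen` set
    guards against cycles, which real category networks can contain.
    """
    queue = deque((cat, 0) for cat in article_categories)
    seen = set(article_categories)
    while queue:
        cat, depth = queue.popleft()
        if cat == root:
            return depth
        for parent in parents.get(cat, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return None


# Hypothetical toy fragment of a category network (child -> parents).
parents = {
    "Antidiabetic drugs": ["Drugs", "Diabetes"],
    "Diabetes": ["Endocrine diseases"],
    "Endocrine diseases": ["Diseases and disorders"],
    "Diseases and disorders": ["Health"],
    "Drugs": ["Pharmacology"],
    "Pharmacology": ["Health"],
    "Football clubs": ["Sports"],
}

print(min_depth_to_root(["Antidiabetic drugs"], parents, "Health"))  # 3
print(min_depth_to_root(["Football clubs"], parents, "Health"))      # None
```

An unreachable root (None) or a large depth would hint at a non-health article. In a real system, such a value would likely be one feature among several passed to a classifier.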
OBJECTIVE:
Clinical questions are often long and complex and take many forms. We have built a clinical question answering system named AskHERMES to perform robust semantic analysis on complex clinical questions and output question-focused extractive summaries as answers.
DESIGN:
This paper describes the system architecture and a preliminary evaluation of AskHERMES, which implements innovative approaches in question analysis, summarization, and answer presentation. Five types of resources were indexed in this system: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines and Wikipedia articles.
MEASUREMENT:
We compared the AskHERMES system with Google (Google and Google Scholar) and UpToDate and asked physicians to score the three systems by ease of use, quality of answer, time spent, and overall performance.
RESULTS:
AskHERMES allows physicians to enter a question in a natural way with minimal query formulation, and to navigate efficiently among all the answer sentences to quickly meet their information needs. In contrast, physicians need to formulate queries to search for information in Google and UpToDate. The development of the AskHERMES system is still at an early stage, and its knowledge resources are limited compared with those of Google or UpToDate. Nevertheless, the evaluation results show that AskHERMES' performance is comparable to that of the other systems. In particular, when answering complex clinical questions, it demonstrates the potential to outperform both Google and UpToDate.
CONCLUSIONS:
AskHERMES, available at http://www.AskHERMES.org, has the potential to help physicians practice evidence-based medicine and improve the quality of patient care.
Objective
Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the two groups differ in literacy level and technical expertise. An effective question answering system must account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers.
Design
We obtained two sets of consumer questions (~10,000 questions in total) from Yahoo! Answers. The professional questions consist of two collections:

- 4654 point-of-care questions (denoted as PointCare), obtained from interviews with a group of family doctors following patient visits.
- 5378 questions from physician practices through professional online services (denoted as OnlinePractice).

With more than 20,000 questions combined, we developed supervised machine-learning models to automatically distinguish consumer questions from professional questions. To evaluate the robustness of our models, we tested the model trained on the Consumer-PointCare dataset on the Consumer-OnlinePractice dataset. We evaluated both linguistic and statistical features. We also examined how the characteristics of the two types of professional questions (PointCare vs. OnlinePractice) may affect classification performance. Finally, we explored information gain for feature reduction, as well as back-off linguistic category features.
Results
Ten-fold cross-validation yielded best F1-measures of 0.936 and 0.946 on Consumer-PointCare and Consumer-OnlinePractice, respectively. The best F1-measure was 0.891 when the Consumer-PointCare model was tested on the Consumer-OnlinePractice dataset.
Conclusion
Healthcare consumer questions posted in Yahoo online communities can be reliably distinguished from professional questions posted by point-of-care clinicians and online physicians, and the supervised machine-learning models are robust for this task. Our study will significantly benefit further development of automated consumer question answering.
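Information gain, used above for feature reduction, scores each feature by how much its presence reduces uncertainty about the class label. This is a minimal sketch assuming binary token-presence features and two labels (consumer vs. professional). The toy questions and the helper names `entropy`, `information_gain`, and `top_k_features` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, feature):
    """IG(Y; X) = H(Y) - H(Y | X), where X is 'feature occurs in the question'."""
    with_f = [y for d, y in zip(docs, labels) if feature in d]
    without_f = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    h_cond = (len(with_f) / n) * entropy(with_f) + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - h_cond


def top_k_features(docs, labels, k):
    """Keep the k tokens with the highest information gain (ties broken alphabetically)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda f: (-information_gain(docs, labels, f), f))[:k]


# Hypothetical toy questions, each reduced to a set of lowercase tokens.
questions = [
    ("what dose of metformin for ckd stage 3", "professional"),
    ("differential for elevated alt in elderly patient", "professional"),
    ("my doctor says i have high sugar what should i eat", "consumer"),
    ("is it bad if my back hurts after sleeping", "consumer"),
]
docs = [set(text.split()) for text, _ in questions]
labels = [label for _, label in questions]

print(top_k_features(docs, labels, 2))  # ['for', 'my']
```

A token such as "what", which appears equally often in both classes, scores zero and would be pruned first. On a real corpus, k would be chosen by checking classifier performance on held-out data.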
Abstract In mobile health (M-health), Short Message Service (SMS) has been shown to improve disease-related self-management and health service outcomes, leading to enhanced patient care. However, the hard limit on the number of characters per message restricts the full value of SMS communication in health care practice. To overcome this problem and improve the efficiency of clinical workflow, we developed an innovative system, MedTxting (available at http://medtxting. askhermes.
Abstract There has been increasing interest recently in meeting understanding tasks, such as summarization, browsing, action item detection, and topic segmentation. However, there has been very limited effort to use rich recognition output (eg, recognition confidence measures or additional recognition candidates) for these downstream tasks. This paper presents an initial study using n-best recognition hypotheses for two tasks: extractive summarization and keyword extraction.
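One simple way to exploit n-best output is to weight each word by the normalized posterior of the hypotheses that contain it. A keyword misrecognized in the 1-best output can then still accumulate score. This is an assumption for illustration, not necessarily the paper's approach. The hypotheses, posteriors, stopword list, and the helpers `expected_term_counts` and `rank_keywords` are hypothetical.

```python
from collections import defaultdict


def expected_term_counts(nbest):
    """nbest: list of (hypothesis_text, posterior) pairs for one utterance.

    Returns each word's expected count: every occurrence contributes the
    normalized posterior of the hypothesis it appears in, rather than
    counting words from the 1-best hypothesis only.
    """
    total = sum(p for _, p in nbest)
    if total <= 0:
        raise ValueError("posteriors must sum to a positive value")
    counts = defaultdict(float)
    for text, p in nbest:
        for word in text.split():
            counts[word] += p / total
    return counts


def rank_keywords(utterances, stopwords, k):
    """Sum expected counts over all utterances and return the top-k non-stopwords."""
    scores = defaultdict(float)
    for nbest in utterances:
        for word, c in expected_term_counts(nbest).items():
            if word not in stopwords:
                scores[word] += c
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical n-best lists with posteriors for two meeting utterances.
u1 = [("the budget meeting", 0.6), ("the budget eating", 0.3), ("a budget meeting", 0.1)]
u2 = [("approve the budget", 0.7), ("approve the budgie", 0.3)]

print(rank_keywords([u1, u2], {"the", "a"}, 3))  # ['budget', 'approve', 'meeting']
```

In this toy example, "budget" gains score even from the second utterance's lower-ranked hypothesis. The misrecognitions "eating" and "budgie" keep only small weights.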
Abstract Significant research efforts have been devoted to speech summarization, including automatic approaches and evaluation metrics. However, fundamental questions remain open: what constitutes a summary of speech data, and whether humans agree with each other on it. This paper analyzes human-annotated extractive summaries from the ICSI meeting corpus, with the aim of examining their consistency and the factors that affect human agreement.
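Agreement on extractive summaries can be quantified by treating each annotator's summary as a selected/not-selected label per sentence. This sketch computes Cohen's kappa, one common chance-corrected agreement measure. The abstract does not say which agreement measure the paper used. The selection vectors and the `cohens_kappa` helper are hypothetical.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' binary labels (1 = sentence selected)."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    # Agreement expected by chance if each annotator selected sentences independently.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both annotators gave every sentence the same single label
    return (observed - expected) / (1 - expected)


# Hypothetical selections by two annotators over ten meeting sentences.
annotator_1 = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.524
```

Raw agreement here is 0.8, but most of it comes from jointly unselected sentences. Correcting for chance lowers the score noticeably, which is why kappa-style measures are often preferred for sparse extractive labels.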
Abstract Social media language contains a huge amount and wide variety of nonstandard tokens, created both intentionally and unintentionally by users. It is crucial to normalize these noisy nonstandard tokens before applying other NLP techniques. A major challenge in this task is system coverage, ie, for any user-created nonstandard term, the system should be able to restore the correct word within its top n output candidates.
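The coverage criterion above can be made concrete as a top-n accuracy. This counts the fraction of nonstandard tokens whose correct standard word appears among the system's first n ranked candidates. A minimal sketch; the candidate lists, gold mappings, and the `top_n_coverage` helper are hypothetical.

```python
def top_n_coverage(predictions, gold, n):
    """Fraction of gold nonstandard tokens whose correct word is in the top n.

    predictions: dict mapping a nonstandard token to its ranked candidate words.
    gold: dict mapping a nonstandard token to its correct standard word.
    A token with no candidates counts as a miss.
    """
    if not gold:
        return 0.0
    hits = sum(1 for tok, word in gold.items() if word in predictions.get(tok, [])[:n])
    return hits / len(gold)


# Hypothetical system output and gold normalizations.
predictions = {
    "tmrw": ["tomorrow", "tomorrows"],
    "u": ["you", "your"],
    "gr8": ["grate", "great", "grade"],
    "2nite": ["tonight"],
}
gold = {"tmrw": "tomorrow", "u": "you", "gr8": "great", "2nite": "tonight", "b4": "before"}

print(top_n_coverage(predictions, gold, 1))  # 0.6
print(top_n_coverage(predictions, gold, 2))  # 0.8
```

Raising n can only increase coverage. A token the system generates no candidates for (here "b4") stays a miss at every n, which is the coverage problem the abstract highlights.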
Abstract: This paper describes our participation in the 2008 TREC Blog track. Our system consists of 3 components: data preprocessing, topic retrieval, and opinion finding. In the topic retrieval task, we applied the Lemur IR toolkit and used various techniques for query expansion. In the opinion finding and polarization task, we employed a feature-based classification approach. Re-ranking was then performed using a linear combination of the opinionated score and the topic relevance score.
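The final re-ranking step combines two scores linearly. This sketch assumes each document has a topic relevance score from the retrieval step and an opinion score from the classifier. Because the two scores are on different scales, the sketch min-max normalizes them first; the abstract does not state this, so it is an assumption. The weight `alpha`, the document ids, the scores, and the helpers `min_max` and `rerank` are hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(results, alpha=0.5):
    """results: list of (doc_id, topic_relevance, opinion_score) tuples.

    Returns doc ids sorted by alpha * topic + (1 - alpha) * opinion,
    computed on normalized scores, highest first.
    """
    ids = [r[0] for r in results]
    topic = min_max([r[1] for r in results])
    opinion = min_max([r[2] for r in results])
    combined = {d: alpha * t + (1 - alpha) * o for d, t, o in zip(ids, topic, opinion)}
    return sorted(ids, key=lambda d: -combined[d])


# Hypothetical retrieval scores and opinion-classifier scores for three blog posts.
results = [("blog-A", 12.0, 0.10), ("blog-B", 10.0, 0.90), ("blog-C", 8.0, 0.50)]

print(rerank(results, alpha=0.5))  # ['blog-B', 'blog-A', 'blog-C']
```

Setting `alpha` to 1.0 reproduces the topic-only ranking, and 0.0 ranks purely by opinion score. In practice `alpha` would presumably be tuned on held-out topics.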
ABSTRACT Osteoarthritis (OA) means inflammation of the joints, with symptoms of joint pain, stiffness, and swelling. It is a degenerative disease that appears to be caused by both biomechanical and biochemical factors. Intra-articular (IA) injection is one of the main treatments for OA because of its positive effect in reducing joint pain and increasing joint mobility.
Abstract Under a statistical learning framework, the paper focuses on how to use traditional linguistic findings on anaphora resolution as a guide for mining and organizing contextual features for Chinese coreference resolution.
Papers by Feifan Liu
Clinical questions are often long and complex and take many forms. We have built a clinical question answering system named AskHERMES to perform robust semantic analysis on complex clinical questions and output question-focused extractive summaries as answers.
DESIGN:
This paper describes the system architecture and a preliminary evaluation of AskHERMES, which implements innovative approaches in question analysis, summarization, and answer presentation. Five types of resources were indexed in this system: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines and Wikipedia articles.
MEASUREMENT:
We compared the AskHERMES system with Google (Google and Google Scholar) and UpToDate and asked physicians to score the three systems by ease of use, quality of answer, time spent, and overall performance.
RESULTS:
AskHERMES allows physicians to enter a question in a natural way with minimal query formulation and allows physicians to efficiently navigate among all the answer sentences to quickly meet their information needs. In contrast, physicians need to formulate queries to search for information in Google and UpToDate. The development of the AskHERMES system is still at an early stage, and the knowledge resource is limited compared with Google or UpToDate. Nevertheless, the evaluation results show that AskHERMES' performance is comparable to the other systems. In particular, when answering complex clinical questions, it demonstrates the potential to outperform both Google and UpToDate systems.
CONCLUSIONS:
AskHERMES, available at http://www.AskHERMES.org, has the potential to help physicians practice evidence-based medicine and improve the quality of patient care.
Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of both groups are different in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers.
Design
We obtained two sets of consumer questions (~10,000 questions in total) from Yahoo answers. The professional questions consist of two question collections: 4654 point-of-care questions (denoted as PointCare) obtained from interviews of a group of family doctors following patient visits and 5378 questions from physician practices through professional online services (denoted as OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification between consumer questions and professional questions. To evaluate the robustness of our models, we tested the model that was trained on the Consumer-PointCare dataset on the Consumer-OnlinePractice dataset. We evaluated both linguistic features and statistical features and examined how the characteristics in two different types of professional questions (PointCare vs. OnlinePractice) may affect the classification performance. We explored information gain for feature reduction and the back-off linguistic category features.
Results
Ten-fold cross-validation yielded best F1-measures of 0.936 on Consumer-PointCare and 0.946 on Consumer-OnlinePractice. Testing the Consumer-PointCare model on the Consumer-OnlinePractice dataset yielded a best F1-measure of 0.891.
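The abstract does not say how the reported F1 was aggregated across folds. It could be averaged per fold, pooled over all predictions, or macro-averaged over both classes. The sketch below makes one assumption: it averages per-fold F1 for a single positive class. `train_fn` and `predict_fn` are hypothetical caller-supplied callables standing in for whatever classifier is used. The cross-dataset test reported above corresponds to training on one dataset and calling `predict_fn` on another, with no folding.

```python
import random


def f1_score(y_true, y_pred, positive):
    """F1 = 2PR / (P + R), with precision P and recall R for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal, disjoint folds."""
    if n < k:
        raise ValueError("need at least k examples so that no fold is empty")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(train_fn, predict_fn, X, y, positive, k=10):
    """Mean per-fold F1: train on k-1 folds, evaluate on the held-out fold."""
    scores = []
    for fold in k_fold_indices(len(y), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict_fn(model, [X[i] for i in fold])
        scores.append(f1_score([y[i] for i in fold], preds, positive))
    return sum(scores) / len(scores)
```

For example, a trivial majority-class baseline can be plugged in as `train_fn=lambda X, y: max(set(y), key=y.count)` and `predict_fn=lambda m, X: [m] * len(X)`. Such a baseline gives a floor against which a real feature-based classifier can be compared.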
Conclusion
Healthcare consumer questions posted to Yahoo online communities can be reliably distinguished from professional questions posed by point-of-care clinicians and online physicians. Supervised machine-learning models are robust for this task. Our study will significantly benefit further development of automated consumer question answering.