BACKGROUND Scientific hypothesis generation is a critical step in scientific research that determines the direction and impact of any investigation. Despite its vital role, we have limited knowledge of the process itself, thus hindering our ability to address some critical questions. OBJECTIVE This study aims to answer the following questions: To what extent can secondary data analytics tools facilitate the generation of scientific hypotheses during clinical research? Are the processes similar in developing clinical diagnoses during clinical practice and developing scientific hypotheses for clinical research projects? Furthermore, this study explores the process of scientific hypothesis generation in the context of clinical research. It was designed to compare the role of VIADS, a visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies, and the experience levels of study participants during the scientific hypothesis generation process. METHODS This manuscript introduces a study design. Experienced and inexperienced clinical researchers have been recruited since July 2021 to take part in this 2×2 factorial study, in which all participants use the same data sets during scientific hypothesis–generation sessions and follow predetermined scripts. The clinical researchers are separated into experienced or inexperienced groups based on predetermined criteria and are then randomly assigned into groups that use and do not use VIADS via block randomization. The study sessions, screen activities, and audio recordings of participants are captured. Participants use the think-aloud protocol during the study sessions. After each study session, every participant is given a follow-up survey, with participants using VIADS completing an additional modified System Usability Scale survey. 
A panel of clinical research experts will assess the scientific hypotheses generated by participants based on predeveloped metrics. All data will be anonymized, transcribed, aggregated, and analyzed. RESULTS Data collection for this study began in July 2021. Recruitment uses a brief online survey. The preliminary results showed that study participants can generate a few to over a dozen scientific hypotheses during a 2-hour study session, regardless of whether they used VIADS or other analytics tools. A metric to more accurately, comprehensively, and consistently assess scientific hypotheses within a clinical research context has been developed. CONCLUSIONS The scientific hypothesis–generation process is an advanced cognitive activity and a complex process. Our results so far show that clinical researchers can quickly generate initial scientific hypotheses based on data sets and prior experience. However, refining these scientific hypotheses is a much more time-consuming activity. To uncover the fundamental mechanisms underlying the generation of scientific hypotheses, we need breakthroughs that can capture thinking processes more precisely. INTERNATIONAL REGISTERED REPORT DERR1-10.2196/39414
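The assignment step described in the abstract above (stratify by experience, then block-randomize to VIADS or non-VIADS) can be illustrated with a minimal sketch. It is not the study's actual procedure: the block size, arm labels, seed, and the function names `block_randomize` and `assign_factorial` are all assumptions made for illustration.

```python
import random


def block_randomize(participant_ids, arms=("VIADS", "control"), block_size=4, seed=0):
    """Assign participants to arms in shuffled blocks so arm sizes stay balanced.

    block_size, arms, and seed are illustrative assumptions, not study values.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    assignments = {}
    block = []
    for pid in participant_ids:
        if not block:
            # Start a new block containing each arm equally often, in random order.
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
        assignments[pid] = block.pop()
    return assignments


def assign_factorial(participants, **kwargs):
    """Randomize separately within each experience stratum (2x2 design).

    participants: list of (participant_id, experience_level) tuples.
    Returns {participant_id: (experience_level, arm)}.
    """
    result = {}
    levels = sorted({level for _, level in participants})
    for level in levels:
        ids = [pid for pid, lvl in participants if lvl == level]
        for pid, arm in block_randomize(ids, **kwargs).items():
            result[pid] = (level, arm)
    return result
```

Randomizing within each experience stratum keeps the VIADS and non-VIADS arms balanced for experienced and inexperienced researchers alike, which is what a 2×2 factorial comparison needs.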
BACKGROUND Visualization can be a powerful tool to comprehend data sets, especially when they can be represented via hierarchical structures. Enhanced comprehension can facilitate the development of scientific hypotheses. However, the inclusion of excessive data can make visualizations overwhelming. OBJECTIVE We developed a visual interactive analytic tool for filtering and summarizing large health data sets coded with hierarchical terminologies (VIADS). In this study, we evaluated the usability of VIADS for visualizing data sets of patient diagnoses and procedures coded in the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). METHODS We used mixed methods in the study. A group of 12 clinical researchers participated in the generation of data-driven hypotheses using the same data sets and time frame (a 1-hour training session and a 2-hour study session) utilizing VIADS via the think-aloud protocol. The audio and screen activities were recorded remotely. A modified version of the System Usability Scale (SUS) survey and a brief survey with open-ended questions were administered after the study to assess the usability of VIADS and to confirm participants' intensive usage experience with VIADS. RESULTS The range of SUS scores was 37.5 to 87.5. The mean SUS score for VIADS was 71.88 (out of a possible 100, SD 14.62), and the median SUS was 75. The participants unanimously agreed that VIADS offers new perspectives on data sets (12/12, 100%), while 75% (8/12) agreed that VIADS facilitates understanding, presentation, and interpretation of underlying data sets. The comments on the utility of VIADS were positive and aligned well with the design objectives of VIADS. 
The answers to the open-ended questions in the modified SUS provided specific suggestions regarding potential improvements for VIADS, and the identified problems with usability were used to update the tool. CONCLUSIONS This usability study demonstrates that VIADS is a usable tool for analyzing secondary data sets with good average usability, good SUS score, and favorable utility. Currently, VIADS accepts data sets with hierarchical codes and their corresponding frequencies. Consequently, only specific types of use cases are supported by the analytical results. Participants agreed, however, that VIADS provides new perspectives on data sets and is relatively easy to use. The VIADS functionalities most appreciated by participants were the ability to filter, summarize, compare, and visualize data. INTERNATIONAL REGISTERED REPORT RR2-10.2196/39414
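For readers unfamiliar with how SUS scores on a 0 to 100 scale arise, the sketch below applies the standard SUS scoring rule (10 items rated 1 to 5; odd items contribute the rating minus 1, even items contribute 5 minus the rating; the sum is multiplied by 2.5). The study used a modified SUS, so its exact scoring may differ; the function names and any input data are hypothetical.

```python
import statistics


def sus_score(responses):
    """Standard SUS score for one participant.

    responses: 10 ratings on a 1-5 scale, in questionnaire order.
    Assumes the unmodified SUS item polarity (odd items positive, even negative).
    """
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 item responses")
    total = 0
    for index, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
        # Odd-numbered items are positively worded, even-numbered negatively.
        total += (rating - 1) if index % 2 == 1 else (5 - rating)
    return total * 2.5


def summarize(scores):
    """Return mean, median, and sample standard deviation of SUS scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "sd": statistics.stdev(scores),
    }
```

Under this rule a participant who answers every item with the neutral rating 3 scores 50, which is why reported means are usually read relative to the midpoint rather than as percentages.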
Background MetaMap is a valuable tool for processing biomedical texts to identify concepts. Although MetaMap is highly configurable, configuration decisions are not straightforward. Objective To develop a systematic, data-driven methodology for configuring MetaMap for optimal performance. Methods MetaMap, the word2vec model, and the phrase model were used to build a pipeline. For unsupervised training, the phrase and word2vec models used abstracts related to clinical decision support as input. During testing, MetaMap was configured with the default option, one behavior option, and two behavior options. For each configuration, cosine and soft cosine similarity scores between identified entities and gold-standard terms were computed for 40 annotated abstracts (422 sentences). The similarity scores were used to calculate and compare the overall percentages of exact matches, similar matches, and missing gold-standard terms among the abstracts for each configuration. The results were ma...
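The two similarity measures named in the abstract above can be sketched in plain Python. This is only an illustration of the general definitions, not the paper's pipeline: the vectors and the term-similarity matrix `sim` are placeholders for whatever the word2vec and phrase models would supply, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def soft_cosine(a, b, sim):
    """Soft cosine similarity.

    sim[i][j] is an assumed similarity between features i and j
    (for example, derived from word embeddings); with the identity
    matrix this reduces to ordinary cosine similarity.
    """
    def bilinear(u, v):
        return sum(
            sim[i][j] * u[i] * v[j]
            for i in range(len(u))
            for j in range(len(v))
        )

    denom = math.sqrt(bilinear(a, a)) * math.sqrt(bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0
```

The difference matters for near-synonyms: two terms with no shared tokens have cosine similarity 0 on bag-of-words vectors, while soft cosine can still credit them if `sim` marks their features as related.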
Background Disease status (eg, cancer stage) has been used in routine clinical practice to determine more accurate treatment plans. Health-related indicators, such as mortality, morbidity, and population group life expectancy, have also been used. However, few studies have specifically focused on the comprehensive and objective measures of individual health status. Objective The aim of this study was to analyze the perspectives of the public toward 29 health indicators obtained from a literature review to provide evidence for further prioritization of the indicators. The difference between health status and disease status should be considered. Methods This study used a cross-sectional design. Online surveys were administered through Ohio University, ResearchMatch, and Clemson University, resulting in three samples. Participants aged 18 years or older rated the importance of the 29 health indicators. The rating results were aggregated and analyzed as follows (in each case, the depend...
Objective We examined the perspectives of the general public on 29 health indicators to provide evidence for further prioritizing the indicators, which were obtained from the literature review. Health status is different from disease status, which can refer to different stages of cancer. Design This study uses a cross-sectional design. Setting An online survey was administered through Ohio University, ResearchMatch, and Clemson University. Participants Participants included the general public who are 18 years or older. A total of 1153 valid responses were included in the analysis. Primary outcome measures Participants rated the importance of the 29 health indicators. The data were aggregated, cleaned, and analyzed in three ways: (1) to determine the agreement among the three samples on the importance of each indicator (IV = the three samples, DV = individual survey responses); (2) to examine the mean differences between the retained indicators with agreement across the three samples (IV = t...
BACKGROUND The Unified Medical Language System (UMLS) has been a critical tool in biomedical and health informatics, and the year 2020 marks the 30th anniversary of UMLS. Despite its longevity, there is no systematic review on UMLS, in general. Thus, this systematic review was conducted to provide an overview of UMLS and its usage in English-language publications in the last 30 years. OBJECTIVE The objective is twofold: to provide a comprehensive and systematic picture of the themes, their subtopics, and the publications under each category and to document systematic evidence of UMLS and how it has been used in English-language publications in the last 30 years. METHODS PubMed, ACM Digital Library, and Nursing & Allied Health Database were used to search for literature. The primary literature search strategy was as follows: UMLS was used as a MeSH term or a keyword or appeared in the title or abstract. Only English-language publications were consider...
Clinicians' patient care information needs are frequent and largely unmet. Online knowledge resources are available that can help clinicians meet these information needs. Yet, significant barriers limit the use of these resources within the clinical workflow. Infobuttons are clinical decision support tools that use the clinical context (e.g., institution, user, patient) within electronic health record (EHR) systems to anticipate clinicians' questions and provide automated links to relevant information in knowledge resources. This paper describes OpenInfobutton (www.openinfobutton.org): a standards-based, open source Web service that was designed to disseminate infobutton capabilities in multiple EHR systems and healthcare organizations. OpenInfobutton has been successfully integrated with 38 knowledge resources at 5 large healthcare organizations in the United States. We describe the OpenInfobutton architecture, knowledge resource integration, and experiences at five large h...
Context-aware links between electronic health records (EHRs) and online knowledge resources, commonly called "infobuttons," are being used increasingly as part of EHR "meaningful use" requirements. While an HL7 standard exists for specifying how the links should be constructed, there is no guidance on what links to construct. Collectively, the authors manage four infobutton systems that serve 16 institutions. The purpose of this paper is to publish our experience with linking various resources and specifying particular criteria that can be used by infobutton managers to select resources that are most relevant for a given situation. This experience can be used directly by those wishing to customize their own EHRs, for example by using the OpenInfobutton infobutton manager and its configuration tool, the Librarian Infobutton Tailoring Environment.
Infobuttons are clinical decision support tools that use information about the clinical context (institution, user, patient) in which an information need arises to provide direct access to relevant information from knowledge resources. Two freely available resources make infobutton implementation possible for virtually any EHR system. OpenInfobutton is an HL7-compliant system that accepts context parameters from an EHR and, using its knowledge base of resources and information needs, generates a set of links that direct the user to relevant information. The Librarian Infobutton Tailoring Environment (LITE) is a second system that allows institutional librarians to specify which resources should be selected in a given context by OpenInfobutton. This paper describes the steps needed to use LITE to customize OpenInfobutton and to integrate OpenInfobutton into an EHR.
To explore new graphical methods for reducing and analyzing large data sets in which the data are coded with a hierarchical terminology. We use a hierarchical terminology to organize a data set and display it in a graph. We reduce the size and complexity of the data set by considering the terminological structure and the data set itself (using a variety of thresholds) as well as contributions of child level nodes to parent level nodes. We found that our methods can reduce large data sets to manageable size and highlight the differences among graphs. The thresholds used as filters to reduce the data set can be used alone or in combination. We applied our methods to two data sets containing information about how nurses and physicians query online knowledge resources. The reduced graphs make the differences between the two groups readily apparent. This is a new approach to reduce size and complexity of large data sets and to simplify visualization. This approach can be applied to any d...
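The idea of combining a count threshold with a child-to-parent contribution threshold can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the data layout (a `counts` dictionary plus a `parent` map), the parameter names, and the function `filter_hierarchy` are all hypothetical.

```python
def filter_hierarchy(counts, parent, min_count=0, min_child_share=0.0):
    """Return the set of nodes that pass both thresholds.

    counts: {node: frequency of the code in the data set}
    parent: {node: parent node}; root nodes are absent from this map
    min_count: drop nodes whose frequency is below this value
    min_child_share: drop nodes contributing less than this fraction
        of their parent's frequency
    Either threshold can be used alone by leaving the other at its default.
    """
    kept = set()
    for node, count in counts.items():
        if count < min_count:
            continue
        parent_node = parent.get(node)
        if parent_node is not None and counts.get(parent_node, 0) > 0:
            # Contribution of this child to its parent's total.
            if count / counts[parent_node] < min_child_share:
                continue
        kept.add(node)
    return kept
```

This sketch judges each node independently, so a kept node can end up with a dropped parent; a fuller version would decide whether to reattach such nodes to the nearest kept ancestor before drawing the graph.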
Summary Objectives: Graphical displays can make data more understandable; however, large graphs can challenge human comprehension. We have previously described a filtering method to provide high-level summary views of large data sets. In this paper we demonstrate our method for setting and selecting thresholds to limit graph size while retaining important information by applying it to large single and paired data sets, taken from patient and bibliographic databases. Methods: Four case studies are used to illustrate our method. The data are either patient discharge diagnoses (coded using the International Classification of Diseases, Clinical Modifications [ICD9-CM]) or Medline citations (coded using the Medical Subject Headings [MeSH]). We use combinations of different thresholds to obtain filtered graphs for detailed analysis. The thresholds setting and selection, such as thresholds for node counts, class counts, ratio values, p values (for diff data sets), and percentiles of selected...
Papers by Xia Jing