Bioinformatics approaches that examine gene-gene models provide a means to discover interactions between multiple genes that underlie complex disease. Extensive computational demands and the need to adjust for multiple testing make uncovering genetic interactions a challenge. Here, we address these issues using our knowledge-driven filtering method, Biofilter, to identify putative single nucleotide polymorphism (SNP) interaction models for cataract susceptibility, thereby reducing the number of models for analysis. Models were evaluated using logistic regression in 3,377 European Americans (1,185 controls, 2,192 cases) from the Marshfield Clinic, a study site of the Electronic Medical Records and Genomics (eMERGE) Network. All statistically significant models from the Marshfield Clinic were then evaluated in an independent dataset of 4,311 individuals (742 controls, 3,569 cases), using independent samples from additional study sites in the eMERGE Network: Mayo Clinic, Group Health/University ...
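To show why exhaustive interaction testing is costly, here is a minimal sketch (not Biofilter itself) that counts the two-SNP models an unfiltered scan would test and the Bonferroni-corrected per-test threshold that follows. The panel sizes are hypothetical, and Bonferroni is assumed here only as the simplest familywise correction.

```python
from math import comb


def pairwise_model_count(n_snps: int) -> int:
    """Number of distinct two-SNP interaction models among n_snps SNPs."""
    return comb(n_snps, 2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level keeping the familywise error rate at alpha."""
    return alpha / n_tests


# Hypothetical panel sizes, for illustration only.
for n in (1_000, 100_000):
    m = pairwise_model_count(n)
    print(n, m, bonferroni_threshold(0.05, m))
```

Because the number of pairs grows roughly as n²/2, the per-test threshold shrinks quadratically; filtering to a knowledge-supported subset of models shrinks both.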
The goal of this study was to compare the value of mammographic features and genetic variants for breast cancer risk prediction using Bayesian reasoning and information theory. We conducted a retrospective case-control study, collecting mammographic findings and high-frequency/low-penetrance genetic variants from an existing personalized medicine data repository. We trained and tested Bayesian networks for mammographic findings and genetic variants, respectively. We found that mammographic findings had a higher discriminative ability than genetic variants for improving breast cancer risk prediction, as measured by the area under the ROC curve. We compared the value of each mammographic feature and genetic variant for breast cancer risk prediction in terms of mutual information, with and without consideration of interactions among those risk factors. We also identified the interactions between mammographic features and genetic variants in an attempt to prioritize mammographic features and genetic va...
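The abstract ranks risk factors by mutual information. A minimal sketch, assuming discrete (categorical) features and a binary case/control label, of the plug-in estimate I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))]; it does not reproduce the interaction-aware variant the study also considers, and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical (plug-in) mutual information I(X;Y), in bits, between two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Only observed (x, y) pairs contribute; unobserved pairs have p(x,y) = 0.
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical toy data: a binary mammographic finding and case status (1 = case).
finding = [1, 1, 0, 0, 1, 0]
status = [1, 1, 0, 0, 0, 0]
print(mutual_information(finding, status))
```

A feature that perfectly tracks a balanced binary label carries 1 bit; an independent feature carries 0.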
To assess the level of awareness of eye diseases in the urban population of Hyderabad in southern India. A total of 2522 subjects of all ages, representative of the Hyderabad population, participated in the population-based Andhra Pradesh Eye Disease Study. Of these, 1859 subjects aged >15 years responded to a structured questionnaire on cataract, glaucoma, night blindness and diabetic retinopathy, administered by trained field investigators. Having heard of the eye disease in question was defined as "awareness" and having some understanding of the eye disease was defined as "knowledge". Awareness of cataract (69.8%) and night blindness (60.0%) was moderate, awareness of diabetic retinopathy (27.0%) was low, and awareness of glaucoma (2.3%) was very poor. Knowledge of all the eye diseases assessed was poor. Subjects aged ≥30 years were significantly more aware of all eye diseases assessed except night blindness. Multivariate analysis revealed that women were s...
To assess the prevalence, distribution, and demographic associations of refractive error in an urban population in southern India. Two thousand five hundred twenty-two subjects of all ages, representative of the Hyderabad population, were examined in the population-based Andhra Pradesh Eye Disease Study. Objective and subjective refraction were attempted on subjects >15 years of age with presenting distance and/or near visual acuity worse than 20/20 in either eye. Refraction under cycloplegia was attempted on all subjects ≤15 years of age. A spherical equivalent >0.50 D in the worse eye was considered refractive error. Data on objective refraction under cycloplegia were analyzed for subjects ≤15 years of age, and data on subjective refraction for subjects >15 years of age. Data on refractive error were available for 2,321 (92.0%) subjects. In subjects ≤15 years of age, the age-gender-adjusted prevalence of myopia was 4.44% (95% confidence interval [CI],...
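The abstract reports age-gender-adjusted prevalence but does not state the adjustment method or the standard population. A minimal sketch, assuming direct standardization: each stratum's crude prevalence is weighted by that stratum's share of a reference population. The strata below are hypothetical, not study data, and confidence intervals are not computed.

```python
def directly_standardized_prevalence(strata):
    """Directly standardized prevalence.

    strata: iterable of (cases, examined, reference_count) tuples, one per
    age-gender stratum; reference_count is that stratum's size in the chosen
    standard (reference) population.
    """
    strata = list(strata)
    total_ref = sum(ref for _, _, ref in strata)
    if total_ref <= 0:
        raise ValueError("reference population must be positive")
    # Weight each stratum's crude prevalence by its share of the reference population.
    return sum((cases / examined) * (ref / total_ref)
               for cases, examined, ref in strata)


# Hypothetical strata: two age-gender groups with different crude prevalences.
print(directly_standardized_prevalence([(5, 100, 300), (10, 100, 100)]))
```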
AMIA Joint Summits on Translational Science proceedings AMIA Summit on Translational Science, 2014
Recent large-scale genome-wide association studies (GWAS) have identified a number of new genetic variants associated with breast cancer. However, the degree to which these genetic variants improve breast cancer diagnosis in concert with mammography remains unknown. We conducted a case-control study and collected mammography features and 77 genetic variants that reflect state-of-the-art GWAS findings on breast cancer. A naïve Bayes model was developed on the mammography features and these genetic variants. We observed that incorporating the genetic variants significantly improved breast cancer diagnosis based on mammographic findings.
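As an illustration of the model class named above, here is a minimal categorical naïve Bayes sketch, assuming discretized mammography descriptors and genotypes coded as risk-allele counts, with add-one (Laplace) smoothing. It is not the authors' implementation; the class name and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import exp, log


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # counts[j][c][v]: number of training rows of class c whose feature j equals v
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        # values[j]: distinct values of feature j seen in training (used for smoothing)
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[j][c][v] += 1
                self.values[j].add(v)
        return self

    def log_scores(self, row):
        """Unnormalized log P(c) + sum_j log P(x_j | c) for each class c."""
        scores = {}
        for c in self.classes:
            s = log(self.class_counts[c] / self.n)
            for j, v in enumerate(row):
                k = len(self.values[j])
                s += log((self.counts[j][c][v] + 1) / (self.class_counts[c] + k))
            scores[c] = s
        return scores

    def predict_proba(self, row):
        """Posterior class probabilities, normalized with a log-sum-exp shift."""
        scores = self.log_scores(row)
        top = max(scores.values())
        unnorm = {c: exp(s - top) for c, s in scores.items()}
        z = sum(unnorm.values())
        return {c: u / z for c, u in unnorm.items()}


# Hypothetical toy data: (mass present, risk-allele count) -> case (1) / control (0)
rows = [(1, 2), (1, 1), (0, 0), (0, 1), (1, 0), (0, 0)]
labels = [1, 1, 0, 0, 1, 0]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict_proba((1, 2)))
```

The "naïve" assumption is that features are conditionally independent given case status, which is why per-feature log-likelihoods simply add.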
Herpes zoster, commonly referred to as shingles, is caused by the varicella zoster virus (VZV). VZV initially manifests as chickenpox, most commonly in childhood, can remain asymptomatically latent in nerve tissues for many years, and often re-emerges as shingles. Although reactivation may be related to immune suppression, aging and female sex, most inter-individual variability in re-emergence risk has not been explained to date. We performed genome-wide association analyses in 22,981 participants (2,280 shingles cases) from the Electronic Medical Records and Genomics (eMERGE) Network. Using Cox survival and logistic regression, we identified a genomic region in the combined and European ancestry groups that has an age-of-onset effect reaching genome-wide significance (P < 1.0 × 10(-8)). This region tags the non-coding gene HCP5 (HLA Complex P5) in the major histocompatibility complex. This gene is an endogenous retrovirus and likely influences viral activity through regulatory functions. Variants in this genetic region are known to be associated with delayed development of AIDS in people infected with HIV. Our study further suggests that this region may have a critical role in viral suppression and could potentially harbor a clinically actionable variant for the shingles vaccine.
The journal of allergy and clinical immunology. In practice
The incidence of angiotensin-converting enzyme (ACE) inhibitor-associated angioedema is increased in patients with seasonal allergies. We tested the hypothesis that patients with ACE inhibitor-associated angioedema present during months when pollen counts are increased. Cohort analysis examined the month of presentation of ACE inhibitor-associated angioedema and pollen counts in the ambulatory and hospital settings. Patients with ACE inhibitor-associated angioedema were ascertained through (1) an observational study of patients presenting to Vanderbilt University Medical Center, (2) patients presenting to the Marshfield Clinic and participating in the Marshfield Clinic Personalized Medicine Research Project, and (3) patients enrolled in the Ongoing Telmisartan Alone and in Combination with Ramipril Global Endpoint Trial (ONTARGET). Measurements included date of presentation of ACE inhibitor-associated angioedema, population exposure to ACE inhibitor by date, and local pollen counts by...
AIM: To assess the prevalence of active and inactive uveitis unrelated to previous surgery or trauma in an urban population in southern India. METHODS: As part of the Andhra Pradesh Eye Disease Study, 2522 subjects (85.4% of those eligible), a sample representative of the population of Hyderabad city in southern India, underwent interview and detailed dilated eye examination. Presence of sequelae of uveitis
Cataract is the leading cause of blindness in the world, and in the United States it accounts for approximately 60% of Medicare costs related to vision. The purpose of this study was to identify genetic markers for age-related cataract through a genome-wide association study (GWAS). In the Electronic Medical Records and Genomics (eMERGE) Network, we ran an electronic phenotyping algorithm on individuals at each of five sites with electronic medical records linked to DNA biobanks. We performed a GWAS using 530,101 SNPs from the Illumina 660W-Quad in a total of 7,397 individuals (5,503 cases and 1,894 controls). We also performed an age-at-diagnosis case-only analysis. We identified several statistically significant associations with age-related cataract (45 SNPs) as well as with age at diagnosis (44 SNPs). The 45 SNPs associated with cataract at p < 1 × 10(-5) lie in several genes of interest, including ALDOB, MAP3K1, and MEF2C, all of which have potential biologic relationships with cataract. This i...
Uncertainty in artificial intelligence : proceedings of the ... conference. Conference on Uncertainty in Artificial Intelligence, 2012
Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests remains a challenging and important problem in statistics. With recent advances in graphical models, it is now feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure based on a Markov-random-field-coupled mixture model. The ground truth of the hypotheses is represented by a latent binary Markov random field, and the observed test statistics appear as the coupled mixture variables. The parameters of our model can be learned automatically by a novel EM algorithm. We use an MCMC algorithm to infer the posterior probability that each hypothesis is null (termed the local index of significance), and the false discovery rate can be controlled accordingly. Simulations show that our procedure can substantially improve the numerical performance of multiple testing. We apply the procedure to a real-world genome-wide...
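The abstract says the false discovery rate is controlled using the local index of significance (LIS) but does not spell out the rule. A minimal sketch, assuming the step-up rule of Sun and Cai (2009) and assuming the LIS values have already been inferred (for example by MCMC): sort LIS ascending and reject the largest set whose running mean LIS stays at or below the target level. The function name and inputs are hypothetical.

```python
def lis_fdr_rejections(lis, alpha=0.05):
    """Return indices of hypotheses rejected at nominal FDR level alpha.

    lis: posterior probabilities that each hypothesis is null (local index
    of significance). The mean LIS of a rejection set estimates its FDR, so
    we keep the largest k whose k smallest LIS values average <= alpha.
    """
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    running_sum = 0.0
    k = 0
    for rank, i in enumerate(order, start=1):
        running_sum += lis[i]
        if running_sum / rank <= alpha:
            k = rank
    return sorted(order[:k])


# Hypothetical LIS values for five hypotheses.
print(lis_fdr_rejections([0.01, 0.9, 0.02, 0.3, 0.04], alpha=0.05))
```

Unlike p-value thresholds applied test by test, the LIS already reflects information borrowed from neighboring hypotheses through the latent Markov random field, which is where the dependence is exploited.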
Phenome-wide association studies (PheWAS) have demonstrated utility in validating genetic associations derived from traditional genetic studies as well as in identifying novel genetic associations. Here we used an electronic health record (EHR)-based PheWAS to explore pleiotropy of genetic variants in the fat mass and obesity-associated gene (FTO), some of which have previously been associated with obesity and type 2 diabetes (T2D). We used a population of 10,487 individuals of European ancestry with genome-wide genotyping from the Electronic Medical Records and Genomics (eMERGE) Network and another population of 13,711 individuals of European ancestry from the BioVU DNA biobank at Vanderbilt, genotyped using the Illumina HumanExome BeadChip. A meta-analysis of the two study populations replicated the well-described associations between FTO variants and obesity (odds ratio [OR] = 1.25, 95% confidence interval [CI] = 1.11-1.24, p = 2.10 × 10(-9)) and between FTO variants and T2D (OR = 1.14, 95% CI = 1.08-1.21, p = 2.34 × 10(-6)). The meta-analysis also demonstrated that FTO variant rs8050136 was significantly associated with sleep apnea (OR = 1.14, 95% CI = 1.07-1.22, p = 3.33 × 10(-5)); however, the association was attenuated after adjustment for body mass index (BMI). Novel phenotype associations with obesity-associated FTO variants included fibrocystic breast disease (rs9941349, OR = 0.81, 95% CI = 0.74-0.91, p = 5.41 × 10(-5)) and trends toward associations with non-alcoholic liver disease and gram-positive bacterial infections. FTO variants not associated with obesity showed other potential disease associations, including non-inflammatory disorders of the cervix and chronic periodontitis. These results suggest that genetic variants in FTO may have pleiotropic associations, some of which are not mediated by obesity.
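The abstract pools two study populations but does not name the meta-analysis model. A minimal sketch, assuming fixed-effect inverse-variance pooling of log odds ratios, with each study's standard error recovered from its 95% CI as SE = (ln U - ln L) / (2 × 1.96). The inputs in the usage line are hypothetical, not the study's per-cohort estimates.

```python
from math import exp, log, sqrt

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def se_from_ci(lower, upper):
    """Standard error of log(OR) from a 95% CI that is symmetric on the log scale."""
    return (log(upper) - log(lower)) / (2 * Z95)


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of odds ratios.

    estimates: iterable of (odds_ratio, ci_lower, ci_upper), one per study.
    Returns (pooled_or, ci_lower, ci_upper).
    """
    total_weight, weighted_sum = 0.0, 0.0
    for odds_ratio, lo, hi in estimates:
        w = 1.0 / se_from_ci(lo, hi) ** 2  # weight = 1 / variance of log(OR)
        total_weight += w
        weighted_sum += w * log(odds_ratio)
    pooled = weighted_sum / total_weight
    se = sqrt(1.0 / total_weight)
    return exp(pooled), exp(pooled - Z95 * se), exp(pooled + Z95 * se)


# Hypothetical per-cohort estimates (OR, lower, upper).
print(fixed_effect_meta([(1.20, 1.05, 1.37), (1.15, 1.02, 1.30)]))
```

A random-effects model would add a between-study variance term to each weight; with only two cohorts that term is poorly estimated, which is one reason fixed-effect pooling is a common default.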
Several recent genome-wide association studies have identified genetic variants associated with breast cancer. However, how much these genetic variants may advance breast cancer risk prediction based on other clinical features, such as mammographic findings, is unknown. We conducted a retrospective case-control study, collecting mammographic findings and high-frequency/low-penetrance genetic variants from an existing personalized medicine data repository. A Bayesian network was developed using Tree Augmented Naive Bayes (TAN) by training on the mammographic findings, with and without the 22 genetic variants collected. We analyzed predictive performance using the area under the ROC curve and found that the genetic variants significantly improved breast cancer risk prediction on mammograms. We also identified the interaction effect between the genetic variants and collected mammographic findings in an attempt to link genotype to mammographic phenotype to better understand disea...
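Performance here is summarized by the area under the ROC curve. A minimal sketch of the AUC as the Mann-Whitney probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The scores are hypothetical; comparing models with and without the genetic variants would mean computing this for both score sets on the same subjects, and a formal paired test (such as DeLong's) is not shown.

```python
def roc_auc(scores, labels):
    """AUC = P(score of a random case > score of a random control); ties count 1/2.

    labels: 1 for case, 0 for control. O(n_cases * n_controls), fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks from two models on the same four subjects.
labels = [1, 1, 0, 0]
print(roc_auc([0.6, 0.4, 0.5, 0.2], labels))  # e.g. mammographic findings only
print(roc_auc([0.7, 0.6, 0.5, 0.2], labels))  # e.g. findings plus genetic variants
```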
Papers by Catherine McCarty