DOI: 10.1145/2975167.2985683
research-article

Statistical and Network Analysis of Metabolomics Data

Published: 02 October 2016

Abstract

Metabolomics encompasses the analysis of metabolites using profiling techniques such as mass spectrometry (MS) and nuclear magnetic resonance (NMR). Statistical analysis is performed on the profiled data to determine variations in metabolite levels. The goal is to reveal relationships between variations in metabolite concentrations and specific pathophysiological conditions, such as diseases, or external factors. Metabolomics has been widely used to characterize metabolites in body fluids such as saliva, serum and urine across many fields of medical research, including cancer [3], cardiology [6], diabetes [5], human infections [12], neurology [7], neonatology [4] and respiratory diseases [2].
The statistical methods used to analyze metabolomics data can be categorized as univariate or multivariate. Univariate methods are very commonly applied because they are easy to use and interpret. These methods consider metabolomic features (variables) one at a time, independently of each other, and thus ignore correlations with other features. Moreover, as pointed out by Alonso et al. [1], these methods ignore confounding variables such as age, gender and body mass index (BMI), which may lead to incorrect results [13, 15]. Multivariate methods, on the other hand, consider all the features and their correlations during data analysis. They include unsupervised methods such as principal component analysis (PCA) and supervised methods such as partial least squares (PLS) and support vector machines (SVM). Alonso et al. [1] have provided a review of the univariate and multivariate methods used in metabolomics. To the best of our knowledge, many state-of-the-art statistical methods have not yet been used for metabolomic data analysis. A significant advantage of these methods over commonly used ones is their ability to process high-dimensional data. Along with state-of-the-art statistical methods, we have used differential network analysis to identify variations at the system level.
In this work we have analyzed urine samples from the Qatar Metabolomics Study on Diabetes (QMDiab) to identify potential biomarkers. QMDiab was conducted by Hamad Medical Corporation, Qatar (HMC) and Weill Cornell Medical College, Qatar in 2012 with approval from the Institutional Review Boards of HMC and Weill Cornell Medical College-Qatar (Research Protocol number 11131/11). Written informed consent was obtained from all participants. Subjects included males and females of Arab and Asian ethnicities aged 17-81 years. Urine samples were sent to Chenomx Inc., Alberta, Canada for proton nuclear magnetic resonance (1H NMR) analysis. Although the original study targeted type 2 diabetes, in this paper we also examine obesity, using BMI as a representative measure of obesity.
In this work we have used regularization models and differential network analysis. The regularization models are the elastic net, glinternet, the lasso projection and high-dimensional inference. The elastic net combines L1 and L2 penalties, yielding a mix of lasso and ridge regression. Glinternet is a group-lasso based method developed by Lim and Hastie [9]. It learns pairwise interactions between variables in linear regression models while satisfying strong hierarchy, meaning that an interaction enters the model only if both of its main effects are present. The lasso projection (lasso proj), or de-sparsified lasso, is a regularization-based method that performs statistical inference on low-dimensional parameters with high-dimensional data [17]. It uses a low-dimensional projection approach to construct confidence intervals for the estimated regression parameters. High-dimensional inference computes P-values and associated confidence intervals for variables in high-dimensional data [10].
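To make the elastic net penalty concrete, the following is a minimal pure-Python sketch of coordinate descent for the standard elastic net objective. It is an illustration under stated assumptions, not the implementation used in this work: the function names `soft_threshold` and `elastic_net`, the parameters `lam` and `alpha`, and the toy data are all hypothetical, and no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for the objective
        (1/2n) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    X is a list of rows. No intercept is fitted, so center X and y first.
    alpha = 1 gives the lasso, alpha = 0 gives ridge regression."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # a constant-zero feature carries no information
            # Covariance of feature j with the partial residual (feature j removed)
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_b = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            delta = new_b - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_b
    return beta


# Made-up, centered toy data (NOT QMDiab): y depends on the first feature only.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-6, -3, 0, 3, 6]
print(elastic_net(X, y, lam=0.5, alpha=0.5))
```

In this toy example the L1 part of the penalty sets the coefficient of the uninformative second feature exactly to zero, while the L2 part shrinks the coefficient of the first feature below its unpenalized value of 3.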
Further, we performed differential network analysis to identify variable interactions that differentiate between diabetic and non-diabetic, or obese and lean, subjects. A network is constructed for each group of samples using the mutual information between the variables. We applied dGHD, the differential network analysis algorithm proposed by Ruan et al. [14], to detect interaction patterns that differentiate two networks. The algorithm uses the Generalised Hamming Distance (GHD) to calculate topological differences between the networks and computes their statistical significance.
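The idea of comparing two mutual-information networks can be sketched as follows. This is a simplified illustration, not the dGHD algorithm of [14]: it assumes the variables have already been discretized into levels, uses a plug-in mutual information estimate, and compares the two networks by the mean squared difference of mean-centered edge weights. The topology-aware edge weighting and the significance test of dGHD are omitted. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mi_network(columns):
    """Symmetric matrix of pairwise mutual information with a zero diagonal."""
    p = len(columns)
    net = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            net[i][j] = net[j][i] = mutual_information(columns[i], columns[j])
    return net


def centered_distance(a, b):
    """Mean squared difference of mean-centered off-diagonal edge weights."""
    p = len(a)
    pairs = [(i, j) for i in range(p) for j in range(p) if i != j]
    mean_a = sum(a[i][j] for i, j in pairs) / len(pairs)
    mean_b = sum(b[i][j] for i, j in pairs) / len(pairs)
    return sum(
        ((a[i][j] - mean_a) - (b[i][j] - mean_b)) ** 2 for i, j in pairs
    ) / len(pairs)


# Made-up discretized levels for three variables in two groups (NOT QMDiab).
group_1 = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]  # variables 0 and 1 linked
group_2 = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 0, 1]]  # variables 1 and 2 linked
net_1, net_2 = mi_network(group_1), mi_network(group_2)
print(centered_distance(net_1, net_2))
```

A distance of zero means the two groups have identical centered edge weights; larger values indicate that different pairs of variables interact in the two groups. With continuous NMR concentrations, the discretization step (or a different mutual information estimator) would be a modelling choice of its own.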
Notably, the proposed methods, which have not previously been applied in this field, identify potential biomarkers reported in the literature by earlier studies, even in a small dataset. The results for the elastic net, glinternet and lasso proj are summarized in Table 1. For the diabetes analysis, the significant variables identified include age, betaine, glycolate and glucose, which are well-known biomarkers for diabetes [8, 11]. For the obesity analysis, the significant variables identified include age, dimethylamine, succinate and cis-aconitate, previously identified by [16]. High-dimensional inference identified only age and betaine in the diabetes analysis.
We conclude that state-of-the-art statistical and network analysis methods can be used to analyze metabolomics datasets with a limited number of samples. The number of metabolomic features is increasing as technologies advance. The ability of these methods to handle high-dimensional data makes them suitable for settings where the number of samples is smaller than the number of features. These methods can help identify potential biomarkers in future studies.

References

[1] A. Alonso, S. Marsal, and A. Julià. Analytical methods in untargeted metabolomics: State of the art in 2015, 2015.
[2] A. Atzei, L. Atzori, C. Moretti, L. Barberini, A. Noto, G. Ottonello, E. Pusceddu, and V. Fanos. Metabolomics in paediatric respiratory diseases and bronchiolitis. The Journal of Maternal-Fetal & Neonatal Medicine, 24(sup2):59–62, 2011.
[3] R. Bujak, E. Daghir, J. Rybka, P. Koslinski, and M. J. Markuszewski. Metabolomics in urogenital cancer. Bioanalysis, 3(8):913–923, Apr. 2011.
[4] V. Fanos, R. Antonucci, L. Barberini, A. Noto, and L. Atzori. Clinical application of metabolomics in neonatology. The Journal of Maternal-Fetal & Neonatal Medicine, 25(sup1):104–109, 2012.
[5] N. Friedrich. Metabolomics in diabetes research. Journal of Endocrinology, 215(1):29–42, 2012.
[6] J. L. Griffin, H. Atherton, J. Shockcor, and L. Atzori. Metabolomics as a tool for cardiac research. Nature Reviews Cardiology, 8(11):630–643, Nov. 2011.
[7] G. Hassan-Smith, G. R. Wallace, M. R. Douglas, and A. J. Sinclair. The role of metabolomics in neurological disease. Journal of Neuroimmunology, 248(1-2):48–52, 2012. Special Issue on New technologies for biomarker discovery in multiple sclerosis.
[8] M. Lever, S. Slow, D. O. McGregor, W. J. Dellow, P. M. George, and S. T. Chambers. Variability of plasma and urine betaine in diabetes mellitus and its relationship to methionine load test responses: an observational study. Cardiovascular Diabetology, 11(1):1–8, 2012.
[9] M. Lim and T. Hastie. Learning interactions via hierarchical group-lasso regularization. Journal of Computational and Graphical Statistics, 24(3):627–654, 2015.
[10] N. Meinshausen, L. Meier, and P. Bühlmann. p-values for high-dimensional regression. Journal of the American Statistical Association, 104(488):1671–1681, 2009.
[11] V. J. Nikiforova, P. Giesbertz, J. Wiemer, B. Bethan, R. Looser, V. Liebenberg, P. Ruiz Noppinger, H. Daniel, and D. Rein. Glyoxylate, a new marker metabolite of type 2 diabetes. Journal of Diabetes Research, 2014:9, 2014.
[12] T. Pacchiarotta, A. M. Deelder, and O. A. Mayboroda. Metabolomic investigations of human infections. Bioanalysis, 4(8):919–925, Apr. 2012.
[13] L. G. Rasmussen, F. Savorani, T. M. Larsen, L. O. Dragsted, A. Astrup, and S. B. Engelsen. Standardization of factors that influence human urine metabolomics. Metabolomics, 7(1):71–83, 2011.
[14] D. Ruan, A. Young, and G. Montana. Differential analysis of biological networks. BMC Bioinformatics, 16(1):1–13, 2015.
[15] M. K. Townsend, C. B. Clish, P. Kraft, C. Wu, A. L. Souza, A. A. Deik, S. S. Tworoger, and B. M. Wolpin. Reproducibility of metabolomic profiles among men and women in 2 large cohort studies. Clinical Chemistry, 59(11):1657–1667, 2013.
[16] B. Xie, M. J. Waters, and H. J. Schirra. Investigating potential mechanisms of obesity by metabolomics. Journal of Biomedicine and Biotechnology, 2012:10, 2012.
[17] C. Zhang and S. S. Zhang. Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1):217–242, 2014.

    Published In

    BCB '16: Proceedings of the 7th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
    October 2016
    675 pages
    ISBN:9781450342254
    DOI:10.1145/2975167
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Differential network analysis
    2. Metabolomics
    3. Multivariate analysis
    4. Nuclear magnetic resonance
    5. Statistical analysis

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    BCB '16

    Acceptance Rates

    Overall Acceptance Rate 254 of 885 submissions, 29%
