Johan Westerhuis

    University of Amsterdam, SILS, Faculty Member
    The plant microbiome plays an essential role in supporting plant growth and health, but plant molecular mechanisms underlying its recruitment are still unclear. Multi-omics data integration methods can be used to unravel new signalling relationships. Here, we review the effects of plant genetics and root exudates on root microbiome recruitment, and discuss methodological advances in data integration approaches that can help us to better understand and optimise the crop-microbiome interaction for a more sustainable agriculture.
    Data sets resulting from metabolomics, proteomics, or metabolic profiling experiments are usually complex. This type of data contains underlying factors, such as time, doses, or combinations thereof. Classical biostatistics methods do not take into account the structure of such complex data sets. However, incorporating this structure into the data analysis is important for understanding the biological information in these data sets. We describe ANOVA simultaneous component analysis (ASCA), a method capable of dealing with complex multivariate data sets containing an underlying experimental design. It is a generalization of analysis of variance (ANOVA) for univariate data to the multivariate case. The method allows for easy interpretation of the variation induced by the different factors of the design. The method is illustrated with a data set from a metabolomics experiment with time and dose factors.
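The ASCA decomposition described above can be illustrated with a short sketch: partition the (column-centered) data matrix into per-factor effect matrices of level means, then apply PCA (via SVD) to each effect matrix separately. This is a minimal illustration for a balanced design with main effects only; function and variable names are illustrative, not from the paper, and a full ASCA implementation would also handle interaction terms and the residual matrix.

```python
import numpy as np

def asca(X, factor_levels):
    """Minimal ASCA sketch for a balanced main-effects design.

    X             : (samples x variables) data matrix
    factor_levels : dict mapping factor name -> array of level labels per sample
    Returns, per factor: PCA scores and loadings of the effect matrix and its
    sum of squares (the variation attributable to that factor).
    """
    X = X - X.mean(axis=0)                    # remove the grand mean
    results = {}
    for name, levels in factor_levels.items():
        effect = np.zeros_like(X)
        for lv in np.unique(levels):
            mask = levels == lv
            effect[mask] = X[mask].mean(axis=0)   # level mean = factor effect
        U, s, Vt = np.linalg.svd(effect, full_matrices=False)
        results[name] = {
            "scores": U * s,                  # sample scores per component
            "loadings": Vt,                   # variable loadings per component
            "ssq": float((effect ** 2).sum()),
        }
    return results
```

The per-factor sum of squares mirrors the ANOVA partition of variation, while the scores and loadings give the multivariate interpretation of each design factor.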
    Strigolactones are endogenous plant hormones regulating plant development and are exuded into the rhizosphere when plants experience nutrient deficiency. There, they promote the mutualistic association of plants with arbuscular mycorrhizal fungi that help the plant with the uptake of nutrients from the soil. This shows that plants actively establish—through the exudation of strigolactones—mutualistic interactions with microbes to overcome inadequate nutrition. The signaling function of strigolactones could possibly extend to other microbial partners, but the effect of strigolactones on the global root and rhizosphere microbiome remains poorly understood. Therefore, we analyzed the bacterial and fungal microbial communities of 16 rice genotypes differing in their root strigolactone exudation. Using multivariate analyses, distinctive differences in the microbiome composition were uncovered depending on strigolactone exudation. Moreover, the results of regression modeling showed that structural differences in the exuded strigolactones affected different sets of microbes. In particular, orobanchol was linked to the relative abundance of Burkholderia–Caballeronia–Paraburkholderia and Acidobacteria that potentially solubilize phosphate, while 4-deoxyorobanchol was associated with the genera Dyella and Umbelopsis. With this research, we provide new insight into the role of strigolactones in the interplay between plants and microbes in the rhizosphere.
    In most biological studies, the effects of experimental factors on the system are assessed using functional genomics tools such as metabolomics or proteomics. Datasets resulting from metabolomics or metabolic profiling experiments are becoming increasingly complex because of underlying factors, such as time (time-resolved or longitudinal measurements) and different treatments or combinations thereof, leading to between-factor interactions. For the analysis of such complex data, combinations of Analysis of Variance (ANOVA) models and high-dimensional analysis methods such as Principal Component Analysis (PCA) and Partial Least Squares-Discriminant Analysis (PLS-DA) have been developed. The linear model familiar from ANOVA separates the data into orthogonal effect matrices, which allows construction of independent models for each effect. The high-dimensional analysis methods then explore these effect matrices for correlations and underlying relationships between the metabolites. These methods facilitate a relatively simple interpretation of the variation induced by each factor in the experimental design. Two applications are presented here: the first focuses on different treatments of plants, whilst in the second the differences between human individuals in a polyphenolic intervention study represent the factor of major importance.
    Motivation: Genome-wide measurements of genetic and epigenetic alterations are generating more and more high-dimensional binary data. The special mathematical characteristics of binary data make the direct use of the classical principal component analysis (PCA) model to explore low-dimensional structures less obvious. Although there are several PCA alternatives for binary data in the psychometric, data analysis and machine learning literature, they are not well known to the bioinformatics community. Results: In this article, we introduce the motivation and rationale of some parametric and nonparametric versions of PCA specifically geared for binary data. Using both realistic simulations of binary data as well as mutation, CNA and methylation data of the Genomic Determinants of Sensitivity in Cancer 1000 (GDSC1000), the methods were explored for their performance with respect to finding the correct number of components, overfit, recovering the correct low-dimensional structure, variable importance, etc. The results show that if a low-dimensional structure exists in the data, most of the methods can find it. When a probabilistic generating process can be assumed to underlie the data, we recommend using the parametric logistic PCA model; when such an assumption is not valid and the data are considered as given, the nonparametric Gifi model is recommended. Availability: The code to reproduce the results in this article is available at the homepage of the Biosystems Data Analysis group (www.bdagroup.nl).
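The idea behind the parametric logistic PCA model mentioned above — modelling each binary entry as a Bernoulli draw whose logit lies in a low-dimensional subspace — can be sketched as follows. This is a toy gradient-ascent version for illustration only: the names and the plain-gradient fitting scheme are simplifications, not the authors' implementation.

```python
import numpy as np

def logistic_pca(X, k=2, n_iter=800, lr=0.05, seed=0):
    """Toy parametric logistic PCA.

    Models P(x_ij = 1) = sigmoid(theta_ij) with a rank-k logit matrix
    theta = mu + A @ B.T, fitted by plain gradient ascent on the
    Bernoulli log-likelihood (gradient w.r.t. theta is simply X - P).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = rng.normal(scale=0.1, size=(n, k))    # sample scores
    B = rng.normal(scale=0.1, size=(p, k))    # variable loadings
    mu = np.zeros(p)                          # per-variable offset
    for _ in range(n_iter):
        theta = mu + A @ B.T
        P = 1.0 / (1.0 + np.exp(-theta))
        R = X - P                             # gradient of log-likelihood
        A += lr * R @ B / p
        B += lr * R.T @ A / n
        mu += lr * R.mean(axis=0)
    return A, B, mu
```

With a strong low-rank structure in the logits, the reconstructed probabilities agree with the observed binary matrix far above chance level.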
    This dataset contains key characteristics about the data described in the Data Descriptor STATegra, a comprehensive multi-omics dataset of B-cell differentiation in mouse. Contents: 1. a human-readable metadata summary table in CSV format; 2. a machine-readable metadata file in JSON format.
    (Figure caption) The shaded area around the white lines marks the 95% confidence interval of the mean; the thin lines show the individual results. The four metrics unanimously point to lag 1, but the different metrics center on different aspects of the relation between ACTH and cortisol.
    High-throughput experimentation and screening methods are changing workflows and creating new possibilities in biochemistry, organometallic chemistry, and catalysis. However, many high-throughput systems rely on off-line chromatography methods that shift the bottleneck to the analysis stage. On-line or at-line spectroscopic analysis is an attractive alternative: it is fast, noninvasive, and nondestructive and requires no sample handling. The disadvantage is that spectroscopic calibration is time-consuming and complex. Ideally, the calibration model should give reliable predictions while keeping the number of calibration samples to a minimum. In this paper, we employ the net analyte signal approach to build a calibration model for Fourier transform near-infrared measurements, using a minimum number of calibration samples based on blank samples. This approach fits very well to high-throughput setups. With this approach, we can reduce the number of calibration samples to the number of ...
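The net analyte signal (NAS) idea mentioned above can be sketched in a few lines: the NAS of a spectrum is the part orthogonal to the space spanned by the background (here, blank) spectra, so its magnitude responds only to the analyte. A minimal sketch under the assumption that the blanks span the interfering contributions; the function names are illustrative, not from the paper.

```python
import numpy as np

def nas_projector(blank_spectra):
    """Projector onto the orthogonal complement of the space spanned by
    blank (analyte-free) spectra, given as rows of `blank_spectra`."""
    B = np.atleast_2d(blank_spectra)          # (n_blanks, n_wavelengths)
    P = B.T @ np.linalg.pinv(B.T)             # projector onto span of blanks
    return np.eye(B.shape[1]) - P

def nas_signal(spectrum, projector):
    """Net analyte signal: the part of a spectrum orthogonal to the blanks,
    whose norm can be regressed against analyte concentration."""
    return projector @ spectrum
```

In a calibration, the norm of the NAS replaces the raw intensities as the single predictor, which is why so few calibration samples are needed.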
    Gene expression profiling has become a frequent approach to study the molecular basis of a wide variety of biological phenomena. In multiple series of time course experiments (MTC), a large number of genes are measured under different conditions, which are combinations of time points and experimental groups. The main objectives in these studies are to identify genes with different trends over time between the experimental groups and the levels of the factors that produce the differences. We propose a statistical method for the analysis of MTC data that applies the multivariate ASCA strategy as pre-processing to generate noise-filtered data on the main components of variability. Secondly, the regression-based maSigPro method is applied to these data to obtain a selection of genes with differential expression over time and between experimental conditions. We present our results on the application of this method to simulated and real microarray data. We show how the approach is effective in removing structural and unwanted noise from the data set and in increasing the statistical power of the inference method applied to detect differentially expressed genes.
    To set the scene for this chapter, Figure 11.1 shows the metabolomics pipeline. This is a general pipeline for a metabolomics study showing the different steps. In many of these steps, data analysis methods are needed [1]. At the start of the study, design-of-experiments methods have to be used to set up a proper study design. Such a study design should be accompanied by a proper measurement design to assure high-quality data without confounding. The measurement design is especially important when large series ...
    Challenge tests are used to assess the resilience of human beings to perturbations by analyzing responses to detect functional abnormalities. Well-known examples are allergy tests and glucose tolerance tests. Increasingly, metabolomics analysis of blood or serum samples is used to analyze the biological response of the individual to these challenges. The information content of such metabolomics challenge test data involves both the disturbance and the restoration of homeostasis at the metabolic level and is thus inherently different from that of steady-state data. It opens doors to study the variation in resilience between individuals beyond the classical biomarkers, preferably in terms of underlying biological processes. We review challenge tests in which metabolomics was used to analyze the biological response. Specifically, we describe strategies to perform statistical analyses on the responses, and we show some examples of these strategies applied to a postprandial challenge that was used to study a diet with anti-inflammatory properties. Finally, we discuss open issues and give recommendations for further research.
    Metabolomics studies aim at a better understanding of biochemical processes by studying relations between metabolites and between metabolites and other types of information (e.g., sensory and phenotypic features). The objectives of these studies are diverse, but the types of data generated and the methods for extracting information from the data and analysing the data are similar. Besides instrumental analysis tools,
    Many high-quality products are produced in a batchwise manner. One of the characteristics of a batch process is its recipe-driven nature: by repeating the recipe in an identical manner, a desired end-product is obtained. However, in spite of repeating the recipe identically, process differences occur. These differences can be caused by a change of feedstock supplier or by impurities in the process, and as a result differences in end-product quality or unsafe process situations may arise. Therefore, the need exists to monitor an industrial batch process. An industrial process is usually monitored by process measurements such as pressures and temperatures. Nowadays, due to technical developments, spectroscopy is increasingly used for process monitoring. Spectroscopic measurements have the advantage of giving direct chemical insight into the process. Multivariate statistical process control (MSPC) is a statistical way of monitoring the behaviour of a process. Combining spectroscopic measurements with MSPC detects process perturbations or deviations from normal operating conditions in a very simple manner. In the following, an application of batch process monitoring is given. It is shown how a calibration model is developed and used with the principles of MSPC. Statistical control charts are developed and used to detect batches with a process upset.
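A minimal sketch of a PCA-based MSPC scheme of the kind described above: fit a PCA model on data from normal operating conditions (NOC), then chart Hotelling's T² of the scores for new observations; points exceeding a control limit signal a process upset. The function names and the choice of two components are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fit_mspc(X_noc, k=2):
    """Fit a PCA model on NOC data; return what a Hotelling's T^2
    control chart on the scores needs."""
    mean = X_noc.mean(axis=0)
    Xc = X_noc - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:k]                          # (k, n_vars)
    scores = Xc @ loadings.T
    score_var = scores.var(axis=0, ddof=1)     # per-component score variance
    return mean, loadings, score_var

def t2_statistic(x_new, mean, loadings, score_var):
    """Hotelling's T^2 of a new observation under the PCA model;
    large values indicate deviation from normal operating conditions."""
    t = loadings @ (x_new - mean)
    return float(np.sum(t ** 2 / score_var))
```

In practice a control limit (e.g. from an F-distribution) is drawn on the chart, and a squared-prediction-error (Q) chart is monitored alongside T².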
    The beneficial health effects of fruits and vegetables have been attributed to their polyphenol content. These compounds undergo many bioconversions in the body. Modeling polyphenol exposure of humans upon intake is a prerequisite for understanding the modulating effect of the food matrix and the colonic microbiome. This modeling is not a trivial task and requires a careful integration of measuring techniques, modeling methods and experimental design. Moreover, polyphenol exposure has to be quantified and assessed both at the population level and at the individual level. We developed a strategy to quantify polyphenol exposure based on the concept of nutrikinetics in combination with population-based modeling. The key idea of the strategy is to derive nutrikinetic model parameters that summarize all information on polyphenol exposure at both the individual and the population level. This is illustrated by a placebo-controlled crossover study in which an extract of wine/grapes and black tea solids was administered to twenty subjects. We show that urinary and plasma nutrikinetic time-response curves can be used for phenotyping the gut microbial bioconversion capacity of individuals. Each individual harbours an intrinsic microbiota composition that converts similar polyphenols from both test products in the same manner and is stable over time. We demonstrate that this is a novel approach for associating the production of two gut-mediated γ-valerolactones with specific gut phylotypes. The large inter-individual variation in nutrikinetics and γ-valerolactone production indicates that gut microbial metabolism is an essential factor in polyphenol exposure and the related potential health benefits.
    Near-infrared (NIR) spectroscopy is used to monitor a large variety of processes online. Hydrocarbons, with their strong NIR spectral signature, are good candidate analytes. For this work, the sorption data are measured in a manometric setup coupled with online NIR spectroscopy to monitor the bulk composition. The assessment of time-based results faces a baseline stability problem. The goal of this article is to study the robustness of different spectral preprocessing methods when dealing with time-based data. It was found that for time-based experiments it is necessary to perform a drift correction on the spectra combined with a water band correction. For the calibration experiments, which last only a few seconds, offset correction and drift correction performed equally well.
    Real time release (RTR) of products is a new paradigm in the pharmaceutical industry. An RTR system assures that when the last manufacturing step is passed all the final release criteria are met. Various types of models can be used within the RTR framework. For each RTR system, the monitoring capability, control capability and RTR capability need to be tested. This paper presents some practical examples within the RTR framework using near-infrared and process data obtained from a tablet manufacturing process.
    Multivariate calibration is a powerful tool for establishing a relationship between spectral variables and properties of interest. Usually, changes in spectral variables are ascribed to changes in the chemical composition of the sample. However, spectral intensities measured at varying temperatures change not only because of changes in sample composition but also in response to the change in temperature. In these cases, multivariate calibration can be (severely) hindered, resulting in a loss of prediction capability. This paper provides an overview of the characteristics and possibilities of (most) methods for temperature-robust multivariate calibration. The methods are discussed using two data sets.
    A new method to eliminate the background spectrum (EBS) during analyte elution in column liquid chromatography (LC) coupled to spectroscopic techniques is proposed. This method takes into account both shape and intensity differences of the background eluent spectrum, which allows the EBS method to make a better estimate of the background eluent spectrum during analyte elution. This is an advantage for quantification as well as for identification of analytes. The EBS method uses a two-step procedure. First, the baseline spectra are modeled using a limited number of principal components (PCs). Subsequently, an asymmetric least squares (asLS) regression method is applied, using these principal components, to correct the measured spectra during elution for the background contribution. The asymmetric least squares regression needs one parameter, the asymmetry factor p, which determines the relative weight of positive and negative residuals. Simulations are performed to test the EBS method in well-defined situations, assessing the effect of spectral noise on its performance and its sensitivity to the value of the asymmetry factor p. Two applications of the EBS method are discussed. In the first application, the goal is to extract the analyte spectrum from an LC-Raman analysis; in this case, the EBS method facilitates easy identification of unknown analytes using spectral libraries. In the second application, the EBS method is used for baseline correction in LC-diode array detection (DAD) analysis of polymeric standards during a gradient elution separation. It is shown that the EBS method yields a good baseline correction without the need to perform a blank chromatographic run.
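The core of the second step — asymmetric least squares regression of a measured spectrum on background components — can be sketched as follows. Positive residuals (candidate analyte peaks) receive the small weight p and negative residuals the weight 1 − p, so the fit clings to the background and ignores the peaks. This is a hedged illustration with a simple two-vector basis standing in for the baseline PCs; names and details are not from the paper.

```python
import numpy as np

def asymmetric_ls(y, basis, p=0.01, n_iter=20):
    """Asymmetric least squares regression of a spectrum y on background
    basis spectra (rows of `basis`). Iteratively reweighted: positive
    residuals get weight p, negative residuals weight 1 - p, so the fit
    tracks the background underneath analyte peaks."""
    A = basis.T                               # (n_points, n_components)
    w = np.ones(len(y))
    for _ in range(n_iter):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fit = A @ coef
        resid = y - fit
        w = np.where(resid > 0, p, 1.0 - p)  # down-weight peaks
    return fit
```

Subtracting the returned background estimate from the measured spectrum leaves the analyte contribution, without a blank run.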
    The interpretation of principal component analysis (PCA) models of complex biological or chemical data can be cumbersome because in PCA the decomposition is performed without any knowledge of the system at hand. Prior information of the system is not used to improve the interpretation. In this paper we introduce Grey Component Analysis (GCA) as a new explorative data analysis method that uses the available prior information. GCA uses a soft penalty approach to gently push the decomposition into the direction of ...
    For analyzing designed high‐dimensional data, no standard methods are currently available. A method that is becoming more and more popular for analyzing such data is ASCA. The mathematics of ASCA are already described elsewhere but a geometrical interpretation is still lacking. The geometry can help practitioners to understand what ASCA does and the more advanced user can get insight into the properties of the method. This paper shows the geometry of ASCA in both the row‐ and column‐space of the matrices involved. Copyright © 2008 John Wiley & Sons, Ltd.
    Novel post‐genomics experiments such as metabolomics provide datasets that are highly multivariate and often reflect an underlying experimental design, developed with a specific experimental question in mind. ANOVA‐simultaneous component analysis (ASCA) can be used for the analysis of multivariate data obtained from an experimental design instead of the widely used principal component analysis (PCA). This increases the interpretability of the model in terms of the experimental question. Aside from the levels of individual factors, variation that can be described by the experimental design may also depend on levels of multiple (crossed) factors simultaneously, e.g. the interactions. ASCA describes each contribution with a PCA model, but a contribution depending on crossed factors may be described more parsimoniously by multiway models like parallel factor analysis (PARAFAC). The combination of PARAFAC and ASCA, named PARAFASCA, provides a view on the data that is both parsimonious an...
    In this paper the general theory of multiway multiblock component and covariates regression models is explained. Unlike in existing methods such as multiblock PLS and multiblock PCA, in the new proposed method a different number of components can be selected for each block. Furthermore, the method can be generalized to incorporate multiway blocks to which any multiway model can be applied. The method is a direct extension of principal covariates regression and therefore works in a simultaneous ...
    In both analytical and process chemistry, one common aim is to build models describing measured data. In cases where additional information about the chemical system is available, this can be incorporated into the model with the aim of improving model fit and interpretability. A ...
    Clustering of metabolomics data can be hampered by noise originating from biological variation, physical sampling error and analytical error. Using data analysis methods that are not specifically suited for dealing with noisy data will yield suboptimal solutions. Bootstrap aggregating (bagging) is a resampling technique that can deal with noise and improve accuracy. This paper demonstrates the possibilities of bagged clustering applied to metabolomics data. The metabolomics data used in this paper are computer-generated with ...
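One common way to realise bagged clustering — cluster bootstrap resamples and aggregate pairwise co-assignments into a consensus matrix — can be sketched as below. This is a generic illustration of the bagging idea, not necessarily the exact scheme used in the paper; all names are illustrative.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (helper for the bagging sketch below)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

def bagged_consensus(X, k=2, n_boot=25, seed=0):
    """Bagged clustering: cluster bootstrap resamples of X and count how
    often each pair of samples lands in the same cluster. The resulting
    consensus matrix (values in [0, 1]) is robust to noisy samples that
    flip clusters in individual runs."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    counted = np.zeros((n, n))
    for _ in range(n_boot):
        idx = rng.choice(n, n, replace=True)      # bootstrap resample
        labels = kmeans(X[idx], k, rng)
        lab = np.full(n, -1)
        for i, l in zip(idx, labels):
            lab[i] = l                            # duplicates get same label
        present = np.unique(idx)
        for a in present:
            for b in present:
                counted[a, b] += 1
                if lab[a] == lab[b]:
                    together[a, b] += 1
    return together / np.maximum(counted, 1)
```

A final partition can be obtained by, e.g., hierarchically clustering 1 minus the consensus matrix.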

    And 124 more