Summary: Epigenetic modifications reflect key aspects of transcriptional regulation, and many epigenomic data sets have been generated under many biological contexts to provide insights into regulatory processes. However, the technical noise in epigenomic data sets and the many dimensions (features) examined make it challenging to effectively extract biologically meaningful inferences from these data sets. We developed a package that reduces noise while normalizing the epigenomic data by a novel normalization method, followed by integrative dimensional reduction by learning and assigning epigenetic states. This package, called S3V2-IDEAS, can be used to identify epigenetic states for multiple features, or identify signal intensity states and a master peak list across different cell types for a single feature. We illustrate the outputs and performance of S3V2-IDEAS using 137 epigenomics data sets from the VISION project that provides ValIdated Systematic IntegratiON of epigenomic data ...
Joint analyses of genomic datasets obtained in multiple different conditions are essential for understanding the biological mechanism that drives tissue-specificity and cell differentiation, but they remain computationally challenging. To address this, we introduce CLIMB (Composite LIkelihood eMpirical Bayes), a statistical methodology that learns patterns of condition-specificity present in genomic data. CLIMB provides a generic framework facilitating a host of analyses, such as clustering genomic features sharing similar condition-specific patterns and identifying which of these features are involved in cell fate commitment. Our approach improves upon existing methods by boosting statistical power to identify meaningful signals while retaining interpretability and computational tractability. We illustrate CLIMB’s value on two sets of hematopoietic data: one studying CTCF ChIP-seq measured in 17 different cell populations, and another examining RNA-seq measured across constitu...
The spatial organization of chromatin in the nucleus has been implicated in regulating gene expression. Maps of high-frequency interactions between different segments of chromatin have revealed topologically associating domains (TADs), within which most of the regulatory interactions are thought to occur. TADs are not homogeneous structural units but appear to be organized into a hierarchy. We present OnTAD, an optimized nested TAD caller from Hi-C data, to identify hierarchical TADs. OnTAD reveals new biological insights into the role of different TAD levels, boundary usage in gene regulation, the loop extrusion model, and compartmental domains. OnTAD is available at https://github.com/anlin00007/OnTAD.
Members of the GATA family of transcription factors play key roles in the differentiation of specific cell lineages by regulating the expression of target genes. Three GATA factors play distinct roles in hematopoietic differentiation. To better understand how these GATA factors regulate genes throughout the genome, we are studying the epigenomic and transcriptional landscapes of hematopoietic cells in a model-driven, integrative fashion. We have formed the collaborative multi-lab VISION project to conduct ValIdated Systematic IntegratiON of epigenomic data in mouse and human hematopoiesis. The epigenomic data included nuclease accessibility in chromatin, CTCF occupancy, and histone H3 modifications for twenty cell types covering hematopoietic stem cells, multilineage progenitor cells, and mature cells across the blood cell lineages of mouse. The analysis used the Integrative and Discriminative Epigenome Annotation System (IDEAS), which learns all common combinat...
Thousands of epigenomic datasets have been generated in the past decade, but it is difficult for researchers to effectively utilize all the data relevant to their projects. Systematic integrative analysis can help meet this need, and the VISION project was established for ValIdated Systematic IntegratiON of epigenomic data in hematopoiesis. Here, we systematically integrated extensive data recording epigenetic features and transcriptomes from many sources, including individual laboratories and consortia, to produce a comprehensive view of the regulatory landscape of differentiating hematopoietic cell types in mouse. By employing IDEAS as our Integrative and Discriminative Epigenome Annotation System, we identified and assigned epigenetic states simultaneously along chromosomes and across cell types, precisely and comprehensively. Combining nuclease accessibility and epigenetic states produced a set of over 200,000 candidate cis-regulatory elements (cCREs) that efficiently capture en...
The spatial organization of chromatin in the nucleus has been implicated in many aspects of regulated gene expression. Maps of high-frequency interactions between different segments of chromatin have revealed Topologically Associating Domains (TADs), within which most of the regulatory interactions are thought to occur. Recent studies have shown that TADs are not homogeneous structural units, but rather appear to be organized into a hierarchy. However, precise identification of hierarchical TAD structures remains a challenge. We present OnTAD, an Optimized Nested TAD caller from Hi-C data, to identify hierarchical TADs. Compared to existing methods, OnTAD has significantly improved accuracy and running speed. Results from OnTAD reveal new biological insights on the role of different TAD levels, boundary usage in gene regulation, the loop extrusion model, and compartmental domains. The software and documentation for OnTAD are available at: https://github.com/anlin00007/OnTAD
Epigenetic modification of chromatin plays a pivotal role in regulating gene expression during cell differentiation. The scale and complexity of epigenetic data make it difficult for biologists to identify the regulatory events controlling each stage of cell differentiation. Here, we present a new method, called Snapshot, that uses epigenetic data to generate a hierarchical visualization for the DNA regions segregating with respect to epigenetics along any given cell differentiation hierarchy of interest. Different cell type hierarchies may be used to highlight the epigenetic history specific to any particular lineage of cell differentiation. We demonstrate the utility of Snapshot using data from the VISION project, an international project for ValIdated Systematic IntegratiON of epigenomic data in mouse and human hematopoiesis.
Motivation: Quantitative comparison of epigenomic data across multiple cell types is a promising way to understand the biological functions of epigenetic modifications. However, differences in sequencing depth and signal-to-noise ratios compromise our ability to infer real biological differences from raw epigenomic data. Proper data normalization is therefore needed before meaningful data analyses. Existing normalization methods are designed mainly for scaling signals in either whole-genome or peak regions, without considering potential differences in signal-to-noise ratios between data sets. Results: We propose a new data normalization method called S3norm to standardize data by a monotonic nonlinear transformation to match signals in both peak regions and background regions differently, such that both sequencing depth and signal-to-noise ratio between data sets can be normalized simultaneously. We show that the epigenomic data normalized in this way can better reflect real b...
Papers by Guanjue Xiang