Skin homeostasis is maintained by stem cells, which must communicate to balance their regenerative behaviors. Yet, how adult stem cells signal across regenerative tissue remains unknown due to challenges in studying signaling dynamics in live mice. We combined live imaging in the mouse basal stem cell layer with machine learning tools to analyze patterns of Ca2+ signaling. We show that basal cells display dynamic intercellular Ca2+ signaling among local neighborhoods. We find that these Ca2+ signals are coordinated across thousands of cells and that this coordination is an emergent property of the stem cell layer. We demonstrate that G2 cells are required to initiate normal levels of Ca2+ signaling, while connexin43 connects basal cells to orchestrate tissue-wide coordination of Ca2+ signaling. Lastly, we find that Ca2+ signaling drives cell cycle progression, revealing a communication feedback loop. This work provides resolution into how stem cells at different cell cycle stages co...
Due to commonalities in pathophysiology, age-related macular degeneration (AMD) represents a uniquely accessible model to investigate therapies for neurodegenerative diseases, leading us to examine whether pathways of disease progression are shared across neurodegenerative conditions. Here we use single-nucleus RNA sequencing to profile lesions from 11 postmortem human retinas with age-related macular degeneration and 6 control retinas with no history of retinal disease. We create a machine-learning pipeline based on recent advances in data geometry and topology and identify activated glial populations enriched in the early phase of disease. Examining single-cell data from Alzheimer’s disease and progressive multiple sclerosis with our pipeline, we find a similar glial activation profile enriched in the early phase of these neurodegenerative diseases. In late-stage age-related macular degeneration, we identify a microglia-to-astrocyte signaling axis mediated by interleukin-1β which ...
The complexity and intelligence of the brain give the illusion that measurements of brain activity will have intractably high dimensionality, rife with collection and biological noise. Nonlinear dimensionality reduction methods like UMAP and t-SNE have proven useful for high-throughput biomedical data. However, they have not been used extensively for brain imaging data such as from functional magnetic resonance imaging (fMRI), a noninvasive, secondary measure of neural activity over time containing redundancy and co-modulation from neural population activity. Here we introduce a nonlinear manifold learning algorithm for timeseries data like fMRI, called temporal potential of heat diffusion for affinity-based transition embedding (T-PHATE). In addition to recovering a lower intrinsic dimensionality from timeseries data, T-PHATE exploits autocorrelative structure within the data to faithfully denoise dynamic signals and learn activation manifolds. We empirically validate T-PHATE on thr...
Phenotypic plasticity describes the ability of cancer cells to undergo dynamic, nongenetic cell state changes that amplify cancer heterogeneity to promote metastasis and therapy evasion. Thus, cancer cells occupy a continuous spectrum of phenotypic states connected by trajectories defining dynamic transitions upon a cancer cell state landscape. With technologies proliferating to systematically record molecular mechanisms at single-cell resolution, we illuminate manifold learning techniques as emerging computational tools to effectively model cell state dynamics in a way that mimics our understanding of the cell state landscape. We anticipate that “state-gating” therapies targeting phenotypic plasticity will limit cancer heterogeneity, metastasis, and therapy resistance. Significance: Nongenetic mechanisms underlying phenotypic plasticity have emerged as significant drivers of tumor heterogeneity, metastasis, and therapy resistance. Herein, we discuss new experimental and computation...
The development of powerful natural language models has increased the ability to learn meaningful representations of protein sequences. In addition, advances in high-throughput mutagenesis, directed evolution, and next-generation sequencing have allowed for the accumulation of large amounts of labeled fitness data. Leveraging these two trends, we introduce Regularized Latent Space Optimization (ReLSO), a deep transformer-based autoencoder which is trained to jointly generate sequences as well as predict fitness. Using ReLSO, we explicitly model the underlying sequence-function landscape of large labeled datasets and optimize within latent space using gradient-based methods. Through regularized prediction heads, ReLSO introduces a powerful protein sequence encoder and novel approach for efficient fitness landscape traversal.
The last decade has witnessed a technological arms race to encode the molecular states of cells into DNA libraries, turning DNA sequencers into scalable single-cell microscopes. Single-cell measurement of chromatin accessibility (DNA), gene expression (RNA), and proteins has revealed rich cellular diversity across tissues, organisms, and disease states. However, single-cell data poses a unique set of challenges. A dataset may comprise millions of cells with tens of thousands of sparse features. Identifying biologically relevant signals from the background sources of technical noise requires innovation in predictive and representational learning. Furthermore, unlike in machine vision or natural language processing, biological ground truth is limited. Here we leverage recent advances in multi-modal single-cell technologies which, by simultaneously measuring two layers of cellular processing in each cell, provide ground truth analogous to language translation. We define three key tasks...
Skin epidermal homeostasis is maintained via constant regeneration by stem cells, which must communicate to balance their self-renewal and differentiation. A key molecular pathway, Ca2+ signaling, has been implicated as a signal integrator in developing and wounded epithelial tissues [1, 2, 3, 4]. Yet how stem cells carry out this signaling across a regenerative tissue remains unknown due to significant challenges in studying signaling dynamics in live mice, limiting our understanding of the mechanisms of stem cell communication during homeostasis. To interpret high-dimensional signals that have complex spatial and temporal patterns, we combined optimized imaging of Ca2+ signaling in thousands of epidermal stem cells in living mice with a new machine learning tool, Geometric Scattering Trajectory Homology (GSTH). Using a combination of signal processing, data geometry, and topology, GSTH captures patterns of signaling at multiple scales, either between direct or distant stem cell neig...
In many important contexts involving measurements of biological entities, there are distinct categories of information: some information is easy-to-obtain information (EI) and can be gathered on virtually every subject of interest, while other information is hard-to-obtain information (HI) and can only be gathered on some of the biological samples. For example, in the context of drug discovery, measurements like the chemical structure of a drug are EI, while measurements of the transcriptome of a cell population perturbed with the drug are HI. In the clinical context, basic health monitoring is EI because it is already being captured as part of other processes, while cellular measurements like flow cytometry or even ultimate patient outcome are HI. We propose building a model to make probabilistic predictions of HI from EI on the samples that have both kinds of measurements, which will allow us to generalize and predict the HI on a large set of samples from just the EI. To accomplish...
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Previously, the effect of a drug on a cell population was measured based on simple metrics such as cell viability. However, as single-cell technologies are becoming more advanced, drug screen experiments can now be conducted with more complex readouts such as gene expression profiles of individual cells. The increasing complexity of measurements from these multi-sample experiments calls for more sophisticated analytical approaches than are currently available. We developed a novel method called PhEMD (Phenotypic Earth Mover’s Distance) and show that it can be used to embed the space of drug perturbations on the basis of the drugs’ effects on cell populations. When testing PhEMD on a newly-generated, 300-sample CyTOF kinase inhibition screen experiment, we find that the state space of the perturbation conditions is surprisingly low-dimensional and that the network of drugs demonstrates manifold structure. We show that because of the fairly simple manifold geometry of the 300 samples,...
Journal of immunology (Baltimore, Md. : 1950), Jan 6, 2018
Type 1 diabetes (T1D) is most likely caused by killing of β cells by autoreactive CD8 T cells. Methods to isolate and identify these cells are limited by their low frequency in the peripheral blood. We analyzed CD8 T cells, reactive with diabetes Ags, with T cell libraries and further characterized their phenotype by CyTOF using class I MHC tetramers. In the libraries, the frequency of islet Ag-specific CD45ROIFN-γCD8 T cells was higher in patients with T1D compared with healthy control subjects. Ag-specific cells from the libraries of patients with T1D were reactive with ZnT8, whereas those from healthy control recognized ZnT8 and other Ags. ZnT8-reactive CD8 cells expressed an activation phenotype in T1D patients. We found TCR sequences that were used in multiple library wells from patients with T1D, but these sequences were private and not shared between individuals. These sequences could identify the Ag-specific T cells on a repeated draw, ex vivo in the IFN-γ CD8 T cell subset....
Background: A leading model of cancer metastasis is epithelial-to-mesenchymal transition (EMT). We sought to determine whether single-cell inhibition data targeting potential mediators of EMT could uncover mechanistic insights into the EMT process. Methods: EMT was artificially induced on Py2T murine breast cancer cells by TGFb treatment. Additionally, a unique drug inhibitor was added to each well of a multiplexed CyTOF experiment. 37 transcription factors and cell surface markers were measured in each cell to assess epithelial and mesenchymal states, SMAD, AKT, and MAPK signaling activity, cell cycle regulation, and apoptosis pathway activation. The final single-cell dataset consisted of 300 inhibition and control conditions (cell populations), which we aimed to characterize in relation to one another with respect to effect on EMT. Analyzing the similarity between drug inhibitions amounts to a novel type of clustering problem that involves computing the similarity between diverse ...
Single-cell RNA-sequencing is fast becoming a major technology that is revolutionizing biological discovery in fields such as development, immunology and cancer. The ability to simultaneously measure thousands of genes at single cell resolution allows, among other prospects, for the possibility of learning gene regulatory networks at large scales. However, scRNA-seq technologies suffer from many sources of significant technical noise, the most prominent of which is ‘dropout’ due to inefficient mRNA capture. This results in data that has a high degree of sparsity, with typically only ~10% non-zero values. To address this, we developed MAGIC (Markov Affinity-based Graph Imputation of Cells), a method for imputing missing values, and restoring the structure of the data. After MAGIC, we find that two- and three-dimensional gene interactions are restored and that MAGIC is able to impute complex and non-linear shapes of interactions. MAGIC also retains cluster structure, enhances ...
Papers by Smita Krishnaswamy