
When are Deep Networks really better than Decision Forests at small sample sizes, and how?

Authors: Haoyin Xu, Kaleab A. Kinfu, Will LeVine, Sambit Panda, Jayanta Dey, Michael Ainsworth, Yu-Chung Peng, Madi Kusmanov, Florian Engert, Christopher M. White, Joshua T. Vogelstein, Carey E. Priebe

Abstract: Deep networks and decision forests (such as random forests and gradient boosted trees) are the leading machine learning methods for structured and tabular data, respectively. Many papers have empirically compared large numbers of classifiers on one or two different domains (e.g., on 100 different tabular data settings). However, a careful conceptual and empirical comparison of these two strategies using the most contemporary best practices has yet to be performed. Conceptually, we illustrate that both can be profitably viewed as "partition and vote" schemes. Specifically, the representation space that they both learn is a partitioning of feature space into a union of convex polytopes. For inference, each decides on the basis of votes from the activated nodes. This formulation allows for a unified basic understanding of the relationship between these methods. Empirically, we compare these two strategies on hundreds of tabular data settings, as well as several vision and auditory settings. Our focus is on datasets with at most 10,000 samples, which represent a large fraction of scientific and biomedical datasets. In general, we found forests to excel at tabular and structured data (vision and audition) with small sample sizes, whereas deep nets performed better on structured data with larger sample sizes. This suggests that further gains in both scenarios may be realized via further combining aspects of forests and networks. We will continue revising this technical report in the coming months with updated results.

Submitted 2 November, 2021; v1 submitted 31 August, 2021; originally announced August 2021.
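
The "partition and vote" view described in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the helper names (`fit_partition_and_vote`, `predict`, `tree_cells`, `relu_cells`), the weights, and the toy data are all assumptions made for illustration, and a single partition stands in for a whole forest or network. A tree-like partition is given by axis-aligned splits and a network-like partition by the on/off pattern of two ReLU units; in both cases each cell is a convex region and prediction is a majority vote of the training labels in that cell.

```python
from collections import Counter, defaultdict


def fit_partition_and_vote(X, y, cell_of):
    """Record, for every cell of the partition, the training labels that fall in it."""
    votes = defaultdict(Counter)
    for x, label in zip(X, y):
        votes[cell_of(x)][label] += 1
    return votes


def predict(votes, cell_of, x, default=None):
    """Return the majority label of the cell containing x (default if the cell is empty)."""
    cell = cell_of(x)
    if cell not in votes:
        return default
    return votes[cell].most_common(1)[0][0]


# Hypothetical tree-like partition: two axis-aligned splits give four convex cells.
def tree_cells(x):
    return (x[0] > 0.5, x[1] > 0.5)


# Hypothetical network-like partition: the activation pattern of two ReLU units.
# Each pattern corresponds to an intersection of half-spaces, i.e. a convex polytope.
W = [(1.0, -1.0), (1.0, 1.0)]  # assumed weights, chosen only for illustration
b = [0.0, -1.0]                # assumed biases


def relu_cells(x):
    return tuple(w[0] * x[0] + w[1] * x[1] + bias > 0 for w, bias in zip(W, b))


# Toy training data (assumed).
X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.9, 0.1)]
y = [0, 0, 1, 1, 0]

tree_votes = fit_partition_and_vote(X, y, tree_cells)
relu_votes = fit_partition_and_vote(X, y, relu_cells)
```

Calling `predict(tree_votes, tree_cells, x)` or `predict(relu_votes, relu_cells, x)` uses the same voting code in both cases; only the function that assigns a point to a cell differs.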

arXiv:2011.14990

Discovery of Multi-Level Network Differences Across Populations of Heterogeneous Connectomes

Authors: Vivek Gopalakrishnan, Jaewon Chung, Eric Bridgeford, Benjamin D. Pedigo, Jesús Arroyo, Lucy Upchurch, G. Allan Johnson, Nian Wang, Youngser Park, Carey E. Priebe, Joshua T. Vogelstein

Abstract: A connectome is a map of the structural and/or functional connections in the brain. This information-rich representation has the potential to transform our understanding of the relationship between patterns in brain connectivity and neurological processes, disorders, and diseases. However, existing computational techniques used to analyze connectomes are oftentimes insufficient for interrogating multi-subject connectomics datasets: most current methods are either solely designed to analyze single connectomes, or leverage heuristic graph statistics that are unable to capture the complete topology of connections between brain regions. To enable more rigorous connectomics analysis, we introduce a set of robust and interpretable statistical hypothesis tests motivated by recent theoretical advances in random graph models. These tests facilitate simultaneous analysis of multiple connectomes across different levels of network topology, enabling the robust and reproducible discovery of hierarchical brain structures that vary in relation with phenotypic profiles. In addition to explaining the theoretical foundations and guarantees of our hypothesis tests, we demonstrate their superiority over current state-of-the-art connectomics methods through extensive simulation studies, as well as synthetic and real-data experiments.
Using a set of high-resolution connectomes obtained from genetically distinct mouse strains (including the BTBR mouse -- a standard model of autism -- and three behavioral wild-types), we illustrate how our methods can be used to successfully uncover latent information in multi-subject connectomics data and yield valuable insights into the connective correlates of neurological phenotypes. The code necessary to reproduce the analyses, simulations, and figures presented in this work is available in a series of Jupyter Notebooks (https://github.com/neurodata/MCC).

Submitted 13 April, 2022; v1 submitted 30 November, 2020; originally announced November 2020.

Comments: 29 pages, 12 figures

arXiv:1507.08376

A Joint Graph Inference Case Study: the C.elegans Chemical and Electrical Connectomes

Authors: Li Chen, Joshua T. Vogelstein, Vince Lyzinski, Carey E. Priebe

Abstract: We investigate joint graph inference for the chemical and electrical connectomes of the Caenorhabditis elegans roundworm. The C. elegans connectomes consist of 253 non-isolated neurons with known functional attributes, and there are two types of synaptic connectomes, resulting in a pair of graphs. We formulate our joint graph inference from the perspectives of seeded graph matching and joint vertex classification. Our results suggest that connectomic inference should proceed in the joint space of the two connectomes, which has significant neuroscientific implications.

Submitted 5 August, 2015; v1 submitted 30 July, 2015; originally announced July 2015.

arXiv:1411.6880

An Automated Images-to-Graphs Framework for High Resolution Connectomics

Authors: William Gray Roncal, Dean M. Kleissas, Joshua T. Vogelstein, Priya Manavalan, Kunal Lillaney, Michael Pekala, Randal Burns, R. Jacob Vogelstein, Carey E. Priebe, Mark A. Chevillet, Gregory D. Hager

Abstract: Reconstructing a map of neuronal connectivity is a critical challenge in contemporary neuroscience. Recent advances in high-throughput serial section electron microscopy (EM) have produced massive 3D image volumes of nanoscale brain tissue for the first time. The resolution of EM allows for individual neurons and their synaptic connections to be directly observed. Recovering neuronal networks by manually tracing each neuronal process at this scale is unmanageable, and therefore researchers are developing automated image processing modules. Thus far, state-of-the-art algorithms focus only on the solution to a particular task (e.g., neuron segmentation or synapse identification). In this manuscript we present the first fully automated images-to-graphs pipeline (i.e., a pipeline that begins with an imaged volume of neural tissue and produces a brain graph without any human interaction). To evaluate overall performance and select the best parameters and methods, we also develop a metric to assess the quality of the output graphs. We evaluate a set of algorithms and parameters, searching possible operating points to identify the best available brain graph for our assessment metric. Finally, we deploy a reference end-to-end version of the pipeline on a large, publicly available data set. This provides a baseline result and framework for community analysis and future algorithm development and testing.
All code and data derivatives have been made publicly available toward eventually unlocking new biofidelic computational primitives and understanding of neuropathologies.

Submitted 30 April, 2015; v1 submitted 25 November, 2014; originally announced November 2014.

Comments: 13 pages; first two authors contributed equally. V2: added additional experiments and clarifications; added information on infrastructure and pipeline environment

arXiv:1312.4318

doi 10.1109/GlobalSIP.2013.6736874

Computing Scalable Multivariate Glocal Invariants of Large (Brain-) Graphs

Authors: Disa Mhembere, William Gray Roncal, Daniel Sussman, Carey E. Priebe, Rex Jung, Sephira Ryman, R. Jacob Vogelstein, Joshua T. Vogelstein, Randal Burns

Abstract: Graphs are quickly emerging as a leading abstraction for the representation of data. One important application domain originates from an emerging discipline called "connectomics". Connectomics studies the brain as a graph; vertices correspond to neurons (or collections thereof) and edges correspond to structural or functional connections between them. To explore the variability of connectomes (to address both basic science questions regarding the structure of the brain, and medical health questions about psychiatry and neurology), one can study the topological properties of these brain-graphs. We define multivariate glocal graph invariants: these are features of the graph that capture various local and global topological properties of the graphs. We show that these features can be computed collectively via a combination of daisy-chaining, sparse matrix representation and computations, and efficient approximations. Our custom open-source Python package serves as a back-end to a Web-service that we have created to enable researchers to upload graphs, and download the corresponding invariants in a number of different formats. Moreover, we built this package to support distributed processing on multicore machines. This is therefore an enabling technology for network science, lowering the barrier of entry by providing tools to biologists and analysts who otherwise lack these capabilities. As a demonstration, we run our code on 120 brain-graphs, each with approximately 16M vertices and up to 90M edges.

Submitted 16 December, 2013; originally announced December 2013.

Comments: Published as part of 2013 IEEE GlobalSIP conference

arXiv:1112.5507

Fast Approximate Quadratic Programming for Large (Brain) Graph Matching

Authors: Joshua T. Vogelstein, John M. Conroy, Vince Lyzinski, Louis J. Podrazik, Steven G. Kratzer, Eric T. Harley, Donniell E. Fishkind, R. Jacob Vogelstein, Carey E. Priebe

Abstract: Quadratic assignment problems (QAPs) arise in a wide variety of domains, ranging from operations research to graph theory to computer vision to neuroscience. In the age of big data, graph-valued data is becoming more prominent, and with it, a desire to run algorithms on ever larger graphs. Because QAP is NP-hard, exact algorithms are intractable. Approximate algorithms necessarily employ an accuracy/efficiency trade-off. We developed a fast approximate quadratic assignment algorithm (FAQ). FAQ finds a local optimum in (worst case) time cubic in the number of vertices, similar to other approximate QAP algorithms. We demonstrate empirically that our algorithm is faster and achieves a lower objective value on over 80% of the suite of QAP benchmarks, compared with the previous state-of-the-art. Applying the algorithms to our motivating example, matching C. elegans connectomes (brain-graphs), we find that FAQ achieves the optimal performance in record time, whereas none of the others even find the optimum.

Submitted 13 September, 2014; v1 submitted 22 December, 2011; originally announced December 2011.

Comments: 17 pages, 5 figures, 2 tables
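
To make the graph matching objective in the abstract above concrete, here is a minimal sketch that solves a tiny instance exactly by enumerating every vertex permutation. This is explicitly not the FAQ algorithm; it is a brute-force baseline whose cost grows as n!, which is why approximate methods are needed for large graphs. The function names (`disagreements`, `brute_force_match`), the example graph, and the hidden relabeling are assumptions chosen for illustration.

```python
from itertools import permutations


def disagreements(A, B, perm):
    """Count adjacency entries where A[i][j] differs from B[perm[i]][perm[j]]."""
    n = len(A)
    return sum(A[i][j] != B[perm[i]][perm[j]] for i in range(n) for j in range(n))


def brute_force_match(A, B):
    """Exhaustively search all n! vertex correspondences (feasible only for tiny n)."""
    n = len(A)
    return min(permutations(range(n)), key=lambda p: disagreements(A, B, p))


# Assumed toy example: a path graph on four vertices, 0-1-2-3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# B is the same graph with its vertices relabeled by a hidden permutation.
hidden = (2, 0, 3, 1)
B = [[0] * 4 for _ in range(4)]
for i in range(4):
    for j in range(4):
        B[hidden[i]][hidden[j]] = A[i][j]

best = brute_force_match(A, B)
```

Because a path graph is symmetric under reversal, `best` need not equal `hidden`; any permutation with zero disagreements is an exact match.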

arXiv:1112.5506

Shuffled Graph Classification: Theory and Connectome Applications

Authors: Joshua T. Vogelstein, Carey E. Priebe

Abstract: We develop a formalism to address statistical pattern recognition of graph-valued data. Of particular interest is the case of all graphs having the same number of uniquely labeled vertices. When the vertex labels are latent, such graphs are called shuffled graphs. Our formalism provides insight to trivially answer a number of open statistical questions including: (i) under what conditions does shuffling the vertices degrade classification performance and (ii) do universally consistent graph classifiers exist? The answers to these questions lead to practical heuristic algorithms with state-of-the-art finite sample performance, in agreement with our theoretical asymptotics.

Submitted 16 October, 2012; v1 submitted 22 December, 2011; originally announced December 2011.

Comments: 12 pages, 1 figure

arXiv:1108.6271

Optimizing the quantity/quality trade-off in connectome inference

Authors: Carey E. Priebe, Joshua T. Vogelstein, Davi Bock

Abstract: We demonstrate a meaningful prospective power analysis for an (admittedly idealized) illustrative connectome inference task. Modeling neurons as vertices and synapses as edges in a simple random graph model, we optimize the trade-off between the number of (putative) edges identified and the accuracy of the edge identification procedure. We conclude that explicit analysis of the quantity/quality trade-off is imperative for optimal neuroscientific experimental design. In particular, identifying more edges, though with more errors, can yield superior inferential performance.

Submitted 11 October, 2011; v1 submitted 31 August, 2011; originally announced August 2011.

Comments: 8 pages, 1 figure

arXiv:0912.1672

Are mental properties supervenient on brain properties?

Authors: Joshua T. Vogelstein, R. Jacob Vogelstein, Carey E. Priebe

Abstract: The "mind-brain supervenience" conjecture suggests that all mental properties are derived from the physical properties of the brain. To address the question of whether the mind supervenes on the brain, we frame a supervenience hypothesis in rigorous statistical terms. Specifically, we propose a modified version of supervenience (called epsilon-supervenience) that is amenable to experimental investigation and statistical analysis. To illustrate this approach, we perform a thought experiment that illustrates how the probabilistic theory of pattern recognition can be used to make a one-sided determination of epsilon-supervenience. The physical property of the brain employed in this analysis is the graph describing brain connectivity (i.e., the brain-graph or connectome). epsilon-supervenience allows us to determine whether a particular mental property can be inferred from one's connectome to within any given positive misclassification rate, regardless of the relationship between the two. This may provide motivation for cross-disciplinary research between neuroscientists and statisticians.

Submitted 5 August, 2011; v1 submitted 9 December, 2009; originally announced December 2009.

Comments: 9 pages, 2 figures

Showing 1–9 of 9 results for author: Priebe, C E