Towards Algorithmic Analytics for Large-scale Datasets

Danilo Bzdok; Thomas E Nichols; Stephen M Smith

doi:10.1038/s42256-019-0069-5

Nat Mach Intell. 2019 Jul;1(7):296-306. doi: 10.1038/s42256-019-0069-5. Epub 2019 Jul 9.

Authors

Danilo Bzdok¹ ² ³, Thomas E Nichols⁴ ⁵, Stephen M Smith⁴

Affiliations

¹ Department of Psychiatry, Psychotherapy and Psychosomatics, RWTH Aachen University, 52072 Aachen, Germany.
² JARA, Translational Brain Medicine, Aachen, Germany.
³ Parietal Team, INRIA, Neurospin, bat 145, CEA Saclay, 91191 Gif-sur-Yvette, France.
⁴ Wellcome Trust Centre for Integrative Neuroimaging (WIN-FMRIB), University of Oxford, Oxford, UK.
⁵ Big Data Institute, University of Oxford, Oxford, UK.

Abstract

The traditional goals of quantitative analytics cherish simple, transparent models to generate explainable insights. Large-scale data acquisition, enabled for instance by brain scanning and genomic profiling with microarray-type techniques, has prompted a wave of statistical inventions and innovative applications. Modern analysis approaches 1) tame large variable arrays capitalizing on regularization and dimensionality-reduction strategies, 2) are increasingly backed up by empirical model validations rather than justified by mathematical proofs, 3) will compare against and build on open data and consortium repositories, as well as 4) often embrace more elaborate, less interpretable models in order to maximize prediction accuracy. Here we review these trends in learning from "big data" and illustrate examples from imaging neuroscience.

Keywords: data science; deep phenotyping; explainable AI; machine learning; open science; reproducibility.
