Search | arXiv e-print repository

Physics-Informed Bayesian Optimization of Variational Quantum Circuits

Authors: Kim A. Nicoli, Christopher J. Anders, Lena Funcke, Tobias Hartung, Karl Jansen, Stefan Kühn, Klaus-Robert Müller, Paolo Stornati, Pan Kessel, Shinichi Nakajima

Abstract: In this paper, we propose a novel and powerful method to harness Bayesian optimization for Variational Quantum Eigensolvers (VQEs) -- a hybrid quantum-classical protocol used to approximate the ground state of a quantum Hamiltonian. Specifically, we derive a VQE-kernel which incorporates important prior information about quantum circuits: the kernel feature map of the VQE-kernel exactly matches th… ▽ More In this paper, we propose a novel and powerful method to harness Bayesian optimization for Variational Quantum Eigensolvers (VQEs) -- a hybrid quantum-classical protocol used to approximate the ground state of a quantum Hamiltonian. Specifically, we derive a VQE-kernel which incorporates important prior information about quantum circuits: the kernel feature map of the VQE-kernel exactly matches the known functional form of the VQE's objective function and thereby significantly reduces the posterior uncertainty. Moreover, we propose a novel acquisition function for Bayesian optimization called Expected Maximum Improvement over Confident Regions (EMICoRe) which can actively exploit the inductive bias of the VQE-kernel by treating regions with low predictive uncertainty as indirectly ``observed''. As a result, observations at as few as three points in the search domain are sufficient to determine the complete objective function along an entire one-dimensional subspace of the optimization landscape. Our numerical experiments demonstrate that our approach improves over state-of-the-art baselines. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 36 pages, 17 figures, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2404.10935 [pdf, other]

Molecular relaxation by reverse diffusion with time step prediction

Authors: Khaled Kahouli, Stefaan Simon Pierre Hessmann, Klaus-Robert Müller, Shinichi Nakajima, Stefan Gugler, Niklas Wolf Andreas Gebauer

Abstract: Molecular relaxation, finding the equilibrium state of a non-equilibrium structure, is an essential component of computational chemistry to understand reactivity. Classical force field methods often rely on insufficient local energy minimization, while neural network force field models require large labeled datasets encompassing both equilibrium and non-equilibrium structures. As a remedy, we prop… ▽ More Molecular relaxation, finding the equilibrium state of a non-equilibrium structure, is an essential component of computational chemistry to understand reactivity. Classical force field methods often rely on insufficient local energy minimization, while neural network force field models require large labeled datasets encompassing both equilibrium and non-equilibrium structures. As a remedy, we propose MoreRed, molecular relaxation by reverse diffusion, a conceptually novel and purely statistical approach where non-equilibrium structures are treated as noisy instances of their corresponding equilibrium states. To enable the denoising of arbitrarily noisy inputs via a generative diffusion model, we further introduce a novel diffusion time step predictor. Notably, MoreRed learns a simpler pseudo potential energy surface instead of the complex physical potential energy surface. It is trained on a significantly smaller, and thus computationally cheaper, dataset consisting of solely unlabeled equilibrium structures, avoiding the computation of non-equilibrium structures altogether. We compare MoreRed to classical force fields, equivariant neural network force fields trained on a large dataset of equilibrium and non-equilibrium data, as well as a semi-empirical tight-binding model. To assess this quantitatively, we evaluate the root-mean-square deviation between the found equilibrium structures and the reference equilibrium structures as well as their DFT energies. △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2403.03333 [pdf, other]

Federated Learning over Connected Modes

Authors: Dennis Grinwald, Philipp Wiesner, Shinichi Nakajima

Abstract: Statistical heterogeneity in federated learning poses two major challenges: slow global training due to conflicting gradient signals, and the need of personalization for local distributions. In this work, we tackle both challenges by leveraging recent advances in \emph{linear mode connectivity} -- identifying a linearly connected low-loss region in the weight space of neural networks, which we cal… ▽ More Statistical heterogeneity in federated learning poses two major challenges: slow global training due to conflicting gradient signals, and the need of personalization for local distributions. In this work, we tackle both challenges by leveraging recent advances in \emph{linear mode connectivity} -- identifying a linearly connected low-loss region in the weight space of neural networks, which we call solution simplex. We propose federated learning over connected modes (\textsc{Floco}), where clients are assigned local subregions in this simplex based on their gradient signals, and together learn the shared global solution simplex. This allows personalization of the client models to fit their local distributions within the degrees of freedom in the solution simplex and homogenizes the update signals for the global simplex training. Our experiments show that \textsc{Floco} accelerates the global training process, and significantly improves the local accuracy with minimal computational overhead. △ Less

Submitted 21 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2311.13594 [pdf, other]

Labeling Neural Representations with Inverse Recognition

Authors: Kirill Bykov, Laura Kopf, Shinichi Nakajima, Marius Kloft, Marina M. -C. Höhne

Abstract: Deep Neural Networks (DNNs) demonstrate remarkable capabilities in learning complex hierarchical data representations, but the nature of these representations remains largely unknown. Existing global explainability methods, such as Network Dissection, face limitations such as reliance on segmentation masks, lack of statistical significance testing, and high computational demands. We propose Invers… ▽ More Deep Neural Networks (DNNs) demonstrate remarkable capabilities in learning complex hierarchical data representations, but the nature of these representations remains largely unknown. Existing global explainability methods, such as Network Dissection, face limitations such as reliance on segmentation masks, lack of statistical significance testing, and high computational demands. We propose Inverse Recognition (INVERT), a scalable approach for connecting learned representations with human-understandable concepts by leveraging their capacity to discriminate between these concepts. In contrast to prior work, INVERT is capable of handling diverse types of neurons, exhibits less computational complexity, and does not rely on the availability of segmentation masks. Moreover, INVERT provides an interpretable metric assessing the alignment between the representation and its corresponding explanation and delivering a measure of statistical significance. We demonstrate the applicability of INVERT in various scenarios, including the identification of representations affected by spurious correlations, and the interpretation of the hierarchical structure of decision-making within the models. △ Less

Submitted 18 January, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: 25 pages, 16 figures

Journal ref: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2310.17638 [pdf, other]

Generative Fractional Diffusion Models

Authors: Gabriel Nobis, Maximilian Springenberg, Marco Aversa, Michael Detzel, Rembert Daems, Roderick Murray-Smith, Shinichi Nakajima, Sebastian Lapuschkin, Stefano Ermon, Tolga Birdal, Manfred Opper, Christoph Knochenhauer, Luis Oala, Wojciech Samek

Abstract: We introduce the first continuous-time score-based generative model that leverages fractional diffusion processes for its underlying dynamics. Although diffusion models have excelled at capturing data distributions, they still suffer from various limitations such as slow convergence, mode-collapse on imbalanced data, and lack of diversity. These issues are partially linked to the use of light-tail… ▽ More We introduce the first continuous-time score-based generative model that leverages fractional diffusion processes for its underlying dynamics. Although diffusion models have excelled at capturing data distributions, they still suffer from various limitations such as slow convergence, mode-collapse on imbalanced data, and lack of diversity. These issues are partially linked to the use of light-tailed Brownian motion (BM) with independent increments. In this paper, we replace BM with an approximation of its non-Markovian counterpart, fractional Brownian motion (fBM), characterized by correlated increments and Hurst index $H \in (0,1)$, where $H=1/2$ recovers the classical BM. To ensure tractable inference and learning, we employ a recently popularized Markov approximation of fBM (MA-fBM) and derive its reverse time model, resulting in generative fractional diffusion models (GFDMs). We characterize the forward dynamics using a continuous reparameterization trick and propose an augmented score matching loss to efficiently learn the score-function, which is partly known in closed form, at minimal added cost. The ability to drive our diffusion model via fBM provides flexibility and control. $H \leq 1/2$ enters the regime of rough paths whereas $H>1/2$ regularizes diffusion paths and invokes long-term memory as well as a heavy-tailed behaviour (super-diffusion). The Markov approximation allows added control by varying the number of Markov processes linearly combined to approximate fBM. Our evaluations on real image datasets demonstrate that GFDM achieves greater pixel-wise diversity and enhanced image quality, as indicated by a lower FID, offering a promising alternative to traditional diffusion models. △ Less

Submitted 24 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

ACM Class: I.2.4; F.4.1; G.3

arXiv:2302.14112 [pdf, other]

Injectivity of ReLU networks: perspectives from statistical physics

Authors: Antoine Maillard, Afonso S. Bandeira, David Belius, Ivan Dokmanić, Shuta Nakajima

Abstract: When can the input of a ReLU neural network be inferred from its output? In other words, when is the network injective? We consider a single layer, $x \mapsto \mathrm{ReLU}(Wx)$, with a random Gaussian $m \times n$ matrix $W$, in a high-dimensional setting where $n, m \to \infty$. Recent work connects this problem to spherical integral geometry giving rise to a conjectured sharp injectivity thresh… ▽ More When can the input of a ReLU neural network be inferred from its output? In other words, when is the network injective? We consider a single layer, $x \mapsto \mathrm{ReLU}(Wx)$, with a random Gaussian $m \times n$ matrix $W$, in a high-dimensional setting where $n, m \to \infty$. Recent work connects this problem to spherical integral geometry giving rise to a conjectured sharp injectivity threshold for $α= \frac{m}{n}$ by studying the expected Euler characteristic of a certain random set. We adopt a different perspective and show that injectivity is equivalent to a property of the ground state of the spherical perceptron, an important spin glass model in statistical physics. By leveraging the (non-rigorous) replica symmetry-breaking theory, we derive analytical equations for the threshold whose solution is at odds with that from the Euler characteristic. Furthermore, we use Gordon's min--max theorem to prove that a replica-symmetric upper bound refutes the Euler characteristic prediction. Along the way we aim to give a tutorial-style introduction to key ideas from statistical physics in an effort to make the exposition accessible to a broad audience. Our analysis establishes a connection between spin glasses and integral geometry but leaves open the problem of explaining the discrepancies. △ Less

Submitted 27 February, 2023; originally announced February 2023.

Comments: 60 pages

arXiv:2302.14082 [pdf, other]

Detecting and Mitigating Mode-Collapse for Flow-based Sampling of Lattice Field Theories

Authors: Kim A. Nicoli, Christopher J. Anders, Tobias Hartung, Karl Jansen, Pan Kessel, Shinichi Nakajima

Abstract: We study the consequences of mode-collapse of normalizing flows in the context of lattice field theory. Normalizing flows allow for independent sampling. For this reason, it is hoped that they can avoid the tunneling problem of local-update MCMC algorithms for multi-modal distributions. In this work, we first point out that the tunneling problem is also present for normalizing flows but is shifted… ▽ More We study the consequences of mode-collapse of normalizing flows in the context of lattice field theory. Normalizing flows allow for independent sampling. For this reason, it is hoped that they can avoid the tunneling problem of local-update MCMC algorithms for multi-modal distributions. In this work, we first point out that the tunneling problem is also present for normalizing flows but is shifted from the sampling to the training phase of the algorithm. Specifically, normalizing flows often suffer from mode-collapse for which the training process assigns vanishingly low probability mass to relevant modes of the physical distribution. This may result in a significant bias when the flow is used as a sampler in a Markov-Chain or with Importance Sampling. We propose a metric to quantify the degree of mode-collapse and derive a bound on the resulting bias. Furthermore, we propose various mitigation strategies in particular in the context of estimating thermodynamic observables, such as the free energy. △ Less

Submitted 3 November, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 10 pages, 7 figures, 6 pages of supplement material

arXiv:2210.04962 [pdf, other]

Domain-Specific Word Embeddings with Structure Prediction

Authors: Stephanie Brandl, David Lassner, Anne Baillot, Shinichi Nakajima

Abstract: Complementary to finding good general word embeddings, an important question for representation learning is to find dynamic word embeddings, e.g., across time or domain. Current methods do not offer a way to use or predict information on structure between sub-corpora, time or domain and dynamic embeddings can only be compared after post-alignment. We propose novel word embedding methods that provi… ▽ More Complementary to finding good general word embeddings, an important question for representation learning is to find dynamic word embeddings, e.g., across time or domain. Current methods do not offer a way to use or predict information on structure between sub-corpora, time or domain and dynamic embeddings can only be compared after post-alignment. We propose novel word embedding methods that provide general word representations for the whole corpus, domain-specific representations for each sub-corpus, sub-corpus structure, and embedding alignment simultaneously. We present an empirical evaluation on New York Times articles and two English Wikipedia datasets with articles on science and philosophy. Our method, called Word2Vec with Structure Prediction (W2VPred), provides better performance than baselines in terms of the general analogy tests, domain-specific analogy tests, and multiple specific word embedding evaluations as well as structure prediction performance when no structure is given a priori. As a use case in the field of Digital Humanities we demonstrate how to raise novel research questions for high literature from the German Text Archive. △ Less

Submitted 6 October, 2022; originally announced October 2022.

Comments: accepted at TACL 13 pages, 4 figures

arXiv:2207.08219 [pdf, other]

Gradients should stay on Path: Better Estimators of the Reverse- and Forward KL Divergence for Normalizing Flows

Authors: Lorenz Vaitl, Kim A. Nicoli, Shinichi Nakajima, Pan Kessel

Abstract: We propose an algorithm to estimate the path-gradient of both the reverse and forward Kullback-Leibler divergence for an arbitrary manifestly invertible normalizing flow. The resulting path-gradient estimators are straightforward to implement, have lower variance, and lead not only to faster convergence of training but also to better overall approximation results compared to standard total gradien… ▽ More We propose an algorithm to estimate the path-gradient of both the reverse and forward Kullback-Leibler divergence for an arbitrary manifestly invertible normalizing flow. The resulting path-gradient estimators are straightforward to implement, have lower variance, and lead not only to faster convergence of training but also to better overall approximation results compared to standard total gradient estimators. We also demonstrate that path-gradient training is less susceptible to mode-collapse. In light of our results, we expect that path-gradient estimators will become the new standard method to train normalizing flows for variational inference. △ Less

Submitted 17 July, 2022; originally announced July 2022.

Comments: 29 pages, 8 figures

arXiv:2206.11723 [pdf, other]

Self-Supervised Training with Autoencoders for Visual Anomaly Detection

Authors: Alexander Bauer, Shinichi Nakajima, Klaus-Robert Müller

Abstract: We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. Here, regularized autoencoders provide a popular approach by learning the identity mapping on the set of normal examples, while trying to prevent good reconstruction on points outside of the manifold. Typically, this goal is implemented by controlling the capa… ▽ More We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. Here, regularized autoencoders provide a popular approach by learning the identity mapping on the set of normal examples, while trying to prevent good reconstruction on points outside of the manifold. Typically, this goal is implemented by controlling the capacity of the model, either directly by reducing the size of the bottleneck layer or implicitly by imposing some sparsity (or contraction) constraints on parts of the corresponding network. However, neither of these techniques does explicitly penalize the reconstruction of anomalous signals often resulting in poor detection. We tackle this problem by adapting a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples. Informally, our training objective regularizes the model to produce locally consistent reconstructions, while replacing irregularities by acting as a filter that removes anomalous patterns. To support this intuition, we perform a rigorous formal analysis of the proposed method and provide a number of interesting insights. In particular, we show that the resulting model resembles a non-linear orthogonal projection of partially corrupted images onto the submanifold of uncorrupted samples. On the other hand, we identify the orthogonal projection as an optimal solution for a number of regularized autoencoders including the contractive and denoising variants. We support our theoretical analysis by empirical evaluation of the resulting detection and localization performance of the proposed method. In particular, we achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain. △ Less

Submitted 13 May, 2024; v1 submitted 23 June, 2022; originally announced June 2022.

arXiv:2206.09016 [pdf, other]

Path-Gradient Estimators for Continuous Normalizing Flows

Authors: Lorenz Vaitl, Kim A. Nicoli, Shinichi Nakajima, Pan Kessel

Abstract: Recent work has established a path-gradient estimator for simple variational Gaussian distributions and has argued that the path-gradient is particularly beneficial in the regime in which the variational distribution approaches the exact target distribution. In many applications, this regime can however not be reached by a simple Gaussian variational distribution. In this work, we overcome this cr… ▽ More Recent work has established a path-gradient estimator for simple variational Gaussian distributions and has argued that the path-gradient is particularly beneficial in the regime in which the variational distribution approaches the exact target distribution. In many applications, this regime can however not be reached by a simple Gaussian variational distribution. In this work, we overcome this crucial limitation by proposing a path-gradient estimator for the considerably more expressive variational family of continuous normalizing flows. We outline an efficient algorithm to calculate this estimator and establish its superior performance empirically. △ Less

Submitted 17 June, 2022; originally announced June 2022.

Comments: 8 pages, 5 figures, 39th International Conference on Machine Learning

arXiv:2204.05229 [pdf, other]

Mixture-of-experts VAEs can disregard variation in surjective multimodal data

Authors: Jannik Wolff, Tassilo Klein, Moin Nabi, Rahul G. Krishnan, Shinichi Nakajima

Abstract: Machine learning systems are often deployed in domains that entail data from multiple modalities, for example, phenotypic and genotypic characteristics describe patients in healthcare. Previous works have developed multimodal variational autoencoders (VAEs) that generate several modalities. We consider subjective data, where single datapoints from one modality (such as class labels) describe multi… ▽ More Machine learning systems are often deployed in domains that entail data from multiple modalities, for example, phenotypic and genotypic characteristics describe patients in healthcare. Previous works have developed multimodal variational autoencoders (VAEs) that generate several modalities. We consider subjective data, where single datapoints from one modality (such as class labels) describe multiple datapoints from another modality (such as images). We theoretically and empirically demonstrate that multimodal VAEs with a mixture of experts posterior can struggle to capture variability in such surjective data. △ Less

Submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted at the NeurIPS 2021 workshop on Bayesian Deep Learning

arXiv:2201.10859 [pdf, other]

Visualizing the Diversity of Representations Learned by Bayesian Neural Networks

Authors: Dennis Grinwald, Kirill Bykov, Shinichi Nakajima, Marina M. -C. Höhne

Abstract: Explainable Artificial Intelligence (XAI) aims to make learning machines less opaque, and offers researchers and practitioners various tools to reveal the decision-making strategies of neural networks. In this work, we investigate how XAI methods can be used for exploring and visualizing the diversity of feature representations learned by Bayesian Neural Networks (BNNs). Our goal is to provide a g… ▽ More Explainable Artificial Intelligence (XAI) aims to make learning machines less opaque, and offers researchers and practitioners various tools to reveal the decision-making strategies of neural networks. In this work, we investigate how XAI methods can be used for exploring and visualizing the diversity of feature representations learned by Bayesian Neural Networks (BNNs). Our goal is to provide a global understanding of BNNs by making their decision-making strategies a) visible and tangible through feature visualizations and b) quantitatively measurable with a distance measure learned by contrastive learning. Our work provides new insights into the \emph{posterior} distribution in terms of human-understandable feature information with regard to the underlying decision making strategies. The main findings of our work are the following: 1) global XAI methods can be applied to explain the diversity of decision-making strategies of BNN instances, 2) Monte Carlo dropout with commonly used Dropout rates exhibit increased diversity in feature representations compared to the multimodal posterior approximation of MultiSWAG, 3) the diversity of learned feature representations highly correlates with the uncertainty estimate for the output and 4) the inter-mode diversity of the multimodal posterior decreases as the network width increases, while the intra mode diversity increases. These findings are consistent with the recent Deep Neural Networks theory, providing additional intuitions about what the theory implies in terms of humanly understandable concepts. △ Less

Submitted 14 November, 2023; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: 16 pages, 18 figures

Journal ref: Published in Transactions on Machine Learning Research (11/2023)

arXiv:2111.11303 [pdf, ps, other]

doi 10.22323/1.396.0338

Machine Learning of Thermodynamic Observables in the Presence of Mode Collapse

Authors: Kim A. Nicoli, Christopher Anders, Lena Funcke, Tobias Hartung, Karl Jansen, Pan Kessel, Shinichi Nakajima, Paolo Stornati

Abstract: Estimating the free energy, as well as other thermodynamic observables, is a key task in lattice field theories. Recently, it has been pointed out that deep generative models can be used in this context [1]. Crucially, these models allow for the direct estimation of the free energy at a given point in parameter space. This is in contrast to existing methods based on Markov chains which generically… ▽ More Estimating the free energy, as well as other thermodynamic observables, is a key task in lattice field theories. Recently, it has been pointed out that deep generative models can be used in this context [1]. Crucially, these models allow for the direct estimation of the free energy at a given point in parameter space. This is in contrast to existing methods based on Markov chains which generically require integration through parameter space. In this contribution, we will review this novel machine-learning-based estimation method. We will in detail discuss the issue of mode collapse and outline mitigation techniques which are particularly suited for applications at finite temperature. △ Less

Submitted 30 November, 2021; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: 10 pages, 2 figures, Proceedings of the 38th International Symposium on Lattice Field Theory, 26th-30th July 2021, Zoom/Gather@Massachusetts Institute of Technology

Report number: MIT-CTP/5353

arXiv:2111.09462 [pdf]

Order recognition by Schubert polynomials generated by optical near-field statistics via nanometre-scale photochromism

Authors: Kazuharu Uchiyama, Sota Nakajima, Hirotsugu Suzui, Nicolas Chauvet, Hayato Saigo, Ryoichi Horisaki, Kingo Uchida, Makoto Naruse, Hirokazu Hori

Abstract: We have previously observed an irregular spatial distribution of photon transmission through a photochromic crystal photoisomerized by a local optical near-field excitation, manifesting complex branching processes via the interplay of deformation of the material and near-field photon transfer therein. Furthermore, by combining such naturally constructed complex photon transmission with a simple ph… ▽ More We have previously observed an irregular spatial distribution of photon transmission through a photochromic crystal photoisomerized by a local optical near-field excitation, manifesting complex branching processes via the interplay of deformation of the material and near-field photon transfer therein. Furthermore, by combining such naturally constructed complex photon transmission with a simple photon detection protocol, Schubert polynomials, the foundation of versatile permutation operations in mathematics, have been generated. In this study, we demonstrate an order recognition algorithm inspired by Schubert calculus using optical near-field statistics via nanometre-scale photochromism. More specifically, by utilizing Schubert polynomials generated via optical near-field patterns, we show that the order of slot machines with initially unknown reward probability is successfully recognized. We emphasize that, unlike conventional algorithms in the literature, the proposed principle does not estimate the reward probabilities. Instead, it exploits the inversion relations contained in the Schubert polynomials. To quantitatively evaluate the impact of the Schubert polynomials generated from an optical near-field pattern, order recognition performances are compared with uniformly distributed and spatially strongly skewed probability distributions, where the optical near-field pattern outperforms the others. We found that the number of singularities contained in Schubert polynomials and that of the given problem or considered environment exhibits a clear correspondence, indicating that superior order recognition performances may be attained if the singularity of the given problem is presupposed. This study paves a new way toward nanophotonic intelligent devices and systems by the interplay of complex natural processes and mathematical insights gained by Schubert calculus. △ Less

Submitted 17 November, 2021; originally announced November 2021.

arXiv:2108.10346 [pdf, other]

Explaining Bayesian Neural Networks

Authors: Kirill Bykov, Marina M. -C. Höhne, Adelaida Creosteanu, Klaus-Robert Müller, Frederick Klauschen, Shinichi Nakajima, Marius Kloft

Abstract: To make advanced learning machines such as Deep Neural Networks (DNNs) more transparent in decision making, explainable AI (XAI) aims to provide interpretations of DNNs' predictions. These interpretations are usually given in the form of heatmaps, each one illustrating relevant patterns regarding the prediction for a given instance. Bayesian approaches such as Bayesian Neural Networks (BNNs) so fa… ▽ More To make advanced learning machines such as Deep Neural Networks (DNNs) more transparent in decision making, explainable AI (XAI) aims to provide interpretations of DNNs' predictions. These interpretations are usually given in the form of heatmaps, each one illustrating relevant patterns regarding the prediction for a given instance. Bayesian approaches such as Bayesian Neural Networks (BNNs) so far have a limited form of transparency (model transparency) already built-in through their prior weight distribution, but notably, they lack explanations of their predictions for given instances. In this work, we bring together these two perspectives of transparency into a holistic explanation framework for explaining BNNs. Within the Bayesian framework, the network weights follow a probability distribution. Hence, the standard (deterministic) prediction strategy of DNNs extends in BNNs to a predictive distribution, and thus the standard explanation extends to an explanation distribution. Exploiting this view, we uncover that BNNs implicitly employ multiple heterogeneous prediction strategies. While some of these are inherited from standard DNNs, others are revealed to us by considering the inherent uncertainty in BNNs. Our quantitative and qualitative experiments on toy/benchmark data and real-world data from pathology show that the proposed approach of explaining BNNs can lead to more effective and insightful explanations. △ Less

Submitted 23 August, 2021; originally announced August 2021.

Comments: 16 pages, 8 figures

arXiv:2107.05045 [pdf, other]

Positive-Unlabeled Classification under Class-Prior Shift: A Prior-invariant Approach Based on Density Ratio Estimation

Authors: Shota Nakajima, Masashi Sugiyama

Abstract: Learning from positive and unlabeled (PU) data is an important problem in various applications. Most of the recent approaches for PU classification assume that the class-prior (the ratio of positive samples) in the training unlabeled dataset is identical to that of the test data, which does not hold in many practical cases. In addition, we usually do not know the class-priors of the training and t… ▽ More Learning from positive and unlabeled (PU) data is an important problem in various applications. Most of the recent approaches for PU classification assume that the class-prior (the ratio of positive samples) in the training unlabeled dataset is identical to that of the test data, which does not hold in many practical cases. In addition, we usually do not know the class-priors of the training and test data, thus we have no clue on how to train a classifier without them. To address these problems, we propose a novel PU classification method based on density ratio estimation. A notable advantage of our proposed method is that it does not require the class-priors in the training phase; class-prior shift is incorporated only in the test phase. We theoretically justify our proposed method and experimentally demonstrate its effectiveness. △ Less

Submitted 15 December, 2021; v1 submitted 11 July, 2021; originally announced July 2021.

Comments: 36 pages, 4 figures

arXiv:2106.10185 [pdf, other]

NoiseGrad: Enhancing Explanations by Introducing Stochasticity to Model Weights

Authors: Kirill Bykov, Anna Hedström, Shinichi Nakajima, Marina M. -C. Höhne

Abstract: Many efforts have been made for revealing the decision-making process of black-box learning machines such as deep neural networks, resulting in useful local and global explanation methods. For local explanation, stochasticity is known to help: a simple method, called SmoothGrad, has improved the visual quality of gradient-based attribution by adding noise to the input space and averaging the expla… ▽ More Many efforts have been made for revealing the decision-making process of black-box learning machines such as deep neural networks, resulting in useful local and global explanation methods. For local explanation, stochasticity is known to help: a simple method, called SmoothGrad, has improved the visual quality of gradient-based attribution by adding noise to the input space and averaging the explanations of the noisy inputs. In this paper, we extend this idea and propose NoiseGrad that enhances both local and global explanation methods. Specifically, NoiseGrad introduces stochasticity in the weight parameter space, such that the decision boundary is perturbed. NoiseGrad is expected to enhance the local explanation, similarly to SmoothGrad, due to the dual relationship between the input perturbation and the decision boundary perturbation. We evaluate NoiseGrad and its fusion with SmoothGrad -- FusionGrad -- qualitatively and quantitatively with several evaluation criteria, and show that our novel approach significantly outperforms the baseline methods. Both NoiseGrad and FusionGrad are method-agnostic and as handy as SmoothGrad using a simple heuristic for the choice of the hyperparameter setting without the need of finetuning. △ Less

Submitted 30 May, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

Comments: 21 pages, 18 figures

arXiv:2105.11990 [pdf, other]

Optimal Sampling Density for Nonparametric Regression

Authors: Danny Panknin, Klaus Robert Müller, Shinichi Nakajima

Abstract: We propose a novel active learning strategy for regression, which is model-agnostic, robust against model mismatch, and interpretable. Assuming that a small number of initial samples are available, we derive the optimal training density that minimizes the generalization error of local polynomial smoothing (LPS) with its kernel bandwidth tuned locally: We adopt the mean integrated squared error (MI… ▽ More We propose a novel active learning strategy for regression, which is model-agnostic, robust against model mismatch, and interpretable. Assuming that a small number of initial samples are available, we derive the optimal training density that minimizes the generalization error of local polynomial smoothing (LPS) with its kernel bandwidth tuned locally: We adopt the mean integrated squared error (MISE) as a generalization criterion, and use the asymptotic behavior of the MISE as well as the locally optimal bandwidths (LOB) - the bandwidth function that minimizes MISE in the asymptotic limit. The asymptotic expression of our objective then reveals the dependence of the MISE on the training density, enabling analytic minimization. As a result,we obtain the optimal training density in a closed-form. The almost model-free nature of our approach thus helps to encode the essential properties of the target problem, providing a robust and model-agnostic active learning strategy. Furthermore, the obtained training density factorizes the influence of local function complexity, noise level and test density in a transparent and interpretable way. We validate our theory in numerical simulations, and show that the proposed active learning method outperforms the existing state-of-the-art model-agnostic approaches. △ Less

Submitted 24 July, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: 50 pages, plus 33 pages appendix

arXiv:2008.13723 [pdf, other]

Langevin Cooling for Domain Translation

Authors: Vignesh Srinivasan, Klaus-Robert Müller, Wojciech Samek, Shinichi Nakajima

Abstract: Domain translation is the task of finding correspondence between two domains. Several Deep Neural Network (DNN) models, e.g., CycleGAN and cross-lingual language models, have shown remarkable successes on this task under the unsupervised setting---the mappings between the domains are learned from two independent sets of training data in both domains (without paired samples). However, those methods… ▽ More Domain translation is the task of finding correspondence between two domains. Several Deep Neural Network (DNN) models, e.g., CycleGAN and cross-lingual language models, have shown remarkable successes on this task under the unsupervised setting---the mappings between the domains are learned from two independent sets of training data in both domains (without paired samples). However, those methods typically do not perform well on a significant proportion of test samples. In this paper, we hypothesize that many of such unsuccessful samples lie at the fringe---relatively low-density areas---of data distribution, where the DNN was not trained very well, and propose to perform Langevin dynamics to bring such fringe samples towards high density areas. We demonstrate qualitatively and quantitatively that our strategy, called Langevin Cooling (L-Cool), enhances state-of-the-art methods in image translation and language translation tasks. △ Less

Submitted 31 August, 2020; originally announced August 2020.

arXiv:2007.07115 [pdf, other]

doi 10.1103/PhysRevLett.126.032001

Estimation of Thermodynamic Observables in Lattice Field Theories with Deep Generative Models

Authors: Kim A. Nicoli, Christopher J. Anders, Lena Funcke, Tobias Hartung, Karl Jansen, Pan Kessel, Shinichi Nakajima, Paolo Stornati

Abstract: In this work, we demonstrate that applying deep generative machine learning models for lattice field theory is a promising route for solving problems where Markov Chain Monte Carlo (MCMC) methods are problematic. More specifically, we show that generative models can be used to estimate the absolute value of the free energy, which is in contrast to existing MCMC-based methods which are limited to o… ▽ More In this work, we demonstrate that applying deep generative machine learning models for lattice field theory is a promising route for solving problems where Markov Chain Monte Carlo (MCMC) methods are problematic. More specifically, we show that generative models can be used to estimate the absolute value of the free energy, which is in contrast to existing MCMC-based methods which are limited to only estimate free energy differences. We demonstrate the effectiveness of the proposed method for two-dimensional $φ^4$ theory and compare it to MCMC-based methods in detailed numerical experiments. △ Less

Submitted 5 January, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: 8 figures

Journal ref: Phys. Rev. Lett. 126, 032001 (2021)

arXiv:2006.09000 [pdf, other]

How Much Can I Trust You? -- Quantifying Uncertainties in Explaining Neural Networks

Authors: Kirill Bykov, Marina M. -C. Höhne, Klaus-Robert Müller, Shinichi Nakajima, Marius Kloft

Abstract: Explainable AI (XAI) aims to provide interpretations for predictions made by learning machines, such as deep neural networks, in order to make the machines more transparent for the user and furthermore trustworthy also for applications in e.g. safety-critical areas. So far, however, no methods for quantifying uncertainties of explanations have been conceived, which is problematic in domains where… ▽ More Explainable AI (XAI) aims to provide interpretations for predictions made by learning machines, such as deep neural networks, in order to make the machines more transparent for the user and furthermore trustworthy also for applications in e.g. safety-critical areas. So far, however, no methods for quantifying uncertainties of explanations have been conceived, which is problematic in domains where a high confidence in explanations is a prerequisite. We therefore contribute by proposing a new framework that allows to convert any arbitrary explanation method for neural networks into an explanation method for Bayesian neural networks, with an in-built modeling of uncertainties. Within the Bayesian framework a network's weights follow a distribution that extends standard single explanation scores and heatmaps to distributions thereof, in this manner translating the intrinsic network model uncertainties into a quantification of explanation uncertainties. This allows us for the first time to carve out uncertainties associated with a model explanation and subsequently gauge the appropriate level of explanation confidence for a user (using percentiles). We demonstrate the effectiveness and usefulness of our approach extensively in various experiments, both qualitatively and quantitatively. △ Less

Submitted 16 June, 2020; originally announced June 2020.

Comments: 12 pages, 10 figures

arXiv:2006.03589 [pdf, other]

doi 10.1109/TPAMI.2021.3115452

Higher-Order Explanations of Graph Neural Networks via Relevant Walks

Authors: Thomas Schnake, Oliver Eberle, Jonas Lederer, Shinichi Nakajima, Kristof T. Schütt, Klaus-Robert Müller, Grégoire Montavon

Abstract: Graph Neural Networks (GNNs) are a popular approach for predicting graph structured data. As GNNs tightly entangle the input graph into the neural network structure, common explainable AI approaches are not applicable. To a large extent, GNNs have remained black-boxes for the user so far. In this paper, we show that GNNs can in fact be naturally explained using higher-order expansions, i.e. by ide… ▽ More Graph Neural Networks (GNNs) are a popular approach for predicting graph structured data. As GNNs tightly entangle the input graph into the neural network structure, common explainable AI approaches are not applicable. To a large extent, GNNs have remained black-boxes for the user so far. In this paper, we show that GNNs can in fact be naturally explained using higher-order expansions, i.e. by identifying groups of edges that jointly contribute to the prediction. Practically, we find that such explanations can be extracted using a nested attribution scheme, where existing techniques such as layer-wise relevance propagation (LRP) can be applied at each step. The output is a collection of walks into the input graph that are relevant for the prediction. Our novel explanation method, which we denote by GNN-LRP, is applicable to a broad range of graph neural networks and lets us extract practically relevant insights on sentiment analysis of text data, structure-property relationships in quantum chemistry, and image classification. △ Less

Submitted 26 November, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

Comments: 14 pages + 6 pages supplement

arXiv:2003.09136 [pdf]

Automatic Identification of Types of Alterations in Historical Manuscripts

Authors: David Lassner, Anne Baillot, Sergej Dogadov, Klaus-Robert Müller, Shinichi Nakajima

Abstract: Alterations in historical manuscripts such as letters represent a promising field of research. On the one hand, they help understand the construction of text. On the other hand, topics that are being considered sensitive at the time of the manuscript gain coherence and contextuality when taking alterations into account, especially in the case of deletions. The analysis of alterations in manuscript… ▽ More Alterations in historical manuscripts such as letters represent a promising field of research. On the one hand, they help understand the construction of text. On the other hand, topics that are being considered sensitive at the time of the manuscript gain coherence and contextuality when taking alterations into account, especially in the case of deletions. The analysis of alterations in manuscripts, though, is a traditionally very tedious work. In this paper, we present a machine learning-based approach to help categorize alterations in documents. In particular, we present a new probabilistic model (Alteration Latent Dirichlet Allocation, alterLDA in the following) that categorizes content-related alterations. The method proposed here is developed based on experiments carried out on the digital scholarly edition Berlin Intellectuals, for which alterLDA achieves high performance in the recognition of alterations on labelled data. On unlabelled data, applying alterLDA leads to interesting new insights into the alteration behavior of authors, editors and other manuscript contributors, as well as insights into sensitive topics in the correspondence of Berlin intellectuals around 1800. In addition to the findings based on the digital scholarly edition Berlin Intellectuals, we present a general framework for the analysis of text genesis that can be used in the context of other digital resources representing document variants. To that end, we present in detail the methodological steps that are to be followed in order to achieve such results, giving thereby a prime example of an Machine Learning application the Digital Humanities. △ Less

Submitted 4 November, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

Comments: Accepted for publication in Digital Humanities Quarterly

arXiv:1912.12090 [pdf, other]

Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies

Authors: Alexander Bauer, Shinichi Nakajima

Abstract: Considering the worst-case scenario, junction tree algorithm remains the most general solution for exact MAP inference with polynomial run-time guarantees. Unfortunately, its main tractability assumption requires the treewidth of a corresponding MRF to be bounded strongly limiting the range of admissible applications. In fact, many practical problems in the area of structured prediction require mo… ▽ More Considering the worst-case scenario, junction tree algorithm remains the most general solution for exact MAP inference with polynomial run-time guarantees. Unfortunately, its main tractability assumption requires the treewidth of a corresponding MRF to be bounded strongly limiting the range of admissible applications. In fact, many practical problems in the area of structured prediction require modelling of global dependencies by either directly introducing global factors or enforcing global constraints on the prediction variables. That, however, always results in a fully-connected graph making exact inference by means of this algorithm intractable. Previous work [1]-[4] focusing on the problem of loss-augmented inference has demonstrated how efficient inference can be performed on models with specific global factors representing non-decomposable loss functions within the training regime of SSVMs. In this paper, we extend the framework for an efficient exact inference proposed in in [3] by allowing much finer interactions between the energy of the core model and the sufficient statistics of the global terms with no additional computation costs. We demonstrate the usefulness of our method in several use cases, including one that cannot be handled by any of the previous approaches. Finally, we propose a new graph transformation technique via node cloning which ensures a polynomial run-time for solving our target problem independently of the form of a corresponding clique tree. This is important for the efficiency of the main algorithm and greatly improves upon the theoretical guarantees of the previous works. △ Less

Submitted 9 February, 2022; v1 submitted 27 December, 2019; originally announced December 2019.

arXiv:1911.11596 [pdf, ps, other]

Distortion and Faults in Machine Learning Software

Authors: Shin Nakajima

Abstract: Machine learning software, deep neural networks (DNN) software in particular, discerns valuable information from a large dataset, a set of data. Outcomes of such DNN programs are dependent on the quality of both learning programs and datasets. Unfortunately, the quality of datasets is difficult to be defined, because they are just samples. The quality assurance of DNN software is difficult, becaus… ▽ More Machine learning software, deep neural networks (DNN) software in particular, discerns valuable information from a large dataset, a set of data. Outcomes of such DNN programs are dependent on the quality of both learning programs and datasets. Unfortunately, the quality of datasets is difficult to be defined, because they are just samples. The quality assurance of DNN software is difficult, because resultant trained machine learning models are unknown prior to its development, and the validation is conducted indirectly in terms of prediction performance. This paper introduces a hypothesis that faults in the learning programs manifest themselves as distortions in trained machine learning models. Relative distortion degrees measured with appropriate observer functions may indicate that there are some hidden faults. The proposal is demonstrated with example cases of the MNIST dataset. △ Less

Submitted 25 November, 2019; originally announced November 2019.

Comments: Presented at the 9th SOFL+MSVL Workshop in Shenzhen, November 2019

arXiv:1910.13496 [pdf, other]

doi 10.1103/PhysRevE.101.023304

Asymptotically unbiased estimation of physical observables with neural samplers

Authors: Kim A. Nicoli, Shinichi Nakajima, Nils Strodthoff, Wojciech Samek, Klaus-Robert Müller, Pan Kessel

Abstract: We propose a general framework for the estimation of observables with generative neural samplers focusing on modern deep generative neural networks that provide an exact sampling probability. In this framework, we present asymptotically unbiased estimators for generic observables, including those that explicitly depend on the partition function such as free energy or entropy, and derive correspond… ▽ More We propose a general framework for the estimation of observables with generative neural samplers focusing on modern deep generative neural networks that provide an exact sampling probability. In this framework, we present asymptotically unbiased estimators for generic observables, including those that explicitly depend on the partition function such as free energy or entropy, and derive corresponding variance estimators. We demonstrate their practical applicability by numerical experiments for the 2d Ising model which highlight the superiority over existing methods. Our approach greatly enhances the applicability of generative neural samplers to real-world physical systems. △ Less

Submitted 13 February, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: 5 figures

Journal ref: Phys. Rev. E 101, 023304 (2020)

arXiv:1910.09840 [pdf, ps, other]

Towards Best Practice in Explaining Neural Network Decisions with LRP

Authors: Maximilian Kohlbrenner, Alexander Bauer, Shinichi Nakajima, Alexander Binder, Wojciech Samek, Sebastian Lapuschkin

Abstract: Within the last decade, neural network based predictors have demonstrated impressive - and at times super-human - capabilities. This performance is often paid for with an intransparent prediction process and thus has sparked numerous contributions in the novel field of explainable artificial intelligence (XAI). In this paper, we focus on a popular and widely used method of XAI, the Layer-wise Rele… ▽ More Within the last decade, neural network based predictors have demonstrated impressive - and at times super-human - capabilities. This performance is often paid for with an intransparent prediction process and thus has sparked numerous contributions in the novel field of explainable artificial intelligence (XAI). In this paper, we focus on a popular and widely used method of XAI, the Layer-wise Relevance Propagation (LRP). Since its initial proposition LRP has evolved as a method, and a best practice for applying the method has tacitly emerged, based however on humanly observed evidence alone. In this paper we investigate - and for the first time quantify - the effect of this current best practice on feedforward neural networks in a visual object detection setting. The results verify that the layer-dependent approach to LRP applied in recent literature better represents the model's reasoning, and at the same time increases the object localization and class discriminativity of LRP. △ Less

Submitted 13 July, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

Comments: 7 pages, 4 figures, 1 table. fixed table row compared to v2. Presented virtually at IJCNN 2020

arXiv:1904.05586 [pdf, other]

Black-Box Decision based Adversarial Attack with Symmetric $α$-stable Distribution

Authors: Vignesh Srinivasan, Ercan E. Kuruoglu, Klaus-Robert Müller, Wojciech Samek, Shinichi Nakajima

Abstract: Developing techniques for adversarial attack and defense is an important research field for establishing reliable machine learning and its applications. Many existing methods employ Gaussian random variables for exploring the data space to find the most adversarial (for attacking) or least adversarial (for defense) point. However, the Gaussian distribution is not necessarily the optimal choice whe… ▽ More Developing techniques for adversarial attack and defense is an important research field for establishing reliable machine learning and its applications. Many existing methods employ Gaussian random variables for exploring the data space to find the most adversarial (for attacking) or least adversarial (for defense) point. However, the Gaussian distribution is not necessarily the optimal choice when the exploration is required to follow the complicated structure that most real-world data distributions exhibit. In this paper, we investigate how statistics of random variables affect such random walk exploration. Specifically, we generalize the Boundary Attack, a state-of-the-art black-box decision based attacking strategy, and propose the Lévy-Attack, where the random walk is driven by symmetric $α$-stable random variables. Our experiments on MNIST and CIFAR10 datasets show that the Lévy-Attack explores the image data space more efficiently, and significantly improves the performance. Our results also give an insight into the recently found fact in the whitebox attacking scenario that the choice of the norm for measuring the amplitude of the adversarial patterns is essential. △ Less

Submitted 11 April, 2019; originally announced April 2019.

arXiv:1903.11048 [pdf, other]

Comment on "Solving Statistical Mechanics Using VANs": Introducing saVANt - VANs Enhanced by Importance and MCMC Sampling

Authors: Kim Nicoli, Pan Kessel, Nils Strodthoff, Wojciech Samek, Klaus-Robert Müller, Shinichi Nakajima

Abstract: In this comment on "Solving Statistical Mechanics Using Variational Autoregressive Networks" by Wu et al., we propose a subtle yet powerful modification of their approach. We show that the inherent sampling error of their method can be corrected by using neural network-based MCMC or importance sampling which leads to asymptotically unbiased estimators for physical quantities. This modification is… ▽ More In this comment on "Solving Statistical Mechanics Using Variational Autoregressive Networks" by Wu et al., we propose a subtle yet powerful modification of their approach. We show that the inherent sampling error of their method can be corrected by using neural network-based MCMC or importance sampling which leads to asymptotically unbiased estimators for physical quantities. This modification is possible due to a singular property of VANs, namely that they provide the exact sample probability. With these modifications, we believe that their method could have a substantially greater impact on various important fields of physics, including strongly-interacting field theories and statistical physics. △ Less

Submitted 26 March, 2019; originally announced March 2019.

Comments: 6 pages, 4 figures

arXiv:1902.10664 [pdf, other]

Local Function Complexity for Active Learning via Mixture of Gaussian Processes

Authors: Danny Panknin, Stefan Chmiela, Klaus-Robert Müller, Shinichi Nakajima

Abstract: Inhomogeneities in real-world data, e.g., due to changes in the observation noise level or variations in the structural complexity of the source function, pose a unique set of challenges for statistical inference. Accounting for them can greatly improve predictive power when physical resources or computation time is limited. In this paper, we draw on recent theoretical results on the estimation of… ▽ More Inhomogeneities in real-world data, e.g., due to changes in the observation noise level or variations in the structural complexity of the source function, pose a unique set of challenges for statistical inference. Accounting for them can greatly improve predictive power when physical resources or computation time is limited. In this paper, we draw on recent theoretical results on the estimation of local function complexity (LFC), derived from the domain of local polynomial smoothing (LPS), to establish a notion of local structural complexity, which is used to develop a model-agnostic active learning (AL) framework. Due to its reliance on pointwise estimates, the LPS model class is not robust and scalable concerning large input space dimensions that typically come along with real-world problems. Here, we derive and estimate the Gaussian process regression (GPR)-based analog of the LPS-based LFC and use it as a substitute in the above framework to make it robust and scalable. We assess the effectiveness of our LFC estimate in an AL application on a prototypical low-dimensional synthetic dataset, before taking on the challenging real-world task of reconstructing a quantum chemical force field for a small organic molecule and demonstrating state-of-the-art performance with a significantly reduced training demand. △ Less

Submitted 12 December, 2023; v1 submitted 27 February, 2019; originally announced February 2019.

Comments: 30 pages (+18 pages of references and appendices), 20 figures

Journal ref: Transactions on Machine Learning Research, December 2023

arXiv:1806.11326 [pdf, other]

Unsupervised Detection and Explanation of Latent-class Contextual Anomalies

Authors: Jacob Kauffmann, Grégoire Montavon, Luiz Alberto Lima, Shinichi Nakajima, Klaus-Robert Müller, Nico Görnitz

Abstract: Detecting and explaining anomalies is a challenging effort. This holds especially true when data exhibits strong dependencies and single measurements need to be assessed and analyzed in their respective context. In this work, we consider scenarios where measurements are non-i.i.d, i.e. where samples are dependent on corresponding discrete latent variables which are connected through some given dep… ▽ More Detecting and explaining anomalies is a challenging effort. This holds especially true when data exhibits strong dependencies and single measurements need to be assessed and analyzed in their respective context. In this work, we consider scenarios where measurements are non-i.i.d, i.e. where samples are dependent on corresponding discrete latent variables which are connected through some given dependency structure, the contextual information. Our contribution is twofold: (i) Building atop of support vector data description (SVDD), we derive a method able to cope with latent-class dependency structure that can still be optimized efficiently. We further show that our approach neatly generalizes vanilla SVDD as well as k-means and conditional random fields (CRF) and provide a corresponding probabilistic interpretation. (ii) In unsupervised scenarios where it is not possible to quantify the accuracy of an anomaly detector, having an human-interpretable solution is the key to success. Based on deep Taylor decomposition and a reformulation of our trained anomaly detector as a neural network, we are able to backpropagate predictions to pixel-domain and thus identify features and regions of high relevance. We demonstrate the usefulness of our novel approach on toy data with known spatio-temporal structure and successfully validate on synthetic as well as real world off-shore data from the oil industry. △ Less

Submitted 29 June, 2018; originally announced June 2018.

arXiv:1806.06126 [pdf, other]

Tight Bound of Incremental Cover Trees for Dynamic Diversification

Authors: Hannah Marienwald, Wikor Pronobis, Klaus-Robert Müller, Shinichi Nakajima

Abstract: Dynamic diversification---finding a set of data points with maximum diversity from a time-dependent sample pool---is an important task in recommender systems, web search, database search, and notification services, to avoid showing users duplicate or very similar items. The incremental cover tree (ICT) with high computational efficiency and flexibility has been applied to this task, and shown good… ▽ More Dynamic diversification---finding a set of data points with maximum diversity from a time-dependent sample pool---is an important task in recommender systems, web search, database search, and notification services, to avoid showing users duplicate or very similar items. The incremental cover tree (ICT) with high computational efficiency and flexibility has been applied to this task, and shown good performance. Specifically, it was empirically observed that ICT typically provides a set with its diversity only marginally ($\sim 1/ 1.2$ times) worse than the greedy max-min (GMM) algorithm, the state-of-the-art method for static diversification with its performance bound optimal for any polynomial time algorithm. Nevertheless, the known performance bound for ICT is 4 times worse than this optimal bound. With this paper, we aim to fill this very gap between theory and empirical observations. For achieving this, we first analyze variants of ICT methods, and derive tighter performance bounds. We then investigate the gap between the obtained bound and empirical observations by using specially designed artificial data for which the optimal diversity is known. Finally, we analyze the tightness of the bound, and show that the bound cannot be further improved, i.e., this paper provides the tightest possible bound for ICT methods. In addition, we demonstrate a new use of dynamic diversification for generative image samplers, where prototypes are incrementally collected from a stream of artificial images generated by an image sampler. △ Less

Submitted 15 June, 2018; originally announced June 2018.

arXiv:1805.12017 [pdf, other]

Robustifying Models Against Adversarial Attacks by Langevin Dynamics

Authors: Vignesh Srinivasan, Arturo Marban, Klaus-Robert Müller, Wojciech Samek, Shinichi Nakajima

Abstract: Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, a lot of defense methods were proposed, which however, have been circumvented by newer attacking strategies. In the midst of this ensuing arms race, the problem of robustness against adversarial attacks still remains unsolved. This paper proposes a novel, simple yet effective defense strategy… ▽ More Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, a lot of defense methods were proposed, which however, have been circumvented by newer attacking strategies. In the midst of this ensuing arms race, the problem of robustness against adversarial attacks still remains unsolved. This paper proposes a novel, simple yet effective defense strategy where adversarial samples are relaxed onto the underlying manifold of the (unknown) target class distribution. Specifically, our algorithm drives off-manifold adversarial samples towards high density regions of the data generating distribution of the target class by the Metroplis-adjusted Langevin algorithm (MALA) with perceptual boundary taken into account. Although the motivation is similar to projection methods, e.g., Defense-GAN, our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion - projection is distributed broadly, and therefore any whitebox attack cannot accurately align the input so that the MALADE moves it to a targeted untrained spot where the model predicts a wrong label. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies. △ Less

Submitted 6 June, 2019; v1 submitted 30 May, 2018; originally announced May 2018.

arXiv:1805.04636

doi 10.4204/EPTCS.271

Proceedings Joint Workshop on Handling IMPlicit and EXplicit knowledge in formal system development (IMPEX) and Formal and Model-Driven Techniques for Developing Trustworthy Systems (FM&MDD)

Authors: Régine Laleau, Dominique Méry, Shin Nakajima, Elena Troubitsyna

Abstract: This volume contains the joint proceedings of IMPEX 2017, the first workshop on Handling IMPlicit and EXplicit knowledge in formal system development and FM&MDD, the second workshop on Formal and Model-Driven Techniques for Developing Trustworthy Systems (FM&MDD) held together on November 16, 2017 in Xi'an, China, as part of ICFEM 2017, 19th International Conference on Formal Engineering Method… ▽ More This volume contains the joint proceedings of IMPEX 2017, the first workshop on Handling IMPlicit and EXplicit knowledge in formal system development and FM&MDD, the second workshop on Formal and Model-Driven Techniques for Developing Trustworthy Systems (FM&MDD) held together on November 16, 2017 in Xi'an, China, as part of ICFEM 2017, 19th International Conference on Formal Engineering Methods. IMPEX emphasises mechanisms for reducing heterogeneity of models induced by the absence of explicit semantics expression in the formal techniques used to specify these models. More precisely, the meeting targets to highlight the advances in handling both implicit and explicit semantics in formal system developments. The aims of FM&MDD are to advance the understanding in the area of developing and applying formal and model-driven techniques for designing trustworthy systems, to discuss the emerging issues in the area, to improve the dialog between different research communities and between academia and industry, to discuss a roadmap of the future research in the area and to create a forum for discussing and disseminating the new ideas and the research results in the area △ Less

Submitted 11 May, 2018; originally announced May 2018.

Journal ref: EPTCS 271, 2018

arXiv:1709.01562 [pdf, other]

Optimizing for Measure of Performance in Max-Margin Parsing

Authors: Alexander Bauer, Shinichi Nakajima, Nico Görnitz, Klaus-Robert Müller

Abstract: Many statistical learning problems in the area of natural language processing including sequence tagging, sequence segmentation and syntactic parsing has been successfully approached by means of structured prediction methods. An appealing property of the corresponding discriminative learning algorithms is their ability to integrate the loss function of interest directly into the optimization proce… ▽ More Many statistical learning problems in the area of natural language processing including sequence tagging, sequence segmentation and syntactic parsing has been successfully approached by means of structured prediction methods. An appealing property of the corresponding discriminative learning algorithms is their ability to integrate the loss function of interest directly into the optimization process, which potentially can increase the resulting performance accuracy. Here, we demonstrate on the example of constituency parsing how to optimize for F1-score in the max-margin framework of structural SVM. In particular, the optimization is with respect to the original (not binarized) trees. △ Less

Submitted 8 September, 2017; v1 submitted 5 September, 2017; originally announced September 2017.

arXiv:1708.03314 [pdf, other]

Partial Optimality of Dual Decomposition for MAP Inference in Pairwise MRFs

Authors: Alexander Bauer, Shinichi Nakajima, Nico Görnitz, Klaus-Robert Müller

Abstract: Markov random fields (MRFs) are a powerful tool for modelling statistical dependencies for a set of random variables using a graphical representation. An important computational problem related to MRFs, called maximum a posteriori (MAP) inference, is finding a joint variable assignment with the maximal probability. It is well known that the two popular optimisation techniques for this task, linear… ▽ More Markov random fields (MRFs) are a powerful tool for modelling statistical dependencies for a set of random variables using a graphical representation. An important computational problem related to MRFs, called maximum a posteriori (MAP) inference, is finding a joint variable assignment with the maximal probability. It is well known that the two popular optimisation techniques for this task, linear programming (LP) relaxation and dual decomposition (DD), have a strong connection both providing an optimal solution to the MAP problem when a corresponding LP relaxation is tight. However, less is known about their relationship in the opposite and more realistic case. In this paper, we explain how the fully integral assignments obtained via DD partially agree with the optimal fractional assignments via LP relaxation when the latter is not tight. In particular, for binary pairwise MRFs the corresponding result suggests that both methods share the partial optimality property of their solutions. △ Less

Submitted 9 August, 2017; originally announced August 2017.

arXiv:1609.03219 [pdf, other]

Sharing Hash Codes for Multiple Purposes

Authors: Wikor Pronobis, Danny Panknin, Johannes Kirschnick, Vignesh Srinivasan, Wojciech Samek, Volker Markl, Manohar Kaul, Klaus-Robert Mueller, Shinichi Nakajima

Abstract: Locality sensitive hashing (LSH) is a powerful tool for sublinear-time approximate nearest neighbor search, and a variety of hashing schemes have been proposed for different dissimilarity measures. However, hash codes significantly depend on the dissimilarity, which prohibits users from adjusting the dissimilarity at query time. In this paper, we propose {multiple purpose LSH (mp-LSH) which shares… ▽ More Locality sensitive hashing (LSH) is a powerful tool for sublinear-time approximate nearest neighbor search, and a variety of hashing schemes have been proposed for different dissimilarity measures. However, hash codes significantly depend on the dissimilarity, which prohibits users from adjusting the dissimilarity at query time. In this paper, we propose {multiple purpose LSH (mp-LSH) which shares the hash codes for different dissimilarities. mp-LSH supports L2, cosine, and inner product dissimilarities, and their corresponding weighted sums, where the weights can be adjusted at query time. It also allows us to modify the importance of pre-defined groups of features. Thus, mp-LSH enables us, for example, to retrieve similar items to a query with the user preference taken into account, to find a similar material to a query with some properties (stability, utility, etc.) optimized, and to turn on or off a part of multi-modal information (brightness, color, audio, text, etc.) in image/video retrieval. We theoretically and empirically analyze the performance of three variants of mp-LSH, and demonstrate their usefulness on real-world data sets. △ Less

Submitted 1 June, 2017; v1 submitted 11 September, 2016; originally announced September 2016.

arXiv:1609.00626 [pdf, other]

SynsetRank: Degree-adjusted Random Walk for Relation Identification

Authors: Shinichi Nakajima, Sebastian Krause, Dirk Weissenborn, Sven Schmeier, Nico Goernitz, Feiyu Xu

Abstract: In relation extraction, a key process is to obtain good detectors that find relevant sentences describing the target relation. To minimize the necessity of labeled data for refining detectors, previous work successfully made use of BabelNet, a semantic graph structure expressing relationships between synsets, as side information or prior knowledge. The goal of this paper is to enhance the use of g… ▽ More In relation extraction, a key process is to obtain good detectors that find relevant sentences describing the target relation. To minimize the necessity of labeled data for refining detectors, previous work successfully made use of BabelNet, a semantic graph structure expressing relationships between synsets, as side information or prior knowledge. The goal of this paper is to enhance the use of graph structure in the framework of random walk with a few adjustable parameters. Actually, a straightforward application of random walk degrades the performance even after parameter optimization. With the insight from this unsuccessful trial, we propose SynsetRank, which adjusts the initial probability so that high degree nodes influence the neighbors as strong as low degree nodes. In our experiment on 13 relations in the FB15K-237 dataset, SynsetRank significantly outperforms baselines and the plain random walk approach. △ Less

Submitted 15 September, 2016; v1 submitted 2 September, 2016; originally announced September 2016.

arXiv:1507.06878 [pdf, other]

doi 10.1007/s00453-016-0267-z

Quantum Algorithm for Triangle Finding in Sparse Graphs

Authors: François Le Gall, Shogo Nakajima

Abstract: This paper presents a quantum algorithm for triangle finding over sparse graphs that improves over the previous best quantum algorithm for this task by Buhrman et al. [SIAM Journal on Computing, 2005]. Our algorithm is based on the recent $\tilde O(n^{5/4})$-query algorithm given by Le Gall [FOCS 2014] for triangle finding over dense graphs (here $n$ denotes the number of vertices in the graph). W… ▽ More This paper presents a quantum algorithm for triangle finding over sparse graphs that improves over the previous best quantum algorithm for this task by Buhrman et al. [SIAM Journal on Computing, 2005]. Our algorithm is based on the recent $\tilde O(n^{5/4})$-query algorithm given by Le Gall [FOCS 2014] for triangle finding over dense graphs (here $n$ denotes the number of vertices in the graph). We show in particular that triangle finding can be solved with $O(n^{5/4-ε})$ queries for some constant $ε>0$ whenever the graph has at most $O(n^{2-c})$ edges for some constant $c>0$. △ Less

Submitted 24 July, 2015; originally announced July 2015.

Comments: 13 pages

Journal ref: Algorithmica, Vol. 79 No. 3, pp. 941-959, 2017

arXiv:1507.04777 [pdf, other]

doi 10.1007/s10994-017-5652-6

Sparse Probit Linear Mixed Model

Authors: Stephan Mandt, Florian Wenzel, Shinichi Nakajima, John P. Cunningham, Christoph Lippert, Marius Kloft

Abstract: Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they allow to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity and population structure. Formulated as models for linear regression, LMMs have been restricted to conti… ▽ More Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they allow to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity and population structure. Formulated as models for linear regression, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), where we generalize the LMM modeling paradigm to binary phenotypes. As a technical challenge, the model no longer possesses a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We show on three real-world examples from different domains that in the setup of binary labels, our algorithm leads to better prediction accuracies and also selects features which show less correlation with the confounding factors. △ Less

Submitted 17 July, 2017; v1 submitted 16 July, 2015; originally announced July 2015.

Comments: Published version, 21 pages, 6 figures

Journal ref: Machine Learning, 106(9), 1621-1642 (2017)

arXiv:1112.3697 [pdf, ps, other]

doi 10.1371/journal.pone.0038897

Insights from Classifying Visual Concepts with Multiple Kernel Learning

Authors: Alexander Binder, Shinichi Nakajima, Marius Kloft, Christina Müller, Wojciech Samek, Ulf Brefeld, Klaus-Robert Müller, Motoaki Kawanabe

Abstract: Combining information from various image features has become a standard technique in concept recognition tasks. However, the optimal way of fusing the resulting kernel functions is usually unknown in practical applications. Multiple kernel learning (MKL) techniques allow to determine an optimal linear combination of such similarity matrices. Classical approaches to MKL promote sparse mixtures. Unf… ▽ More Combining information from various image features has become a standard technique in concept recognition tasks. However, the optimal way of fusing the resulting kernel functions is usually unknown in practical applications. Multiple kernel learning (MKL) techniques allow to determine an optimal linear combination of such similarity matrices. Classical approaches to MKL promote sparse mixtures. Unfortunately, so-called 1-norm MKL variants are often observed to be outperformed by an unweighted sum kernel. The contribution of this paper is twofold: We apply a recently developed non-sparse MKL variant to state-of-the-art concept recognition tasks within computer vision. We provide insights on benefits and limits of non-sparse MKL and compare it against its direct competitors, the sum kernel SVM and the sparse MKL. We report empirical results for the PASCAL VOC 2009 Classification and ImageCLEF2010 Photo Annotation challenge data sets. About to be submitted to PLoS ONE. △ Less

Submitted 15 December, 2011; originally announced December 2011.

Comments: 18 pages, 8 tables, 4 figures, format deviating from plos one submission format requirements for aesthetic reasons

Journal ref: PLoS ONE 7(8): e38897, 2012

Showing 1–42 of 42 results for author: Nakajima, S