-
A Quadrature Approach for General-Purpose Batch Bayesian Optimization via Probabilistic Lifting
Authors:
Masaki Adachi,
Satoshi Hayakawa,
Martin Jørgensen,
Saad Hamid,
Harald Oberhauser,
Michael A. Osborne
Abstract:
Parallelisation in Bayesian optimisation is a common strategy but faces several challenges: the need for flexibility in acquisition functions and kernel choices, flexibility dealing with discrete and continuous variables simultaneously, model misspecification, and lastly fast massive parallelisation. To address these challenges, we introduce a versatile and modular framework for batch Bayesian optimisation via probabilistic lifting with kernel quadrature, called SOBER, which we present as a Python library based on GPyTorch/BoTorch. Our framework offers the following unique benefits: (1) Versatility in downstream tasks under a unified approach. (2) A gradient-free sampler, which does not require the gradient of acquisition functions, offering domain-agnostic sampling (e.g., discrete and mixed variables, non-Euclidean space). (3) Flexibility in domain prior distribution. (4) Adaptive batch size (autonomous determination of the optimal batch size). (5) Robustness against a misspecified reproducing kernel Hilbert space. (6) Natural stopping criterion.
Submitted 19 April, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Policy Gradient with Kernel Quadrature
Authors:
Satoshi Hayakawa,
Tetsuro Morimura
Abstract:
Reward evaluation of episodes becomes a bottleneck in a broad range of reinforcement learning tasks. Our aim in this paper is to select a small but representative subset of a large batch of episodes and to compute rewards only on that subset, for more efficient policy gradient iterations. We build a Gaussian process model of discounted returns or rewards to derive a positive definite kernel on the space of episodes, run an "episodic" kernel quadrature method to compress the information of sample episodes, and pass the reduced episodes to the policy network for gradient updates. We present the theoretical background of this procedure as well as its numerical illustrations in MuJoCo tasks.
Submitted 5 December, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Adaptive Batch Sizes for Active Learning: A Probabilistic Numerics Approach
Authors:
Masaki Adachi,
Satoshi Hayakawa,
Martin Jørgensen,
Xingchen Wan,
Vu Nguyen,
Harald Oberhauser,
Michael A. Osborne
Abstract:
Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation. This fixed approach is inefficient because of a dynamic trade-off between cost and speed -- larger batches are more costly, smaller batches lead to slower wall-clock run-times -- and the trade-off may change over the run (larger batches are often preferable earlier). To address this trade-off, we propose a novel Probabilistic Numerics framework that adaptively changes batch sizes. By framing batch selection as a quadrature task, our integration-error-aware algorithm facilitates the automatic tuning of batch sizes to meet predefined quadrature precision objectives, akin to how typical optimizers terminate based on convergence thresholds. This approach obviates the necessity for exhaustive searches across all potential batch sizes. We also extend this to scenarios with constrained active learning and constrained optimization, interpreting constraint violations as reductions in the precision requirement, to subsequently adapt batch construction. Through extensive experiments, we demonstrate that our approach significantly enhances learning efficiency and flexibility in diverse Bayesian batch active learning and Bayesian optimization applications.
Submitted 21 February, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Quantum Ridgelet Transform: Winning Lottery Ticket of Neural Networks with Quantum Computation
Authors:
Hayata Yamasaki,
Sathyawageeswar Subramanian,
Satoshi Hayakawa,
Sho Sonoda
Abstract:
A significant challenge in the field of quantum machine learning (QML) is to establish applications of quantum computation to accelerate common tasks in machine learning such as those for neural networks. Ridgelet transform has been a fundamental mathematical tool in the theoretical studies of neural networks, but the practical applicability of ridgelet transform to conducting learning tasks was limited since its numerical implementation by conventional classical computation requires an exponential runtime $\exp(O(D))$ as data dimension $D$ increases. To address this problem, we develop a quantum ridgelet transform (QRT), which implements the ridgelet transform of a quantum state within a linear runtime $O(D)$ of quantum computation. As an application, we also show that one can use QRT as a fundamental subroutine for QML to efficiently find a sparse trainable subnetwork of large shallow wide neural networks without conducting large-scale optimization of the original network. This application discovers an efficient way in this regime to demonstrate the lottery ticket hypothesis on finding such a sparse trainable neural network. These results open an avenue of QML for accelerating learning tasks with commonly used classical neural networks.
Submitted 11 September, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces
Authors:
Masaki Adachi,
Satoshi Hayakawa,
Saad Hamid,
Martin Jørgensen,
Harald Oberhauser,
Michael A. Osborne
Abstract:
Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks.
Submitted 5 July, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
Sampling-based Nyström Approximation and Kernel Quadrature
Authors:
Satoshi Hayakawa,
Harald Oberhauser,
Terry Lyons
Abstract:
We analyze the Nyström approximation of a positive definite kernel associated with a probability measure. We first prove an improved error bound for the conventional Nyström approximation with i.i.d. sampling and singular-value decomposition in the continuous regime; the proof techniques are borrowed from statistical learning theory. We further introduce a refined selection of subspaces in Nyström approximation with theoretical guarantees that is applicable to non-i.i.d. landmark points. Finally, we discuss their application to convex kernel quadrature and give novel theoretical guarantees as well as numerical observations.
Submitted 22 May, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Fast Bayesian Inference with Batch Bayesian Quadrature via Kernel Recombination
Authors:
Masaki Adachi,
Satoshi Hayakawa,
Martin Jørgensen,
Harald Oberhauser,
Michael A. Osborne
Abstract:
Calculation of Bayesian posteriors and model evidences typically requires numerical integration. Bayesian quadrature (BQ), a surrogate-model-based approach to numerical integration, is capable of superb sample efficiency, but its lack of parallelisation has hindered its practical applications. In this work, we propose a parallelised (batch) BQ method, employing techniques from kernel quadrature, that possesses an empirically exponential convergence rate. Additionally, just as with Nested Sampling, our method permits simultaneous inference of both posteriors and model evidence. Samples from our BQ surrogate model are re-selected to give a sparse set of samples, via a kernel recombination algorithm, requiring negligible additional time to increase the batch size. Empirically, we find that our approach significantly outperforms the sampling efficiency of both state-of-the-art BQ techniques and Nested Sampling in various real-world datasets, including lithium-ion battery analytics.
Submitted 27 January, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
A Dual-Arm Robot that Manipulates Heavy Plates Cooperatively with a Vacuum Lifter
Authors:
Shogo Hayakawa,
Weiwei Wan,
Keisuke Koyama,
Kensuke Harada
Abstract:
A vacuum lifter is widely used to hold and pick up large, heavy, and flat objects. Conventionally, when using a vacuum lifter, a human worker watches the state of a running vacuum lifter and adjusts the object's pose to maintain balance. In this work, we propose using a dual-arm robot to replace the human workers and develop planning and control methods for a dual-arm robot to raise a heavy plate with the help of a vacuum lifter. The methods help the robot determine its actions by considering the vacuum lifter's suction position and suction force limits. The essence of the methods is two-fold. First, we build a Manipulation State Graph (MSG) to store the weighted logical relations of various plate contact states and robot/vacuum lifter configurations, and search the graph to plan efficient and low-cost robot manipulation sequences. Second, we develop a velocity-based impedance controller to coordinate the robot and the vacuum lifter when lifting an object. With its help, a robot can follow the vacuum lifter's motion and realize compliant robot-vacuum lifter collaboration. The proposed planning and control methods are investigated using real-world experiments. The results show that a robot can effectively and flexibly work together with a vacuum lifter to manipulate large and heavy plate-like objects with the methods' support.
Submitted 20 March, 2022;
originally announced March 2022.
-
Positively Weighted Kernel Quadrature via Subsampling
Authors:
Satoshi Hayakawa,
Harald Oberhauser,
Terry Lyons
Abstract:
We study kernel quadrature rules with convex weights. Our approach combines the spectral properties of the kernel with recombination results about point measures. This results in effective algorithms that construct convex quadrature rules using only access to i.i.d. samples from the underlying measure and evaluation of the kernel and that result in a small worst-case error. In addition to our theoretical results and the benefits resulting from convex weights, our experiments indicate that this construction can compete with the optimal bounds in well-known examples.
Submitted 11 October, 2022; v1 submitted 20 July, 2021;
originally announced July 2021.
-
A Dual-arm Robot that Autonomously Lifts Up and Tumbles Heavy Plates Using Crane Pulley Blocks
Authors:
Shogo Hayakawa,
Weiwei Wan,
Keisuke Koyama,
Kensuke Harada
Abstract:
This paper develops a planner that plans the action sequences and motion for a dual-arm robot to lift up and flip heavy plates using crane pulley blocks. The problem is motivated by the low payload of modern collaborative robots. Instead of directly manipulating heavy plates that collaborative robots cannot afford, the paper develops a planner for collaborative robots to operate crane pulley blocks. The planner assumes a target plate is pre-attached to the crane hook. It optimizes dual-arm action sequences and plans the robot's dual-arm motion that pulls the rope of the crane pulley blocks to lift up the plate. The crane pulley blocks reduce the payload that each robotic arm needs to bear. When the plate is lifted up to a satisfying pose, the planner plans a pushing motion for one of the robot arms to tumble over the plate while considering force and moment constraints. The article presents the technical details of the planner and several experiments and analyses carried out using a dual-arm robot built from two Universal Robots UR3 arms. The influence of various parameters and optimization goals is investigated and compared in depth. The results show that the proposed planner is flexible and efficient.
Submitted 23 January, 2021;
originally announced January 2021.
-
On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces
Authors:
Satoshi Hayakawa,
Taiji Suzuki
Abstract:
Deep learning has been applied to various tasks in the field of machine learning and has shown superiority to other common procedures such as kernel methods. To provide a better theoretical understanding of the reasons for its success, we discuss the performance of deep learning and other methods on a nonparametric regression problem with a Gaussian noise. Whereas existing theoretical studies of deep learning have been based mainly on mathematical theories of well-known function classes such as Hölder and Besov classes, we focus on function classes with discontinuity and sparsity, which are those naturally assumed in practice. To highlight the effectiveness of deep learning, we compare deep learning with a class of linear estimators representative of a class of shallow estimators. It is shown that the minimax risk of a linear estimator on the convex hull of a target function class does not differ from that of the original target function class. This results in the suboptimality of linear methods over a simple but non-convex function class, on which deep learning can attain nearly the minimax-optimal rate. In addition to this extreme case, we consider function classes with sparse wavelet coefficients. On these function classes, deep learning also attains the minimax rate up to log factors of the sample size, and linear methods are still suboptimal if the assumed sparsity is strong. We also point out that the parameter sharing of deep neural networks can remarkably reduce the complexity of the model in our setting.
Submitted 20 September, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
-
FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation
Authors:
Ziqiang Shi,
Huibin Lin,
Liu Liu,
Rujie Liu,
Shoji Hayakawa,
Shouji Harada,
Jiqing Han
Abstract:
Deep gated convolutional networks have been proved to be very effective in single channel speech separation. However, current state-of-the-art frameworks often train the gated convolutional networks in the time-frequency (TF) domain. Such an approach results in a limited perceptual score, such as the signal-to-distortion ratio (SDR) upper bound of separated utterances, and also fails to exploit an end-to-end framework. In this paper we present an integrated, simple, and effective end-to-end approach to monaural speech separation, which consists of deep gated convolutional neural networks (GCNN) that take the mixed utterance of two speakers and map it to two separated utterances, where each utterance contains only one speaker's voice. In addition, long short-term memory (LSTM) is employed for long-term temporal modeling. For the objective, we propose to train the network by directly optimizing utterance-level SDR in a permutation invariant training (PIT) style. Our experiments on the public WSJ0-2mix data corpus demonstrate that this new scheme can produce more discriminative separated utterances, leading to performance improvement on the speaker separation task.
Submitted 17 March, 2019; v1 submitted 2 February, 2019;
originally announced February 2019.