-
Group and Shuffle: Efficient Structured Orthogonal Parametrization
Authors:
Mikhail Gorbunov,
Nikolay Yudin,
Vera Soboleva,
Aibek Alanov,
Alexey Naumov,
Maxim Rakhuba
Abstract:
The increasing size of neural networks has led to a growing demand for methods of efficient fine-tuning. Recently, an orthogonal fine-tuning paradigm was introduced that uses orthogonal matrices for adapting the weights of a pretrained model. In this paper, we introduce a new class of structured matrices, which unifies and generalizes structured classes from previous works. We examine properties of this class and build a structured orthogonal parametrization upon it. We then use this parametrization to modify the orthogonal fine-tuning framework, improving parameter and computational efficiency. We empirically validate our method on different domains, including adaptation of text-to-image diffusion models and downstream task fine-tuning in language modeling. Additionally, we adapt our construction for orthogonal convolutions and conduct experiments with 1-Lipschitz neural networks.
Submitted 14 June, 2024;
originally announced June 2024.
-
Dimension-free Structured Covariance Estimation
Authors:
Nikita Puchkin,
Maxim Rakhuba
Abstract:
Given a sample of i.i.d. high-dimensional centered random vectors, we consider a problem of estimation of their covariance matrix $Σ$ with an additional assumption that $Σ$ can be represented as a sum of a few Kronecker products of smaller matrices. Under mild conditions, we derive the first non-asymptotic dimension-free high-probability bound on the Frobenius distance between $Σ$ and a widely used penalized permuted least squares estimate. Because of the hidden structure, the established rate of convergence is faster than in the standard covariance estimation problem.
Submitted 15 June, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
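A minimal sketch of why a Kronecker-product structure can be exploited, assuming that "permuted" refers to the standard block rearrangement of a matrix (the abstract does not define it, so this reading is an assumption). Under that rearrangement a single Kronecker product $A \otimes B$ becomes a rank-one matrix, so a sum of a few Kronecker products becomes a low-rank matrix. The helper names `kron`, `rearrange` and `vec` are illustrative only.

```python
def kron(A, B):
    """Kronecker product of square matrices given as lists of lists."""
    p, q = len(A), len(B)
    return [[A[i][j] * B[k][l] for j in range(p) for l in range(q)]
            for i in range(p) for k in range(q)]


def rearrange(K, p, q):
    """Map a (p*q) x (p*q) matrix to a p^2 x q^2 matrix.

    Row (i, j) holds the q x q block (i, j) of K, flattened row by row.
    """
    return [[K[i * q + k][j * q + l] for k in range(q) for l in range(q)]
            for i in range(p) for j in range(p)]


def vec(M):
    """Row-major flattening of a matrix."""
    return [x for row in M for x in row]


# Illustrative symmetric factors (hypothetical values).
A = [[2.0, 1.0], [1.0, 3.0]]
B = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]

R = rearrange(kron(A, B), 2, 3)
# The rearranged matrix is the outer product vec(A) vec(B)^T, i.e. rank one.
outer = [[a * b for b in vec(B)] for a in vec(A)]
print(R == outer)
```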
-
Towards Practical Control of Singular Values of Convolutional Layers
Authors:
Alexandra Senderovich,
Ekaterina Bulatova,
Anton Obukhov,
Maxim Rakhuba
Abstract:
In general, convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control. Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties and offered several methods for controlling them. Nevertheless, these methods present an intractable computational challenge or resort to coarse approximations. In this paper, we offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity. Our method is based on the tensor-train decomposition; it retains control over the actual singular values of convolutional mappings while providing structurally sparse and hardware-friendly representation. We demonstrate the improved properties of modern CNNs with our method and analyze its impact on the model performance, calibration, and adversarial robustness. The source code is available at: https://github.com/WhiteTeaDragon/practical_svd_conv
Submitted 24 November, 2022;
originally announced November 2022.
-
Tensor rank bounds and explicit QTT representations for the inverses of circulant matrices
Authors:
Lev Vysotsky,
Maxim Rakhuba
Abstract:
In this paper, we are concerned with the inversion of circulant matrices and their quantized tensor-train (QTT) structure. In particular, we show that the inverse of a complex circulant matrix $A$, generated by the first column of the form $(a_0,\dots,a_{m-1},0,\dots,0,a_{-n},\dots, a_{-1})^\top$ admits a QTT representation with the QTT ranks bounded by $(m+n)$. Under certain assumptions on the entries of $A$, we also derive an explicit QTT representation of $A^{-1}$. The latter can be used, for instance, to overcome stability issues arising when numerically solving differential equations with periodic boundary conditions in the QTT format.
Submitted 9 May, 2022;
originally announced May 2022.
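For orientation, a dense (non-QTT) sketch of circulant inversion: a circulant matrix is diagonalized by the discrete Fourier transform, so the first column of $A^{-1}$ is the inverse transform of the reciprocal eigenvalues. This is only the classical dense identity, not the explicit QTT representation from the paper; the function names and the example column are assumptions chosen for illustration.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2 * cmath.pi * 1j * m * k / n)
               for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def circulant(c):
    """Dense circulant matrix with first column c: C[i][j] = c[(i - j) mod n]."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]


def circulant_inverse_column(c):
    """First column of the inverse circulant: ifft(1 / fft(c))."""
    eigenvalues = dft(c)
    return dft([1 / lam for lam in eigenvalues], inverse=True)


# Banded first column of the form (a_0, a_1, 0, ..., 0, a_{-1}).
c = [4, -1, 0, 0, 0, 0, 0, -1]
g = circulant_inverse_column(c)
```

The inverse is generally a full matrix even when $A$ is banded, which is why a compressed representation of $A^{-1}$ is of interest.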
-
Local convergence of alternating low-rank optimization methods with overrelaxation
Authors:
Ivan V. Oseledets,
Maxim V. Rakhuba,
André Uschmajew
Abstract:
The local convergence of alternating optimization methods with overrelaxation for low-rank matrix and tensor problems is established. The analysis is based on the linearization of the method which takes the form of an SOR iteration for a positive semidefinite Hessian and can be studied in the corresponding quotient geometry of equivalent low-rank representations. In the matrix case, the optimal relaxation parameter for accelerating the local convergence can be determined from the convergence rate of the standard method. This result relies on a version of Young's SOR theorem for positive semidefinite $2 \times 2$ block systems.
Submitted 28 June, 2022; v1 submitted 29 November, 2021;
originally announced November 2021.
-
Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation
Authors:
Mikhail Usvyatsov,
Anastasia Makarova,
Rafael Ballester-Ripoll,
Maxim Rakhuba,
Andreas Krause,
Konrad Schindler
Abstract:
We propose an end-to-end trainable framework that processes large-scale visual data tensors by looking at a fraction of their entries only. Our method combines a neural network encoder with a tensor train decomposition to learn a low-rank latent encoding, coupled with cross-approximation (CA) to learn the representation through a subset of the original samples. CA is an adaptive sampling algorithm that is native to tensor decompositions and avoids working with the full high-resolution data explicitly. Instead, it actively selects local representative samples that we fetch out-of-core and on-demand. The required number of samples grows only logarithmically with the size of the input. Our implicit representation of the tensor in the network enables processing large grids that could not be otherwise tractable in their uncompressed form. The proposed approach is particularly useful for large-scale multidimensional grid data (e.g., 3D tomography), and for tasks that require context over a large receptive field (e.g., predicting the medical condition of entire organs). The code is available at https://github.com/aelphy/c-pic.
Submitted 12 November, 2021; v1 submitted 29 May, 2021;
originally announced May 2021.
-
Automatic differentiation for Riemannian optimization on low-rank matrix and tensor-train manifolds
Authors:
Alexander Novikov,
Maxim Rakhuba,
Ivan Oseledets
Abstract:
In scientific computing and machine learning applications, matrices and more general multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions. Since matrices and tensors of fixed rank form smooth Riemannian manifolds, one of the popular tools for finding low-rank approximations is to use Riemannian optimization. Nevertheless, efficient implementation of Riemannian gradients and Hessians, required in Riemannian optimization algorithms, can be a nontrivial task in practice. Moreover, in some cases, analytic formulas are not even available. In this paper, we build upon automatic differentiation and propose a method that, given an implementation of the function to be minimized, efficiently computes Riemannian gradients and matrix-by-vector products between an approximate Riemannian Hessian and a given vector.
Submitted 23 October, 2021; v1 submitted 27 March, 2021;
originally announced March 2021.
-
Spectral Tensor Train Parameterization of Deep Learning Layers
Authors:
Anton Obukhov,
Maxim Rakhuba,
Alexander Liniger,
Zhiwu Huang,
Stamatios Georgoulis,
Dengxin Dai,
Luc Van Gool
Abstract:
We study low-rank parameterizations of weight matrices with embedded spectral properties in the Deep Learning context. The low-rank property leads to parameter efficiency and permits taking computational shortcuts when computing mappings. Spectral properties are often subject to constraints in optimization problems, leading to better models and stability of optimization. We start by looking at the compact SVD parameterization of weight matrices and identifying redundancy sources in the parameterization. We further apply the Tensor Train (TT) decomposition to the compact SVD components, and propose a non-redundant differentiable parameterization of fixed TT-rank tensor manifolds, termed the Spectral Tensor Train Parameterization (STTP). We demonstrate the effects of neural network compression in the image classification setting and both compression and improved training stability in the generative adversarial training setting.
Submitted 13 July, 2021; v1 submitted 6 March, 2021;
originally announced March 2021.
-
Low rank tensor approximation of singularly perturbed partial differential equations in one dimension
Authors:
Carlo Marcati,
Maxim Rakhuba,
Johan E. M. Ulander
Abstract:
We derive rank bounds on the quantized tensor train (QTT) compressed approximation of singularly perturbed reaction diffusion partial differential equations (PDEs) in one dimension. Specifically, we show that, independently of the scale of the singular perturbation parameter, a numerical solution with accuracy $0<ε<1$ can be represented in QTT format with a number of parameters that depends only polylogarithmically on $ε$. In other words, QTT compressed solutions converge exponentially to the exact solution, with respect to a root of the number of parameters. We also verify the rank bound estimates numerically, and overcome known stability issues of the QTT based solution of PDEs by adapting a preconditioning strategy to obtain stable schemes at all scales. We find, therefore, that the QTT based strategy is a rapidly converging algorithm for the solution of singularly perturbed PDEs, which does not require prior knowledge on the scale of the singular perturbation and on the shape of the boundary layers.
Submitted 14 October, 2020;
originally announced October 2020.
-
T-Basis: a Compact Representation for Neural Networks
Authors:
Anton Obukhov,
Maxim Rakhuba,
Stamatios Georgoulis,
Menelaos Kanakis,
Dengxin Dai,
Luc Van Gool
Abstract:
We introduce T-Basis, a novel concept for a compact representation of a set of tensors, each of an arbitrary shape, which is often seen in Neural Networks. Each of the tensors in the set is modeled using Tensor Rings, though the concept applies to other Tensor Networks. Owing its name to the T-shape of nodes in diagram notation of Tensor Rings, T-Basis is simply a list of equally shaped three-dimensional tensors, used to represent Tensor Ring nodes. Such representation allows us to parameterize the tensor set with a small number of parameters (coefficients of the T-Basis tensors), scaling logarithmically with each tensor's size in the set and linearly with the dimensionality of T-Basis. We evaluate the proposed approach on the task of neural network compression and demonstrate that it reaches high compression rates at acceptable performance drops. Finally, we analyze memory and operation requirements of the compressed networks and conclude that T-Basis networks are equally well suited for training and inference in resource-constrained environments and usage on the edge devices.
Submitted 13 July, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Quantized tensor FEM for multiscale problems: diffusion problems in two and three dimensions
Authors:
V. Kazeev,
I. Oseledets,
M. Rakhuba,
Ch. Schwab
Abstract:
Homogenization in terms of multiscale limits transforms a multiscale problem with $n+1$ asymptotically separated microscales posed on a physical domain $D \subset \mathbb{R}^d$ into a one-scale problem posed on a product domain of dimension $(n+1)d$ by introducing $n$ so-called "fast variables". This procedure allows one to convert $n+1$ scales in $d$ physical dimensions into a single-scale structure in $(n+1)d$ dimensions. We prove here that both the original, physical multiscale problem and the corresponding high-dimensional, one-scale limiting problem can be efficiently treated numerically with the recently developed quantized tensor-train finite-element method (QTT-FEM).
The method is based on restricting computation to sequences of nested subspaces of low dimensions (which are called tensor ranks) within a vast but generic "virtual" (background) discretization space. In the course of computation, these subspaces are computed iteratively and data-adaptively at runtime, bypassing any "offline precomputation". For the purpose of theoretical analysis, such low-dimensional subspaces are constructed analytically to bound the tensor ranks vs. error $τ>0$.
We consider a model linear elliptic multiscale problem in several physical dimensions and show, theoretically and experimentally, that both (i) the solution of the associated high-dimensional one-scale problem and (ii) the corresponding approximation to the solution of the multiscale problem admit efficient approximation by the QTT-FEM. These problems can therefore be numerically solved in a scale-robust fashion by standard (low-order) PDE discretizations combined with state-of-the-art general-purpose solvers for tensor-structured linear systems. We prove scale-robust exponential convergence, i.e., that QTT-FEM achieves accuracy $τ$ with the number of effective degrees of freedom scaling polynomially in $\log τ$.
Submitted 2 June, 2020;
originally announced June 2020.
-
Tensor Rank bounds for Point Singularities in $\mathbb{R}^3$
Authors:
Carlo Marcati,
Maxim Rakhuba,
Christoph Schwab
Abstract:
We analyze rates of approximation by quantized, tensor-structured representations of functions with isolated point singularities in ${\mathbb R}^3$. We consider functions in countably normed Sobolev spaces with radial weights and analytic- or Gevrey-type control of weighted semi-norms. Several classes of boundary value and eigenvalue problems from science and engineering are discussed whose solutions belong to the countably normed spaces.
It is shown that quantized, tensor-structured approximations of functions in these classes exhibit tensor ranks bounded polylogarithmically with respect to the accuracy $ε\in(0,1)$ in the Sobolev space $H^1$. We prove exponential convergence rates of three specific types of quantized tensor decompositions: quantized tensor train (QTT), transposed QTT and Tucker-QTT. In addition, the bounds for the patchwise decompositions are uniform with respect to the position of the point singularity. An auxiliary result of independent interest is the proof of exponential convergence of $hp$-finite element approximations for Gevrey-regular functions with point singularities in the unit cube $Q=(0,1)^3$. Numerical examples of function approximations and of Schrödinger-type eigenvalue problems illustrate the theoretical results.
Submitted 17 December, 2019;
originally announced December 2019.
-
Low-rank Riemannian eigensolver for high-dimensional Hamiltonians
Authors:
Maxim Rakhuba,
Alexander Novikov,
Ivan Oseledets
Abstract:
Such problems as computation of spectra of spin chains and vibrational spectra of molecules can be written as high-dimensional eigenvalue problems, i.e., when the eigenvector can be naturally represented as a multidimensional tensor. Tensor methods have proven to be an efficient tool for the approximation of solutions of high-dimensional eigenvalue problems, however, their performance deteriorates quickly when the number of eigenstates to be computed increases. We address this issue by designing a new algorithm motivated by the ideas of Riemannian optimization (optimization on smooth manifolds) for the approximation of multiple eigenstates in the tensor-train format, which is also known as matrix product state representation. The proposed algorithm is implemented in TensorFlow, which allows for both CPU and GPU parallelization.
Submitted 27 November, 2018;
originally announced November 2018.
-
Alternating least squares as moving subspace correction
Authors:
Ivan Oseledets,
Maxim Rakhuba,
André Uschmajew
Abstract:
In this note we take a new look at the local convergence of alternating optimization methods for low-rank matrices and tensors. Our abstract interpretation as sequential optimization on moving subspaces yields insightful reformulations of some known convergence conditions that focus on the interplay between the contractivity of classical multiplicative Schwarz methods with overlapping subspaces and the curvature of low-rank matrix and tensor manifolds. While the verification of the abstract conditions in concrete scenarios remains open in most cases, we are able to provide an alternative and conceptually simple derivation of the asymptotic convergence rate of the two-sided block power method of numerical algebra for computing the dominant singular subspaces of a rectangular matrix. This method is equivalent to an alternating least squares method applied to a distance function. The theoretical results are illustrated and validated by numerical experiments.
Submitted 11 January, 2019; v1 submitted 21 September, 2017;
originally announced September 2017.
-
Vico-Greengard-Ferrando quadratures in the tensor solver for integral equations
Authors:
Valentin Khrulkov,
Maxim Rakhuba,
Ivan Oseledets
Abstract:
Convolution with the Green's function of a differential operator appears in many applications, e.g., the Lippmann-Schwinger integral equation. Algorithms for computing such convolutions are usually non-trivial and require a non-uniform mesh. However, Vico, Greengard and Ferrando recently developed a method for computing convolutions with smooth, compactly supported functions with spectral accuracy, requiring nothing more than the Fast Fourier Transform (FFT). Their approach is well suited to a low-rank tensor implementation, which we develop using the Quantized Tensor Train (QTT) decomposition.
Submitted 5 April, 2017;
originally announced April 2017.
-
Jacobi-Davidson method on low-rank matrix manifolds
Authors:
Maxim Rakhuba,
Ivan Oseledets
Abstract:
In this work we generalize the Jacobi-Davidson method to the case when the eigenvector can be reshaped into a low-rank matrix. In this setting the proposed method inherits the advantages of the original Jacobi-Davidson method, has lower complexity and requires less storage. We also introduce a low-rank version of the Rayleigh quotient iteration, which naturally arises in the Jacobi-Davidson method.
Submitted 27 March, 2017;
originally announced March 2017.
-
Robust discretization in quantized tensor train format for elliptic problems in two dimensions
Authors:
A. V. Chertkov,
I. V. Oseledets,
M. V. Rakhuba
Abstract:
In this work we propose an efficient black-box solver for two-dimensional stationary diffusion equations, which is based on a new robust discretization scheme. The idea is to formulate an equation in a certain form without derivatives with a non-local stencil, which leads us to a linear system of equations with a dense matrix. This matrix and the right-hand side are represented in a low-rank parametric representation -- the quantized tensor train (QTT-) format, and then all operations are performed with logarithmic complexity and memory consumption. Hence very fine grids can be used, and very accurate solutions with extremely high spatial resolution can be obtained. Numerical experiments show that this formulation gives accurate results and can be used up to $2^{60}$ grid points with no problems with conditioning, while the total computational time is around several seconds.
Submitted 21 December, 2016; v1 submitted 4 December, 2016;
originally announced December 2016.
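A small sketch of the idea behind logarithmic storage in the QTT format, under simplifying assumptions: a vector of length $2^d$ is viewed as a $2 \times 2 \times \dots \times 2$ tensor indexed by the binary digits of the position. For a geometric sequence $q^i$ this tensor is exactly a Kronecker product of $d$ two-element factors (rank one), so $2d$ numbers describe all $2^d$ entries. The function names are illustrative and this is not the solver from the paper.

```python
def kron_vec(u, v):
    """Kronecker product of two vectors: entry (a, b) lands at index a * len(v) + b."""
    return [x * y for x in u for y in v]


def geometric_from_factors(q, d):
    """Rebuild [q**i for i in range(2**d)] from d factors [1, q**(2**k)].

    Writing i = sum_k b_k 2^k gives q**i = prod_k (q**(2**k))**b_k, so each
    binary digit b_k contributes one independent two-element factor.
    """
    result = [1.0]
    for k in reversed(range(d)):  # most significant bit first
        result = kron_vec(result, [1.0, q ** (2 ** k)])
    return result


d = 4
dense = [0.5 ** i for i in range(2 ** d)]  # 2**d stored numbers
compressed = geometric_from_factors(0.5, d)  # rebuilt from only 2*d numbers
print(all(abs(x - y) < 1e-15 for x, y in zip(dense, compressed)))
```

With $d = 60$ the same construction would need 120 numbers for a grid of $2^{60}$ points, although the dense vector itself could of course never be formed.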
-
Calculating vibrational spectra of molecules using tensor train decomposition
Authors:
Maxim Rakhuba,
Ivan Oseledets
Abstract:
We propose a new algorithm for calculation of vibrational spectra of molecules using tensor train decomposition. Under the assumption that eigenfunctions lie on a low-parametric manifold of low-rank tensors we suggest using well-known iterative methods that utilize matrix inversion (LOBPCG, inverse iteration) and solve corresponding linear systems inexactly along this manifold. As an application, we accurately compute vibrational spectra (84 states) of the acetonitrile molecule CH$_3$CN on a laptop in one hour using only $100$ MB of memory to represent all computed eigenfunctions.
Submitted 6 September, 2016; v1 submitted 26 May, 2016;
originally announced May 2016.
-
Grid-based electronic structure calculations: the tensor decomposition approach
Authors:
Maxim Rakhuba,
Ivan Oseledets
Abstract:
We present a fully grid-based approach for solving Hartree-Fock and all-electron Kohn-Sham equations based on low-rank approximation of three-dimensional electron orbitals. Due to the low-rank structure, the total complexity of the algorithm depends linearly on the one-dimensional grid size. Linear complexity allows for the usage of fine grids, e.g. $8192^3$, and thus a cheap extrapolation procedure.
We test the proposed approach on closed-shell atoms up to argon, several molecules and clusters of hydrogen atoms. All tests show systematic convergence with the required accuracy.
Submitted 30 August, 2015;
originally announced August 2015.
-
Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition
Authors:
Vadim Lebedev,
Yaroslav Ganin,
Maksim Rakhuba,
Ivan Oseledets,
Victor Lempitsky
Abstract:
We propose a simple two-step approach for speeding up convolution layers within large convolutional neural networks based on tensor decomposition and discriminative fine-tuning. Given a layer, we use non-linear least squares to compute a low-rank CP-decomposition of the 4D convolution kernel tensor into a sum of a small number of rank-one tensors. At the second step, this decomposition is used to replace the original convolutional layer with a sequence of four convolutional layers with small kernels. After such replacement, the entire network is fine-tuned on the training data using standard backpropagation process.
We evaluate this approach on two CNNs and show that it is competitive with previous approaches, leading to higher obtained CPU speedups at the cost of lower accuracy drops for the smaller of the two networks. Thus, for the 36-class character classification CNN, our approach obtains an 8.5x CPU speedup of the whole network with only a minor accuracy drop (1% from 91% to 90%). For the standard ImageNet architecture (AlexNet), the approach speeds up the second convolution layer by a factor of 4x at the cost of a $1\%$ increase of the overall top-5 classification error.
Submitted 24 April, 2015; v1 submitted 19 December, 2014;
originally announced December 2014.
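A back-of-the-envelope sketch of why a rank-$R$ CP-decomposition shrinks a convolution layer. It assumes a $d \times d$ kernel with $S$ input and $T$ output channels and that each of the four tensor modes gets its own factor matrix, matching the four small layers mentioned in the abstract; the layer sizes below are hypothetical and are not taken from the paper.

```python
def full_conv_params(d, s, t):
    """Weights in a dense d x d convolution with s input and t output channels."""
    return d * d * s * t


def cp_conv_params(d, s, t, rank):
    """Weights in a rank-R CP factorization of the d x d x s x t kernel.

    One factor matrix per mode: s x R (input channels), d x R and d x R
    (the two spatial directions), and t x R (output channels).
    """
    return rank * (s + d + d + t)


# Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels, CP rank 32.
full = full_conv_params(3, 64, 128)
compressed = cp_conv_params(3, 64, 128, rank=32)
print(full, compressed, full / compressed)
```

Fewer parameters do not translate one-to-one into wall-clock speedup, so the ratio printed here should be read only as a rough indicator.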
-
Fast multidimensional convolution in low-rank formats via cross approximation
Authors:
M. V. Rakhuba,
I. V. Oseledets
Abstract:
We propose a new cross-conv algorithm for approximate computation of convolution in different low-rank tensor formats (tensor train, Tucker, Hierarchical Tucker). It has better complexity with respect to the tensor rank than previous approaches. The new algorithm has a high potential impact in different applications. The key idea is based on applying cross approximation in the "frequency domain", where convolution becomes a simple elementwise product. We illustrate efficiency of our algorithm by computing the three-dimensional Newton potential and by presenting preliminary results for solution of the Hartree-Fock equation on tensor-product grids.
Submitted 6 September, 2016; v1 submitted 23 February, 2014;
originally announced February 2014.