An SVD-like Decomposition of Bounded-Input Bounded-Output Functions

Brian Charles Brown¹, Michael King¹, Sean Warnick¹, Enoch Yeung^2,3, David Grimsman¹ 1 - Department of Computer Science, Brigham Young University, UT, 84602; 2 - Department of Mechanical Engineering, University of California, Santa Barbara, CA 93106; 3- Department of Bioengineering, University of California, Santa Barbara, CA 93106. This work was funded by DOE Grant #SC0021693, and has been submitted to the 2024 Conference on Decision and Control. Correspondence should be addressed to Brian Brown at bcbrown365@gmail.com.

Abstract

The Singular Value Decomposition (SVD) of linear functions facilitates the calculation of their 2-induced norm and row and null spaces, hallmarks of linear control theory. In this work, we present a function representation that, similar to SVD, provides an upper bound on the 2-induced norm of bounded-input bounded-output functions, as well as facilitates the computation of generalizations of the notions of row and null spaces. Borrowing from the notion of “lifting” in Koopman operator theory, we construct a finite-dimensional lifting of inputs that relaxes the unitary property of the right-most matrix in traditional SVD, $V^{*}$ , to be an injective, norm-preserving mapping to a slightly higher-dimensional space.

I Introduction

Decomposing a function $f:\mathbb{R}^{n}\to\mathbb{R}^{p}$ into smaller or more manageable terms can often lead to valuable insights and tools to analyze $f$ [23]. For instance, $f$ can be represented as the weighted sum of sinusoids in the Fourier series, allowing one to identify which frequencies are most significant in computing the output of $f$ [8]. Other decomposition methods include using radial basis functions [5], wavelets [16], or polynomials [2], each giving a different perspective on $f$ .

In the case where $f$ is a linear function of finite-dimensional inputs and outputs, then one can use the well-known singular value decomposition (SVD) to identify inputs whose output has a maximum (or minimum) increase in magnitude [7]. The $LU$ decomposition and the $QR$ decomposition give insight into $f^{-1}$ , in other words, how to solve the set of linear equations defined by $f$ and some output [10]. When $f$ represents a linear dynamical system $\dot{x}=f(x)$ ( $f$ maps $\mathbb{R}^{n}$ to itself) then methods such as Jordan decomposition [3] or Schur decomposition [10] can be used to identify stability and other properties of the system.

When $f$ is a linear functional, i.e. is a linear map from $\mathbb{R}^{n}$ to $\mathbb{R}$ , the Riesz Representation Theorem implies that $f$ can be associated with a unique vector $u\in\mathbb{R}^{n}$ such that $f(x)=\langle u,x\rangle$ for any $x\in\mathbb{R}^{n}$ [1]. This seminal result shows that each linear functional can be represented by an element of its domain, which has widespread benefits in computational physics, for instance. Stacking these functionals leads to a similar result, or decomposition, for linear functions mapping $\mathbb{R}^{n}$ to $\mathbb{R}^{p}$ .

Another type of decomposition from which this work draws inspiration is that of Koopman. For a nonlinear dynamical system, the states can be “lifted” to a higher (potentially infinite) dimension, whereby the dynamics of the system are precisely described by the linear Koopman Operator [4]. Much recent work has been devoted to advancing Koopman Operator Theory (see, for instance [22, 13, 15, 21]), including a recent trend to impose that liftings either be invertible [14, 11] or state-inclusive [12]. One key advantage of using the Koopman Operator is that it allows one to use well-known and well-understood tools for analyzing linear dynamical systems, such as the ones described above, in the context of nonlinear systems. While the focus on dynamical systems has generally limited the analysis to functions that map $\mathbb{R}^{n}$ to itself, our goal in this work is to study more general functions that map $\mathbb{R}^{n}$ to $\mathbb{R}^{p}$ .

The main contribution of this paper is a novel decomposition for an arbitrary bounded-input bounded-output function $f:\mathbb{R}^{n}\to\mathbb{R}^{p}$ , i.e. one satisfying $\|f(x)\|_{2}\leq c\|x\|_{2}$ for all $x\in\mathbb{R}^{n}$ and for some finite $c\in\mathbb{R}^{+}$ . The function $f$ is decomposed into two parts: a linear part and a norm-preserving injective nonlinear part, as stated precisely in Theorem 1. The primary benefit of this decomposition is that tools used for analyzing linear functions, such as SVD, can be adapted to analyze $f$ . Indeed, Theorem 1 shows that our decomposition is a generalization of the SVD to a large class of nonlinear functions.

We note that other work in the literature has the goal to generalize the SVD. For instance, both [20] and [9] develop such ideas, but still restricted to linear functions. The work in [6] is aimed at using a generalized SVD for nonlinear dynamical systems, but only to build observers; the generalization is that one decomposes two matrices instead of one. The works in [18] and [19] address the scenario where only a finite number of observations are known about $f$ , with the goal to identify $f$ by augmenting the data matrix with columns that are functions (or observables) of the original data. While our approach is somewhat similar, the goal of this work is different in that we seek a representation of the function itself. Furthermore, we believe we are unique in enforcing that our observables are norm-preserving, a key to ensuring that the linear part of the decomposition is as descriptive as possible (see Remark 2 after Theorem 1).

I-A Notation

Per notation common in Koopman operator theory, for some linear mapping $K$ and a potentially nonlinear mapping, $g$ , we use $(K\circ g)(x)$ to represent the composition of $K$ with $g$ . However, if $K$ and $g$ are finite-dimensional, for example $K\in\mathbb{R}^{p\times m}$ and $g(x)\in\mathbb{R}^{m}$ , then we define $(K\circ g)(x)=Kg(x)$ , i.e. traditional matrix-vector multiplication. We use $V^{*}$ to denote the Hermitian transpose of some matrix $V$ . The matrices $U$ , $\Sigma$ , and $V^{*}$ will always represent the matrices of the singular value decomposition of some matrix $K$ , such that $K=U\Sigma V^{*}$ , with $U\in\mathbb{R}^{p\times p}$ , a unitary (and therefore norm-preserving, injective, and surjective) matrix, $V^{*}\in\mathbb{R}^{m\times m}$ , another unitary matrix, and $\Sigma\in\mathbb{R}^{p\times m}$ , a real, non-negative, rectangular-diagonal matrix whose $i$ -th diagonal element is $\sigma_{i}$ , with $\sigma_{i}\geq\sigma_{j}\geq 0$ for all $i<j$ . Calligraphic letters, e.g. $\mathcal{X}$ , are always sets, and $|\mathcal{X}|\in\mathbb{Z}$ is the cardinality of $\mathcal{X}$ .

For some mapping $f:\mathbb{R}^{n}\to\mathbb{R}^{p}$ , $f_{i}:\mathbb{R}^{n}\to\mathbb{R}$ is the $i$ -th component functional of $f$ . For brevity, we will denote $f(x)$ as $f$ and $f_{i}(x)$ as $f_{i}$ when $x$ is arbitrary. We denote the 2-induced norm of $f$ as $\|f\|_{2-2}=\sup_{x}\frac{\|f(x)\|_{2}}{\|x\|_{2}}$ , which for a linear $f$ , is given by the maximum singular value, $\sigma_{1}$ , of the matrix representation of $f$ . For some mapping $v:\mathbb{R}^{n}\to\mathbb{R}^{m}$ , then for some particular $x$ , $\|v(x)\|_{2}$ represents the 2-norm of the vector $v(x)\in\mathbb{R}^{m}$ .
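As a quick numerical illustration of this notation, the following sketch computes the SVD of a linear map with NumPy and checks that sampled gains $\|Kx\|_{2}/\|x\|_{2}$ never exceed $\sigma_{1}$ . The matrix `K` and the sample count are arbitrary choices made for illustration and do not come from the paper.

```python
import numpy as np

# Arbitrary example matrix (an assumption for illustration), K in R^{2x3}.
K = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, -1.0]])

# full_matrices=True returns U (p x p), the singular values, and V* (m x m).
U, s, Vh = np.linalg.svd(K, full_matrices=True)

# Rebuild the rectangular-diagonal Sigma in R^{p x m}.
Sigma = np.zeros(K.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# Sampled estimate of sup ||Kx|| / ||x||; it can approach but not exceed sigma_1.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10000))
ratios = np.linalg.norm(K @ X, axis=0) / np.linalg.norm(X, axis=0)
sigma_1 = s[0]
max_ratio = ratios.max()

# The first right singular vector (first row of V*) attains the gain sigma_1.
gain_at_v1 = np.linalg.norm(K @ Vh[0])
```

Random sampling only gives a lower estimate of the induced norm, whereas the SVD gives it exactly for linear maps.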

We use $\mathbf{1}_{p}$ to represent a vector of all ones of dimension $p$ . Similarly, $e_{i}$ is a vector of all zeros except a 1 at the $i$ -th index. Occasionally, we will need to refer to the $i$ -th element of a particular vector, $x_{j}$ . In this case, we will use $x_{j,i}$ to denote the $i$ -th element of the $j$ -th vector $x$ .

We will also use element-wise operations on vectors. For some $\sigma\in\mathbb{R}^{p}$ , we define $\sigma^{-1}$ , and $\sigma\odot\sigma$ , and $\sigma^{2}$ as follows:

\sigma^{-1}=\begin{bmatrix}\frac{1}{\sigma_{1}}\\ \frac{1}{\sigma_{2}}\\ \vdots\\ \frac{1}{\sigma_{p}}\end{bmatrix},\quad\sigma\odot\sigma=\sigma^{2}=\begin{bmatrix}\sigma_{1}^{2}\\ \sigma_{2}^{2}\\ \vdots\\ \sigma_{p}^{2}\end{bmatrix}

II Representation of Vector-Valued BIBO Functions with Norm-Preserving Transformation

Theorem 1.

Let $f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{p}$ be an arbitrary bounded-input bounded-output function. Then there exists a unitary matrix, $U\in\mathbb{R}^{p\times p}$ , a real, non-negative, rectangular-diagonal matrix $\Sigma\in\mathbb{R}^{p\times m}$ , and a norm-preserving, injective mapping, $v:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}$ , with $m\geq p+n$ , such that:

f(x)=U\Sigma v(x),\quad\text{for all }x\in\mathbb{R}^{n}

(1)

Proof.

Let $m=n+p$ , $\delta:\mathbb{R}^{n}\to\mathbb{R}^{p}$ , $x_{\delta}:=\begin{bmatrix}\delta(x)\\ x\end{bmatrix}$ , and $v:\mathbb{R}^{n}\to\mathbb{R}^{m}$ given by:

v(x):=\frac{\|x\|_{2}}{\|x_{\delta}\|_{2}}x_{\delta}.

(2)

Notice that for any well-defined function $\delta$ , $v$ is both norm-preserving, i.e. $\|v(x)\|_{2}=\|x\|_{2},\;\forall x\in\mathbb{R}^{n}$ , and injective, i.e. for all $x_{1}\neq x_{2}\in\mathbb{R}^{n},v(x_{1})\neq v(x_{2})$ . Choosing the appropriate function $\delta$ and a corresponding real, non-negative, rectangular-diagonal matrix:

\Sigma:=\left[\begin{array}{cccc|c}\sigma_{1}&0&\dots&0&\mathbf{0}\\ 0&\sigma_{2}&\dots&0&\mathbf{0}\\ \vdots&&\ddots&\vdots&\vdots\\ 0&0&\dots&\sigma_{p}&\mathbf{0}\end{array}\right]\in\mathbb{R}^{p\times m}

(7)

with (admissible) $\sigma_{i}\geq\sigma_{j}\geq 0$ for $j>i$ , to satisfy Equation (1), is the key to the proof.
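The norm-preserving and injectivity claims for $v$ in Equation (2) can be checked numerically for any choice of $\delta$ . The sketch below is a minimal illustration under stated assumptions: the particular `delta` is hypothetical, the value of `lift` at the origin is our own convention, and `unlift` is a helper we introduce to exhibit a left inverse (the last $n$ entries of $v(x)$ are a positive multiple of $x$ , and $\|v(x)\|_{2}$ supplies the scale).

```python
import numpy as np

def lift(x, delta):
    """v(x) = (||x|| / ||x_delta||) * x_delta with x_delta = [delta(x); x] (Eq. 2)."""
    x = np.asarray(x, dtype=float)
    x_delta = np.concatenate([np.atleast_1d(delta(x)), x])
    norm_x = np.linalg.norm(x)
    if norm_x == 0.0:
        return np.zeros_like(x_delta)  # convention at the origin (our assumption)
    return (norm_x / np.linalg.norm(x_delta)) * x_delta

def unlift(v, n):
    """Left inverse of lift: rescale the last n entries so their norm equals ||v||."""
    tail = v[-n:]
    norm_tail = np.linalg.norm(tail)
    if norm_tail == 0.0:
        return np.zeros(n)
    return (np.linalg.norm(v) / norm_tail) * tail

def delta(x):
    """Hypothetical delta: R^2 -> R^3, chosen only for illustration."""
    return np.array([np.sin(x[0]), x[0] * x[1], np.cos(x[1])])

x = np.array([0.7, -1.3])
v = lift(x, delta)          # element of R^5
x_rec = unlift(v, n=2)      # recovers x, which is why v is injective
```

Because the left inverse never uses $\delta$ , injectivity holds regardless of how $\delta$ is later chosen to match $f$ .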

In order to ensure that $\sigma_{i}\geq\sigma_{j}\geq 0$ for all $i<j$ , let $f_{q(i)}$ denote the component functional of $f$ with the $i$ -th placement in the relative ranking of the 2-induced norms of the component functionals of $f$ . For example, if $\|f_{j}\|_{2-2}$ were the smallest among all $\|f_{i}\|_{2-2},\;i=1,2,\dots,p$ , then $f_{q(p)}=f_{j}$ , whereas if $\|f_{l}\|_{2-2}$ were the largest, then $f_{q(1)}=f_{l}$ . Let $f_{q}$ represent $f$ with its component functionals re-ordered from largest to smallest induced norm, such that the first component of $f_{q}$ is $f_{q(1)}$ , etc. Let $\Sigma$ be defined given some values $\sigma_{i}$ , for $i=1,2,\dots,p$ , such that:

\sum_{i=1}^{p}\frac{\|f_{q(i)}\|_{2-2}^{2}}{\sigma_{i}^{2}}<1,

(8)

and consider $d_{i}$ given by:

\displaystyle d_{i}:=\frac{\sigma_{i}^{2}\|x\|_{2}^{2}}{f_{q(i)}(x)^{2}}-1,

(9)

and a matrix, $A\in\mathbb{R}^{p\times p}$ , such that:

A:=\begin{bmatrix}d_{1}&-1&\dots&-1\\ -1&d_{2}&\dots&-1\\ \vdots&&\ddots&\vdots\\ -1&\dots&&d_{p}\end{bmatrix}.

(10)

We claim that, for some choice of admissible $\sigma$ , there is a $\delta$ satisfying $A\delta^{2}=\|x\|_{2}^{2}\mathbf{1}_{p}$ , and that this equality implies $f(x)=U\Sigma v(x)$ for an appropriate choice of unitary $U$ . To check the implication, we expand the $i$ -th row of $A\delta^{2}=\|x\|_{2}^{2}\mathbf{1}_{p}$ and multiply through by $\frac{f_{q(i)}^{2}}{\sigma_{i}^{2}\|x\|_{2}^{2}}$ , to find:

\begin{aligned}
\left(\frac{\sigma_{i}^{2}\|x\|_{2}^{2}}{f_{q(i)}^{2}}-1\right)\delta_{i}^{2}-\sum_{j\neq i}\delta_{j}^{2}&=\|x\|_{2}^{2},\\
\left(1-\frac{f_{q(i)}^{2}}{\sigma_{i}^{2}\|x\|_{2}^{2}}\right)\delta_{i}^{2}-\frac{f_{q(i)}^{2}}{\sigma_{i}^{2}\|x\|_{2}^{2}}\sum_{j\neq i}\delta_{j}^{2}&=\frac{f_{q(i)}^{2}}{\sigma_{i}^{2}},\\
\delta_{i}^{2}-\left(\frac{\sigma_{i}^{-2}f_{q(i)}^{2}\delta_{i}^{2}}{\|x\|_{2}^{2}}+\frac{\sigma_{i}^{-2}f_{q(i)}^{2}\sum_{j\neq i}\delta_{j}^{2}}{\|x\|_{2}^{2}}\right)&=\sigma_{i}^{-2}f_{q(i)}^{2},\\
\delta_{i}^{2}-\sigma_{i}^{-2}f_{q(i)}^{2}\frac{\left(\delta_{i}^{2}+\sum_{j\neq i}\delta_{j}^{2}\right)}{\|x\|_{2}^{2}}&=\sigma_{i}^{-2}f_{q(i)}^{2}.
\end{aligned}

(11)

Stacking and using element-wise operations yields:

\begin{aligned}
\delta^{2}-f_{q}^{2}\odot\sigma^{-2}\frac{\|\delta\|_{2}^{2}}{\|x\|_{2}^{2}}&=f_{q}^{2}\odot\sigma^{-2}\implies\\
\delta^{2}&=f_{q}^{2}\odot\sigma^{-2}\frac{\|x\|_{2}^{2}}{\|x\|_{2}^{2}}+f_{q}^{2}\odot\sigma^{-2}\frac{\|\delta\|_{2}^{2}}{\|x\|_{2}^{2}},\\
&=f_{q}^{2}\odot\sigma^{-2}\frac{\left(\sum_{i=1}^{n}x_{i}^{2}\right)+\left(\sum_{i=1}^{p}\delta_{i}^{2}\right)}{\|x\|_{2}^{2}},\\
&=f_{q}^{2}\odot\sigma^{-2}\frac{\|x_{\delta}\|_{2}^{2}}{\|x\|_{2}^{2}}\implies\\
\delta&=f_{q}\odot\sigma^{-1}\frac{\|x_{\delta}\|_{2}}{\|x\|_{2}},\\
\sigma\odot\delta&=f_{q}\frac{\|x_{\delta}\|_{2}}{\|x\|_{2}},\\
\sigma\odot\delta\frac{\|x\|_{2}}{\|x_{\delta}\|_{2}}&=f_{q}(x),\\
\Sigma v(x)&=f_{q}(x).
\end{aligned}

(12)

It now remains to identify the $\delta$ that satisfies $A\delta^{2}=\|x\|^{2}\bf{1}$ , which reduces to finding the inverse of $A$ and ensuring that $\delta^{2}$ is positive so that $\delta$ is real-valued. We will first address the inverse of $A$ by using the Woodbury matrix identity. Define $\tilde{A}:=\text{Diag}(A)+I$ , $\tilde{B}:=\mathbf{1}_{p}$ , a column vector of $p$ ones, $\tilde{C}:=-1$ , $\tilde{D}:=\mathbf{1}_{p}^{T}$ , a row vector of $p$ ones, and $\gamma:=-1+\frac{1}{d_{1}+1}+\frac{1}{d_{2}+1}+\dots+\frac{1}{d_{p}+1}$ . We thus have:

\begin{aligned}
A^{-1}&=(\tilde{A}+\tilde{B}\tilde{C}\tilde{D})^{-1},\\
&=\tilde{A}^{-1}-\tilde{A}^{-1}\tilde{B}(\tilde{C}^{-1}+\tilde{D}\tilde{A}^{-1}\tilde{B})^{-1}\tilde{D}\tilde{A}^{-1},\\
&=\tilde{A}^{-1}-\tilde{A}^{-1}\mathbf{1}_{p}(-1+\mathbf{1}_{p}^{T}\tilde{A}^{-1}\mathbf{1}_{p})^{-1}\mathbf{1}_{p}^{T}\tilde{A}^{-1},\\
&=\tilde{A}^{-1}-\tilde{A}^{-1}\mathbf{1}_{p}\frac{1}{\text{tr}(\tilde{A}^{-1})-1}\mathbf{1}_{p}^{T}\tilde{A}^{-1},\\
&=\begin{bmatrix}\frac{1}{d_{1}+1}&0&\dots&0\\ 0&\frac{1}{d_{2}+1}&&\vdots\\ \vdots&&\ddots&\\ 0&\dots&&\frac{1}{d_{p}+1}\end{bmatrix}-\frac{1}{\gamma}\begin{bmatrix}\frac{1}{d_{1}+1}\\ \vdots\\ \frac{1}{d_{p}+1}\end{bmatrix}\begin{bmatrix}\frac{1}{d_{1}+1}&\dots&\frac{1}{d_{p}+1}\end{bmatrix}.
\end{aligned}
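The closed-form inverse obtained from the Woodbury identity can be sanity-checked numerically. The sketch below assumes hypothetical values of $d_{i}$ (drawn so that $\gamma<0$ ) rather than values produced by any particular $f$ . It also checks a simplification that follows from $\gamma+1=\sum_{j}\frac{1}{d_{j}+1}$ : the $i$ -th row sum of $A^{-1}$ equals $\frac{-1}{(d_{i}+1)\gamma}$ .

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
# Hypothetical d_i; with d_i in [5, 10] the sum of 1/(d_i + 1) stays below 1.
d = rng.uniform(5.0, 10.0, size=p)

# A has d_i on the diagonal and -1 everywhere else (Eq. 10).
A = np.diag(d + 1.0) - np.ones((p, p))

inv_diag = 1.0 / (d + 1.0)            # diagonal of A-tilde^{-1}
gamma = -1.0 + inv_diag.sum()

# Diagonal term minus the rank-one correction scaled by 1 / gamma.
A_inv = np.diag(inv_diag) - np.outer(inv_diag, inv_diag) / gamma

# A^{-1} applied to the all-ones vector, i.e. delta^2 / ||x||^2.
row_sums = A_inv @ np.ones(p)
```

With $\gamma<0$ every row sum is positive, which is exactly the sign condition needed for a real-valued $\delta$ .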

Element-wise, this yields

\delta_{i}^{2}=\|x\|_{2}^{2}\left(\frac{1}{d_{i}+1}+\sum_{j=1}^{p}\frac{-1}{(d_{i}+1)(d_{j}+1)\gamma}\right)\implies

(13)

\delta_{i}=\textrm{sgn}(f_{q(i)}(x))\|x\|_{2}\left(\frac{1}{d_{i}+1}+\sum_{j=1}^{p}\frac{-1}{(d_{i}+1)(d_{j}+1)\gamma}\right)^{\frac{1}{2}}.

(14)

We now see that in order for $\delta$ to be real-valued, the expression $\left(\frac{1}{d_{i}+1}+\sum_{j=1}^{p}\frac{-1}{(d_{i}+1)(d_{j}+1)\gamma}\right)$ must be positive.

\begin{aligned}
&\delta_{i}^{2}>0\implies\\
&\|x\|_{2}^{2}\left(\frac{1}{d_{i}+1}+\sum_{j=1}^{p}\frac{-1}{(d_{i}+1)(d_{j}+1)\gamma}\right)>0\implies\\
&\frac{1}{d_{i}+1}>\sum_{j=1}^{p}\frac{1}{(d_{i}+1)(d_{j}+1)\gamma}\implies\\
&1>\sum_{j=1}^{p}\frac{1}{(d_{j}+1)\gamma}
\end{aligned}

(15)

We postulate that $\gamma$ must be negative in order for (15) to be true. Suppose for contradiction that $\gamma>0$ . This leads to:

\begin{aligned}
&\gamma>\sum_{j=1}^{p}\frac{1}{d_{j}+1}\implies\\
&-1+\sum_{j=1}^{p}\frac{1}{d_{j}+1}>\sum_{j=1}^{p}\frac{1}{d_{j}+1}\implies\\
&-1>0,
\end{aligned}

which is a contradiction. Therefore, $\gamma$ must be negative so that the direction of the inequality switches when multiplying by $\gamma$ . We thus have:

\begin{aligned}
&1>\sum_{j=1}^{p}\frac{1}{(d_{j}+1)\gamma}\implies\\
&\gamma<\sum_{j=1}^{p}\frac{1}{d_{j}+1}\implies\\
&-1+\sum_{j=1}^{p}\frac{1}{d_{j}+1}<\sum_{j=1}^{p}\frac{1}{d_{j}+1}\implies\\
&-1<0.
\end{aligned}

Now we need to understand the conditions under which $\gamma$ could be less than zero. The only free parameters in $\gamma$ are the values of $\sigma$ :

\begin{aligned}
\gamma=-1+\sum_{j=1}^{p}\frac{1}{d_{j}+1}&<0\implies\\
\sum_{j=1}^{p}\frac{1}{d_{j}+1}&<1,\\
\sum_{j=1}^{p}\frac{1}{\frac{\sigma_{j}^{2}\|x\|_{2}^{2}}{f_{q(j)}(x)^{2}}}&<1,\quad\forall x\in\mathbb{R}^{n},\\
\sum_{j=1}^{p}\frac{f_{q(j)}(x)^{2}}{\sigma_{j}^{2}\|x\|_{2}^{2}}&<1,\quad\forall x\in\mathbb{R}^{n}.
\end{aligned}

However, by the definition of the 2-induced norm of $f_{q(j)}$ , $\frac{|f_{q(j)}(x)|}{\|x\|_{2}}\leq\|f_{q(j)}\|_{2-2},\;\forall x\in\mathbb{R}^{n}$ . Therefore, using the least upper bound property of the 2-induced norm, the inequality above holds for all $x$ whenever

\sum_{j=1}^{p}\frac{\|f_{q(j)}\|_{2-2}^{2}}{\sigma_{j}^{2}}<1.

(16)

Thus, $\sigma$ may be chosen to be large enough such that $\gamma<0$ , and when $\gamma<0$ , we have shown that $\delta^{2}$ is always positive and thus $\delta$ is real-valued. In addition, $\sigma$ may be easily chosen such that $\sigma_{i}\geq\sigma_{j}\geq 0$ for all $i<j$ .

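Furthermore, since we have shown that $\sigma$ chosen in Equation (8) generates a real-valued $\delta$ , $\delta$ is well-defined. From Equation (12), if the components of $f$ are already ordered by induced norm, so that $f_{q}=f$ , we may take $U=I\in\mathbb{R}^{p\times p}$ (which is unitary); otherwise we take $U$ to be the permutation matrix, also unitary, that undoes the re-ordering $q$ . In either case: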

f(x)=U\Sigma v(x),\;\forall x\in\mathbb{R}^{n},

which completes the proof. ∎
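The construction in the proof can be exercised end to end on a small example. The sketch below is an illustration under stated assumptions, not the authors' implementation: the map `f` is hypothetical, its component induced norms are supplied by hand, and $\sigma$ is taken proportional to those norms with a 10% margin so that Equation (8) holds. The code uses $\frac{1}{d_{i}+1}=\frac{f_{q(i)}(x)^{2}}{\sigma_{i}^{2}\|x\|_{2}^{2}}$ directly, which avoids dividing by $f_{q(i)}(x)$ when it vanishes.

```python
import numpy as np

def f(x):
    """Hypothetical BIBO map R^2 -> R^2 with |f_1| <= ||x|| and |f_2| <= 0.5 ||x||."""
    r = np.linalg.norm(x)
    return np.array([r * np.sin(x[0]), 0.5 * r * np.cos(x[1])])

# Component induced norms, already ordered from largest to smallest, so q is the identity.
comp_norms = np.array([1.0, 0.5])
# Scaling by 1.1 * sqrt(p) makes the sum in Eq. (8) equal 1 / 1.21 < 1.
sigma = 1.1 * np.sqrt(len(comp_norms)) * comp_norms

def delta(x):
    """delta from Eq. (14), written in terms of 1 / (d_i + 1)."""
    fx = f(x)
    inv = fx**2 / (sigma**2 * np.dot(x, x))    # 1 / (d_i + 1)
    gamma = -1.0 + inv.sum()
    bracket = inv - inv * inv.sum() / gamma    # term in parentheses in Eq. (13)
    return np.sign(fx) * np.linalg.norm(x) * np.sqrt(bracket)

def v(x):
    """Norm-preserving lifting of Eq. (2)."""
    x_delta = np.concatenate([delta(x), x])
    return (np.linalg.norm(x) / np.linalg.norm(x_delta)) * x_delta

p, n = 2, 2
Sigma = np.hstack([np.diag(sigma), np.zeros((p, n))])   # p x (n + p), Eq. (7)

rng = np.random.default_rng(2)
samples = rng.uniform(-5.0, 5.0, size=(200, n))
max_err = max(np.linalg.norm(Sigma @ v(x) - f(x)) for x in samples)
max_norm_gap = max(abs(np.linalg.norm(v(x)) - np.linalg.norm(x)) for x in samples)
```

Both `max_err` and `max_norm_gap` should sit at the level of floating-point round-off, reflecting $f(x)=\Sigma v(x)$ with $U=I$ and $\|v(x)\|_{2}=\|x\|_{2}$ on the sampled inputs.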

Figure 1: An example of a norm-preserving mapping: The unit disc ${\cal H}$ in $\mathbb{R}^{2}$ is depicted in blue. Two norm-preserving mappings $g_{1}:\mathcal{H}\to{\cal G}_{1}$ (black and dashed) and $g_{2}:\mathcal{H}\to{\cal G}_{2}$ (orange and solid) are shown. Both are liftings for distinct hypothetical functions, $f_{1}(x)=K\circ g_{1}(x)$ and $f_{2}(x)=K\circ g_{2}(x)$ (per the notation in Remark 1). Since $f_{1}$ and $f_{2}$ are functionals, $K$ can only have one non-zero singular value. The right singular vector corresponding to this non-zero singular value is shown in red and is denoted as $v^{*}_{1}$ . Note then that $g_{2}$ would correspond to a function $f$ that stretches all elements of $\mathcal{H}$ uniformly.

Remark 1.

In the proof, $U\in\mathbb{R}^{p\times p}$ was chosen to be identity and we had $f(x)=U\Sigma v(x)$ . However, $U$ could be selected to be an arbitrary real, unitary matrix, and an additional, real, unitary matrix $V^{*}\in\mathbb{R}^{m\times m}$ could be chosen as well as an injective, norm-preserving lifting, $g$ , similar to the lifting given in the construction, or perhaps learned from data, such that $v(x)=V^{*}\circ g(x)$ . Combined with an appropriate $\Sigma$ , this gives the singular value decomposition of some matrix $K\in\mathbb{R}^{p\times m}$ , composed with a lifting such that:

f(x)=U\Sigma(V^{*}\circ g)(x)=K\circ g(x),

(17)

where $f$ is a bounded-input bounded-output function.

Remark 2.

The requirement that $\|v(x)\|_{2}=\|x\|_{2}$ is essential for identifying a meaningful $\Sigma$ . For example, if $v(x)$ were not norm-preserving (or injective), then a trivial solution would always exist where $v(x)=f(x)$ , and $\Sigma=I$ such that $f(x)=If(x)$ . Instead, the provided constraints require that a set of basis functions be identified that simultaneously 1) are collectively a norm-preserving map of the inputs into an alternate space, and 2) span the image of $f(x)$ . The existence of such a finite-dimensional mapping is not immediately obvious and is the main contribution of this theorem.

Remark 3.

Recall that the Riesz Representation Theorem [17] focuses on offering a representation of the form $f(x)=\langle k,x\rangle$ for linear functionals of $x\in\mathbb{R}^{n}$ , with $k$ , a characterizing vector of the functional, also in $\mathbb{R}^{n}$ . Theorem 1 extends that representation to bounded-input bounded-output functionals, since Equation (17) can be re-written for functionals $f:\mathbb{R}^{n}\to\mathbb{R}$ (using the notation from Remark 1) as:

f(x)=\langle k,g(x)\rangle.

(18)

In the case of linear functionals, $g:\mathbb{R}^{n}\to\mathbb{R}^{n}$ is the identity mapping, i.e. $g(x)=x$ for all $x\in\mathbb{R}^{n}$ . In this way, the given theorem provides a generalization of the Riesz representation to bounded-input bounded-output functionals. The norm-preserving and injectivity properties of $g$ may render this extension useful for future work in nonlinear optimization.

Note that Theorem 1 states that $v(x)$ can always be chosen to be injective. This is a convenience that facilitates computing sets of inputs that are the natural relaxations of the null space and row space of linear functions. To see this, note that the rows of $V^{*}$ (i.e. the right singular vectors of $K$ ), as defined in Remark 1, define a basis for the null space and row space of $K$ , depending on whether their associated singular value is zero or non-zero, respectively. If $g^{-L}$ is the left-inverse of $g$ , and $\mathcal{Y}$ is the intersection of the image of $g$ with the null space of $K$ , then $\{g^{-L}(y)|\;y\in\mathcal{Y}\}$ is the appropriate relaxation of the null space of $f$ .

The injectivity of $v(x)$ is also a way of ensuring that all the nonlinear portions of the computations in $f$ remain reversible until the final linear computation. Furthermore, if $\sigma_{i}=0$ , then $v_{i}(x)$ represents information about $x$ that is lost during the computation of $f(x)$ . Conversely, $v_{1}(x)$ represents information about $x$ that most strongly contributes to the 2-norm of $f(x)$ .

Note that in the construction, the function $v$ maps from a lower dimensional space into a higher-dimensional space (sometimes referred to as a “lifting”). This is important for maintaining injectivity without making $v(x)$ difficult to compute. However, it is only necessary that $v$ map $\mathbb{R}^{n}\to\mathbb{R}^{p+1}$ , as demonstrated in Lemma (1) in the Appendix. Furthermore, if $p<n-2$ , then $v(x)$ need not be a “lifting” at all, but rather a mapping more aptly called a “lowering.” We leave the detailing of such mappings to future work, noting that they may likely be harder to compute. In contrast, the construction given in the proof is easily computed, as will be demonstrated in Section III.

II-A Bounds on Induced Norms

There are several reasons why bounded-input bounded-output functions are a natural extension of linear functions:

• All linear functions are bounded-input bounded-output functions.
• All bounded-input bounded-output functions can be bounded by a linear envelope.

This linear envelope is intricately connected to the $\sigma$ given in the construction:

Corollary 1.

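Let $f:\mathbb{R}^{n}\to\mathbb{R}^{p}$ be a bounded-input bounded-output function. Then the construction in the proof of Theorem 1 gives a representation

f(x)=U\Sigma v(x),\quad\forall x\in\mathbb{R}^{n},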

and $\sigma_{1}$ , the maximum entry in $\Sigma$ , is an upper bound on the 2-induced norm of $f$ , i.e.

\|f\|_{2-2}\|x\|_{2}<\|x\|_{2}\sigma_{1},\quad\forall x\in\mathbb{R}^{n}

(19)

Proof.

\begin{aligned}
\|f\|_{2-2}&=\sup_{x}\frac{\|f(x)\|_{2}}{\|x\|_{2}}\\
&=\sup_{x}\frac{\|U\Sigma v(x)\|_{2}}{\|x\|_{2}}\\
&\leq\sup_{x}\frac{\|U\|_{2-2}\|\Sigma\|_{2-2}\|v(x)\|_{2}}{\|x\|_{2}}\\
&=\sup_{x}\frac{\|\Sigma\|_{2-2}\|x\|_{2}}{\|x\|_{2}}\\
&=\|\Sigma\|_{2-2}
\end{aligned}

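By the constraint in Equation (8) in the construction, $\sigma_{i}>\|f_{q(i)}\|_{2-2}$ for all $i=1,2,\dots,p$ . Moreover, since $\sigma_{1}\geq\sigma_{i}$ for all $i$ , we have $\frac{\|f(x)\|_{2}^{2}}{\sigma_{1}^{2}\|x\|_{2}^{2}}\leq\sum_{i=1}^{p}\frac{f_{q(i)}(x)^{2}}{\sigma_{i}^{2}\|x\|_{2}^{2}}\leq\sum_{i=1}^{p}\frac{\|f_{q(i)}\|_{2-2}^{2}}{\sigma_{i}^{2}}<1$ for all $x\neq 0$ . This makes the inequality strict, i.e. $\|f\|_{2-2}<\sigma_{1}$ . ∎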

Thus, $\sigma$ becomes more meaningful when it is minimized during the construction of the function representation.
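To make the role of the choice of $\sigma$ concrete, the following sketch compares two admissible choices for one hypothetical map and checks that a sampled estimate of the induced norm stays below $\sigma_{1}$ in both cases. The map, the two $\sigma$ vectors, and the helper `admissible` are all assumptions introduced for illustration.

```python
import numpy as np

def gain(x):
    """||f(x)|| / ||x|| for the hypothetical map f(x) = ||x|| [sin(x_1), 0.5 cos(x_2)]."""
    return np.hypot(np.sin(x[..., 0]), 0.5 * np.cos(x[..., 1]))

comp_norms = np.array([1.0, 0.5])   # induced norms of the two component functionals

def admissible(sigma):
    """Eq. (8) together with the ordering sigma_1 >= sigma_2 >= 0."""
    return bool(np.sum(comp_norms**2 / sigma**2) < 1 and np.all(np.diff(sigma) <= 0))

sigma_loose = 1.5 * np.sqrt(2) * comp_norms   # proportional choice with a 50% margin
sigma_tight = np.array([1.2, 0.95])           # hand-picked, closer to the boundary of Eq. (8)

rng = np.random.default_rng(3)
X = rng.uniform(-10.0, 10.0, size=(100000, 2))
sampled_norm = gain(X).max()                  # lower estimate of ||f||_{2-2}
```

Both choices bound the sampled gain, but `sigma_tight[0]` is the more informative bound, which is the sense in which minimizing $\sigma$ makes it more meaningful.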

In linear functions, any scaling of $x^{*}=\arg\max_{x}\frac{\|f(x)\|_{2}}{\|x\|_{2}}$ will be stretched by the same amount, $\sigma_{1}$ , under $f$ . In the extension to bounded-input bounded-output functions, this is no longer the case. In bounded-input bounded-output functions, the point or set of points that achieve the induced norm of the function may be an irregular set in $\mathbb{R}^{n}$ . For example, in the first panel of Figure 2, only a few inputs come close to achieving the upper bound given by $\pm\sigma_{1}\|x\|_{2}$ .

III Examples and Experiments

We tested the construction given in Theorem 1 on several bounded-input bounded-output functions. In each case, $f(x_{i})=Kg(x_{i})$ for all $x_{i}$ tested. The first function tested was a single-input single-output bounded-input bounded-output function with a 2-induced norm of 1:

f(x)=\frac{x\sin(x)+x\cos(x^{2})}{2}.

(20)

The results of this experiment, as well as a visualization of the computed lifting, can be seen in Figure 2.
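For readers who wish to reproduce the flavor of this experiment, the sketch below specializes the construction of Theorem 1 to $n=p=1$ for the function in Equation (20). It is a minimal illustration rather than the code used to produce Figure 2: the value `sigma_1 = 1.05` and the input grid are assumptions, and the second lifted coordinate is written in the simplified form $\mathrm{sgn}(x)\sqrt{x^{2}-f(x)^{2}/\sigma_{1}^{2}}$ , which follows from Equations (2) and (14) when $p=n=1$ .

```python
import numpy as np

def f(x):
    """Eq. (20): scalar BIBO function with |f(x)| <= |x|."""
    return (x * np.sin(x) + x * np.cos(x**2)) / 2.0

# Assumed value; any sigma_1 > ||f||_{2-2} = 1 satisfies Eq. (8) when p = 1.
sigma_1 = 1.05

def g(x):
    """Lifting R -> R^2; each row of the result is [v_1(x), v_2(x)]."""
    x = np.asarray(x, dtype=float)
    first = f(x) / sigma_1                            # coordinate scaled back by sigma_1
    second = np.sign(x) * np.sqrt(x**2 - first**2)    # remainder keeps ||g(x)|| = |x|
    return np.stack([first, second], axis=-1)

K = np.array([[sigma_1, 0.0]])   # Sigma with U = 1

xs = np.linspace(-20.0, 20.0, 4001)
lifted = g(xs)                   # shape (4001, 2)
recon = lifted @ K.T             # shape (4001, 1), should reproduce f(xs)
```

The sign of the second coordinate matches the sign of $x$ , so $x$ can be recovered as $\mathrm{sgn}(v_{2})\|g(x)\|_{2}$ , which is how injectivity shows up in this one-dimensional case.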

We also visualized the lifting for a multi-input single-output function, which, for convenience, we will write as several component functions, listed below:

\begin{aligned}
h_{1}(x)&=\sin(0.1x_{1}x_{2}),\\
h_{2}(x)&=0.1\cos\left(3\frac{x_{1}}{x_{2}}\right),\\
h_{3}(x)&=0.4\sin(20x_{1}),\\
h_{4}(x)&=0.3\cos(x_{2}+4),\\
h_{5}(x)&=0.3\sin(0.1e^{x_{1}}),\\
h_{6}(x)&=0.2\cos\left(\frac{1}{x_{1}^{2}}\right),\\
h_{7}(x)&=0.1\sin(0.1(x_{1}+x_{2})),\\
h_{8}(x)&=0.1\cos(0.001x_{2}^{2}),
\end{aligned}

such that:

f(x)=\frac{\|x\|_{2}}{2.5}\overset{8}{\underset{i=1}{\sum}}h_{i}(x).

(21)

The function and computed lifting can be visualized in Figure 3.
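A comparable sketch for the two-input function in Equation (21) is given below. Several details are assumptions on our part: $h_{2}$ is read as $0.1\cos(3x_{1}/x_{2})$ , `sigma_1 = 1.05` is an arbitrary admissible value (the amplitudes of the $h_{i}$ sum to 2.5, so the induced norm is at most 1), and inputs are sampled away from the axes because $h_{2}$ and $h_{6}$ divide by $x_{2}$ and $x_{1}^{2}$ .

```python
import numpy as np

def f(x):
    """Eq. (21) for rows x = [x_1, x_2] with both entries non-zero."""
    x1, x2 = x[..., 0], x[..., 1]
    h = [
        np.sin(0.1 * x1 * x2),
        0.1 * np.cos(3.0 * x1 / x2),      # h_2 read as 0.1 cos(3 x_1 / x_2) (an assumption)
        0.4 * np.sin(20.0 * x1),
        0.3 * np.cos(x2 + 4.0),
        0.3 * np.sin(0.1 * np.exp(x1)),
        0.2 * np.cos(1.0 / x1**2),
        0.1 * np.sin(0.1 * (x1 + x2)),
        0.1 * np.cos(0.001 * x2**2),
    ]
    return np.linalg.norm(x, axis=-1) / 2.5 * sum(h)

sigma_1 = 1.05   # assumed; must exceed the induced norm of f

def g(x):
    """Lifting R^2 -> R^3: [f(x) / sigma_1, c(x) * x] with c chosen so ||g(x)|| = ||x||."""
    first = f(x) / sigma_1
    r2 = np.sum(x**2, axis=-1)
    c = np.sqrt(1.0 - first**2 / r2)
    return np.concatenate([first[..., None], c[..., None] * x], axis=-1)

K = np.array([[sigma_1, 0.0, 0.0]])

rng = np.random.default_rng(4)
signs = rng.choice([-1.0, 1.0], size=(5000, 2))
X = rng.uniform(0.1, 5.0, size=(5000, 2)) * signs   # keeps |x_1|, |x_2| >= 0.1
G = g(X)                                            # shape (5000, 3)
```

Here the lifting adds a single coordinate to the input, so $g$ maps $\mathbb{R}^{2}$ into $\mathbb{R}^{n+p}=\mathbb{R}^{3}$ and $K$ simply reads off and rescales the first lifted coordinate.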

IV Conclusion

Here we have demonstrated that every bounded-input bounded-output function, $f:\mathbb{R}^{n}\to\mathbb{R}^{p}$ , has an SVD-like decomposition with a finite-dimensional representation of the form:

f(x)=U\Sigma v(x)

By leveraging an injective “lifting,” i.e. $v:\mathbb{R}^{n}\to\mathbb{R}^{n+p}$ , this decomposition facilitates the computation of the extensions of null and row spaces for bounded-input bounded-output functions. A constraint on the lifting to be norm-preserving causes the 2-induced norm of $f$ to be upper-bounded by the maximum element of $\Sigma$ . When $p=1$ , the representation also provides a natural extension of the Riesz Representation Theorem to bounded-input bounded-output functionals.

V Acknowledgments

We express our gratitude to Tyler Jarvis, Mark Transtrum, and Jared Whitehead for their invaluable internal reviews of the early stages of this work.

References

[1] George Bachman and Lawrence Narici. Functional analysis. Courier Corporation, 2000.
[2] David R Barton and Richard Zippel. Polynomial decomposition algorithms. Journal of Symbolic Computation, 1(2):159–168, 1985.
[3] Frédéric Brechenmacher. Histoire du théorème de Jordan de la décomposition matricielle (1870-1930). Formes de représentation et méthodes de décomposition. PhD thesis, Ecole des Hautes Etudes en Sciences Sociales (EHESS), 2006.
[4] Steven L. Brunton, Marko Budišić, Eurika Kaiser, and J. Nathan Kutz. Modern koopman theory for dynamical systems. SIAM Review, 64(2):229–340, 2022.
[5] M. D. Buhmann. Radial basis functions. Acta Numerica, 9:1–38, 2000.
[6] Gbolahan P. Dada and Antonios Armaou. Generalized svd reduced-order observers for nonlinear systems. In 2020 American Control Conference (ACC), pages 3473–3478, 2020.
[7] Carl Eckart and Gale Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.
[8] JBJ Fourier. Mémoire sur la propagation de la chaleur dans les corps solides. Unpublished memoir submitted to the Institut de France, Paris, 1807.
[9] Michael J Greenacre. Theory and applications of correspondence analysis. 1984.
[10] Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge university press, 2012.
[11] Yuhong Jin, Lei Hou, and Shun Zhong. Extended dynamic mode decomposition with invertible dictionary learning. Neural Networks, 173:106177, 2024.
[12] Charles A. Johnson and Enoch Yeung. A class of logistic functions for approximating state-inclusive koopman operators. In 2018 Annual American Control Conference (ACC), pages 4803–4810, 2018.
[13] Alexandre Mauroy, Y Susuki, and Igor Mezić. Koopman operator in systems and control. Springer, 2020.
[14] Yuhuang Meng, Jianguo Huang, and Yue Qiu. Koopman operator learning using invertible neural networks. Journal of Computational Physics, 501:112795, 2024.
[15] Igor Mezić. Analysis of fluid flows via spectral properties of the koopman operator. Annual review of fluid mechanics, 45:357–378, 2013.
[16] Tao Qian, Mang I Vai, and Yuesheng Xu. Wavelet analysis and applications. Springer Science & Business Media, 2007.
[17] Walter Rudin. Real and complex analysis. 1987.
[18] Qinghua Tao, Francesco Tonin, Panagiotis Patrinos, and Johan AK Suykens. Nonlinear svd with asymmetric kernels: feature learning and asymmetric Nyström method. arXiv preprint arXiv:2306.07040, 2023.
[19] Prabhakar G Vaidya, Nithin Nagaraj, et al. A non-linear generalization of singular value decomposition and its application to cryptanalysis. arXiv preprint arXiv:0711.4910, 2007.
[20] Charles F. Van Loan. Generalizing the singular value decomposition. SIAM Journal on Numerical Analysis, 13(1):76–83, 1976.
[21] Matthew O Williams, Ioannis G Kevrekidis, and Clarence W Rowley. A data–driven approximation of the koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science, 25:1307–1346, 2015.
[22] Enoch Yeung, Soumya Kundu, and Nathan Hodas. Learning deep neural network representations for koopman operators of nonlinear dynamical systems, 2017.
[23] Blaz Zupan, Marko Bohanec, Ivan Bratko, and Janez Demsar. Machine learning by function decomposition. In ICML, pages 421–429. Citeseer, 1997.

VI Appendix

Lemma 1.

There does not exist an injective, norm-preserving mapping, $g:\mathbb{R}^{n}\rightarrow\mathbb{R}^{p}$ , with an associated matrix, $K\in\mathbb{R}^{p\times p}$ , satisfying

f(x)=K\circ g(x)

(22)

for all bounded-input bounded-output functions, $f:\mathbb{R}^{n}\to\mathbb{R}^{p}$ , with $n,p\in\mathbb{N}^{+}$ .

Proof.

Consider a bounded-input bounded-output function, $f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{p}$ , such that

\displaystyle f(x)=\begin{bmatrix}f_{1}(x)\\ f_{2}(x)\\ \vdots\\ f_{p}(x)\end{bmatrix}=\begin{bmatrix}a_{1}\|x\|_{2}\\ a_{2}\|x\|_{2}\\ \vdots\\ a_{p}\|x\|_{2}\end{bmatrix}

(23)

Note that $f(x)$ is trivially bounded-input bounded-output since $|f_{i}(x)|=|a_{i}|\|x\|_{2}$ for $i=1,2,\dots,p$ .

Let $g:\mathbb{R}^{n}\rightarrow\mathbb{R}^{p}$ be any norm-preserving, injective mapping, i.e. $\|g(x)\|_{2}=\|x\|_{2}$ for all $x\in\mathbb{R}^{n}$ , and, for all $x_{1}\neq x_{2}\in\mathbb{R}^{n},g(x_{1})\neq g(x_{2})$ . One consequence of this last property is that for a set with no repeats, $\mathcal{X}\subset\mathbb{R}^{n}$ , with a given cardinality, $c$ , the set $\{g(x)|x\in\mathcal{X}\}$ must have the same cardinality when repeats are removed. Let $|\cdot|$ be an operation measuring the repeat-removed cardinality of a set. Then, $|\mathcal{X}|=|\{g(x)|x\in\mathcal{X}\}|=c$ .

Because $g(x)$ is a norm-preserving mapping, we can, without loss of generality, consider the actions of $f$ and $g$ on a set of inputs, $\mathcal{H}_{r}\subset\mathbb{R}^{n}$ , such that $x\in\mathcal{H}_{r}\implies\|x\|_{2}=r$ . Now consider the set $\mathcal{G}_{r}=\{g(x)|x\in\mathcal{H}_{r}\}$ . Note that if $n>1$ , then $|\mathcal{G}_{r}|=|\mathcal{H}_{r}|=2^{\aleph_{0}}$ , and $\mathcal{G}_{r}$ is necessarily a subset of an origin-centered hypersphere of radius $r$ and dimension $p$ .

Define $K\in\mathbb{R}^{p\times p}$ , an arbitrary linear function, such that its singular value decomposition is given by $K=U\Sigma V^{*}$ , with the standard definitions of $U,\Sigma$ , and $V^{*}$ . Now suppose that $f(x)=(K\circ g)(x)$ . Without loss of generality, we may consider $U=I$ and $V^{*}=I$ , with $I$ of the appropriate dimensions. With these values of $U$ and $V^{*}$ , $f(x)=(K\circ g)(x)\implies f_{q(i)}(x)=\sigma_{i}g_{i}(x)$ .

We now note a relationship between the image of $g$ and the pre-image of $K$ that must be satisfied for $f(x)=(K\circ g)(x)$ to hold. Consider $\mathcal{Y}$ , the set of vectors such that $Ky=f(x)$ for $y\in\mathcal{Y}$ and $x\in\mathcal{H}_{r}$ . Given our choice of $f$ , $f(x)$ is identical for all $x\in\mathcal{H}_{r}$ . If $n>1$ (i.e. $|\mathcal{H}_{r}|=2^{\aleph_{0}}$ ), then to simultaneously satisfy injectivity and representation, the intersection of $\mathcal{Y}$ and $\mathcal{G}_{r}$ must have the cardinality of the continuum, i.e. $|\mathcal{G}_{r}\cap\mathcal{Y}|=2^{\aleph_{0}}$ .

Now let $P_{i}(g(x))$ be the projection of $g(x)$ onto the one-dimensional subspace spanned by $v^{*}_{i}$ . Given the norm-preserving constraints on $g(x)$ , we then have that $P_{i}(g(x))=b_{i}\|x\|_{2}$ , for some $-1\leq b_{i}\leq 1$ . Note that since there is no upper bound on $\sigma$ , $\sigma_{i}$ and $g_{i}$ (and therefore $b_{i}$ ) can always be chosen such that $\sigma_{i}b_{i}=a_{i}$ , implying that the intersection of the image of $g(x)$ for $x\in\mathcal{H}_{r}$ and $\mathcal{Y}$ can be non-empty.

Let $\mathcal{Y}_{i}\subset\mathcal{H}$ be the set of all vectors reachable from $g$ and in the pre-image of $K$ satisfying

\displaystyle Ky_{i}=\begin{bmatrix}c_{1}\\ c_{2}\\ \vdots\\ \sigma_{i}g_{i}(g^{-L}(y_{i}))\\ \vdots\\ c_{p}\end{bmatrix},\quad\text{for }y_{i}\in\mathcal{Y}_{i}

with $c_{j\neq i}\in\mathbb{R}$ not specified.

Thus, for $y\in\mathcal{Y}=(\overset{p}{\underset{i}{\cap}}\mathcal{Y}_{i})$ ,

Ky=\begin{bmatrix}\sigma_{1}g_{1}(g^{-L}(y))\\ \sigma_{2}g_{2}(g^{-L}(y))\\ \vdots\\ \sigma_{p}g_{p}(g^{-L}(y))\\ \end{bmatrix}

(24)

and furthermore $\mathcal{H}_{r}\cap\mathcal{Y}=\emptyset$ , unless $\sigma$ is chosen such that $\sigma_{i}P_{i}(g_{i}(g^{-L}(y)))=a_{i}\|g_{i}^{-L}(y)\|_{2}$ , in which case $|\mathcal{H}_{r}\cap\mathcal{Y}|=1$ , see Figure 4. This non-infinite cardinality violates the injectivity of $g$ since $|\mathcal{G}|=2^{\aleph_{0}}$ , implying that an injective, norm-preserving mapping $g(x)$ can only satisfy $f(x)=(K\circ g)(x)$ for a single $x$ when $f_{q(i)}(x)=a_{i}\|x\|_{2}$ . ∎
