Quantum visual feature encoding revisited

Nguyen, Xuan-Bac; Nguyen, Hoang-Quan; Churchill, Hugh; Khan, Samee U.; Luu, Khoa

doi:10.1007/s42484-024-00192-x

Research Article
Open access
Published: 17 September 2024

Volume 6, article number 61, (2024)
Quantum Machine Intelligence

Abstract

Although quantum machine learning has been introduced for a while, its applications in computer vision are still limited. This paper, therefore, revisits the quantum visual encoding strategies, the initial step in quantum machine learning. Investigating the root cause, we uncover that the existing quantum encoding design fails to ensure information preservation of the visual features after the encoding process, thus complicating the learning process of the quantum machine learning models. In particular, the problem, termed the “Quantum Information Gap” (QIG), leads to an information gap between classical and corresponding quantum features. We provide theoretical proof and practical examples with visualization for that found and underscore the significance of QIG, as it directly impacts the performance of quantum machine learning algorithms. To tackle this challenge, we introduce a simple but efficient new loss function named Quantum Information Preserving (QIP) to minimize this gap, resulting in enhanced performance of quantum machine learning algorithms. Extensive experiments validate the effectiveness of our approach, showcasing superior performance compared to current methodologies and consistently achieving state-of-the-art results in quantum modeling.

1 Introduction

Quantum machine learning, as highlighted in Biamonte et al. (2017); Schuld et al. (2015); Ciliberto et al. (2018); Lloyd et al. (2013), represents a promising research direction at the intersection of quantum computing and artificial intelligence. Within this realm, the utilization of quantum computers promises to significantly boost machine learning algorithms by leveraging their innate parallel attributes, thereby showcasing quantum advantages that surpass classical algorithms, as suggested by Harrow and Montanaro (2017). Due to the substantial collaborative endeavors of academia and industry, contemporary quantum devices, often referred to as noisy intermediate-scale quantum (NISQ) devices (Preskill 2018), are now capable of demonstrating quantum advantages in specific meticulously crafted tasks (Arute et al. 2019; Zhong et al. 2020). An emerging research focus lies in leveraging near-term quantum devices for practical machine learning applications, with a prominent approach being hybrid quantum-classical algorithms (Bharti et al. 2022; Cerezo et al. 2021), also referred to as variational quantum algorithms. These algorithms typically employ a classical optimizer to refine quantum neural networks (QNNs), allocating complex tasks to quantum computers while assigning simpler ones to classical computers. In typical quantum machine learning scenarios, a quantum circuit utilized in variational quantum algorithms is commonly divided into two components: a data encoding circuit and a QNN. On the one hand, enhancing these algorithms’ efficacy in handling practical tasks involves the development of various QNN architectures. Numerous architectures, including strongly entangling circuit architectures (Schuld et al. 2020), tree-tensor networks (Grant et al. 2018), quantum convolutional neural networks (Cong et al. 2019), and even automatically searched architectures (Ostaszewski et al. 2021a, b; Zhang et al. 2022; Du et al. 2020), have been proposed. 
On the other hand, careful design of the encoding circuit is crucial, as it can significantly impact the generalization performance of these algorithms.

Encoding classical information into quantum data is a crucial step, as it directly impacts the performance of quantum machine learning algorithms. These algorithms are designed to optimize objective functions, such as classification objectives, using the encoded data. However, quantum encoding poses significant challenges, especially on near-term quantum devices, as highlighted in previous research (Biamonte et al. 2017). While phase and amplitude encoding are foundational approaches, recent advances have popularized parameterized quantum circuits (PQCs) as the most practical strategy for encoding on NISQ devices (Benedetti et al. 2019). Nevertheless, despite the prevalence of PQCs, a basic encoding method such as phase or amplitude encoding is still required as the first step. An important question is whether these encoding strategies guarantee that the fundamental properties of classical data are preserved in its quantum form.

Contributions of this work

This paper has three key contributions. First, we identify a challenge with current visual encoding strategies regarding the preservation of information during the transition from classical to quantum data. Specifically, we observe distinct characteristics between feature spaces in quantum computing compared to their classical counterparts, resulting in lower performance of quantum machine learning algorithms than expected. Second, we introduce a simple but efficient novel training approach to generate classical features conducive to quantum machines post-encoding. This method holds promise for substantially enhancing quantum machine learning algorithms. Finally, our empirical experiments demonstrate the state-of-the-art performance of quantum machine learning across diverse benchmarks.

2 Related work

2.1 Quantum computer vision

Several quantum techniques are available for computer vision tasks, such as recognition and classification (O’Malley et al. 2018; Cavallaro et al. 2020), object tracking (Li and Ghosh 2020), transformation estimation (Golyanik and Theobalt 2020), shape alignment and matching (Noormandipour and Wang 2022; Benkner et al. 2021, 2020), permutation synchronization (Birdal et al. 2021), visual clustering (Nguyen et al. 2023), and motion segmentation (Arrigoni et al. 2022). Via Adiabatic Quantum Computing (AQC), O’Malley et al. (2018) applied binary matrix factorization to extract features of facial images. In contrast, Li and Ghosh (2020) reduced redundant detections in multi-object detection. Dendukuri and Luu (2018) presented the image representation using quantum information to reduce the computational resources of classical computers. Cavallaro et al. (2020) presented multi-spectral image classification using quantum SVM. Golyanik and Theobalt (2020) introduced correspondence problems for point sets using AQC to align the rotations between pairs of point sets. Meanwhile, Noormandipour and Wang (2022) proposed a parameterized quantum circuit learning method for the point set matching problem. Using AQC to solve the formulated Quadratic Unconstrained Binary Optimization (QUBO), Nguyen et al. (2023) proposed an unsupervised visual clustering method optimizing the distances between clusters. In contrast, Arrigoni et al. (2022) optimized the matching motions of key points between consecutive frames.

2.2 Hybrid classical-quantum machine learning

Date et al. (2020) implemented a classical high-performance computing model with an Adiabatic Quantum Processor for a classification task on the MNIST dataset. Their experiment evaluated two classification models, i.e., the Deep Belief Network (DBN) and the Restricted Boltzmann Machine (RBM). They showed that classical computing handles heavy matrix computations efficiently, whereas the sampling task is better suited to quantum computing, since quantum mechanical processes are used to generate the samples, making them truly random. Barkoutsos et al. (2020) introduced an improved platform for combinatorial optimization problems using hybrid classical-quantum variational circuits. It was empirically shown that this approach leads to faster convergence to better solutions for all combinatorial optimization problems on both classical simulation and quantum hardware. Romero and Aspuru-Guzik (2021) presented generative modeling of continuous probability distributions via a hybrid quantum-classical model. Inspired by convolutional neural networks, Liu et al. (2021) proposed a hybrid quantum-classical convolutional neural network that uses the quantum advantage to enhance the feature mapping process, the most computationally intensive part of convolutional neural networks. The feature map extracted by a parameterized quantum circuit can detect the correlations of neighboring data points in a large, complex space.

3 Background

3.1 Quantum basics

This section provides a concise introduction to the fundamental concepts in quantum computing that are essential for this paper. For a comprehensive review, we refer to Nielsen and Chuang (2001). In quantum computing, quantum information is typically expressed through n-qubit (pure) quantum states within the Hilbert space ${{\mathbb {C}}}^{2^n}$. Specifically, a pure quantum state can be denoted by a unit vector $| \psi \rangle \in {{\mathbb {C}}}^{2^n}$ (or $\langle \psi |$), where the ket notation $| \psi \rangle $ signifies a column vector, and the bra notation $\langle \psi |=| \psi \rangle ^\textsf{T}$, with $\textsf{T}$ indicating the conjugate transpose, represents a row vector.

Mathematically, the evolution of a pure quantum state $| \psi \rangle $ is described by a quantum circuit, often called a quantum gate. It is represented as $| \psi ^\prime \rangle =U| \psi \rangle $, where U denotes the unitary operator (matrix) representing the quantum circuit, and $| \psi ^\prime \rangle $ represents the quantum state after the evolution. Standard single-qubit quantum gates include the Pauli operators:

$$\begin{aligned} X := \begin{bmatrix} 0 & 1\\ 1 & 0\end{bmatrix}, Y := \begin{bmatrix} 0 & -i\\ i & 0\end{bmatrix}, Z := \begin{bmatrix} 1 & 0\\ 0 & -1\end{bmatrix}, \end{aligned}$$

(1)

The corresponding rotation gates are denoted by $R_P(\theta )= \text {exp}(-i\theta P/2) = \cos \frac{\theta }{2} I -i\sin \frac{\theta }{2}P$, where the rotation angle $\theta \in [0,2\pi )$ and $P\in \{X,Y,Z\}$ indicates rotation around the X, Y, or Z axis. In this paper, multiple-qubit quantum gates mainly include the identity gate I, the CNOT gate, and tensor products of single-qubit gates, e.g., $Z\otimes Z$, $Z\otimes I$, $Z^{\otimes n}$, and so on.
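As a quick numerical check of the rotation-gate identity, the following NumPy sketch builds the Pauli matrices of Eq. 1 and compares the matrix exponential with the closed form. The helper names `rotation_closed_form` and `rotation_via_exp` and the test angle are illustrative choices, not part of the paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rotation_closed_form(P, theta):
    """R_P(theta) = cos(theta/2) I - i sin(theta/2) P."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * P

def rotation_via_exp(P, theta):
    """exp(-i theta P / 2), computed from the eigendecomposition of the Hermitian P."""
    eigvals, eigvecs = np.linalg.eigh(P)
    return eigvecs @ np.diag(np.exp(-1j * theta * eigvals / 2)) @ eigvecs.conj().T

theta = 0.7  # arbitrary test angle (assumption)
checks = {
    name: bool(np.allclose(rotation_closed_form(P, theta), rotation_via_exp(P, theta)))
    for name, P in {"X": X, "Y": Y, "Z": Z}.items()
}
print(checks)
```

The closed form holds because each Pauli operator squares to the identity, so the power series of the exponential splits into a cosine part multiplying I and a sine part multiplying P.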

Quantum measurement is a method for extracting classical information from a quantum state. For example, given a quantum state $| \psi \rangle $ and an observable H, one can design quantum measurements to obtain the information $\langle \psi | H | \psi \rangle $. This study concentrates on hardware-efficient Pauli measurements, where H is set as Pauli operators or their tensor products. For instance, one might choose $Z_1 = Z\otimes I^{\otimes (n-1)}$, $X_2 = I\otimes X\otimes I^{\otimes (n-2)}$, $Z_1Z_2 = Z\otimes Z\otimes I^{\otimes (n-2)}$, etc., with a total of n qubits.
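The Pauli measurements described here can be sketched numerically for a small system. The example below assumes n = 3 qubits and a hand-picked product state; the helper names `kron_all` and `expectation` are hypothetical and only serve the illustration.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def kron_all(factors):
    """Tensor (Kronecker) product of a list of matrices or vectors, left to right."""
    return reduce(np.kron, factors)

def expectation(psi, H):
    """<psi| H |psi> for a normalized state vector psi and a Hermitian observable H."""
    return float(np.real(np.vdot(psi, H @ psi)))

# Observables on n = 3 qubits, following the naming used in the text.
Z1 = kron_all([Z, I2, I2])     # Z on the first qubit
X2 = kron_all([I2, X, I2])     # X on the second qubit
Z1Z2 = kron_all([Z, Z, I2])    # Z on the first and second qubits

# Example state |0> (x) |+> (x) |1>, chosen only for illustration.
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
psi = kron_all([ket0, plus, ket1])

print(expectation(psi, Z1), expectation(psi, X2), expectation(psi, Z1Z2))
```

For this product state the first two expectation values are 1 (up to floating-point error) and the third is 0, since the second qubit is an equal superposition in the Z basis.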

3.2 Limitations in current quantum encoding methods

Let $\textbf{v} \in {{\mathbb {R}}}^{d}$ be a $d$-dimensional vector on a classical computer. We denote by $\mathcal {E}(\textbf{v})$ a quantum encoding function that transforms the vector $\textbf{v}$ into the quantum state vector $| \psi \rangle \in {{\mathbb {C}}}^{2^n}$ in Hilbert space, where n is the number of qubits.

$$\begin{aligned} | \psi \rangle = \mathcal {E}(\textbf{v}) \end{aligned}$$

(2)

Specifically, $\mathcal {E}$ can be amplitude encoding, phase encoding, or a PQC. Note that $| \psi \rangle $ represents the qubits’ states; to use it further in a quantum machine learning function, it is necessary to extract information from these quantum states. To accomplish this, an observable, denoted as $\mathcal {O}(| \psi \rangle )$, is used. In particular, the observable $\mathcal {O}$ measures the state of every single qubit. Let $\textbf{q} = [q(0),\dots ,q(i),\dots ,q({n-1})] \in {{\mathbb {R}}}^{n}$ be the vector of information measured by $\mathcal {O}$, where q(i) is the measurement of the $i^{th}$ qubit, formulated as in Eq. 3.

$$\begin{aligned} q(i) = \langle {\psi }\!|\mathcal {O}_i\!|{\psi }\rangle \end{aligned}$$

(3)

In the equation above, a different observable $\mathcal {O}_i$ is applied for each qubit. In particular, $\mathcal {O}_i$ is a unitary operator represented by a matrix. Let P be a Pauli operator, where $P \in \{X, Y, Z\}$; then $\mathcal {O}_i$ can be written as in Eq. 4.

$$\begin{aligned} \mathcal {O}_i = I^{\otimes i} \otimes P \otimes I^{\otimes (n-i-1)} \end{aligned}$$

(4)

According to Eq. 4, we can measure the state of a qubit along any of the X, Y, or Z coordinates of the Hilbert space.

In summary, the relation between the quantum information vector $\textbf{q}$ and the classical information vector $\textbf{v}$ is given in Eq. 5.

$$\begin{aligned} \textbf{v} \in {{\mathbb {R}}}^{d} \overset{\mathcal {E}(\textbf{v})}{\underset{\text {Quantum encoding}}{\longmapsto }} | \psi \rangle \in {{\mathbb {R}}}^{2^n} = {{\mathbb {R}}}^{d} \overset{\mathcal {O}(| \psi \rangle )}{\underset{\text {Measurement}}{\longmapsto }} \textbf{q} \in {{\mathbb {R}}}^{n} \end{aligned}$$

(5)

Mathematically, we can define $\mathcal {Q}$ as the function that maps $\textbf{v} \rightarrow \textbf{q}$, as in Eq. 6.

$$\begin{aligned} \textbf{q} = \mathcal {Q}(\textbf{v}, \mathcal {E}, \mathcal {O}) \end{aligned}$$

(6)

The details of the proposed framework are illustrated in Fig. 1.
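A minimal sketch of the mapping in Eq. 6 is given below for one particular choice: amplitude encoding for $\mathcal {E}$ and single-qubit Pauli-Z observables for $\mathcal {O}$ (Eq. 4 with $P = Z$). The zero-padding to the next power of two and the function names are assumptions made for this illustration, and the expectation values are computed from the basis-state probabilities instead of building the full $2^n \times 2^n$ matrices.

```python
import numpy as np

def amplitude_encode(v):
    """E(v): zero-pad v to length 2^n and L2-normalize it (padding is an assumed convention)."""
    v = np.asarray(v, dtype=float)
    n = max(1, int(np.ceil(np.log2(len(v)))))
    psi = np.zeros(2 ** n)
    psi[: len(v)] = v
    return psi / np.linalg.norm(psi), n

def measure_pauli_z(psi, n):
    """q(i) = <psi| I^(i) (x) Z (x) I^(n-i-1) |psi> for i = 0, ..., n-1."""
    probs = np.abs(psi) ** 2
    idx = np.arange(len(psi))
    q = np.empty(n)
    for i in range(n):
        # Qubit 0 is the leftmost tensor factor, i.e. the most significant bit of the index.
        bit = (idx >> (n - 1 - i)) & 1
        q[i] = np.sum(probs * (1 - 2 * bit))  # Z eigenvalue is +1 for bit 0, -1 for bit 1
    return q

def Q(v):
    """q = Q(v, E, O) with E = amplitude encoding and O = Pauli-Z on every qubit."""
    psi, n = amplitude_encode(v)
    return measure_pauli_z(psi, n)

print(Q([3.0, 4.0, 0.0, 0.0]))
```

With this choice a d-dimensional feature is summarized by only about $\log _2 d$ real numbers, which makes the dimension-reduction character of $\mathcal {Q}$ explicit.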

Proposition 1

Consider two different quantum state vectors, denoted as $|\psi _1\rangle $ and $|\psi _2\rangle $, and their corresponding quantum information vectors $\textbf{q}_1$ and $\textbf{q}_2$. We have $\langle \psi _1|\psi _2\rangle \ne \textbf{q}_1^\textsf{T} \textbf{q}_2$ for any Pauli observable and quantum encoding strategy.

Proof

As $q(i) = \langle \psi |\mathcal {O}_i| \psi \rangle $, we have:

$$\begin{aligned} \begin{aligned} \textbf{q}_1^\textsf{T} \textbf{q}_2&= \sum _{i=1}^n q_1(i)q_2(i) \\&= \sum _{i=1}^n \langle \psi _1| \mathcal {O}_i |\psi _1\rangle \langle \psi _2| \mathcal {O}_i |\psi _2\rangle \\&= \langle \psi _1| \left( \sum _{i=1}^n \mathcal {O}_i |\psi _1\rangle \langle \psi _2| \mathcal {O}_i \right) |\psi _2\rangle , \\&= \langle \psi _1| A |\psi _2\rangle , \end{aligned} \end{aligned}$$

(7)

where $A = \sum _{i=1}^n \left( \mathcal {O}_i |\psi _1\rangle \langle \psi _2| \mathcal {O}_i \right) $. It remains to show that $A \ne I$, which follows from:

$$\begin{aligned} \begin{aligned} \text {tr}(A)&= \sum _{i=1}^n \text {tr}(\mathcal {O}_i |\psi _1\rangle \langle \psi _2| \mathcal {O}_i) \\&= \sum _{i=1}^n \text {tr}(\mathcal {O}_i \mathcal {O}_i |\psi _1\rangle \langle \psi _2|) \\&= \sum _{i=1}^n \text {tr}(|\psi _1\rangle \langle \psi _2|) \end{aligned} \end{aligned}$$

(8)

Since $|\psi _1\rangle \ne |\psi _2\rangle $ by the hypothesis of Proposition 1, we have $\langle \psi _1|\psi _2\rangle = \text {tr}(|\psi _1\rangle \langle \psi _2|) < 1$. Hence $\text {tr}(A) < n$, and therefore $A \ne I$ since $\text {tr}(I) = n$. This proves Proposition 1. The proposition indicates that no combination of Pauli observable and quantum encoding strategy preserves the information when classical features are transformed into quantum features. $\square $
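The gap described by Proposition 1 can be observed on a small example. The sketch below assumes two hand-picked 2-qubit states with real amplitudes and Pauli-Z observables on each qubit; it illustrates the mismatch for these particular inputs and is not a substitute for the proof. The helper name `pauli_z_vector` is hypothetical.

```python
import numpy as np

def pauli_z_vector(psi):
    """Per-qubit Pauli-Z expectation values of a state vector of length 2^n."""
    psi = np.asarray(psi, dtype=complex)
    n = int(np.log2(len(psi)))
    probs = np.abs(psi) ** 2
    idx = np.arange(len(psi))
    # Qubit i corresponds to bit (n - 1 - i) of the basis-state index.
    return np.array(
        [np.sum(probs * (1 - 2 * ((idx >> (n - 1 - i)) & 1))) for i in range(n)]
    )

# Two normalized 2-qubit states (illustrative values).
psi1 = np.array([1.0, 0.0, 0.0, 0.0])
psi2 = np.array([0.6, 0.8, 0.0, 0.0])

overlap = float(np.vdot(psi1, psi2).real)                       # <psi1|psi2>
q_dot = float(pauli_z_vector(psi1) @ pauli_z_vector(psi2))      # q1^T q2

print(overlap, q_dot)
```

Here the state overlap is about 0.6 while the inner product of the measured vectors is about 0.72, so the similarity seen after measurement differs from the similarity of the encoded states.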

3.3 Theoretical analysis and problem visualization

In this section, we first define the term information as the correlation between pairs of vectors.

Theoretical analysis

The goal of the encoding $\mathcal {E}$ is to transform a classical feature $\textbf{v} \in \mathbb {R}^d$ into a quantum state $| \psi \rangle \in \mathbb {R}^d$ using fewer bits while retaining as much of the classical information as possible. Assuming $\textbf{v}$ is a normalized vector and $\mathcal {E}$ is an amplitude encoding, the information is evidently preserved, as $\textbf{v} = | \psi \rangle $. Additionally, since $\mathcal {E}$ requires fewer than d qubits ($n < d$), it appears to be the optimal choice given these constraints.

However, the limitation of amplitude encoding is its potential unsuitability for many problems. To address this problem, Parametrized Quantum Circuits (PQC) have recently become the most prevalent encoding strategy. PQC incorporates trainable parameters that can be optimized during training, reducing dependencies on specific problems. However, information is not guaranteed to be preserved when representing features in Hilbert spaces of $| \psi \rangle $. Additionally, Proposition 1 suggests that no observables guarantee uniform discriminability between the features $| \psi \rangle $ and $\textbf{q}$. Considering these factors, current encoding strategies fail to ensure the preservation of information when mapping classical features to quantum features, thus creating an information gap.

Looking at it from a different angle, if we temporarily set aside quantum theory, Eq. 6 reveals that $\mathcal {Q}$ serves as a dimension reduction function, mapping $\mathbb {R}^d$ to $\mathbb {R}^n$ where $n \ll d$. As far as we know, no dimension reduction algorithm perfectly preserves the pairwise cosine distances between vectors. Even if such an algorithm existed, extending its theory to the quantum realm would remain an open question.

Problem visualization

Considering the task of face clustering (Nguyen et al. 2021), we assume that a model $\mathcal {M}(x)$ (Deng et al. 2019) is trained with metric loss functions (Wang et al. 2018; Deng et al. 2019) to map a facial image x into a high-dimensional feature space. This mapping ensures that similar faces are clustered closely while being separated from faces of different identities. As discussed in Nguyen et al. (2021), recent studies have made significant progress on large-scale clustering challenges within classical machine learning. These methods extensively utilize the discriminative nature of facial features, mainly relying on cosine distance in their algorithmic design. However, envisioning a quantum counterpart that perfectly mirrors these methods reveals a crucial limitation. Despite their potential, quantum algorithms struggle to match the performance of classical ones due to the absence of ideal strategies for encoding classical information into quantum formats, as shown in Proposition 1.

We illustrate the issue in Fig. 2. Specifically, we employ a face recognition model, ResNet50 (He et al. 2016), trained with ArcFace (Deng et al. 2019) on the MSCeleb-1M database (Guo et al. 2016) using classical machine learning techniques. We randomly select subjects from the hold-out set and extract their facial features. Subsequently, we compute the corresponding quantum information of these features according to Eq. 5. The boundary between these subjects appears blurred from the quantum machine’s perspective, whereas it remains distinct in the classical one. Some samples that are close together in the classical space appear far apart in the quantum space, making it challenging for quantum algorithms to determine the boundary.

4 Our proposed approach

4.1 Problem formulation

Let $x \in {{\mathbb {R}}}^{h \times w \times c}$ denote the input image, where h, w, and c are the image height, width, and number of channels, respectively. Let $\textbf{v} = \mathcal {M}(x)$ be the deep features extracted by a model $\mathcal {M}$. Let $\mathcal {K}$ be the function that measures the information gap between the classical vector $\textbf{v}$ and its corresponding quantum vector $\textbf{q}$. Our goal can be expressed as in Eq. 9.

$$\begin{aligned} \text {min} \quad \mathcal {K}(\textbf{v}, \textbf{q}) = \mathcal {K}(\mathcal {M}(x), \mathcal {Q}(\mathcal {M}(x), \mathcal {E}, \mathcal {O})) \quad \text {w.r.t} \quad \mathcal {E} \text {,} \mathcal {O} \quad \text {and} \quad \textbf{v} = \mathcal {M}(x) \end{aligned}$$

(9)

4.2 Quantum information preserving loss

In Eq. 9, only $\mathcal {M}$ and $\mathcal {E}$ are considered trainable. Theoretically, we can optimize either $\mathcal {M}$ or $\mathcal {E}$ to minimize the objective in Eq. 9. In this study, however, we concentrate on training $\mathcal {M}$: as shown in Eq. 5, $\textbf{q} = (\mathcal {O} \circ \mathcal {E} \circ \mathcal {M})(x)$, so $\mathcal {M}$ is the first stage of the quantum encoding process and thus the most critical component to address. Let $\mathcal {F}$ represent the task-specific layer used to train the feature representation of x. $\mathcal {M}$ can be optimized with the objective function in Eq. 10.

$$\begin{aligned} \theta ^*_{\mathcal {M}} = \arg \min _{\theta _{\mathcal {M}}} \mathbb {E}_{x_i \sim p(x_i)} \left[ \mathcal {L} ( \mathcal {F}(\mathcal {M}(x_i)), \hat{y}_i) \right] \end{aligned}$$

(10)

Here, $\hat{y}_i$ and $\mathcal {L}$ denote the ground-truth label and the loss function, respectively. The common approach (e.g., Deng et al. 2009; He et al. 2016; Liu et al. 2022) designs $\mathcal {F}$ as a fully connected layer and employs loss functions such as cross-entropy or metric losses (e.g., Deng et al. 2019; Wang et al. 2018) to train a classification model. For simplicity, we choose cross-entropy as $\mathcal {L}$. Note, however, that the approach also applies when $\mathcal {L}$ is a metric loss function such as ArcFace or CosFace.

$$\begin{aligned} \mathcal {L} = - \frac{1}{N} \sum _{i=1}^N \text {log} \frac{e^{W_{\hat{y}_i}^\textsf{T} \textbf{v}_i + b_{\hat{y}_i}}}{\sum _{j=1}^C e^{W_j^\textsf{T} \textbf{v}_i + b_j}} \end{aligned}$$

(11)

where $W_j \in {{\mathbb {R}}}^d$ denotes the $j^{th}$ column of the weight $W \in {{\mathbb {R}}}^{d \times C}$, C is the number of classes, and $b_j \in {{\mathbb {R}}}$ is the bias term. For simplicity, we fix $b_j = 0$ as in Wang et al. (2018). The loss then becomes $\mathcal {L} = - \frac{1}{N} \sum _{i=1}^N \text {log} \frac{e^{W_{\hat{y}_i}^\textsf{T} \textbf{v}_i}}{\sum _{j=1}^C e^{W_j^\textsf{T} \textbf{v}_i}}$. Interestingly, $W_j$ represents a center vector corresponding to class j. The loss function $\mathcal {L}$ optimizes the model $\mathcal {M}$ so that the vector $\textbf{v}_i$ aligns closely with $W_j$ in the feature space when $\textbf{v}_i$ belongs to class j. Moreover, since these features are normalized as in Deng et al. (2019) and Wang et al. (2018), $W_j^\textsf{T} \textbf{v}$ is the cosine similarity between the two vectors, which precisely fulfill the roles of $| \psi _1 \rangle $ and $| \psi _2 \rangle $ in Proposition 1. Leveraging this property, we define $\mathcal {K}$ as the Kullback–Leibler (KL) divergence to minimize the information gap formulated in Eq. 9 as follows:

$$\begin{aligned} \begin{aligned} \mathcal {K}&= \frac{1}{N} \sum _{i=1}^N \text {KL}\left( W^\textsf{T} \textbf{v}_i, S^\textsf{T} \textbf{q}_i \right) \\&= \frac{1}{N} \sum _{i=1}^N \sum _{j=1}^C \text {softmax}(W^\textsf{T} \textbf{v}_i)_j \times \text {log}\frac{\text {softmax}(W^\textsf{T} \textbf{v}_i)_j}{\text {softmax}(S^\textsf{T} \textbf{q}_i)_j} \end{aligned} \end{aligned}$$

(12)

where $S_j$ is the corresponding quantum information vector of $W_j$ using Eq. 6. In conclusion, we propose a novel loss function named Quantum Information Preserving Loss to train $\mathcal {M}$ as follows:

$$\begin{aligned} \theta ^*_{\mathcal {M}} = \arg \min _{\theta _{\mathcal {M}}} \mathbb {E}_{x_i \sim p(x_i)} \left[ -\text {log} \frac{e^{W_{\hat{y}_i}^\textsf{T} \textbf{v}_i}}{\sum _{j=1}^C e^{W_j^\textsf{T} \textbf{v}_i}} \!+\! \lambda \times \text {KL}\left( W^\textsf{T} \textbf{v}_i, S^\textsf{T} \textbf{q}_i \right) \right] \end{aligned}$$

(13)

where $\lambda $ is the loss factor controlling how much information is preserved. Using this loss function, the model $\mathcal {M}$ can produce a feature $\textbf{v}$ that is well suited to the quantum machine, since it retains as much information as possible after the quantum encoding. We also provide the pseudo-code in Algorithm 1.
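The value of the objective in Eq. 13 can be sketched in NumPy as a forward pass only. This is a simplified illustration, not the authors' implementation: the function names, array shapes, and random placeholder inputs are assumptions, the quantum vectors $\textbf{q}_i$ and $S_j$ are taken as given inputs (in the method they come from Eq. 6), and actual training would need an automatic differentiation framework.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def qip_loss(V, Qv, W, S, y, lam=0.5):
    """Cross-entropy plus lam * KL between classical and quantum class distributions.

    V:  (N, d) classical features v_i      Qv: (N, n) quantum vectors q_i
    W:  (d, C) class centers W_j           S:  (n, C) quantum vectors S_j of the centers
    y:  (N,) integer ground-truth labels   lam: loss factor lambda
    """
    log_p = log_softmax(V @ W)    # log softmax(W^T v_i)
    log_r = log_softmax(Qv @ S)   # log softmax(S^T q_i)
    ce = -log_p[np.arange(len(y)), y].mean()
    kl = (np.exp(log_p) * (log_p - log_r)).sum(axis=1).mean()
    return ce + lam * kl, ce, kl

# Random placeholder inputs with hypothetical sizes, only to exercise the function.
rng = np.random.default_rng(0)
N, d, n, C = 4, 8, 3, 5
V, W = rng.normal(size=(N, d)), rng.normal(size=(d, C))
Qv, S = rng.uniform(-1, 1, size=(N, n)), rng.uniform(-1, 1, size=(n, C))
y = rng.integers(0, C, size=N)

loss, ce, kl = qip_loss(V, Qv, W, S, y, lam=0.5)
print(loss, ce, kl)
```

The KL term vanishes exactly when the quantum logits reproduce the classical class distribution, which is the sense in which the added term penalizes the information gap.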

5 Experiment setup and implementation

Given that Proposition 1 treats information as the relationship between two vectors, i.e., cosine similarity, selecting a model $\mathcal {M}$ optimized for cosine similarity becomes paramount for problem validation and experimental demonstration. Consequently, this study targets unsupervised clustering tasks, namely face and landmark clustering, as they align well with models trained using cosine-based loss functions. It is important to note that Proposition 1 also applies to similar problems, such as classification.

5.1 Experiment setup

We follow the experimental framework outlined in previous studies (Nguyen et al. 2021; Yang et al. 2019, 2020; Shen et al. 2023; Shin et al. 2023; Shen et al. 2021; Wang et al. 2022; Nguyen et al. 2023b). In essence, our clustering methodology consists of three key stages. First, we train a model $\mathcal {M}(x)$ to extract the features of an image x. Second, the k nearest neighbors algorithm, denoted as $\textbf{K}(x_i, k)$, is used to identify the k most similar neighbors of a given sample $x_i$, forming a cluster $\mathbf {\Phi }_i = \textbf{K}(x_i, k)$. Finally, as clusters $\mathbf {\Phi }_i$ may contain erroneous samples due to challenges such as database anomalies or imperfect feature representations by $\mathcal {M}$, previous studies have proposed training a model $\mathcal {N}(\mathbf {\Phi }_i)$ to detect and eliminate these inaccuracies, thereby refining the cluster.

In contrast to prior research, we study this problem from a quantum perspective. This leads us to design the modules $\mathcal {M}(x)$ and $\mathcal {N}(\mathbf {\Phi }_i)$ to operate on quantum hardware to the fullest extent possible. While training $\mathcal {M}(x)$ with our proposed methodology is a critical aspect of this study, we also aim to design $\mathcal {N}(\mathbf {\Phi }_i)$ as a quantum machine learning model, so that as much of the pipeline as possible can be executed on a quantum machine.
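The second stage, forming $\mathbf {\Phi }_i = \textbf{K}(x_i, k)$, can be sketched with a brute-force cosine-similarity search. This toy version is an assumption-laden illustration: the function name `knn_clusters` is hypothetical, each sample is counted as its own nearest neighbor, and the dense similarity matrix would not scale to millions of images, where an approximate nearest-neighbor index would be used instead.

```python
import numpy as np

def knn_clusters(features, k):
    """For each sample, return the indices of its k most cosine-similar samples.

    features: (N, d) array of deep features; the sample itself is included
    among its neighbors (an assumption of this sketch).
    """
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = normed @ normed.T                    # (N, N) cosine similarities
    return np.argsort(-similarity, axis=1)[:, :k]     # most similar first

# Tiny example: two well-separated groups of 2-D features.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
neighbors = knn_clusters(feats, k=2)
print(neighbors)
```

Each row of `neighbors` is one candidate cluster that a refinement model would then clean up.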

Multiple methodologies have addressed the clustering problem on classical computers. These include traditional techniques (Ester et al. 1996; Otto et al. 2017), graph-based methodologies (Wang et al. 2019; Yang et al. 2020, 2019; Shen et al. 2021, 2023; Shin et al. 2023), and transformer-based approaches (Nguyen et al. 2021). While transformer architectures have demonstrated significant success in various computer vision tasks (Li et al. 2022; Yu et al. 2022; Zhai et al. 2023; Luo et al. 2023; Wang et al. 2023; Nguyen et al. 2023a, b, 2020, 2019; Nguyen-Xuan and Lee 2019; Nguyen et al. 2021, 2022, 2023d, c, b; Serna-Aguilera et al. 2024), their potential in quantum computing remains promising. Adapting the typical transformer architecture for quantum systems, as proposed by Chen et al. (2022), offers added convenience. Although graph-based networks present a possible option, the computational challenge of processing large datasets, such as a (5.2M $\times $ 5.2M) sparse matrix on a quantum machine or even a simulated one, poses limitations. In contrast, transformer models do not encounter such constraints. Hence, inspired by the insights from Nguyen et al. (2021), we propose redesigning $\mathcal {N}(\mathbf {\Phi }_i)$ as a transformer-based quantum model.

5.2 Implementation details

We employ the ResNet50 architecture for the model $\mathcal {M}(x)$, as in prior works (Wang et al. 2019; Yang et al. 2020; Nguyen et al. 2021). This model is trained on large-scale datasets like MSCeleb-1M, employing ArcFace (Deng et al. 2019) for feature representation learning. In addition to ArcFace, we integrate the Quantum Information Preserving Loss outlined in Sect. 4 to mitigate information loss during encoding. The loss factor $\lambda $ is set to 0.5.

To implement the Quantum Clusformer (Nguyen et al. 2024) $\mathcal {N}(\mathbf {\Phi }_i)$, we initially redesign the self-attention layer (Vaswani et al. 2017) tailored for quantum machines. We employ Parameterized Quantum Circuits (PQC) for each Query, Key, and Value layer. We construct transformer blocks suitable for the transformer-based model. Ultimately, we achieve full implementation of the Quantum Clusformer on quantum machines.^{Footnote 1}

For the components running on the classical machine, we use the PyTorch framework, while we use the torchquantum library (Wang et al. 2022) and cuQuantum to simulate the quantum machine. Since this library relies on PyTorch as its backend, we can also leverage GPUs and CUDA to speed up the training process. The models are trained on 8 $\times $ A100 GPUs, each with 40GB of memory. The learning rate is initially set to 0.0001 and progressively decreases to zero following the CosineAnnealing policy (Loshchilov and Hutter 2016). Each GPU operates with a batch size of 512. The optimization uses AdamW (Loshchilov and Hutter 2017) for 12 epochs. Training the model $\mathcal {M}$ takes approximately 2 hours, and training the Quantum Clusformer $\mathcal {N}(\mathbf {\Phi }_i)$ takes about 4 hours.
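For reference, the learning-rate schedule described here can be written out in a few lines. The sketch assumes the standard cosine-annealing formula without restarts and a per-epoch update; whether the original training steps the schedule per epoch or per iteration is not stated, and the function name is hypothetical.

```python
import math

def cosine_annealing_lr(step, total_steps, base_lr=1e-4, min_lr=0.0):
    """Learning rate after `step` of `total_steps` updates, decaying from base_lr to min_lr."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * step / total_steps))

# One value per epoch boundary for a 12-epoch run starting at 0.0001.
schedule = [cosine_annealing_lr(epoch, 12) for epoch in range(13)]
for epoch, lr in enumerate(schedule):
    print(epoch, f"{lr:.2e}")
```

The rate starts at the initial value, passes through half of it at the midpoint, and reaches zero at the end of training.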

5.3 Datasets and metrics

5.3.1 Datasets

Following Yang et al. (2019, 2020), we use MSCeleb-1M (Guo et al. 2016), and following Nguyen et al. (2021), we use the Google Landmarks Dataset Version 2 (GLDv2) (Weyand et al. 2020) for experiments.

MSCeleb-1M

MSCeleb-1M (Guo et al. 2016) is a vast face recognition dataset compiled from web sources, encompassing 100,000 identities, with each identity represented by approximately 100 facial images. However, the original dataset contains noisy labels. Consequently, we use a subset derived from ArcFace (Deng et al. 2019), whose annotations were improved by cleaning. This refined dataset comprises 5.8 million images from 85,000 identities. All images are pre-processed by alignment and cropping to $112 \times 112$.

The Google Landmarks Dataset Version 2 (GLDv2)

GLDv2 (Weyand et al. 2020) is one of the largest datasets dedicated to visual landmark recognition and identification. Its cleaned version comprises 1.4 million images spanning 85,000 landmarks and 800 hours of human annotation. These landmarks span diverse categories and are sourced from around the globe. The dataset exhibits an extremely long-tailed distribution, with the number of images per class varying from 0 to 10,000. Compared to face recognition tasks, GLDv2 presents a similar yet notably more challenging scenario. We randomly partition the dataset into three segments, each containing 28,000 landmarks, with no overlap between the partitions. One segment is used for training the deep visual model and Clusformer, while the remaining segments are reserved for testing. Figure 3 shows samples from these datasets.

5.3.2 Metrics

To evaluate the approach on the clustering task, we follow Yang et al. (2019, 2020) and Nguyen et al. (2021) and use the Fowlkes-Mallows score to measure the similarity between two clusterings of a set of points. This score is computed as the geometric mean of the precision and recall of the point pairs. Thus, the Fowlkes-Mallows score is also called the Pairwise F-score ($F_P$). The BCubed F-score ($F_B$) is another popular clustering metric that focuses on each data point.
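Both metrics can be sketched on toy labels as follows. The pairwise score follows the description above (geometric mean of pairwise precision and recall). The BCubed computation uses the commonly cited definition (per-point precision and recall, averaged, then combined by harmonic mean), which is an assumption since the text does not spell it out. The function names are hypothetical, and the dense pair matrices only suit small inputs.

```python
import numpy as np

def pairwise_scores(labels_true, labels_pred):
    """Pairwise precision, recall, and their geometric mean (Fowlkes-Mallows)."""
    t, p = np.asarray(labels_true), np.asarray(labels_pred)
    same_t = t[:, None] == t[None, :]
    same_p = p[:, None] == p[None, :]
    upper = np.triu_indices(len(t), k=1)   # count each unordered pair once
    tp = np.sum(same_t[upper] & same_p[upper])
    precision = tp / max(np.sum(same_p[upper]), 1)
    recall = tp / max(np.sum(same_t[upper]), 1)
    return precision, recall, float(np.sqrt(precision * recall))

def bcubed_f(labels_true, labels_pred):
    """BCubed F-score from per-point precision and recall."""
    t, p = np.asarray(labels_true), np.asarray(labels_pred)
    same_t = t[:, None] == t[None, :]
    same_p = p[:, None] == p[None, :]
    both = (same_t & same_p).sum(axis=1)   # points sharing both class and cluster
    precision = np.mean(both / same_p.sum(axis=1))
    recall = np.mean(both / same_t.sum(axis=1))
    return float(2 * precision * recall / (precision + recall))

truth = [0, 0, 0, 1, 1]
pred = [0, 0, 1, 1, 1]
print(pairwise_scores(truth, pred), bcubed_f(truth, pred))
```

On this toy input, one misplaced point out of five gives a pairwise score of 0.5 but a BCubed F-score of about 0.73, which shows how the per-point metric is less dominated by pair counts.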

6 Experimental results

6.1 Performance on MSCeleb-1 M clustering

Table 1 reports the performance of our proposed method. For ease of reference, we define QClusformer as the Clusformer operating on a quantum machine. Due to hardware constraints, we can only emulate QClusformer with fewer layers/transformer blocks than the original model (Nguyen et al. 2021). To ensure a fair evaluation, we first retrain the Clusformer, denoted as $\text {Clusformer}^{\dagger }$, on a classical machine using the same configuration as QClusformer, explicitly setting the number of encoders to 1. The training process is outlined in Fig. 4a. As a result, $\text {Clusformer}^{\dagger }$ performs slightly worse than the original model. Notably, $F_P$ decreases from 88.20% to 86.49% on the 584K test set, a drop of roughly 1.7 percentage points. It also remains marginally lower in both $F_B$ and $F_P$ on the remaining test sets.

Then, we train QClusformer with the strategy shown in Fig. 4b. For the baseline, we use amplitude encoding paired with the Pauli-Z observable. Performance declines notably, by approximately 2.8%. However, adding the QIP Loss to the same setup bridges the information gap between quantum and classical features and recovers the lost performance. Notably, QClusformer with QIP Loss achieves 87.18% and 91.01% on $F_P$ and $F_B$, respectively, on the 584K test set, surpassing $\text {Clusformer}^{\dagger }$ by 0.6% and 3.2%, respectively. Similar trends are observed across all test sets of MSCeleb-1 M.

These findings underscore the competitive performance of Quantum Clusformer, particularly when it is trained with QIP Loss. Notably, its performance surpasses that of the best-performing Clusformer with a complete setup on a classical machine, signaling the promise of quantum computing for the clustering problem.

6.2 Performance on Google Landmark clustering

This section evaluates the proposed method on the Google Landmarks Dataset, a visual landmark clustering benchmark; the results are shown in Table 2. The experimental setup and evaluation protocol follow the previous MSCeleb-1 M section and prior work (Nguyen et al. 2021). The results mirror those obtained on MSCeleb-1 M. Specifically, $\text {Clusformer}^{\dagger }$, when run on a classical machine, achieves 17.74% and 38.80% in terms of $F_P$ and $F_B$, respectively. When the model operates on a quantum machine (QClusformer), its performance drops significantly to 13.20% and 35.63% for $F_P$ and $F_B$, respectively. With the QIP Loss, however, performance rebounds to 19.02% for $F_P$ and 40.28% for $F_B$, surpassing $\text {Clusformer}^{\dagger }$ and remaining competitive with the original Clusformer, which achieves 19.32% and 40.63% for $F_P$ and $F_B$.

6.3 Ablation studies

The ablation studies in this section empirically support Proposition 1.

QIP works with different encoding strategies

In Proposition 1, we present the information gap between quantum and classical machines across various encoding strategies. To demonstrate that our method works with diverse encodings, we hold the observable fixed to Pauli-Z and switch between phase and $U_3$ encoding (Benedetti et al. 2019). Unlike amplitude and phase encoding, $U_3$ is a Parameterized Quantum Circuit (PQC) with trainable parameters. The performance of these configurations is detailed in Table 3. QClusformer trained with QIP Loss, the Pauli-Z observable, and either phase or $U_3$ encoding consistently outperforms QClusformer trained without QIP Loss. This underscores the adaptability of the QIP Loss across encoding strategies. Notably, phase and $U_3$ encoding perform worse than amplitude encoding. As mentioned in the previous section, amplitude encoding is a more natural fit for the clustering problem than the other strategies.
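As a minimal sketch of what a $U_3$-based encoding involves, the snippet below builds the standard single-qubit $U_3(\theta, \phi, \lambda)$ matrix, applies it to $|0\rangle$, and evaluates the Pauli expectations. How features and trainable parameters map to the gate angles is not restated in this section, so the angles are fixed numbers chosen for illustration, and the helper names `u3` and `expectation` are assumptions. The gate angle `lam` is unrelated to the QIP loss factor $\lambda $.

```python
import numpy as np


def u3(theta, phi, lam):
    """Standard single-qubit U3 gate matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([
        [c, -np.exp(1j * lam) * s],
        [np.exp(1j * phi) * s, np.exp(1j * (phi + lam)) * c],
    ])


PAULI = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def expectation(state, observable):
    """Real expectation value <state|observable|state>."""
    return float(np.real(np.vdot(state, observable @ state)))


# Illustrative angles (assumed values, not taken from the paper).
theta, phi, lam = 0.7, 0.3, 0.0
ket0 = np.array([1.0, 0.0], dtype=complex)
state = u3(theta, phi, lam) @ ket0

measured = {name: expectation(state, op) for name, op in PAULI.items()}
print(measured)
```

For this state the expectations follow the Bloch-sphere form $\langle Z\rangle = \cos \theta $, $\langle X\rangle = \sin \theta \cos \phi $, and $\langle Y\rangle = \sin \theta \sin \phi $, so a nonzero phase $\phi $ gives the Pauli-Y observable a nonzero signal.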

QIP works with different observables

The intuition behind this ablation study is similar to that of the encoding strategies above. In particular, we fix the encoding strategy to amplitude while experimenting with various observables, i.e., Z, X, and XZ (a combination of measuring both X and Z coordinates). As shown in Table 4, QClusformer achieves the highest $F_P$ and $F_B$ with the Z observable, while both X and XZ show slight decreases. With the Pauli-Y observable, amplitude encoding is ineffective because it yields all-zero measurements. Consequently, we select $U_3$ for encoding and compare the performance of Pauli-Y versus Pauli-Z. Interestingly, performance with Pauli-Y remains relatively unchanged compared to Pauli-Z. Nonetheless, these configurations still significantly outperform QClusformer alone, underscoring the versatility of the Quantum Information Preserving (QIP) approach across diverse observables.
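The all-zero Pauli-Y measurements under amplitude encoding can be checked numerically. The sketch below assumes a real-valued feature vector of length $2^n$ and one expectation value per qubit; the paper's exact measurement layout is not restated here, and the function names and the toy vector are illustrative. Because the Pauli-Y matrix is purely imaginary and Hermitian, its expectation on any state with real amplitudes is zero.

```python
import numpy as np

PAULI_X = np.array([[0, 1], [1, 0]], dtype=complex)
PAULI_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
PAULI_Z = np.array([[1, 0], [0, -1]], dtype=complex)


def amplitude_encode(v):
    """Map a real vector of length 2**n to a normalized n-qubit state."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def single_qubit_expectations(state, pauli):
    """Expectation of `pauli` acting on each qubit in turn."""
    n = state.size.bit_length() - 1
    psi = state.reshape((2,) * n)
    values = []
    for k in range(n):
        # Bring qubit k to the front, apply the 2x2 operator, take <psi|O_k|psi>.
        moved = np.moveaxis(psi, k, 0)
        applied = np.tensordot(pauli, moved, axes=([1], [0]))
        values.append(np.real(np.vdot(moved, applied)))
    return np.array(values)


# Toy 8-dimensional feature vector (assumed values) encoded on 3 qubits.
v = [0.1, 0.4, 0.2, 0.7, 0.3, 0.5, 0.6, 0.2]
state = amplitude_encode(v)

z_values = single_qubit_expectations(state, PAULI_Z)
x_values = single_qubit_expectations(state, PAULI_X)
y_values = single_qubit_expectations(state, PAULI_Y)
print("Z:", z_values)
print("X:", x_values)
print("Y:", y_values)  # zero up to floating-point error
```

The Z and X expectations vary with the input, while the Y expectations vanish for every real-valued input, which is consistent with switching to $U_3$ encoding when Pauli-Y is the observable.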

The role of $\lambda $ - QIP loss factor

We investigate how the factor $\lambda $, which weights the QIP Loss, affects performance. To this end, we conduct experiments on the 584K subset of the MSCeleb-1 M dataset. The experimental configuration remains the same as in the previous section, i.e., amplitude encoding and the Pauli-Z observable.

Table 1 Performance on face clustering w.r.t the different number of unlabelled test sets

Table 2 Performance on landmark clustering w.r.t different quantum encoding and observables

Table 3 Ablation studies on different encoding strategies of the MSCeleb-1 M

Table 4 Ablation studies on different observables of MSCeleb-1 M

The results are shown in Fig. 5. When $\lambda = 0$, i.e., when the QIP Loss is not used, the performance stands at 83.68% and 86.89% for $F_P$ and $F_B$, respectively, as detailed in Table 1. Gradually increasing $\lambda $ steadily improves performance, which peaks at $\lambda =0.5$ and declines thereafter. This behavior follows from the role of the QIP Loss in minimizing the disparity between quantum and classical features. According to Proposition 1, the gap tends toward zero only when the two vectors ${\textbf {v}}_1$ and ${\textbf {v}}_2$ are identical. In that case, the model $\mathcal {M}$ generates similar features irrespective of the input images, leading to model collapse and a failure to distinguish samples from distinct classes. Hence, $\lambda $ must be controlled to prevent such collapse. In our experiments, the optimal value of $\lambda $ within this framework is 0.5.
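The collapse argument can be illustrated with a stand-in for the gap. The sketch below is an assumption-laden illustration, not the QIP Loss defined in Sect. 4.2: it takes the gap to be the absolute difference between the cosine similarity of two classical vectors and that of their quantum features (amplitude encoding with one Pauli-Z expectation per qubit), and it assumes the total objective is an additive combination `task + lam * gap`. All function names and vectors are hypothetical.

```python
import numpy as np


def z_features(v):
    """Per-qubit Pauli-Z expectations of an amplitude-encoded real vector."""
    v = np.asarray(v, dtype=float)
    probs = (v / np.linalg.norm(v)) ** 2
    n = probs.size.bit_length() - 1
    idx = np.arange(probs.size)
    # Bit k of the basis index (most significant first) selects the +1/-1 sign.
    signs = np.array([1 - 2 * ((idx >> (n - 1 - k)) & 1) for k in range(n)])
    return signs @ probs


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def stand_in_gap(v1, v2):
    """Hypothetical gap: classical similarity versus quantum-feature similarity."""
    return abs(cosine(np.asarray(v1, float), np.asarray(v2, float))
               - cosine(z_features(v1), z_features(v2)))


def total_loss(task_loss, gap, lam):
    """Assumed additive weighting of a task loss and the gap term."""
    return task_loss + lam * gap


v1 = [0.1, 0.4, 0.2, 0.7, 0.3, 0.5, 0.6, 0.2]
v2 = [0.6, 0.1, 0.5, 0.2, 0.2, 0.3, 0.1, 0.8]

gap_distinct = stand_in_gap(v1, v2)
gap_collapsed = stand_in_gap(v1, v1)  # identical inputs: the gap vanishes
print(f"gap (distinct)  = {gap_distinct:.4f}")
print(f"gap (collapsed) = {gap_collapsed:.2e}")
print(f"total at lam=0.5: {total_loss(1.0, gap_distinct, 0.5):.4f}")
```

Under these assumptions, identical features always drive the gap term to zero, so a large `lam` rewards the degenerate solution in which every image maps to the same feature; a moderate `lam` keeps the task loss in control.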

Quantum feature representations

We investigate how the QIP Loss helps align the features on the quantum computer, as shown in Fig. 6. We randomly select 200 subjects from the 584K part of MSCeleb-1 M and extract their features. We then apply t-SNE to reduce the dimension from 256 to 2 and visualize the features in 2D space. From left to right, the first image (red border) shows the classical features, the second image (green border) shows the quantum features of these subjects without QIP Loss training, and the last image shows the quantum features optimized with QIP Loss.
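The dimensionality-reduction step can be sketched with scikit-learn's `TSNE`. The snippet below uses synthetic 256-dimensional features as a placeholder for the extracted face features, with 20 hypothetical subjects rather than 200 to keep it fast; the perplexity and initialization are assumed settings, not the ones used for Fig. 6.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Placeholder data: 20 synthetic "subjects" with 10 samples each in 256-D.
n_subjects, per_subject, dim = 20, 10, 256
centers = rng.normal(size=(n_subjects, dim))
labels = np.repeat(np.arange(n_subjects), per_subject)
features = centers[labels] + 0.1 * rng.normal(size=(labels.size, dim))

# Project 256-D features to 2-D for plotting.
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
embedding = tsne.fit_transform(features)
print(embedding.shape)
```

Each row of `embedding` can then be drawn as a 2D point colored by its entry in `labels`; running the same projection on classical and quantum features gives side-by-side panels of the kind described above.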

Performance of feature extractor - $\mathcal {M}$

Since $\mathcal {M}$ is trained by a combination of ArcFace (Deng et al. 2019) and our proposed QIP Loss, it is important to evaluate the effectiveness of $\mathcal {M}$ and verify how the QIP Loss affects its performance. We follow the same evaluation protocol as in Deng et al. (2019). In particular, we evaluate the face verification accuracy of $\mathcal {M}$ on the IJBC (Maze et al. 2018) database. The results are reported in Table 5. As the baseline, ResNet50 without QIP Loss achieves 96.140% on IJBC. We observe a slight drop to 96.068% when incorporating the QIP Loss with $\lambda = 0.5$. However, when $\lambda $ is increased to 0.9, the performance drops by approximately 4%. This drop is explained in the section above: the feature representation tends to collapse as $\lambda $ increases.

Table 5 Face verification accuracy of feature extractor $\mathcal {M}$ on IJBC database

Comparison with classical method

Since the problem can be treated as a representation learning task, we compare our method with a classical machine learning approach in this section. Specifically, we choose the Support Vector Machine (SVM), a kernel-based feature representation method, for the comparison. Following Schuld (2021), we implement a Quantum SVM algorithm that can be executed on a quantum computer. This algorithm comprises two main components, quantum encoding and measurement, i.e., a Parameterized Quantum Circuit (PQC). Unlike the training strategy described earlier, we do not train $\mathcal {M}$ jointly with the Quantum SVM. Instead, we train the Quantum SVM separately on a classification task, using the classical features ${\textbf {v}}$ as input. After training, the corresponding quantum features are used to train the Quantum Clusformer $\mathcal {N}(\mathbf {\Phi }_i)$. The results are presented in Table 6. Using the Quantum SVM for quantum feature representation causes a significant performance drop: it achieves $F_P$ and $F_B$ scores of 80.3% and 82.82%, respectively, approximately 7% lower than our proposed method. This decline occurs because the Quantum SVM is designed for a closed-set problem, whereas unsupervised clustering is an open-set problem. While the Quantum SVM may provide a good quantum feature representation for the training set, it struggles on the test set, leading to poor feature distinction and, consequently, the lowest performance.
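To make the kernel view of a Quantum SVM concrete, the sketch below classically simulates one common choice, the fidelity kernel $k({\textbf {x}}, {\textbf {y}}) = |\langle \phi ({\textbf {x}})|\phi ({\textbf {y}})\rangle |^2$ with amplitude encoding, and feeds the precomputed Gram matrix to scikit-learn's `SVC`. This is an assumption for illustration: the circuit and kernel actually used in the paper are not restated in this section, and the data, names, and split below are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def fidelity_kernel(a, b):
    """Squared overlap of amplitude-encoded rows of `a` and `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a @ b.T) ** 2


# Synthetic closed-set problem: two classes of 8-D features (3 qubits).
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 8))
labels = np.repeat([0, 1], 20)
features = centers[labels] + 0.1 * rng.normal(size=(labels.size, 8))

train_idx = np.arange(0, labels.size, 2)
test_idx = np.arange(1, labels.size, 2)

# Gram matrices: train-vs-train for fitting, test-vs-train for prediction.
k_train = fidelity_kernel(features[train_idx], features[train_idx])
k_test = fidelity_kernel(features[test_idx], features[train_idx])

clf = SVC(kernel="precomputed").fit(k_train, labels[train_idx])
predictions = clf.predict(k_test)
print("test accuracy:", np.mean(predictions == labels[test_idx]))
```

The classifier is fitted only to the classes present in the training set, which is the closed-set property noted above: nothing in this objective encourages features of unseen identities to remain well separated.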

Table 6 Performance comparison with classical method Quantum SVM on 584K subject of MSCeleb-1 M

7 Conclusion

This paper revisits the quantum visual feature encoding strategies employed in quantum machine learning with computer vision applications. We identify a significant Quantum Information Gap (QIG) issue stemming from current encoding methods, resulting in non-discriminative feature representations in the quantum space, thereby challenging quantum machine learning algorithms. To tackle this challenge, we propose a simple yet effective solution called Quantum Information Preserving Loss. Through empirical experiments conducted on various large-scale datasets, we demonstrate the effectiveness of our approach, achieving state-of-the-art performance in clustering problems on quantum machines. Our insights into quantum encoding strategies are poised to stimulate further research efforts in this domain, prompting researchers to focus on designing more effective quantum machine learning algorithms.

8 Discussion

Since the general public has limited access to quantum machines, the experiments were carried out on noise-free simulation systems such as torchquantum and cuQuantum. Real-world hardware, however, may introduce noise, leading to uncertain quantum state measurements and affecting overall performance. Despite this limitation, the theoretical problem of QIG persists. It is important to recognize that quantum machine learning algorithms must confront the dual challenges of QIG and noise. We anticipate that addressing these issues will attract significant research attention in the future.
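One simple source of measurement uncertainty can be illustrated without any quantum library. The sketch below compares the exact Pauli-Z expectation of the first qubit, as a noise-free simulator would report it, with estimates obtained from a finite number of sampled measurement outcomes. It models only finite-shot sampling error, not hardware noise such as gate errors or decoherence, and the state, shot counts, and function names are assumptions.

```python
import numpy as np


def exact_z0(state):
    """Exact <Z> on the first qubit of a normalized state."""
    probs = np.abs(state) ** 2
    half = probs.size // 2
    return float(probs[:half].sum() - probs[half:].sum())


def sampled_z0(state, shots, rng):
    """Estimate <Z> on the first qubit from `shots` sampled outcomes."""
    probs = np.abs(state) ** 2
    outcomes = rng.choice(probs.size, size=shots, p=probs)
    half = probs.size // 2
    # Outcomes in the first half have the first qubit in |0> (eigenvalue +1).
    return float(np.mean(np.where(outcomes < half, 1.0, -1.0)))


v = np.array([0.1, 0.4, 0.2, 0.7, 0.3, 0.5, 0.6, 0.2])
state = v / np.linalg.norm(v)

rng = np.random.default_rng(0)
exact = exact_z0(state)
estimates = {shots: sampled_z0(state, shots, rng) for shots in (100, 10_000, 100_000)}

print(f"exact: {exact:+.4f}")
for shots, value in estimates.items():
    print(f"{shots:>7} shots: {value:+.4f}")
```

The standard error of the estimate shrinks as $1/\sqrt{N}$ in the number of shots $N$, so features read out from real devices carry additional variance on top of the information gap discussed in this paper.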

Data availability

MSCeleb-1 M (Guo et al. 2016) is no longer available due to ethical and privacy concerns. The Google Landmarks Dataset (Weyand et al. 2020) is publicly available at https://github.com/cvdfoundation/google-landmark.

Code availability

The code is published at: https://github.com/uark-cviu/QuantumVisualFeatureEncodingRevisited

Notes

Code will be released upon acceptance.

References

Arrigoni F, Menapace W, Benkner MS, Ricci E, Golyanik V (2022) Quantum motion segmentation. In: European Conference on Computer Vision, pp 506–523. Springer
Arute F, Arya K, Babbush R, Bacon D, Bardin JC, Barends R, Biswas R, Boixo S, Brandao FG, Buell DA et al (2019) Quantum supremacy using a programmable superconducting processor. Nature 574(7779):505–510
Barkoutsos PK, Nannicini G, Robert A, Tavernelli I, Woerner S (2020) Improving variational quantum optimization using CVaR. Quantum 4:256
Benedetti M, Lloyd E, Sack S, Fiorentini M (2019) Parameterized quantum circuits as machine learning models. Quantum Sci Technol 4(4):043001
Benkner MS, Golyanik V, Theobalt C, Moeller M (2020) Adiabatic quantum graph matching with permutation matrix constraints. In: 2020 International conference on 3D vision (3DV), pp 583–592. IEEE
Benkner MS, Lähner Z, Golyanik V, Wunderlich C, Theobalt C, Moeller M (2021) Q-match: iterative shape matching via quantum annealing. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 7586–7596
Bharti K, Cervera-Lierta A, Kyaw TH, Haug T, Alperin-Lea S, Anand A, Degroote M, Heimonen H, Kottmann JS, Menke T et al (2022) Noisy intermediate-scale quantum algorithms. Rev Mod Phys 94(1):015004
Biamonte J, Wittek P, Pancotti N, Rebentrost P, Wiebe N, Lloyd S (2017) Quantum machine learning. Nature 549(7671):195–202
Birdal T, Golyanik V, Theobalt C, Guibas LJ (2021) Quantum permutation synchronization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13122–13133
Cavallaro G, Willsch D, Willsch M, Michielsen K, Riedel M (2020) Approaching remote sensing image classification with ensembles of support vector machines on the d-wave quantum annealer. In: IGARSS 2020-2020 IEEE international geoscience and remote sensing symposium, pp 1973–1976. IEEE
Cerezo M, Arrasmith A, Babbush R, Benjamin SC, Endo S, Fujii K, McClean JR, Mitarai K, Yuan X, Cincio L et al (2021) Variational quantum algorithms. Nat Rev Phys 3(9):625–644
Chen SY-C, Yoo S, Fang Y-LL (2022) Quantum long short-term memory. In: ICASSP 2022-2022 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 8622–8626. IEEE
Ciliberto C, Herbster M, Ialongo AD, Pontil M, Rocchetto A, Severini S, Wossnig L (2018) Quantum machine learning: a classical perspective. Proc R Soc A: Math Phys Eng Sci 474(2209):20170551
Cong I, Choi S, Lukin MD (2019) Quantum convolutional neural networks. Nat Phys 15(12):1273–1278
Date P, Schuman C, Patton R, Potok T (2020) A classical-quantum hybrid approach for unsupervised probabilistic machine learning. In: Advances in Information and Communication: Proceedings of the 2019 future of information and communication conference (FICC), Volume 2, pp 98–117. Springer
Dendukuri A, Luu K (2018) Image processing in quantum computers. arXiv:1812.11042
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255. IEEE
Deng J, Guo J, Xue N, Zafeiriou S (2019) Arcface: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4690–4699
Du Y, Huang T, You S, Hsieh M-H, Tao D (2020) Quantum circuit architecture search: error mitigation and trainability enhancement for variational quantum solvers. arXiv:2010.10217
Ester M, Kriegel H-P, Sander J, Xu X, et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Kdd, vol 96, pp 226–231
Golyanik V, Theobalt C (2020) A quantum computational approach to correspondence problems on point sets. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9182–9191
Grant E, Benedetti M, Cao S, Hallam A, Lockhart J, Stojevic V, Green AG, Severini S (2018) Hierarchical quantum classifiers. npj Quantum Inf 4(1):65
Guo Y, Zhang L, Hu Y, He X, Gao J (2016) Ms-celeb-1m: a dataset and benchmark for large-scale face recognition. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14, pp 87–102. Springer
Harrow AW, Montanaro A (2017) Quantum computational supremacy. Nature 549(7671):203–209
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
Ho J, Yang M-H, Lim J, Lee K-C, Kriegman D (2003) Clustering appearances of objects under varying illumination conditions. In: 2003 IEEE computer society conference on computer vision and pattern recognition, 2003. Proceedings., vol 1. IEEE
Li J, Ghosh S (2020) Quantum-soft qubo suppression for accurate object detection. In: European conference on computer vision, pp 158–173. Springer
Li J, Li D, Xiong C, Hoi S (2022) Blip: bootstrapping language-image pre-training for unified vision-language understanding and generation. In: International conference on machine learning, pp 12888–12900. PMLR
Liu J, Lim KH, Wood KL, Huang W, Guo C, Huang H-L (2021) Hybrid quantum-classical convolutional neural networks. Sci China- Phys Mech Astron 64(9):290311
Liu Z, Mao H, Wu C-Y, Feichtenhofer C, Darrell T, Xie S (2022) A convnet for the 2020s. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11976–11986
Liu J, Qiu D, Yan P, Wei X (2021) Learn to cluster faces via pairwise classification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3845–3853
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137
Lloyd S, Mohseni M, Rebentrost P (2013) Quantum algorithms for supervised and unsupervised machine learning. arXiv:1307.0411
Loshchilov I, Hutter F (2016) Sgdr: stochastic gradient descent with warm restarts. arXiv:1608.03983
Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv:1711.05101
Luo Z, Zhao P, Xu C, Geng X, Shen T, Tao C, Ma J, Lin Q, Jiang D (2023) Lexlip: lexicon-bottlenecked language-image pre-training for large-scale image-text sparse retrieval. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 11206–11217
Maze B, Adams J, Duncan JA, Kalka N, Miller T, Otto C, Jain AK, Niggel WT, Anderson J, Cheney J, Grother P (2018) Iarpa janus benchmark - c: face dataset and protocol. In: 2018 International conference on biometrics (ICB), pp 158–165. https://doi.org/10.1109/ICB2018.2018.00033
Nguyen H-Q, Truong T-D, Nguyen XB, Dowling A, Li X, Luu K (2023) Insect-foundation: a foundation model and large-scale 1m dataset for visual insect understanding. arXiv:2311.15206
Nguyen XB, Bisht A, Churchill H, Luu K (2022) Two-dimensional quantum material identification via self-attention and soft-labeling in deep learning. arXiv:2205.15948
Nguyen X-B, Bui DT, Duong CN, Bui TD, Luu K (2021) Clusformer: a transformer based clustering approach to unsupervised large-scale face and visual landmark recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10847–10856
Nguyen X-B, Duong CN, Li X, Gauch S, Seo H-S, Luu K (2023) Micron-BERT: BERT-based facial micro-expression recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1482–1492
Nguyen X-B, Duong CN, Savvides M, Roy K, Churchill H, Luu K (2023) Fairness in visual clustering: a novel transformer clustering approach. arXiv:2304.07408
Nguyen X-B, Li X, Khan SU, Luu K (2023) Brainformer: modeling MRI brain functions to machine vision. arXiv:2312.00236
Nguyen X-B, Liu X, Li X, Luu K (2023) The Algonauts project 2023 challenge: Uark-Ualbany team solution. arXiv:2308.00262
Nguyen X-B, Nguyen H-Q, Chen SY-C, Khan SU, Churchill H, Luu K (2024) Qclusformer: a quantum transformer-based framework for unsupervised visual clustering. arXiv:2405.19722
Nguyen XB, Thompson B, Churchill H, Luu K, Khan SU (2023) Quantum vision clustering. arXiv:2309.09907
Nguyen X-B, Lee G-S, Kim S-H, Yang H-J (2019) Audio-video based emotion recognition using minimum cost flow algorithm. In: 2019 IEEE/CVF international conference on computer vision workshop (ICCVW), pp 3737–3741. IEEE
Nguyen X-B, Lee GS, Kim SH, Yang HJ (2020) Self-supervised learning based on spatial awareness for medical image analysis. IEEE Access 8:162973–162981
Nguyen-Xuan B, Lee G-S (2019) Sketch recognition using LSTM with attention mechanism and minimum cost flow algorithm. Int J Contents 15(4):8–15
Nielsen MA, Chuang IL (2001) Quantum computation and quantum information. Phys Today 54(2):60
Noormandipour M, Wang H (2022) Matching point sets with quantum circuit learning. In: ICASSP 2022-2022 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 8607–8611. IEEE
O’Malley D, Vesselinov VV, Alexandrov BS, Alexandrov LB (2018) Nonnegative/binary matrix factorization with a d-wave quantum annealer. PloS One 13(12):0206653
Ostaszewski M, Trenkwalder LM, Masarczyk W, Scerri E, Dunjko V (2021) Reinforcement learning for optimization of variational quantum circuit architectures. Adv Neural Inf Process Syst 34:18182–18194
Ostaszewski M, Grant E, Benedetti M (2021) Structure optimization for parameterized quantum circuits. Quantum 5:391
Otto C, Wang D, Jain AK (2017) Clustering millions of faces by identity. IEEE Trans Pattern Anal Mach Intell 40(2):289–303
Preskill J (2018) Quantum computing in the NISQ era and beyond. Quantum 2:79
Romero J, Aspuru-Guzik A (2021) Variational quantum generators: generative adversarial quantum machine learning for continuous distributions. Adv Quantum Technol 4(1):2000003
Schuld M (2021) Supervised quantum machine learning models are kernel methods. arXiv:2101.11020
Schuld M, Sinayskiy I, Petruccione F (2015) An introduction to quantum machine learning. Contemp Phys 56(2):172–185
Schuld M, Bocharov A, Svore KM, Wiebe N (2020) Circuit-centric quantum classifiers. Phys Rev A 101(3):032308
Sculley D (2010) Web-scale k-means clustering. In: Proceedings of the 19th international conference on world wide web, pp 1177–1178
Serna-Aguilera M, Nguyen XB, Singh A, Rockers L, Park S-W, Neely L, Seo H-S, Luu K (2024) Video-based autism detection with deep learning. In: 2024 IEEE Green Technologies Conference (GreenTech), pp 159–161. IEEE
Shen S, Li W, Wang X, Zhang D, Jin Z, Zhou J, Lu J (2023) Clip-cluster: clip-guided attribute hallucination for face clustering. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 20786–20795
Shen S, Li W, Zhu Z, Huang G, Du D, Lu J, Zhou J (2021) Structure-aware face clustering on a large-scale graph with $10^7$ nodes. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9085–9094
Shin J, Lee H-J, Kim H, Baek J-H, Kim D, Koh YJ (2023) Local connectivity-based density estimation for face clustering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13621–13629
Sibson R (1973) Slink: an optimally efficient algorithm for the single-link cluster method. Comput J 16(1):30–34
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Wang H, Ding Y, Gu J, Li Z, Lin Y, Pan DZ, Chong FT, Han S (2022) Quantumnas: noise-adaptive search for robust quantum circuits. In: The 28th IEEE international symposium on high-performance computer architecture (HPCA-28)
Wang T, Lin K, Li L, Lin C-C, Yang Z, Zhang H, Liu Z, Wang L (2023) Equivariant similarity for vision-language foundation models. arXiv:2303.14465
Wang H, Wang Y, Zhou Z, Ji X, Gong D, Zhou J, Li Z, Liu W (2018) Cosface: large margin cosine loss for deep face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5265–5274
Wang Y, Zhang Y, Zhang F, Lin M, Zhang Y, Wang S, Sun X (2022) Ada-nets: face clustering via adaptive neighbour discovery in the structure space. arXiv:2202.03800
Wang Z, Zheng L, Li Y, Wang S (2019) Linkage based face clustering via graph convolution network. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1117–1125
Weyand T, Araujo A, Cao B, Sim J (2020) Google landmarks dataset v2 - a large-scale benchmark for instance-level recognition and retrieval. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Yang L, Chen D, Zhan X, Zhao R, Loy CC, Lin D (2020) Learning to cluster faces via confidence and connectivity estimation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13369–13378
Yang L, Zhan X, Chen D, Yan J, Loy CC, Lin D (2019) Learning to cluster faces on an affinity graph. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2298–2306
Yu J, Wang Z, Vasudevan V, Yeung L, Seyedhosseini M, Wu Y (2022) Coca: contrastive captioners are image-text foundation models. arXiv:2205.01917
Zhai X, Mustafa B, Kolesnikov A, Beyer L (2023) Sigmoid loss for language image pre-training. arXiv:2303.15343
Zhang S-X, Hsieh C-Y, Zhang S, Yao H (2022) Differentiable quantum architecture search. Quantum Sci Technol 7(4):045023
Zhan X, Liu Z, Yan J, Lin D, Loy CC (2018) Consensus-driven propagation in massive unlabeled data for face recognition. In: Proceedings of the European conference on computer vision (ECCV), pp 568–583
Zhong H-S, Wang H, Deng Y-H, Chen M-C, Peng L-C, Luo Y-H, Qin J, Wu D, Ding X, Hu Y et al (2020) Quantum computational advantage using photons. Science 370(6523):1460–1463

Acknowledgements

This work is partly supported by MonArk NSF Quantum Foundry, supported by the National Science Foundation Q-AMASE-i program under NSF award No. DMR-1906383. It acknowledges the Arkansas High-Performance Computing Center for providing GPUs.

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, University of Arkansas, Fayetteville, 72703, AR, USA
Xuan-Bac Nguyen, Hoang-Quan Nguyen & Khoa Luu
Department of Physics, University of Arkansas, Fayetteville, 72703, AR, USA
Hugh Churchill
Department of Electrical & Computer Engineering, Mississippi State University, Starkville, 39762, MS, USA
Samee U. Khan
MonArk NSF Quantum Foundry, Fayetteville, AR, USA
Xuan-Bac Nguyen, Hoang-Quan Nguyen, Hugh Churchill & Khoa Luu


Contributions

X.B wrote the main manuscript. H.Q prepared pseudo code, result tables, and experiment setups. H.C and S.K provided fundamental materials of the quantum machine. K.L discussed the novelty and the research direction. All the authors revised the manuscript.

Corresponding author

Correspondence to Xuan-Bac Nguyen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Nguyen, XB., Nguyen, HQ., Churchill, H. et al. Quantum visual feature encoding revisited. Quantum Mach. Intell. 6, 61 (2024). https://doi.org/10.1007/s42484-024-00192-x

Received: 10 June 2024
Accepted: 17 August 2024
Published: 17 September 2024
DOI: https://doi.org/10.1007/s42484-024-00192-x
