Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

CLEF 2024: Conference and Labs of the Evaluation Forum, September 9–12, 2024, Grenoble, France

Emails: s4068570@student.rmit.edu.au, hey.jieli@gmail.com, ke.deng@rmit.edu.au, yongli.ren@rmit.edu.au

CRUISE on Quantum Computing for Feature Selection in Recommender Systems

Jiayang Niu, Jie Li, Ke Deng, Yongli Ren
School of Computing Technologies, RMIT University, Melbourne, Victoria 3000

Notebook for the QuantumCLEF Lab at CLEF 2024

Abstract

Using Quantum Computers to solve problems in Recommender Systems that classical computers cannot address is a worthwhile research topic. In this paper, we use Quantum Annealers to address the feature selection problem in recommendation algorithms. This feature selection problem is a Quadratic Unconstrained Binary Optimization (QUBO) problem. By incorporating Counterfactual Analysis, we significantly improve the performance of the item-based KNN recommendation algorithm compared to using pure Mutual Information. Extensive experiments have demonstrated that the use of Counterfactual Analysis holds great promise for addressing such problems.

Keywords: Quantum Computers, Recommender Systems, Counterfactual Analysis, Feature Selection

1 Introduction

Collaborative filtering [1, 2], which predicts potential user-item interactions based on patterns of user behavior and item characteristics, is widely applied in recommendation algorithms. Well-known techniques in this field include matrix factorization methods [3], neighborhood-based methods [4], deep learning approaches [5, 6], graph-based techniques [7, 8], factorization machines [9], hybrid methods [10], Bayesian methods [11], and large language models (LLMs) [12]. However, collaborative filtering [1] relies heavily on data quality. For instance, using user profiles, item features, reviews, images, and other information can significantly improve the performance of recommendation algorithms, but in some cases it can also degrade it. It is therefore critical to identify which information is useful for recommendation, both to build efficient systems and to reduce energy consumption [13, 14, 15, 16]. Quantum computers, which use qubits and quantum effects such as superposition, entanglement, and quantum tunneling, are an effective tool for identifying useful information in redundant data [17]. They can significantly speed up search problems and large integer factorization [18]. Therefore, in this paper, we aim to find useful features for recommendation by leveraging quantum computing techniques. Our goal is to improve the efficiency and accuracy of recommender systems by identifying and using relevant data, thereby reducing computational requirements and energy consumption [18, 19, 20].

In QuantumCLEF 2024, we focus on Task 1B, where two feature sets with 150 and 500 features per item are provided [21, 22]. We analyze these features to extract the most relevant ones for recommender systems. The task requires participants to use Quantum Annealing and Simulated Annealing to select appropriate features from the given data for an Item-Based KNN recommendation algorithm (Item-KNN). The organizers provided an example of feature selection using Mutual Information [18]. However, our preliminary experiments showed that using only Mutual Information for feature selection yielded limited improvement in the performance of Item-KNN compared to using all features without any selection. This is because Mutual Information only reflects the statistical dependence between two variables and is not tied to the final objective of the recommendation algorithm. Therefore, to achieve better performance, we propose taking the impact of features on recommendation quality into account when performing feature selection.

One approach to achieve this is Counterfactual Analysis [23], a causal research tool that examines the impact of a factor on the final result by hypothesizing the absence or alteration of that factor. This approach mainly considers three questions: Which factors need to be evaluated? What metrics are used to assess the impact of these factors on the model's outcomes? And what models are used to derive the values of these metrics? In this work, due to the limited time for this task, we measure the impact of item features by counterfactually analyzing their effect on the nDCG [24] of recommendation lists, and we chose the KNN-based recommendation algorithm, a commonly used method in collaborative filtering, to perform these measurements. Specifically, we used Item-KNN to derive the change in nDCG after removing a specific item feature. Since Mutual Information can reflect the relationship between two features, which may positively affect the final results, we did not discard it. Instead, we integrated the results of Counterfactual Analysis into Mutual Information using a temperature coefficient, which controls the influence of Counterfactual Analysis on the final results. Given the current limitations on the number of qubits in Quantum Computers, directly performing Quantum Annealing on 500 variables remains challenging. Therefore, in this task, we first partitioned the 500 features into subsets manageable by the Quantum Computer and then combined the results.

The paper is organized as follows: Section 2 introduces related works; Section 3 describes the QUBO formulation, how Mutual Information is applied to QUBO for feature selection, and our proposed method of using Counterfactual Analysis for feature selection in QUBO; Section 4 explains our experimental setup and experimental result; Section 5 discusses our main findings; finally, Section 6 draws some conclusions and outlooks for future work.

2 Related Work

2.1 Quantum Computers

In recent years, the rapid development of Quantum Computers has demonstrated their tremendous potential for solving problems that Classical Computers cannot address efficiently, such as NP and NP-hard problems [25]. Based on their functionality and application scenarios, Quantum Computers can be categorized into Universal Quantum Computers, Quantum Annealers, Quantum Machine Learning Accelerators, and others [26]. Recent studies have used Quantum Annealers for feature selection to enhance the performance of recommender systems or retrieval systems [27, 28, 18]. Nembrini et al. [27] applied Quantum Computers to recommender systems by using Quantum Annealing to solve a hybrid feature selection approach. Their work demonstrates that current Quantum Computers are already capable of addressing real-world recommender system problems. Nikitin et al. [28] reproduced Nembrini's work and employed Tensor Train-based Optimization (TTOpt) as an optimizer for the cold start problem in recommender systems. MIQUBO [18] discusses the problem of feature selection using Quantum Computers and formalizes it as a Quadratic Unconstrained Binary Optimization (QUBO) problem. It demonstrates the potential of Quantum Computers to solve ranking and classification problems more efficiently.

2.2 Counterfactual Analysis

Existing deep learning models have complex decision-making processes that are difficult for people to understand and often function as black boxes. Counterfactual Analysis is a highly effective method for helping people understand these complex models and make them more robust [29]. For example, CF$^2$ [30] used Counterfactual Analysis to explore explanations of Graph Neural Networks. In recommender systems, Counterfactual Analysis is primarily used for explainability and to combat data sparsity. ACCENT [31] was the first to apply Counterfactual Analysis to neural network-based recommendation algorithms. CountER [32] uses Counterfactual Analysis to construct a low-complexity, high-strength model for explaining recommender systems. It also highlights that Counterfactual Analysis contributes to the interpretability and evaluation of recommender systems. Zhang et al. [33] designed the CauseRec framework, which uses counterfactuals to enhance representations in the data distribution, aiming to mitigate data sparsity.

In summary, Counterfactual Analysis can help people understand complex deep learning decision systems and has the potential to analyze how various factors interact in recommendation systems. Given the current advancements in Quantum Computers, utilizing Counterfactual Analysis combined with the ability of Quantum Computers to handle NP problems presents a promising direction.

3 Methodology

3.1 Preliminary

3.1.1 QUBO Formulation

In this work, we follow the approach described in [18], which utilizes Quantum Annealing for feature selection. To apply these methods, the feature selection problem is formulated as a Quadratic Unconstrained Binary Optimization (QUBO) problem. The QUBO formulation can be used to solve certain NP and NP-hard optimization problems and is defined as follows [18]:

$$\min Y = x^{T} Q x, \tag{1}$$

where $x$ is a binary vector of length $m$, with each element being either 0 or 1. $Q$ is a symmetric matrix, where each element represents the relationship between the elements of $x$. $m$ denotes the number of candidate features to select from. In other words, the elements of vector $x$ indicate whether the corresponding features are selected, and the elements of $Q$ influence the search direction of the function, determining feature selection.
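To make Equation 1 concrete, the following minimal sketch evaluates $x^{T}Qx$ and finds the minimizer by exhaustive enumeration. It is only an illustration: the matrix `Q_toy` and the helper names are assumptions made here, and enumeration is feasible only for very small $m$, which is why annealing is used in practice.

```python
import itertools

import numpy as np


def qubo_energy(x, Q):
    """Return x^T Q x for a binary vector x."""
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)


def brute_force_qubo(Q):
    """Enumerate all 2^m binary vectors and return the one with minimum energy."""
    m = Q.shape[0]
    best_x, best_y = None, float("inf")
    for bits in itertools.product([0, 1], repeat=m):
        y = qubo_energy(bits, Q)
        if y < best_y:
            best_x, best_y = bits, y
    return best_x, best_y


# Toy symmetric Q; the values are illustrative and not taken from the paper.
Q_toy = np.array([
    [-1.0, 2.0, 0.0],
    [2.0, -2.0, 0.0],
    [0.0, 0.0, 0.5],
])
best_x, best_y = brute_force_qubo(Q_toy)
print(best_x, best_y)
```

In this toy instance the positive off-diagonal entry between the first two variables discourages selecting both, so the minimizer keeps only the variable with the more negative diagonal entry.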

3.1.2 Feature Selection Based on Mutual Information

Following [18], Mutual Information QUBO (MIQUBO) is a quadratic feature selection model based on Mutual Information. MIQUBO aims to maximize the Mutual Information, which measures the dependency between two variables, and the Conditional Mutual Information, which measures the dependency between two variables given a target variable, of the selected features. In this context, the matrix $Q$ in Equation 1 is defined as:

$$Q_{ij} = \begin{cases} -\text{CMI}(f_i; y \mid f_j) & \text{if } i \neq j \\ -\text{MI}(f_i; y) & \text{if } i = j, \end{cases} \tag{2}$$

where $\text{MI}(f_i; y)$ is the Mutual Information between feature $f_i$ and target feature $y$, and $\text{CMI}(f_i; y \mid f_j)$ is the Conditional Mutual Information between feature $f_i$ and target feature $y$ given feature $f_j$. Since the QUBO formulation is used to find the minimum state, a negative sign is required before MI and CMI.

To control the number of selected features, a penalty term is added to Equation 1, which is then transformed to:

$$\min Y = x^{T} Q x + \left(\sum_{i=1}^{N} x_i - k\right)^{2}. \tag{3}$$

The penalty term vanishes when exactly $k$ features are selected, which steers the minimization toward solutions with $k$ features; this also follows the description in [18].
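Because $x_i^2 = x_i$ for binary variables, the penalty $\left(\sum_i x_i - k\right)^2$ expands to $\sum_i (1 - 2k)\,x_i + \sum_{i \neq j} x_i x_j + k^2$, so it can be folded into the QUBO matrix itself. The sketch below illustrates this expansion under the assumption that $Q$ is stored as a full square matrix; the function name and the random test matrix are hypothetical and are not part of the original method description.

```python
import itertools

import numpy as np


def add_cardinality_penalty(Q, k):
    """Fold (sum_i x_i - k)^2 into Q; return the new matrix and the constant k^2."""
    n = Q.shape[0]
    penalty = np.ones((n, n)) - np.eye(n)   # +1 for every ordered pair i != j
    penalty += (1 - 2 * k) * np.eye(n)      # linear term on the diagonal (x_i^2 = x_i)
    return Q + penalty, float(k * k)


# Illustrative symmetric matrix (random values, not from the paper).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
Q_demo = (A + A.T) / 2
Q_pen, offset = add_cardinality_penalty(Q_demo, k=2)

# Compare the folded form against the direct evaluation of Equation 3.
max_gap = 0.0
for bits in itertools.product([0, 1], repeat=4):
    x = np.array(bits, dtype=float)
    direct = x @ Q_demo @ x + (x.sum() - 2) ** 2
    folded = x @ Q_pen @ x + offset
    max_gap = max(max_gap, abs(direct - folded))
print(max_gap)
```

The constant $k^2$ does not change the minimizer, so only the modified matrix needs to be handed to a solver.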

3.2 Counterfactual Analysis

To better identify features directly associated with recommendation performance, we integrate a widely used recommendation ranking metric into Mutual Information through Counterfactual Analysis.

3.2.1 Counterfactual Analysis for Feature Selection

Counterfactual Analysis [23] is usually used to examine the causal relationship between conditions, decisions, and outcomes by hypothesizing how the results of observed events would change if the conditions and decisions were altered. In the field of Recommender System, Counterfactual Analysis is often used for the interpretability of recommendation models, helping researchers enhance algorithm performance [32, 33]. Inspired by existing works [32, 33], the impact of item features can be explored by excluding the corresponding feature and analyzing the difference in recommendation performance between the recommendation lists generated by the model with and without the corresponding feature.

In this work, we use the widely used Item-KNN recommendation algorithm, termed model $G$, and employ the recommendation performance metric Normalized Discounted Cumulative Gain (nDCG) [24] for Counterfactual Analysis. The counterfactual effect $E_i$ of feature $f_i$ is defined as:

$$E_i = \text{nDCG}_{G(F)} - \text{nDCG}_{G(F \setminus f_i)}, \tag{4}$$

where $E_i$ represents the change in the nDCG result of the recommendation model $G$ after removing the feature $f_i$. $\text{nDCG}_{G(F)}$ represents the nDCG@10 value obtained by $G$ using the full set of item features $F$, while $\text{nDCG}_{G(F \setminus f_i)}$ represents the nDCG@10 value obtained by $G$ using the set $F$ with feature $i$ removed. It is important to note that $E_i$ reflects the impact of feature $i$ on the result in isolation. Since the final outcome is influenced by the interactions between all features, simply removing features with negative $E_i$ values does not yield the optimal feature selection solution.
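As a small illustration of Equation 4, the sketch below computes nDCG@k for a single ranked list and takes the difference between two rankings. It assumes the standard binary-relevance form of nDCG with a $\log_2$ discount, which the text does not spell out, and the two ranked lists are made-up examples rather than outputs of Item-KNN.

```python
import math


def dcg_at_k(ranked_items, relevant, k=10):
    """Discounted cumulative gain with binary relevance."""
    return sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked_items[:k], start=1)
        if item in relevant
    )


def ndcg_at_k(ranked_items, relevant, k=10):
    """DCG normalized by the DCG of an ideal ranking."""
    ideal_hits = min(len(relevant), k)
    if ideal_hits == 0:
        return 0.0
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg_at_k(ranked_items, relevant, k) / idcg


# Hypothetical rankings for one user whose only relevant item is "a".
ndcg_full = ndcg_at_k(["a", "b", "c"], {"a"})     # model using all features F
ndcg_without = ndcg_at_k(["b", "a", "c"], {"a"})  # model using F without f_i
effect = ndcg_full - ndcg_without                 # E_i as in Equation 4
print(effect)
```

Here the relevant item drops from rank 1 to rank 2 once the feature is removed, so the effect is positive and the feature would count as helpful.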

When $E_i \geq 0$, removing feature $i$ does not improve the algorithm's performance, and the size of the decrease reflects the positive impact of this feature on the algorithm. Conversely, when $E_i < 0$, performance increases after removing the feature, which reflects its negative impact on the algorithm. We hypothesize that, if the selected set of features is $\text{set}(F^{*})$, maximizing the sum of $E_i$ over $i \in \text{set}(F^{*})$ maximizes the performance improvement over the baseline algorithm. Since the QUBO problem is a minimization problem, $E_i$ enters $Q$ with a negative sign, and we redefine $Q$ as follows:

$$Q_{ij} = \begin{cases} -\text{CMI}(f_i; y \mid f_j) & \text{if } i \neq j \\ -\text{MI}(f_i; y) - \lambda E_i & \text{if } i = j \end{cases} \tag{5}$$

where $\lambda$ is a coefficient used to control the influence of $E$ on the search results. The larger the value of $\lambda$, the greater the influence of $E$ on the final results. The overall process of the above algorithm, which we refer to as Counterfactual Analysis QUBO (CAQUBO), is given in Algorithm 1.

Algorithm 1 Counterfactual Analysis QUBO
Initialize variable set $E$, set $F$, $n \leftarrow len(F)$, $k$, $Q$, $\lambda$
procedure Calculate $E_i$
    for $f_i$ in $F$ do
        $F' \leftarrow F$
        $F'$.pop($f_i$)
        $E_i \leftarrow G(F) - G(F')$
    end for
    return $E$
end procedure
procedure Feature Selection
    Calculate MI and CMI
    for $f_i$ in $F$ do
        $Q_{ii} = -\text{MI}(f_i; y) - \lambda E_i$
    end for
    for $f_i$ in $F$ do
        for $f_j$ in $F$ with $j \neq i$ do
            $Q_{ij} = -\text{CMI}(f_i; y \mid f_j)$
        end for
    end for
    set $F^{*} \leftarrow$ QA or SA($Q$, $\lambda$)    # Input parameters $Q$ and $\lambda$ into the Quantum Annealer.
    return set $F^{*}$    # Selected Feature Set
end procedure
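The two procedures of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration, not the submitted implementation: `evaluate` is a hypothetical callable standing in for "train Item-KNN on a feature list and return nDCG@10", the MI and CMI values are assumed to be precomputed, and the toy scoring function and numbers exist only to make the example runnable.

```python
import numpy as np


def counterfactual_effects(features, evaluate):
    """E_i = evaluate(F) - evaluate(F without f_i), following Equation 4."""
    features = list(features)
    full_score = evaluate(features)
    return np.array([
        full_score - evaluate(features[:i] + features[i + 1:])
        for i in range(len(features))
    ])


def build_caqubo_matrix(mi, cmi, effects, lam):
    """Build Q as in Equation 5 from precomputed MI, CMI and E."""
    mi = np.asarray(mi, dtype=float)
    effects = np.asarray(effects, dtype=float)
    Q = -np.asarray(cmi, dtype=float)           # off-diagonal: -CMI(f_i; y | f_j)
    np.fill_diagonal(Q, -mi - lam * effects)    # diagonal: -MI(f_i; y) - lam * E_i
    return Q


# Toy stand-in for the recommender: each feature adds a fixed amount to the score.
toy_weights = {"f0": 0.5, "f1": -0.25, "f2": 0.0}


def toy_score(feature_list):
    return sum(toy_weights[f] for f in feature_list)


E = counterfactual_effects(["f0", "f1", "f2"], toy_score)
Q_ca = build_caqubo_matrix(
    mi=[0.1, 0.2, 0.3],
    cmi=np.full((3, 3), 0.05),
    effects=E,
    lam=2.0,
)
print(E)
print(Q_ca)
```

The harmful toy feature `f1` ends up with a positive diagonal entry, so a minimizer is discouraged from selecting it, while the diagonal of the helpful feature `f0` becomes more negative as `lam` grows.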

3.3 Handling Large Feature Set

Although Quantum Computers are developing rapidly, the limited number of qubits restricts them to feature selection problems with a limited number of features. To select from 500 features, we partition them into several subsets, use Quantum Annealing (QA) or Simulated Annealing (SA) to perform feature selection on each subset individually, and then combine the results.

First, partition the 500 features into $n$ subsets by order, $\{S_1, S_2, \cdots, S_i, \cdots, S_n\}$, where $S_i$ is the $i$-th subset of features, and $n$ is the number of subsets.

$$\{S_1, S_2, \cdots, S_i, \cdots, S_n\} = \text{divide}(F) \tag{6}$$

Then, use Quantum Annealing (QA) or Simulated Annealing (SA) to perform feature selection on each subset, and combine the results:

$$\tilde{S} = \bigcup_{i=1}^{n} \text{QA/SA}(S_i), \tag{7}$$

where $\tilde{S}$ is the final selected feature set, $S_i$ represents each partitioned subset of features, and $\text{QA/SA}(S_i)$ represents the features selected from subset $S_i$ using QA or SA. The final feature set is obtained by merging the selected features from all subsets.
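Equations 6 and 7 can be sketched as follows. The sketch assumes contiguous, near-equal blocks for the "by order" partition, and `solve_subset` is a hypothetical placeholder for a QA or SA run on one block; here it is replaced by a trivial rule so that the example runs without any annealer.

```python
import numpy as np


def partition_by_order(n_features, n_subsets):
    """Split feature indices 0..n_features-1 into contiguous blocks (Equation 6)."""
    blocks = np.array_split(np.arange(n_features), n_subsets)
    return [block.tolist() for block in blocks]


def select_with_partitioning(n_features, n_subsets, solve_subset):
    """Run the per-block solver and merge the selections (Equation 7)."""
    selected = []
    for block in partition_by_order(n_features, n_subsets):
        selected.extend(solve_subset(block))
    return sorted(selected)


def keep_even_indices(block):
    """Placeholder solver: NOT an annealer, just keeps even feature indices."""
    return [i for i in block if i % 2 == 0]


blocks = partition_by_order(500, 5)
merged = select_with_partitioning(500, 5, keep_even_indices)
print([len(b) for b in blocks], len(merged))
```

Because each block is solved independently, interactions between features that fall into different blocks are never seen by the solver.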

4 Experimental Setup

Datasets: In this work, two tasks are undertaken: the first involves selecting appropriate features from a set of 150 item features for training $G$, and the second involves selecting features from a set of 500 item features. Three datasets are provided for these tasks: 150_ICM, 500_ICM, and URM. The 150_ICM and 500_ICM datasets contain item features, while the URM contains interaction data between 1,890 users and 18,022 items.

Experimental parameter setting: We used a self-implemented Item-KNN recommendation model based on the problem statement to calculate $E$. The interaction data was split into training and test sets in an 80:20 ratio. It is worth noting that calculating $E$ is very time-consuming, so we only used a subset of items for the calculations. When using Quantum Annealing (QA) and Simulated Annealing (SA), the coefficient $\lambda$ significantly affects the features selected. Because usage time on the Quantum Annealer is limited, we first used SA to explore the effectiveness of the selected features under different values of $\lambda$ and $k$ before using QA. In preliminary experiments, we tried [$\lambda$: 0, 1e1, 1e3, 1e5, 1e7] and [$k$: 50, 100, 130, 140, 145] for Feature 150, and [$\lambda$: 0, 1e1, 1e3, 1e5, 1e7] and [$k$: 300, 350, 400, 450, 470] for Feature 500. For the selection from 500 features, the number of subsets $n$ (see Section 3.3) is set to 5. The preliminary experimental results can be found in Table 1.

Repeated Calculations: Due to the heuristic nature of Simulated Annealing (SA) and Quantum Annealing (QA), the final results may vary even with fixed parameters. To mitigate this effect, we perform multiple runs of QA and SA under the same parameters and select the final feature set via voting. For example, suppose the experiment is repeated five times: if $f_i$ is not included in $F^{*}$ in any of the five runs, while $f_j$ is included in $F^{*}$ in four of the five runs, then the final submitted feature set $F^{*}$ excludes $f_i$ and includes $f_j$.
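The voting step can be sketched as below. The text does not state the exact voting threshold, so the `min_votes` parameter (set to a simple majority of 3 out of 5 in the example) is an assumption, and the run contents are made up for illustration.

```python
from collections import Counter


def vote_features(runs, min_votes):
    """Keep every feature that was selected in at least min_votes runs."""
    counts = Counter(feature for run in runs for feature in set(run))
    return sorted(f for f, c in counts.items() if c >= min_votes)


# Five hypothetical annealing runs over feature indices.
runs = [
    [1, 2, 3],
    [1, 2],
    [1, 3],
    [1, 2, 4],
    [1],
]
final_features = vote_features(runs, min_votes=3)  # assumed majority rule
print(final_features)
```

Feature 1 appears in all five runs and feature 2 in three, so both are kept, while features 3 and 4 fall below the assumed threshold.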

Table 1: nDCG@10 for the Feature 150 and Feature 500 datasets individually using SA-based feature selection, with different numbers of selected features $k$ and different coefficients $\lambda$. Columns $k$ = 50 to 145 report Feature 150 nDCG@10; columns $k$ = 300 to 470 report Feature 500 nDCG@10.

| $\lambda$ \ $k$ | 50 | 100 | 130 | 140 | 145 | 300 | 350 | 400 | 450 | 470 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0602 | 0.0870 | 0.0968 | 0.1033 | 0.1018 | 0.1078 | 0.0894 | 0.0971 | 0.0969 | 0.0991 |
| 1 | 0.0870 | 0.0974 | 0.0999 | 0.1009 | 0.1029 | 0.1066 | 0.1108 | 0.1195 | 0.1291 | 0.1197 |
| 1e3 | 0.0755 | 0.1051 | 0.1151 | 0.1119 | 0.1152 | 0.1206 | 0.1249 | 0.1257 | 0.1305 | 0.1302 |
| 1e5 | 0.0878 | 0.1160 | 0.1232 | 0.1256 | 0.1180 | 0.1224 | 0.1238 | 0.1303 | 0.1290 | 0.1307 |
| 1e7 | 0.0795 | 0.1155 | 0.1221 | 0.1264 | 0.1180 | 0.1235 | 0.1218 | 0.1298 | 0.1306 | 0.1293 |

150 Feature nDCG: 0.1028; 500 Feature nDCG: 0.0988

Table 2: Final results submitted to the organizers, with data sourced from the organizers' website (footnote 1). When $\lambda$ is too large, the values of the elements in $Q$ become excessively large, which is detrimental to the performance of QA and SA; a coefficient $\mu$ is therefore applied to all elements in $Q$. An asterisk (*) after the sub_id indicates that the selected features are the result of repeated calculations: those submissions were repeated five times to determine the final feature set.

150 Feature submissions (All Feature nDCG: 0.0810)

| Parameters set | nDCG@10 | Annealing Time | Type | nº features | sub_id |
|---|---|---|---|---|---|
| k=140, λ=1e7, μ=1e-5 | 0.0805 | 536250 | Q | 138 | 1 |
| k=140, λ=1e7, μ=1e-3 | 0.0826 | 528844 | Q | 136 | 2 |
| k=140, λ=1e7, μ=1e-3 | 0.0690 | 530804 | Q | 132 | 3 |
| k=140, λ=0, μ=1 | 0.0763 | 558321 | Q | 133 | 4 |
| k=140, λ=1e7, μ=1e-2 | 0.1003 | 1375068 | Q | 144 | 5* |
| k=140, λ=1e7, μ=1e-5 | 0.0998 | 1745487 | S | 140 | 1 |
| k=140, λ=1e7, μ=1e-3 | 0.0993 | 17357899 | S | 140 | 2 |
| k=140, λ=1e7, μ=1e-3 | 0.1001 | 1760252 | S | 140 | 3 |
| k=140, λ=0, μ=1 | 0.0793 | 17387227 | S | 140 | 4 |
| k=140, λ=1e7, μ=1e-2 | 0.1003 | 88395437 | S | 144 | 5* |

500 Feature submissions (All Feature nDCG: 0.0827)

| Parameters set | nDCG@10 | Annealing Time | Type | nº features | sub_id |
|---|---|---|---|---|---|
| k=450, λ=1e7, μ=1e-2 | 0.0757 | 2287019 | Q | 407 | 1 |
| k=450, λ=1e1, μ=1 | 0.0839 | 2122701 | Q | 397 | 2 |
| k=450, λ=1e7, μ=1e-2 | 0.1196 | 43339285 | S | 450 | 1 |
| k=450, λ=1e1, μ=1 | 0.1198 | 42776695 | S | 450 | 2 |

Footnote 1: https://qclef.dei.unipd.it/clef2024-results.html

5 Results

Table 1 describes the performance in nDCG@10 of $G$ using features selected by QA and SA under different parameters $\lambda$ and $k$. When $\lambda = 0$, QA and SA select features based solely on Mutual Information (MI) and Conditional Mutual Information (CMI). Across different values of $k$, the performance of the features selected in this way rarely surpasses the performance obtained with Counterfactual Analysis QUBO. As $\lambda$ increases, the performance of the features selected by QA and SA in Item-KNN shows significant improvement compared to using all features. The effectiveness of feature selection shows no significant further improvement when $\lambda > 1e5$. This may be because, as the value of $\lambda$ increases, the impact of MI and CMI on feature selection diminishes, causing QA and SA to rely entirely on $E$ for feature selection.

Table 2 reflects the same situation: feature selection relying solely on MI and CMI does not surpass the performance obtained with Counterfactual Analysis QUBO. After incorporating the counterfactual analysis-derived $E$ into $Q$, the features selected by QA and SA show a significant performance improvement in Item-KNN compared to using all features. An unusual observation is that, under the same parameters, the features selected by QA generally do not perform as well as those selected by SA in Item-KNN, and sometimes do not even surpass the performance of using all features. During the experiments, we noticed that this is due to QA often returning results before finding the optimal solution.

6 Conclusions and Future Work

In this paper, we present the explorations conducted by our team and the details of our final submission for the QuantumCLEF 2024 activities. We used Counterfactual Analysis of individual item features to select appropriate features for item-KNN using Quantum Annealing. Our preliminary experiments and the results returned by QuantumCLEF 2024 demonstrated that our use of Counterfactual Analysis significantly improved the performance of item-KNN.

Within the limited time of QuantumCLEF, we attempted Counterfactual Analysis of individual features. However, because the performance of collaborative filtering is actually the result of feature interactions, Counterfactual Analysis of individual features has significant limitations. Additionally, since Quantum Annealing cannot directly handle the selection of 500 features, we adopted a sequential partitioning and merging approach. As negative features are not uniformly distributed by their indices among all features, this sequential partitioning and merging method still requires improvement.

References

  • Su and Khoshgoftaar [2009] X. Su, T. M. Khoshgoftaar, A survey of collaborative filtering techniques, Advances in artificial intelligence 2009 (2009).
  • Lee et al. [2012] J. Lee, M. Sun, G. Lebanon, A comparative study of collaborative filtering algorithms, arXiv preprint arXiv:1205.3193 (2012).
  • Koenigstein et al. [2012] N. Koenigstein, P. Ram, Y. Shavitt, Efficient retrieval of recommendations in a matrix factorization framework, in: Proceedings of the 21st ACM international conference on Information and knowledge management, 2012, pp. 535–544.
  • Adeniyi et al. [2016] D. A. Adeniyi, Z. Wei, Y. Yongquan, Automated web usage data mining and recommendation system using k-nearest neighbor (knn) classification method, Applied Computing and Informatics 12 (2016) 90–108.
  • Hidasi et al. [2015] B. Hidasi, A. Karatzoglou, L. Baltrunas, D. Tikk, Session-based recommendations with recurrent neural networks, arXiv preprint arXiv:1511.06939 (2015).
  • Vaswani et al. [2017] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017).
  • Wang et al. [2019] X. Wang, X. He, M. Wang, F. Feng, T.-S. Chua, Neural graph collaborative filtering, in: Proceedings of the 42nd international ACM SIGIR conference on Research and development in Information Retrieval, 2019, pp. 165–174.
  • He et al. [2020] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, M. Wang, LightGCN: Simplifying and powering graph convolution network for recommendation, in: Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020, pp. 639–648.
  • Yuan et al. [2016] F. Yuan, G. Guo, J. M. Jose, L. Chen, H. Yu, W. Zhang, LambdaFM: Learning optimal ranking with factorization machines using lambda surrogates, in: Proceedings of the 25th ACM international on conference on information and knowledge management, 2016, pp. 227–236.
  • Adomavicius and Tuzhilin [2005] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE transactions on knowledge and data engineering 17 (2005) 734–749.
  • Lopes et al. [2016] R. Lopes, R. Assunção, R. L. Santos, Efficient Bayesian methods for graph-based recommendation, in: Proceedings of the 10th ACM Conference on Recommender Systems, 2016, pp. 333–340.
  • Yang et al. [2022] Y. Yang, K. S. Kim, M. Kim, J. Park, GRAM: Fast fine-tuning of pre-trained language models for content-based collaborative filtering, arXiv preprint arXiv:2204.04179 (2022).
  • Marchesin et al. [2020] S. Marchesin, A. Purpura, G. Silvello, Focal elements of neural information retrieval models. An outlook through a reproducibility study, Information Processing & Management 57 (2020) 102109.
  • Strubell et al. [2019] E. Strubell, A. Ganesh, A. McCallum, Energy and policy considerations for deep learning in NLP, arXiv preprint arXiv:1906.02243 (2019).
  • Himeur et al. [2021] Y. Himeur, A. Alsalemi, A. Al-Kababji, F. Bensaali, A. Amira, C. Sardianos, G. Dimitrakopoulos, I. Varlamis, A survey of recommender systems for energy efficiency in buildings: Principles, challenges and prospects, Information Fusion 72 (2021) 1–21.
  • Adomavicius and Zhang [2012] G. Adomavicius, J. Zhang, Impact of data characteristics on recommender systems performance, ACM Transactions on Management Information Systems (TMIS) 3 (2012) 1–17.
  • Lu et al. [2023] Y. Lu, A. Sigov, L. Ratkin, L. A. Ivanov, M. Zuo, Quantum computing and industrial information integration: A review, Journal of Industrial Information Integration (2023) 100511.
  • Ferrari Dacrema et al. [2022] M. Ferrari Dacrema, F. Moroni, R. Nembrini, N. Ferro, G. Faggioli, P. Cremonesi, Towards feature selection for ranking and classification exploiting quantum annealers, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2022, pp. 2814–2824.
  • Glover et al. [2019] F. Glover, G. Kochenberger, Y. Du, Quantum bridge analytics I: a tutorial on formulating and using QUBO models, 4OR 17 (2019) 335–371.
  • Pilato and Vella [2022] G. Pilato, F. Vella, A survey on quantum computing for recommendation systems, Information 14 (2022) 20.
  • Pasin et al. [2024a] A. Pasin, M. Ferrari Dacrema, P. Cremonesi, N. Ferro, QuantumCLEF 2024: Overview of the Quantum Computing Challenge for Information Retrieval and Recommender Systems at CLEF, in: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), Grenoble, France, September 9th to 12th, 2024, 2024a.
  • Pasin et al. [2024b] A. Pasin, M. Ferrari Dacrema, P. Cremonesi, N. Ferro, Overview of QuantumCLEF 2024: The Quantum Computing Challenge for Information Retrieval and Recommender Systems at CLEF, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 15th International Conference of the CLEF Association, CLEF 2024, Grenoble, France, September 9-12, 2024, Proceedings, 2024b.
  • Pearl et al. [2016] J. Pearl, M. Glymour, N. P. Jewell, Causal inference in statistics: A primer, John Wiley & Sons, 2016.
  • Järvelin and Kekäläinen [2002] K. Järvelin, J. Kekäläinen, Cumulated gain-based evaluation of IR techniques, ACM Transactions on Information Systems (TOIS) 20 (2002) 422–446.
  • Bittel and Kliesch [2021] L. Bittel, M. Kliesch, Training variational quantum algorithms is NP-hard, Physical Review Letters 127 (2021) 120502.
  • Gill et al. [2022] S. S. Gill, A. Kumar, H. Singh, M. Singh, K. Kaur, M. Usman, R. Buyya, Quantum computing: A taxonomy, systematic review and future directions, Software: Practice and Experience 52 (2022) 66–114.
  • Nembrini et al. [2021] R. Nembrini, M. Ferrari Dacrema, P. Cremonesi, Feature selection for recommender systems with quantum computing, Entropy 23 (2021) 970.
  • Nikitin et al. [2022] A. Nikitin, A. Chertkov, R. Ballester-Ripoll, I. Oseledets, E. Frolov, Are quantum computers practical yet? A case for feature selection in recommender systems using tensor networks, arXiv preprint arXiv:2205.04490 (2022).
  • Verma et al. [2020] S. Verma, V. Boonsanong, M. Hoang, K. E. Hines, J. P. Dickerson, C. Shah, Counterfactual explanations and algorithmic recourses for machine learning: A review, arXiv preprint arXiv:2010.10596 (2020).
  • Olson et al. [2021] M. L. Olson, R. Khanna, L. Neal, F. Li, W.-K. Wong, Counterfactual state explanations for reinforcement learning agents via generative deep learning, Artificial Intelligence 295 (2021) 103455.
  • Tran et al. [2021] K. H. Tran, A. Ghazimatin, R. Saha Roy, Counterfactual explanations for neural recommenders, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1627–1631.
  • Tan et al. [2021] J. Tan, S. Xu, Y. Ge, Y. Li, X. Chen, Y. Zhang, Counterfactual explainable recommendation, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 1784–1793.
  • Zhang et al. [2021] S. Zhang, D. Yao, Z. Zhao, T.-S. Chua, F. Wu, CauseRec: Counterfactual user sequence synthesis for sequential recommendation, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 367–377.