-
Three-State Information Hiding: Provably Secure Asymmetric Steganography
Authors:
Minhao Bai,
Jinshuai Yang,
Kaiyi Pang,
Xu Xin,
Yongfeng Huang
Abstract:
The rise of language models has provided fertile ground for the application of steganography. Because of the quality of their output, steganographic texts have become similar to human-written text and have attracted the attention of most steganography researchers. However, running a language model requires a powerful computing platform. This limits the scenarios in which steganography is applicable, since the electronic devices controlled by the decoder may not even be equipped with a GPU. Traditional provably secure steganography methods cannot be applied to this low-resource scenario. Therefore, we aim to design a novel steganography framework that is practical in a low-resource setting. We start from a rigorous probability analysis, using hypothesis testing techniques, to construct a theoretical framework. We then prove the security and robustness of our framework and identify its optimization goal. We test our theoretical framework on several well-known LLMs, and the results demonstrate its usability. Some practical problems remain, which point to directions for future work. We hope that this work will expand the practical scope of steganography and create a new branch of steganography.
Submitted 18 July, 2024;
originally announced July 2024.
-
Towards Personalized Federated Multi-scenario Multi-task Recommendation
Authors:
Yue Ding,
Yanbiao Ji,
Xun Cai,
Xin Xin,
Xiaofeng Gao,
Hongtao Lu
Abstract:
In modern recommender system applications, such as e-commerce, predicting multiple targets like click-through rate (CTR) and post-view click-through & conversion rate (CTCVR) is common. Multi-task recommender systems are gaining traction in research and practical use. Existing multi-task recommender systems tackle diverse business scenarios; merging and modeling these scenarios unlocks shared knowledge to boost overall performance. As new and more complex real-world recommendation scenarios have emerged, data privacy issues make it difficult to train a single global multi-task recommendation model that processes multiple separate scenarios.
In this paper, we propose a novel framework for personalized federated multi-scenario multi-task recommendation, called PF-MSMTrec. We assign each scenario to a dedicated client, with each client utilizing the Multi-gate Mixture-of-Experts (MMoE) structure. Our proposed method aims to tackle the unique challenge posed by multiple optimization conflicts in this setting. We introduce a bottom-up joint learning mechanism. First, we design a parameter template to decouple the parameters of the expert network. Thus, scenario parameters are shared knowledge for federated parameter aggregation, while task-specific parameters are personalized local parameters. Second, we conduct personalized federated learning for the parameters of each expert network through a federated communication round, utilizing three modules: federated batch normalization, conflict coordination, and personalized aggregation. Finally, we perform another round of personalized federated parameter aggregation on the task tower network to obtain the prediction results for multiple tasks. We conduct extensive experiments on two public datasets, and the results demonstrate that our proposed method surpasses state-of-the-art methods.
Submitted 27 June, 2024;
originally announced June 2024.
-
MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter
Authors:
Jitai Hao,
WeiWei Sun,
Xin Xin,
Qi Meng,
Zhumin Chen,
Pengjie Ren,
Zhaochun Ren
Abstract:
Parameter-Efficient Fine-tuning (PEFT) facilitates the fine-tuning of Large Language Models (LLMs) under limited resources. However, the fine-tuning performance with PEFT on complex, knowledge-intensive tasks is limited due to the constrained model capacity, which originates from the limited number of additional trainable parameters. To overcome this limitation, we introduce a novel mechanism that fine-tunes LLMs with larger yet memory-efficient adapters. This is achieved by leveraging the inherent activation sparsity in the Feed-Forward Networks (FFNs) of LLMs and utilizing the larger capacity of Central Processing Unit (CPU) memory compared to Graphics Processing Unit (GPU) memory. We store and update the parameters of larger adapters on the CPU. Moreover, we employ a Mixture of Experts (MoE)-like architecture to mitigate unnecessary CPU computations and reduce the communication volume between the GPU and CPU. This is particularly beneficial over the limited bandwidth of PCI Express (PCIe). Our method can achieve fine-tuning results comparable to those obtained with larger memory capacities, even when operating under more limited resources such as a single GPU with 24GB of memory, with an acceptable loss in training efficiency. Our code is available at https://github.com/CURRENTF/MEFT.
Submitted 7 June, 2024;
originally announced June 2024.
-
Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots
Authors:
Xi Xin,
Giles Hooker,
Fei Huang
Abstract:
The adoption of artificial intelligence (AI) across industries has led to the widespread use of complex black-box models and interpretation tools for decision making. This paper proposes an adversarial framework to uncover the vulnerability of permutation-based interpretation methods for machine learning tasks, with a particular focus on partial dependence (PD) plots. This adversarial framework modifies the original black box model to manipulate its predictions for instances in the extrapolation domain. As a result, it produces deceptive PD plots that can conceal discriminatory behaviors while preserving most of the original model's predictions. This framework can produce multiple fooled PD plots via a single model. By using real-world datasets including an auto insurance claims dataset and COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) dataset, our results show that it is possible to intentionally hide the discriminatory behavior of a predictor and make the black-box model appear neutral through interpretation tools like PD plots while retaining almost all the predictions of the original black-box model. Managerial insights for regulators and practitioners are provided based on the findings.
Submitted 1 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Offline Trajectory Generalization for Offline Reinforcement Learning
Authors:
Ziqi Zhao,
Zhaochun Ren,
Liu Yang,
Fajie Yuan,
Pengjie Ren,
Zhumin Chen,
Jun Ma,
Xin Xin
Abstract:
Offline reinforcement learning (RL) aims to learn policies from static datasets of previously collected trajectories. Existing methods for offline RL either constrain the learned policy to the support of offline data or utilize model-based virtual environments to generate simulated rollouts. However, these methods suffer from (i) poor generalization to unseen states; and (ii) trivial improvement from low-quality rollout simulation. In this paper, we propose offline trajectory generalization through world transformers for offline reinforcement learning (OTTO). Specifically, we use causal Transformers, a.k.a. World Transformers, to predict state dynamics and the immediate reward. Then we propose four strategies that use World Transformers to generate high-reward simulated trajectories by perturbing the offline data. Finally, we jointly use offline data with simulated data to train an offline RL algorithm. OTTO serves as a plug-in module and can be integrated with existing offline RL methods to enhance them with the better generalization capability of transformers and high-reward data augmentation. Through extensive experiments on D4RL benchmark datasets, we verify that OTTO significantly outperforms state-of-the-art offline RL methods.
Submitted 16 April, 2024;
originally announced April 2024.
-
IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT
Authors:
Junchen Fu,
Xuri Ge,
Xin Xin,
Alexandros Karatzoglou,
Ioannis Arapakis,
Jie Wang,
Joemon M. Jose
Abstract:
Multimodal foundation models are transformative in sequential recommender systems, leveraging powerful representation learning capabilities. While Parameter-efficient Fine-tuning (PEFT) is commonly used to adapt foundation models for recommendation tasks, most research prioritizes parameter efficiency, often overlooking critical factors like GPU memory efficiency and training speed. Addressing this gap, our paper introduces IISAN (Intra- and Inter-modal Side Adapted Network for Multimodal Representation), a simple plug-and-play architecture using a Decoupled PEFT structure and exploiting both intra- and inter-modal adaptation.
IISAN matches the performance of full fine-tuning (FFT) and state-of-the-art PEFT. More importantly, it significantly reduces GPU memory usage, from 47GB to just 3GB, for multimodal sequential recommendation tasks. Additionally, it reduces training time per epoch from 443s to 22s compared to FFT. This is also a notable improvement over Adapter and LoRA, which require 37-39 GB of GPU memory and 350-380 seconds per epoch for training.
Furthermore, we propose a new composite efficiency metric, TPME (Training-time, Parameter, and GPU Memory Efficiency), to alleviate the prevalent misconception that "parameter efficiency represents overall efficiency". TPME provides more comprehensive insights into practical efficiency comparisons between different methods. In addition, we give an accessible efficiency analysis of all PEFT and FFT approaches, which demonstrates the superiority of IISAN. We release our code and other materials at https://github.com/GAIR-Lab/IISAN.
Submitted 21 July, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Enhanced Generative Recommendation via Content and Collaboration Integration
Authors:
Yidan Wang,
Zhaochun Ren,
Weiwei Sun,
Jiyuan Yang,
Zhixiang Liang,
Xin Chen,
Ruobing Xie,
Su Yan,
Xu Zhang,
Pengjie Ren,
Zhumin Chen,
Xin Xin
Abstract:
Generative recommendation has emerged as a promising paradigm aimed at augmenting recommender systems with recent advancements in generative artificial intelligence. This task has been formulated as a sequence-to-sequence generation process, wherein the input sequence encompasses data pertaining to the user's previously interacted items, and the output sequence denotes the generative identifier for the suggested item. However, existing generative recommendation approaches still encounter challenges in (i) effectively integrating user-item collaborative signals and item content information within a unified generative framework, and (ii) executing an efficient alignment between content information and collaborative signals.
In this paper, we introduce content-based collaborative generation for recommender systems, denoted as ColaRec. To capture collaborative signals, the generative item identifiers are derived from a pretrained collaborative filtering model, while the user is represented through the aggregation of interacted items' content. Subsequently, the aggregated textual description of items is fed into a language model to encapsulate content information. This integration enables ColaRec to amalgamate collaborative signals and content information within an end-to-end framework. Regarding the alignment, we propose an item indexing task to facilitate the mapping between the content-based semantic space and the interaction-based collaborative space. Additionally, a contrastive loss is introduced to ensure that items with similar collaborative GIDs possess comparable content representations, thereby enhancing alignment. To validate the efficacy of ColaRec, we conduct experiments on three benchmark datasets. Empirical results substantiate the superior performance of ColaRec.
Submitted 27 March, 2024;
originally announced March 2024.
-
Uncovering Selective State Space Model's Capabilities in Lifelong Sequential Recommendation
Authors:
Jiyuan Yang,
Yuanzi Li,
Jingyu Zhao,
Hanbing Wang,
Muyang Ma,
Jun Ma,
Zhaochun Ren,
Mengqi Zhang,
Xin Xin,
Zhumin Chen,
Pengjie Ren
Abstract:
Sequential Recommenders have been widely applied in various online services, aiming to model users' dynamic interests from their sequential interactions. With users increasingly engaging with online platforms, vast amounts of lifelong user behavioral sequences have been generated. However, existing sequential recommender models often struggle to handle such lifelong sequences. The primary challenges stem from computational complexity and the ability to capture long-range dependencies within the sequence. Recently, a state space model featuring a selective mechanism (i.e., Mamba) has emerged. In this work, we investigate the performance of Mamba for lifelong sequential recommendation (i.e., length >= 2k). More specifically, we leverage the Mamba block to model lifelong user sequences selectively. We conduct extensive experiments to evaluate the performance of representative sequential recommendation models in the setting of lifelong sequences. Experiments on two real-world datasets demonstrate the superiority of Mamba. We found that RecMamba achieves performance comparable to the representative model while significantly reducing training duration by approximately 70% and memory costs by 80%. Code and data are available at https://github.com/nancheng58/RecMamba.
Submitted 24 March, 2024;
originally announced March 2024.
-
Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models
Authors:
Wenxuan Wang,
Yihang Su,
Jingyuan Huan,
Jie Liu,
Wenting Chen,
Yudi Zhang,
Cheng-Yi Li,
Kao-Jung Chang,
Xiaohan Xin,
Linlin Shen,
Michael R. Lyu
Abstract:
The significant breakthroughs of Medical Multi-Modal Large Language Models (Med-MLLMs) renovate modern healthcare with robust information synthesis and medical decision support. However, these models are often evaluated on benchmarks that are unsuitable for the Med-MLLMs due to the intricate nature of the real-world diagnostic frameworks, which encompass diverse medical specialties and involve complex clinical decisions. Moreover, these benchmarks are susceptible to data leakage, since Med-MLLMs are trained on large assemblies of publicly available data. Thus, an isolated and clinically representative benchmark is highly desirable for credible Med-MLLMs evaluation. To this end, we introduce Asclepius, a novel Med-MLLM benchmark that rigorously and comprehensively assesses model capability in terms of: distinct medical specialties (cardiovascular, gastroenterology, etc.) and different diagnostic capacities (perception, disease analysis, etc.). Grounded in 3 proposed core principles, Asclepius ensures a comprehensive evaluation by encompassing 15 medical specialties, stratifying into 3 main categories and 8 sub-categories of clinical tasks, and exempting from train-validate contamination. We further provide an in-depth analysis of 6 Med-MLLMs and compare them with 5 human specialists, providing insights into their competencies and limitations in various medical contexts. Our work not only advances the understanding of Med-MLLMs' capabilities but also sets a precedent for future evaluations and the safe deployment of these models in clinical environments. We launch and maintain a leaderboard for community assessment of Med-MLLM capabilities (https://asclepius-med.github.io/).
Submitted 17 February, 2024;
originally announced February 2024.
-
On the Effectiveness of Unlearning in Session-Based Recommendation
Authors:
Xin Xin,
Liu Yang,
Ziqi Zhao,
Pengjie Ren,
Zhumin Chen,
Jun Ma,
Zhaochun Ren
Abstract:
Session-based recommendation predicts users' future interests from previous interactions in a session. Although models memorize historical samples, requests for unlearning, i.e., removing the effect of certain training samples, also arise for reasons such as user privacy or model fidelity. However, existing studies on unlearning are not tailored to session-based recommendation. On the one hand, these approaches cannot achieve satisfactory unlearning effects due to the collaborative correlations and sequential connections between the unlearning item and the remaining items in the session. On the other hand, little work has verified unlearning effectiveness in the session-based recommendation scenario. In this paper, we propose SRU, a session-based recommendation unlearning framework, which enables high unlearning efficiency, accurate recommendation performance, and improved unlearning effectiveness in session-based recommendation. Specifically, we first partition the training sessions into separate sub-models according to the similarity across the sessions; then we utilize an attention-based aggregation layer to fuse the hidden states according to the correlations between the session and the centroid of the data in the sub-model. To improve the unlearning effectiveness, we further propose three extra data deletion strategies: collaborative extra deletion (CED), neighbor extra deletion (NED), and random extra deletion (RED). In addition, to verify the unlearning effectiveness, we propose an evaluation metric that measures whether the unlearning sample can be inferred after the data deletion. We implement SRU with three representative session-based recommendation models and conduct experiments on three benchmark datasets. Experimental results demonstrate the effectiveness of our methods.
Submitted 22 December, 2023;
originally announced December 2023.
-
Debiasing Sequential Recommenders through Distributionally Robust Optimization over System Exposure
Authors:
Jiyuan Yang,
Yue Ding,
Yidan Wang,
Pengjie Ren,
Zhumin Chen,
Fei Cai,
Jun Ma,
Rui Zhang,
Zhaochun Ren,
Xin Xin
Abstract:
Sequential recommendation (SR) models are typically trained on user-item interactions that are affected by system exposure bias, so the user preference learned by the biased SR model is not fully consistent with the true user preference. Exposure bias refers to the fact that user interactions depend on the subset of items exposed to the user. Existing debiasing methods do not make full use of the system exposure data and suffer from sub-optimal recommendation performance and high variance. In this paper, we propose to debias sequential recommenders through Distributionally Robust Optimization (DRO) over system exposure data. The key idea is to utilize DRO to optimize the worst-case error over an uncertainty set to safeguard the model against distributional discrepancy caused by the exposure bias. The main challenge in applying DRO to exposure debiasing in SR lies in how to construct the uncertainty set and avoid the overestimation of user preference on biased samples. Moreover, how to evaluate the debiasing effect on a biased test set is also an open question. To this end, we first introduce an exposure simulator trained on the system exposure data to calculate the exposure distribution, which is then regarded as the nominal distribution to construct the uncertainty set of DRO. Then, we introduce a penalty on items with high exposure probability to avoid the overestimation of user preference for biased samples. Finally, we design a debiased self-normalized inverse propensity score (SNIPS) evaluator for evaluating the debiasing effect on the biased offline test set. We conduct extensive experiments on two real-world datasets to verify the effectiveness of the proposed methods. Experimental results demonstrate the superior exposure debiasing performance of the proposed methods. Code and data are available at https://github.com/nancheng58/DebiasedSR_DRO.
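For readers unfamiliar with the evaluator named in this abstract, the following is a minimal sketch of the standard self-normalized inverse propensity score (SNIPS) estimator: each logged outcome is weighted by the inverse of its exposure probability, and the weighted sum is divided by the sum of the weights. It shows only the textbook form; the function name `snips` and the toy inputs are hypothetical, and the debiased variant designed in the paper may differ.

```python
def snips(rewards, propensities):
    """Standard SNIPS estimate over logged samples.

    rewards: observed outcome per sample (e.g. 1.0 for a hit, 0.0 for a miss).
    propensities: exposure probability per sample, each in (0, 1].
    Both arguments are illustrative assumptions, not the paper's interface.
    """
    if len(rewards) != len(propensities):
        raise ValueError("rewards and propensities must have the same length")
    # Inverse propensity weights: rarely exposed items count more.
    weights = [1.0 / p for p in propensities]
    # Self-normalization: divide by the sum of weights, not the sample count.
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


# Toy usage with made-up numbers.
estimate = snips([1.0, 0.0, 1.0], [0.5, 0.25, 0.5])
```

Dividing by the sum of the weights rather than the number of samples keeps the estimate within the range of the observed rewards, which typically lowers variance compared with plain inverse propensity scoring.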
Submitted 12 December, 2023;
originally announced December 2023.
-
Learning Robust Sequential Recommenders through Confident Soft Labels
Authors:
Shiguang Wu,
Xin Xin,
Pengjie Ren,
Zhumin Chen,
Jun Ma,
Maarten de Rijke,
Zhaochun Ren
Abstract:
Sequential recommenders that are trained on implicit feedback are usually learned as a multi-class classification task through softmax-based loss functions on one-hot class labels. However, one-hot training labels are sparse and may lead to biased training and sub-optimal performance. Dense, soft labels have been shown to help improve recommendation performance. But how to generate high-quality and confident soft labels from noisy sequential interactions between users and items is still an open question.
We propose a new learning framework for sequential recommenders, CSRec, which introduces confident soft labels to provide robust guidance when learning from user-item interactions. CSRec contains a teacher module that generates high-quality and confident soft labels and a student module that acts as the target recommender and is trained on the combination of dense, soft labels and sparse, one-hot labels.
We propose and compare three approaches to constructing the teacher module: (i) model-level, (ii) data-level, and (iii) training-level. To evaluate the effectiveness and generalization ability of CSRec, we conduct experiments using various state-of-the-art sequential recommendation models as the target student module on four benchmark datasets. Our experimental results demonstrate that CSRec is effective in training better performing sequential recommenders.
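The idea of training a student on a combination of dense, soft labels and sparse, one-hot labels can be sketched as a weighted sum of two cross-entropy terms. This is only an illustration under stated assumptions: the mixing weight `alpha`, the function names, and the toy values are hypothetical, and the abstract does not specify how CSRec actually combines the two signals.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def combined_loss(student_logits, one_hot, soft_labels, alpha=0.5):
    """Weighted sum of one-hot and soft-label cross-entropy (alpha is an assumption)."""
    probs = softmax(student_logits)
    hard_term = cross_entropy(one_hot, probs)      # sparse, one-hot supervision
    soft_term = cross_entropy(soft_labels, probs)  # dense, teacher-provided supervision
    return alpha * hard_term + (1.0 - alpha) * soft_term


# Toy usage: two candidate items, uniform student prediction.
loss = combined_loss([0.0, 0.0], [1.0, 0.0], [0.5, 0.5], alpha=0.5)
```

Setting `alpha` to 1.0 recovers ordinary one-hot training, while lower values give the teacher's soft labels more influence.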
Submitted 4 November, 2023;
originally announced November 2023.
-
Distributed end-effector formation control for mixed fully- and under-actuated manipulators with flexible joints
Authors:
Zhiyu Peng,
Bayu Jayawardhana,
Xin Xin
Abstract:
The presence of faulty or underactuated manipulators can disrupt the end-effector formation keeping of a team of manipulators. Based on two-link planar manipulators, we investigate this end-effector formation keeping problem for mixed fully- and under-actuated manipulators with flexible joints. In this case, the underactuated manipulators can comprise active-passive (AP) manipulators, passive-active (PA) manipulators, or a combination thereof. We propose distributed control laws for the different types of manipulators to achieve and maintain the desired formation shape of the end-effectors. This is achieved by assigning virtual springs to the end-effectors of the fully-actuated manipulators and to the virtual end-effectors of the under-actuated ones. We further study the set of all desired and reachable shapes for the networked manipulators' end-effectors. Finally, we validate our analysis via numerical simulations.
Submitted 16 February, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Distributed formation control of end-effector of mixed planar fully- and under-actuated manipulators
Authors:
Zhiyu Peng,
Bayu Jayawardhana,
Xin Xin
Abstract:
This paper addresses the problem of end-effector formation control for a mixed group of two-link manipulators moving in a horizontal plane that comprises fully-actuated manipulators and underactuated manipulators with only the second joint being actuated (referred to as passive-active (PA) manipulators). The problem is solved by extending the distributed end-effector formation controller for the fully-actuated manipulator to the PA manipulator moving in a horizontal plane by using its integrability. This paper presents a stability analysis of the closed-loop systems under a given necessary condition, and we prove that the manipulators' end-effectors converge to the desired formation shape. The proposed method is validated by simulations.
Submitted 14 September, 2023;
originally announced September 2023.
-
Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum
Authors:
Shen Gao,
Zhengliang Shi,
Minghang Zhu,
Bowen Fang,
Xin Xin,
Pengjie Ren,
Zhumin Chen,
Jun Ma,
Zhaochun Ren
Abstract:
Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extending the capability of LLMs. Although some works employ open-source LLMs for the tool learning task, most of them are trained in a controlled environment in which LLMs only learn to execute the human-provided tools. However, selecting proper tools from a large toolset is also a crucial ability for a tool learning model to be applied in real-world applications. Existing methods usually directly employ self-instruction methods to train the model, which ignores differences in tool complexity. In this paper, we propose Confucius, a novel tool learning framework to train LLMs to use complicated tools in real-world scenarios, which contains two main phases: (1) we first propose a multi-stage learning method to teach the LLM to use various tools through an easy-to-difficult curriculum; (2) we then propose Iterative Self-instruct from Introspective Feedback (ISIF) to dynamically construct the dataset to improve the ability to use complicated tools. Extensive experiments conducted in both controlled and real-world settings demonstrate the superiority of our tool learning framework in real-world application scenarios compared to both tuning-free (e.g., ChatGPT, Claude) and tuning-based baselines (e.g., GPT4Tools).
Submitted 21 December, 2023; v1 submitted 27 August, 2023;
originally announced August 2023.
-
Label Denoising through Cross-Model Agreement
Authors:
Yu Wang,
Xin Xin,
Zaiqiao Meng,
Joemon Jose,
Fuli Feng
Abstract:
Learning from corrupted labels is very common in real-world machine-learning applications. Memorizing such noisy labels could affect the learning of the model, leading to sub-optimal performance. In this work, we propose a novel framework to learn robust machine-learning models from noisy labels. Through an empirical study, we find that different models make relatively similar predictions on clean examples, while the predictions on noisy examples vary much more across different models. Motivated by this observation, we propose denoising with cross-model agreement (DeCA), which aims to minimize the KL-divergence between the true label distributions parameterized by two machine learning models while maximizing the likelihood of data observation. We employ the proposed DeCA on both the binary label scenario and the multiple label scenario. For the binary label scenario, we select implicit feedback recommendation as the downstream task and conduct experiments with four state-of-the-art recommendation models on four datasets. For the multiple-label scenario, the downstream application is image classification on two benchmark datasets. Experimental results demonstrate that the proposed methods significantly improve the model performance compared with normal training and other denoising methods on both binary and multiple-label scenarios.
Submitted 18 December, 2023; v1 submitted 26 August, 2023;
originally announced August 2023.
-
Focused Specific Objects NeRF
Authors:
Yuesong Li,
Feng Pan,
Helong Yan,
Xiuli Xin,
Xiaoxue Feng
Abstract:
Most NeRF-based models are designed for learning the entire scene, and complex scenes can lead to longer learning times and poorer rendering effects. This paper utilizes scene semantic priors to make improvements in fast training, allowing the network to focus on the specific targets and not be affected by complex backgrounds. The training speed can be increased by 7.78 times with better rendering effect, and small to medium sized targets can be rendered faster. In addition, this improvement applies to all NeRF-based models. Considering the inherent multi-view consistency and smoothness of NeRF, this paper also studies weak supervision by sparsely sampling negative ray samples. With this method, training can be further accelerated and rendering quality can be maintained. Finally, this paper extends pixel semantic and color rendering formulas and proposes a new scene editing technique that can either display specific semantic targets on their own or mask them in rendering. To address the problem of incorrect inferences in unsupervised regions of the scene, we also design a self-supervised loop that combines morphological operations and clustering.
Submitted 11 August, 2023;
originally announced August 2023.
-
Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community
Authors:
Qingyao Ai,
Ting Bai,
Zhao Cao,
Yi Chang,
Jiawei Chen,
Zhumin Chen,
Zhiyong Cheng,
Shoubin Dong,
Zhicheng Dou,
Fuli Feng,
Shen Gao,
Jiafeng Guo,
Xiangnan He,
Yanyan Lan,
Chenliang Li,
Yiqun Liu,
Ziyu Lyu,
Weizhi Ma,
Jun Ma,
Zhaochun Ren,
Pengjie Ren,
Zhiqiang Wang,
Mingwen Wang,
Ji-Rong Wen,
Le Wu
, et al. (8 additional authors not shown)
Abstract:
The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play a central role of demanders and evaluators to the reliability of information services. Nevertheless, significant challenges exist, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To thoroughly discuss the transformative impact of LLMs on IR research, the Chinese IR community conducted a strategic workshop in April 2023, yielding valuable insights. This paper provides a summary of the workshop's outcomes, including the rethinking of IR's core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.
Submitted 26 July, 2023; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Large-Scale Person Detection and Localization using Overhead Fisheye Cameras
Authors:
Lu Yang,
Liulei Li,
Xueshi Xin,
Yifan Sun,
Qing Song,
Wenguan Wang
Abstract:
Location determination finds wide applications in daily life. Instead of existing efforts devoted to localizing tourist photos captured by perspective cameras, in this article, we focus on devising person positioning solutions using overhead fisheye cameras. Such solutions are advantageous in large field of view (FOV), low cost, anti-occlusion, and unaggressive work mode (without the necessity of cameras carried by persons). However, related studies are quite scarce, due to the paucity of data. To stimulate research in this exciting area, we present LOAF, the first large-scale overhead fisheye dataset for person detection and localization. LOAF is built with many essential features, e.g., i) the data cover abundant diversities in scenes, human pose, density, and location; ii) it contains currently the largest number of annotated pedestrians, i.e., 457K bounding boxes with groundtruth location information; iii) the body-boxes are labeled as radius-aligned so as to fully address the positioning challenge. To approach localization, we build a fisheye person detection network, which exploits the fisheye distortions by a rotation-equivariant training strategy and predicts radius-aligned human boxes end-to-end. Then, the actual locations of the detected persons are calculated by a numerical solution on the fisheye model and camera altitude data. Extensive experiments on LOAF validate the superiority of our fisheye detector w.r.t. previous methods, and show that our whole fisheye positioning solution is able to locate all persons in FOV with an accuracy of 0.5 m, within 0.1 s.
Submitted 17 July, 2023;
originally announced July 2023.
-
How Graph Convolutions Amplify Popularity Bias for Recommendation?
Authors:
Jiajia Chen,
Jiancan Wu,
Jiawei Chen,
Xin Xin,
Yong Li,
Xiangnan He
Abstract:
Graph convolutional networks (GCNs) have become prevalent in recommender system (RS) due to their superiority in modeling collaborative patterns. Although improving the overall accuracy, GCNs unfortunately amplify popularity bias -- tail items are less likely to be recommended. This effect prevents the GCN-based RS from making precise and fair recommendations, decreasing the effectiveness of recommender systems in the long run.
In this paper, we investigate how graph convolutions amplify the popularity bias in RS. Through theoretical analyses, we identify two fundamental factors: (1) with graph convolution (i.e., neighborhood aggregation), popular items exert larger influence than tail items on neighbor users, making the users move towards popular items in the representation space; (2) after multiple rounds of graph convolution, popular items would affect more high-order neighbors and become more influential. The two points make popular items get closer to almost all users and thus be recommended more frequently. To rectify this, we propose to estimate the amplified effect of popular nodes on each node's representation, and intervene on the effect after each graph convolution. Specifically, we adopt clustering to discover highly-influential nodes and estimate the amplification effect of each node, then remove the effect from the node embeddings at each graph convolution layer. Our method is simple and generic -- it can be used in the inference stage to correct existing models rather than training a new model from scratch, and can be applied to various GCN models. We demonstrate our method on two representative GCN backbones, LightGCN and UltraGCN, verifying its ability to improve the recommendations of tail items without sacrificing the performance of popular items. Code is open-sourced at https://github.com/MEICRS/DAP.
Submitted 24 May, 2023;
originally announced May 2023.
-
Contrastive State Augmentations for Reinforcement Learning-Based Recommender Systems
Authors:
Zhaochun Ren,
Na Huang,
Yidan Wang,
Pengjie Ren,
Jun Ma,
Jiahuan Lei,
Xinlei Shi,
Hengliang Luo,
Joemon M Jose,
Xin Xin
Abstract:
Learning reinforcement learning (RL)-based recommenders from historical user-item interaction sequences is vital to generate high-reward recommendations and improve long-term cumulative benefits. However, existing RL recommendation methods encounter difficulties (i) to estimate the value functions for states which are not contained in the offline training data, and (ii) to learn effective state representations from user implicit feedback due to the lack of contrastive signals. In this work, we propose contrastive state augmentations (CSA) for the training of RL-based recommender systems. To tackle the first issue, we propose four state augmentation strategies to enlarge the state space of the offline data. The proposed method improves the generalization capability of the recommender by making the RL agent visit the local state regions and ensuring the learned value functions are similar between the original and augmented states. For the second issue, we propose introducing contrastive signals between augmented states and the state randomly sampled from other sessions to improve the state representation learning further. To verify the effectiveness of the proposed CSA, we conduct extensive experiments on two publicly accessible datasets and one dataset collected from a real-life e-commerce platform. We also conduct experiments on a simulated environment as the online evaluation setting. Experimental results demonstrate that CSA can effectively improve recommendation performance.
Submitted 18 May, 2023;
originally announced May 2023.
-
Improving Implicit Feedback-Based Recommendation through Multi-Behavior Alignment
Authors:
Xin Xin,
Xiangyuan Liu,
Hanbing Wang,
Pengjie Ren,
Zhumin Chen,
Jiahuan Lei,
Xinlei Shi,
Hengliang Luo,
Joemon Jose,
Maarten de Rijke,
Zhaochun Ren
Abstract:
Recommender systems that learn from implicit feedback often use large volumes of a single type of implicit user feedback, such as clicks, to enhance the prediction of sparse target behavior such as purchases. Using multiple types of implicit user feedback for such target behavior prediction purposes is still an open question. Existing studies that attempted to learn from multiple types of user behavior often fail to: (i) learn universal and accurate user preferences from different behavioral data distributions, and (ii) overcome the noise and bias in observed implicit user feedback. To address the above problems, we propose multi-behavior alignment (MBA), a novel recommendation framework that learns from implicit feedback by using multiple types of behavioral data. We conjecture that multiple types of behavior from the same user (e.g., clicks and purchases) should reflect similar preferences of that user. To this end, we regard the underlying universal user preferences as a latent variable. The variable is inferred by maximizing the likelihood of multiple observed behavioral data distributions and, at the same time, minimizing the Kullback-Leibler divergence (KL-divergence) between user models learned from auxiliary behavior (such as clicks or views) and the target behavior separately. MBA infers universal user preferences from multi-behavior data and performs data denoising to enable effective knowledge transfer. We conduct experiments on three datasets, including a dataset collected from an operational e-commerce platform. Empirical results demonstrate the effectiveness of our proposed method in utilizing multiple types of behavioral data to enhance the prediction of the target behavior.
Submitted 9 May, 2023;
originally announced May 2023.
-
A Self-Correcting Sequential Recommender
Authors:
Yujie Lin,
Chenyang Wang,
Zhumin Chen,
Zhaochun Ren,
Xin Xin,
Qiang Yan,
Maarten de Rijke,
Xiuzhen Cheng,
Pengjie Ren
Abstract:
Sequential recommendations aim to capture users' preferences from their historical interactions so as to predict the next item that they will interact with. Sequential recommendation methods usually assume that all items in a user's historical interactions reflect her/his preferences and transition patterns between items. However, real-world interaction data is imperfect in that (i) users might erroneously click on items, i.e., so-called misclicks on irrelevant items, and (ii) users might miss items, i.e., unexposed relevant items due to inaccurate recommendations. To tackle the two issues listed above, we propose STEAM, a Self-correcTing sEquentiAl recoMmender. STEAM first corrects an input item sequence by adjusting the misclicked and/or missed items. It then uses the corrected item sequence to train a recommender and make the next item prediction. We design an item-wise corrector that can adaptively select one type of operation for each item in the sequence. The operation types are 'keep', 'delete' and 'insert'. In order to train the item-wise corrector without requiring additional labeling, we design two self-supervised learning mechanisms: (i) deletion correction (i.e., deleting randomly inserted items), and (ii) insertion correction (i.e., predicting randomly deleted items). We integrate the corrector with the recommender by sharing the encoder and by training them jointly. We conduct extensive experiments on three real-world datasets and the experimental results demonstrate that STEAM outperforms state-of-the-art sequential recommendation baselines. Our in-depth analyses confirm that STEAM benefits from learning to correct the raw item sequences.
Submitted 20 April, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
-
Variational Reasoning over Incomplete Knowledge Graphs for Conversational Recommendation
Authors:
Xiaoyu Zhang,
Xin Xin,
Dongdong Li,
Wenxuan Liu,
Pengjie Ren,
Zhumin Chen,
Jun Ma,
Zhaochun Ren
Abstract:
Conversational recommender systems (CRSs) often utilize external knowledge graphs (KGs) to introduce rich semantic information and recommend relevant items through natural language dialogues. However, original KGs employed in existing CRSs are often incomplete and sparse, which limits the reasoning capability in recommendation. Moreover, only a few existing studies exploit the dialogue context to dynamically refine knowledge from KGs for better recommendation. To address the above issues, we propose the Variational Reasoning over Incomplete KGs Conversational Recommender (VRICR). Our key idea is to incorporate the large dialogue corpus that naturally accompanies CRSs to enhance the incomplete KGs, and to perform dynamic knowledge reasoning conditioned on the dialogue context. Specifically, we denote the dialogue-specific subgraphs of KGs as latent variables with categorical priors for adaptive knowledge graph refactoring. We propose a variational Bayesian method to approximate posterior distributions over dialogue-specific subgraphs, which not only leverages the dialogue corpus for restructuring missing entity relations but also dynamically selects knowledge based on the dialogue context. Finally, we infuse the dialogue-specific subgraphs to decode the recommendation and responses. We conduct experiments on two benchmark CRS datasets. Experimental results confirm the effectiveness of our proposed method.
Submitted 23 December, 2022; v1 submitted 22 December, 2022;
originally announced December 2022.
-
On the User Behavior Leakage from Recommender System Exposure
Authors:
Xin Xin,
Jiyuan Yang,
Hanbing Wang,
Jun Ma,
Pengjie Ren,
Hengliang Luo,
Xinlei Shi,
Zhumin Chen,
Zhaochun Ren
Abstract:
Modern recommender systems are trained to predict users' potential future interactions from users' historical behavior data. During the interaction process, in addition to the data coming from the user side, recommender systems also generate exposure data to provide users with personalized recommendation slates. Compared with the sparse user behavior data, the system exposure data is much larger in volume since only very few exposed items would be clicked by the user. Besides, the users' historical behavior data is privacy sensitive and is commonly protected with careful access authorization. However, the large volume of recommender exposure data usually receives less attention and could be accessed by a relatively larger range of information seekers. In this paper, we investigate the problem of user behavior leakage in recommender systems. We show that the privacy-sensitive user past behavior data can be inferred through the modeling of system exposure. Besides, one can infer which items the user has clicked just from the observation of current system exposure for this user. Given the fact that system exposure data could be widely accessed, we believe that the user past behavior privacy has a high risk of leakage in recommender systems. More precisely, we build an attack model whose input is the current recommended item slate (i.e., system exposure) for the user while the output is the user's historical behavior. Experimental results on two real-world datasets indicate a great danger of user behavior leakage. To address the risk, we propose a two-stage privacy-protection mechanism which first selects a subset of items from the exposure slate and then replaces the selected items with uniform or popularity-based exposure. Experimental evaluation reveals a trade-off between the recommendation accuracy and the privacy disclosure risk.
Submitted 23 October, 2022; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective
Authors:
Xin Xin,
Tiago Pimentel,
Alexandros Karatzoglou,
Pengjie Ren,
Konstantina Christakopoulou,
Zhaochun Ren
Abstract:
Modern recommender systems aim to improve user experience. As reinforcement learning (RL) naturally fits this objective -- maximizing a user's reward per session -- it has become an emerging topic in recommender systems. Developing RL-based recommendation methods, however, is not trivial due to the offline training challenge. Specifically, the keystone of traditional RL is to train an agent with large amounts of online exploration, making lots of 'errors' in the process. In the recommendation setting, though, we cannot afford the price of making 'errors' online. As a result, the agent needs to be trained through offline historical implicit feedback, collected under different recommendation policies; traditional RL algorithms may lead to sub-optimal policies under these offline training settings.
Here we propose a new learning paradigm -- namely Prompt-Based Reinforcement Learning (PRL) -- for the offline training of RL-based recommendation agents. While traditional RL algorithms attempt to map state-action input pairs to their expected rewards (e.g., Q-values), PRL directly infers actions (i.e., recommended items) from state-reward inputs. In short, the agents are trained to predict a recommended item given the prior interactions and an observed reward value -- with simple supervised learning. At deployment time, this historical (training) data acts as a knowledge base, while the state-reward pairs are used as a prompt. The agents are thus used to answer the question: Which item should be recommended given the prior interactions & the prompted reward value? We implement PRL with four notable recommendation models and conduct experiments on two real-world e-commerce datasets. Experimental results demonstrate the superior performance of our proposed methods.
Submitted 15 June, 2022;
originally announced June 2022.
-
GDSRec: Graph-Based Decentralized Collaborative Filtering for Social Recommendation
Authors:
Jiajia Chen,
Xin Xin,
Xianfeng Liang,
Xiangnan He,
Jun Liu
Abstract:
Generating recommendations based on user-item interactions and user-user social relations is a common use case in web-based systems. These connections can be naturally represented as graph-structured data and thus utilizing graph neural networks (GNNs) for social recommendation has become a promising research direction. However, existing graph-based methods fail to consider the bias offsets of users (items). For example, a low rating from a fastidious user may not imply a negative attitude toward this item because the user tends to assign low ratings in common cases. Such statistics should be considered in the graph modeling procedure. While some past work considers the biases, we argue that these proposed methods only treat them as scalars and cannot capture the complete bias information hidden in data. Besides, social connections between users should also be differentiable so that users with similar item preference would have more influence on each other. To this end, we propose Graph-Based Decentralized Collaborative Filtering for Social Recommendation (GDSRec). GDSRec treats the biases as vectors and fuses them into the process of learning user and item representations. The statistical bias offsets are captured by decentralized neighborhood aggregation while the social connection strength is defined according to the preference similarity and then incorporated into the model design. We conduct extensive experiments on two benchmark datasets to verify the effectiveness of the proposed model. Experimental results show that the proposed GDSRec achieves superior performance compared with state-of-the-art related baselines. Our implementations are available at https://github.com/MEICRS/GDSRec.
Submitted 19 May, 2022;
originally announced May 2022.
-
Prediction Algorithm for Heat Demand of Science and Technology Topics Based on Time Convolution Network
Authors:
Cui Haiyan,
Li Yawen,
Xu Xin
Abstract:
Thanks to the rapid development of deep learning, big data analysis technology is not only widely used in the field of natural language processing, but also more mature in the field of numerical prediction. It is of great significance for the subject heat prediction and analysis of science and technology demand data. How to apply theme features to accurately predict the theme heat of science and technology demand is the core of this problem. In this paper, a prediction method of subject heat of science and technology demand based on a time convolution network (TCN) is proposed to obtain the subject feature representation of science and technology demand. Time series prediction is carried out based on the TCN and a self-attention mechanism, which increases the accuracy of subject heat prediction of science and technology demand data. Experiments show that the prediction accuracy of this algorithm is better than other time series prediction methods on the real science and technology demand datasets.
Submitted 20 March, 2022;
originally announced March 2022.
-
Handwritten Mathematical Expression Recognition via Attention Aggregation based Bi-directional Mutual Learning
Authors:
Xiaohang Bian,
Bo Qin,
Xiaozhe Xin,
Jianwu Li,
Xuefeng Su,
Yanfeng Wang
Abstract:
Handwritten mathematical expression recognition aims to automatically generate LaTeX sequences from given images. Currently, attention-based encoder-decoder models are widely used in this task. They typically generate target sequences in a left-to-right (L2R) manner, leaving the right-to-left (R2L) contexts unexploited. In this paper, we propose an Attention aggregation based Bi-directional Mutual learning Network (ABM) which consists of one shared encoder and two parallel inverse decoders (L2R and R2L). The two decoders are enhanced via mutual distillation, which involves one-to-one knowledge transfer at each training step, making full use of the complementary information from two inverse directions. Moreover, in order to deal with mathematical symbols in diverse scales, an Attention Aggregation Module (AAM) is proposed to effectively integrate multi-scale coverage attentions. Notably, in the inference phase, given that the model already learns knowledge from two inverse directions, we only use the L2R branch for inference, keeping the original parameter size and inference speed. Extensive experiments demonstrate that our proposed approach achieves the recognition accuracy of 56.85 % on CROHME 2014, 52.92 % on CROHME 2016, and 53.96 % on CROHME 2019 without data augmentation and model ensembling, substantially outperforming the state-of-the-art methods. The source code is available in https://github.com/XH-B/ABM.
Submitted 23 February, 2022; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Supervised Advantage Actor-Critic for Recommender Systems
Authors:
Xin Xin,
Alexandros Karatzoglou,
Ioannis Arapakis,
Joemon M. Jose
Abstract:
Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges like off-policy training, huge action spaces and lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence.
To address the above problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods. Code will be open-sourced.
Submitted 5 November, 2021;
originally announced November 2021.
-
Choosing the Best of Both Worlds: Diverse and Novel Recommendations through Multi-Objective Reinforcement Learning
Authors:
Dusan Stamenkovic,
Alexandros Karatzoglou,
Ioannis Arapakis,
Xin Xin,
Kleomenis Katevas
Abstract:
Since the inception of Recommender Systems (RS), the accuracy of the recommendations in terms of relevance has been the golden criterion for evaluating the quality of RS algorithms. However, by focusing on item relevance, one pays a significant price in terms of other important metrics: users get stuck in a "filter bubble" and their array of options is significantly reduced, hence degrading the quality of the user experience and leading to churn. Recommendation, and in particular session-based/sequential recommendation, is a complex task with multiple, and often conflicting, objectives that existing state-of-the-art approaches fail to address.
In this work, we take on the aforementioned challenge and introduce Scalarized Multi-Objective Reinforcement Learning (SMORL) for the RS setting, a novel Reinforcement Learning (RL) framework that can effectively address multi-objective recommendation tasks. The proposed SMORL agent augments standard recommendation models with additional RL layers that force it to simultaneously satisfy three principal objectives: accuracy, diversity, and novelty of recommendations. We integrate this framework with four state-of-the-art session-based recommendation models and compare it with a single-objective RL agent that only focuses on accuracy. Our experimental results on two real-world datasets reveal a substantial increase in aggregate diversity, a moderate increase in accuracy, and reduced repetitiveness of recommendations, and demonstrate the importance of reinforcing diversity and novelty as complementary objectives.
Submitted 28 October, 2021;
originally announced October 2021.
-
Extracting Attentive Social Temporal Excitation for Sequential Recommendation
Authors:
Yunzhe Li,
Yue Ding,
Bo Chen,
Xin Xin,
Yule Wang,
Yuxiang Shi,
Ruiming Tang,
Dong Wang
Abstract:
In collaborative filtering, making full use of social information is an important way to improve recommendation quality, and it has proved effective because a user's behavior is affected by her friends. However, existing works leverage the social relationship to aggregate user features from friends' historical behavior sequences in a user-level indirect paradigm. A significant defect of the indirect paradigm is that it ignores the temporal relationships between behavior events across users. In this paper, we propose a novel time-aware sequential recommendation framework called Social Temporal Excitation Networks (STEN), which introduces temporal point processes to model the fine-grained impact of friends' behaviors on the user's dynamic interests in an event-level direct paradigm. Moreover, we propose to decompose the temporal effect in sequential recommendation into a social mutual temporal effect and an ego temporal effect. Specifically, we employ a social heterogeneous graph embedding layer to refine user representation via structural information. To enhance temporal information propagation, STEN directly extracts the fine-grained temporal mutual influence of friends' behaviors through the mutually exciting temporal network. In addition, the user's dynamic interests are captured through the self-exciting temporal network. Extensive experiments on three real-world datasets show that STEN outperforms state-of-the-art baseline methods. Moreover, STEN provides event-level recommendation explainability, which is also illustrated experimentally.
Submitted 28 September, 2021;
originally announced September 2021.
-
ICPE: An Item Cluster-Wise Pareto-Efficient Framework for Recommendation Debiasing
Authors:
Yule Wang,
Xin Xin,
Yue Ding,
Yunzhe Li,
Dong Wang
Abstract:
Recommender systems based on historical user-item interactions are of vital importance for web-based services. However, the observed data used to train the recommender model suffers from severe bias issues. In practice, the item frequency distribution of the dataset is a highly skewed power-law distribution. Interactions of a small fraction of head items account for almost the whole training data. The normal training paradigm on such biased data tends to repetitively generate recommendations from the head items, which further exacerbates the biases and affects the exploration of potentially interesting items from the niche set. In this work, we explore the central theme of recommendation debiasing from an item cluster-wise multi-objective optimization perspective. Aiming to balance the learning on various item clusters that differ in popularity during the training process, we propose a model-agnostic framework, namely Item Cluster-Wise Pareto-Efficient Recommendation (ICPE). In detail, we define our item cluster-wise optimization target as follows: the recommender model should balance all item clusters that differ in popularity, so we set the model learning on each item cluster as a unique optimization objective. To achieve this goal, we first explore items' popularity levels from a novel causal reasoning perspective. Then, we devise popularity discrepancy-based bisecting clustering to separate the item clusters. Next, we adaptively find the overall harmonious gradient direction for cluster-wise optimization objectives from a Pareto-efficient solver. Finally, in the prediction stage, we perform counterfactual inference to further eliminate the impact of global propensity. Extensive experimental results verify the superiority of ICPE in overall recommendation performance and bias elimination.
Submitted 22 July, 2023; v1 submitted 27 September, 2021;
originally announced September 2021.
-
ReMeDi: Resources for Multi-domain, Multi-service, Medical Dialogues
Authors:
Guojun Yan,
Jiahuan Pei,
Pengjie Ren,
Zhaochun Ren,
Xin Xin,
Huasheng Liang,
Maarten de Rijke,
Zhumin Chen
Abstract:
Medical dialogue systems (MDSs) aim to assist doctors and patients with a range of professional medical services, i.e., diagnosis, treatment and consultation. The development of MDSs is hindered because of a lack of resources. In particular, (1) there is no dataset with large-scale medical dialogues that covers multiple medical services and contains fine-grained medical labels (i.e., intents, actions, slots, values), and (2) there is no set of established benchmarks for MDSs for multi-domain, multi-service medical dialogues. In this paper, we present ReMeDi, a set of resources for medical dialogues. ReMeDi consists of two parts, the ReMeDi dataset and the ReMeDi benchmarks. The ReMeDi dataset contains 96,965 conversations between doctors and patients, including 1,557 conversations with fine-grained labels. It covers 843 types of diseases, 5,228 medical entities, and 3 specialties of medical services across 40 domains. To the best of our knowledge, the ReMeDi dataset is the only medical dialogue dataset that covers multiple domains and services, and has fine-grained medical labels. The second part of the ReMeDi resources consists of a set of state-of-the-art models for (medical) dialogue generation. The ReMeDi benchmark has the following methods: (1) pretrained models (i.e., BERT-WWM, BERT-MED, GPT2, and MT5) trained, validated, and tested on the ReMeDi dataset, and (2) a self-supervised contrastive learning (SCL) method to expand the ReMeDi dataset and enhance the training of the state-of-the-art pretrained models. We describe the creation of the ReMeDi dataset and the ReMeDi benchmarking methods, and establish experimental results using the ReMeDi benchmarking methods on the ReMeDi dataset for future research to compare against. With this paper, we share the dataset, implementations of the benchmarks, and evaluation scripts.
Submitted 1 March, 2022; v1 submitted 1 September, 2021;
originally announced September 2021.
-
Memory-Free Generative Replay For Class-Incremental Learning
Authors:
Xiaomeng Xin,
Yiran Zhong,
Yunzhong Hou,
Jinjun Wang,
Liang Zheng
Abstract:
Regularization-based methods help alleviate the catastrophic forgetting problem in class-incremental learning. In the absence of old task images, they often assume that old knowledge is well preserved if the classifier produces similar output on new images. In this paper, we find that their effectiveness largely depends on the nature of old classes: they work well on classes that are easily distinguishable from each other but may fail on more fine-grained ones, e.g., boy and girl. In spirit, such methods project new data onto the feature space spanned by the weight vectors in the fully connected layer, corresponding to old classes. The resulting projections would be similar on fine-grained old classes, and as a consequence the new classifier will gradually lose the discriminative ability on these classes. To address this issue, we propose a memory-free generative replay strategy that preserves the characteristics of fine-grained old classes by generating representative old images directly from the old classifier and combining them with new data for new classifier training. To solve the homogenization problem of the generated samples, we also propose a diversity loss that maximizes the Kullback-Leibler (KL) divergence between generated samples. Our method is best complemented by prior regularization-based methods, which have proved effective for easily distinguishable old classes. We validate the above design and insights on CUB-200-2011, Caltech-101, CIFAR-100 and Tiny ImageNet and show that our strategy outperforms existing memory-free methods by a clear margin. Code is available at https://github.com/xmengxin/MFGR
Submitted 1 September, 2021;
originally announced September 2021.
-
Structure Amplification on Multi-layer Stochastic Block Models
Authors:
Xiaodong Xin,
Kun He,
Jialu Bao,
Bart Selman,
John E. Hopcroft
Abstract:
Much of the complexity of social, biological, and engineered systems arises from a network of complex interactions connecting many basic components. Network analysis tools have been successful at uncovering latent structure termed communities in such networks. However, some of the most interesting structure can be difficult to uncover because it is obscured by the more dominant structure. Our previous work proposes a general structure amplification technique called HICODE that uncovers many layers of functional hidden structure in complex networks. HICODE incrementally weakens dominant structure through randomization, allowing the hidden functionality to emerge, and uncovers hidden structure in real-world networks that previous methods rarely uncover. In this work, we conduct a comprehensive and systematic theoretical analysis of the hidden community structure. We define a multi-layer stochastic block model and use it to provide theoretical support for why the existence of hidden structure makes the detection of dominant structure harder compared with equivalent random noise. We then provide theoretical proofs that the iterative reducing methods can help promote the uncovering of hidden structure as well as boost the detection quality of dominant structure.
Submitted 30 July, 2021;
originally announced August 2021.
-
Quality-Aware Network for Face Parsing
Authors:
Lu Yang,
Qing Song,
Xueshi Xin,
Wenhe Jia,
Zhiwei Liu
Abstract:
This is a very short technical report, which introduces the solution of the Team BUPT-CASIA for Short-video Face Parsing Track of The 3rd Person in Context (PIC) Workshop and Challenge at CVPR 2021.
Face parsing has recently attracted increasing interest due to its numerous potential applications. Generally speaking, it has a lot in common with human parsing, such as task setting, data characteristics, number of categories and so on. Therefore, this work applies a state-of-the-art human parsing method to the face parsing task to explore the similarities and differences between them. Our submission achieves a score of 86.84% and wins 2nd place in the challenge.
Submitted 26 October, 2023; v1 submitted 14 June, 2021;
originally announced June 2021.
-
Learning Robust Recommenders through Cross-Model Agreement
Authors:
Yu Wang,
Xin Xin,
Zaiqiao Meng,
Xiangnan He,
Joemon Jose,
Fuli Feng
Abstract:
Learning from implicit feedback is one of the most common cases in the application of recommender systems. Generally speaking, interacted examples are considered as positive while negative examples are sampled from uninteracted ones. However, noisy examples are prevalent in real-world implicit feedback. A noisy positive example is one that was interacted with but actually reflects negative user preference. A noisy negative example, which is uninteracted because the user was unaware of it, could also denote potential positive user preference. Conventional training methods overlook these noisy examples, leading to sub-optimal recommendations. In this work, we propose a novel framework to learn robust recommenders from implicit feedback. Through an empirical study, we find that different models make relatively similar predictions on clean examples which denote the real user preference, while the predictions on noisy examples vary much more across different models. Motivated by this observation, we propose denoising with cross-model agreement (DeCA), which aims to minimize the KL-divergence between the real user preference distributions parameterized by two recommendation models while maximizing the likelihood of data observation. We employ the proposed DeCA on four state-of-the-art recommendation models and conduct experiments on four datasets. Experimental results demonstrate that DeCA significantly improves recommendation performance compared with normal training and other denoising methods. Code will be open-sourced.
Submitted 13 March, 2022; v1 submitted 20 May, 2021;
originally announced May 2021.
-
AutoDebias: Learning to Debias for Recommendation
Authors:
Jiawei Chen,
Hande Dong,
Yang Qiu,
Xiangnan He,
Xin Xin,
Liang Chen,
Guli Lin,
Keping Yang
Abstract:
Recommender systems rely on user behavior data like ratings and clicks to build personalization models. However, the collected data is observational rather than experimental, causing various biases in the data which significantly affect the learned model. Most existing work on recommendation debiasing, such as the inverse propensity scoring and imputation approaches, focuses on one or two specific biases, lacking the universal capacity to account for mixed or even unknown biases in the data. To address this research gap, we first analyze the origin of biases from the perspective of risk discrepancy, which represents the difference between the expected empirical risk and the true risk. Remarkably, we derive a general learning framework that summarizes most existing debiasing strategies, each obtained by specifying some parameters of the general framework. This provides a valuable opportunity to develop a universal solution for debiasing, e.g., by learning the debiasing parameters from data. However, the training data lacks important signals of how the data is biased and what the unbiased data looks like. To move this idea forward, we propose AutoDebias, which leverages another (small) set of uniform data to optimize the debiasing parameters by solving the bi-level optimization problem with meta-learning. Through theoretical analyses, we derive the generalization bound for AutoDebias and prove its ability to acquire the appropriate debiasing strategy. Extensive experiments on two real datasets and a simulated dataset demonstrate the effectiveness of AutoDebias. The code is available at https://github.com/DongHande/AutoDebias.
Submitted 28 October, 2021; v1 submitted 10 May, 2021;
originally announced May 2021.
-
Graph Convolutional Embeddings for Recommender Systems
Authors:
Paula Gómez Duran,
Alexandros Karatzoglou,
Jordi Vitrià,
Xin Xin,
Ioannis Arapakis
Abstract:
Modern recommender systems (RS) work by processing a number of signals that can be inferred from large sets of user-item interaction data. The main signal to analyze stems from the raw matrix that represents interactions. However, we can increase the performance of RS by considering other kinds of signals like the context of interactions, which could be, for example, the time or date of the interaction, the user location, or sequential data corresponding to the historical interactions of the user with the system. These complex, context-based interaction signals are characterized by a rich relational structure that can be represented by a multi-partite graph. Graph Convolutional Networks (GCNs) have been used successfully in collaborative filtering with simple user-item interaction data. In this work, we generalize the use of GCNs for N-partite graphs by considering N multiple context dimensions and propose a simple way for their seamless integration in modern deep learning RS architectures. More specifically, we define a graph convolutional embedding layer for N-partite graphs that processes user-item-context interactions, and constructs node embeddings by leveraging their relational structure. Experiments on several datasets from recommender systems to drug re-purposing show the benefits of the introduced GCN embedding layer by measuring the performance of different context-enriched tasks.
Submitted 5 March, 2021;
originally announced March 2021.
-
Should Graph Convolution Trust Neighbors? A Simple Causal Inference Method
Authors:
Fuli Feng,
Weiran Huang,
Xiangnan He,
Xin Xin,
Qifan Wang,
Tat-Seng Chua
Abstract:
Graph Convolutional Network (GCN) is an emerging technique for information retrieval (IR) applications. While GCN assumes the homophily property of a graph, real-world graphs are never perfect: the local structure of a node may contain discrepancy, e.g., the labels of a node's neighbors could vary. This pushes us to consider the discrepancy of local structure in GCN modeling. Existing work approaches this issue by introducing an additional module such as graph attention, which is expected to learn the contribution of each neighbor. However, such a module may not work as reliably as expected, especially when supervision signals are lacking, e.g., when the labeled data is small. Moreover, existing methods focus on modeling the nodes in the training data, and never consider the local structure discrepancy of testing nodes.
This work focuses on the local structure discrepancy issue for testing nodes, which has received little scrutiny. From a novel perspective of causality, we investigate whether a GCN should trust the local structure of a testing node when predicting its label. To this end, we analyze the working mechanism of GCN with a causal graph, estimating the causal effect of a node's local structure on the prediction. The idea is simple yet effective: given a trained GCN model, we first intervene on the prediction by blocking the graph structure; we then compare the original prediction with the intervened prediction to assess the causal effect of the local structure on the prediction. In this way, we can eliminate the impact of local structure discrepancy and make more accurate predictions. Extensive experiments on seven node classification datasets show that our method effectively enhances the inference stage of GCN.
Submitted 6 June, 2021; v1 submitted 22 October, 2020;
originally announced October 2020.
-
Renovating Parsing R-CNN for Accurate Multiple Human Parsing
Authors:
Lu Yang,
Qing Song,
Zhihui Wang,
Mengjie Hu,
Chun Liu,
Xueshi Xin,
Wenhe Jia,
Songcen Xu
Abstract:
Multiple human parsing aims to segment various human parts and associate each part with the corresponding instance simultaneously. This is a very challenging task due to the diverse human appearance, semantic ambiguity of different body parts, and complex background. Through analysis of the multiple human parsing task, we observe that human-centric global perception and accurate instance-level parsing scoring are crucial for obtaining high-quality results. However, most state-of-the-art methods have not paid enough attention to these issues. To address this, we present Renovating Parsing R-CNN (RP R-CNN), which introduces a global semantic enhanced feature pyramid network and a parsing re-scoring network into the existing high-performance pipeline. The proposed RP R-CNN adopts global semantic representation to enhance multi-scale features for generating human parsing maps, and regresses a confidence score to represent its quality. Extensive experiments show that RP R-CNN performs favorably against state-of-the-art methods on CIHP and MHP-v2 datasets. Code and models are available at https://github.com/soeaver/RP-R-CNN.
Submitted 20 September, 2020;
originally announced September 2020.
-
Automated Radiological Report Generation For Chest X-Rays With Weakly-Supervised End-to-End Deep Learning
Authors:
Shuai Zhang,
Xiaoyan Xin,
Yang Wang,
Yachong Guo,
Qiuqiao Hao,
Xianfeng Yang,
Jun Wang,
Jian Zhang,
Bing Zhang,
Wei Wang
Abstract:
The chest X-Ray (CXR) is one of the most common clinical exams used to diagnose thoracic diseases and abnormalities. The volume of CXR scans generated daily in hospitals is huge. Therefore, an automated diagnosis system able to save the effort of doctors is of great value. At present, the applications of artificial intelligence in CXR diagnosis usually use pattern recognition to classify the scans. However, such methods rely on labeled databases, which are costly and usually have large error rates. In this work, we built a database containing more than 12,000 CXR scans and radiological reports, and developed a model based on a deep convolutional neural network and a recurrent network with an attention mechanism. The model learns features from the CXR scans and the associated raw radiological reports directly; no additional labeling of the scans is needed. The model provides automated recognition of given scans and generation of reports. The quality of the generated reports was evaluated both with CIDEr scores and by radiologists. The CIDEr scores are found to be around 5.8 on average for the testing dataset. Further blind evaluation suggested performance comparable to that of human radiologists.
Submitted 18 June, 2020;
originally announced June 2020.
-
Self-Supervised Reinforcement Learning for Recommender Systems
Authors:
Xin Xin,
Alexandros Karatzoglou,
Ioannis Arapakis,
Joemon M. Jose
Abstract:
In session-based or sequential recommendation, it is important to consider a number of factors like long-term user engagement and multiple types of user-item interactions such as clicks and purchases. The current state-of-the-art supervised approaches fail to model them appropriately. Casting the sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major component of RL approaches is to train the agent through interactions with the environment. However, it is often problematic to train a recommender in an on-line fashion due to the requirement to expose users to irrelevant recommendations. As a result, learning the policy from logged implicit feedback is of vital importance, which is challenging due to the pure off-policy setting and lack of negative rewards (feedback). In this paper, we propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. The RL part acts as a regularizer to drive the supervised layer to focus on specific rewards (e.g., recommending items which may lead to purchases rather than clicks), while the self-supervised layer with cross-entropy loss provides strong gradient signals for parameter updates. Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). We integrate the proposed frameworks with four state-of-the-art recommendation models. Experimental results on two real-world datasets demonstrate the effectiveness of our approach.
Submitted 11 June, 2020; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Graph Highway Networks
Authors:
Xin Xin,
Alexandros Karatzoglou,
Ioannis Arapakis,
Joemon M. Jose
Abstract:
Graph Convolution Networks (GCN) are widely used in learning graph representations due to their effectiveness and efficiency. However, they suffer from the notorious over-smoothing problem, in which the learned representations of densely connected nodes converge to similar vectors when many (>3) graph convolutional layers are stacked. In this paper, we argue that the re-normalization trick used in GCN leads to overly homogeneous information propagation, which is the source of over-smoothing. To address this problem, we propose Graph Highway Networks (GHNet), which utilize gating units to automatically balance the trade-off between homogeneity and heterogeneity in the GCN learning process. The gating units serve as direct highways to maintain heterogeneous information from the node itself after feature propagation. This design enables GHNet to achieve much larger receptive fields per node without over-smoothing and thus access more of the graph connectivity information. Experimental results on benchmark datasets demonstrate the superior performance of GHNet over GCN and related models.
Submitted 9 April, 2020;
originally announced April 2020.
-
Generalized Embedding Machines for Recommender Systems
Authors:
Enneng Yang,
Xin Xin,
Li Shen,
Guibing Guo
Abstract:
The factorization machine (FM) is an effective model for feature-based recommendation which utilizes the inner product to capture second-order feature interactions. However, one of the major drawbacks of FM is that it cannot capture complex high-order interaction signals. A common solution is to change the interaction function, such as by stacking deep neural networks on top of FM. In this work, we propose an alternative approach that models high-order interaction signals at the embedding level, namely the Generalized Embedding Machine (GEM). The embedding used in GEM encodes not only the information from the feature itself but also the information from other correlated features. In this setting, the embedding becomes high-order. We can then combine GEM with FM and even its advanced variants to perform feature interactions. More specifically, in this paper we utilize graph convolution networks (GCN) to generate high-order embeddings. We integrate GEM with several FM-based models and conduct extensive experiments on two real-world datasets. The results demonstrate significant improvement of GEM over the corresponding baselines.
Submitted 16 February, 2020;
originally announced February 2020.
-
Hidden Community Detection on Two-layer Stochastic Models: a Theoretical Perspective
Authors:
Jialu Bao,
Kun He,
Xiaodong Xin,
Bart Selman,
John E. Hopcroft
Abstract:
Hidden community is a graph-theoretical concept recently proposed in [4], whose authors also propose a meta-approach called HICODE (Hidden Community Detection) for detecting hidden communities. Experiments have shown that HICODE is able to uncover previously overshadowed weak layers and to uncover both weak and strong layers with higher accuracy. However, the authors provide no theoretical guarantee for its performance. In this work, we focus on the theoretical analysis of HICODE on synthetic two-layer networks, where the layers are independent of each other and each layer is generated by a stochastic block model. We bridge this gap through two-layer stochastic block model networks in the following aspects: 1) we show that partitions that locally optimize modularity correspond to grounded layers, indicating that modularity-optimizing algorithms can detect strong layers; 2) we prove that when reducing found layers, HICODE increases the absolute modularities of all unreduced layers, showing that its layer reduction step makes weak layers more detectable. Our work builds a solid theoretical base for HICODE, demonstrating that it is promising in uncovering both weak and strong layers of communities in two-layer networks.
Submitted 12 March, 2020; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Relational Collaborative Filtering: Modeling Multiple Item Relations for Recommendation
Authors:
Xin Xin,
Xiangnan He,
Yongfeng Zhang,
Yongdong Zhang,
Joemon Jose
Abstract:
Existing item-based collaborative filtering (ICF) methods leverage only the relation of collaborative similarity. Nevertheless, there exist multiple relations between items in real-world scenarios. Distinct from the collaborative similarity that implies co-interaction patterns from the user perspective, these relations reveal fine-grained knowledge on items from different perspectives such as meta-data and functionality. However, how to incorporate multiple item relations is less explored in recommendation research. In this work, we propose Relational Collaborative Filtering (RCF), a general framework to exploit multiple relations between items in recommender systems. We find that both the relation type and the relation value are crucial in inferring user preference. To this end, we develop a two-level hierarchical attention mechanism to model user preference. The first-level attention discriminates which types of relations are more important, and the second-level attention considers the specific relation values to estimate the contribution of a historical item in recommending the target item. To make the item embeddings reflect the relational structure between items, we further formulate a task to preserve the item relations, and jointly train it with the recommendation task of preference modeling. Empirical results on two real datasets demonstrate the strong performance of RCF. Furthermore, we conduct qualitative analyses to show the benefits of the explanations brought by modeling multiple item relations.
Submitted 11 May, 2019; v1 submitted 29 April, 2019;
originally announced April 2019.
-
Pulsar Candidate Identification with Artificial Intelligence Techniques
Authors:
Ping Guo,
Fuqing Duan,
Pei Wang,
Yao Yao,
Qian Yin,
Xin Xin
Abstract:
Discovering pulsars is a significant and meaningful research topic in the field of radio astronomy. With the advent of astronomical instruments such as the Five-hundred-meter Aperture Spherical Telescope (FAST) in China, data volumes and data rates are growing exponentially. This fact necessitates a focus on artificial intelligence (AI) technologies that can perform automatic pulsar candidate identification to mine large astronomical data sets. Automatic pulsar candidate identification can be considered as a task of determining potential candidates for further investigation and eliminating noise from radio frequency interference or other non-pulsar signals. It is very hard to raise the performance of DCNN-based pulsar identification, both because the limited training samples prevent the network from being designed deep enough to learn good features and because of the crucial class imbalance problem caused by the very limited number of real pulsar samples. To address these problems, we propose a framework which combines a deep convolution generative adversarial network (DCGAN) with a support vector machine (SVM) to deal with the class imbalance problem and to improve pulsar identification accuracy. DCGAN is used as the sample generation and feature learning model, and SVM is adopted as the classifier for predicting candidates' labels in the inference stage. The proposed framework is a novel technique which not only can solve the class imbalance problem but also can learn discriminative feature representations of pulsar candidates instead of computing hand-crafted features in preprocessing steps, which makes it more accurate for automatic pulsar candidate selection. Experiments on two pulsar datasets verify the effectiveness and efficiency of our proposed method.
Submitted 23 October, 2019; v1 submitted 27 November, 2017;
originally announced November 2017.
-
Deep Self-Paced Learning for Person Re-Identification
Authors:
Sanping Zhou,
Jinjun Wang,
Deyu Meng,
Xiaomeng Xin,
Yubing Li,
Yihong Gong,
Nanning Zheng
Abstract:
Person re-identification (Re-ID) usually suffers from noisy samples with background clutter and mutual occlusion, which makes it extremely difficult to distinguish different individuals across disjoint camera views. In this paper, we propose a novel deep self-paced learning (DSPL) algorithm to alleviate this problem, in which we apply a self-paced constraint and symmetric regularization to help the relative distance metric train the deep neural network, so as to learn stable and discriminative features for person Re-ID. Firstly, we propose a soft polynomial regularizer term which can derive adaptive weights for samples based on both the training loss and the model age. As a result, the high-confidence fidelity samples will be emphasized and the low-confidence noisy samples will be suppressed at the early stage of the whole training process. Such a learning regime is naturally implemented under a self-paced learning (SPL) framework, in which sample weights are adaptively updated based on both model age and sample loss using an alternative optimization method. Secondly, we introduce a symmetric regularizer term to revise the asymmetric gradient back-propagation derived by the relative distance metric, so as to simultaneously minimize the intra-class distance and maximize the inter-class distance in each triplet unit. Finally, we build a part-based deep neural network, in which the features of different body parts are first discriminately learned in the lower convolutional layers and then fused in the higher fully connected layers. Experiments on several benchmark datasets demonstrate the superior performance of our method compared with state-of-the-art approaches.
Submitted 6 October, 2017;
originally announced October 2017.