Search | arXiv e-print repository

Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale

Authors: Wenzhen Zheng, Wenbo Pan, Xu Xu, Libo Qin, Li Yue, Ming Zhou

Abstract: In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In this paper, we explore an alternative approach to constructing an LLM for a new language by continually pretraining (CPT) from existing pretrained LLMs, instead… ▽ More In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In this paper, we explore an alternative approach to constructing an LLM for a new language by continually pretraining (CPT) from existing pretrained LLMs, instead of using randomly initialized parameters. Based on parallel experiments on 40 model sizes ranging from 40M to 5B parameters, we find that 1) CPT converges faster and saves significant resources in a scalable manner; 2) CPT adheres to an extended scaling law derived from Hoffmann et al. (2022) with a joint data-parameter scaling term; 3) The compute-optimal data-parameter allocation for CPT markedly differs based on our estimated scaling factors; 4) The effectiveness of transfer at scale is influenced by training duration and linguistic properties, while robust to data replaying, a method that effectively mitigates catastrophic forgetting in CPT. We hope our findings provide deeper insights into the transferability of LLMs at scale for the research community. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 8 pages

arXiv:2406.19767 [pdf, other]

Subgraph Matching via Partial Optimal Transport

Authors: Wen-Xin Pan, Isabel Haasler, Pascal Frossard

Abstract: In this work, we propose a novel approach for subgraph matching, the problem of finding a given query graph in a large source graph, based on the fused Gromov-Wasserstein distance. We formulate the subgraph matching problem as a partial fused Gromov-Wasserstein problem, which allows us to build on existing theory and computational methods in order to solve this challenging problem. We extend our m… ▽ More In this work, we propose a novel approach for subgraph matching, the problem of finding a given query graph in a large source graph, based on the fused Gromov-Wasserstein distance. We formulate the subgraph matching problem as a partial fused Gromov-Wasserstein problem, which allows us to build on existing theory and computational methods in order to solve this challenging problem. We extend our method by employing a subgraph sliding approach, which makes it efficient even for large graphs. In numerical experiments, we showcase that our new algorithms have the ability to outperform state-of-the-art methods for subgraph matching on synthetic as well as realworld datasets. In particular, our methods exhibit robustness with respect to noise in the datasets and achieve very fast query times. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19593 [pdf, other]

SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs

Authors: Xin Su, Man Luo, Kris W Pan, Tien Pei Chou, Vasudev Lal, Phillip Howard

Abstract: Synthetic data generation has gained significant attention recently for its utility in training large vision and language models. However, the application of synthetic data to the training of multimodal context-augmented generation systems has been relatively unexplored. This gap in existing work is important because existing vision and language models (VLMs) are not trained specifically for conte… ▽ More Synthetic data generation has gained significant attention recently for its utility in training large vision and language models. However, the application of synthetic data to the training of multimodal context-augmented generation systems has been relatively unexplored. This gap in existing work is important because existing vision and language models (VLMs) are not trained specifically for context-augmented generation. Resources for adapting such models are therefore crucial for enabling their use in retrieval-augmented generation (RAG) settings, where a retriever is used to gather relevant information that is then subsequently provided to a generative model via context augmentation. To address this challenging problem, we generate SK-VQA: a large synthetic multimodal dataset containing over 2 million question-answer pairs which require external knowledge to determine the final answer. Our dataset is both larger and significantly more diverse than existing resources of its kind, possessing over 11x more unique questions and containing images from a greater variety of sources than previously-proposed datasets. Through extensive experiments, we demonstrate that our synthetic dataset can not only serve as a challenging benchmark, but is also highly effective for adapting existing generative multimodal models for context-augmented generation. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.19247 [pdf, other]

Local Manifold Learning for No-Reference Image Quality Assessment

Authors: Timin Gao, Wensheng Pan, Yan Zhang, Sicheng Zhao, Shengchuan Zhang, Xiawu Zheng, Ke Li, Liujuan Cao, Rongrong Ji

Abstract: Contrastive learning has considerably advanced the field of Image Quality Assessment (IQA), emerging as a widely adopted technique. The core mechanism of contrastive learning involves minimizing the distance between quality-similar (positive) examples while maximizing the distance between quality-dissimilar (negative) examples. Despite its successes, current contrastive learning methods often negl… ▽ More Contrastive learning has considerably advanced the field of Image Quality Assessment (IQA), emerging as a widely adopted technique. The core mechanism of contrastive learning involves minimizing the distance between quality-similar (positive) examples while maximizing the distance between quality-dissimilar (negative) examples. Despite its successes, current contrastive learning methods often neglect the importance of preserving the local manifold structure. This oversight can result in a high degree of similarity among hard examples within the feature space, thereby impeding effective differentiation and assessment. To address this issue, we propose an innovative framework that integrates local manifold learning with contrastive learning for No-Reference Image Quality Assessment (NR-IQA). Our method begins by sampling multiple crops from a given image, identifying the most visually salient crop. This crop is then used to cluster other crops from the same image as the positive class, while crops from different images are treated as negative classes to increase inter-class distance. Uniquely, our approach also considers non-saliency crops from the same image as intra-class negative classes to preserve their distinctiveness. Additionally, we employ a mutual learning framework, which further enhances the model's ability to adaptively learn and identify visual saliency regions. Our approach demonstrates a better performance compared to state-of-the-art methods in 7 standard datasets, achieving PLCC values of 0.942 (compared to 0.908 in TID2013) and 0.914 (compared to 0.894 in LIVEC). △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.12215 [pdf, other]

Discrete Variable Topology Optimization Using Multi-Cut Formulation and Adaptive Trust Regions

Authors: Zisheng Ye, Wenxiao Pan

Abstract: We present a new framework for solving general topology optimization (TO) problems that find an optimal material distribution within a design space to maximize the performance of a structure while satisfying design constraints. These problems involve state variables that nonlinearly depend on the design variables, with objective functions that can be convex or non-convex, and may include multiple… ▽ More We present a new framework for solving general topology optimization (TO) problems that find an optimal material distribution within a design space to maximize the performance of a structure while satisfying design constraints. These problems involve state variables that nonlinearly depend on the design variables, with objective functions that can be convex or non-convex, and may include multiple candidate materials. The framework is designed to greatly enhance computational efficiency, primarily by diminishing optimization iteration counts and thereby reducing the solving of associated state-equilibrium partial differential equations (PDEs). It maintains binary design variables and addresses the large-scale mixed integer nonlinear programming (MINLP) problem that arises from discretizing the design space and PDEs. The core of this framework is the integration of the generalized Benders' decomposition and adaptive trust regions. The trust-region radius adapts based on a merit function. To mitigate ill-conditioning due to extreme parameter values, we further introduce a parameter relaxation scheme where two parameters are relaxed in stages at different paces. Numerical tests validate the framework's superior performance, including minimum compliance and compliant mechanism problems in single-material and multi-material designs. We compare our results with those of other methods and demonstrate significant reductions in optimization iterations by about one order of magnitude, while maintaining comparable optimal objective function values. As the design variables and constraints increase, the framework maintains consistent solution quality and efficiency, underscoring its good scalability. We anticipate this framework will be especially advantageous for TO applications involving substantial design variables and constraints and requiring significant computational resources for PDE solving. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10744

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou , et al. (77 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches. △ Less

Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: The author list and contents need to be verified by all authors

arXiv:2406.00116 [pdf, other]

A Sim2Real Approach for Identifying Task-Relevant Properties in Interpretable Machine Learning

Authors: Eura Nofshin, Esther Brown, Brian Lim, Weiwei Pan, Finale Doshi-Velez

Abstract: Existing user studies suggest that different tasks may require explanations with different properties. However, user studies are expensive. In this paper, we introduce a generalizable, cost-effective method for identifying task-relevant explanation properties in silico, which can guide the design of more expensive user studies. We use our approach to identify relevant proxies for three example tas… ▽ More Existing user studies suggest that different tasks may require explanations with different properties. However, user studies are expensive. In this paper, we introduce a generalizable, cost-effective method for identifying task-relevant explanation properties in silico, which can guide the design of more expensive user studies. We use our approach to identify relevant proxies for three example tasks and validate our simulation with real user studies. △ Less

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.20195 [pdf, other]

Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations

Authors: Zilin Ma, Susannah, Su, Nathan Zhao, Linn Bieske, Blake Bullwinkel, Yanyi Zhang, Sophia, Yang, Ziqing Luo, Siyao Li, Gekai Liao, Boxiang Wang, Jinglun Gao, Zihan Wen, Claude Bruderlein, Weiwei Pan

Abstract: Humanitarian negotiations in conflict zones, called \emph{frontline negotiation}, are often highly adversarial, complex, and high-risk. Several best-practices have emerged over the years that help negotiators extract insights from large datasets to navigate nuanced and rapidly evolving scenarios. Recent advances in large language models (LLMs) have sparked interest in the potential for AI to aid d… ▽ More Humanitarian negotiations in conflict zones, called \emph{frontline negotiation}, are often highly adversarial, complex, and high-risk. Several best-practices have emerged over the years that help negotiators extract insights from large datasets to navigate nuanced and rapidly evolving scenarios. Recent advances in large language models (LLMs) have sparked interest in the potential for AI to aid decision making in frontline negotiation. Through in-depth interviews with 13 experienced frontline negotiators, we identified their needs for AI-assisted case analysis and creativity support, as well as concerns surrounding confidentiality and model bias. We further explored the potential for AI augmentation of three standard tools used in frontline negotiation planning. We evaluated the quality and stability of our ChatGPT-based negotiation tools in the context of two real cases. Our findings highlight the potential for LLMs to enhance humanitarian negotiations and underscore the need for careful ethical and practical considerations. △ Less

Submitted 30 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19909 [pdf, other]

Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning

Authors: Tenglong Liu, Yang Li, Yixing Lan, Hao Gao, Wei Pan, Xin Xu

Abstract: In offline reinforcement learning, the challenge of out-of-distribution (OOD) is pronounced. To address this, existing methods often constrain the learned policy through policy regularization. However, these methods often suffer from the issue of unnecessary conservativeness, hampering policy improvement. This occurs due to the indiscriminate use of all actions from the behavior policy that genera… ▽ More In offline reinforcement learning, the challenge of out-of-distribution (OOD) is pronounced. To address this, existing methods often constrain the learned policy through policy regularization. However, these methods often suffer from the issue of unnecessary conservativeness, hampering policy improvement. This occurs due to the indiscriminate use of all actions from the behavior policy that generates the offline dataset as constraints. The problem becomes particularly noticeable when the quality of the dataset is suboptimal. Thus, we propose Adaptive Advantage-guided Policy Regularization (A2PR), obtaining high-advantage actions from an augmented behavior policy combined with VAE to guide the learned policy. A2PR can select high-advantage actions that differ from those present in the dataset, while still effectively maintaining conservatism from OOD actions. This is achieved by harnessing the VAE capacity to generate samples matching the distribution of the data points. We theoretically prove that the improvement of the behavior policy is guaranteed. Besides, it effectively mitigates value overestimation with a bounded performance gap. Empirically, we conduct a series of experiments on the D4RL benchmark, where A2PR demonstrates state-of-the-art performance. Furthermore, experimental results on additional suboptimal mixed datasets reveal that A2PR exhibits superior performance. Code is available at https://github.com/ltlhuuu/A2PR. △ Less

Submitted 1 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: ICML 2024, 19 pages

arXiv:2405.18194 [pdf, other]

Delving into Differentially Private Transformer

Authors: Youlong Ding, Xueyang Wu, Yining Meng, Yonggang Luo, Hao Wang, Weike Pan

Abstract: Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to `reduce' the problem of training DP Transformer to the mor… ▽ More Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to `reduce' the problem of training DP Transformer to the more basic problem of training DP vanilla neural nets. The latter is better understood and amenable to many model-agnostic methods. Such `reduction' is done by first identifying the hardness unique to DP Transformer training: the attention distraction phenomenon and a lack of compatibility with existing techniques for efficient gradient clipping. To deal with these two issues, we propose the Re-Attention Mechanism and Phantom Clipping, respectively. We believe that our work not only casts new light on training DP Transformers but also promotes a modular treatment to advance research in the field of differentially private deep learning. △ Less

Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2405.17250 [pdf, ps, other]

"Pass the butter": A study on desktop-classic multitasking robotic arm based on advanced YOLOv7 and BERT

Authors: Haohua Que, Wenbin Pan, Jie Xu, Hao Luo, Pei Wang, Li Zhang

Abstract: In recent years, various intelligent autonomous robots have begun to appear in daily life and production. Desktop-level robots are characterized by their flexible deployment, rapid response, and suitability for light workload environments. In order to meet the current societal demand for service robot technology, this study proposes using a miniaturized desktop-level robot (by ROS) as a carrier, l… ▽ More In recent years, various intelligent autonomous robots have begun to appear in daily life and production. Desktop-level robots are characterized by their flexible deployment, rapid response, and suitability for light workload environments. In order to meet the current societal demand for service robot technology, this study proposes using a miniaturized desktop-level robot (by ROS) as a carrier, locally deploying a natural language model (NLP-BERT), and integrating visual recognition (CV-YOLO) and speech recognition technology (ASR-Whisper) as inputs to achieve autonomous decision-making and rational action by the desktop robot. Three comprehensive experiments were designed to validate the robotic arm, and the results demonstrate excellent performance using this approach across all three experiments. In Task 1, the execution rates for speech recognition and action performance were 92.6% and 84.3%, respectively. In Task 2, the highest execution rates under the given conditions reached 92.1% and 84.6%, while in Task 3, the highest execution rates were 95.2% and 80.8%, respectively. Therefore, it can be concluded that the proposed solution integrating ASR, NLP, and other technologies on edge devices is feasible and provides a technical and engineering foundation for realizing multimodal desktop-level robots. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16413 [pdf, other]

Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

Authors: Jiankun Wang, Sumyeong Ahn, Taykhoom Dalal, Xiaodan Zhang, Weishen Pan, Qiannan Zhang, Bin Chen, Hiroko H. Dodge, Fei Wang, Jiayu Zhou

Abstract: Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for developing ADRD screening tools such as machine learning bas… ▽ More Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for developing ADRD screening tools such as machine learning based predictive models. Recent advancements in large language models (LLMs) demonstrate their unprecedented capability of encoding knowledge and performing reasoning, which offers them strong potential for enhancing risk prediction. This paper proposes a novel pipeline that augments risk prediction by leveraging the few-shot inference power of LLMs to make predictions on cases where traditional supervised learning methods (SLs) may not excel. Specifically, we develop a collaborative pipeline that combines SLs and LLMs via a confidence-driven decision-making mechanism, leveraging the strengths of SLs in clear-cut cases and LLMs in more complex scenarios. We evaluate this pipeline using a real-world EHR data warehouse from Oregon Health \& Science University (OHSU) Hospital, encompassing EHRs from over 2.5 million patients and more than 20 million patient encounters. Our results show that our proposed approach effectively combines the power of SLs and LLMs, offering significant improvements in predictive performance. This advancement holds promise for revolutionizing ADRD screening and early detection practices, with potential implications for better strategies of patient management and thus improving healthcare. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.11280 [pdf, other]

Joint Analysis of Single-Cell Data across Cohorts with Missing Modalities

Authors: Marianne Arriola, Weishen Pan, Manqi Zhou, Qiannan Zhang, Chang Su, Fei Wang

Abstract: Joint analysis of multi-omic single-cell data across cohorts has significantly enhanced the comprehensive analysis of cellular processes. However, most of the existing approaches for this purpose require access to samples with complete modality availability, which is impractical in many real-world scenarios. In this paper, we propose (Single-Cell Cross-Cohort Cross-Category) integration, a novel f… ▽ More Joint analysis of multi-omic single-cell data across cohorts has significantly enhanced the comprehensive analysis of cellular processes. However, most of the existing approaches for this purpose require access to samples with complete modality availability, which is impractical in many real-world scenarios. In this paper, we propose (Single-Cell Cross-Cohort Cross-Category) integration, a novel framework that learns unified cell representations under domain shift without requiring full-modality reference samples. Our generative approach learns rich cross-modal and cross-domain relationships that enable imputation of these missing modalities. Through experiments on real-world multi-omic datasets, we demonstrate that offers a robust solution to single-cell tasks such as cell type clustering, cell type classification, and feature imputation. △ Less

Submitted 18 May, 2024; originally announced May 2024.

Comments: 10 pages, 7 figures, 5 tables

arXiv:2404.14949 [pdf, other]

Multi-Modal Prompt Learning on Blind Image Quality Assessment

Authors: Wensheng Pan, Timin Gao, Yan Zhang, Runze Hu, Xiawu Zheng, Enwei Zhang, Yuting Gao, Yutao Liu, Yunhang Shen, Ke Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji

Abstract: Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Currently, leveraging semantic information to enhance IQA is a crucial research direction. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semant… ▽ More Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Currently, leveraging semantic information to enhance IQA is a crucial research direction. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. However, the generalist nature of these pre-trained Vision-Language (VL) models often renders them suboptimal for IQA-specific tasks. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. Existing prompt-based VL models overly focus on incremental semantic information from text, neglecting the rich insights available from visual data analysis. This imbalance limits their performance improvements in IQA tasks. This paper introduces an innovative multi-modal prompt-based methodology for IQA. Our approach employs carefully crafted prompts that synergistically mine incremental semantic information from both visual and linguistic data. Specifically, in the visual branch, we introduce a multi-layer prompt structure to enhance the VL model's adaptability. In the text branch, we deploy a dual-prompt scheme that steers the model to recognize and differentiate between scene category and distortion type, thereby refining the model's capacity to assess image quality. Our experimental findings underscore the effectiveness of our method over existing Blind Image Quality Assessment (BIQA) approaches. Notably, it demonstrates competitive performance across various datasets. Our method achieves Spearman Rank Correlation Coefficient (SRCC) values of 0.961(surpassing 0.946 in CSIQ) and 0.941 (exceeding 0.930 in KADID), illustrating its robustness and accuracy in diverse contexts. △ Less

Submitted 18 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.11624 [pdf, ps, other]

Token Space: A Category Theory Framework for AI Computations

Authors: Wuming Pan

Abstract: This paper introduces the Token Space framework, a novel mathematical construct designed to enhance the interpretability and effectiveness of deep learning models through the application of category theory. By establishing a categorical structure at the Token level, we provide a new lens through which AI computations can be understood, emphasizing the relationships between tokens, such as grouping… ▽ More This paper introduces the Token Space framework, a novel mathematical construct designed to enhance the interpretability and effectiveness of deep learning models through the application of category theory. By establishing a categorical structure at the Token level, we provide a new lens through which AI computations can be understood, emphasizing the relationships between tokens, such as grouping, order, and parameter types. We explore the foundational methodologies of the Token Space, detailing its construction, the role of construction operators and initial categories, and its application in analyzing deep learning models, specifically focusing on attention mechanisms and Transformer architectures. The integration of category theory into AI research offers a unified framework to describe and analyze computational structures, enabling new research paths and development possibilities. Our investigation reveals that the Token Space framework not only facilitates a deeper theoretical understanding of deep learning models but also opens avenues for the design of more efficient, interpretable, and innovative models, illustrating the significant role of category theory in advancing computational models. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: 42 pages,5 tables

MSC Class: I.2.6

arXiv:2403.08941 [pdf, other]

Towards Model-Agnostic Posterior Approximation for Fast and Accurate Variational Autoencoders

Authors: Yaniv Yacoby, Weiwei Pan, Finale Doshi-Velez

Abstract: Inference for Variational Autoencoders (VAEs) consists of learning two models: (1) a generative model, which transforms a simple distribution over a latent space into the distribution over observed data, and (2) an inference model, which approximates the posterior of the latent codes given data. The two components are learned jointly via a lower bound to the generative model's log marginal likelih… ▽ More Inference for Variational Autoencoders (VAEs) consists of learning two models: (1) a generative model, which transforms a simple distribution over a latent space into the distribution over observed data, and (2) an inference model, which approximates the posterior of the latent codes given data. The two components are learned jointly via a lower bound to the generative model's log marginal likelihood. In early phases of joint training, the inference model poorly approximates the latent code posteriors. Recent work showed that this leads optimization to get stuck in local optima, negatively impacting the learned generative model. As such, recent work suggests ensuring a high-quality inference model via iterative training: maximizing the objective function relative to the inference model before every update to the generative model. Unfortunately, iterative training is inefficient, requiring heuristic criteria for reverting from iterative to joint training for speed. Here, we suggest an inference method that trains the generative and inference models independently. It approximates the posterior of the true model a priori; fixing this posterior approximation, we then maximize the lower bound relative to only the generative model. By conventional wisdom, this approach should rely on the true prior and likelihood of the true model to approximate its posterior (which are unknown). However, we show that we can compute a deterministic, model-agnostic posterior approximation (MAPA) of the true model's posterior. We then use MAPA to develop a proof-of-concept inference method. We present preliminary results on low-dimensional synthetic data that (1) MAPA captures the trend of the true posterior, and (2) our MAPA-based inference performs better density estimation with less computation than baselines. Lastly, we present a roadmap for scaling the MAPA-based inference method to high-dimensional data. △ Less

Submitted 12 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: Accepted at the Workshop at the 6th Symposium on Advances in Approximate Bayesian Inference (AABI) 2024

arXiv:2403.01977 [pdf, other]

TTA-Nav: Test-time Adaptive Reconstruction for Point-Goal Navigation under Visual Corruptions

Authors: Maytus Piriyajitakonkij, Mingfei Sun, Mengmi Zhang, Wei Pan

Abstract: Robot navigation under visual corruption presents a formidable challenge. To address this, we propose a Test-time Adaptation (TTA) method, named as TTA-Nav, for point-goal navigation under visual corruptions. Our "plug-and-play" method incorporates a top-down decoder to a pre-trained navigation model. Firstly, the pre-trained navigation model gets a corrupted image and extracts features. Secondly,… ▽ More Robot navigation under visual corruption presents a formidable challenge. To address this, we propose a Test-time Adaptation (TTA) method, named as TTA-Nav, for point-goal navigation under visual corruptions. Our "plug-and-play" method incorporates a top-down decoder to a pre-trained navigation model. Firstly, the pre-trained navigation model gets a corrupted image and extracts features. Secondly, the top-down decoder produces the reconstruction given the high-level features extracted by the pre-trained model. Then, it feeds the reconstruction of a corrupted image back to the pre-trained model. Finally, the pre-trained model does forward pass again to output action. Despite being trained solely on clean images, the top-down decoder can reconstruct cleaner images from corrupted ones without the need for gradient-based adaptation. The pre-trained navigation model with our top-down decoder significantly enhances navigation performance across almost all visual corruptions in our benchmarks. Our method improves the success rate of point-goal navigation from the state-of-the-art result of 46% to 94% on the most severe corruption. This suggests its potential for broader application in robotic visual navigation. Project page: https://sites.google.com/view/tta-nav △ Less

Submitted 14 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Submitted to IROS2024

arXiv:2402.18784 [pdf, other]

Brain-inspired and Self-based Artificial Intelligence

Authors: Yi Zeng, Feifei Zhao, Yuxuan Zhao, Dongcheng Zhao, Enmeng Lu, Qian Zhang, Yuwei Wang, Hui Feng, Zhuoya Zhao, Jihang Wang, Qingqun Kong, Yinqian Sun, Yang Li, Guobin Shen, Bing Han, Yiting Dong, Wenxuan Pan, Xiang He, Aorigele Bao, Jin Wang

Abstract: The question "Can machines think?" and the Turing Test to assess whether machines could achieve human-level intelligence is one of the roots of AI. With the philosophical argument "I think, therefore I am", this paper challenge the idea of a "thinking machine" supported by current AIs since there is no sense of self in them. Current artificial intelligence is only seemingly intelligent information… ▽ More The question "Can machines think?" and the Turing Test to assess whether machines could achieve human-level intelligence is one of the roots of AI. With the philosophical argument "I think, therefore I am", this paper challenge the idea of a "thinking machine" supported by current AIs since there is no sense of self in them. Current artificial intelligence is only seemingly intelligent information processing and does not truly understand or be subjectively aware of oneself and perceive the world with the self as human intelligence does. In this paper, we introduce a Brain-inspired and Self-based Artificial Intelligence (BriSe AI) paradigm. This BriSe AI paradigm is dedicated to coordinating various cognitive functions and learning strategies in a self-organized manner to build human-level AI models and robotic applications. Specifically, BriSe AI emphasizes the crucial role of the Self in shaping the future AI, rooted with a practical hierarchical Self framework, including Perception and Learning, Bodily Self, Autonomous Self, Social Self, and Conceptual Self. The hierarchical framework of the Self highlights self-based environment perception, self-bodily modeling, autonomous interaction with the environment, social interaction and collaboration with others, and even more abstract understanding of the Self. Furthermore, the positive mutual promotion and support among multiple levels of Self, as well as between Self and learning, enhance the BriSe AI's conscious understanding of information and flexible adaptation to complex environments, serving as a driving force propelling BriSe AI towards real Artificial General Intelligence. △ Less

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.17375 [pdf, other]

Impact of Computation in Integral Reinforcement Learning for Continuous-Time Control

Authors: Wenhan Cao, Wei Pan

Abstract: Integral reinforcement learning (IntRL) demands the precise computation of the utility function's integral at its policy evaluation (PEV) stage. This is achieved through quadrature rules, which are weighted sums of utility functions evaluated from state samples obtained in discrete time. Our research reveals a critical yet underexplored phenomenon: the choice of the computational method -- in this… ▽ More Integral reinforcement learning (IntRL) demands the precise computation of the utility function's integral at its policy evaluation (PEV) stage. This is achieved through quadrature rules, which are weighted sums of utility functions evaluated from state samples obtained in discrete time. Our research reveals a critical yet underexplored phenomenon: the choice of the computational method -- in this case, the quadrature rule -- can significantly impact control performance. This impact is traced back to the fact that computational errors introduced in the PEV stage can affect the policy iteration's convergence behavior, which in turn affects the learned controller. To elucidate how computation impacts control, we draw a parallel between IntRL's policy iteration and Newton's method applied to the Hamilton-Jacobi-Bellman equation. In this light, computational error in PEV manifests as an extra error term in each iteration of Newton's method, with its upper bound proportional to the computational error. Further, we demonstrate that when the utility function resides in a reproducing kernel Hilbert space (RKHS), the optimal quadrature is achievable by employing Bayesian quadrature with the RKHS-inducing kernel function. We prove that the local convergence rates for IntRL using the trapezoidal rule and Bayesian quadrature with a Matérn kernel to be $O(N^{-2})$ and $O(N^{-b})$, where $N$ is the number of evenly-spaced samples and $b$ is the Matérn kernel's smoothness parameter. These theoretical findings are finally validated by two canonical control tasks. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.15259 [pdf, other]

Open Ad Hoc Teamwork with Cooperative Game Theory

Authors: Jianhong Wang, Yang Li, Yuan Zhang, Wei Pan, Samuel Kaski

Abstract: Ad hoc teamwork poses a challenging problem, requiring the design of an agent to collaborate with teammates without prior coordination or joint training. Open ad hoc teamwork (OAHT) further complicates this challenge by considering environments with a changing number of teammates, referred to as open teams. One promising solution in practice to this problem is leveraging the generalizability of gr… ▽ More Ad hoc teamwork poses a challenging problem, requiring the design of an agent to collaborate with teammates without prior coordination or joint training. Open ad hoc teamwork (OAHT) further complicates this challenge by considering environments with a changing number of teammates, referred to as open teams. One promising solution in practice to this problem is leveraging the generalizability of graph neural networks to handle an unrestricted number of agents with various agent-types, named graph-based policy learning (GPL). However, its joint Q-value representation over a coordination graph lacks convincing explanations. In this paper, we establish a new theory to understand the representation of the joint Q-value for OAHT and its learning paradigm, through the lens of cooperative game theory. Building on our theory, we propose a novel algorithm named CIAO, based on GPL's framework, with additional provable implementation tricks that can facilitate learning. The demos of experimental results are available on https://sites.google.com/view/ciao2024, and the code of experiments is published on https://github.com/hsvgbkhgbv/CIAO. △ Less

Submitted 10 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: Published at ICML 2024, 29 pages

arXiv:2402.12733 [pdf, other]

BMLP: Behavior-aware MLP for Heterogeneous Sequential Recommendation

Authors: Weixin Li, Yuhao Wu, Yang Liu, Weike Pan, Zhong Ming

Abstract: In real recommendation scenarios, users often have different types of behaviors, such as clicking and buying. Existing research methods show that it is possible to capture the heterogeneous interests of users through different types of behaviors. However, most multi-behavior approaches have limitations in learning the relationship between different behaviors. In this paper, we propose a novel mult… ▽ More In real recommendation scenarios, users often have different types of behaviors, such as clicking and buying. Existing research methods show that it is possible to capture the heterogeneous interests of users through different types of behaviors. However, most multi-behavior approaches have limitations in learning the relationship between different behaviors. In this paper, we propose a novel multilayer perceptron (MLP)-based heterogeneous sequential recommendation method, namely behavior-aware multilayer perceptron (BMLP). Specifically, it has two main modules, including a heterogeneous interest perception (HIP) module, which models behaviors at multiple granularities through behavior types and transition relationships, and a purchase intent perception (PIP) module, which adaptively fuses subsequences of auxiliary behaviors to capture users' purchase intent. Compared with mainstream sequence models, MLP is competitive in terms of accuracy and has unique advantages in simplicity and efficiency. Extensive experiments show that BMLP achieves significant improvement over state-of-the-art algorithms on four public datasets. In addition, its pure MLP architecture leads to a linear time complexity. △ Less

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12416 [pdf, other]

Aligning Individual and Collective Objectives in Multi-Agent Cooperation

Authors: Yang Li, Wenhao Zhang, Jianhong Wang, Shao Zhang, Yali Du, Ying Wen, Wei Pan

Abstract: Among the research topics in multi-agent learning, mixed-motive cooperation is one of the most prominent challenges, primarily due to the mismatch between individual and collective goals. The cutting-edge research is focused on incorporating domain knowledge into rewards and introducing additional mechanisms to incentivize cooperation. However, these approaches often face shortcomings such as the… ▽ More Among the research topics in multi-agent learning, mixed-motive cooperation is one of the most prominent challenges, primarily due to the mismatch between individual and collective goals. The cutting-edge research is focused on incorporating domain knowledge into rewards and introducing additional mechanisms to incentivize cooperation. However, these approaches often face shortcomings such as the effort on manual design and the absence of theoretical groundings. To close this gap, we model the mixed-motive game as a differentiable game for the ease of illuminating the learning dynamics towards cooperation. More detailed, we introduce a novel optimization method named \textbf{\textit{A}}ltruistic \textbf{\textit{G}}radient \textbf{\textit{A}}djustment (\textbf{\textit{AgA}}) that employs gradient adjustments to progressively align individual and collective objectives. Furthermore, we theoretically prove that AgA effectively attracts gradients to stable fixed points of the collective objective while considering individual interests, and we validate these claims with empirical evidence. We evaluate the effectiveness of our algorithm AgA through benchmark environments for testing mixed-motive collaboration with small-scale agents such as the two-player public good game and the sequential social dilemma games, Cleanup and Harvest, as well as our self-developed large-scale environment in the game StarCraft II. △ Less

Submitted 22 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 19 pages

arXiv:2401.15369 [pdf, other]

Privacy-Preserving Cross-Domain Sequential Recommendation

Authors: Zhaohao Lin, Weike Pan, Zhong Ming

Abstract: Cross-domain sequential recommendation is an important development direction of recommender systems. It combines the characteristics of sequential recommender systems and cross-domain recommender systems, which can capture the dynamic preferences of users and alleviate the problem of cold-start users. However, in recent years, people pay more and more attention to their privacy. They do not want o… ▽ More Cross-domain sequential recommendation is an important development direction of recommender systems. It combines the characteristics of sequential recommender systems and cross-domain recommender systems, which can capture the dynamic preferences of users and alleviate the problem of cold-start users. However, in recent years, people pay more and more attention to their privacy. They do not want other people to know what they just bought, what videos they just watched, and where they just came from. How to protect the users' privacy has become an urgent problem to be solved. In this paper, we propose a novel privacy-preserving cross-domain sequential recommender system (PriCDSR), which can provide users with recommendation services while preserving their privacy at the same time. Specifically, we define a new differential privacy on the data, taking into account both the ID information and the order information. Then, we design a random mechanism that satisfies this differential privacy and provide its theoretical proof. Our PriCDSR is a non-invasive method that can adopt any cross-domain sequential recommender system as a base model without any modification to it. To the best of our knowledge, our PriCDSR is the first work to investigate privacy issues in cross-domain sequential recommender systems. We conduct experiments on three domains, and the results demonstrate that our PriCDSR, despite introducing noise, still outperforms recommender systems that only use data from a single domain. △ Less

Submitted 27 January, 2024; originally announced January 2024.

arXiv:2401.14923 [pdf, other]

Reinforcement Learning Interventions on Boundedly Rational Human Agents in Frictionful Tasks

Authors: Eura Nofshin, Siddharth Swaroop, Weiwei Pan, Susan Murphy, Finale Doshi-Velez

Abstract: Many important behavior changes are frictionful; they require individuals to expend effort over a long period with little immediate gratification. Here, an artificial intelligence (AI) agent can provide personalized interventions to help individuals stick to their goals. In these settings, the AI agent must personalize rapidly (before the individual disengages) and interpretably, to help us unders… ▽ More Many important behavior changes are frictionful; they require individuals to expend effort over a long period with little immediate gratification. Here, an artificial intelligence (AI) agent can provide personalized interventions to help individuals stick to their goals. In these settings, the AI agent must personalize rapidly (before the individual disengages) and interpretably, to help us understand the behavioral interventions. In this paper, we introduce Behavior Model Reinforcement Learning (BMRL), a framework in which an AI agent intervenes on the parameters of a Markov Decision Process (MDP) belonging to a boundedly rational human agent. Our formulation of the human decision-maker as a planning agent allows us to attribute undesirable human policies (ones that do not lead to the goal) to their maladapted MDP parameters, such as an extremely low discount factor. Furthermore, we propose a class of tractable human models that captures fundamental behaviors in frictionful tasks. Introducing a notion of MDP equivalence specific to BMRL, we theoretically and empirically show that AI planning with our human models can lead to helpful policies on a wide range of more complex, ground-truth humans. △ Less

Submitted 26 January, 2024; originally announced January 2024.

Comments: In AAMAS 2024

arXiv:2401.12596 [pdf, other]

UniHDA: A Unified and Versatile Framework for Multi-Modal Hybrid Domain Adaptation

Authors: Hengjia Li, Yang Liu, Yuqi Lin, Zhanwei Zhang, Yibo Zhao, weihang Pan, Tu Zheng, Zheng Yang, Yuchun Jiang, Boxi Wu, Deng Cai

Abstract: Recently, generative domain adaptation has achieved remarkable progress, enabling us to adapt a pre-trained generator to a new target domain. However, existing methods simply adapt the generator to a single target domain and are limited to a single modality, either text-driven or image-driven. Moreover, they cannot maintain well consistency with the source domain, which impedes the inheritance of… ▽ More Recently, generative domain adaptation has achieved remarkable progress, enabling us to adapt a pre-trained generator to a new target domain. However, existing methods simply adapt the generator to a single target domain and are limited to a single modality, either text-driven or image-driven. Moreover, they cannot maintain well consistency with the source domain, which impedes the inheritance of the diversity. In this paper, we propose UniHDA, a \textbf{unified} and \textbf{versatile} framework for generative hybrid domain adaptation with multi-modal references from multiple domains. We use CLIP encoder to project multi-modal references into a unified embedding space and then linearly interpolate the direction vectors from multiple target domains to achieve hybrid domain adaptation. To ensure \textbf{consistency} with the source domain, we propose a novel cross-domain spatial structure (CSS) loss that maintains detailed spatial structure information between source and target generator. Experiments show that the adapted generator can synthesise realistic images with various attribute compositions. Additionally, our framework is generator-agnostic and versatile to multiple generators, e.g., StyleGAN, EG3D, and Diffusion Models. △ Less

Submitted 15 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.04971 [pdf, other]

A Survey on Cross-Domain Sequential Recommendation

Authors: Shu Chen, Zitao Xu, Weike Pan, Qiang Yang, Zhong Ming

Abstract: Cross-domain sequential recommendation (CDSR) shifts the modeling of user preferences from flat to stereoscopic by integrating and learning interaction information from multiple domains at different granularities (ranging from inter-sequence to intra-sequence and from single-domain to cross-domain). In this survey, we first define the CDSR problem using a four-dimensional tensor and then analyze i… ▽ More Cross-domain sequential recommendation (CDSR) shifts the modeling of user preferences from flat to stereoscopic by integrating and learning interaction information from multiple domains at different granularities (ranging from inter-sequence to intra-sequence and from single-domain to cross-domain). In this survey, we first define the CDSR problem using a four-dimensional tensor and then analyze its multi-type input representations under multidirectional dimensionality reductions. Following that, we provide a systematic overview from both macro and micro views. From a macro view, we abstract the multi-level fusion structures of various models across domains and discuss their bridges for fusion. From a micro view, focusing on the existing models, we first discuss the basic technologies and then explain the auxiliary learning technologies. Finally, we exhibit the available public datasets and the representative experimental results as well as provide some insights into future directions for research in CDSR. △ Less

Submitted 17 May, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: Accepted to the IJCAI 2024 Survey Track

arXiv:2401.03676 [pdf, other]

Assessing AI Detectors in Identifying AI-Generated Code: Implications for Education

Authors: Wei Hung Pan, Ming Jie Chok, Jonathan Leong Shan Wong, Yung Xin Shin, Yeong Shian Poon, Zhou Yang, Chun Yong Chong, David Lo, Mei Kuan Lim

Abstract: Educators are increasingly concerned about the usage of Large Language Models (LLMs) such as ChatGPT in programming education, particularly regarding the potential exploitation of imperfections in Artificial Intelligence Generated Content (AIGC) Detectors for academic misconduct. In this paper, we present an empirical study where the LLM is examined for its attempts to bypass detection by AIGC Det… ▽ More Educators are increasingly concerned about the usage of Large Language Models (LLMs) such as ChatGPT in programming education, particularly regarding the potential exploitation of imperfections in Artificial Intelligence Generated Content (AIGC) Detectors for academic misconduct. In this paper, we present an empirical study where the LLM is examined for its attempts to bypass detection by AIGC Detectors. This is achieved by generating code in response to a given question using different variants. We collected a dataset comprising 5,069 samples, with each sample consisting of a textual description of a coding problem and its corresponding human-written Python solution codes. These samples were obtained from various sources, including 80 from Quescol, 3,264 from Kaggle, and 1,725 from LeetCode. From the dataset, we created 13 sets of code problem variant prompts, which were used to instruct ChatGPT to generate the outputs. Subsequently, we assessed the performance of five AIGC detectors. Our results demonstrate that existing AIGC Detectors perform poorly in distinguishing between human-written code and AI-generated code. △ Less

Submitted 8 January, 2024; originally announced January 2024.

Comments: 11 pages, paper accepted at 46th International Conference on Software Engineering, Software Engineering Education and Training Track (ICSE-SEET 2024)

arXiv:2312.15248 [pdf, other]

Type-II Apollonian Model

Authors: Fei Ma, Jinzhi Ouyang, Ping Wang, Haobin Shi, Wei Pan

Abstract: The family of planar graphs is a particularly important family and models many real-world networks. In this paper, we propose a principled framework based on the widely-known Apollonian packing process to generate new planar network, i.e., Type-II Apollonian network $\mathcal{A}_{t}$. The manipulation is different from that of the typical Apollonian network, and is proceeded in terms of the iterat… ▽ More The family of planar graphs is a particularly important family and models many real-world networks. In this paper, we propose a principled framework based on the widely-known Apollonian packing process to generate new planar network, i.e., Type-II Apollonian network $\mathcal{A}_{t}$. The manipulation is different from that of the typical Apollonian network, and is proceeded in terms of the iterative addition of triangle instead of vertex. As a consequence, network $\mathcal{A}_{t}$ turns out to be hamiltonian and eulerian, however, the typical Apollonian network is not. Then, we in-depth study some fundamental structural properties on network $\mathcal{A}_{t}$, and verify that network $\mathcal{A}_{t}$ is sparse like most real-world networks, has scale-free feature and small-world property, and exhibits disassortative mixing structure. Next, we design an effective algorithm for solving the problem of how to enumerate spanning trees on network $\mathcal{A}_{t}$, and derive the asymptotic solution of the spanning tree entropy, which suggests that Type-II Apollonian network is more reliable to a random removal of edges than the typical Apollonian network. Additionally, we study trapping problem on network $\mathcal{A}_{t}$, and use average trapping time as metric to show that Type-II Apollonian network $\mathcal{A}_{t}$ has better structure for fast information diffusion than the typical Apollonian network. △ Less

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.08631 [pdf, other]

Semi-supervised Semantic Segmentation Meets Masked Modeling:Fine-grained Locality Learning Matters in Consistency Regularization

Authors: Wentao Pan, Zhe Xu, Jiangpeng Yan, Zihan Wu, Raymond Kai-yu Tong, Xiu Li, Jianhua Yao

Abstract: Semi-supervised semantic segmentation aims to utilize limited labeled images and abundant unlabeled images to achieve label-efficient learning, wherein the weak-to-strong consistency regularization framework, popularized by FixMatch, is widely used as a benchmark scheme. Despite its effectiveness, we observe that such scheme struggles with satisfactory segmentation for the local regions. This can… ▽ More Semi-supervised semantic segmentation aims to utilize limited labeled images and abundant unlabeled images to achieve label-efficient learning, wherein the weak-to-strong consistency regularization framework, popularized by FixMatch, is widely used as a benchmark scheme. Despite its effectiveness, we observe that such scheme struggles with satisfactory segmentation for the local regions. This can be because it originally stems from the image classification task and lacks specialized mechanisms to capture fine-grained local semantics that prioritizes in dense prediction. To address this issue, we propose a novel framework called \texttt{MaskMatch}, which enables fine-grained locality learning to achieve better dense segmentation. On top of the original teacher-student framework, we design a masked modeling proxy task that encourages the student model to predict the segmentation given the unmasked image patches (even with 30\% only) and enforces the predictions to be consistent with pseudo-labels generated by the teacher model using the complete image. Such design is motivated by the intuition that if the predictions are more consistent given insufficient neighboring information, stronger fine-grained locality perception is achieved. Besides, recognizing the importance of reliable pseudo-labels in the above locality learning and the original consistency learning scheme, we design a multi-scale ensembling strategy that considers context at different levels of abstraction for pseudo-label generation. Extensive experiments on benchmark datasets demonstrate the superiority of our method against previous approaches and its plug-and-play flexibility. △ Less

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.06987

A new lightweight additive homomorphic encryption algorithm

Authors: Wuqiong Pan, Hongliang Gu

Abstract: This article describes a lightweight additive homomorphic algorithm with the same encryption and decryption keys. Compared to standard additive homomorphic algorithms like Paillier, this algorithm reduces the computational cost of encryption and decryption from modular exponentiation to modular multiplication, and reduces the computational cost of ciphertext addition from modular multiplication to… ▽ More This article describes a lightweight additive homomorphic algorithm with the same encryption and decryption keys. Compared to standard additive homomorphic algorithms like Paillier, this algorithm reduces the computational cost of encryption and decryption from modular exponentiation to modular multiplication, and reduces the computational cost of ciphertext addition from modular multiplication to modular addition. This algorithm is based on a new mathematical problem: in two division operations, whether it is possible to infer the remainder or divisor based on the dividend when two remainders are related. Currently, it is not obvious how to break this problem, but further exploration is needed to determine if it is sufficiently difficult. In addition to this mathematical problem, we have also designed two interesting mathematical structures for decryption, which are used in the two algorithms mentioned in the main text. It is possible that the decryption structure of Algorithm 2 introduces new security vulnerabilities, but we have not investigated this issue thoroughly. △ Less

Submitted 1 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: This algorithm proposed in this paper has serious security problem. It can be attacked by Orthogonal lattice

arXiv:2312.05643 [pdf, other]

NiSNN-A: Non-iterative Spiking Neural Networks with Attention with Application to Motor Imagery EEG Classification

Authors: Chuhan Zhang, Wei Pan, Cosimo Della Santina

Abstract: Motor imagery, an important category in electroencephalogram (EEG) research, often intersects with scenarios demanding low energy consumption, such as portable medical devices and isolated environment operations. Traditional deep learning algorithms, despite their effectiveness, are characterized by significant computational demands accompanied by high energy usage. As an alternative, spiking neur… ▽ More Motor imagery, an important category in electroencephalogram (EEG) research, often intersects with scenarios demanding low energy consumption, such as portable medical devices and isolated environment operations. Traditional deep learning algorithms, despite their effectiveness, are characterized by significant computational demands accompanied by high energy usage. As an alternative, spiking neural networks (SNNs), inspired by the biological functions of the brain, emerge as a promising energy-efficient solution. However, SNNs typically exhibit lower accuracy than their counterpart convolutional neural networks (CNNs). Although attention mechanisms successfully increase network accuracy by focusing on relevant features, their integration in the SNN framework remains an open question. In this work, we combine the SNN and the attention mechanisms for the EEG classification, aiming to improve precision and reduce energy consumption. To this end, we first propose a Non-iterative Leaky Integrate-and-Fire (LIF) neuron model, overcoming the gradient issues in the traditional SNNs using the Iterative LIF neurons. Then, we introduce the sequence-based attention mechanisms to refine the feature map. We evaluated the proposed Non-iterative SNN with Attention (NiSNN-A) model on OpenBMI, a large-scale motor imagery dataset. Experiment results demonstrate that 1) our model outperforms other SNN models by achieving higher accuracy, 2) our model increases energy efficiency compared to the counterpart CNN models (i.e., by 2.27 times) while maintaining comparable accuracy. △ Less

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2311.09008 [pdf, other]

End-to-end Task-oriented Dialogue: A Survey of Tasks, Methods, and Future Directions

Authors: Libo Qin, Wenbo Pan, Qiguang Chen, Lizi Liao, Zhou Yu, Yue Zhang, Wanxiang Che, Min Li

Abstract: End-to-end task-oriented dialogue (EToD) can directly generate responses in an end-to-end fashion without modular training, which attracts escalating popularity. The advancement of deep neural networks, especially the successful use of large pre-trained models, has further led to significant progress in EToD research in recent years. In this paper, we present a thorough review and provide a unifie… ▽ More End-to-end task-oriented dialogue (EToD) can directly generate responses in an end-to-end fashion without modular training, which attracts escalating popularity. The advancement of deep neural networks, especially the successful use of large pre-trained models, has further led to significant progress in EToD research in recent years. In this paper, we present a thorough review and provide a unified perspective to summarize existing approaches as well as recent trends to advance the development of EToD research. The contributions of this paper can be summarized: (1) \textbf{\textit{First survey}}: to our knowledge, we take the first step to present a thorough survey of this research field; (2) \textbf{\textit{New taxonomy}}: we first introduce a unified perspective for EToD, including (i) \textit{Modularly EToD} and (ii) \textit{Fully EToD}; (3) \textbf{\textit{New Frontiers}}: we discuss some potential frontier areas as well as the corresponding challenges, hoping to spur breakthrough research in EToD field; (4) \textbf{\textit{Abundant resources}}: we build a public website\footnote{We collect the related papers, baseline projects, and leaderboards for the community at \url{https://etods.net/}.}, where EToD researchers could directly access the recent progress. We hope this work can serve as a thorough reference for the EToD research community. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: Accepted at EMNLP2023

arXiv:2311.08244 [pdf, other]

Language and Sketching: An LLM-driven Interactive Multimodal Multitask Robot Navigation Framework

Authors: Weiqin Zu, Wenbin Song, Ruiqing Chen, Ze Guo, Fanglei Sun, Zheng Tian, Wei Pan, Jun Wang

Abstract: The socially-aware navigation system has evolved to adeptly avoid various obstacles while performing multiple tasks, such as point-to-point navigation, human-following, and -guiding. However, a prominent gap persists: in Human-Robot Interaction (HRI), the procedure of communicating commands to robots demands intricate mathematical formulations. Furthermore, the transition between tasks does not qu… ▽ More The socially-aware navigation system has evolved to adeptly avoid various obstacles while performing multiple tasks, such as point-to-point navigation, human-following, and -guiding. However, a prominent gap persists: in Human-Robot Interaction (HRI), the procedure of communicating commands to robots demands intricate mathematical formulations. Furthermore, the transition between tasks does not quite possess the intuitive control and user-centric interactivity that one would desire. In this work, we propose an LLM-driven interactive multimodal multitask robot navigation framework, termed LIM2N, to solve the above new challenge in the navigation field. We achieve this by first introducing a multimodal interaction framework where language and hand-drawn inputs can serve as navigation constraints and control objectives. Next, a reinforcement learning agent is built to handle multiple tasks with the received information. Crucially, LIM2N creates smooth cooperation among the reasoning of multimodal input, multitask planning, and adaptation and processing of the intelligent sensing modules in the complicated system. Extensive experiments are conducted in both simulation and the real world demonstrating that LIM2N has superior user needs understanding, alongside an enhanced interactive experience. △ Less

Submitted 21 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.03680

Deep Bayesian Reinforcement Learning for Spacecraft Proximity Maneuvers and Docking

Authors: Desong Du, Naiming Qi, Yanfang Liu, Wei Pan

Abstract: In the pursuit of autonomous spacecraft proximity maneuvers and docking(PMD), we introduce a novel Bayesian actor-critic reinforcement learning algorithm to learn a control policy with the stability guarantee. The PMD task is formulated as a Markov decision process that reflects the relative dynamic model, the docking cone and the cost function. Drawing from the principles of Lyapunov theory, we f… ▽ More In the pursuit of autonomous spacecraft proximity maneuvers and docking(PMD), we introduce a novel Bayesian actor-critic reinforcement learning algorithm to learn a control policy with the stability guarantee. The PMD task is formulated as a Markov decision process that reflects the relative dynamic model, the docking cone and the cost function. Drawing from the principles of Lyapunov theory, we frame the temporal difference learning as a constrained Gaussian process regression problem. This innovative approach allows the state-value function to be expressed as a Lyapunov function, leveraging the Gaussian process and deep kernel learning. We develop a novel Bayesian quadrature policy optimization procedure to analytically compute the policy gradient while integrating Lyapunov-based stability constraints. This integration is pivotal in satisfying the rigorous safety demands of spaceflight missions. The proposed algorithm has been experimentally evaluated on a spacecraft air-bearing testbed and shows impressive and promising performance. △ Less

Submitted 21 May, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: Because of a conflict of interest between me and my author's institution, my author and I do not want this paper to continue publication

arXiv:2310.20424 [pdf, other]

DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory

Authors: Cenlin Duan, Jianlei Yang, Xiaolin He, Yingjie Qi, Yikun Wang, Yiou Wang, Ziyan He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weitao Pan, Weisheng Zhao

Abstract: Processing-in-memory (PIM), as a novel computing paradigm, provides significant performance benefits from the aspect of effective data movement reduction. SRAM-based PIM has been demonstrated as one of the most promising candidates due to its endurance and compatibility. However, the integration density of SRAM-based PIM is much lower than other non-volatile memory-based ones, due to its inherent… ▽ More Processing-in-memory (PIM), as a novel computing paradigm, provides significant performance benefits from the aspect of effective data movement reduction. SRAM-based PIM has been demonstrated as one of the most promising candidates due to its endurance and compatibility. However, the integration density of SRAM-based PIM is much lower than other non-volatile memory-based ones, due to its inherent 6T structure for storing a single bit. Within comparable area constraints, SRAM-based PIM exhibits notably lower capacity. Thus, aiming to unleash its capacity potential, we propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity. At the algorithmic level, we propose a filter-wise complementary correlation (FCC) algorithm to obtain a bitwise complementary pair. At the architecture level, we exploit the intrinsic cross-coupled structure of 6T SRAM to store the bitwise complementary pair in their complementary states ($Q/\overline{Q}$), thereby maximizing the data capacity of each SRAM cell. The dual-broadcast input structure and reconfigurable unit support both depthwise and pointwise convolution, adhering to the requirements of various neural networks. Evaluation results show that DDC-PIM yields about $2.84\times$ speedup on MobileNetV2 and $2.69\times$ on EfficientNet-B0 with negligible accuracy loss compared with PIM baseline implementation. Compared with state-of-the-art SRAM-based PIM macros, DDC-PIM achieves up to $8.41\times$ and $2.75\times$ improvement in weight density and area efficiency, respectively. △ Less

Submitted 31 October, 2023; originally announced October 2023.

Comments: 14 pages, to be published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)

arXiv:2310.17816 [pdf, other]

Local Discovery by Partitioning: Polynomial-Time Causal Discovery Around Exposure-Outcome Pairs

Authors: Jacqueline Maasch, Weishen Pan, Shantanu Gupta, Volodymyr Kuleshov, Kyra Gan, Fei Wang

Abstract: Causal discovery is crucial for causal inference in observational studies, as it can enable the identification of valid adjustment sets (VAS) for unbiased effect estimation. However, global causal discovery is notoriously hard in the nonparametric setting, with exponential time and sample complexity in the worst case. To address this, we propose local discovery by partitioning (LDP): a local causa… ▽ More Causal discovery is crucial for causal inference in observational studies, as it can enable the identification of valid adjustment sets (VAS) for unbiased effect estimation. However, global causal discovery is notoriously hard in the nonparametric setting, with exponential time and sample complexity in the worst case. To address this, we propose local discovery by partitioning (LDP): a local causal discovery method that is tailored for downstream inference tasks without requiring parametric and pretreatment assumptions. LDP is a constraint-based procedure that returns a VAS for an exposure-outcome pair under latent confounding, given sufficient conditions. The total number of independence tests performed is worst-case quadratic with respect to the cardinality of the variable set. Asymptotic theoretical guarantees are numerically validated on synthetic graphs. Adjustment sets from LDP yield less biased and more precise average treatment effect estimates than baseline discovery algorithms, with LDP outperforming on confounder recall, runtime, and test count for VAS discovery. Notably, LDP ran at least 1300x faster than baselines on a benchmark. △ Less

Submitted 1 June, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Journal ref: Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence (2024)

arXiv:2310.15699 [pdf, other]

DACOOP-A: Decentralized Adaptive Cooperative Pursuit via Attention

Authors: Zheng Zhang, Dengyu Zhang, Qingrui Zhang, Wei Pan, Tianjiang Hu

Abstract: Integrating rule-based policies into reinforcement learning promises to improve data efficiency and generalization in cooperative pursuit problems. However, most implementations do not properly distinguish the influence of neighboring robots in observation embedding or inter-robot interaction rules, leading to information loss and inefficient cooperation. This paper proposes a cooperative pursuit… ▽ More Integrating rule-based policies into reinforcement learning promises to improve data efficiency and generalization in cooperative pursuit problems. However, most implementations do not properly distinguish the influence of neighboring robots in observation embedding or inter-robot interaction rules, leading to information loss and inefficient cooperation. This paper proposes a cooperative pursuit algorithm named Decentralized Adaptive COOperative Pursuit via Attention (DACOOP-A) by empowering reinforcement learning with artificial potential field and attention mechanisms. An attention-based framework is developed to emphasize important neighbors by concurrently integrating the learned attention scores into observation embedding and inter-robot interaction rules. A KL divergence regularization is introduced to alleviate the resultant learning stability issue. Improvements in data efficiency and generalization are demonstrated through numerical simulations. Extensive quantitative analysis and ablation studies are performed to illustrate the advantages of the proposed modules. Real-world experiments are performed to justify the feasibility of deploying DACOOP-A in physical systems. △ Less

Submitted 28 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 8 Pages; This manuscript has been accepted by IEEE Robotics and Automation Letters

arXiv:2310.06606 [pdf, other]

SYNLOCO: Synthesizing Central Pattern Generator and Reinforcement Learning for Quadruped Locomotion

Authors: Xinyu Zhang, Zhiyuan Xiao, Qingrui Zhang, Wei Pan

Abstract: The Central Pattern Generator (CPG) is adept at generating rhythmic gait patterns characterized by consistent timing and adequate foot clearance. Yet, its open-loop configuration often compromises the system's control performance in response to environmental variations. On the other hand, Reinforcement Learning (RL), celebrated for its model-free properties, has gained significant traction in robo… ▽ More The Central Pattern Generator (CPG) is adept at generating rhythmic gait patterns characterized by consistent timing and adequate foot clearance. Yet, its open-loop configuration often compromises the system's control performance in response to environmental variations. On the other hand, Reinforcement Learning (RL), celebrated for its model-free properties, has gained significant traction in robotics due to its inherent adaptability and robustness. However, initiating traditional RL approaches from the ground up presents computational challenges and a heightened risk of converging to suboptimal local minima. In this paper, we propose an innovative quadruped locomotion framework, SYNLOCO, by synthesizing CPG and RL that can ingeniously integrate the strengths of both methods, enabling the development of a locomotion controller that is both stable and natural. Furthermore, we introduce a set of performance-driven reward metrics that augment the learning of locomotion control. To optimize the learning trajectory of SYNLOCO, a two-phased training strategy is presented. Our empirical evaluation, conducted on a Unitree GO1 robot under varied conditions--including distinct velocities, terrains, and payload capacities--showcases SYNLOCO's ability to produce consistent and clear-footed gaits across diverse scenarios. The developed controller exhibits resilience against substantial parameter variations, underscoring its potential for robust real-world applications. △ Less

Submitted 10 October, 2023; originally announced October 2023.

Comments: 7 Pages

arXiv:2309.14295 [pdf, other]

Unwieldy Object Delivery with Nonholonomic Mobile Base: A Stable Pushing Approach

Authors: Yujie Tang, Hai Zhu, Susan Potters, Martijn Wisse, Wei Pan

Abstract: This paper addresses the problem of pushing manipulation with nonholonomic mobile robots. Pushing is a fundamental skill that enables robots to move unwieldy objects that cannot be grasped. We propose a stable pushing method that maintains stiff contact between the robot and the object to avoid consuming repositioning actions. We prove that a line contact, rather than a single point contact, is ne… ▽ More This paper addresses the problem of pushing manipulation with nonholonomic mobile robots. Pushing is a fundamental skill that enables robots to move unwieldy objects that cannot be grasped. We propose a stable pushing method that maintains stiff contact between the robot and the object to avoid consuming repositioning actions. We prove that a line contact, rather than a single point contact, is necessary for nonholonomic robots to achieve stable pushing. We also show that the stable pushing constraint and the nonholonomic constraint of the robot can be simplified as a concise linear motion constraint. Then the pushing planning problem can be formulated as a constrained optimization problem using nonlinear model predictive control (NMPC). According to the experiments, our NMPC-based planner outperforms a reactive pushing strategy in terms of efficiency, reducing the robot's traveled distance by 23.8\% and time by 77.4\%. Furthermore, our method requires four fewer hyperparameters and decision variables than the Linear Time-Varying (LTV) MPC approach, making it easier to implement. Real-world experiments are carried out to validate the proposed method with two differential-drive robots, Husky and Boxer, under different friction conditions. △ Less

Submitted 25 September, 2023; originally announced September 2023.

Comments: The short version of the paper is accepted by RAL

arXiv:2309.11443 [pdf, other]

Signature Activation: A Sparse Signal View for Holistic Saliency

Authors: Jose Roberto Tello Ayala, Akl C. Fahed, Weiwei Pan, Eugene V. Pomerantsev, Patrick T. Ellinor, Anthony Philippakis, Finale Doshi-Velez

Abstract: The adoption of machine learning in healthcare calls for model transparency and explainability. In this work, we introduce Signature Activation, a saliency method that generates holistic and class-agnostic explanations for Convolutional Neural Network (CNN) outputs. Our method exploits the fact that certain kinds of medical images, such as angiograms, have clear foreground and background objects.… ▽ More The adoption of machine learning in healthcare calls for model transparency and explainability. In this work, we introduce Signature Activation, a saliency method that generates holistic and class-agnostic explanations for Convolutional Neural Network (CNN) outputs. Our method exploits the fact that certain kinds of medical images, such as angiograms, have clear foreground and background objects. We give theoretical explanation to justify our methods. We show the potential use of our method in clinical settings through evaluating its efficacy for aiding the detection of lesions in coronary angiograms. △ Less

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.09550 [pdf, other]

Adaptive Reorganization of Neural Pathways for Continual Learning with Spiking Neural Networks

Authors: Bing Han, Feifei Zhao, Wenxuan Pan, Zhaoya Zhao, Xianqi Li, Qingqun Kong, Yi Zeng

Abstract: The human brain can self-organize rich and diverse sparse neural pathways to incrementally master hundreds of cognitive tasks. However, most existing continual learning algorithms for deep artificial and spiking neural networks are unable to adequately auto-regulate the limited resources in the network, which leads to performance drop along with energy consumption rise as the increase of tasks. In… ▽ More The human brain can self-organize rich and diverse sparse neural pathways to incrementally master hundreds of cognitive tasks. However, most existing continual learning algorithms for deep artificial and spiking neural networks are unable to adequately auto-regulate the limited resources in the network, which leads to performance drop along with energy consumption rise as the increase of tasks. In this paper, we propose a brain-inspired continual learning algorithm with adaptive reorganization of neural pathways, which employs Self-Organizing Regulation networks to reorganize the single and limited Spiking Neural Network (SOR-SNN) into rich sparse neural pathways to efficiently cope with incremental tasks. The proposed model demonstrates consistent superiority in performance, energy consumption, and memory capacity on diverse continual learning tasks ranging from child-like simple to complex tasks, as well as on generalized CIFAR100 and ImageNet datasets. In particular, the SOR-SNN model excels at learning more complex tasks as well as more tasks, and is able to integrate the past learned knowledge with the information from the current task, showing the backward transfer ability to facilitate the old tasks. Meanwhile, the proposed model exhibits self-repairing ability to irreversible damage and for pruned networks, could automatically allocate new pathway from the retained network to recover memory for forgotten knowledge. △ Less

Submitted 8 October, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.05263 [pdf, other]

Brain-inspired Evolutionary Architectures for Spiking Neural Networks

Authors: Wenxuan Pan, Feifei Zhao, Zhuoya Zhao, Yi Zeng

Abstract: The complex and unique neural network topology of the human brain formed through natural evolution enables it to perform multiple cognitive functions simultaneously. Automated evolutionary mechanisms of biological network structure inspire us to explore efficient architectural optimization for Spiking Neural Networks (SNNs). Instead of manually designed fixed architectures or hierarchical Network… ▽ More The complex and unique neural network topology of the human brain formed through natural evolution enables it to perform multiple cognitive functions simultaneously. Automated evolutionary mechanisms of biological network structure inspire us to explore efficient architectural optimization for Spiking Neural Networks (SNNs). Instead of manually designed fixed architectures or hierarchical Network Architecture Search (NAS), this paper evolves SNNs architecture by incorporating brain-inspired local modular structure and global cross-module connectivity. Locally, the brain region-inspired module consists of multiple neural motifs with excitatory and inhibitory connections; Globally, we evolve free connections among modules, including long-term cross-module feedforward and feedback connections. We further introduce an efficient multi-objective evolutionary algorithm based on a few-shot performance predictor, endowing SNNs with high performance, efficiency and low energy consumption. Extensive experiments on static datasets (CIFAR10, CIFAR100) and neuromorphic datasets (CIFAR10-DVS, DVS128-Gesture) demonstrate that our proposed model boosts energy efficiency, archiving consistent and remarkable performance. This work explores brain-inspired neural architectures suitable for SNNs and also provides preliminary insights into the evolutionary mechanisms of biological neural networks in the human brain. △ Less

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2309.04156 [pdf, other]

Cross-Utterance Conditioned VAE for Speech Generation

Authors: Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun

Abstract: Speech synthesis systems powered by neural networks hold promise for multimedia production, but frequently face issues with producing expressive speech and seamless editing. In response, we present the Cross-Utterance Conditioned Variational Autoencoder speech synthesis (CUC-VAE S2) framework to enhance prosody and ensure natural speech generation. This framework leverages the powerful representat… ▽ More Speech synthesis systems powered by neural networks hold promise for multimedia production, but frequently face issues with producing expressive speech and seamless editing. In response, we present the Cross-Utterance Conditioned Variational Autoencoder speech synthesis (CUC-VAE S2) framework to enhance prosody and ensure natural speech generation. This framework leverages the powerful representational capabilities of pre-trained language models and the re-expression abilities of variational autoencoders (VAEs). The core component of the CUC-VAE S2 framework is the cross-utterance CVAE, which extracts acoustic, speaker, and textual features from surrounding sentences to generate context-sensitive prosodic features, more accurately emulating human prosody generation. We further propose two practical algorithms tailored for distinct speech synthesis applications: CUC-VAE TTS for text-to-speech and CUC-VAE SE for speech editing. The CUC-VAE TTS is a direct application of the framework, designed to generate audio with contextual prosody derived from surrounding texts. On the other hand, the CUC-VAE SE algorithm leverages real mel spectrogram sampling conditioned on contextual information, producing audio that closely mirrors real sound and thereby facilitating flexible speech editing based on text such as deletion, insertion, and replacement. Experimental results on the LibriTTS datasets demonstrate that our proposed models significantly enhance speech synthesis and editing, producing more natural and expressive speech. △ Less

Submitted 8 September, 2023; originally announced September 2023.

Comments: 13 pages;

arXiv:2309.00827 [pdf, other]

Few shot font generation via transferring similarity guided global style and quantization local style

Authors: Wei Pan, Anna Zhu, Xinyu Zhou, Brian Kenji Iwana, Shilin Li

Abstract: Automatic few-shot font generation (AFFG), aiming at generating new fonts with only a few glyph references, reduces the labor cost of manually designing fonts. However, the traditional AFFG paradigm of style-content disentanglement cannot capture the diverse local details of different fonts. So, many component-based approaches are proposed to tackle this problem. The issue with component-based app… ▽ More Automatic few-shot font generation (AFFG), aiming at generating new fonts with only a few glyph references, reduces the labor cost of manually designing fonts. However, the traditional AFFG paradigm of style-content disentanglement cannot capture the diverse local details of different fonts. So, many component-based approaches are proposed to tackle this problem. The issue with component-based approaches is that they usually require special pre-defined glyph components, e.g., strokes and radicals, which is infeasible for AFFG of different languages. In this paper, we present a novel font generation approach by aggregating styles from character similarity-guided global features and stylized component-level representations. We calculate the similarity scores of the target character and the referenced samples by measuring the distance along the corresponding channels from the content features, and assigning them as the weights for aggregating the global style features. To better capture the local styles, a cross-attention-based style transfer module is adopted to transfer the styles of reference glyphs to the components, where the components are self-learned discrete latent codes through vector quantization without manual definition. With these designs, our AFFG method could obtain a complete set of component-level style representations, and also control the global glyph characteristics. The experimental results reflect the effectiveness and generalization of the proposed method on different linguistic scripts, and also show its superiority when compared with other state-of-the-art methods. The source code can be found at https://github.com/awei669/VQ-Font. △ Less

Submitted 14 September, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

Comments: Accepted by ICCV 2023

arXiv:2309.00254 [pdf, other]

Why do universal adversarial attacks work on large language models?: Geometry might be the answer

Authors: Varshini Subhash, Anna Bialas, Weiwei Pan, Finale Doshi-Velez

Abstract: Transformer based large language models with emergent capabilities are becoming increasingly ubiquitous in society. However, the task of understanding and interpreting their internal workings, in the context of adversarial attacks, remains largely unsolved. Gradient-based universal adversarial attacks have been shown to be highly effective on large language models and potentially dangerous due to… ▽ More Transformer based large language models with emergent capabilities are becoming increasingly ubiquitous in society. However, the task of understanding and interpreting their internal workings, in the context of adversarial attacks, remains largely unsolved. Gradient-based universal adversarial attacks have been shown to be highly effective on large language models and potentially dangerous due to their input-agnostic nature. This work presents a novel geometric perspective explaining universal adversarial attacks on large language models. By attacking the 117M parameter GPT-2 model, we find evidence indicating that universal adversarial triggers could be embedding vectors which merely approximate the semantic information in their adversarial training region. This hypothesis is supported by white-box model analysis comprising dimensionality reduction and similarity measurement of hidden representations. We believe this new geometric perspective on the underlying mechanism driving universal attacks could help us gain deeper insight into the internal workings and failure modes of LLMs, thus enabling their mitigation. △ Less

Submitted 1 September, 2023; originally announced September 2023.

Comments: 2nd AdvML Frontiers Workshop at 40th International Conference on Machine Learning, Honolulu, Hawaii, USA, 2023

arXiv:2308.15701 [pdf, other]

A Survey on Multi-Behavior Sequential Recommendation

Authors: Xiaoqing Chen, Zhitao Li, Weike Pan, Zhong Ming

Abstract: Recommender systems is set up to address the issue of information overload in traditional information retrieval systems, which is focused on recommending information that is of most interest to users from massive information. Generally, there is a sequential nature and heterogeneity to the behavior of a person interacting with a system, leading to the proposal of multi-behavior sequential recommen… ▽ More Recommender systems is set up to address the issue of information overload in traditional information retrieval systems, which is focused on recommending information that is of most interest to users from massive information. Generally, there is a sequential nature and heterogeneity to the behavior of a person interacting with a system, leading to the proposal of multi-behavior sequential recommendation (MBSR). MBSR is a relatively new and worthy direction for in-depth research, which can achieve state-of-the-art recommendation through suitable modeling, and some related works have been proposed. This survey aims to shed light on the MBSR problem. Firstly, we introduce MBSR in detail, including its problem definition, application scenarios and challenges faced. Secondly, we detail the classification of MBSR, including neighborhood-based methods, matrix factorization-based methods and deep learning-based methods, where we further classify the deep learning-based methods into different learning architectures based on RNN, GNN, Transformer, and generic architectures as well as architectures that integrate hybrid techniques. In each method, we present related works based on the data perspective and the modeling perspective, as well as analyze the strengths, weaknesses and features of these works. Finally, we discuss some promising future research directions to address the challenges and improve the current status of MBSR. △ Less

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.04749 [pdf, other]

Enhancing Efficient Continual Learning with Dynamic Structure Development of Spiking Neural Networks

Authors: Bing Han, Feifei Zhao, Yi Zeng, Wenxuan Pan, Guobin Shen

Abstract: Children possess the ability to learn multiple cognitive tasks sequentially, which is a major challenge toward the long-term goal of artificial general intelligence. Existing continual learning frameworks are usually applicable to Deep Neural Networks (DNNs) and lack the exploration on more brain-inspired, energy-efficient Spiking Neural Networks (SNNs). Drawing on continual learning mechanisms du… ▽ More Children possess the ability to learn multiple cognitive tasks sequentially, which is a major challenge toward the long-term goal of artificial general intelligence. Existing continual learning frameworks are usually applicable to Deep Neural Networks (DNNs) and lack the exploration on more brain-inspired, energy-efficient Spiking Neural Networks (SNNs). Drawing on continual learning mechanisms during child growth and development, we propose Dynamic Structure Development of Spiking Neural Networks (DSD-SNN) for efficient and adaptive continual learning. When learning a sequence of tasks, the DSD-SNN dynamically assigns and grows new neurons to new tasks and prunes redundant neurons, thereby increasing memory capacity and reducing computational overhead. In addition, the overlapping shared structure helps to quickly leverage all acquired knowledge to new tasks, empowering a single network capable of supporting multiple incremental tasks (without the separate sub-network mask for each task). We validate the effectiveness of the proposed model on multiple class incremental learning and task incremental learning benchmarks. Extensive experiments demonstrated that our model could significantly improve performance, learning speed and memory capacity, and reduce computational overhead. Besides, our DSD-SNN model achieves comparable performance with the DNNs-based methods, and significantly outperforms the state-of-the-art (SOTA) performance for existing SNNs-based continual learning methods. △ Less

Submitted 9 August, 2023; originally announced August 2023.

Journal ref: IJCAI2023 Camera ready

arXiv:2308.04719 [pdf, other]

JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games

Authors: Yang Li, Kun Xiong, Yingping Zhang, Jiangcheng Zhu, Stephen Mcaleer, Wei Pan, Jun Wang, Zonghong Dai, Yaodong Yang

Abstract: This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi. By analyzing over 10,000 records of human Xiangqi play, we highlight the existence of both transitive and non-transitive elements within the game's strategic structure. To address non… ▽ More This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi. By analyzing over 10,000 records of human Xiangqi play, we highlight the existence of both transitive and non-transitive elements within the game's strategic structure. To address non-transitivity, we introduce the JiangJun algorithm, an innovative combination of Monte-Carlo Tree Search (MCTS) and Policy Space Response Oracles (PSRO) designed to approximate a Nash equilibrium. We evaluate the algorithm empirically using a WeChat mini program and achieve a Master level with a 99.41\% win rate against human players. The algorithm's effectiveness in overcoming non-transitivity is confirmed by a plethora of metrics, such as relative population performance and visualization results. Our project site is available at \url{https://sites.google.com/view/jiangjun-site/}. △ Less

Submitted 9 August, 2023; originally announced August 2023.

Comments: 28 pages, accepted by Transactions on Machine Learning Research (TMLR)

arXiv:2308.01420 [pdf, other]

SAP-sLDA: An Interpretable Interface for Exploring Unstructured Text

Authors: Charumathi Badrinath, Weiwei Pan, Finale Doshi-Velez

Abstract: A common way to explore text corpora is through low-dimensional projections of the documents, where one hopes that thematically similar documents will be clustered together in the projected space. However, popular algorithms for dimensionality reduction of text corpora, like Latent Dirichlet Allocation (LDA), often produce projections that do not capture human notions of document similarity. We pr… ▽ More A common way to explore text corpora is through low-dimensional projections of the documents, where one hopes that thematically similar documents will be clustered together in the projected space. However, popular algorithms for dimensionality reduction of text corpora, like Latent Dirichlet Allocation (LDA), often produce projections that do not capture human notions of document similarity. We propose a semi-supervised human-in-the-loop LDA-based method for learning topics that preserve semantically meaningful relationships between documents in low-dimensional projections. On synthetic corpora, our method yields more interpretable projections than baseline methods with only a fraction of labels provided. On a real corpus, we obtain qualitatively similar results. △ Less

Submitted 28 July, 2023; originally announced August 2023.

arXiv:2308.01197 [pdf, other]

GNN4FR: A Lossless GNN-based Federated Recommendation Framework

Authors: Guowei Wu, Weike Pan, Zhong Ming

Abstract: Graph neural networks (GNNs) have gained wide popularity in recommender systems due to their capability to capture higher-order structure information among the nodes of users and items. However, these methods need to collect personal interaction data between a user and the corresponding items and then model them in a central server, which would break the privacy laws such as GDPR. So far, no exist… ▽ More Graph neural networks (GNNs) have gained wide popularity in recommender systems due to their capability to capture higher-order structure information among the nodes of users and items. However, these methods need to collect personal interaction data between a user and the corresponding items and then model them in a central server, which would break the privacy laws such as GDPR. So far, no existing work can construct a global graph without leaking each user's private interaction data (i.e., his or her subgraph). In this paper, we are the first to design a novel lossless federated recommendation framework based on GNN, which achieves full-graph training with complete high-order structure information, enabling the training process to be equivalent to the corresponding un-federated counterpart. In addition, we use LightGCN to instantiate an example of our framework and show its equivalence. △ Less

Submitted 25 July, 2023; originally announced August 2023.

Showing 1–50 of 174 results for author: Pan, W