Search | arXiv e-print repository

arXiv:2407.19430 [pdf, other]

Progressive Domain Adaptation for Thermal Infrared Object Tracking

Authors: Qiao Li, Kanlun Tan, Qiao Liu, Di Yuan, Xin Li, Yunpeng Liu

Abstract: Due to the lack of large-scale labeled Thermal InfraRed (TIR) training datasets, most existing TIR trackers are trained directly on RGB datasets. However, tracking methods trained on RGB datasets suffer a significant drop-off in TIR data due to the domain shift issue. To this end, in this work, we propose a Progressive Domain Adaptation framework for TIR Tracking (PDAT), which transfers useful kno… ▽ More Due to the lack of large-scale labeled Thermal InfraRed (TIR) training datasets, most existing TIR trackers are trained directly on RGB datasets. However, tracking methods trained on RGB datasets suffer a significant drop-off in TIR data due to the domain shift issue. To this end, in this work, we propose a Progressive Domain Adaptation framework for TIR Tracking (PDAT), which transfers useful knowledge learned from RGB tracking to TIR tracking. The framework makes full use of large-scale labeled RGB datasets without requiring time-consuming and labor-intensive labeling of large-scale TIR data. Specifically, we first propose an adversarial-based global domain adaptation module to reduce domain gap on the feature level coarsely. Second, we design a clustering-based subdomain adaptation method to further align the feature distributions of the RGB and TIR datasets finely. These two domain adaptation modules gradually eliminate the discrepancy between the two domains, and thus learn domain-invariant fine-grained features through progressive training. Additionally, we collect a largescale TIR dataset with over 1.48 million unlabeled TIR images for training the proposed domain adaptation framework. Experimental results on five TIR tracking benchmarks show that the proposed method gains a nearly 6% success rate, demonstrating its effectiveness. △ Less

Submitted 28 July, 2024; originally announced July 2024.

Comments: 10 pages, 8 figures

arXiv:2407.06857 [pdf, other]

Enhanced Battery Degradation-Aware Scheduling for Distribution Network with Electric Vehicle Load

Authors: Vijay Babu Pamshetti, Wei Zhang, Andy Man-Fai Ng, Qingyu Yan, Kuan Tak Tan

Abstract: Batteries play a key role in today's power grid. In this paper, we investigate the impact of battery degradation on the distribution network. We formulate a multi-objective framework for optimizing battery scheduling with the goals of minimizing monetary costs and improving network performance. Our framework incorporates energy purchase and battery degradation into the costs and measures the netwo… ▽ More Batteries play a key role in today's power grid. In this paper, we investigate the impact of battery degradation on the distribution network. We formulate a multi-objective framework for optimizing battery scheduling with the goals of minimizing monetary costs and improving network performance. Our framework incorporates energy purchase and battery degradation into the costs and measures the network performance through energy losses and voltage deviation. We propose Bach for battery degradation-aware cheduling based on e-constraint and fuzzy logic methods. Bach is implemented for the IEEE 33-bus network for an experimental study. The results show the effectiveness of Bach in optimizing costs and performance simultaneously with battery degradation awareness and demonstrate the flexibility of further customization. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: 3 figures

arXiv:2407.03085 [pdf, other]

Accelerated Inference for Partially Observed Markov Processes using Automatic Differentiation

Authors: Kevin Tan, Giles Hooker, Edward L. Ionides

Abstract: Automatic differentiation (AD) has driven recent advances in machine learning, including deep neural networks and Hamiltonian Markov Chain Monte Carlo methods. Partially observed nonlinear stochastic dynamical systems have proved resistant to AD techniques because widely used particle filter algorithms yield an estimated likelihood function that is discontinuous as a function of the model paramete… ▽ More Automatic differentiation (AD) has driven recent advances in machine learning, including deep neural networks and Hamiltonian Markov Chain Monte Carlo methods. Partially observed nonlinear stochastic dynamical systems have proved resistant to AD techniques because widely used particle filter algorithms yield an estimated likelihood function that is discontinuous as a function of the model parameters. We show how to embed two existing AD particle filter methods in a theoretical framework that provides an extension to a new class of algorithms. This new class permits a bias/variance tradeoff and hence a mean squared error substantially lower than the existing algorithms. We develop likelihood maximization algorithms suited to the Monte Carlo properties of the AD gradient estimate. Our algorithms require only a differentiable simulator for the latent dynamic system; by contrast, most previous approaches to AD likelihood maximization for particle filters require access to the system's transition probabilities. Numerical results indicate that a hybrid algorithm that uses AD to refine a coarse solution from an iterated filtering algorithm show substantial improvement on current state-of-the-art methods for a challenging scientific benchmark problem. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02639 [pdf, other]

Holistically-Nested Structure-Aware Graph Neural Network for Road Extraction

Authors: Tinghuai Wang, Guangming Wang, Kuan Eeik Tan

Abstract: Convolutional neural networks (CNN) have made significant advances in detecting roads from satellite images. However, existing CNN approaches are generally repurposed semantic segmentation architectures and suffer from the poor delineation of long and curved regions. Lack of overall road topology and structure information further deteriorates their performance on challenging remote sensing images.… ▽ More Convolutional neural networks (CNN) have made significant advances in detecting roads from satellite images. However, existing CNN approaches are generally repurposed semantic segmentation architectures and suffer from the poor delineation of long and curved regions. Lack of overall road topology and structure information further deteriorates their performance on challenging remote sensing images. This paper presents a novel multi-task graph neural network (GNN) which simultaneously detects both road regions and road borders; the inter-play between these two tasks unlocks superior performance from two perspectives: (1) the hierarchically detected road borders enable the network to capture and encode holistic road structure to enhance road connectivity (2) identifying the intrinsic correlation of semantic landcover regions mitigates the difficulty in recognizing roads cluttered by regions with similar appearance. Experiments on challenging dataset demonstrate that the proposed architecture can improve the road border delineation and road extraction accuracy compared with the existing methods. △ Less

Submitted 8 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.14359 [pdf, other]

Learning to Transfer for Evolutionary Multitasking

Authors: Sheng-Hao Wu, Yuxiao Huang, Xingyu Wu, Liang Feng, Zhi-Hui Zhan, Kay Chen Tan

Abstract: Evolutionary multitasking (EMT) is an emerging approach for solving multitask optimization problems (MTOPs) and has garnered considerable research interest. The implicit EMT is a significant research branch that utilizes evolution operators to enable knowledge transfer (KT) between tasks. However, current approaches in implicit EMT face challenges in adaptability, due to the use of a limited numbe… ▽ More Evolutionary multitasking (EMT) is an emerging approach for solving multitask optimization problems (MTOPs) and has garnered considerable research interest. The implicit EMT is a significant research branch that utilizes evolution operators to enable knowledge transfer (KT) between tasks. However, current approaches in implicit EMT face challenges in adaptability, due to the use of a limited number of evolution operators and insufficient utilization of evolutionary states for performing KT. This results in suboptimal exploitation of implicit KT's potential to tackle a variety of MTOPs. To overcome these limitations, we propose a novel Learning to Transfer (L2T) framework to automatically discover efficient KT policies for the MTOPs at hand. Our framework conceptualizes the KT process as a learning agent's sequence of strategic decisions within the EMT process. We propose an action formulation for deciding when and how to transfer, a state representation with informative features of evolution states, a reward formulation concerning convergence and transfer efficiency gain, and the environment for the agent to interact with MTOPs. We employ an actor-critic network structure for the agent and learn it via proximal policy optimization. This learned agent can be integrated with various evolutionary algorithms, enhancing their ability to address a range of new MTOPs. Comprehensive empirical studies on both synthetic and real-world MTOPs, encompassing diverse inter-task relationships, function classes, and task distributions are conducted to validate the proposed L2T framework. The results show a marked improvement in the adaptability and performance of implicit EMT when solving a wide spectrum of unseen MTOPs. △ Less

Submitted 22 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.12313 [pdf]

A framework for developing a knowledge management platform

Authors: Marie Lisandra Zepeda Mendoza, Sonali Agarwal, James A. Blackshaw, Vanesa Bol, Audrey Fazzi, Filippo Fiorini, Amy Louise Foreman, Nancy George, Brett R. Johnson, Brian Martin, Dave McComb, Euphemia Mutasa-Gottgens, Helen Parkinson, Martin Romacker, Rolf Russell, Valérien Ségard, Shawn Zheng Kai Tan, Wei Kheng Teh, F. P. Winstanley, Benedict Wong, Adrian M. Smith

Abstract: Knowledge management (KM) involves collecting, organizing, storing, and disseminating information to improve decision-making, innovation, and performance. Implementing KM at scale has become essential for organizations to effectively leverage vast accessible data. This paper is a compilation of concepts that emerged from KM workshops hosted by EMBL-EBI, attended by SMEs and industry. We provide gu… ▽ More Knowledge management (KM) involves collecting, organizing, storing, and disseminating information to improve decision-making, innovation, and performance. Implementing KM at scale has become essential for organizations to effectively leverage vast accessible data. This paper is a compilation of concepts that emerged from KM workshops hosted by EMBL-EBI, attended by SMEs and industry. We provide guidance on envisioning, executing, evaluating, and evolving knowledge management platforms. We emphasize essential considerations such as setting knowledge domain boundaries and measuring success, as well as the importance of making knowledge accessible for downstream applications and non-computational users and highlights necessary personal and organizational skills for success. We stress the importance of collaboration and the need for convergence on shared principles and commitment to provide or seek resources to advance KM. The community is invited to join the journey of KM and contribute to the advancement of the field by applying and improving on the guidelines described. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 18 pages, 1 figure

arXiv:2406.11809 [pdf, other]

Physics-Constrained Learning for PDE Systems with Uncertainty Quantified Port-Hamiltonian Models

Authors: Kaiyuan Tan, Peilun Li, Thomas Beckers

Abstract: Modeling the dynamics of flexible objects has become an emerging topic in the community as these objects become more present in many applications, e.g., soft robotics. Due to the properties of flexible materials, the movements of soft objects are often highly nonlinear and, thus, complex to predict. Data-driven approaches seem promising for modeling those complex dynamics but often neglect basic p… ▽ More Modeling the dynamics of flexible objects has become an emerging topic in the community as these objects become more present in many applications, e.g., soft robotics. Due to the properties of flexible materials, the movements of soft objects are often highly nonlinear and, thus, complex to predict. Data-driven approaches seem promising for modeling those complex dynamics but often neglect basic physical principles, which consequently makes them untrustworthy and limits generalization. To address this problem, we propose a physics-constrained learning method that combines powerful learning tools and reliable physical models. Our method leverages the data collected from observations by sending them into a Gaussian process that is physically constrained by a distributed Port-Hamiltonian model. Based on the Bayesian nature of the Gaussian process, we not only learn the dynamics of the system, but also enable uncertainty quantification. Furthermore, the proposed approach preserves the compositional nature of Port-Hamiltonian systems. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11619 [pdf, other]

AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling

Authors: Vahid Ahmadi Kalkhorani, Cheng Yu, Anurag Kumar, Ke Tan, Buye Xu, DeLiang Wang

Abstract: Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet is extended from the CrossNet architecture, which is a recently proposed network that performs complex spectral mapping for speech separation by lever… ▽ More Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet is extended from the CrossNet architecture, which is a recently proposed network that performs complex spectral mapping for speech separation by leveraging global attention and positional encoding. To effectively utilize visual cues, the proposed system incorporates pre-extracted visual embeddings and employs a visual encoder comprising temporal convolutional layers. Audio and visual features are fused in an early fusion layer before feeding to AV-CrossNet blocks. We evaluate AV-CrossNet on multiple datasets, including LRS, VoxCeleb, and COG-MHEAR challenge. Evaluation results demonstrate that AV-CrossNet advances the state-of-the-art performance in all audiovisual tasks, even on untrained and mismatched datasets. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 10 pages, 4 Figures, and 4 Tables

arXiv:2406.11414 [pdf, other]

Formally Certified Approximate Model Counting

Authors: Yong Kiam Tan, Jiong Yang, Mate Soos, Magnus O. Myreen, Kuldeep S. Meel

Abstract: Approximate model counting is the task of approximating the number of solutions to an input Boolean formula. The state-of-the-art approximate model counter for formulas in conjunctive normal form (CNF), ApproxMC, provides a scalable means of obtaining model counts with probably approximately correct (PAC)-style guarantees. Nevertheless, the validity of ApproxMC's approximation relies on a careful… ▽ More Approximate model counting is the task of approximating the number of solutions to an input Boolean formula. The state-of-the-art approximate model counter for formulas in conjunctive normal form (CNF), ApproxMC, provides a scalable means of obtaining model counts with probably approximately correct (PAC)-style guarantees. Nevertheless, the validity of ApproxMC's approximation relies on a careful theoretical analysis of its randomized algorithm and the correctness of its highly optimized implementation, especially the latter's stateful interactions with an incremental CNF satisfiability solver capable of natively handling parity (XOR) constraints. We present the first certification framework for approximate model counting with formally verified guarantees on the quality of its output approximation. Our approach combines: (i) a static, once-off, formal proof of the algorithm's PAC guarantee in the Isabelle/HOL proof assistant; and (ii) dynamic, per-run, verification of ApproxMC's calls to an external CNF-XOR solver using proof certificates. We detail our general approach to establish a rigorous connection between these two parts of the verification, including our blueprint for turning the formalized, randomized algorithm into a verified proof checker, and our design of proof certificates for both ApproxMC and its internal CNF-XOR solving steps. Experimentally, we show that certificate generation adds little overhead to an approximate counter implementation, and that our certificate checker is able to fully certify $84.7\%$ of instances with generated certificates when given the same time and memory limits as the counter. △ Less

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: The extended version, including the appendix, of the paper to be published in CAV24. The associated artifact is available at https://doi.org/10.5281/zenodo.10948839

arXiv:2406.08987 [pdf, other]

Autonomous Multi-Objective Optimization Using Large Language Model

Authors: Yuxiao Huang, Shenghao Wu, Wenjie Zhang, Jibin Wu, Liang Feng, Kay Chen Tan

Abstract: Multi-objective optimization problems (MOPs) are ubiquitous in real-world applications, presenting a complex challenge of balancing multiple conflicting objectives. Traditional evolutionary algorithms (EAs), though effective, often rely on domain-specific expertise and iterative fine-tuning, hindering adaptability to unseen MOPs. In recent years, the advent of Large Language Models (LLMs) has revo… ▽ More Multi-objective optimization problems (MOPs) are ubiquitous in real-world applications, presenting a complex challenge of balancing multiple conflicting objectives. Traditional evolutionary algorithms (EAs), though effective, often rely on domain-specific expertise and iterative fine-tuning, hindering adaptability to unseen MOPs. In recent years, the advent of Large Language Models (LLMs) has revolutionized software engineering by enabling the autonomous generation and refinement of programs. Leveraging this breakthrough, we propose a new LLM-based framework that autonomously designs EA operators for solving MOPs. The proposed framework includes a robust testing module to refine the generated EA operator through error-driven dialogue with LLMs, a dynamic selection strategy along with informative prompting-based crossover and mutation to fit textual optimization pipeline. Our approach facilitates the design of EA operators without the extensive demands for expert intervention, thereby speeding up the innovation of EA operators. Empirical studies across various MOP categories validate the robustness and superior performance of our proposed framework. △ Less

Submitted 26 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages, 11 figures, 6 tables

arXiv:2406.07390 [pdf, other]

DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling

Authors: Sixian Wang, Jincheng Dai, Kailin Tan, Xiaoqi Qin, Kai Niu, Ping Zhang

Abstract: End-to-end visual communication systems typically optimize a trade-off between channel bandwidth costs and signal-level distortion metrics. However, under challenging physical conditions, this traditional discriminative communication paradigm often results in unrealistic reconstructions with perceptible blurring and aliasing artifacts, despite the inclusion of perceptual or adversarial losses for… ▽ More End-to-end visual communication systems typically optimize a trade-off between channel bandwidth costs and signal-level distortion metrics. However, under challenging physical conditions, this traditional discriminative communication paradigm often results in unrealistic reconstructions with perceptible blurring and aliasing artifacts, despite the inclusion of perceptual or adversarial losses for optimizing. This issue primarily stems from the receiver's limited knowledge about the underlying data manifold and the use of deterministic decoding mechanisms. To address these limitations, this paper introduces DiffCom, a novel end-to-end generative communication paradigm that utilizes off-the-shelf generative priors and probabilistic diffusion models for decoding, thereby improving perceptual quality without heavily relying on bandwidth costs and received signal quality. Unlike traditional systems that rely on deterministic decoders optimized solely for distortion metrics, our DiffCom leverages raw channel-received signal as a fine-grained condition to guide stochastic posterior sampling. Our approach ensures that reconstructions remain on the manifold of real data with a novel confirming constraint, enhancing the robustness and reliability of the generated outcomes. Furthermore, DiffCom incorporates a blind posterior sampling technique to address scenarios with unknown forward transmission characteristics. Extensive experimental validations demonstrate that DiffCom not only produces realistic reconstructions with details faithful to the original data but also achieves superior robustness against diverse wireless transmission degradations. Collectively, these advancements establish DiffCom as a new benchmark in designing generative communication systems that offer enhanced robustness and generalization superiorities. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07069 [pdf, other]

Optimal Gait Control for a Tendon-driven Soft Quadruped Robot by Model-based Reinforcement Learning

Authors: Xuezhi Niu, Kaige Tan, Lei Feng

Abstract: This study presents an innovative approach to optimal gait control for a soft quadruped robot enabled by four Compressible Tendon-driven Soft Actuators (CTSAs). Improving our previous studies of using model-free reinforcement learning for gait control, we employ model-based reinforcement learning (MBRL) to further enhance the performance of the gait controller. Compared to rigid robots, the propos… ▽ More This study presents an innovative approach to optimal gait control for a soft quadruped robot enabled by four Compressible Tendon-driven Soft Actuators (CTSAs). Improving our previous studies of using model-free reinforcement learning for gait control, we employ model-based reinforcement learning (MBRL) to further enhance the performance of the gait controller. Compared to rigid robots, the proposed soft quadruped robot has better safety, less weight, and a simpler mechanism for fabrication and control. However, the primary challenge lies in developing sophisticated control algorithms to attain optimal gait control for fast and stable locomotion. The research employs a multi-stage methodology, including state space restriction, data-driven model training, and reinforcement learning algorithm development. Compared to benchmark methods, the proposed MBRL algorithm, combined with post-training, significantly improves the efficiency and performance of gait control policies. The developed policy is both robust and adaptable to the robot's deformable morphology. The study concludes by highlighting the practical applicability of these findings in real-world scenarios. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07065 [pdf, other]

Optimal Gait Design for a Soft Quadruped Robot via Multi-fidelity Bayesian Optimization

Authors: Kaige Tan, Xuezhi Niu, Qinglei Ji, Lei Feng, Martin Törngren

Abstract: This study focuses on the locomotion capability improvement in a tendon-driven soft quadruped robot through an online adaptive learning approach. Leveraging the inverse kinematics model of the soft quadruped robot, we employ a central pattern generator to design a parametric gait pattern, and use Bayesian optimization (BO) to find the optimal parameters. Further, to address the challenges of model… ▽ More This study focuses on the locomotion capability improvement in a tendon-driven soft quadruped robot through an online adaptive learning approach. Leveraging the inverse kinematics model of the soft quadruped robot, we employ a central pattern generator to design a parametric gait pattern, and use Bayesian optimization (BO) to find the optimal parameters. Further, to address the challenges of modeling discrepancies, we implement a multi-fidelity BO approach, combining data from both simulation and physical experiments throughout training and optimization. This strategy enables the adaptive refinement of the gait pattern and ensures a smooth transition from simulation to real-world deployment for the controller. Moreover, we integrate a computational task off-loading architecture by edge computing, which reduces the onboard computational and memory overhead, to improve real-time control performance and facilitate an effective online learning process. The proposed approach successfully achieves optimal walking gait design for physical deployment with high efficiency, effectively addressing challenges related to the reality gap in soft robotics. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.00812 [pdf, other]

Covariance-Adaptive Sequential Black-box Optimization for Diffusion Targeted Generation

Authors: Yueming Lyu, Kim Yong Tan, Yew Soon Ong, Ivor W. Tsang

Abstract: Diffusion models have demonstrated great potential in generating high-quality content for images, natural language, protein domains, etc. However, how to perform user-preferred targeted generation via diffusion models with only black-box target scores of users remains challenging. To address this issue, we first formulate the fine-tuning of the targeted reserve-time stochastic differential equatio… ▽ More Diffusion models have demonstrated great potential in generating high-quality content for images, natural language, protein domains, etc. However, how to perform user-preferred targeted generation via diffusion models with only black-box target scores of users remains challenging. To address this issue, we first formulate the fine-tuning of the targeted reserve-time stochastic differential equation (SDE) associated with a pre-trained diffusion model as a sequential black-box optimization problem. Furthermore, we propose a novel covariance-adaptive sequential optimization algorithm to optimize cumulative black-box scores under unknown transition dynamics. Theoretically, we prove a $O(\frac{d^2}{\sqrt{T}})$ convergence rate for cumulative convex functions without smooth and strongly convex assumptions. Empirically, experiments on both numerical test problems and target-guided 3D-molecule generation tasks show the superior performance of our method in achieving better target scores. △ Less

Submitted 8 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.17324 [pdf, other]

Leveraging Offline Data in Linear Latent Bandits

Authors: Chinmaya Kausik, Kevin Tan, Ambuj Tewari

Abstract: Sequential decision-making domains such as recommender systems, healthcare and education often have unobserved heterogeneity in the population that can be modeled using latent bandits $-$ a framework where an unobserved latent state determines the model for a trajectory. While the latent bandit framework is compelling, the extent of its generality is unclear. We first address this by establishing… ▽ More Sequential decision-making domains such as recommender systems, healthcare and education often have unobserved heterogeneity in the population that can be modeled using latent bandits $-$ a framework where an unobserved latent state determines the model for a trajectory. While the latent bandit framework is compelling, the extent of its generality is unclear. We first address this by establishing a de Finetti theorem for decision processes, and show that $\textit{every}$ exchangeable and coherent stateless decision process is a latent bandit. The latent bandit framework lends itself particularly well to online learning with offline datasets, a problem of growing interest in sequential decision-making. One can leverage offline latent bandit data to learn a complex model for each latent state, so that an agent can simply learn the latent state online to act optimally. We focus on a linear model for a latent bandit with $d_A$-dimensional actions, where the latent states lie in an unknown $d_K$-dimensional subspace for $d_K \ll d_A$. We present SOLD, a novel principled method to learn this subspace from short offline trajectories with guarantees. We then provide two methods to leverage this subspace online: LOCAL-UCB and ProBALL-UCB. We demonstrate that LOCAL-UCB enjoys $\tilde O(\min(d_A\sqrt{T}, d_K\sqrt{T}(1+\sqrt{d_AT/d_KN})))$ regret guarantees, where the effective dimension is lower when the size $N$ of the offline dataset is larger. ProBALL-UCB enjoys a slightly weaker guarantee, but is more practical and computationally efficient. Finally, we establish the efficacy of our methods using experiments on both synthetic data and real-life movie recommendation data from MovieLens. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 40 pages. 14 pages for main paper, 26 pages for references + appendix

arXiv:2405.16041 [pdf, other]

Explainable Molecular Property Prediction: Aligning Chemical Concepts with Predictions via Language Models

Authors: Zhenzhong Wang, Zehui Lin, Wanyu Lin, Ming Yang, Minggang Zeng, Kay Chen Tan

Abstract: Providing explainable molecule property predictions is critical for many scientific domains, such as drug discovery and material science. Though transformer-based language models have shown great potential in accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal the molecular structure-property relationships. In this work, we develop… ▽ More Providing explainable molecule property predictions is critical for many scientific domains, such as drug discovery and material science. Though transformer-based language models have shown great potential in accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal the molecular structure-property relationships. In this work, we develop a new framework for explainable molecular property prediction based on language models, dubbed as Lamole, which can provide chemical concepts-aligned explanations. We first leverage a designated molecular representation -- the Group SELFIES -- as it can provide chemically meaningful semantics. Because attention mechanisms in Transformers can inherently capture relationships within the input, we further incorporate the attention weights and gradients together to generate explanations for capturing the functional group interactions. We then carefully craft a marginal loss to explicitly optimize the explanations to be able to align with the chemists' annotations. We bridge the manifold hypothesis with the elaborated marginal loss to prove that the loss can align the explanations with the tangent space of the data manifold, leading to concept-aligned explanations. Experimental results over six mutagenicity datasets and one hepatotoxicity dataset demonstrate Lamole can achieve comparable classification accuracy and boost the explanation accuracy by up to 14.8%, being the state-of-the-art in explainable molecular property prediction. △ Less

Submitted 31 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15252 [pdf, other]

Fast 3D Molecule Generation via Unified Geometric Optimal Transport

Authors: Haokai Hong, Wanyu Lin, Kay Chen Tan

Abstract: This paper proposes a new 3D molecule generation framework, called GOAT, for fast and effective 3D molecule generation based on the flow-matching optimal transport objective. Specifically, we formulate a geometric transport formula for measuring the cost of mapping multi-modal features (e.g., continuous atom coordinates and categorical atom types) between a base distribution and a target data dist… ▽ More This paper proposes a new 3D molecule generation framework, called GOAT, for fast and effective 3D molecule generation based on the flow-matching optimal transport objective. Specifically, we formulate a geometric transport formula for measuring the cost of mapping multi-modal features (e.g., continuous atom coordinates and categorical atom types) between a base distribution and a target data distribution. Our formula is solved within a unified, equivalent, and smooth representation space. This is achieved by transforming the multi-modal features into a continuous latent space with equivalent networks. In addition, we find that identifying optimal distributional coupling is necessary for fast and effective transport between any two distributions. We further propose a flow refinement and purification mechanism for optimal coupling identification. By doing so, GOAT can turn arbitrary distribution couplings into new deterministic couplings, leading to a unified optimal transport path for fast 3D molecule generation. The purification filters the subpar molecules to ensure the ultimate generation performance. We theoretically prove the proposed method indeed reduced the transport cost. Finally, extensive experiments show that GOAT enjoys the efficiency of solving geometric optimal transport, leading to a double speedup compared to the sub-optimal method while achieving the best generation quality regarding validity, uniqueness, and novelty. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.11349 [pdf, other]

Unlock the Power of Algorithm Features: A Generalization Analysis for Algorithm Selection

Authors: Xingyu Wu, Yan Zhong, Jibin Wu, Yuxiao Huang, Sheng-hao Wu, Kay Chen Tan

Abstract: In the algorithm selection research, the discussion surrounding algorithm features has been significantly overshadowed by the emphasis on problem features. Although a few empirical studies have yielded evidence regarding the effectiveness of algorithm features, the potential benefits of incorporating algorithm features into algorithm selection models and their suitability for different scenarios r… ▽ More In the algorithm selection research, the discussion surrounding algorithm features has been significantly overshadowed by the emphasis on problem features. Although a few empirical studies have yielded evidence regarding the effectiveness of algorithm features, the potential benefits of incorporating algorithm features into algorithm selection models and their suitability for different scenarios remain unclear. In this paper, we address this gap by proposing the first provable guarantee for algorithm selection based on algorithm features, taking a generalization perspective. We analyze the benefits and costs associated with algorithm features and investigate how the generalization error is affected by different factors. Specifically, we examine adaptive and predefined algorithm features under transductive and inductive learning paradigms, respectively, and derive upper bounds for the generalization error based on their model's Rademacher complexity. Our theoretical findings not only provide tight upper bounds, but also offer analytical insights into the impact of various factors, such as the training scale of problem instances and candidate algorithms, model parameters, feature values, and distributional differences between the training and test data. Notably, we demonstrate how models will benefit from algorithm features in complex scenarios involving many algorithms, and proves the positive correlation between generalization error bound and $χ^2$-divergence of distributions. △ Less

Submitted 3 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.05767 [pdf]

Large Language Model-Aided Evolutionary Search for Constrained Multiobjective Optimization

Authors: Zeyi Wang, Songbai Liu, Jianyong Chen, Kay Chen Tan

Abstract: Evolutionary algorithms excel in solving complex optimization problems, especially those with multiple objectives. However, their stochastic nature can sometimes hinder rapid convergence to the global optima, particularly in scenarios involving constraints. In this study, we employ a large language model (LLM) to enhance evolutionary search for solving constrained multi-objective optimization prob… ▽ More Evolutionary algorithms excel in solving complex optimization problems, especially those with multiple objectives. However, their stochastic nature can sometimes hinder rapid convergence to the global optima, particularly in scenarios involving constraints. In this study, we employ a large language model (LLM) to enhance evolutionary search for solving constrained multi-objective optimization problems. Our aim is to speed up the convergence of the evolutionary population. To achieve this, we finetune the LLM through tailored prompt engineering, integrating information concerning both objective values and constraint violations of solutions. This process enables the LLM to grasp the relationship between well-performing and poorly performing solutions based on the provided input data. Solution's quality is assessed based on their constraint violations and objective-based performance. By leveraging the refined LLM, it can be used as a search operator to generate superior-quality solutions. Experimental evaluations across various test benchmarks illustrate that LLM-aided evolutionary search can significantly accelerate the population's convergence speed and stands out competitively against cutting-edge evolutionary algorithms. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: 15 pages, 6 figures, 2024 International Conference on Intelligent Computing

arXiv:2405.05413 [pdf]

Digital Evolution: Novo Nordisk's Shift to Ontology-Based Data Management

Authors: Shawn Zheng Kai Tan, Shounak Baksi, Thomas Gade Bjerregaard, Preethi Elangovan, Thrishna Kuttikattu Gopalakrishnan, Darko Hric, Joffrey Joumaa, Beidi Li, Kashif Rabbani, Santhosh Kannan Venkatesan, Joshua Daniel Valdez, Saritha Vettikunnel Kuriakose

Abstract: Biomedical data is growing exponentially, and managing it is increasingly challenging. While Findable, Accessible, Interoperable and Reusable (FAIR) data principles provide guidance, their adoption has proven difficult, especially in larger enterprises like pharmaceutical companies. In this manuscript, we describe how we leverage an Ontology-Based Data Management (OBDM) strategy for digital transf… ▽ More Biomedical data is growing exponentially, and managing it is increasingly challenging. While Findable, Accessible, Interoperable and Reusable (FAIR) data principles provide guidance, their adoption has proven difficult, especially in larger enterprises like pharmaceutical companies. In this manuscript, we describe how we leverage an Ontology-Based Data Management (OBDM) strategy for digital transformation in Novo Nordisk Research & Early Development. Here, we include both our technical blueprint and our approach for organizational change management. We further discuss how such an OBDM ecosystem plays a pivotal role in the organizations digital aspirations for data federation and discovery fuelled by artificial intelligence. Our aim for this paper is to share the lessons learned in order to foster dialogue with parties navigating similar waters while collectively advancing the efforts in the fields of data management, semantics and data driven drug discovery. △ Less

Submitted 10 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: 14 pages, 2 figures

arXiv:2405.05343 [pdf, other]

Distributed Least Squares in Small Space via Sketching and Bias Reduction

Authors: Sachin Garg, Kevin Tan, Michał Dereziński

Abstract: Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least square regression. We show that these limitations can be circumvented in the distributed setting by designing sketching methods that minimize the bias of the estimator, rather than its e… ▽ More Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least square regression. We show that these limitations can be circumvented in the distributed setting by designing sketching methods that minimize the bias of the estimator, rather than its error. In particular, we give a sparse sketching method running in optimal space and current matrix multiplication time, which recovers a nearly-unbiased least squares estimator using two passes over the data. This leads to new communication-efficient distributed averaging algorithms for least squares and related tasks, which directly improve on several prior approaches. Our key novelty is a new bias analysis for sketched least squares, giving a sharp characterization of its dependence on the sketch sparsity. The techniques include new higher-moment restricted Bai-Silverstein inequalities, which are of independent interest to the non-asymptotic analysis of deterministic equivalents for random matrices that arise from sketching. △ Less

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.03924 [pdf, other]

NeurDB: An AI-powered Autonomous Data System

Authors: Beng Chin Ooi, Shaofeng Cai, Gang Chen, Yanyan Shen, Kian-Lee Tan, Yuncheng Wu, Xiaokui Xiao, Naili Xing, Cong Yue, Lingze Zeng, Meihui Zhang, Zhanhao Zhao

Abstract: In the wake of rapid advancements in artificial intelligence (AI), we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and DB (AIxDB) promises a new generation of data systems, which will relieve the burden on end-users across all industry sectors by featuring AI-enhanced functionalities, such as personalized and automated in-database AI-powered analytics, sel… ▽ More In the wake of rapid advancements in artificial intelligence (AI), we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and DB (AIxDB) promises a new generation of data systems, which will relieve the burden on end-users across all industry sectors by featuring AI-enhanced functionalities, such as personalized and automated in-database AI-powered analytics, self-driving capabilities for improved system performance, etc. In this paper, we explore the evolution of data systems with a focus on deepening the fusion of AI and DB. We present NeurDB, an AI-powered autonomous data system designed to fully embrace AI design in each major system component and provide in-database AI-powered analytics. We outline the conceptual and architectural overview of NeurDB, discuss its design choices and key components, and report its current development and future plan. △ Less

Submitted 4 July, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.17856 [pdf, other]

Uncertainty quantification for iterative algorithms in linear models with application to early stopping

Authors: Pierre C. Bellec, Kai Tan

Abstract: This paper investigates the iterates $\hbb^1,\dots,\hbb^T$ obtained from iterative algorithms in high-dimensional linear regression problems, in the regime where the feature dimension $p$ is comparable with the sample size $n$, i.e., $p \asymp n$. The analysis and proposed estimators are applicable to Gradient Descent (GD), proximal GD and their accelerated variants such as Fast Iterative Soft-Thr… ▽ More This paper investigates the iterates $\hbb^1,\dots,\hbb^T$ obtained from iterative algorithms in high-dimensional linear regression problems, in the regime where the feature dimension $p$ is comparable with the sample size $n$, i.e., $p \asymp n$. The analysis and proposed estimators are applicable to Gradient Descent (GD), proximal GD and their accelerated variants such as Fast Iterative Soft-Thresholding (FISTA). The paper proposes novel estimators for the generalization error of the iterate $\hbb^t$ for any fixed iteration $t$ along the trajectory. These estimators are proved to be $\sqrt n$-consistent under Gaussian designs. Applications to early-stopping are provided: when the generalization error of the iterates is a U-shape function of the iteration $t$, the estimates allow to select from the data an iteration $\hat t$ that achieves the smallest generalization error along the trajectory. Additionally, we provide a technique for developing debiasing corrections and valid confidence intervals for the components of the true coefficient vector from the iterate $\hbb^t$ at any finite iteration $t$. Extensive simulations on synthetic data illustrate the theoretical results. △ Less

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.17316 [pdf, other]

Certified MaxSAT Preprocessing

Authors: Hannes Ihalainen, Andy Oertel, Yong Kiam Tan, Jeremias Berg, Matti Järvisalo, Jakob Nordström

Abstract: Building on the progress in Boolean satisfiability (SAT) solving over the last decades, maximum satisfiability (MaxSAT) has become a viable approach for solving NP-hard optimization problems, but ensuring correctness of MaxSAT solvers has remained an important concern. For SAT, this is largely a solved problem thanks to the use of proof logging, meaning that solvers emit machine-verifiable proofs… ▽ More Building on the progress in Boolean satisfiability (SAT) solving over the last decades, maximum satisfiability (MaxSAT) has become a viable approach for solving NP-hard optimization problems, but ensuring correctness of MaxSAT solvers has remained an important concern. For SAT, this is largely a solved problem thanks to the use of proof logging, meaning that solvers emit machine-verifiable proofs of (un)satisfiability to certify correctness. However, for MaxSAT, proof logging solvers have started being developed only very recently. Moreover, these nascent efforts have only targeted the core solving process, ignoring the preprocessing phase where input problem instances can be substantially reformulated before being passed on to the solver proper. In this work, we demonstrate how pseudo-Boolean proof logging can be used to certify the correctness of a wide range of modern MaxSAT preprocessing techniques. By combining and extending the VeriPB and CakePB tools, we provide formally verified, end-to-end proof checking that the input and preprocessed output MaxSAT problem instances have the same optimal value. An extensive evaluation on applied MaxSAT benchmarks shows that our approach is feasible in practice. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16885 [pdf]

Adapting an Artificial Intelligence Sexually Transmitted Diseases Symptom Checker Tool for Mpox Detection: The HeHealth Experience

Authors: Rayner Kay Jin Tan, Dilruk Perera, Salomi Arasaratnam, Yudara Kularathne

Abstract: Artificial Intelligence applications have shown promise in the management of pandemics and have been widely used to assist the identification, classification, and diagnosis of medical images. In response to the global outbreak of Monkeypox (Mpox), the HeHealth.ai team leveraged an existing tool to screen for sexually transmitted diseases to develop a digital screening test for symptomatic Mpox thr… ▽ More Artificial Intelligence applications have shown promise in the management of pandemics and have been widely used to assist the identification, classification, and diagnosis of medical images. In response to the global outbreak of Monkeypox (Mpox), the HeHealth.ai team leveraged an existing tool to screen for sexually transmitted diseases to develop a digital screening test for symptomatic Mpox through AI approaches. Prior to the global outbreak of Mpox, the team developed a smartphone app, where app users can use their own smartphone cameras to take pictures of their own penises to screen for symptomatic STD. The AI model was initially developed using 5000 cases and use a modified convolutional neural network to output prediction scores across visually diagnosable penis pathologies including Syphilis, Herpes Simplex Virus, and Human Papilloma Virus. From June 2022 to October 2022, a total of about 22,000 users downloaded the HeHealth app, and about 21,000 images have been analyzed using HeHealth AI technology. We then engaged in formative research, stakeholder engagement, rapid consolidation images, a validation study, and implementation of the tool from July 2022. From July 2022 to October 2022, a total of 1000 Mpox related images had been used to train the Mpox symptom checker tool. Our digital symptom checker tool showed accuracy of 87% to rule in Mpox and 90% to rule out symptomatic Mpox. Several hurdles identified included issues of data privacy and security for app users, initial lack of data to train the AI tool, and the potential generalizability of input data. We offer several suggestions to help others get started on similar projects in emergency situations, including engaging a wide range of stakeholders, having a multidisciplinary team, prioritizing pragmatism, as well as the concept that big data in fact is made up of small data. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: 15 pages, 4 figures

arXiv:2404.16745 [pdf, other]

Statistical Inference for Covariate-Adjusted and Interpretable Generalized Factor Model with Application to Testing Fairness

Authors: Jing Ouyang, Chengyu Cui, Kean Ming Tan, Gongjun Xu

Abstract: In the era of data explosion, statisticians have been developing interpretable and computationally efficient statistical methods to measure latent factors (e.g., skills, abilities, and personalities) using large-scale assessment data. In addition to understanding the latent information, the covariate effect on responses controlling for latent factors is also of great scientific interest and has wi… ▽ More In the era of data explosion, statisticians have been developing interpretable and computationally efficient statistical methods to measure latent factors (e.g., skills, abilities, and personalities) using large-scale assessment data. In addition to understanding the latent information, the covariate effect on responses controlling for latent factors is also of great scientific interest and has wide applications, such as evaluating the fairness of educational testing, where the covariate effect reflects whether a test question is biased toward certain individual characteristics (e.g., gender and race) taking into account their latent abilities. However, the large sample size, substantial covariate dimension, and great test length pose challenges to developing efficient methods and drawing valid inferences. Moreover, to accommodate the commonly encountered discrete types of responses, nonlinear latent factor models are often assumed, bringing further complexity to the problem. To address these challenges, we consider a covariate-adjusted generalized factor model and develop novel and interpretable conditions to address the identifiability issue. Based on the identifiability conditions, we propose a joint maximum likelihood estimation method and establish estimation consistency and asymptotic normality results for the covariate effects under a practical yet challenging asymptotic regime. Furthermore, we derive estimation and inference results for latent factors and the factor loadings. We illustrate the finite sample performance of the proposed method through extensive numerical studies and an application to an educational assessment dataset obtained from the Programme for International Student Assessment (PISA). △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.13818 [pdf, other]

Joint Liability Model with Adaptation to Climate Change

Authors: Jiayue Zhang, Ken Seng Tan, Tony S. Wirjanto, Lysa Porth

Abstract: This paper extends the application of ESG score assessment methodologies from large corporations to individual farmers' production, within the context of climate change. Our proposal involves the integration of crucial agricultural sustainability variables into conventional personal credit evaluation frameworks, culminating in the formulation of a holistic sustainable credit rating referred to as… ▽ More This paper extends the application of ESG score assessment methodologies from large corporations to individual farmers' production, within the context of climate change. Our proposal involves the integration of crucial agricultural sustainability variables into conventional personal credit evaluation frameworks, culminating in the formulation of a holistic sustainable credit rating referred to as the Environmental, Social, Economics (ESE) score. This ESE score is integrated into theoretical joint liability models, to gain valuable insights into optimal group sizes and individual-ESE score relationships. Additionally, we adopt a mean-variance utility function for farmers to effectively capture the risk associated with anticipated profits. Through a set of simulation exercises, the paper investigates the implications of incorporating ESE scores into credit evaluation systems, offering a nuanced comprehension of the repercussions under various climatic conditions. △ Less

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.12569 [pdf, other]

Multi-View Subgraph Neural Networks: Self-Supervised Learning with Scarce Labeled Data

Authors: Zhenzhong Wang, Qingyuan Zeng, Wanyu Lin, Min Jiang, Kay Chen Tan

Abstract: While graph neural networks (GNNs) have become the de-facto standard for graph-based node classification, they impose a strong assumption on the availability of sufficient labeled samples. This assumption restricts the classification performance of prevailing GNNs on many real-world applications suffering from low-data regimes. Specifically, features extracted from scarce labeled nodes could not p… ▽ More While graph neural networks (GNNs) have become the de-facto standard for graph-based node classification, they impose a strong assumption on the availability of sufficient labeled samples. This assumption restricts the classification performance of prevailing GNNs on many real-world applications suffering from low-data regimes. Specifically, features extracted from scarce labeled nodes could not provide sufficient supervision for the unlabeled samples, leading to severe over-fitting. In this work, we point out that leveraging subgraphs to capture long-range dependencies can augment the representation of a node with homophily properties, thus alleviating the low-data regime. However, prior works leveraging subgraphs fail to capture the long-range dependencies among nodes. To this end, we present a novel self-supervised learning framework, called multi-view subgraph neural networks (Muse), for handling long-range dependencies. In particular, we propose an information theory-based identification mechanism to identify two types of subgraphs from the views of input space and latent space, respectively. The former is to capture the local structure of the graph, while the latter captures the long-range dependencies among nodes. By fusing these two views of subgraphs, the learned representations can preserve the topological properties of the graph at large, including the local structure and long-range dependencies, thus maximizing their expressiveness for downstream node classification tasks. Experimental results show that Muse outperforms the alternative methods on node classification tasks with limited labeled data. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.06418 [pdf, other]

Studying the Impact of Latent Representations in Implicit Neural Networks for Scientific Continuous Field Reconstruction

Authors: Wei Xu, Derek Freeman DeSantis, Xihaier Luo, Avish Parmar, Klaus Tan, Balu Nadiga, Yihui Ren, Shinjae Yoo

Abstract: Learning a continuous and reliable representation of physical fields from sparse sampling is challenging and it affects diverse scientific disciplines. In a recent work, we present a novel model called MMGN (Multiplicative and Modulated Gabor Network) with implicit neural networks. In this work, we design additional studies leveraging explainability methods to complement the previous experiments a… ▽ More Learning a continuous and reliable representation of physical fields from sparse sampling is challenging and it affects diverse scientific disciplines. In a recent work, we present a novel model called MMGN (Multiplicative and Modulated Gabor Network) with implicit neural networks. In this work, we design additional studies leveraging explainability methods to complement the previous experiments and further enhance the understanding of latent representations generated by the model. The adopted methods are general enough to be leveraged for any latent space inspection. Preliminary results demonstrate the contextual information incorporated in the latent representations and their impact on the model performance. As a work in progress, we will continue to verify our findings and develop novel explainability approaches. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06349 [pdf, other]

CausalBench: A Comprehensive Benchmark for Causal Learning Capability of Large Language Models

Authors: Yu Zhou, Xingyu Wu, Beicheng Huang, Jibin Wu, Liang Feng, Kay Chen Tan

Abstract: Causality reveals fundamental principles behind data distributions in real-world scenarios, and the capability of large language models (LLMs) to understand causality directly impacts their efficacy across explaining outputs, adapting to new evidence, and generating counterfactuals. With the proliferation of LLMs, the evaluation of this capacity is increasingly garnering attention. However, the ab… ▽ More Causality reveals fundamental principles behind data distributions in real-world scenarios, and the capability of large language models (LLMs) to understand causality directly impacts their efficacy across explaining outputs, adapting to new evidence, and generating counterfactuals. With the proliferation of LLMs, the evaluation of this capacity is increasingly garnering attention. However, the absence of a comprehensive benchmark has rendered existing evaluation studies being straightforward, undiversified, and homogeneous. To address these challenges, this paper proposes a comprehensive benchmark, namely CausalBench, to evaluate the causality understanding capabilities of LLMs. Originating from the causal research community, CausalBench encompasses three causal learning-related tasks, which facilitate a convenient comparison of LLMs' performance with classic causal learning algorithms. Meanwhile, causal networks of varying scales and densities are integrated in CausalBench, to explore the upper limits of LLMs' capabilities across task scenarios of varying difficulty. Notably, background knowledge and structured data are also incorporated into CausalBench to thoroughly unlock the underlying potential of LLMs for long-text comprehension and prior information utilization. Based on CausalBench, this paper evaluates nineteen leading LLMs and unveils insightful conclusions in diverse aspects. Firstly, we present the strengths and weaknesses of LLMs and quantitatively explore the upper limits of their capabilities across various scenarios. Meanwhile, we further discern the adaptability and abilities of LLMs to specific structural networks and complex chain of thought structures. Moreover, this paper quantitatively presents the differences across diverse information sources and uncovers the gap between LLMs' capabilities in causal understanding within textual contexts and numerical domains. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06290 [pdf, other]

Exploring the True Potential: Evaluating the Black-box Optimization Capability of Large Language Models

Authors: Beichen Huang, Xingyu Wu, Yu Zhou, Jibin Wu, Liang Feng, Ran Cheng, Kay Chen Tan

Abstract: Large language models (LLMs) have demonstrated exceptional performance not only in natural language processing tasks but also in a great variety of non-linguistic domains. In diverse optimization scenarios, there is also a rising trend of applying LLMs. However, whether the application of LLMs in the black-box optimization problems is genuinely beneficial remains unexplored. This paper endeavors t… ▽ More Large language models (LLMs) have demonstrated exceptional performance not only in natural language processing tasks but also in a great variety of non-linguistic domains. In diverse optimization scenarios, there is also a rising trend of applying LLMs. However, whether the application of LLMs in the black-box optimization problems is genuinely beneficial remains unexplored. This paper endeavors to offer deep insights into the potential of LLMs in optimization through a comprehensive investigation, which covers both discrete and continuous optimization problems to assess the efficacy and distinctive characteristics that LLMs bring to this field. Our findings reveal both the limitations and advantages of LLMs in optimization. Specifically, on the one hand, despite the significant power consumed for running the models, LLMs exhibit subpar performance in pure numerical tasks, primarily due to a mismatch between the problem domain and their processing capabilities; on the other hand, although LLMs may not be ideal for traditional numerical optimization, their potential in broader optimization contexts remains promising, where LLMs exhibit the ability to solve problems in non-numerical domains and can leverage heuristics from the prompt to enhance their performance. To the best of our knowledge, this work presents the first systematic evaluation of LLMs for numerical optimization. Our findings pave the way for a deeper understanding of LLMs' role in optimization and guide future application of LLMs in a wide range of scenarios. △ Less

Submitted 6 July, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.02062 [pdf, other]

Digital Forgetting in Large Language Models: A Survey of Unlearning Methods

Authors: Alberto Blanco-Justicia, Najeeb Jebreel, Benet Manzanares, David Sánchez, Josep Domingo-Ferrer, Guillem Collell, Kuan Eeik Tan

Abstract: The objective of digital forgetting is, given a model with undesirable knowledge or behavior, obtain a new model where the detected issues are no longer present. The motivations for forgetting include privacy protection, copyright protection, elimination of biases and discrimination, and prevention of harmful content generation. Effective digital forgetting has to be effective (meaning how well th… ▽ More The objective of digital forgetting is, given a model with undesirable knowledge or behavior, obtain a new model where the detected issues are no longer present. The motivations for forgetting include privacy protection, copyright protection, elimination of biases and discrimination, and prevention of harmful content generation. Effective digital forgetting has to be effective (meaning how well the new model has forgotten the undesired knowledge/behavior), retain the performance of the original model on the desirable tasks, and be scalable (in particular forgetting has to be more efficient than retraining from scratch on just the tasks/data to be retained). This survey focuses on forgetting in large language models (LLMs). We first provide background on LLMs, including their components, the types of LLMs, and their usual training pipeline. Second, we describe the motivations, types, and desired properties of digital forgetting. Third, we introduce the approaches to digital forgetting in LLMs, among which unlearning methodologies stand out as the state of the art. Fourth, we provide a detailed taxonomy of machine unlearning methods for LLMs, and we survey and compare current approaches. Fifth, we detail datasets, models and metrics used for the evaluation of forgetting, retaining and runtime. Sixth, we discuss challenges in the area. Finally, we provide some concluding remarks. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: 70 pages

MSC Class: 68 ACM Class: K.4.1; I.2.6; I.2.7

arXiv:2404.00962 [pdf, other]

Diffusion-Driven Domain Adaptation for Generating 3D Molecules

Authors: Haokai Hong, Wanyu Lin, Kay Chen Tan

Abstract: Can we train a molecule generator that can generate 3D molecules from a new domain, circumventing the need to collect data? This problem can be cast as the problem of domain adaptive molecule generation. This work presents a novel and principled diffusion-based approach, called GADM, that allows shifting a generative model to desired new domains without the need to collect even a single molecule.… ▽ More Can we train a molecule generator that can generate 3D molecules from a new domain, circumventing the need to collect data? This problem can be cast as the problem of domain adaptive molecule generation. This work presents a novel and principled diffusion-based approach, called GADM, that allows shifting a generative model to desired new domains without the need to collect even a single molecule. As the domain shift is typically caused by the structure variations of molecules, e.g., scaffold variations, we leverage a designated equivariant masked autoencoder (MAE) along with various masking strategies to capture the structural-grained representations of the in-domain varieties. In particular, with an asymmetric encoder-decoder module, the MAE can generalize to unseen structure variations from the target domains. These structure variations are encoded with an equivariant encoder and treated as domain supervisors to control denoising. We show that, with these encoded structural-grained domain supervisors, GADM can generate effective molecules within the desired new domains. We conduct extensive experiments across various domain adaptation tasks over benchmarking datasets. We show that our approach can improve up to 65.6% in terms of success rate defined based on molecular validity, uniqueness, and novelty compared to alternative baselines. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: 11 pages, 3 figures, and 3 tables

arXiv:2403.09701 [pdf, other]

A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage

Authors: Kevin Tan, Ziping Xu

Abstract: Hybrid Reinforcement Learning (RL), leveraging both online and offline data, has garnered recent interest, yet research on its provable benefits remains sparse. Additionally, many existing hybrid RL algorithms (Song et al., 2023; Nakamoto et al., 2023; Amortila et al., 2024) impose coverage assumptions on the offline dataset, but we show that this is unnecessary. A well-designed online algorithm s… ▽ More Hybrid Reinforcement Learning (RL), leveraging both online and offline data, has garnered recent interest, yet research on its provable benefits remains sparse. Additionally, many existing hybrid RL algorithms (Song et al., 2023; Nakamoto et al., 2023; Amortila et al., 2024) impose coverage assumptions on the offline dataset, but we show that this is unnecessary. A well-designed online algorithm should "fill in the gaps" in the offline dataset, exploring states and actions that the behavior policy did not explore. Unlike previous approaches that focus on estimating the offline data distribution to guide online exploration (Li et al., 2023b), we show that a natural extension to standard optimistic online algorithms -- warm-starting them by including the offline dataset in the experience replay buffer -- achieves similar provable gains from hybrid data even when the offline dataset does not have single-policy concentrability. We accomplish this by partitioning the state-action space into two, bounding the regret on each partition through an offline and an online complexity measure, and showing that the regret of this hybrid RL algorithm can be characterized by the best partition -- despite the algorithm not knowing the partition itself. As an example, we propose DISC-GOLF, a modification of an existing optimistic online algorithm with general function approximation called GOLF used in Jin et al. (2021); Xie et al. (2022a), and show that it demonstrates provable gains over both online-only and offline-only reinforcement learning, with competitive bounds when specialized to the tabular, linear and block MDP cases. Numerical simulations further validate our theory that hybrid data facilitates more efficient exploration, supporting the potential of hybrid RL in various scenarios. △ Less

Submitted 17 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: Submitted to the reinforcement learning conference

arXiv:2403.04133 [pdf, other]

Towards learning-based planning:The nuPlan benchmark for real-world autonomous driving

Authors: Napat Karnchanachari, Dimitris Geromichalos, Kok Seang Tan, Nanxiang Li, Christopher Eriksen, Shakiba Yaghoubi, Noushin Mehdipour, Gianmarco Bernasconi, Whye Kit Fong, Yiluan Guo, Holger Caesar

Abstract: Machine Learning (ML) has replaced traditional handcrafted methods for perception and prediction in autonomous vehicles. Yet for the equally important planning task, the adoption of ML-based techniques is slow. We present nuPlan, the world's first real-world autonomous driving dataset, and benchmark. The benchmark is designed to test the ability of ML-based planners to handle diverse driving situa… ▽ More Machine Learning (ML) has replaced traditional handcrafted methods for perception and prediction in autonomous vehicles. Yet for the equally important planning task, the adoption of ML-based techniques is slow. We present nuPlan, the world's first real-world autonomous driving dataset, and benchmark. The benchmark is designed to test the ability of ML-based planners to handle diverse driving situations and to make safe and efficient decisions. To that end, we introduce a new large-scale dataset that consists of 1282 hours of diverse driving scenarios from 4 cities (Las Vegas, Boston, Pittsburgh, and Singapore) and includes high-quality auto-labeled object tracks and traffic light data. We exhaustively mine and taxonomize common and rare driving scenarios which are used during evaluation to get fine-grained insights into the performance and characteristics of a planner. Beyond the dataset, we provide a simulation and evaluation framework that enables a planner's actions to be simulated in closed-loop to account for interactions with other traffic participants. We present a detailed analysis of numerous baselines and investigate gaps between ML-based and traditional methods. Find the nuPlan dataset and code at nuplan.org. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: ICRA 2024 camera ready incl. supplementary material

arXiv:2403.01757 [pdf, other]

How Multimodal Integration Boost the Performance of LLM for Optimization: Case Study on Capacitated Vehicle Routing Problems

Authors: Yuxiao Huang, Wenjie Zhang, Liang Feng, Xingyu Wu, Kay Chen Tan

Abstract: Recently, large language models (LLMs) have notably positioned them as capable tools for addressing complex optimization challenges. Despite this recognition, a predominant limitation of existing LLM-based optimization methods is their struggle to capture the relationships among decision variables when relying exclusively on numerical text prompts, especially in high-dimensional problems. Keeping… ▽ More Recently, large language models (LLMs) have notably positioned them as capable tools for addressing complex optimization challenges. Despite this recognition, a predominant limitation of existing LLM-based optimization methods is their struggle to capture the relationships among decision variables when relying exclusively on numerical text prompts, especially in high-dimensional problems. Keeping this in mind, we first propose to enhance the optimization performance using multimodal LLM capable of processing both textual and visual prompts for deeper insights of the processed optimization problem. This integration allows for a more comprehensive understanding of optimization problems, akin to human cognitive processes. We have developed a multimodal LLM-based optimization framework that simulates human problem-solving workflows, thereby offering a more nuanced and effective analysis. The efficacy of this method is evaluated through extensive empirical studies focused on a well-known combinatorial optimization problem, i.e., capacitated vehicle routing problem. The results are compared against those obtained from the LLM-based optimization algorithms that rely solely on textual prompts, demonstrating the significant advantages of our multimodal approach. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: 8pages,3 figures, 2 tables

arXiv:2403.01369 [pdf, other]

A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

Authors: Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

Abstract: Self-supervised learned models have been found to be very effective for certain speech tasks such as automatic speech recognition, speaker identification, keyword spotting and others. While the features are undeniably useful in speech recognition and associated tasks, their utility in speech enhancement systems is yet to be firmly established, and perhaps not properly understood. In this paper, we… ▽ More Self-supervised learned models have been found to be very effective for certain speech tasks such as automatic speech recognition, speaker identification, keyword spotting and others. While the features are undeniably useful in speech recognition and associated tasks, their utility in speech enhancement systems is yet to be firmly established, and perhaps not properly understood. In this paper, we investigate the uses of SSL representations for single-channel speech enhancement in challenging conditions and find that they add very little value for the enhancement task. Our constraints are designed around on-device real-time speech enhancement -- model is causal, the compute footprint is small. Additionally, we focus on low SNR conditions where such models struggle to provide good enhancement. In order to systematically examine how SSL representations impact performance of such enhancement models, we propose a variety of techniques to utilize these embeddings which include different forms of knowledge-distillation and pre-training. △ Less

Submitted 2 March, 2024; originally announced March 2024.

Comments: 8 pages; Shorter form accepted in ICASSP 2024

arXiv:2402.19095 [pdf]

A Protein Structure Prediction Approach Leveraging Transformer and CNN Integration

Authors: Yanlin Zhou, Kai Tan, Xinyu Shen, Zheng He, Haotian Zheng

Abstract: Proteins are essential for life, and their structure determines their function. The protein secondary structure is formed by the folding of the protein primary structure, and the protein tertiary structure is formed by the bending and folding of the secondary structure. Therefore, the study of protein secondary structure is very helpful to the overall understanding of protein structure. Although t… ▽ More Proteins are essential for life, and their structure determines their function. The protein secondary structure is formed by the folding of the protein primary structure, and the protein tertiary structure is formed by the bending and folding of the secondary structure. Therefore, the study of protein secondary structure is very helpful to the overall understanding of protein structure. Although the accuracy of protein secondary structure prediction has continuously improved with the development of machine learning and deep learning, progress in the field of protein structure prediction, unfortunately, remains insufficient to meet the large demand for protein information. Therefore, based on the advantages of deep learning-based methods in feature extraction and learning ability, this paper adopts a two-dimensional fusion deep neural network model, DstruCCN, which uses Convolutional Neural Networks (CCN) and a supervised Transformer protein language model for single-sequence protein structure prediction. The training features of the two are combined to predict the protein Transformer binding site matrix, and then the three-dimensional structure is reconstructed using energy minimization. △ Less

Submitted 8 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.18292 [pdf, other]

FSL-Rectifier: Rectify Outliers in Few-Shot Learning via Test-Time Augmentation

Authors: Yunwei Bai, Ying Kiat Tan, Tsuhan Chen

Abstract: Few-shot-learning (FSL) commonly requires a model to identify images (queries) that belong to classes unseen during training, based on a few labelled samples of the new classes (support set) as reference. So far, plenty of algorithms involve training data augmentation to improve the generalization capability of FSL models, but outlier query or support images during inference can still pose great g… ▽ More Few-shot-learning (FSL) commonly requires a model to identify images (queries) that belong to classes unseen during training, based on a few labelled samples of the new classes (support set) as reference. So far, plenty of algorithms involve training data augmentation to improve the generalization capability of FSL models, but outlier query or support images during inference can still pose great generalization challenges. In this work, to reduce the bias caused by unconventional test samples, we generate additional test-class samples through combining them with suitable train-class samples, whose averaged representations then become more typical for few-shot classification. We experimentally and theoretically demonstrate the effectiveness of our method, via which we can achieve around 4\% test accuracy improvement for trained FSL models. Importantly, our method is universally compatible with different off-the-shelf FSL models, whose performance can be improved without extra dataset nor further training of the models themselves. △ Less

Submitted 21 July, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.17318 [pdf, other]

Scaling Supervised Local Learning with Augmented Auxiliary Networks

Authors: Chenxiang Ma, Jibin Wu, Chenyang Si, Kay Chen Tan

Abstract: Deep neural networks are typically trained using global error signals that backpropagate (BP) end-to-end, which is not only biologically implausible but also suffers from the update locking problem and requires huge memory consumption. Local learning, which updates each layer independently with a gradient-isolated auxiliary network, offers a promising alternative to address the above problems. How… ▽ More Deep neural networks are typically trained using global error signals that backpropagate (BP) end-to-end, which is not only biologically implausible but also suffers from the update locking problem and requires huge memory consumption. Local learning, which updates each layer independently with a gradient-isolated auxiliary network, offers a promising alternative to address the above problems. However, existing local learning methods are confronted with a large accuracy gap with the BP counterpart, particularly for large-scale networks. This is due to the weak coupling between local layers and their subsequent network layers, as there is no gradient communication across layers. To tackle this issue, we put forward an augmented local learning method, dubbed AugLocal. AugLocal constructs each hidden layer's auxiliary network by uniformly selecting a small subset of layers from its subsequent network layers to enhance their synergy. We also propose to linearly reduce the depth of auxiliary networks as the hidden layer goes deeper, ensuring sufficient network capacity while reducing the computational cost of auxiliary networks. Our extensive experiments on four image classification datasets (i.e., CIFAR-10, SVHN, STL-10, and ImageNet) demonstrate that AugLocal can effectively scale up to tens of local layers with a comparable accuracy to BP-trained networks while reducing GPU memory usage by around 40%. The proposed AugLocal method, therefore, opens up a myriad of opportunities for training high-performance deep neural networks on resource-constrained platforms.Code is available at https://github.com/ChenxiangMA/AugLocal. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted by ICLR 2024

arXiv:2402.16567 [pdf, other]

Aligning Large Language Models to a Domain-specific Graph Database

Authors: Yuanyuan Liang, Keren Tan, Tingyu Xie, Wenbiao Tao, Siyuan Wang, Yunshi Lan, Weining Qian

Abstract: Graph Databases (Graph DB) are widely applied in various fields, including finance, social networks, and medicine. However, translating Natural Language (NL) into the Graph Query Language (GQL), commonly known as NL2GQL, proves to be challenging due to its inherent complexity and specialized nature. Some approaches have sought to utilize Large Language Models (LLMs) to address analogous tasks like… ▽ More Graph Databases (Graph DB) are widely applied in various fields, including finance, social networks, and medicine. However, translating Natural Language (NL) into the Graph Query Language (GQL), commonly known as NL2GQL, proves to be challenging due to its inherent complexity and specialized nature. Some approaches have sought to utilize Large Language Models (LLMs) to address analogous tasks like text2SQL. Nevertheless, when it comes to NL2GQL taskson a particular domain, the absence of domain-specific NL-GQL data pairs makes it difficult to establish alignment between LLMs and the graph DB. To address this challenge, we propose a well-defined pipeline. Specifically, we utilize ChatGPT to create NL-GQL data pairs based on the given graph DB with self-instruct. Then, we use the created data to fine-tune LLMs, thereby achieving alignment between LLMs and the graph DB. Additionally, during inference, we propose a method that extracts relevant schema to the queried NL as the input context to guide LLMs for generating accurate GQLs.We evaluate our method on two constructed datasets deriving from graph DBs in finance domain and medicine domain, namely FinGQL and MediGQL. Experimental results demonstrate that our method significantly outperforms a set of baseline methods, with improvements of 5.90 and 6.36 absolute points on EM, and 6.00 and 7.09 absolute points on EX, respectively. △ Less

Submitted 28 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: 13 pages,2 figures

arXiv:2402.16250 [pdf, ps, other]

A Proof of Weak Cosmic Censorship Conjecture for the Spherically Symmetric Einstein-Maxwell-Charged Scalar Field System

Authors: Xinliang An, Hong Kiat Tan

Abstract: Under spherical symmetry, we show that the weak cosmic censorship holds for the gravitational collapse of the Einstein-Maxwell-charged scalar field system. Namely, for this system, with generic initial data, the formed spacetime singularities are concealed inside black-hole regions. This generalizes Christodoulou's celebrated results to the charged case. Due to the presence of charge $Q$ and the c… ▽ More Under spherical symmetry, we show that the weak cosmic censorship holds for the gravitational collapse of the Einstein-Maxwell-charged scalar field system. Namely, for this system, with generic initial data, the formed spacetime singularities are concealed inside black-hole regions. This generalizes Christodoulou's celebrated results to the charged case. Due to the presence of charge $Q$ and the complexification of the scalar field $φ$, multiple delicate features and miraculous monotonic properties of the Einstein-(real) scalar field system are not present. We develop a systematical approach to incorporate $Q$ and the complex-valued $φ$ into the integrated arguments. For instance, we discover a new path, employing the reduced mass ratio, to establish the sharp trapped surface formation criterion for the charged case. Due to the complex structure and the absence of translational symmetry of $φ$, we also carry out detailed modified scale-critical BV area estimates with renormalized quantities to deal with $Q$ and $φ$. We present a new $C^1$ extension criterion by utilizing the Doppler exponent to elucidate the blueshift effect, analogous to the role of integrating vorticity in the Beale-Kato-Madja breakdown criterion for incompressible fluids. Furthermore, by utilizing only double-null foliations, we establish the desired first and second instability theorems for the charged scenarios and identify generic initial conditions for the non-appearance of naked singularities. Our instability argument requires intricate generalizations of the treatment for the uncharged case via analyzing the precise contribution of the charged terms and its connection to the reduced mass ratio. △ Less

Submitted 25 February, 2024; originally announced February 2024.

Comments: 148 pages

arXiv:2402.15969 [pdf, other]

Efficient Online Learning for Networks of Two-Compartment Spiking Neurons

Authors: Yujia Yin, Xinyi Chen, Chenxiang Ma, Jibin Wu, Kay Chen Tan

Abstract: The brain-inspired Spiking Neural Networks (SNNs) have garnered considerable research interest due to their superior performance and energy efficiency in processing temporal signals. Recently, a novel multi-compartment spiking neuron model, namely the Two-Compartment LIF (TC-LIF) model, has been proposed and exhibited a remarkable capacity for sequential modelling. However, training the TC-LIF mod… ▽ More The brain-inspired Spiking Neural Networks (SNNs) have garnered considerable research interest due to their superior performance and energy efficiency in processing temporal signals. Recently, a novel multi-compartment spiking neuron model, namely the Two-Compartment LIF (TC-LIF) model, has been proposed and exhibited a remarkable capacity for sequential modelling. However, training the TC-LIF model presents challenges stemming from the large memory consumption and the issue of gradient vanishing associated with the Backpropagation Through Time (BPTT) algorithm. To address these challenges, online learning methodologies emerge as a promising solution. Yet, to date, the application of online learning methods in SNNs has been predominantly confined to simplified Leaky Integrate-and-Fire (LIF) neuron models. In this paper, we present a novel online learning method specifically tailored for networks of TC-LIF neurons. Additionally, we propose a refined TC-LIF neuron model called Adaptive TC-LIF, which is carefully designed to enhance temporal information integration in online learning scenarios. Extensive experiments, conducted on various sequential benchmarks, demonstrate that our approach successfully preserves the superior sequential modeling capabilities of the TC-LIF neuron while incorporating the training efficiency and hardware friendliness of online learning. As a result, it offers a multitude of opportunities to leverage neuromorphic solutions for processing temporal signals. △ Less

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.14704 [pdf, other]

An LLM-Enhanced Adversarial Editing System for Lexical Simplification

Authors: Keren Tan, Kangyang Luo, Yunshi Lan, Zheng Yuan, Jinlong Shu

Abstract: Lexical Simplification (LS) aims to simplify text at the lexical level. Existing methods rely heavily on annotated data, making it challenging to apply in low-resource scenarios. In this paper, we propose a novel LS method without parallel corpora. This method employs an Adversarial Editing System with guidance from a confusion loss and an invariance loss to predict lexical edits in the original s… ▽ More Lexical Simplification (LS) aims to simplify text at the lexical level. Existing methods rely heavily on annotated data, making it challenging to apply in low-resource scenarios. In this paper, we propose a novel LS method without parallel corpora. This method employs an Adversarial Editing System with guidance from a confusion loss and an invariance loss to predict lexical edits in the original sentences. Meanwhile, we introduce an innovative LLM-enhanced loss to enable the distillation of knowledge from Large Language Models (LLMs) into a small-size LS system. From that, complex words within sentences are masked and a Difficulty-aware Filling module is crafted to replace masked positions with simpler words. At last, extensive experimental results and analyses on three benchmark LS datasets demonstrate the effectiveness of our proposed method. △ Less

Submitted 22 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: Accepted by COLING 2024 main conference

arXiv:2401.10034 [pdf, other]

Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap

Authors: Xingyu Wu, Sheng-hao Wu, Jibin Wu, Liang Feng, Kay Chen Tan

Abstract: Large language models (LLMs) have not only revolutionized natural language processing but also extended their prowess to various domains, marking a significant stride towards artificial general intelligence. The interplay between LLMs and evolutionary algorithms (EAs), despite differing in objectives and methodologies, share a common pursuit of applicability in complex problems. Meanwhile, EA can… ▽ More Large language models (LLMs) have not only revolutionized natural language processing but also extended their prowess to various domains, marking a significant stride towards artificial general intelligence. The interplay between LLMs and evolutionary algorithms (EAs), despite differing in objectives and methodologies, share a common pursuit of applicability in complex problems. Meanwhile, EA can provide an optimization framework for LLM's further enhancement under black-box settings, empowering LLM with flexible global search capacities. On the other hand, the abundant domain knowledge inherent in LLMs could enable EA to conduct more intelligent searches. Furthermore, the text processing and generative capabilities of LLMs would aid in deploying EAs across a wide range of tasks. Based on these complementary advantages, this paper provides a thorough review and a forward-looking roadmap, categorizing the reciprocal inspiration into two main avenues: LLM-enhanced EA and EA-enhanced LLM. Some integrated synergy methods are further introduced to exemplify the complementarity between LLMs and EAs in diverse scenarios, including code generation, software engineering, neural architecture search, and various generation tasks. As the first comprehensive review focused on the EA research in the era of LLMs, this paper provides a foundational stepping stone for understanding the collaborative potential of LLMs and EAs. The identified challenges and future directions offer guidance for researchers and practitioners to unlock the full potential of this innovative collaboration in propelling advancements in optimization and artificial intelligence. We have created a GitHub repository to index the relevant papers: https://github.com/wuxingyu-ai/LLM4EC. △ Less

Submitted 29 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: evolutionary algorithm (EA), large language model (LLM), optimization problem, prompt engineering, algorithm generation, neural architecture search

arXiv:2401.04962 [pdf, other]

Large Model based Sequential Keyframe Extraction for Video Summarization

Authors: Kailong Tan, Yuxiang Zhou, Qianchen Xia, Rui Liu, Yong Chen

Abstract: Keyframe extraction aims to sum up a video's semantics with the minimum number of its frames. This paper puts forward a Large Model based Sequential Keyframe Extraction for video summarization, dubbed LMSKE, which contains three stages as below. First, we use the large model "TransNetV21" to cut the video into consecutive shots, and employ the large model "CLIP2" to generate each frame's visual fe… ▽ More Keyframe extraction aims to sum up a video's semantics with the minimum number of its frames. This paper puts forward a Large Model based Sequential Keyframe Extraction for video summarization, dubbed LMSKE, which contains three stages as below. First, we use the large model "TransNetV21" to cut the video into consecutive shots, and employ the large model "CLIP2" to generate each frame's visual feature within each shot; Second, we develop an adaptive clustering algorithm to yield candidate keyframes for each shot, with each candidate keyframe locating nearest to a cluster center; Third, we further reduce the above candidate keyframes via redundancy elimination within each shot, and finally concatenate them in accordance with the sequence of shots as the final sequential keyframes. To evaluate LMSKE, we curate a benchmark dataset and conduct rich experiments, whose results exhibit that LMSKE performs much better than quite a few SOTA competitors with average F1 of 0.5311, average fidelity of 0.8141, and average compression ratio of 0.9922. △ Less

Submitted 10 January, 2024; originally announced January 2024.

Comments: This paper has been accepted for CDIVP 2024

arXiv:2401.03508 [pdf, ps, other]

Kirkwood-Dirac Type Quasiprobabilities as Universal Identifiers of Nonclassical Quantum Resources

Authors: Kok Chuan Tan, Souradeep Sasmal

Abstract: We show that a Kirkwood-Dirac type quasiprobability distribution is sufficient to reveal any arbitrary quantum resource. This is achieved by demonstrating that it is always possible to identify a set of incompatible measurements that distinguishes between resourceful states and nonresourceful states. The quasiprobability reveals a resourceful quantum state by having at least one quasiprobabilty ou… ▽ More We show that a Kirkwood-Dirac type quasiprobability distribution is sufficient to reveal any arbitrary quantum resource. This is achieved by demonstrating that it is always possible to identify a set of incompatible measurements that distinguishes between resourceful states and nonresourceful states. The quasiprobability reveals a resourceful quantum state by having at least one quasiprobabilty outcome with a strictly negative numerical value. We also show that there always exists a quasiprobabilty distribution where the total negativity can be interpreted as the geometric distance between a resourceful quantum state to the closest nonresourceful state. It can also be shown that Kirkwood-Dirac type quasiprobability distributions, like the Wigner distribution, can be made informationally complete, in the sense that it can provide complete information about the quantum state while simultaneously revealing nonclassicality whenever a quasiprobability outcome is negative. Moreover, we demonstrate the existence of sufficiently strong anomalous weak values whenever the quasiprobability distribution is negative, which suggests a means to experimentally test such quasiprobability distributions. Since incompatible measurements are necessary in order for the quasiprobability to be negative, this result suggests that measurement incompatibility may underlie any quantum advantage gained from utilizing a nonclassical quantum resource △ Less

Submitted 7 January, 2024; originally announced January 2024.

Comments: 7 pages

arXiv:2401.01563 [pdf, other]

Towards Multi-Objective High-Dimensional Feature Selection via Evolutionary Multitasking

Authors: Yinglan Feng, Liang Feng, Songbai Liu, Sam Kwong, Kay Chen Tan

Abstract: Evolutionary Multitasking (EMT) paradigm, an emerging research topic in evolutionary computation, has been successfully applied in solving high-dimensional feature selection (FS) problems recently. However, existing EMT-based FS methods suffer from several limitations, such as a single mode of multitask generation, conducting the same generic evolutionary search for all tasks, relying on implicit… ▽ More Evolutionary Multitasking (EMT) paradigm, an emerging research topic in evolutionary computation, has been successfully applied in solving high-dimensional feature selection (FS) problems recently. However, existing EMT-based FS methods suffer from several limitations, such as a single mode of multitask generation, conducting the same generic evolutionary search for all tasks, relying on implicit transfer mechanisms through sole solution encodings, and employing single-objective transformation, which result in inadequate knowledge acquisition, exploitation, and transfer. To this end, this paper develops a novel EMT framework for multiobjective high-dimensional feature selection problems, namely MO-FSEMT. In particular, multiple auxiliary tasks are constructed by distinct formulation methods to provide diverse search spaces and information representations and then simultaneously addressed with the original task through a multi-slover-based multitask optimization scheme. Each task has an independent population with task-specific representations and is solved using separate evolutionary solvers with different biases and search preferences. A task-specific knowledge transfer mechanism is designed to leverage the advantage information of each task, enabling the discovery and effective transmission of high-quality solutions during the search process. Comprehensive experimental results demonstrate that our MO-FSEMT framework can achieve overall superior performance compared to the state-of-the-art FS methods on 26 datasets. Moreover, the ablation studies verify the contributions of different components of the proposed MO-FSEMT. △ Less

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.16884 [pdf]

Binaural recording methods with analysis on inter-aural time, level, and phase differences

Authors: Johann Kay Ann Tan

Abstract: Binaural recordings are a form of stereophonic recording method that replicates how human ears perceive sound, these types of recordings create a 3D aural image around the listener and are extremely immersive when well recorded and listened to appropriately with headphones. It has wide applications in video, podcast, and gaming formats -- allowing the listener to feel like they are there. Although… ▽ More Binaural recordings are a form of stereophonic recording method that replicates how human ears perceive sound, these types of recordings create a 3D aural image around the listener and are extremely immersive when well recorded and listened to appropriately with headphones. It has wide applications in video, podcast, and gaming formats -- allowing the listener to feel like they are there. Although binaural formats are seldom used for music applications, they have also been utilized in music ranging from Rock, Jazz, Acoustic, and Classical. In this paper, we will investigate the acoustical phenomenon that produces the binaural effect in audio recordings -- including the ITD (Inter-aural time difference), the ILD (inter-aural level difference), IPD (inter-aural phase difference) as well as the monaural spectral difference that occurs between two ears so we can better understand the replication of human hearing in binaural recordings. Binaural recordings differ from regular stereophonic recordings as they are arranged in a specific way to account for HRTF (Head-related transfer function). The most common method of binaural recordings is with two high-quality omni-directional microphones affixed on a dummy head where the ears are located, although other methods exist without the use of a full dummy head. △ Less

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.15602 [pdf]

doi 10.3397/IN-2021-3048

The effects of aural and visual factors on appropriateness ratings of residential spaces in an urban city

Authors: Johann Kay Ann Tan, Siu-Kit Lau, Yoshimi Hasegawa

Abstract: This study investigates the aural and visual factors that influence appropriateness perception in soundscape evaluations in residential spaces, where people may spend most of their time in. Appropriateness in soundscape is derived from the expectation of sound sources in a specific environment, place, or function heard by a listener. The appropriateness of soundscapes in 30 locations in an urban r… ▽ More This study investigates the aural and visual factors that influence appropriateness perception in soundscape evaluations in residential spaces, where people may spend most of their time in. Appropriateness in soundscape is derived from the expectation of sound sources in a specific environment, place, or function heard by a listener. The appropriateness of soundscapes in 30 locations in an urban residential environment is investigated with varying landscape, visual, and aural elements through a questionnaire. Participants experienced the soundscape in situ and were asked to evaluate the appropriateness of soundscape as well as the dominance of specific sound sources such as traffic, human activities, and birdsongs in the residential space. The effect of the type of traffic on appropriateness is also investigated. A strong relationship is found between appropriateness and affective soundscape qualities such as pleasantness, highlighting the importance of considering appropriateness in soundscape research. △ Less

Submitted 24 December, 2023; originally announced December 2023.

Comments: Internoise 2021

Showing 1–50 of 468 results for author: Tan, K