Search | arXiv e-print repository

MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models

Authors: Tongxu Luo, Jiahe Lei, Fangyu Lei, Weihao Liu, Shizhu He, Jun Zhao, Kang Liu

Abstract: Fine-tuning is often necessary to enhance the adaptability of Large Language Models (LLM) to downstream tasks. Nonetheless, the process of updating billions of parameters demands significant computational resources and training time, which poses a substantial obstacle to the widespread application of large-scale models in various scenarios. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) has emerged as a prominent paradigm in recent research. However, current PEFT approaches that employ a limited set of global parameters (such as LoRA, which adds low-rank approximation matrices to all weights) face challenges in flexibly combining different computational modules in downstream tasks. In this work, we introduce a novel PEFT method: MoELoRA. We consider LoRA as Mixture of Experts (MoE), and to mitigate the random routing phenomenon observed in MoE, we propose the utilization of contrastive learning to encourage experts to learn distinct features. We conducted experiments on 11 tasks in math reasoning and common-sense reasoning benchmarks. With the same number of parameters, our approach outperforms LoRA significantly. In math reasoning, MoELoRA achieved an average performance that was 4.2% higher than LoRA, and demonstrated competitive performance compared to the 175B GPT-3.5 on several benchmarks.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12271 [pdf, other]

Secure Federated Learning Across Heterogeneous Cloud and High-Performance Computing Resources -- A Case Study on Federated Fine-tuning of LLaMA 2

Authors: Zilinghan Li, Shilan He, Pranshu Chaturvedi, Volodymyr Kindratenko, Eliu A Huerta, Kibaek Kim, Ravi Madduri

Abstract: Federated learning enables multiple data owners to collaboratively train robust machine learning models without transferring large or sensitive local datasets by only sharing the parameters of the locally trained models. In this paper, we elaborate on the design of our Advanced Privacy-Preserving Federated Learning (APPFL) framework, which streamlines end-to-end secure and reliable federated learning experiments across cloud computing facilities and high-performance computing resources by leveraging Globus Compute, a distributed function as a service platform, and Amazon Web Services. We further demonstrate the use case of APPFL in fine-tuning a LLaMA 2 7B model using several cloud resources and supercomputers.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.12219 [pdf, other]

Reformatted Alignment

Authors: Run-Ze Fan, Xuefeng Li, Haoyang Zou, Junlong Li, Shwai He, Ethan Chern, Jiewen Hu, Pengfei Liu

Abstract: The quality of finetuning data is crucial for aligning large language models (LLMs) with human values. Current methods to improve data quality are either labor-intensive or prone to factual errors caused by LLM hallucinations. This paper explores elevating the quality of existing instruction data to better align with human values, introducing a simple and effective approach named ReAlign, which reformats the responses of instruction data into a format that better aligns with pre-established criteria and the collated evidence. This approach minimizes human annotation, hallucination, and the difficulty in scaling, remaining orthogonal to existing alignment techniques. Experimentally, ReAlign significantly boosts the general alignment ability, math reasoning, factuality, and readability of the LLMs. Encouragingly, without introducing any additional data or advanced training techniques, and merely by reformatting the response, LLaMA-2-13B's mathematical reasoning ability on GSM8K can be improved from 46.77% to 56.63% in accuracy. Additionally, a mere 5% of ReAlign data yields a 67% boost in general alignment ability measured by the Alpaca dataset. This work highlights the need for further research into the science and mechanistic interpretability of LLMs. We have made the associated code and data publicly accessible to support future studies at https://github.com/GAIR-NLP/ReAlign.

Submitted 17 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Homepage: https://gair-nlp.github.io/ReAlign/

arXiv:2402.12099

Human Video Translation via Query Warping

Authors: Haiming Zhu, Yangyang Xu, Shengfeng He

Abstract: In this paper, we present QueryWarp, a novel framework for temporally coherent human motion video translation. Existing diffusion-based video editing approaches rely solely on key and value tokens to ensure temporal consistency, which sacrifices the preservation of local and structural regions. In contrast, we aim to consider complementary query priors by constructing the temporal correlations among query tokens from different frames. Initially, we extract appearance flows from source poses to capture continuous human foreground motion. Subsequently, during the denoising process of the diffusion model, we employ appearance flows to warp the previous frame's query token, aligning it with the current frame's query. This query warping imposes explicit constraints on the outputs of self-attention layers, effectively guaranteeing temporally coherent translation. We perform experiments on various human motion video translation tasks, and the results demonstrate that our QueryWarp framework surpasses state-of-the-art methods both qualitatively and quantitatively.

Submitted 21 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: This is not a complete paper and the methods and results have not been updated. We decided to withdraw and make further improvements

arXiv:2402.11139 [pdf, other]

LiGNN: Graph Neural Networks at LinkedIn

Authors: Fedor Borisyuk, Shihai He, Yunbo Ouyang, Morteza Ramezani, Peng Du, Xiaochen Hou, Chengming Jiang, Nitin Pasumarthy, Priya Bannur, Birjodh Tiwana, Ping Liu, Siddharth Dangi, Daqi Sun, Zhoutao Pei, Xiao Shi, Sirou Zhu, Qianqi Shen, Kuang-Hsuan Lee, David Stein, Baolei Li, Haichao Wei, Amol Ghoting, Souvik Ghosh

Abstract: In this paper, we present LiGNN, a deployed large-scale Graph Neural Networks (GNNs) Framework. We share our insight on the development and deployment of GNNs at large scale at LinkedIn. We present a set of algorithmic improvements to the quality of GNN representation learning including temporal graph architectures with long term losses, effective cold start solutions via graph densification, ID embeddings and multi-hop neighbor sampling. We explain how we built and sped up by 7x our large-scale training on LinkedIn graphs with adaptive sampling of neighbors, grouping and slicing of training data batches, specialized shared-memory queue and local gradient optimization. We summarize our deployment lessons and learnings gathered from A/B test experiments. The techniques presented in this work have contributed to approximate relative improvements of 1% of Job application hearing back rate, 2% Ads CTR lift, 0.5% of Feed engaged daily active users, 0.2% session lift and 0.1% weekly active user lift from people recommendation. We believe that this work can provide practical solutions and insights for engineers who are interested in applying Graph neural networks at large scale.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10940 [pdf, ps, other]

Neural machine translation of clinical procedure codes for medical diagnosis and uncertainty quantification

Authors: Pei-Hung Chung, Shuhan He, Norawit Kijpaisalratana, Abdel-badih el Ariss, Byung-Jun Yoon

Abstract: A Clinical Decision Support System (CDSS) is designed to enhance clinician decision-making by combining system-generated recommendations with medical expertise. Given the high costs, intensive labor, and time-sensitive nature of medical treatments, there is a pressing need for efficient decision support, especially in complex emergency scenarios. In these scenarios, where information can be limited, an advanced CDSS framework that leverages AI (artificial intelligence) models to effectively reduce diagnostic uncertainty has utility. Such an AI-enabled CDSS framework with quantified uncertainty promises to be practical and beneficial in the demanding context of real-world medical care. In this study, we introduce the concept of Medical Entropy, quantifying uncertainties in patient outcomes predicted by neural machine translation based on the ICD-9 code of procedures. Our experimental results not only show strong correlations between procedure and diagnosis sequences based on the simple ICD-9 code but also demonstrate the promising capacity to model trends of uncertainties during hospitalizations through a data-driven approach.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.10464 [pdf, other]

FedKit: Enabling Cross-Platform Federated Learning for Android and iOS

Authors: Sichang He, Beilong Tang, Boyan Zhang, Jiaoqi Shao, Xiaomin Ouyang, Daniel Nata Nugraha, Bing Luo

Abstract: We present FedKit, a federated learning (FL) system tailored for cross-platform FL research on Android and iOS devices. FedKit pipelines cross-platform FL development by enabling model conversion, hardware-accelerated training, and cross-platform model aggregation. Our FL workflow supports flexible machine learning operations (MLOps) in production, facilitating continuous model delivery and training. We have deployed FedKit in a real-world use case for health data analysis on university campuses, demonstrating its effectiveness. FedKit is open-source at https://github.com/FedCampus/FedKit.

Submitted 16 February, 2024; originally announced February 2024.

Comments: This work has been accepted for demonstration on IEEE International Conference on Computer Communications (INFOCOM) 2024

arXiv:2402.10151 [pdf, other]

ControlLM: Crafting Diverse Personalities for Language Models

Authors: Yixuan Weng, Shizhu He, Kang Liu, Shengping Liu, Jun Zhao

Abstract: As language models continue to scale in size and capability, they display an array of emerging behaviors, both beneficial and concerning. This heightens the need to control model behaviors. We hope to be able to control the personality traits of language models at inference time so as to have various character features, on top of which the requirements of different types of tasks can be met. Personality is a higher-level and more abstract behavioral representation for language models. We introduce ControlLM, which leverages differential activation patterns, derived from contrasting behavioral prompts in the model's latent space, to influence the model's personality traits at inference. This approach allows for the precise, real-time adjustment of model behavior. First, we demonstrate ControlLM's capacity to elicit diverse persona behaviors without any training, while precision control allows personality traits to closely match average human values. Subsequently, we showcase improved reasoning and question answering through selective amplification of beneficial attributes like conscientiousness and friendliness. We hope that this work will inspire research on controlling human-like behaviors of language models and provide insights for future research. Our code is publicly available at: https://github.com/wengsyx/ControlLM.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 17 pages

arXiv:2402.10110 [pdf, other]

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning

Authors: Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Jiuxiang Gu, Tianyi Zhou

Abstract: Instruction tuning is critical to large language models (LLMs) for achieving better instruction following and task adaptation capabilities but its success heavily relies on the training data quality. Many recent methods focus on improving the data quality but often overlook the compatibility of the data with the student model being finetuned. This paper introduces Selective Reflection-Tuning, a novel paradigm that synergizes a teacher LLM's reflection and introspection for improving existing data quality with the data selection capability of the student LLM, to automatically refine existing instruction-tuning data. This teacher-student collaboration produces high-quality and student-compatible instruction-response pairs, resulting in sample-efficient instruction tuning and LLMs of superior performance. Selective Reflection-Tuning is a data augmentation and synthesis that generally improves LLM finetuning and self-improvement without collecting brand-new data. We apply our method to Alpaca and WizardLM data and achieve much stronger and top-tier 7B and 13B LLMs.

Submitted 7 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: ACL2024 (findings), Camera-ready

arXiv:2402.07939 [pdf, other]

UFO: A UI-Focused Agent for Windows OS Interaction

Authors: Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

Abstract: We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision. UFO employs a dual-agent framework to meticulously observe and analyze the graphical user interface (GUI) and control information of Windows applications. This enables the agent to seamlessly navigate and operate within individual applications and across them to fulfill user requests, even when spanning multiple applications. The framework incorporates a control interaction module, facilitating action grounding without human intervention and enabling fully automated execution. Consequently, UFO transforms arduous and time-consuming processes into simple tasks achievable solely through natural language commands. We conducted testing of UFO across 9 popular Windows applications, encompassing a variety of scenarios reflective of users' daily usage. The results, derived from both quantitative metrics and real-case studies, underscore the superior effectiveness of UFO in fulfilling user requests. To the best of our knowledge, UFO stands as the first UI agent specifically tailored for task completion within the Windows OS environment. The open-source code for UFO is available on https://github.com/microsoft/UFO.

Submitted 23 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05798 [pdf, other]

Visual Harmony: Text-Visual Interplay in Circular Infographics

Authors: Shuqi He, Yuqing Chen, Yuxin Xia, Yichun Li, Hai-Ning Liang, Lingyun Yu

Abstract: Infographics are visual representations designed for efficient and effective communication of data and knowledge. One crucial aspect of infographic design is the interplay between text and visual elements, particularly in circular visualizations where the textual descriptions can either be embedded within the graphics or placed adjacent to the visual representation. While several studies have examined text layout design in visualizations in general, the text-visual interplay in infographics and its subsequent perceptual effects remain underexplored. To address this, our study investigates how varying text placement and descriptiveness impact pleasantness, comprehension and overall memorability in the infographics viewing experience. We recruited 30 participants and presented them with a collection of 15 infographics across a diverse set of topics, including media and public events, health and nutrition, science and research, and sustainability. The text placement (embed, side-to-side) and descriptiveness (simplistic, normal, descriptive) were systematically manipulated, resulting in a total of six experimental conditions. Our key findings indicate that text placement can significantly influence the memorability of infographics, whereas descriptiveness can significantly impact the pleasantness of the viewing experience. Embedding text placement and simplistic text can potentially contribute to more effective infographic designs. These results offer valuable insights for infographic designers, contributing to the creation of more effective and memorable visual representations.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05438 [pdf, other]

Penalized spline estimation of principal components for sparse functional data: rates of convergence

Authors: Shiyuan He, Jianhua Z. Huang, Kejun He

Abstract: This paper gives a comprehensive treatment of the convergence rates of penalized spline estimators for simultaneously estimating several leading principal component functions, when the functional data is sparsely observed. The penalized spline estimators are defined as the solution of a penalized empirical risk minimization problem, where the loss function belongs to a general class of loss functions motivated by the matrix Bregman divergence, and the penalty term is the integrated squared derivative. The theory reveals that the asymptotic behavior of penalized spline estimators depends on the interesting interplay between several factors, i.e., the smoothness of the unknown functions, the spline degree, the spline knot number, the penalty order, and the penalty parameter. The theory also classifies the asymptotic behavior into seven scenarios and characterizes whether and how the minimax optimal rates of convergence are achievable in each scenario.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.01723 [pdf, other]

An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios

Authors: Zongjie Li, Wenying Qiu, Pingchuan Ma, Yichen Li, You Li, Sijia He, Baozheng Jiang, Shuai Wang, Weixi Gu

Abstract: Recent years have witnessed the rapid development of large language models (LLMs) in various domains. To better serve the large number of Chinese users, many commercial vendors in China have adopted localization strategies, training and providing local LLMs specifically customized for Chinese users. Furthermore, looking ahead, one of the key future applications of LLMs will be practical deployment in industrial production by enterprises and users in those sectors. However, the accuracy and robustness of LLMs in industrial scenarios have not been well studied. In this paper, we present a comprehensive empirical study on the accuracy and robustness of LLMs in the context of the Chinese industrial production area. We manually collected 1,200 domain-specific problems from 8 different industrial sectors to evaluate LLM accuracy. Furthermore, we designed a metamorphic testing framework containing four industrial-specific stability categories with eight abilities, totaling 13,631 questions with variants to evaluate LLM robustness. In total, we evaluated 9 different LLMs developed by Chinese vendors, as well as four different LLMs developed by global vendors. Our major findings include: (1) Current LLMs exhibit low accuracy in Chinese industrial contexts, with all LLMs scoring less than 0.6. (2) The robustness scores vary across industrial sectors, and local LLMs overall perform worse than global ones. (3) LLM robustness differs significantly across abilities. Global LLMs are more robust under logical-related variants, while advanced local LLMs perform better on problems related to understanding Chinese industrial terminology. Our study results provide valuable guidance for understanding and promoting the industrial domain capabilities of LLMs from both development and industrial enterprise perspectives. The results further motivate possible research directions and tooling support.

Submitted 26 January, 2024; originally announced February 2024.

arXiv:2402.00530 [pdf, other]

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

Authors: Ming Li, Yong Zhang, Shwai He, Zhitao Li, Hongyu Zhao, Jianzong Wang, Ning Cheng, Tianyi Zhou

Abstract: Instruction tuning is critical to improve LLMs but usually suffers from low-quality and redundant data. Data filtering for instruction tuning has proved important in improving both the efficiency and performance of the tuning process. But it also leads to extra cost and computation due to the involvement of LLMs in this process. To reduce the filtering cost, we study Superfiltering: Can we use a smaller and weaker model to select data for finetuning a larger and stronger model? Despite the performance gap between weak and strong language models, we find their highly consistent capability to perceive instruction difficulty and data selection results. This enables us to use a much smaller and more efficient model to filter the instruction data used to train a larger language model. Not only does it largely speed up the data filtering, but the filtered-data-finetuned LLM achieves even better performance on standard benchmarks. Extensive experiments validate the efficacy and efficiency of our approach.

Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: ACL2024 main, Camera-ready

arXiv:2402.00268 [pdf, other]

Relation between timelike and spacelike entanglement entropy

Authors: Wu-zhong Guo, Song He, Yu-Xuan Zhang

Abstract: In this study, we establish a connection between timelike and spacelike entanglement entropy. Specifically, for a diverse range of states, the timelike entanglement entropy is uniquely determined by a linear combination of the spacelike entanglement entropy and its first-order temporal derivative. This framework reveals that the imaginary component of the timelike entanglement entropy primarily originates from the non-commutativity between the twist operator and its first-order temporal derivative. Furthermore, we analyze the constraints of this relation and highlight the possible extension to accommodate more complex state configurations.

Submitted 31 January, 2024; originally announced February 2024.

Comments: 5+8 pages, 1 figure

arXiv:2401.15852 [pdf, ps, other]

The Spectral base and quotients of bounded symmetric domains

Authors: Siqi He, Jie Liu, Ngaiming Mok

Abstract: In this article, we explore Higgs bundles on a projective manifold $X$, focusing on their spectral bases, a concept introduced by T.Chen and B.Ngô. The spectral base is a specific closed subscheme within the space of symmetric differentials. We observe that if the spectral base vanishes, then any reductive representation $ρ: π_1(X) \to \text{GL}_r(\mathbb{C})$ is both rigid and integral. Additionally, we prove that for $X=Ω/Γ$, a quotient of a bounded symmetric domain $Ω$ of rank at least $2$ by a torsion-free cocompact irreducible lattice $Γ$, the spectral base indeed vanishes, which generalizes a result of B.Klingler.

Submitted 28 January, 2024; originally announced January 2024.

Comments: 21 pages

MSC Class: 14J60; 53C35

arXiv:2401.15123 [pdf, other]

Large Language Model Guided Knowledge Distillation for Time Series Anomaly Detection

Authors: Chen Liu, Shibo He, Qihang Zhou, Shizhong Li, Wenchao Meng

Abstract: Self-supervised methods have gained prominence in time series anomaly detection due to the scarcity of available annotations. Nevertheless, they typically demand extensive training data to acquire a generalizable representation map, which conflicts with scenarios of a few available samples, thereby limiting their performance. To overcome the limitation, we propose AnomalyLLM, a knowledge distillation-based time series anomaly detection approach where the student network is trained to mimic the features of the large language model (LLM)-based teacher network that is pretrained on large-scale datasets. During the testing phase, anomalies are detected when the discrepancy between the features of the teacher and student networks is large. To prevent the student network from learning the teacher network's feature of anomalous samples, we devise two key strategies. 1) Prototypical signals are incorporated into the student network to consolidate the normal feature extraction. 2) We use synthetic anomalies to enlarge the representation gap between the two networks. AnomalyLLM demonstrates state-of-the-art performance on 15 datasets, improving accuracy by at least 14.5% in the UCR dataset.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 12 pages, 5 figures
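
The test-time rule in the AnomalyLLM abstract above (flag a sample when teacher and student features diverge) can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the fixed threshold, and the function names `discrepancy_score` and `detect` are all assumptions made for illustration, and the feature vectors are taken as given rather than produced by real teacher or student networks.

```python
import math
from typing import List, Sequence


def discrepancy_score(teacher_feat: Sequence[float],
                      student_feat: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    if len(teacher_feat) != len(student_feat):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(teacher_feat, student_feat)))


def detect(teacher_feats: Sequence[Sequence[float]],
           student_feats: Sequence[Sequence[float]],
           threshold: float) -> List[bool]:
    """Flag each sample whose teacher/student discrepancy exceeds the threshold."""
    return [
        discrepancy_score(t, s) > threshold
        for t, s in zip(teacher_feats, student_feats)
    ]
```

In this sketch a student that mimics the teacher well on normal data yields small scores, so only samples where the two networks disagree strongly are flagged; how the threshold is chosen is left open here.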

arXiv:2401.13714 [pdf, other]

Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers

Authors: Wei Tao, Shenglin He, Kai Lu, Xiaoyang Qu, Guokuan Li, Jiguang Wan, Jianzong Wang, Jing Xiao

Abstract: Deploying neural networks on microcontroller units (MCUs) presents substantial challenges due to their constrained computation and memory resources. Previous research has explored patch-based inference as a strategy to conserve memory without sacrificing model accuracy. However, this technique suffers from severe redundant computation overhead, leading to a substantial increase in execution latency. A feasible solution to address this issue is mixed-precision quantization, but it faces the challenges of accuracy degradation and a time-consuming search. In this paper, we propose QuantMCU, a novel patch-based inference method that utilizes value-driven mixed-precision quantization to reduce redundant computation. We first utilize value-driven patch classification (VDPC) to maintain the model accuracy. VDPC classifies patches into two classes based on whether they contain outlier values. For patches containing outlier values, we apply 8-bit quantization to the feature maps on the dataflow branches that follow. In addition, for patches without outlier values, we utilize value-driven quantization search (VDQS) on the feature maps of their following dataflow branches to reduce search time. Specifically, VDQS introduces a novel quantization search metric that takes into account both computation and accuracy, and it employs entropy as an accuracy representation to avoid additional training. VDQS also adopts an iterative approach to determine the bitwidth of each feature map to further accelerate the search process. Experimental results on real-world MCU devices show that QuantMCU can reduce computation by 2.2x on average while maintaining comparable model accuracy compared to the state-of-the-art patch-based inference methods.

Submitted 23 January, 2024; originally announced January 2024.

Comments: Accepted by the 27th Design, Automation and Test in Europe Conference (DATE 2024)

arXiv:2401.11235 [pdf, other]

TreeMIL: A Multi-instance Learning Framework for Time Series Anomaly Detection with Inexact Supervision

Authors: Chen Liu, Shibo He, Haoyu Liu, Shizhong Li

Abstract: Time series anomaly detection (TSAD) plays a vital role in various domains such as healthcare, networks, and industry. Considering labels are crucial for detection but difficult to obtain, we turn to TSAD with inexact supervision: only series-level labels are provided during the training phase, while point-level anomalies are predicted during the testing phase. Previous works follow a traditional multi-instance learning (MIL) approach, which focuses on encouraging high anomaly scores at individual time steps. However, time series anomalies are not limited to individual point anomalies; they can also be collective anomalies, typically exhibiting abnormal patterns over subsequences. To address the challenge of collective anomalies, in this paper, we propose a tree-based MIL framework (TreeMIL). We first adopt an N-ary tree structure to divide the entire series into multiple nodes, where nodes at different levels represent subsequences with different lengths. Then, the subsequence features are extracted to determine the presence of collective anomalies. Finally, we calculate point-level anomaly scores by aggregating features from nodes at different levels. Experiments conducted on seven public datasets and eight baselines demonstrate that TreeMIL achieves an average 32.3% improvement in F1-score compared to previous state-of-the-art methods. The code is available at https://github.com/fly-orange/TreeMIL.

Submitted 20 January, 2024; originally announced January 2024.

Comments: This paper has been accepted by IEEE ICASSP 2024

arXiv:2401.09991 [pdf, ps, other]

doi 10.1007/JHEP04(2024)138

Irrelevant and marginal deformed BMS field theories

Authors: Song He, Xin-Cheng Mao

Abstract: In this study, we investigate various deformations within the framework of Bondi-van der Burg-Metzner-Sachs invariant field theory (BMSFT). Specifically, we explore the impact of Bondi-van der Burg-Metzner-Sachs (BMS) symmetry on the theory by introducing key deformations, namely, $T \overline{T}$, $JT_μ$, and $\sqrt{T \overline{T}}$ deformations. In the context of generic seed theories possessing BMS symmetry, we derive the first-order correction of correlation functions using the systematic application of BMS symmetry Ward identities. However, it is worth noting that higher-order corrections are intricately dependent on the specific characteristics of the seed theories. To illustrate our findings, we select the BMS free scalar and free fermion as representative seed theories. We then proceed to analytically determine the deformed action by solving the nontrivial flow equations. Additionally, we extend our analysis to include second-order deformations within these deformed theories.

Submitted 27 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: 54 pages, 0 figure

Journal ref: JHEP 04 (2024) 138

arXiv:2401.05483 [pdf, other]

NLSM $\subset$ Tr$(φ^3)$

Authors: Nima Arkani-Hamed, Qu Cao, Jin Dong, Carolina Figueiredo, Song He

Abstract: Scattering amplitudes for the simplest theory of colored scalar particles - the Tr($Φ^3$) theory - have recently been the subject of active investigations. In this letter we describe an unanticipated wider implication of this work: the Tr($Φ^3$) theory secretly contains Non-linear Sigma Model (NLSM) amplitudes to all loop orders. The NLSM amplitudes are obtained from Tr$(Φ^3)$ amplitudes by a unique shift of kinematic variables. We show that this shifted kinematics produces amplitudes for a cubic theory with a linear term in potential, with extrema spontaneously breaking $U(N) \to U(N-k) \times U(k)$. The Goldstone amplitudes for this theory coincide with those of pions in the $U(N) \times U(N) \to U(N)$ chiral Lagrangian to all orders in the planar limit. We also give a purely on-shell understanding of this correspondence, showing integrands defined by the kinematic shifts have the correct residues on poles and appropriately produce the Adler zero. Finally, we discuss how similar kinematic shifts produce certain infinite classes of mixed amplitudes of pions and Tr($Φ^3$) scalars, most of which are not interpretable from the Lagrangian description.

Submitted 15 April, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: 10 pages, 13 figures. Addition of new material including a derivation of the results from a simple Lagrangian, identifying the symmetry breaking pattern, as well as some further discussions

arXiv:2401.04723 [pdf, other]

Spatio-temporal data fusion for the analysis of in situ and remote sensing data using the INLA-SPDE approach

Authors: Shiyu He, Samuel W. K. Wong

Abstract: We propose a Bayesian hierarchical model to address the challenge of spatial misalignment in spatio-temporal data obtained from in situ and satellite sources. The model is fit using the INLA-SPDE approach, which provides efficient computation. Our methodology combines the different data sources in a "fusion" model via the construction of projection matrices in both spatial and temporal domains. Through simulation studies, we demonstrate that the fusion model has superior performance in prediction accuracy across space and time compared to standalone "in situ" and "satellite" models based on only in situ or satellite data, respectively. The fusion model also generally outperforms the standalone models in terms of parameter inference. Such a modeling approach is motivated by environmental problems, and our specific focus is on the analysis and prediction of harmful algae bloom (HAB) events, where the convention is to conduct separate analyses based on either in situ samples or satellite images. A real data analysis shows that the proposed model is a necessary step towards a unified characterization of bloom dynamics and identifying the key drivers of HAB events.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 23 pages, 7 figures

arXiv:2401.02880 [pdf, other]

Lotto: Secure Participant Selection against Adversarial Servers in Federated Learning

Authors: Zhifeng Jiang, Peng Ye, Shiqi He, Wei Wang, Ruichuan Chen, Bo Li

Abstract: In Federated Learning (FL), common privacy-enhancing techniques, such as secure aggregation and distributed differential privacy, rely on the critical assumption of an honest majority among participants to withstand various attacks. In practice, however, servers are not always trusted, and an adversarial server can strategically select compromised clients to create a dishonest majority, thereby undermining the system's security guarantees. In this paper, we present Lotto, an FL system that addresses this fundamental, yet underexplored issue by providing secure participant selection against an adversarial server. Lotto supports two selection algorithms: random and informed. To ensure random selection without a trusted server, Lotto enables each client to autonomously determine their participation using verifiable randomness. For informed selection, which is more vulnerable to manipulation, Lotto approximates the algorithm by employing random selection within a refined client pool. Our theoretical analysis shows that Lotto effectively aligns the proportion of server-selected compromised participants with the base rate of dishonest clients in the population. Large-scale experiments further reveal that Lotto achieves time-to-accuracy performance comparable to that of insecure selection methods, indicating a low computational overhead for secure selection.

Submitted 6 March, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: This article has been accepted to USENIX Security '24

arXiv:2401.01718 [pdf]

RHDLPP: A multigroup radiation hydrodynamics code for laser-produced plasmas

Authors: Qi Min, Ziyang Xu, Siqi He, Haidong Lu, Xingbang Liu, Ruizi Shen, Yanhong Wu, Qikun Pan, Chongxiao Zhao, Fei Chen, Maogen Su, Chenzhong Dong

Abstract: We introduce the RHDLPP, a flux-limited multigroup radiation hydrodynamics numerical code designed for simulating laser-produced plasmas in diverse environments. The code bifurcates into two packages: RHDLPP-LTP for low-temperature plasmas generated by moderate-intensity nanosecond lasers, and RHDLPP-HTP for high-temperature, high-density plasmas formed by high-intensity laser pulses. The core radiation hydrodynamic equations are resolved in the Eulerian frame, employing an operator-split method. This method decomposes the solution into two substeps: first, the explicit resolution of the hyperbolic subsystems integrating radiation and fluid dynamics, and second, the implicit treatment of the parabolic part comprising stiff radiation diffusion, heat conduction, and energy exchange. Laser propagation and energy deposition are modeled through a hybrid approach, combining geometrical optics ray-tracing in sub-critical plasma regions with a one-dimensional solution of the Helmholtz wave equation in super-critical areas. The thermodynamic states are ascertained using an equation of state, based on either the real gas approximation or the quotidian equation of state (QEOS). Additionally, RHDLPP includes RHDLPP-SpeIma3D, a three-dimensional spectral simulation post-processing module, for generating both temporally-spatially resolved and time-integrated spectra and imaging, facilitating direct comparisons with experimental data.
The paper showcases a series of verification tests to establish the code's accuracy and efficiency, followed by application cases, including simulations of laser-produced aluminum (Al) plasmas, pre-pulse-induced target deformation of tin (Sn) microdroplets relevant to extreme ultraviolet lithography light sources, and varied imaging and spectroscopic simulations.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.00667 [pdf, other]

Channelling Multimodality Through a Unimodalizing Transport: Warp-U Sampler and Stochastic Bridge Sampling

Authors: Fei Ding, David E. Jones, Shiyuan He, Xiao-Li Meng

Abstract: Monte Carlo integration is fundamental in scientific and statistical computation, but requires reliable samples from the target distribution, which poses a substantial challenge in the case of multi-modal distributions. Existing methods often involve time-consuming tuning, and typically lack tailored estimators for efficient use of the samples. This paper adapts the Warp-U transformation [Wang et al., 2022] to form a multi-modal sampling strategy called Warp-U sampling. It constructs a stochastic map to transport a multi-modal density into a uni-modal one, and subsequently inverts the transport but with new stochasticity injected. For efficient use of the samples for normalising constant estimation, we propose (i) an unbiased estimation scheme based on coupled chains, where the Warp-U sampling is used to reduce the coupling time; and (ii) a stochastic Warp-U bridge sampling estimator, which improves its deterministic counterpart given in Wang et al. [2022]. Our overall approach requires less tuning and is easier to apply than common alternatives. Theoretically, we establish the ergodicity of our sampling algorithm and that our stochastic Warp-U bridge sampling estimator has greater (asymptotic) precision per CPU second compared to the Warp-U bridge estimator of Wang et al. [2022] under practical conditions. The advantages and current limitations of our approach are demonstrated through simulation studies and an application to exoplanet detection.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2401.00041 [pdf, other]

Scalar-Scaffolded Gluons and the Combinatorial Origins of Yang-Mills Theory

Authors: Nima Arkani-Hamed, Qu Cao, Jin Dong, Carolina Figueiredo, Song He

Abstract: We present a new formulation for Yang-Mills scattering amplitudes in any number of dimensions and at any loop order, based on the same combinatorial and binary-geometric ideas in kinematic space recently used to give an all-order description of Tr $φ^3$ theory. We propose that in a precise sense the amplitudes for a suitably "stringy" form of these two theories are identical, up to a simple shift of kinematic variables. This connection is made possible by describing the amplitudes for $n$ gluons via a "scalar scaffolding", arising from the scattering of $2n$ colored scalars coming in $n$ distinct pairs of flavors fusing to produce the gluons. Fundamental properties of the "$u$-variables", describing the "binary geometry" for surfaces appearing in the topological expansion, magically guarantee that the kinematically shifted Tr $φ^3$ amplitudes satisfy the physical properties needed to be interpreted as scaffolded gluons. These include multilinearity, gauge invariance, and factorization on tree- and loop- level gluon cuts. Our "stringy" scaffolded gluon amplitudes coincide with amplitudes in the bosonic string for extra-dimensional gluon polarizations at tree-level, but differ (and are simpler) at loop-level. We provide many checks on our proposal, including matching non-trivial leading singularities through two loops.
The simple counting problem underlying the $u$ variables autonomously "knows" about everything needed to convert colored scalar to gluon amplitudes, exposing a striking "discovery" of Yang-Mills amplitudes from elementary combinatorial ideas in kinematic space.

Submitted 29 December, 2023; originally announced January 2024.

Comments: 92 pages, 37 figures

arXiv:2312.17591 [pdf, other]

Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Authors: Dongfang Li, Baotian Hu, Qingcai Chen, Shan He

Abstract: Feature attribution methods highlight the important input tokens as explanations to model predictions, which have been widely applied to deep neural networks towards trustworthy AI. However, recent works show that explanations provided by these methods face challenges of being faithful and robust. In this paper, we propose a method with Robustness improvement and Explanation Guided training toward… ▽ More Feature attribution methods highlight the important input tokens as explanations to model predictions, which have been widely applied to deep neural networks towards trustworthy AI. However, recent works show that explanations provided by these methods face challenges of being faithful and robust. In this paper, we propose a method with Robustness improvement and Explanation Guided training towards more faithful EXplanations (REGEX) for text classification. First, we improve model robustness by input gradient regularization technique and virtual adversarial training. Secondly, we use salient ranking to mask noisy tokens and maximize the similarity between model attention and feature attribution, which can be seen as a self-training procedure without importing other external information. We conduct extensive experiments on six datasets with five attribution methods, and also evaluate the faithfulness in the out-of-domain setting. The results show that REGEX improves fidelity metrics of explanations in all settings and further achieves consistent gains based on two randomization tests. Moreover, we show that using highlight explanations produced by REGEX to train select-then-predict models results in comparable task performance to the end-to-end method. △ Less

Submitted 29 December, 2023; originally announced December 2023.

arXiv:2312.16282 [pdf, other]

Hidden zeros for particle/string amplitudes and the unity of colored scalars, pions and gluons

Authors: Nima Arkani-Hamed, Qu Cao, Jin Dong, Carolina Figueiredo, Song He

Abstract: Recent years have seen the emergence of a new understanding of scattering amplitudes in the simplest theory of colored scalar particles - the Tr$(φ^3)$ theory - based on combinatorial and geometric ideas in the kinematic space of scattering data. In this paper we report a surprise: far from the toy model it appears to be, the ''stringy'' Tr$(φ^3)$ amplitudes secretly contain the scattering amplitudes for pions, as well as non-supersymmetric gluons, in any number of dimensions. The amplitudes for the different theories are given by one and the same function, related by a simple shift of the kinematics. This discovery was spurred by another fundamental observation: the tree-level Tr$(φ^3)$ field theory amplitudes have a hidden pattern of zeros when a special set of non-planar Mandelstam invariants is set to zero. Furthermore, near these zeros, the amplitudes simplify, by factoring into a non-trivial product of smaller amplitudes. Remarkably the amplitudes for pions and gluons are observed to also vanish in the same kinematical locus. These properties further generalize to the ''stringy'' Tr$(φ^3)$ amplitudes. There is a unique shift of the kinematic data that preserves the zeros, and this shift is precisely the one that unifies colored scalars, pions, and gluons into a single object. We will focus in this paper on explaining the hidden zeros and factorization properties and the connection between all the colored theories, working for simplicity at tree-level.
Subsequent works will describe this new formulation for the Non-linear Sigma Model and non-supersymmetric Yang-Mills theory, at all loop orders.

Submitted 1 May, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: Added reference to early work of Gliozzi et al. giving a different derivation of zeros for string amplitudes from monodromy relations, corrected typos

arXiv:2312.16218 [pdf, other]

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

Authors: Christian Simon, Sen He, Juan-Manuel Perez-Rua, Mengmeng Xu, Amine Benhalloum, Tao Xiang

Abstract: Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods addressing it through diffusion models still rely on scene-specific optimization, constraining their generalization capability. To overcome the limitations of existing approaches regarding generalization and consistency, we introduce a novel neural rendering technique. Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks. Specifically, our method builds neural encoding volumes from generated multi-view inputs. We adjust the weights of the SDF network conditioned on an input image at test-time to allow model adaptation to novel scenes in a feed-forward manner via HyperNetworks. To mitigate artifacts derived from the synthesized views, we propose the use of a volume transformer module to improve the aggregation of image features instead of processing each viewpoint separately. Through our proposed method, dubbed as Hyper-VolTran, we avoid the bottleneck of scene-specific optimization and maintain consistency across the images generated from multiple viewpoints. Our experiments show the advantages of our proposed approach with consistent results and rapid generation.

Submitted 5 January, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

arXiv:2312.15633 [pdf, other]

MuLA-GAN: Multi-Level Attention GAN for Enhanced Underwater Visibility

Authors: Ahsan Baidar Bakht, Zikai Jia, Muhayy ud Din, Waseem Akram, Lyes Saad Soud, Lakmal Seneviratne, Defu Lin, Shaoming He, Irfan Hussain

Abstract: The underwater environment presents unique challenges, including color distortions, reduced contrast, and blurriness, hindering accurate analysis. In this work, we introduce MuLA-GAN, a novel approach that leverages the synergistic power of Generative Adversarial Networks (GANs) and Multi-Level Attention mechanisms for comprehensive underwater image enhancement. The integration of Multi-Level Attention within the GAN architecture significantly enhances the model's capacity to learn discriminative features crucial for precise image restoration. By selectively focusing on relevant spatial and multi-level features, our model excels in capturing and preserving intricate details in underwater imagery, essential for various applications. Extensive qualitative and quantitative analyses on diverse datasets, including UIEB test dataset, UIEB challenge dataset, U45, and UCCS dataset, highlight the superior performance of MuLA-GAN compared to existing state-of-the-art methods. Experimental evaluations on a specialized dataset tailored for bio-fouling and aquaculture applications demonstrate the model's robustness in challenging environmental conditions. On the UIEB test dataset, MuLA-GAN achieves exceptional PSNR (25.59) and SSIM (0.893) scores, surpassing Water-Net, the second-best model, with scores of 24.36 and 0.885, respectively.
This work not only addresses a significant research gap in underwater image enhancement but also underscores the pivotal role of Multi-Level Attention in enhancing GANs, providing a novel and comprehensive framework for restoring underwater image quality.

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.15484 [pdf, other]

On constructibility of AdS supergluon amplitudes

Authors: Qu Cao, Song He, Yichao Tang

Abstract: We prove that all tree-level $n$-point supergluon (scalar) amplitudes in AdS$_5$ can be recursively constructed, using factorization and flat-space limit. Our method is greatly facilitated by a natural R-symmetry basis for planar color-ordered amplitudes, which reduces the latter to "partial amplitudes" with simpler pole structures and factorization properties. Given the $n$-point scalar amplitude, we first extract spinning amplitudes with $n{-}2$ scalars and one gluon by imposing "gauge invariance", and then use a special "no-gluon kinematics" to determine the $(n{+}1)$-point scalar amplitude completely (which in turn contains the $n$-point single-gluon amplitude). Explicit results of up to 8-point scalar amplitudes and up to 6-point single-gluon amplitudes are included as supplemental materials.

Submitted 14 January, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

Comments: 5 pages, 4 figures, major revision from v2 including new ancillary file

arXiv:2312.13875 [pdf, other]

Best Arm Identification in Batched Multi-armed Bandit Problems

Authors: Shengyu Cao, Simai He, Ruoqing Jiang, Jin Xu, Hongsong Yuan

Abstract: Recently, the multi-armed bandit problem has arisen in many real-life scenarios where arms must be sampled in batches, due to the limited time the agent can wait for feedback. Such applications include biological experimentation and online marketing. The problem is further complicated when the number of arms is large and the number of batches is small. We consider pure exploration in a batched multi-armed bandit problem. We introduce a general linear programming framework that can incorporate objectives of different theoretical settings in best arm identification. The linear program leads to a two-stage algorithm that can achieve good theoretical properties. We demonstrate by numerical studies that the algorithm also has good performance compared to certain UCB-type or Thompson sampling methods.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.11988 [pdf, other]

Xpert: Empowering Incident Management with Query Recommendations via Large Language Models

Authors: Yuxuan Jiang, Chaoyun Zhang, Shilin He, Zhihao Yang, Minghua Ma, Si Qin, Yu Kang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

Abstract: Large-scale cloud systems play a pivotal role in modern IT infrastructure. However, incidents occurring within these systems can lead to service disruptions and adversely affect user experience. To swiftly resolve such incidents, on-call engineers depend on crafting domain-specific language (DSL) queries to analyze telemetry data. However, writing these queries can be challenging and time-consuming. This paper presents a thorough empirical study on the utilization of queries of KQL, a DSL employed for incident management in a large-scale cloud management system at Microsoft. The findings obtained underscore the importance and viability of KQL query recommendation to enhance incident management. Building upon these valuable insights, we introduce Xpert, an end-to-end machine learning framework that automates the KQL recommendation process. By leveraging historical incident data and large language models, Xpert generates customized KQL queries tailored to new incidents. Furthermore, Xpert incorporates a novel performance metric called Xcore, enabling a thorough evaluation of query quality from three comprehensive perspectives. We conduct extensive evaluations of Xpert, demonstrating its effectiveness in offline settings. Notably, we deploy Xpert in the real production environment of a large-scale incident management system at Microsoft, validating its efficiency in supporting incident management.
To the best of our knowledge, this paper represents the first empirical study of its kind, and Xpert stands as a pioneering DSL query recommendation framework designed for incident management.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted as a research paper at ICSE 2024

arXiv:2312.11549 [pdf, other]

Label-Free Multivariate Time Series Anomaly Detection

Authors: Qihang Zhou, Shibo He, Haoyu Liu, Jiming Chen, Wenchao Meng

Abstract: Anomaly detection in multivariate time series (MTS) has been widely studied in the one-class classification (OCC) setting. The training samples in OCC are assumed to be normal, which is difficult to guarantee in practical situations. Such a case may degrade the performance of OCC-based anomaly detection methods which fit the training distribution as the normal distribution. In this paper, we propose MTGFlow, an unsupervised anomaly detection approach for MTS anomaly detection via dynamic Graph and entity-aware normalizing Flow. MTGFlow first estimates the density of the entire training samples and then identifies anomalous instances based on the density of the test samples within the fitted distribution. This relies on a widely accepted assumption that anomalous instances exhibit more sparse densities than normal ones, with no reliance on the clean training dataset. However, it is intractable to directly estimate the density due to complex dependencies among entities and their diverse inherent characteristics. To mitigate this, we utilize the graph structure learning model to learn interdependent and evolving relations among entities, which effectively captures complex and accurate distribution patterns of MTS. In addition, our approach incorporates the unique characteristics of individual entities by employing an entity-aware normalizing flow. This enables us to represent each entity as a parameterized normal distribution.
Furthermore, considering that some entities present similar characteristics, we propose a cluster strategy that capitalizes on the commonalities of entities with similar characteristics, resulting in more precise and detailed density estimation. We refer to this cluster-aware extension as MTGFlow_cluster. Extensive experiments are conducted on six widely used benchmark datasets, in which MTGFlow and MTGFlow_cluster demonstrate their superior detection performance.

Submitted 6 February, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2208.02108

arXiv:2312.10979 [pdf, ps, other]

3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications

Authors: Shulin He, Jinjiang Liu, Hao Li, Yang Yang, Fei Chen, Xueliang Zhang

Abstract: Target speaker extraction (TSE) aims to isolate a specific voice from multiple mixed speakers relying on a registered sample. Since voiceprint features usually vary greatly, current end-to-end neural networks require large model parameters, which are computationally intensive and impractical for real-time applications, especially on resource-constrained platforms. In this paper, we address the TSE task using a microphone array and introduce a novel three-stage solution that systematically decouples the process: First, a neural network is trained to estimate the direction of the target speaker. Second, with the direction determined, the Generalized Sidelobe Canceller (GSC) is used to extract the target speech. Third, an Inplace Convolutional Recurrent Neural Network (ICRN) acts as a denoising post-processor, refining the GSC output to yield the final separated speech. Our approach delivers superior performance while drastically reducing computational load, setting a new standard for efficient real-time target speaker extraction.

Submitted 4 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted to ICASSP 2024

arXiv:2312.09716 [pdf, other]

Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval

Authors: Zhe Ma, Jianfeng Dong, Shouling Ji, Zhenguang Liu, Xuhong Zhang, Zonghui Wang, Sifeng He, Feng Qian, Xiaobo Zhang, Lei Yang

Abstract: Visual retrieval aims to search for the most relevant visual items, e.g., images and videos, from a candidate gallery with a given query item. Accuracy and efficiency are two competing objectives in retrieval tasks. Instead of crafting a new method pursuing further improvement on accuracy, in this paper we propose a multi-teacher distillation framework, Whiten-MTD, which is able to transfer knowledge from off-the-shelf pre-trained retrieval models to a lightweight student model for efficient visual retrieval. Furthermore, we discover that the similarities obtained by different retrieval models are diversified and incommensurable, which makes it challenging to jointly distill knowledge from multiple models. Therefore, we propose to whiten the output of teacher models before fusion, which enables effective multi-teacher distillation for retrieval models. Whiten-MTD is conceptually simple and practically effective. Extensive experiments on two landmark image retrieval datasets and one video retrieval dataset demonstrate the effectiveness of our proposed method, and its good balance of retrieval performance and efficiency. Our source code is released at https://github.com/Maryeon/whiten_mtd.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2312.08672 [pdf, other]

doi 10.1016/j.ins.2024.120916

CAT: A Causally Graph Attention Network for Trimming Heterophilic Graph

Authors: Silu He, Qinyao Luo, Xinsha Fu, Ling Zhao, Ronghua Du, Haifeng Li

Abstract: Local Attention-guided Message Passing Mechanism (LAMP) adopted in Graph Attention Networks (GATs) is designed to adaptively learn the importance of neighboring nodes for better local aggregation on the graph, which can bring the representations of similar neighbors closer effectively, thus showing stronger discrimination ability. However, existing GATs suffer from a significant discrimination ability decline in heterophilic graphs because the high proportion of dissimilar neighbors can weaken the self-attention of the central node, jointly resulting in the deviation of the central node from similar nodes in the representation space. This kind of effect generated by neighboring nodes is called the Distraction Effect (DE) in this paper. To estimate and weaken the DE of neighboring nodes, we propose a Causally graph Attention network for Trimming heterophilic graph (CAT). To estimate the DE, since the DE is generated through two paths (grabbing the attention assigned to neighbors and reducing the self-attention of the central node), we use Total Effect to model DE, which is a kind of causal estimand and can be estimated from intervened data. To weaken the DE, we identify the neighbors with the highest DE (we call them Distraction Neighbors) and remove them. We adopt three representative GATs as the base model within the proposed CAT framework and conduct experiments on seven heterophilic datasets in three different sizes. Comparative experiments show that CAT can improve the node classification accuracy of all base GAT models.
Ablation experiments and visualization further validate the enhancement of discrimination ability brought by CAT. The source code is available at https://github.com/GeoX-Lab/CAT.

Submitted 17 June, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 25 pages, 18 figures, 5 tables

Journal ref: Information Sciences 2024

arXiv:2312.05062 [pdf, ps, other]

Deep Learning Enabled Semantic Communication Systems for Video Transmission

Authors: Zhenguo Zhang, Qianqian Yang, Shibo He, Jiming Chen

Abstract: Semantic communication has emerged as a promising approach for improving transmission efficiency in the next generation of wireless networks. Inspired by the success of semantic communication in different areas, we aim to provide a new semantic communication scheme from the semantic level. In this paper, we propose a novel DL-based semantic communication system for video transmission, which compacts semantic-related information to improve transmission efficiency. In particular, we utilize the Bi-optical flow to estimate residual information of inter-frame details. We also propose a feature choice module and a feature fusion module to drop semantically redundant features while paying more attention to the important semantic-related content. We employ a frame prediction module to reconstruct semantic features of the prediction frame from the received signal at the receiver. To enhance the system's robustness, we propose a noise attention module that assigns different importance weights to the extracted features. Simulation results indicate that our proposed method outperforms existing approaches in terms of transmission efficiency, achieving about a 33.3% reduction in the number of transmitted symbols while improving the peak signal-to-noise ratio (PSNR) performance by an average of 0.56 dB.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.04557 [pdf, other]

GenTron: Diffusion Transformers for Image and Video Generation

Authors: Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua

Abstract: In this study, we explore Transformer-based diffusion models for image and video generation. Despite the dominance of Transformer architectures in various fields due to their flexibility and scalability, the visual generative domain primarily utilizes CNN-based U-Net architectures, particularly in diffusion-based models. We introduce GenTron, a family of Generative models employing Transformer-based diffusion, to address this gap. Our initial step was to adapt Diffusion Transformers (DiTs) from class to text conditioning, a process involving thorough empirical exploration of the conditioning mechanism. We then scale GenTron from approximately 900M to over 3B parameters, observing significant improvements in visual quality. Furthermore, we extend GenTron to text-to-video generation, incorporating novel motion-free guidance to enhance video quality. In human evaluations against SDXL, GenTron achieves a 51.1% win rate in visual quality (with a 19.8% draw rate), and a 42.3% win rate in text alignment (with a 42.9% draw rate). GenTron also excels in the T2I-CompBench, underscoring its strengths in compositional generation. We believe this work will provide meaningful insights and serve as a valuable reference for future research.

Submitted 2 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: CVPR2024 Camera Ready. Website: https://www.shoufachen.com/gentron_website/

arXiv:2312.02679 [pdf, other]

Entanglement and Pseudo Entanglement Dynamics versus Fusion in CFT

Authors: Song He, Yu-Xuan Zhang, Long Zhao, Zi-Xuan Zhao

Abstract: The fusion rules and operator product expansion (OPE) serve as crucial tools in the study of operator algebras within conformal field theory (CFT). Building upon the vision of using entanglement to explore the connections between fusion coefficients and OPE coefficients, we employ the replica method and Schmidt decomposition method to investigate the time evolution of entanglement entropy (EE) and pseudo entropy (PE) for linear combinations of operators in rational conformal field theory (RCFT). We obtain a formula that links fusion coefficients, quantum dimensions, and OPE coefficients. We also identify two definition schemes for linear combination operators. Under one scheme, the EE captures information solely for the heaviest operators, while the PE retains information for all operators, reflecting the phenomenon of pseudo entropy amplification. Irrespective of the scheme employed, the EE demonstrates a step-like evolution, illustrating the effectiveness of the quasiparticle propagation picture for the general superposition of locally excited states in RCFT. From the perspective of quasiparticle propagation, we observe spontaneous block-diagonalization of the reduced density matrix of a subsystem when quasiparticles enter the subsystem.

Submitted 29 June, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: 29 pages, 4 figures, published version

arXiv:2311.17541 [pdf, other]

TaskWeaver: A Code-First Agent Framework

Authors: Bo Qiao, Liqun Li, Xu Zhang, Shilin He, Yu Kang, Chaoyun Zhang, Fangkai Yang, Hang Dong, Jue Zhang, Lu Wang, Minghua Ma, Pu Zhao, Si Qin, Xiaoting Qin, Chao Du, Yong Xu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

Abstract: Large Language Models (LLMs) have shown impressive abilities in natural language understanding and generation, leading to their widespread use in applications such as chatbots and virtual assistants. However, existing LLM frameworks face limitations in handling domain-specific data analytics tasks with rich data structures. Moreover, they struggle with flexibility to meet diverse user requirements. To address these issues, TaskWeaver is proposed as a code-first framework for building LLM-powered autonomous agents. It converts user requests into executable code and treats user-defined plugins as callable functions. TaskWeaver provides support for rich data structures, flexible plugin usage, and dynamic plugin selection, and leverages LLM coding capabilities for complex logic. It also incorporates domain-specific knowledge through examples and ensures the secure execution of generated code. TaskWeaver offers a powerful and flexible framework for creating intelligent conversational agents that can handle complex tasks and adapt to domain-specific scenarios. The code is open sourced at https://github.com/microsoft/TaskWeaver/.

Submitted 19 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.14008 [pdf, ps, other]

A note on rational homology vanishing theorem for hypersurfaces in aspherical manifolds

Authors: Shihang He, Jintian Zhu

Abstract: In this note, we generalize Gromov's reduction \cite{Gro20} from the aspherical conjecture to the generalized filling radius conjecture to the smooth $\mathbb Q$-homology vanishing conjecture for hypersurfaces. In particular, we can show that any continuous map from a closed $4$-manifold admitting positive scalar curvature to an aspherical $5$-manifold induces the zero map in $H_4(\cdot,\mathbb Q)$. As a corollary, we obtain the following splitting theorem: if a complete aspherical $5$-manifold has nonnegative scalar curvature and two ends, then it splits into the Riemannian product of a closed flat manifold and the real line.

Submitted 23 November, 2023; originally announced November 2023.

Comments: 11 pages, all comments are welcome

arXiv:2311.13535 [pdf, other]

DiffusionMat: Alpha Matting as Sequential Refinement Learning

Authors: Yangyang Xu, Shengfeng He, Wenqi Shao, Kwan-Yee K. Wong, Yu Qiao, Ping Luo

Abstract: In this paper, we introduce DiffusionMat, a novel image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes. Diverging from conventional methods that utilize trimaps merely as loose guidance for alpha matte prediction, our approach treats image matting as a sequential refinement learning process. This process begins with the addition of noise to trimaps and iteratively denoises them using a pre-trained diffusion model, which incrementally guides the prediction towards a clean alpha matte. The key innovation of our framework is a correction module that adjusts the output at each denoising step, ensuring that the final result is consistent with the input image's structures. We also introduce the Alpha Reliability Propagation, a novel technique designed to maximize the utility of available guidance by selectively enhancing the trimap regions with confident alpha information, thus simplifying the correction task. To train the correction module, we devise specialized loss functions that target the accuracy of the alpha matte's edges and the consistency of its opaque and transparent regions. We evaluate our model across several image matting benchmarks, and the results indicate that DiffusionMat consistently outperforms existing methods. Project page at https://cnnlstm.github.io/DiffusionMat

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.13381 [pdf, other]

Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training

Authors: Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Jiming Chen

Abstract: Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices like smartphones. Confidant partitions an LLM into several sub-models so that each fits into a mobile device's memory. A pipeline parallel training mechanism is further developed to ensure fast and efficient distributed training. In addition, we propose a novel backend scheduler to allocate different attention heads to heterogeneous compute hardware, including mobile CPUs and GPUs, to maximize the compute resource utilization on each edge device. Our preliminary experimental results show that Confidant achieves up to 45.3% memory reduction and 8.03x inference speedup in practical settings.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 6 pages, 7 figures; Submitted to HotMobile 2024

arXiv:2311.11669 [pdf, other]

PMP-Swin: Multi-Scale Patch Message Passing Swin Transformer for Retinal Disease Classification

Authors: Zhihan Yang, Zhiming Cheng, Tengjin Weng, Shucheng He, Yaqi Wang, Xin Ye, Shuai Wang

Abstract: Retinal disease is one of the primary causes of visual impairment, and early diagnosis is essential for preventing further deterioration. Nowadays, many works have explored Transformers for diagnosing diseases due to their strong visual representation capabilities. However, retinal diseases exhibit milder forms and often present with overlapping signs, which pose great difficulties for accurate multi-class classification. Therefore, we propose a new framework named Multi-Scale Patch Message Passing Swin Transformer for multi-class retinal disease classification. Specifically, we design a Patch Message Passing (PMP) module based on the Message Passing mechanism to establish global interaction for pathological semantic features and to further exploit the subtle differences between different diseases. Moreover, considering the various scales of pathological features, we integrate multiple PMP modules for different patch sizes. For evaluation, we have constructed a new dataset, named the OPTOS dataset, consisting of 1,033 high-resolution fundus images photographed by an Optos camera, and conducted comprehensive experiments to validate the efficacy of our proposed method. The results on both the public dataset and our dataset demonstrate that our method achieves remarkable performance compared to state-of-the-art methods.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 9 pages, 7 figures

arXiv:2311.09861 [pdf, other]

ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology

Authors: Junlei Zhang, Hongliang He, Nirui Song, Zhanchao Zhou, Shuyuan He, Shuai Zhang, Huachuan Qiu, Anqi Li, Yong Dai, Lizhi Ma, Zhenzhong Lan

Abstract: The critical field of psychology necessitates a comprehensive benchmark to enhance the evaluation and development of domain-specific Large Language Models (LLMs). Existing MMLU-type benchmarks, such as C-EVAL and CMMLU, include psychology-related subjects, but their limited number of questions and lack of systematic concept sampling strategies mean they cannot cover the concepts required in psychology. Consequently, despite their broad subject coverage, these benchmarks lack the necessary depth in the psychology domain, making them inadequate as a psychology-specific evaluation suite. To address this issue, this paper presents ConceptPsy, designed to evaluate Chinese complex reasoning and knowledge abilities in psychology. ConceptPsy includes 12 core subjects and 1383 manually collected concepts. Specifically, we prompt GPT-4 to generate questions for each concept using carefully designed diverse prompts and hire professional psychologists to review these questions. To help understand fine-grained performance and address weaknesses, we annotate each question with a chapter label and provide chapter-wise accuracy. Based on ConceptPsy, we evaluate a broad range of LLMs. We observe that, although some LLMs achieve similar overall accuracies, they exhibit significant performance variations across different psychology concepts, even when they are models from the same series. We hope our work can facilitate the development of LLMs in the field of psychology.

Submitted 16 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: Under Review

arXiv:2311.09636 [pdf, ps, other]

Holographic torus correlators in $\text{AdS}_3$ gravity coupled to scalar field

Authors: Song He, Yun-Ze Li, Yunda Zhang

Abstract: This paper investigates holographic torus correlators of generic operators at conformal infinity and a finite cutoff within AdS$_3$ gravity coupled with a free scalar field. Using a near-boundary analysis and solving the gravitational boundary value problem, we solve Einstein's equation and calculate mixed correlators for massless and massive coupled scalar fields. The conformal Ward identity on the torus has been reproduced holographically, which can be regarded as a consistency check. Further, recurrence relations for a specific class of higher-point correlators are derived, validating AdS$_3$/CFT$_2$ with non-trivial boundary topology. While the two-point scalar correlator is accurately computed on the thermal AdS$_3$ saddle, the higher-point correlators associated with scalar and stress tensor operators are explored.

Submitted 24 May, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: 51 pages. Match the published version

arXiv:2311.07514 [pdf, other]

VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search

Authors: Shuting He, Hao Luo, Wei Jiang, Xudong Jiang, Henghui Ding

Abstract: Text-based Person Search (TBPS) aims to retrieve images of a target pedestrian indicated by textual descriptions. It is essential for TBPS to extract fine-grained local features and align them across modalities. Existing methods utilize external tools or heavy cross-modal interaction to achieve explicit alignment of cross-modal fine-grained features, which is inefficient and time-consuming. In this work, we propose a Vision-Guided Semantic-Group Network (VGSG) for text-based person search to extract well-aligned fine-grained visual and textual features. In the proposed VGSG, we develop a Semantic-Group Textual Learning (SGTL) module and a Vision-guided Knowledge Transfer (VGKT) module to extract textual local features under the guidance of visual local clues. In SGTL, in order to obtain the local textual representation, we group textual features from the channel dimension based on the semantic cues of language expression, which encourages similar semantic patterns to be grouped implicitly without external tools. In VGKT, a vision-guided attention is employed to extract visual-related textual features, which are inherently aligned with visual cues and termed vision-guided textual features. Furthermore, we design a relational knowledge transfer, including a vision-language similarity transfer and a class probability transfer, to adaptively propagate information of the vision-guided textual features to semantic-group textual features.
With the help of relational knowledge transfer, VGKT is capable of aligning semantic-group textual features with corresponding visual features without external tools and complex pairwise interaction. Experimental results on two challenging benchmarks demonstrate its superiority over state-of-the-art methods.

Submitted 13 November, 2023; originally announced November 2023.

Comments: Accepted to IEEE TIP

arXiv:2311.07039 [pdf, other]

Time-Optimal Control for High-Order Chain-of-Integrators Systems with Full State Constraints and Arbitrary Terminal States (Extended Version)

Authors: Yunan Wang, Chuxiong Hu, Zeyang Li, Shize Lin, Suqin He, Yu Zhu

Abstract: Time-optimal control for high-order chain-of-integrators systems with full state constraints and arbitrarily given terminal states remains a challenging, unresolved problem in optimal control theory. To further the understanding of the problem, this paper establishes a novel notation system and theoretical framework, providing the switching manifold for high-order problems in the form of switching laws. Through deriving properties of switching laws regarding signs and dimension, this paper proposes a definite condition for time-optimal control. Guided by the developed theory, a trajectory planning method named the manifold-intercept method (MIM) is developed. The proposed MIM can plan time-optimal jerk-limited trajectories with full state constraints, and can also plan near-optimal non-chattering higher-order trajectories with negligible extra motion time compared to optimal profiles. Numerical results indicate that the proposed MIM outperforms all baselines in computational time, computational accuracy, and trajectory quality by a large margin.

Submitted 28 March, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

arXiv:2311.07032 [pdf, other]

ExpNote: Black-box Large Language Models are Better Task Solvers with Experience Notebook

Authors: Wangtao Sun, Xuanqing Yu, Shizhu He, Jun Zhao, Kang Liu

Abstract: Black-box Large Language Models (LLMs) have shown great power in solving various tasks and are considered general problem solvers. However, LLMs still fail in many specific tasks even though they understand the task instruction. In this paper, we focus on the problem of boosting the ability of black-box LLMs to solve downstream tasks. We propose ExpNote, an automated framework to help LLMs better adapt to unfamiliar tasks through reflecting and noting experiences from training data and retrieving them from external memory during testing. We evaluate ExpNote on multiple tasks and the experimental results demonstrate that the proposed method significantly improves the performance of black-box LLMs. The data and code are available at https://github.com/forangel2014/ExpNote

Submitted 12 November, 2023; originally announced November 2023.

Comments: EMNLP 2023 findings

Showing 101–150 of 1,175 results for author: He, S