Search | arXiv e-print repository

Towards Multimodal Sentiment Analysis Debiasing via Bias Purification

Authors: Dingkang Yang, Mingcheng Li, Dongling Xiao, Yang Liu, Kun Yang, Zhaoyu Chen, Yuzheng Wang, Peng Zhai, Ke Li, Lihua Zhang

Abstract: Multimodal Sentiment Analysis (MSA) aims to understand human intentions by integrating emotion-related clues from diverse modalities, such as visual, language, and audio. Unfortunately, the current MSA task invariably suffers from unplanned dataset biases, particularly multimodal utterance-level label bias and word-level context bias. These harmful biases potentially mislead models to focus on sta… ▽ More Multimodal Sentiment Analysis (MSA) aims to understand human intentions by integrating emotion-related clues from diverse modalities, such as visual, language, and audio. Unfortunately, the current MSA task invariably suffers from unplanned dataset biases, particularly multimodal utterance-level label bias and word-level context bias. These harmful biases potentially mislead models to focus on statistical shortcuts and spurious correlations, causing severe performance bottlenecks. To alleviate these issues, we present a Multimodal Counterfactual Inference Sentiment (MCIS) analysis framework based on causality rather than conventional likelihood. Concretely, we first formulate a causal graph to discover harmful biases from already-trained vanilla models. In the inference phase, given a factual multimodal input, MCIS imagines two counterfactual scenarios to purify and mitigate these biases. Then, MCIS can make unbiased decisions from biased observations by comparing factual and counterfactual outcomes. We conduct extensive experiments on several standard MSA benchmarks. Qualitative and quantitative results show the effectiveness of the proposed framework. △ Less

Submitted 5 July, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted by ECCV 2024

arXiv:2403.05007 [pdf, other]

Age of Computing: A Metric of Computation Freshness in Communication and Computation Cooperative Networks

Authors: Xingran Chen, Yi Zhuang, Kun Yang

Abstract: In communication and computation cooperative networks (3CNs), timely computation is crucial but not always guaranteed. There is a strong demand for a computational task to be completed within a given deadline. The time taken involves both processing time, communication time, and the impact of the deadline. However, a measure of such timeliness in 3CNs is lacking. In this paper, we introduce the no… ▽ More In communication and computation cooperative networks (3CNs), timely computation is crucial but not always guaranteed. There is a strong demand for a computational task to be completed within a given deadline. The time taken involves both processing time, communication time, and the impact of the deadline. However, a measure of such timeliness in 3CNs is lacking. In this paper, we introduce the novel concept, Age of Computing (AoC), to capture computation freshness in 3CNs. We analyze AoC in a line topology consisting of a source, a transmitter, a receiver, and a computational node. Tasks generated by the source are immediately available at the transmitter, where they enter a communication queue. These tasks then pass to the receiver and subsequently to a computation queue at the computational node for processing. Each task has a deadline, requiring completion within this timeframe. AoC is evaluated under two types of deadlines: (i) soft deadline, tasks can be fed back to the source if delayed beyond the deadline, but with additional latency; (ii) hard deadline, tasks delayed beyond the deadline are discarded. Under both deadlines, we derive the AoC formula and a general expression for the time-average AoC. For the first-come, first-serve discipline, we obtain a closed-form expression for the average AoC under the soft deadline and an approximation for the hard deadline. In addition to freshness, we define computation throughput, providing a general expression and an approximation. To explore the relationship between freshness and throughput, we construct an optimization problem and prove that the objective pair is a weakly Pareto-optimal point. Numerical results validate all the theoretical findings. Additionally, they reveal that under the hard deadline, the computation throughput serves as a reliable proxy for the average AoC. △ Less

Submitted 24 July, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03506 [pdf, other]

Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights

Authors: Zijie Zeng, Shiqi Liu, Lele Sha, Zhuang Li, Kaixun Yang, Sannyuya Liu, Dragan Gašević, Guanliang Chen

Abstract: This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets. These typically involve hybrid texts with a limited number of boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover different ty… ▽ More This study explores the challenge of sentence-level AI-generated text detection within human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets. These typically involve hybrid texts with a limited number of boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover different types of hybrid texts generated in realistic settings to better inform real-world applications. Therefore, our study utilizes the CoAuthor dataset, which includes diverse, realistic hybrid texts generated through the collaboration between human writers and an intelligent writing system in multi-turn interactions. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text where each segment contains sentences of consistent authorship, and (ii) classify the authorship of each identified segment. Our empirical findings highlight (1) detecting AI-generated sentences in hybrid texts is overall a challenging task because (1.1) human writers' selecting and even editing AI-generated sentences based on personal preferences adds difficulty in identifying the authorship of segments; (1.2) the frequent change of authorship between neighboring sentences within the hybrid text creates difficulties for segment detectors in identifying authorship-consistent segments; (1.3) the short length of text segments within hybrid texts provides limited stylistic cues for reliable authorship determination; (2) before embarking on the detection process, it is beneficial to assess the average length of segments within the hybrid text. This assessment aids in deciding whether (2.1) to employ a text segmentation-based strategy for hybrid texts with longer segments, or (2.2) to adopt a direct sentence-by-sentence classification strategy for those with shorter segments. △ Less

Submitted 23 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: Camera-Ready version of our IJCAI 2024 accepted paper (Special Track: AI and Social Good)

arXiv:2403.03212 [pdf, other]

Performance of a modular ton-scale pixel-readout liquid argon time projection chamber

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, D. A. Andrade , et al. (1340 additional authors not shown)

Abstract: The Module-0 Demonstrator is a single-phase 600 kg liquid argon time projection chamber operated as a prototype for the DUNE liquid argon near detector. Based on the ArgonCube design concept, Module-0 features a novel 80k-channel pixelated charge readout and advanced high-coverage photon detection system. In this paper, we present an analysis of an eight-day data set consisting of 25 million cosmi… ▽ More The Module-0 Demonstrator is a single-phase 600 kg liquid argon time projection chamber operated as a prototype for the DUNE liquid argon near detector. Based on the ArgonCube design concept, Module-0 features a novel 80k-channel pixelated charge readout and advanced high-coverage photon detection system. In this paper, we present an analysis of an eight-day data set consisting of 25 million cosmic ray events collected in the spring of 2021. We use this sample to demonstrate the imaging performance of the charge and light readout systems as well as the signal correlations between the two. We also report argon purity and detector uniformity measurements, and provide comparisons to detector simulations. △ Less

Submitted 5 March, 2024; originally announced March 2024.

Comments: 47 pages, 41 figures

Report number: FERMILAB-PUB-24-0073-LBNF

arXiv:2403.03004 [pdf, other]

Ultralight vector dark matter search using data from the KAGRA O3GK run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi , et al. (1778 additional authors not shown)

Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we prese… ▽ More Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM. △ Less

Submitted 5 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

Report number: LIGO-P2300250

arXiv:2403.02814 [pdf, other]

InjectTST: A Transformer Method of Injecting Global Information into Independent Channels for Long Time Series Forecasting

Authors: Ce Chi, Xing Wang, Kexin Yang, Zhiyan Song, Di Jin, Lin Zhu, Chao Deng, Junlan Feng

Abstract: Transformer has become one of the most popular architectures for multivariate time series (MTS) forecasting. Recent Transformer-based MTS models generally prefer channel-independent structures with the observation that channel independence can alleviate noise and distribution drift issues, leading to more robustness. Nevertheless, it is essential to note that channel dependency remains an inherent… ▽ More Transformer has become one of the most popular architectures for multivariate time series (MTS) forecasting. Recent Transformer-based MTS models generally prefer channel-independent structures with the observation that channel independence can alleviate noise and distribution drift issues, leading to more robustness. Nevertheless, it is essential to note that channel dependency remains an inherent characteristic of MTS, carrying valuable information. Designing a model that incorporates merits of both channel-independent and channel-mixing structures is a key to further improvement of MTS forecasting, which poses a challenging conundrum. To address the problem, an injection method for global information into channel-independent Transformer, InjectTST, is proposed in this paper. Instead of designing a channel-mixing model directly, we retain the channel-independent backbone and gradually inject global information into individual channels in a selective way. A channel identifier, a global mixing module and a self-contextual attention module are devised in InjectTST. The channel identifier can help Transformer distinguish channels for better representation. The global mixing module produces cross-channel global information. Through the self-contextual attention module, the independent channels can selectively concentrate on useful global information without robustness degradation, and channel mixing is achieved implicitly. Experiments indicate that InjectTST can achieve stable improvement compared with state-of-the-art models. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.01789 [pdf, other]

DECOR: Enhancing Logic Locking Against Machine Learning-Based Attacks

Authors: Yinghua Hu, Kaixin Yang, Subhajit Dutta Chowdhury, Pierluigi Nuzzo

Abstract: Logic locking (LL) has gained attention as a promising intellectual property protection measure for integrated circuits. However, recent attacks, facilitated by machine learning (ML), have shown the potential to predict the correct key in multiple LL schemes by exploiting the correlation of the correct key value with the circuit structure. This paper presents a generic LL enhancement method based… ▽ More Logic locking (LL) has gained attention as a promising intellectual property protection measure for integrated circuits. However, recent attacks, facilitated by machine learning (ML), have shown the potential to predict the correct key in multiple LL schemes by exploiting the correlation of the correct key value with the circuit structure. This paper presents a generic LL enhancement method based on a randomized algorithm that can significantly decrease the correlation between locked circuit netlist and correct key values in an LL scheme. Numerical results show that the proposed method can efficiently degrade the accuracy of state-of-the-art ML-based attacks down to around 50%, resulting in negligible advantage versus random guessing. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: 8 pages. Accepted at the International Symposium on Quality Electronic Design (ISQED), 2024

arXiv:2403.01738 [pdf, other]

ComS2T: A complementary spatiotemporal learning system for data-adaptive model evolution

Authors: Zhengyang Zhou, Qihe Huang, Binwu Wang, Jianpeng Hou, Kuo Yang, Yuxuan Liang, Yang Wang

Abstract: Spatiotemporal (ST) learning has become a crucial technique to enable smart cities and sustainable urban development. Current ST learning models capture the heterogeneity via various spatial convolution and temporal evolution blocks. However, rapid urbanization leads to fluctuating distributions in urban data and city structures over short periods, resulting in existing methods suffering generaliz… ▽ More Spatiotemporal (ST) learning has become a crucial technique to enable smart cities and sustainable urban development. Current ST learning models capture the heterogeneity via various spatial convolution and temporal evolution blocks. However, rapid urbanization leads to fluctuating distributions in urban data and city structures over short periods, resulting in existing methods suffering generalization and data adaptation issues. Despite efforts, existing methods fail to deal with newly arrived observations and those methods with generalization capacity are limited in repeated training. Motivated by complementary learning in neuroscience, we introduce a prompt-based complementary spatiotemporal learning termed ComS2T, to empower the evolution of models for data adaptation. ComS2T partitions the neural architecture into a stable neocortex for consolidating historical memory and a dynamic hippocampus for new knowledge update. We first disentangle two disjoint structures into stable and dynamic weights, and then train spatial and temporal prompts by characterizing distribution of main observations to enable prompts adaptive to new data. This data-adaptive prompt mechanism, combined with a two-stage training process, facilitates fine-tuning of the neural architecture conditioned on prompts, thereby enabling efficient adaptation during testing. Extensive experiments validate the efficacy of ComS2T in adapting to various spatiotemporal out-of-distribution scenarios while maintaining efficient inference capabilities. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01107 [pdf, other]

doi 10.1038/s41377-024-01546-7

Octave-spanning Kerr soliton frequency combs in dispersion- and dissipation-engineered lithium niobate microresonators

Authors: Yunxiang Song, Yaowen Hu, Xinrui Zhu, Kiyoul Yang, Marko Loncar

Abstract: Dissipative Kerr solitons from optical microresonators, commonly referred to as soliton microcombs, have been developed for a broad range of applications, including precision measurement, optical frequency synthesis, and ultra-stable microwave and millimeter wave generation, all on a chip. An important goal for microcombs is self referencing, which requires octave-spanning bandwidths to detect and… ▽ More Dissipative Kerr solitons from optical microresonators, commonly referred to as soliton microcombs, have been developed for a broad range of applications, including precision measurement, optical frequency synthesis, and ultra-stable microwave and millimeter wave generation, all on a chip. An important goal for microcombs is self referencing, which requires octave-spanning bandwidths to detect and stabilize the comb carrier envelope offset frequency. Further, detection and locking of the comb spacings are often achieved using frequency division by electro-optic modulation. The thin-film lithium niobate photonic platform, with its low loss, strong second- and third-order nonlinearity, as well as large Pockels effect, is ideally suited for these tasks. However, octave-spanning soliton microcombs are challenging to demonstrate on this platform, largely complicated by strong Raman effects hindering reliable fabrication of soliton devices. Here, we demonstrate entirely connected and octave-spanning soliton microcombs on thin-film lithium niobate. With appropriate control over microresonator free spectral range and dissipation spectrum, we show that soliton-inhibiting Raman effects are suppressed, and soliton devices are fabricated with near-unity yield. Our work offers an unambiguous method for soliton generation on strongly Raman-active materials. Further, it anticipates monolithically integrated, self-referenced frequency standards in conjunction with established technologies, such as periodically poled waveguides and electro-optic modulators, on thin-film lithium niobate. △ Less

Submitted 25 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.00331 [pdf, other]

WindGP: Efficient Graph Partitioning on Heterogenous Machines

Authors: Li Zeng, Haohan Huang, Binfan Zheng, Kang Yang, Shengcheng Shao, Jinhua Zhou, Jun Xie, Rongqian Zhao, Xin Chen

Abstract: Graph Partitioning is widely used in many real-world applications such as fraud detection and social network analysis, in order to enable the distributed graph computing on large graphs. However, existing works fail to balance the computation cost and communication cost on machines with different power (including computing capability, network bandwidth and memory size), as they only consider repli… ▽ More Graph Partitioning is widely used in many real-world applications such as fraud detection and social network analysis, in order to enable the distributed graph computing on large graphs. However, existing works fail to balance the computation cost and communication cost on machines with different power (including computing capability, network bandwidth and memory size), as they only consider replication factor and neglect the difference of machines in realistic data centers. In this paper, we propose a general graph partitioning algorithm WindGP, which can support fast and high-quality edge partitioning on heterogeneous machines. WindGP designs novel preprocessing techniques to simplify the metric and balance the computation cost according to the characteristics of graphs and machines. Also, best-first search is proposed instead of BFS and DFS, in order to generate clusters with high cohesion. Furthermore, WindGP adaptively tunes the partition results by sophisticated local search methods. Extensive experiments show that WindGP outperforms all state-of-the-art partition methods by 1.35 - 27 times on both dense and sparse distributed graph algorithms, and has good scalability with graph size and machine number. △ Less

Submitted 6 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: 19 pages, 15 figures, 18 tables

arXiv:2402.18393 [pdf, other]

Evaluating Decision Optimality of Autonomous Driving via Metamorphic Testing

Authors: Mingfei Cheng, Yuan Zhou, Xiaofei Xie, Junjie Wang, Guozhu Meng, Kairui Yang

Abstract: Autonomous Driving System (ADS) testing is crucial in ADS development, with the current primary focus being on safety. However, the evaluation of non-safety-critical performance, particularly the ADS's ability to make optimal decisions and produce optimal paths for autonomous vehicles (AVs), is equally vital to ensure the intelligence and reduce risks of AVs. Currently, there is little work dedica… ▽ More Autonomous Driving System (ADS) testing is crucial in ADS development, with the current primary focus being on safety. However, the evaluation of non-safety-critical performance, particularly the ADS's ability to make optimal decisions and produce optimal paths for autonomous vehicles (AVs), is equally vital to ensure the intelligence and reduce risks of AVs. Currently, there is little work dedicated to assessing ADSs' optimal decision-making performance due to the lack of corresponding oracles and the difficulty in generating scenarios with non-optimal decisions. In this paper, we focus on evaluating the decision-making quality of an ADS and propose the first method for detecting non-optimal decision scenarios (NoDSs), where the ADS does not compute optimal paths for AVs. Firstly, to deal with the oracle problem, we propose a novel metamorphic relation (MR) aimed at exposing violations of optimal decisions. The MR identifies the property that the ADS should retain optimal decisions when the optimal path remains unaffected by non-invasive changes. Subsequently, we develop a new framework, Decictor, designed to generate NoDSs efficiently. Decictor comprises three main components: Non-invasive Mutation, MR Check, and Feedback. The Non-invasive Mutation ensures that the original optimal path in the mutated scenarios is not affected, while the MR Check is responsible for determining whether non-optimal decisions are made. To enhance the effectiveness of identifying NoDSs, we design a feedback metric that combines both spatial and temporal aspects of the AV's movement. We evaluate Decictor on Baidu Apollo, an open-source and production-grade ADS. The experimental results validate the effectiveness of Decictor in detecting non-optimal decisions of ADSs. Our work provides valuable and original insights into evaluating the non-safety-critical performance of ADSs. △ Less

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.18302 [pdf, other]

EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving

Authors: Jiacheng Lin, Jiajun Chen, Kunyu Peng, Xuan He, Zhiyong Li, Rainer Stiefelhagen, Kailun Yang

Abstract: This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which dynamically tracks specific objects in a video sequence based on audio expressions and appears as a challenging problem in autonomous driving. Due to the lack of semantic modeling capacity in audio and video, existing works have mainly focused on text-based multi-object tracking, which often comes at the cos… ▽ More This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which dynamically tracks specific objects in a video sequence based on audio expressions and appears as a challenging problem in autonomous driving. Due to the lack of semantic modeling capacity in audio and video, existing works have mainly focused on text-based multi-object tracking, which often comes at the cost of tracking quality, interaction efficiency, and even the safety of assistance systems, limiting the application of such methods in autonomous driving. In this paper, we delve into the problem of AR-MOT from the perspective of audio-video fusion and audio-video tracking. We put forward EchoTrack, an end-to-end AR-MOT framework with dual-stream vision transformers. The dual streams are intertwined with our Bidirectional Frequency-domain Cross-attention Fusion Module (Bi-FCFM), which bidirectionally fuses audio and video features from both frequency- and spatiotemporal domains. Moreover, we propose the Audio-visual Contrastive Tracking Learning (ACTL) regime to extract homogeneous semantic features between expressions and visual objects by learning homogeneous features between different audio and video objects effectively. Aside from the architectural design, we establish the first set of large-scale AR-MOT benchmarks, including Echo-KITTI, Echo-KITTI+, and Echo-BDD. Extensive experiments on the established benchmarks demonstrate the effectiveness of the proposed EchoTrack and its components. The source code and datasets are available at https://github.com/lab206/EchoTrack. △ Less

Submitted 5 August, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). The source code and datasets are available at https://github.com/lab206/EchoTrack

arXiv:2402.15481 [pdf, other]

Prejudice and Volatility: A Statistical Framework for Measuring Social Discrimination in Large Language Models

Authors: Y Liu, K Yang, Z Qi, X Liu, Y Yu, C Zhai

Abstract: This study investigates why and how inconsistency in the generation of Large Language Models (LLMs) might induce or exacerbate societal injustice. For instance, LLMs frequently exhibit contrasting gender stereotypes regarding the same career depending on varied contexts, highlighting the arguably harmful unpredictability of LLMs' behavioral patterns. To augment the existing discrimination assessme… ▽ More This study investigates why and how inconsistency in the generation of Large Language Models (LLMs) might induce or exacerbate societal injustice. For instance, LLMs frequently exhibit contrasting gender stereotypes regarding the same career depending on varied contexts, highlighting the arguably harmful unpredictability of LLMs' behavioral patterns. To augment the existing discrimination assessment with the capability to account for variation in LLM generation, we formulate the Prejudice-Volatility Framework (PVF) that precisely defines behavioral metrics for assessing LLMs, which delineate the probability distribution of LLMs' stereotypes from the perspective of token prediction probability. Specifically, we employ a data-mining approach to approximate the possible applied contexts of LLMs and devise statistical metrics to evaluate the corresponding contextualized societal discrimination risk. Further, we mathematically dissect the aggregated discrimination risk of LLMs into prejudice risk, originating from their system bias, and volatility risk, stemming from their generation inconsistency. While initially intended for assessing discrimination in LLMs, our proposed PVF facilitates the comprehensive and flexible measurement of any inductive biases, including knowledge alongside prejudice, across various modality models. We apply PVF to 12 most commonly adopted LLMs and compare their risk levels. Our findings reveal that: i) prejudice risk is the primary cause of discrimination risk in LLMs, indicating that inherent biases in these models lead to stereotypical outputs; ii) most LLMs exhibit significant pro-male stereotypes across nearly all careers; iii) alignment with Reinforcement Learning from Human Feedback lowers discrimination by reducing prejudice, but increases volatility; iv) discrimination risk in LLMs correlates with socio-economic factors like profession salaries. △ Less

Submitted 24 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.15169 [pdf, ps, other]

Platforms for Efficient and Incentive-Aware Collaboration

Authors: Nika Haghtalab, Mingda Qiao, Kunhe Yang

Abstract: Collaboration is crucial for reaching collective goals. However, its effectiveness is often undermined by the strategic behavior of individual agents -- a fact that is captured by a high Price of Stability (PoS) in recent literature [Blum et al., 2021]. Implicit in the traditional PoS analysis is the assumption that agents have full knowledge of how their tasks relate to one another. We offer a ne… ▽ More Collaboration is crucial for reaching collective goals. However, its effectiveness is often undermined by the strategic behavior of individual agents -- a fact that is captured by a high Price of Stability (PoS) in recent literature [Blum et al., 2021]. Implicit in the traditional PoS analysis is the assumption that agents have full knowledge of how their tasks relate to one another. We offer a new perspective on bringing about efficient collaboration among strategic agents using information design. Inspired by the growing importance of collaboration in machine learning (such as platforms for collaborative federated learning and data cooperatives), we propose a framework where the platform has more information about how the agents' tasks relate to each other than the agents themselves. We characterize how and to what degree such platforms can leverage their information advantage to steer strategic agents toward efficient collaboration. Concretely, we consider collaboration networks where each node is a task type held by one agent, and each task benefits from contributions made in their inclusive neighborhood of tasks. This network structure is known to the agents and the platform, but only the platform knows each agent's real location -- from the agents' perspective, their location is determined by a random permutation. We employ private Bayesian persuasion and design two families of persuasive signaling schemes that the platform can use to ensure a small total workload when agents follow the signal. The first family aims to achieve the minmax optimal approximation ratio compared to the optimal collaboration, which is shown to be $Θ(\sqrt{n})$ for unit-weight graphs, $Θ(n^{2/3})$ for graphs with constant minimum edge weights, and $O(n^{3/4})$ for general weighted graphs. The second family ensures per-instance strict improvement compared to full information disclosure. △ Less

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.14650 [pdf, other]

GaussianPro: 3D Gaussian Splatting with Progressive Propagation

Authors: Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen

Abstract: The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling with large-scale scenes that unavoidably contain texture-less surfaces, the SfM techniques always f… ▽ More The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling with large-scale scenes that unavoidably contain texture-less surfaces, the SfM techniques always fail to produce enough points in these surfaces and cannot provide good initialization for 3DGS. As a result, 3DGS suffers from difficult optimization and low-quality renderings. In this paper, inspired by classical multi-view stereo (MVS) techniques, we propose GaussianPro, a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians. Compared to the simple split and clone strategies used in 3DGS, our method leverages the priors of the existing reconstructed geometries of the scene and patch matching techniques to produce new Gaussians with accurate positions and orientations. Experiments on both large-scale and small-scale scenes validate the effectiveness of our method, where our method significantly surpasses 3DGS on the Waymo dataset, exhibiting an improvement of 1.15dB in terms of PSNR. △ Less

Submitted 22 February, 2024; originally announced February 2024.

Comments: See the project page for code, data: https://kcheng1021.github.io/gaussianpro.github.io

arXiv:2402.13934 [pdf, other]

Do Efficient Transformers Really Save Computation?

Authors: Kai Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, Liwei Wang

Abstract: As transformer-based language models are trained on increasingly large datasets and with vast numbers of parameters, finding more efficient alternatives to the standard Transformer has become very valuable. While many efficient Transformers and Transformer alternatives have been proposed, none provide theoretical guarantees that they are a suitable replacement for the standard Transformer. This ma… ▽ More As transformer-based language models are trained on increasingly large datasets and with vast numbers of parameters, finding more efficient alternatives to the standard Transformer has become very valuable. While many efficient Transformers and Transformer alternatives have been proposed, none provide theoretical guarantees that they are a suitable replacement for the standard Transformer. This makes it challenging to identify when to use a specific model and what directions to prioritize for further investigation. In this paper, we aim to understand the capabilities and limitations of efficient Transformers, specifically the Sparse Transformer and the Linear Transformer. We focus on their reasoning capability as exhibited by Chain-of-Thought (CoT) prompts and follow previous works to model them as Dynamic Programming (DP) problems. Our results show that while these models are expressive enough to solve general DP tasks, contrary to expectations, they require a model size that scales with the problem size. Nonetheless, we identify a class of DP problems for which these models can be more efficient than the standard Transformer. We confirm our theoretical results through experiments on representative DP tasks, adding to the understanding of efficient Transformers' practical strengths and weaknesses. △ Less

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.12659 [pdf, other]

FinBen: A Holistic Financial Benchmark for Large Language Models

Authors: Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yijing Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, Zhiyang Deng, Yuechen Jiang, Zhiyuan Yao, Haohang Li, Yangyang Yu, Gang Hu , et al. (9 additional authors not shown)

Abstract: LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical… ▽ More LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading. Our evaluation of 15 representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals several key findings: While LLMs excel in IE and textual analysis, they struggle with advanced reasoning and complex tasks like text generation and forecasting. GPT-4 excels in IE and stock trading, while Gemini is better at text generation and forecasting. Instruction-tuned LLMs improve textual analysis but offer limited benefits for complex tasks such as QA. FinBen has been used to host the first financial LLMs shared task at the FinNLP-AgentScen workshop during IJCAI-2024, attracting 12 teams. Their novel solutions outperformed GPT-4, showcasing FinBen's potential to drive innovation in financial LLMs. All datasets, results, and codes are released for the research community: https://github.com/The-FinAI/PIXIU. △ Less

Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 26 pages, 11 figures

arXiv:2402.12620 [pdf, other]

Are Large Language Models (LLMs) Good Social Predictors?

Authors: Kaiqi Yang, Hang Li, Hongzhi Wen, Tai-Quan Peng, Jiliang Tang, Hui Liu

Abstract: The prediction has served as a crucial scientific method in modern social studies. With the recent advancement of Large Language Models (LLMs), efforts have been made to leverage LLMs to predict the human features in social life, such as presidential voting. These works suggest that LLMs are capable of generating human-like responses. However, we find that the promising performance achieved by pre… ▽ More The prediction has served as a crucial scientific method in modern social studies. With the recent advancement of Large Language Models (LLMs), efforts have been made to leverage LLMs to predict the human features in social life, such as presidential voting. These works suggest that LLMs are capable of generating human-like responses. However, we find that the promising performance achieved by previous studies is because of the existence of input shortcut features to the response. In fact, by removing these shortcuts, the performance is reduced dramatically. To further revisit the ability of LLMs, we introduce a novel social prediction task, Soc-PRF Prediction, which utilizes general features as input and simulates real-world social study settings. With the comprehensive investigations on various LLMs, we reveal that LLMs cannot work as expected on social prediction when given general input features without shortcuts. We further investigate possible reasons for this phenomenon that suggest potential ways to enhance LLMs for social prediction. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11669 [pdf, other]

Hybrid Kerr-electro-optic frequency combs on thin-film lithium niobate

Authors: Yunxiang Song, Yaowen Hu, Marko Lončar, Kiyoul Yang

Abstract: Optical frequency combs are indispensable links between the optical and microwave domains, enabling a wide range of applications including precision spectroscopy, ultrastable frequency generation, and timekeeping. Chip-scale integration miniaturizes bulk implementations onto photonic chips, offering highly compact, stable, and power-efficient frequency comb sources. State of the art integrated fre… ▽ More Optical frequency combs are indispensable links between the optical and microwave domains, enabling a wide range of applications including precision spectroscopy, ultrastable frequency generation, and timekeeping. Chip-scale integration miniaturizes bulk implementations onto photonic chips, offering highly compact, stable, and power-efficient frequency comb sources. State of the art integrated frequency comb sources are based on resonantly-enhanced Kerr effect and, more recently, on electro-optic effect. While the former can routinely reach octave-spanning bandwidths and the latter feature microwave-rate spacings, achieving both in the same material platform has been challenging. Here, we leverage both strong Kerr nonlinearity and efficient electro-optic phase modulation available in the ultralow-loss thin-film lithium niobate photonic platform, to demonstrate a hybrid Kerr-electro-optic frequency comb with stabilized spacing. In our approach, a dissipative Kerr soliton is first generated, and then electro-optic division is used to realize a frequency comb with 2,589 comb lines spaced by 29.308 GHz and spanning 75.9 THz (588 nm) end-to-end. Further, we demonstrate electronic stabilization and control of the soliton spacing, naturally facilitated by our approach. The broadband, microwave-rate comb in this work overcomes the spacing-span tradeoff that exists in all integrated frequency comb sources, and paves the way towards chip-scale solutions for complex tasks such as laser spectroscopy covering multiple bands, micro- and millimeter-wave generation, and massively parallel optical communications. △ Less

Submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.11060 [pdf, other]

Persona-DB: Efficient Large Language Model Personalization for Response Prediction with Collaborative Data Refinement

Authors: Chenkai Sun, Ke Yang, Revanth Gangi Reddy, Yi R. Fung, Hou Pong Chan, Kevin Small, ChengXiang Zhai, Heng Ji

Abstract: The increasing demand for personalized interactions with large language models (LLMs) calls for methodologies capable of accurately and efficiently identifying user opinions and preferences. Retrieval augmentation emerges as an effective strategy, as it can accommodate a vast number of users without the costs from fine-tuning. Existing research, however, has largely focused on enhancing the retrie… ▽ More The increasing demand for personalized interactions with large language models (LLMs) calls for methodologies capable of accurately and efficiently identifying user opinions and preferences. Retrieval augmentation emerges as an effective strategy, as it can accommodate a vast number of users without the costs from fine-tuning. Existing research, however, has largely focused on enhancing the retrieval stage and devoted limited exploration toward optimizing the representation of the database, a crucial aspect for tasks such as personalization. In this work, we examine the problem from a novel angle, focusing on how data can be better represented for more data-efficient retrieval in the context of LLM customization. To tackle this challenge, we introduce Persona-DB, a simple yet effective framework consisting of a hierarchical construction process to improve generalization across task contexts and collaborative refinement to effectively bridge knowledge gaps among users. In the evaluation of response prediction, Persona-DB demonstrates superior context efficiency in maintaining accuracy with a significantly reduced retrieval size, a critical advantage in scenarios with extensive histories or limited context windows. Our experiments also indicate a marked improvement of over 10% under cold-start scenarios, when users have extremely sparse data. Furthermore, our analysis reveals the increasing importance of collaborative knowledge as the retrieval capacity expands. △ Less

Submitted 20 August, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10197 [pdf, ps, other]

Bulk universality for complex eigenvalues of real non-symmetric random matrices with i.i.d. entries

Authors: Sofiia Dubova, Kevin Yang

Abstract: We consider an ensemble of non-Hermitian matrices with independent identically distributed real entries that have finite moments. We show that its $k$-point correlation function in the bulk away from the real line converges to a universal limit. We consider an ensemble of non-Hermitian matrices with independent identically distributed real entries that have finite moments. We show that its $k$-point correlation function in the bulk away from the real line converges to a universal limit. △ Less

Submitted 26 April, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 67 pages, revised version, updated references

MSC Class: 60B20; 15B52

arXiv:2402.09723 [pdf, other]

Efficient Prompt Optimization Through the Lens of Best Arm Identification

Authors: Chengshuai Shi, Kun Yang, Zihan Chen, Jundong Li, Jing Yang, Cong Shen

Abstract: The remarkable instruction-following capability of large language models (LLMs) has sparked a growing interest in automatically finding good prompts, i.e., prompt optimization. Most existing works follow the scheme of selecting from a pre-generated pool of candidate prompts. However, these designs mainly focus on the generation strategy, while limited attention has been paid to the selection metho… ▽ More The remarkable instruction-following capability of large language models (LLMs) has sparked a growing interest in automatically finding good prompts, i.e., prompt optimization. Most existing works follow the scheme of selecting from a pre-generated pool of candidate prompts. However, these designs mainly focus on the generation strategy, while limited attention has been paid to the selection method. Especially, the cost incurred during the selection (e.g., accessing LLM and evaluating the responses) is rarely explicitly considered. To overcome this limitation, this work provides a principled framework, TRIPLE, to efficiently perform prompt selection under an explicit budget constraint. TRIPLE is built on a novel connection established between prompt optimization and fixed-budget best arm identification (BAI-FB) in multi-armed bandits (MAB); thus, it is capable of leveraging the rich toolbox from BAI-FB systematically and also incorporating unique characteristics of prompt optimization. Extensive experiments on multiple well-adopted tasks using various LLMs demonstrate the remarkable performance improvement of TRIPLE over baselines while satisfying the limited budget constraints. As an extension, variants of TRIPLE are proposed to efficiently select examples for few-shot prompts, also achieving superior empirical performance. △ Less

Submitted 30 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.07747 [pdf, ps, other]

Optimal score estimation via empirical Bayes smoothing

Authors: Andre Wibisono, Yihong Wu, Kaylee Yingxi Yang

Abstract: We study the problem of estimating the score function of an unknown probability distribution $ρ^*$ from $n$ independent and identically distributed observations in $d$ dimensions. Assuming that $ρ^*$ is subgaussian and has a Lipschitz-continuous score function $s^*$, we establish the optimal rate of $\tilde Θ(n^{-\frac{2}{d+4}})$ for this estimation problem under the loss function… ▽ More We study the problem of estimating the score function of an unknown probability distribution $ρ^*$ from $n$ independent and identically distributed observations in $d$ dimensions. Assuming that $ρ^*$ is subgaussian and has a Lipschitz-continuous score function $s^*$, we establish the optimal rate of $\tilde Θ(n^{-\frac{2}{d+4}})$ for this estimation problem under the loss function $\|\hat s - s^*\|^2_{L^2(ρ^*)}$ that is commonly used in the score matching literature, highlighting the curse of dimensionality where sample complexity for accurate score estimation grows exponentially with the dimension $d$. Leveraging key insights in empirical Bayes theory as well as a new convergence rate of smoothed empirical distribution in Hellinger distance, we show that a regularized score estimator based on a Gaussian kernel attains this rate, shown optimal by a matching minimax lower bound. We also discuss extensions to estimating $β$-Hölder continuous scores with $β\leq 1$, as well as the implication of our theory on the sample complexity of score-based generative models. △ Less

Submitted 12 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: COLT 2024; added the new results on extending to beta-Holder scores with beta <= 1

arXiv:2402.07426 [pdf, ps, other]

Computational Aspects of Bayesian Persuasion under Approximate Best Response

Authors: Kunhe Yang, Hanrui Zhang

Abstract: We study Bayesian persuasion under approximate best response, where the receiver may choose any action that is not too much suboptimal given their posterior belief upon receiving the signal. We focus on the computational aspects of the problem, aiming to design algorithms that efficiently compute (almost) optimal strategies for the sender. Despite the absence of the revelation principle -- which h… ▽ More We study Bayesian persuasion under approximate best response, where the receiver may choose any action that is not too much suboptimal given their posterior belief upon receiving the signal. We focus on the computational aspects of the problem, aiming to design algorithms that efficiently compute (almost) optimal strategies for the sender. Despite the absence of the revelation principle -- which has been one of the most powerful tools in Bayesian persuasion -- we design polynomial-time exact algorithms for the problem when either the state space or the action space is small, as well as a quasi-polynomial-time approximation scheme (QPTAS) for the general problem. On the negative side, we show there is no polynomial-time exact algorithm for the general problem unless $\mathsf{P} = \mathsf{NP}$. Our results build on several new algorithmic ideas, which might be useful in other principal-agent problems where robustness is desired. △ Less

Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.06299 [pdf, other]

A Functional Analysis Approach to Symbolic Regression

Authors: Kirill Antonov, Roman Kalkreuth, Kaifeng Yang, Thomas Bäck, Niki van Stein, Anna V Kononova

Abstract: Symbolic regression (SR) poses a significant challenge for randomized search heuristics due to its reliance on the synthesis of expressions for input-output mappings. Although traditional genetic programming (GP) algorithms have achieved success in various domains, they exhibit limited performance when tree-based representations are used for SR. To address these limitations, we introduce a novel S… ▽ More Symbolic regression (SR) poses a significant challenge for randomized search heuristics due to its reliance on the synthesis of expressions for input-output mappings. Although traditional genetic programming (GP) algorithms have achieved success in various domains, they exhibit limited performance when tree-based representations are used for SR. To address these limitations, we introduce a novel SR approach called Fourier Tree Growing (FTG) that draws insights from functional analysis. This new perspective enables us to perform optimization directly in a different space, thus avoiding intricate symbolic expressions. Our proposed algorithm exhibits significant performance improvements over traditional GP methods on a range of classical one-dimensional benchmarking problems. To identify and explain limiting factors of GP and FTG, we perform experiments on a large-scale polynomials benchmark with high-order polynomials up to degree 100. To the best of the authors' knowledge, this work represents the pioneering application of functional analysis in addressing SR problems. The superior performance of the proposed algorithm and insights into the limitations of GP open the way for further advancing GP for SR and related areas of explainable machine learning. △ Less

Submitted 9 February, 2024; originally announced February 2024.

Comments: 14 pages, 3 figures. Submitted to Genetic and Evolutionary Computation Conference (GECCO-2024)

arXiv:2402.04409 [pdf, other]

Towards Fair, Robust and Efficient Client Contribution Evaluation in Federated Learning

Authors: Meiying Zhang, Huan Zhao, Sheldon Ebron, Kan Yang

Abstract: The performance of clients in Federated Learning (FL) can vary due to various reasons. Assessing the contributions of each client is crucial for client selection and compensation. It is challenging because clients often have non-independent and identically distributed (non-iid) data, leading to potentially noisy or divergent updates. The risk of malicious clients amplifies the challenge especially… ▽ More The performance of clients in Federated Learning (FL) can vary due to various reasons. Assessing the contributions of each client is crucial for client selection and compensation. It is challenging because clients often have non-independent and identically distributed (non-iid) data, leading to potentially noisy or divergent updates. The risk of malicious clients amplifies the challenge especially when there's no access to clients' local data or a benchmark root dataset. In this paper, we introduce a novel method called Fair, Robust, and Efficient Client Assessment (FRECA) for quantifying client contributions in FL. FRECA employs a framework called FedTruth to estimate the global model's ground truth update, balancing contributions from all clients while filtering out impacts from malicious ones. This approach is robust against Byzantine attacks and incorporates a Byzantine-resilient aggregation algorithm. FRECA is also efficient, as it operates solely on local model updates and requires no validation operations or datasets. Our experimental results show that FRECA can accurately and efficiently quantify client contributions in a robust manner. △ Less

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.04303 [pdf, other]

Broken Symmetry in Ideal Chern Bands

Authors: Hui Liu, Kang Yang, Ahmed Abouelkomsan, Zhao Liu, Emil J. Bergholtz

Abstract: Recent observations of the fractional anomalous quantum Hall effect in moiré materials have reignited the interest in fractional Chern insulators (FCIs). The chiral limit in which analytic Landau level-like single-particle states form an "ideal" Chern band and local interactions lead to Laughlin-like FCIs at $1/3$ filling, has been very useful for understanding these systems by relating them to th… ▽ More Recent observations of the fractional anomalous quantum Hall effect in moiré materials have reignited the interest in fractional Chern insulators (FCIs). The chiral limit in which analytic Landau level-like single-particle states form an "ideal" Chern band and local interactions lead to Laughlin-like FCIs at $1/3$ filling, has been very useful for understanding these systems by relating them to the lowest Landau level. We show, however, that, even in the idealized chiral limit, a fluctuating quantum geometry is associated with strongly broken symmetries and a phenomenology very different from that of Landau levels. In particular, particle-hole symmetry is strongly violated and e.g. at $2/3$ filling an emergent interaction driven Fermi liquid state with no Landau level counterpart is energetically favoured. In fact, even the exact Laughlin-like zero modes at $1/3$ filling have a non-uniform density tracking the underlying quantum geometry. Moreover, by switching to a Coulomb interaction, the ideal Chern band features charge density wave states with no lowest Landau level counterpart. △ Less

Submitted 26 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: 5 pages + 4 figures, comments are welcome!

arXiv:2402.03680 [pdf, ps, other]

doi 10.1093/mnras/stad3829

"Life" of dust originating from the irregular satellites of Jupiter

Authors: Zhenghan Chen, Kun Yang, Xiaodong Liu

Abstract: The irregular satellites of Jupiter produce dust particles through the impact of interplanetary micrometeoroids. In this paper, the dynamics of these particles is studied by both high-accuracy numerical simulation and analytical theory, in order to learn their transport, final fate, and spatial distribution. The perturbation forces that are considered in our dynamical model include the solar radia… ▽ More The irregular satellites of Jupiter produce dust particles through the impact of interplanetary micrometeoroids. In this paper, the dynamics of these particles is studied by both high-accuracy numerical simulation and analytical theory, in order to learn their transport, final fate, and spatial distribution. The perturbation forces that are considered in our dynamical model include the solar radiation pressure, solar gravity, Poynting-Robertson drag, Jovian oblateness, and the Galilean satellites' gravity. The trajectories of different size particles are simulated until they hit Jupiter, the Galilean satellites, or escape from the Jovian system. The average dynamical lifetimes of dust with different grain sizes are calculated, and the final fate of dust particles is reported and analysed. The steady-state spatial number density of particles is estimated by integrating the trajectories of dust particles over their initial size distribution, and compared to the previous work. The impact sites of dust on Callisto's surface are recorded and provide an important clue for the study of the hemisphere asymmetry of Callisto. Besides, the mass accretion rate, cross-sectional area influx, and mass influx density of dust on Callisto are calculated. A ring outside the orbit of Callisto dominated by dust between 2 and 25 $μ$m from Jupiter's irregular satellites is suggested, with the average normal geometric optical depth of the order of $10^{-8}$ and the configuration of the ring ansae similar to Jupiter's gossamer rings. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Comments: 11 pages, 14 figures

Journal ref: Monthly Notices of the Royal Astronomical Society, 2024, 527(4): 11327-11337

arXiv:2402.02916 [pdf, ps, other]

On bilinear Strichartz estimates on waveguides with applications

Authors: Yangkendi Deng, Chenjie Fan, Kailong Yang, Zehua Zhao, Jiqiang Zheng

Abstract: We study local-in-time and global-in-time bilinear Strichartz estimates for the Schrödinger equation on waveguides. As applications, we apply those estimates to study global well-posedness of nonlinear Schrödinger equations on these waveguides. We study local-in-time and global-in-time bilinear Strichartz estimates for the Schrödinger equation on waveguides. As applications, we apply those estimates to study global well-posedness of nonlinear Schrödinger equations on these waveguides. △ Less

Submitted 29 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: 23 pages, 2 figures

arXiv:2402.01728 [pdf, other]

Hardware Phi-1.5B: A Large Language Model Encodes Hardware Domain Specific Knowledge

Authors: Weimin Fu, Shijie Li, Yifang Zhao, Haocheng Ma, Raj Dutta, Xuan Zhang, Kaichen Yang, Yier Jin, Xiaolong Guo

Abstract: In the rapidly evolving semiconductor industry, where research, design, verification, and manufacturing are intricately linked, the potential of Large Language Models to revolutionize hardware design and security verification is immense. The primary challenge, however, lies in the complexity of hardware specific issues that are not adequately addressed by the natural language or software code know… ▽ More In the rapidly evolving semiconductor industry, where research, design, verification, and manufacturing are intricately linked, the potential of Large Language Models to revolutionize hardware design and security verification is immense. The primary challenge, however, lies in the complexity of hardware specific issues that are not adequately addressed by the natural language or software code knowledge typically acquired during the pretraining stage. Additionally, the scarcity of datasets specific to the hardware domain poses a significant hurdle in developing a foundational model. Addressing these challenges, this paper introduces Hardware Phi 1.5B, an innovative large language model specifically tailored for the hardware domain of the semiconductor industry. We have developed a specialized, tiered dataset comprising small, medium, and large subsets and focused our efforts on pretraining using the medium dataset. This approach harnesses the compact yet efficient architecture of the Phi 1.5B model. The creation of this first pretrained, hardware domain specific large language model marks a significant advancement, offering improved performance in hardware design and verification tasks and illustrating a promising path forward for AI applications in the semiconductor sector. △ Less

Submitted 27 January, 2024; originally announced February 2024.

Comments: 6 pages, 6 figures

Journal ref: 29th IEEE/ACM Asia and South Pacific Design Automation Conference (ASP-DAC); 2024 January; Incheon Songdo Convensia, South Korea

arXiv:2402.01568 [pdf, other]

Doping Liquid Argon with Xenon in ProtoDUNE Single-Phase: Effects on Scintillation Light

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, H. Amar Es-sghir, P. Amedo, J. Anderson, D. A. Andrade, C. Andreopoulos , et al. (1297 additional authors not shown)

Abstract: Doping of liquid argon TPCs (LArTPCs) with a small concentration of xenon is a technique for light-shifting and facilitates the detection of the liquid argon scintillation light. In this paper, we present the results of the first doping test ever performed in a kiloton-scale LArTPC. From February to May 2020, we carried out this special run in the single-phase DUNE Far Detector prototype (ProtoDUN… ▽ More Doping of liquid argon TPCs (LArTPCs) with a small concentration of xenon is a technique for light-shifting and facilitates the detection of the liquid argon scintillation light. In this paper, we present the results of the first doping test ever performed in a kiloton-scale LArTPC. From February to May 2020, we carried out this special run in the single-phase DUNE Far Detector prototype (ProtoDUNE-SP) at CERN, featuring 720 t of total liquid argon mass with 410 t of fiducial mass. A 5.4 ppm nitrogen contamination was present during the xenon doping campaign. The goal of the run was to measure the light and charge response of the detector to the addition of xenon, up to a concentration of 18.8 ppm. The main purpose was to test the possibility for reduction of non-uniformities in light collection, caused by deployment of photon detectors only within the anode planes. Light collection was analysed as a function of the xenon concentration, by using the pre-existing photon detection system (PDS) of ProtoDUNE-SP and an additional smaller set-up installed specifically for this run. In this paper we first summarize our current understanding of the argon-xenon energy transfer process and the impact of the presence of nitrogen in argon with and without xenon dopant. We then describe the key elements of ProtoDUNE-SP and the injection method deployed. Two dedicated photon detectors were able to collect the light produced by xenon and the total light. The ratio of these components was measured to be about 0.65 as 18.8 ppm of xenon were injected. We performed studies of the collection efficiency as a function of the distance between tracks and light detectors, demonstrating enhanced uniformity of response for the anode-mounted PDS. We also show that xenon doping can substantially recover light losses due to contamination of the liquid argon by nitrogen. △ Less

Submitted 2 August, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 36 pages, 20 figures. Corrected author list; corrected typos across paper and polished text

Report number: CERN-EP-2024-024; FERMILAB-PUB-23-0819-LBNF

arXiv:2402.00744 [pdf, other]

BATON: Aligning Text-to-Audio Model with Human Preference Feedback

Authors: Huan Liao, Haonan Han, Kai Yang, Tianjiao Du, Rui Yang, Zunnan Xu, Qinmei Xu, Jingquan Liu, Jiasheng Lu, Xiu Li

Abstract: With the development of AI-Generated Content (AIGC), text-to-audio models are gaining widespread attention. However, it is challenging for these models to generate audio aligned with human preference due to the inherent information density of natural language and limited model understanding ability. To alleviate this issue, we formulate the BATON, a framework designed to enhance the alignment betw… ▽ More With the development of AI-Generated Content (AIGC), text-to-audio models are gaining widespread attention. However, it is challenging for these models to generate audio aligned with human preference due to the inherent information density of natural language and limited model understanding ability. To alleviate this issue, we formulate the BATON, a framework designed to enhance the alignment between generated audio and text prompt using human preference feedback. Our BATON comprises three key stages: Firstly, we curated a dataset containing both prompts and the corresponding generated audio, which was then annotated based on human feedback. Secondly, we introduced a reward model using the constructed dataset, which can mimic human preference by assigning rewards to input text-audio pairs. Finally, we employed the reward model to fine-tune an off-the-shelf text-to-audio model. The experiment results demonstrate that our BATON can significantly improve the generation quality of the original text-to-audio models, concerning audio integrity, temporal relationship, and alignment with human preference. △ Less

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.17837 [pdf, ps, other]

Safe Reinforcement Learning-Based Eco-Driving Control for Mixed Traffic Flows With Disturbances

Authors: Ke Lu, Dongjun Li, Qun Wang, Kaidi Yang, Lin Zhao, Ziyou Song

Abstract: This paper presents a safe learning-based eco-driving framework tailored for mixed traffic flows, which aims to optimize energy efficiency while guaranteeing safety during real-system operations. Even though reinforcement learning (RL) is capable of optimizing energy efficiency in intricate environments, it is challenged by safety requirements during the training process. The lack of safety guaran… ▽ More This paper presents a safe learning-based eco-driving framework tailored for mixed traffic flows, which aims to optimize energy efficiency while guaranteeing safety during real-system operations. Even though reinforcement learning (RL) is capable of optimizing energy efficiency in intricate environments, it is challenged by safety requirements during the training process. The lack of safety guarantees is the other concern when deploying a trained policy in real-world application. Compared with RL, model predicted control (MPC) can handle constrained dynamics systems, ensuring safe driving. However, the major challenges lie in complicated eco-driving tasks and the presence of disturbances, which respectively challenge the MPC design and the satisfaction of constraints. To address these limitations, the proposed framework incorporates the tube-based enhanced MPC (RMPC) to ensure the safe execution of the RL policy under disturbances, thereby improving the control robustness. RL not only optimizes the energy efficiency of the connected and automated vehicle in mixed traffic but also handles more uncertain scenarios, in which the energy consumption of the human-driven vehicle and its diverse and stochastic driving behaviors are considered in the optimization framework. Simulation results demonstrate that the proposed algorithm, compared with RMPC technique, shows an average improvement of 10.88% in holistic energy efficiency, while compared with RL algorithm, it effectively prevents inter-vehicle collisions. △ Less

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.16923 [pdf, other]

Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation

Authors: Ruiping Liu, Jiaming Zhang, Kunyu Peng, Yufan Chen, Ke Cao, Junwei Zheng, M. Saquib Sarfraz, Kailun Yang, Rainer Stiefelhagen

Abstract: Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, the modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level… ▽ More Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, the modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level modality absence and sensor-level modality errors. To avoid the predominant modality reliance in multi-modal fusion, we introduce a Missing-aware Modal Switch (MMS) strategy to proactively manage missing modalities during training. Utilizing bit-level batch-wise sampling enhances the model's performance in both complete and incomplete testing scenarios. Furthermore, we introduce the Fourier Prompt Tuning (FPT) method to incorporate representative spectral information into a limited number of learnable prompts that maintain robustness against all MISS scenarios. Akin to fine-tuning effects but with fewer tunable parameters (1.1%). Extensive experiments prove the efficacy of our proposed approach, showcasing an improvement of 5.84% mIoU over the prior state-of-the-art parameter-efficient methods in modality missing. The source code is publicly available at https://github.com/RuipingL/MISS. △ Less

Submitted 10 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted to IEEE IV 2024. The source code is publicly available at https://github.com/RuipingL/MISS

arXiv:2401.16712 [pdf, other]

LF Tracy: A Unified Single-Pipeline Approach for Salient Object Detection in Light Field Cameras

Authors: Fei Teng, Jiaming Zhang, Jiawei Liu, Kunyu Peng, Xina Cheng, Zhiyong Li, Kailun Yang

Abstract: Leveraging rich information is crucial for dense prediction tasks. Light field (LF) cameras are instrumental in this regard, as they allow data to be sampled from various perspectives. This capability provides valuable spatial, depth, and angular information, enhancing scene-parsing tasks. However, we have identified two overlooked issues for the LF salient object detection (SOD) task. (1): Previo… ▽ More Leveraging rich information is crucial for dense prediction tasks. Light field (LF) cameras are instrumental in this regard, as they allow data to be sampled from various perspectives. This capability provides valuable spatial, depth, and angular information, enhancing scene-parsing tasks. However, we have identified two overlooked issues for the LF salient object detection (SOD) task. (1): Previous approaches predominantly employ a customized two-stream design to discover the spatial and depth features within light field images. The network struggles to learn the implicit angular information between different images due to a lack of intra-network data connectivity. (2): Little research has been directed towards the data augmentation strategy for LF SOD. Research on inter-network data connectivity is scant. In this study, we propose an efficient paradigm (LF Tracy) to address those issues. This comprises a single-pipeline encoder paired with a highly efficient information aggregation (IA) module (around 8M parameters) to establish an intra-network connection. Then, a simple yet effective data augmentation strategy called MixLD is designed to bridge the inter-network connections. Owing to this innovative paradigm, our model surpasses the existing state-of-the-art method through extensive experiments. Especially, LF Tracy demonstrates a 23% improvement over previous results on the latest large-scale PKU dataset. The source code is publicly available at: https://github.com/FeiBryantkit/LF-Tracy. △ Less

Submitted 26 August, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: Accepted to ICPR 2024. The source code is publicly available at: https://github.com/FeiBryantkit/LF-Tracy

arXiv:2401.16700 [pdf, other]

Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers

Authors: Jianbin Jiao, Xina Cheng, Weijie Chen, Xiaoting Yin, Hao Shi, Kailun Yang

Abstract: 3D human pose estimation captures the human joint points in three-dimensional space while keeping the depth information and physical structure. That is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges in data collection, mainstream datasets of 3D human pose estimation are pr… ▽ More 3D human pose estimation captures the human joint points in three-dimensional space while keeping the depth information and physical structure. That is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges in data collection, mainstream datasets of 3D human pose estimation are primarily composed of multi-view video data collected in laboratory environments, which contains rich spatial-temporal correlation information besides the image frame content. Given the remarkable self-attention mechanism of transformers, capable of capturing the spatial-temporal correlation from multi-view video datasets, we propose a multi-stage framework for 3D sequence-to-sequence (seq2seq) human pose detection. Firstly, the spatial module represents the human pose feature by intra-image content, while the frame-image relation module extracts temporal relationships and 3D spatial positional relationship features between the multi-perspective images. Secondly, the self-attention mechanism is adopted to eliminate the interference from non-human body parts and reduce computing resources. Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset. Experimental results demonstrate that our approach achieves state-of-the-art performance on this dataset. The source code will be available at https://github.com/WUJINHUAN/3D-human-pose. △ Less

Submitted 25 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: Accepted to IJCNN 2024. The source code will be available at https://github.com/WUJINHUAN/3D-human-pose

arXiv:2401.16448 [pdf, other]

doi 10.1109/AsianHOST59942.2023.10409307

LLM4SecHW: Leveraging Domain Specific Large Language Model for Hardware Debugging

Authors: Weimin Fu, Kaichen Yang, Raj Gautam Dutta, Xiaolong Guo, Gang Qu

Abstract: This paper presents LLM4SecHW, a novel framework for hardware debugging that leverages domain specific Large Language Model (LLM). Despite the success of LLMs in automating various software development tasks, their application in the hardware security domain has been limited due to the constraints of commercial LLMs and the scarcity of domain specific data. To address these challenges, we propose… ▽ More This paper presents LLM4SecHW, a novel framework for hardware debugging that leverages domain specific Large Language Model (LLM). Despite the success of LLMs in automating various software development tasks, their application in the hardware security domain has been limited due to the constraints of commercial LLMs and the scarcity of domain specific data. To address these challenges, we propose a unique approach to compile a dataset of open source hardware design defects and their remediation steps, utilizing version control data. This dataset provides a substantial foundation for training machine learning models for hardware. LLM4SecHW employs fine tuning of medium sized LLMs based on this dataset, enabling the identification and rectification of bugs in hardware designs. This pioneering approach offers a reference workflow for the application of fine tuning domain specific LLMs in other research areas. We evaluate the performance of our proposed system on various open source hardware designs, demonstrating its efficacy in accurately identifying and correcting defects. Our work brings a new perspective on automating the quality control process in hardware design. △ Less

Submitted 28 January, 2024; originally announced January 2024.

Comments: 6 pages. 1 figure

Journal ref: 2023 Asian Hardware Oriented Security and Trust Symposium (AsianHOST), Tianjin, China, 2023, pp. 1-6

arXiv:2401.16421 [pdf, other]

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

Authors: Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Liwei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He

Abstract: In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute pos… ▽ More In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities. △ Less

Submitted 17 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

Comments: 17 pages, 7 figures, 8 tables; ICML 2024 Camera Ready version; Code: https://github.com/zhenyuhe00/BiPE

arXiv:2401.16154 [pdf, other]

doi 10.1088/0256-307X/40/7/077301

Superexchange interactions and magnetic anisotropy in MnPSe$_3$ monolayer

Authors: Guangyu Wang, Ke Yang, Yaozhenghang Ma, Lu Liu, Di Lu, Yuxuan Zhou, Hua Wu

Abstract: Two-dimensional van der Waals magnetic materials are of great current interest for their promising applications in spintronics. In this work, using density functional theory calculations in combination with the maximally localized Wannier functions method and the magnetic anisotropy analyses, we study the electronic and magnetic properties of MnPSe$_3$ monolayer. Our results show that it is a char… ▽ More Two-dimensional van der Waals magnetic materials are of great current interest for their promising applications in spintronics. In this work, using density functional theory calculations in combination with the maximally localized Wannier functions method and the magnetic anisotropy analyses, we study the electronic and magnetic properties of MnPSe$_3$ monolayer. Our results show that it is a charge transfer antiferromagnetic (AF) insulator. For this Mn$^{2+}$ $3d^5$ system, although it seems straightforward to explain the AF ground state using the direct exchange, we find that the near 90$^\circ$ Mn-Se-Mn charge transfer type superexchange plays a dominant role in stabilizing the AF ground state. Moreover, our results indicate that although the shape anisotropy favors an out-of-plane spin orientation, the spin-orbit coupling (SOC) leads to the experimentally observed in-plane spin orientation. We prove that the actual dominant contribution to the magnetic anisotropy comes from the second-order perturbation of the SOC, by analyzing its distribution over the reciprocal space. Using the AF exchange and anisotropy parameters obtained from our calculations, our Monte Carlo simulations give the Néel temperature $T_{\rm N}=47$ K for MnPSe$_3$ monolayer, which agrees with the experimental 40 K. Furthermore, our calculations show that under a uniaxial tensile (compressive) strain, Néel vector would be parallel (perpendicular) to the strain direction, which well reproduces the recent experiments. We also predict that $T_{\rm N}$ would be increased by a compressive strain. △ Less

Submitted 29 January, 2024; originally announced January 2024.

Comments: 8 pages, 9 figures

Journal ref: Chinese Phys. Lett. 40 077301 (2023)

arXiv:2401.15561 [pdf, other]

A Parameter Privacy-Preserving Strategy for Mixed-Autonomy Platoon Control

Authors: Jingyuan Zhou, Kaidi Yang

Abstract: It has been demonstrated that leading cruise control (LCC) can improve the operation of mixed-autonomy platoons by allowing connected and automated vehicles (CAVs) to make longitudinal control decisions based on the information provided by surrounding vehicles. However, LCC generally requires surrounding human-driven vehicles (HDVs) to share their real-time states, which can be used by adversaries… ▽ More It has been demonstrated that leading cruise control (LCC) can improve the operation of mixed-autonomy platoons by allowing connected and automated vehicles (CAVs) to make longitudinal control decisions based on the information provided by surrounding vehicles. However, LCC generally requires surrounding human-driven vehicles (HDVs) to share their real-time states, which can be used by adversaries to infer drivers' car-following behavior, potentially leading to financial losses or safety concerns. This paper aims to address such privacy concerns and protect the behavioral characteristics of HDVs by devising a parameter privacy-preserving approach for mixed-autonomy platoon control. First, we integrate a parameter privacy filter into LCC to protect sensitive car-following parameters. The privacy filter allows each vehicle to generate seemingly realistic pseudo states by distorting the true parameters to pseudo parameters, which can protect drivers' privacy in behavioral parameters without significantly influencing the control performance. Second, to enhance the practicality and reliability of the privacy filter within LCC, we first extend the current approach to accommodate continuous parameter spaces through a neural network estimator. Subsequently, we introduce an individual-level parameter privacy preservation constraint, focusing on the privacy level of each individual parameter pair, further enhancing the approach's reliability. Third, analysis of head-to-tail string stability reveals the potential impact of privacy filters in degrading mixed traffic flow performance. Simulation shows that this approach can effectively trade off privacy and control performance in LCC. We further demonstrate the benefit of such an approach in networked systems, i.e., by applying the privacy filter to a proceeding vehicle, one can also achieve a certain level of privacy for the following vehicle. △ Less

Submitted 27 January, 2024; originally announced January 2024.

arXiv:2401.12888 [pdf, other]

Data-Centric Evolution in Autonomous Driving: A Comprehensive Survey of Big Data System, Data Mining, and Closed-Loop Technologies

Authors: Lincan Li, Wei Shao, Wei Dong, Yijun Tian, Qiming Zhang, Kaixiang Yang, Wenjie Zhang

Abstract: The aspiration of the next generation's autonomous driving (AD) technology relies on the dedicated integration and interaction among intelligent perception, prediction, planning, and low-level control. There has been a huge bottleneck regarding the upper bound of autonomous driving algorithm performance, a consensus from academia and industry believes that the key to surmount the bottleneck lies i… ▽ More The aspiration of the next generation's autonomous driving (AD) technology relies on the dedicated integration and interaction among intelligent perception, prediction, planning, and low-level control. There has been a huge bottleneck regarding the upper bound of autonomous driving algorithm performance, a consensus from academia and industry believes that the key to surmount the bottleneck lies in data-centric autonomous driving technology. Recent advancement in AD simulation, closed-loop model training, and AD big data engine have gained some valuable experience. However, there is a lack of systematic knowledge and deep understanding regarding how to build efficient data-centric AD technology for AD algorithm self-evolution and better AD big data accumulation. To fill in the identified research gaps, this article will closely focus on reviewing the state-of-the-art data-driven autonomous driving technologies, with an emphasis on the comprehensive taxonomy of autonomous driving datasets characterized by milestone generations, key features, data acquisition settings, etc. Furthermore, we provide a systematic review of the existing benchmark closed-loop AD big data pipelines from the industrial frontier, including the procedure of closed-loop frameworks, key technologies, and empirical studies. Finally, the future directions, potential applications, limitations and concerns are discussed to arouse efforts from both academia and industry for promoting the further development of autonomous driving. The project repository is available at: https://github.com/LincanLi98/Awesome-Data-Centric-Autonomous-Driving. △ Less

Submitted 26 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.12852 [pdf, other]

Control-Aware Trajectory Predictions for Communication-Efficient Drone Swarm Coordination in Cluttered Environments

Authors: Longhao Yan, Jingyuan Zhou, Kaidi Yang

Abstract: Swarms of Unmanned Aerial Vehicles (UAV) have demonstrated enormous potential in many industrial and commercial applications. However, before deploying UAVs in the real world, it is essential to ensure they can operate safely in complex environments, especially with limited communication capabilities. To address this challenge, we propose a control-aware learning-based trajectory prediction algori… ▽ More Swarms of Unmanned Aerial Vehicles (UAV) have demonstrated enormous potential in many industrial and commercial applications. However, before deploying UAVs in the real world, it is essential to ensure they can operate safely in complex environments, especially with limited communication capabilities. To address this challenge, we propose a control-aware learning-based trajectory prediction algorithm that can enable communication-efficient UAV swarm control in a cluttered environment. Specifically, our proposed algorithm can enable each UAV to predict the planned trajectories of its neighbors in scenarios with various levels of communication capabilities. The predicted planned trajectories will serve as input to a distributed model predictive control (DMPC) approach. The proposed algorithm combines (1) a trajectory compression and reconstruction model based on Variational Auto-Encoder, (2) a trajectory prediction model based on EvolveGCN, a graph convolutional network (GCN) that can handle dynamic graphs, and (3) a KKT-informed training approach that applies the Karush-Kuhn-Tucker (KKT) conditions in the training process to encode DMPC information into the trained neural network. We evaluate our proposed algorithm in a funnel-like environment. Results show that the proposed algorithm outperforms state-of-the-art benchmarks, providing close-to-optimal control performance and robustness to limited communication capabilities and measurement noises. △ Less

Submitted 23 January, 2024; originally announced January 2024.

Comments: 15 pages, 15 figures, submitted to IEEE Transactions on Intelligent Vehicles

ACM Class: I.2.9

arXiv:2401.11836 [pdf, other]

Privacy-Preserving Data Fusion for Traffic State Estimation: A Vertical Federated Learning Approach

Authors: Qiqing Wang, Kaidi Yang

Abstract: This paper proposes a privacy-preserving data fusion method for traffic state estimation (TSE). Unlike existing works that assume all data sources to be accessible by a single trusted party, we explicitly address data privacy concerns that arise in the collaboration and data sharing between multiple data owners, such as municipal authorities (MAs) and mobility providers (MPs). To this end, we prop… ▽ More This paper proposes a privacy-preserving data fusion method for traffic state estimation (TSE). Unlike existing works that assume all data sources to be accessible by a single trusted party, we explicitly address data privacy concerns that arise in the collaboration and data sharing between multiple data owners, such as municipal authorities (MAs) and mobility providers (MPs). To this end, we propose a novel vertical federated learning (FL) approach, FedTSE, that enables multiple data owners to collaboratively train and apply a TSE model without having to exchange their private data. To enhance the applicability of the proposed FedTSE in common TSE scenarios with limited availability of ground-truth data, we further propose a privacy-preserving physics-informed FL approach, i.e., FedTSE-PI, that integrates traffic models into FL. Real-world data validation shows that the proposed methods can protect privacy while yielding similar accuracy to the oracle method without privacy considerations. △ Less

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.11409 [pdf, other]

Robust Beamforming for Downlink Multi-Cell Systems: A Bilevel Optimization Perspective

Authors: Xingdi Chen, Yu Xiong, Kai Yang

Abstract: Utilization of inter-base station cooperation for information processing has shown great potential in enhancing the overall quality of communication services (QoS) in wireless communication networks. Nevertheless, such cooperations require the knowledge of channel state information (CSI) at base stations (BSs), which is assumed to be perfectly known. However, CSI errors are inevitable in practice… ▽ More Utilization of inter-base station cooperation for information processing has shown great potential in enhancing the overall quality of communication services (QoS) in wireless communication networks. Nevertheless, such cooperations require the knowledge of channel state information (CSI) at base stations (BSs), which is assumed to be perfectly known. However, CSI errors are inevitable in practice which necessitates beamforming techniques that can achieve robust performance in the presence of channel estimation errors. Existing approaches relax the robust beamforming design problems into semidefinite programming (SDP), which can only achieve a solution that is far from being optimal. To this end, this paper views robust beamforming design problems from a bilevel optimization perspective. In particular, we focus on maximizing the worst-case weighted sum-rate (WSR) in the downlink multi-cell multi-user multiple-input single-output (MISO) system considering bounded CSI errors. We first reformulate this problem into a bilevel optimization problem and then develop an efficient algorithm based on the cutting plane method. A distributed optimization algorithm has also been developed to facilitate the parallel processing in practical settings. Numerical results are provided to confirm the effectiveness of the proposed algorithm in terms of performance and complexity, particularly in the presence of CSI uncertainties. △ Less

Submitted 21 January, 2024; originally announced January 2024.

Comments: accepted at AAAI2024

arXiv:2401.11148 [pdf, other]

doi 10.1109/TIV.2024.3373512

Enhancing System-Level Safety in Mixed-Autonomy Platoon via Safe Reinforcement Learning

Authors: Jingyuan Zhou, Longhao Yan, Kaidi Yang

Abstract: Connected and automated vehicles (CAVs) have recently gained prominence in traffic research due to advances in communication technology and autonomous driving. Various longitudinal control strategies for CAVs have been developed to enhance traffic efficiency, stability, and safety in mixed-autonomy scenarios. Deep reinforcement learning (DRL) is one promising strategy for mixed-autonomy platoon co… ▽ More Connected and automated vehicles (CAVs) have recently gained prominence in traffic research due to advances in communication technology and autonomous driving. Various longitudinal control strategies for CAVs have been developed to enhance traffic efficiency, stability, and safety in mixed-autonomy scenarios. Deep reinforcement learning (DRL) is one promising strategy for mixed-autonomy platoon control, thanks to its capability of managing complex scenarios in real time after sufficient offline training. However, there are three research gaps for DRL-based mixed-autonomy platoon control: (i) the lack of theoretical collision-free guarantees, (ii) the widely adopted but impractical assumption of skilled and rational drivers who will not collide with preceding vehicles, and (iii) the strong assumption of a known human driver model. To address these research gaps, we propose a safe DRL-based controller that can provide a system-level safety guarantee for mixed-autonomy platoon control. First, we combine control barrier function (CBF)-based safety constraints and DRL via a quadratic programming (QP)-based differentiable neural network layer to provide theoretical safety guarantees. Second, we incorporate system-level safety constraints into our proposed method to account for the safety of both CAVs and the following HDVs to address the potential collisions due to irrational human driving behavior. Third, we devise a learning-based system identification approach to estimate the unknown human car-following behavior in the real system. Simulation results demonstrate that our proposed method effectively ensures CAV safety and improves HDV safety in mixed platoon environments while simultaneously enhancing traffic capacity and string stability. △ Less

Submitted 1 March, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

Comments: IEEE Transactions on Intelligent Vehicles (2024)

arXiv:2401.09793 [pdf, other]

PatchAD: A Lightweight Patch-based MLP-Mixer for Time Series Anomaly Detection

Authors: Zhijie Zhong, Zhiwen Yu, Yiyuan Yang, Weizheng Wang, Kaixiang Yang

Abstract: Anomaly detection in time series analysis is a pivotal task, yet it poses the challenge of discerning normal and abnormal patterns in label-deficient scenarios. While prior studies have largely employed reconstruction-based approaches, which limits the models' representational capacities. Moreover, existing deep learning-based methods are not sufficiently lightweight. Addressing these issues, we p… ▽ More Anomaly detection in time series analysis is a pivotal task, yet it poses the challenge of discerning normal and abnormal patterns in label-deficient scenarios. While prior studies have largely employed reconstruction-based approaches, which limits the models' representational capacities. Moreover, existing deep learning-based methods are not sufficiently lightweight. Addressing these issues, we present PatchAD, our novel, highly efficient multiscale patch-based MLP-Mixer architecture that utilizes contrastive learning for representation extraction and anomaly detection. With its four distinct MLP Mixers and innovative dual project constraint module, PatchAD mitigates potential model degradation and offers a lightweight solution, requiring only $3.2$MB. Its efficacy is demonstrated by state-of-the-art results across $9$ datasets sourced from different application scenarios, outperforming over $30$ comparative algorithms. PatchAD significantly improves the classical F1 score by $50.5\%$, the Aff-F1 score by $7.8\%$, and the AUC by $10.0\%$. The code is publicly available. \url{https://github.com/EmorZz1G/PatchAD} △ Less

Submitted 28 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: 22 pages, 11 figures, 14 tables, Under review

arXiv:2401.09750 [pdf, other]

Exploration and Anti-Exploration with Distributional Random Network Distillation

Authors: Kai Yang, Jian Tao, Jiafei Lyu, Xiu Li

Abstract: Exploration remains a critical issue in deep reinforcement learning for an agent to attain high returns in unknown environments. Although the prevailing exploration Random Network Distillation (RND) algorithm has been demonstrated to be effective in numerous environments, it often needs more discriminative power in bonus allocation. This paper highlights the "bonus inconsistency" issue within RND,… ▽ More Exploration remains a critical issue in deep reinforcement learning for an agent to attain high returns in unknown environments. Although the prevailing exploration Random Network Distillation (RND) algorithm has been demonstrated to be effective in numerous environments, it often needs more discriminative power in bonus allocation. This paper highlights the "bonus inconsistency" issue within RND, pinpointing its primary limitation. To address this issue, we introduce the Distributional RND (DRND), a derivative of the RND. DRND enhances the exploration process by distilling a distribution of random networks and implicitly incorporating pseudo counts to improve the precision of bonus allocation. This refinement encourages agents to engage in more extensive exploration. Our method effectively mitigates the inconsistency issue without introducing significant computational overhead. Both theoretical analysis and experimental results demonstrate the superiority of our approach over the original RND algorithm. Our method excels in challenging online exploration scenarios and effectively serves as an anti-exploration mechanism in D4RL offline tasks. Our code is publicly available at https://github.com/yk7333/DRND. △ Less

Submitted 19 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: ICML 2024 accepted

arXiv:2401.09549 [pdf, other]

Interferometric Single-Shot Parity Measurement in an InAs-Al Hybrid Device

Authors: Morteza Aghaee, Alejandro Alcaraz Ramirez, Zulfi Alam, Rizwan Ali, Mariusz Andrzejczuk, Andrey Antipov, Mikhail Astafev, Amin Barzegar, Bela Bauer, Jonathan Becker, Umesh Kumar Bhaskar, Alex Bocharov, Srini Boddapati, David Bohn, Jouri Bommer, Leo Bourdet, Arnaud Bousquet, Samuel Boutin, Lucas Casparis, Benjamin James Chapman, Sohail Chatoor, Anna Wulff Christensen, Cassandra Chua, Patrick Codd, William Cole , et al. (137 additional authors not shown)

Abstract: The fusion of non-Abelian anyons or topological defects is a fundamental operation in measurement-only topological quantum computation. In topological superconductors, this operation amounts to a determination of the shared fermion parity of Majorana zero modes. As a step towards this, we implement a single-shot interferometric measurement of fermion parity in indium arsenide-aluminum heterostruct… ▽ More The fusion of non-Abelian anyons or topological defects is a fundamental operation in measurement-only topological quantum computation. In topological superconductors, this operation amounts to a determination of the shared fermion parity of Majorana zero modes. As a step towards this, we implement a single-shot interferometric measurement of fermion parity in indium arsenide-aluminum heterostructures with a gate-defined nanowire. The interferometer is formed by tunnel-coupling the proximitized nanowire to quantum dots. The nanowire causes a state-dependent shift of these quantum dots' quantum capacitance of up to 1 fF. Our quantum capacitance measurements show flux h/2e-periodic bimodality with a signal-to-noise ratio of 1 in 3.7 $μ$s at optimal flux values. From the time traces of the quantum capacitance measurements, we extract a dwell time in the two associated states that is longer than 1 ms at in-plane magnetic fields of approximately 2 T. These results are consistent with a measurement of the fermion parity encoded in a pair of Majorana zero modes that are separated by approximately 3 $μ$m and subjected to a low rate of poisoning by non-equilibrium quasiparticles. The large capacitance shift and long poisoning time enable a parity measurement error probability of 1%. △ Less

Submitted 2 April, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: Added data on a second measurement of device A and a measurement of device B, expanded discussion of a trivial scenario. Refs added, author list updated

arXiv:2401.08508 [pdf, other]

doi 10.1145/3637528.3671552

EmoLLMs: A Series of Emotional Large Language Models and Annotation Tools for Comprehensive Affective Analysis

Authors: Zhiwei Liu, Kailai Yang, Tianlin Zhang, Qianqian Xie, Sophia Ananiadou

Abstract: Sentiment analysis and emotion detection are important research topics in natural language processing (NLP) and benefit many downstream tasks. With the widespread application of LLMs, researchers have started exploring the application of LLMs based on instruction-tuning in the field of sentiment analysis. However, these models only focus on single aspects of affective classification tasks (e.g. se… ▽ More Sentiment analysis and emotion detection are important research topics in natural language processing (NLP) and benefit many downstream tasks. With the widespread application of LLMs, researchers have started exploring the application of LLMs based on instruction-tuning in the field of sentiment analysis. However, these models only focus on single aspects of affective classification tasks (e.g. sentimental polarity or categorical emotions), and overlook the regression tasks (e.g. sentiment strength or emotion intensity), which leads to poor performance in downstream tasks. The main reason is the lack of comprehensive affective instruction tuning datasets and evaluation benchmarks, which cover various affective classification and regression tasks. Moreover, although emotional information is useful for downstream tasks, existing downstream datasets lack high-quality and comprehensive affective annotations. In this paper, we propose EmoLLMs, the first series of open-sourced instruction-following LLMs for comprehensive affective analysis based on fine-tuning various LLMs with instruction data, the first multi-task affective analysis instruction dataset (AAID) with 234K data samples based on various classification and regression tasks to support LLM instruction tuning, and a comprehensive affective evaluation benchmark (AEB) with 14 tasks from various sources and domains to test the generalization ability of LLMs. We propose a series of EmoLLMs by fine-tuning LLMs with AAID to solve various affective instruction tasks. We compare our model with a variety of LLMs on AEB, where our models outperform all other open-sourced LLMs, and surpass ChatGPT and GPT-4 in most tasks, which shows that the series of EmoLLMs achieve the ChatGPT-level and GPT-4-level generalization capabilities on affective analysis tasks, and demonstrates our models can be used as affective annotation tools. △ Less

Submitted 17 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted by KDD 2024

arXiv:2401.08207 [pdf, other]

AGN jet-inflated bubbles as possible origin of odd radio circles

Authors: Yen-Hsing Lin, H. -Y. Karen Yang

Abstract: Odd radio circles (ORCs) are newly discovered extragalactic radio objects with unknown origin. In this work, we carry out three-dimensional cosmic-ray (CR) magnetohydrodynamic simulations using the FLASH code and predict the radio morphology of end-on active galactic nucleus (AGN) jet-inflated bubbles considering hadronic emission. We consider CR proton (CRp)-dominated jets as they tend to inflate… ▽ More Odd radio circles (ORCs) are newly discovered extragalactic radio objects with unknown origin. In this work, we carry out three-dimensional cosmic-ray (CR) magnetohydrodynamic simulations using the FLASH code and predict the radio morphology of end-on active galactic nucleus (AGN) jet-inflated bubbles considering hadronic emission. We consider CR proton (CRp)-dominated jets as they tend to inflate oblate bubbles, promising to reproduce the large inferred sizes of the ORCs when viewed end-on. We find that powerful and long-duration CRp-dominated jets can create bubbles with similar sizes ($\sim 300-600$ kpc) and radio morphology (circular and edge-brightened) to the observed ORCs in low-mass ($M_{\rm vir}\sim 8\times 10^{12} - 8\times 10^{13}~M_\odot$) halos. Given the same amount of input jet energy, longer-duration (thus lower-power) jets tend to create larger bubbles since high-power jets generate strong shocks that carry away a significant portion of the jet energy. The edge-brightened feature of the observed ORCs is naturally reproduced due to efficient hadronic collisions at the interface between the bubbles and the ambient medium. We further discuss the radio luminosity, X-ray detectability, and the possible origin of such strong AGN jets in the context of galaxy evolution. We conclude that end-on CR-dominated AGN bubbles could be a plausible scenario for the formation of ORCs. △ Less

Submitted 21 August, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: 17 pages, 10 figures, ApJ accepted

Showing 151–200 of 1,729 results for author: Yang, K