Search | arXiv e-print repository

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

Authors: Xuhui Zhou, Hyunwoo Kim, Faeze Brahman, Liwei Jiang, Hao Zhu, Ximing Lu, Frank Xu, Bill Yuchen Lin, Yejin Choi, Niloofar Mireshghallah, Ronan Le Bras, Maarten Sap

Abstract: AI agents are increasingly autonomous in their interactions with human users and tools, leading to increased interactional safety risks. We present HAICOSYSTEM, a framework examining AI agent safety within diverse and complex social interactions. HAICOSYSTEM features a modular sandbox environment that simulates multi-turn interactions between human users and AI agents, where the AI agents are equipped with a variety of tools (e.g., patient management platforms) to navigate diverse scenarios (e.g., a user attempting to access other patients' profiles). To examine the safety of AI agents in these interactions, we develop a comprehensive multi-dimensional evaluation framework that uses metrics covering operational, content-related, societal, and legal risks. Through running 1840 simulations based on 92 scenarios across seven domains (e.g., healthcare, finance, education), we demonstrate that HAICOSYSTEM can emulate realistic user-AI interactions and complex tool use by AI agents. Our experiments show that state-of-the-art LLMs, both proprietary and open-sourced, exhibit safety risks in over 50% of cases, with models generally showing higher risks when interacting with simulated malicious users. Our findings highlight the ongoing challenge of building agents that can safely navigate complex interactions, particularly when faced with malicious users. To foster the AI agent safety ecosystem, we release a code platform that allows practitioners to create custom scenarios, simulate interactions, and evaluate the safety and performance of their agents.

Submitted 26 September, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

Comments: Both the second and third authors contributed equally

arXiv:2409.14051 [pdf, other]

GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion

Authors: Tongxuan Liu, Xingyu Wang, Weizhe Huang, Wenjiang Xu, Yuting Zeng, Lei Jiang, Hailong Yang, Jing Li

Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse NLP tasks. Extensive research has explored how to enhance their logical reasoning abilities through methods such as Chain-of-Thought, Chain-of-Thought with Self-Consistency, Tree-Of-Thoughts, and multi-agent debates. In the context of multi-agent debates, significant performance improvements can be achieved with an increasing number of agents and debate rounds. However, the escalation in the number of agents and debate rounds can drastically raise the token cost of debates, thereby limiting the scalability of the multi-agent debate technique. To better harness the advantages of multi-agent debates in logical reasoning tasks, this paper proposes a method to significantly reduce token cost in multi-agent debates. This approach involves dividing all agents into multiple debate groups, with agents engaging in debates within their respective groups and sharing interim debate results between groups. Comparative experiments across multiple datasets have demonstrated that this method can reduce the total tokens by up to 51.7% during debates while potentially enhancing accuracy by as much as 25%. Our method significantly enhances the performance and efficiency of interactions in the multi-agent debate.

Submitted 21 September, 2024; originally announced September 2024.

Comments: 18 pages

arXiv:2409.13545 [pdf, other]

Data Augmentation for Sequential Recommendation: A Survey

Authors: Yizhou Dang, Enneng Yang, Yuting Liu, Guibing Guo, Linying Jiang, Jianzhe Zhao, Xingwei Wang

Abstract: As an essential branch of recommender systems, sequential recommendation (SR) has received much attention due to its strong consistency with real-world situations. However, the widespread data sparsity issue limits the SR model's performance. Therefore, researchers have proposed many data augmentation (DA) methods to mitigate this phenomenon and have achieved impressive progress. In this survey, we provide a comprehensive review of DA methods for SR. We start by introducing the research background and motivation. Then, we categorize existing methodologies regarding their augmentation principles, objects, and purposes. Next, we present a comparative discussion of their advantages and disadvantages, followed by the exhibition and analysis of representative experimental results. Finally, we outline directions for future research and summarize this survey. We also maintain a repository with a paper list at https://github.com/KingGugu/DA-CL-4Rec.

Submitted 20 September, 2024; originally announced September 2024.

arXiv:2409.13025 [pdf, other]

Hardware-efficient quantum error correction using concatenated bosonic qubits

Authors: Harald Putterman, Kyungjoo Noh, Connor T. Hann, Gregory S. MacCabe, Shahriar Aghaeimeibodi, Rishi N. Patel, Menyoung Lee, William M. Jones, Hesam Moradinejad, Roberto Rodriguez, Neha Mahuli, Jefferson Rose, John Clai Owens, Harry Levine, Emma Rosenfeld, Philip Reinhold, Lorenzo Moncelsi, Joshua Ari Alcid, Nasser Alidoust, Patricio Arrangoiz-Arriola, James Barnett, Przemyslaw Bienias, Hugh A. Carson, Cliff Chen, Li Chen , et al. (96 additional authors not shown)

Abstract: In order to solve problems of practical importance, quantum computers will likely need to incorporate quantum error correction, where a logical qubit is redundantly encoded in many noisy physical qubits. The large physical-qubit overhead typically associated with error correction motivates the search for more hardware-efficient approaches. Here, using a microfabricated superconducting quantum circuit, we realize a logical qubit memory formed from the concatenation of encoded bosonic cat qubits with an outer repetition code of distance $d=5$. The bosonic cat qubits are passively protected against bit flips using a stabilizing circuit. Cat-qubit phase-flip errors are corrected by the repetition code which uses ancilla transmons for syndrome measurement. We realize a noise-biased CX gate which ensures bit-flip error suppression is maintained during error correction. We study the performance and scaling of the logical qubit memory, finding that the phase-flip correcting repetition code operates below threshold, with logical phase-flip error decreasing with code distance from $d=3$ to $d=5$. Concurrently, the logical bit-flip error is suppressed with increasing cat-qubit mean photon number. The minimum measured logical error per cycle is on average $1.75(2)\%$ for the distance-3 code sections, and $1.65(3)\%$ for the longer distance-5 code, demonstrating the effectiveness of bit-flip error suppression throughout the error correction cycle. These results, where the intrinsic error suppression of the bosonic encodings allows us to use a hardware-efficient outer error correcting code, indicate that concatenated bosonic codes are a compelling paradigm for reaching fault-tolerant quantum computation.

Submitted 19 September, 2024; originally announced September 2024.

Comments: Comments on the manuscript welcome!

arXiv:2409.12020 [pdf, other]

Promise and Peril of Collaborative Code Generation Models: Balancing Effectiveness and Memorization

Authors: Zhi Chen, Lingxiao Jiang

Abstract: In the rapidly evolving field of machine learning, training models with datasets from various locations and organizations presents significant challenges due to privacy and legal concerns. The exploration of effective collaborative training settings capable of leveraging valuable knowledge from distributed and isolated datasets is increasingly crucial. This study investigates key factors that impact the effectiveness of collaborative training methods in code next-token prediction, as well as the correctness and utility of the generated code, demonstrating the promise of such methods. Additionally, we evaluate the memorization of different participant training data across various collaborative training settings, including centralized, federated, and incremental training, highlighting their potential risks in leaking data. Our findings indicate that the size and diversity of code datasets are pivotal factors influencing the success of collaboratively trained code models. We show that federated learning achieves competitive performance compared to centralized training while offering better data protection, as evidenced by lower memorization ratios in the generated code. However, federated learning can still produce verbatim code snippets from hidden training data, potentially violating privacy or copyright. Our study further explores effectiveness and memorization patterns in incremental learning, emphasizing the sequence in which individual participant datasets are introduced. We also identify cross-organizational clones as a prevalent challenge in both centralized and federated learning scenarios. Our findings highlight the persistent risk of data leakage during inference, even when training data remains unseen. We conclude with recommendations for practitioners and researchers to optimize multisource datasets, propelling cross-organizational collaboration forward.

Submitted 18 September, 2024; originally announced September 2024.

Comments: Paper accepted to the ASE 2024 Conference Research Track

arXiv:2409.10980 [pdf]

PSFHS Challenge Report: Pubic Symphysis and Fetal Head Segmentation from Intrapartum Ultrasound Images

Authors: Jieyun Bai, Zihao Zhou, Zhanhong Ou, Gregor Koehler, Raphael Stock, Klaus Maier-Hein, Marawan Elbatel, Robert Martí, Xiaomeng Li, Yaoyang Qiu, Panjie Gou, Gongping Chen, Lei Zhao, Jianxun Zhang, Yu Dai, Fangyijie Wang, Guénolé Silvestre, Kathleen Curran, Hongkun Sun, Jing Xu, Pengzhou Cai, Lu Jiang, Libin Lan, Dong Ni, Mei Zhong , et al. (4 additional authors not shown)

Abstract: Segmentation of the fetal and maternal structures, particularly in intrapartum ultrasound imaging as advocated by the International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) for monitoring labor progression, is a crucial first step for quantitative diagnosis and clinical decision-making. This requires specialized analysis by obstetrics professionals, in a task that i) is highly time- and cost-consuming and ii) often yields inconsistent results. The utility of automatic segmentation algorithms for biometry has been proven, though existing results remain suboptimal. To push forward advancements in this area, the Grand Challenge on Pubic Symphysis-Fetal Head Segmentation (PSFHS) was held alongside the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). This challenge aimed to enhance the development of automatic segmentation algorithms at an international scale, providing the largest dataset to date with 5,101 intrapartum ultrasound images collected from two ultrasound machines across three hospitals from two institutions. The scientific community's enthusiastic participation led to the selection of the top 8 out of 179 entries from 193 registrants in the initial phase to proceed to the competition's second stage. These algorithms have elevated the state-of-the-art in automatic PSFHS from intrapartum ultrasound images. A thorough analysis of the results pinpointed ongoing challenges in the field and outlined recommendations for future work. The top solutions and the complete dataset remain publicly available, fostering further advancements in automatic segmentation and biometry for intrapartum ultrasound imaging.

Submitted 17 September, 2024; originally announced September 2024.

arXiv:2409.06174 [pdf, other]

No Redshift Evolution in the Fe II/Mg II Flux Ratios of Quasars across Cosmic Time

Authors: Danyang Jiang, Masafusa Onoue, Linhua Jiang, Samuel Lai, Eduardo Banados, George D. Becker, Manuela Bischetti, Sarah E. I. Bosman, Rebecca L. Davies, Valentina D'Odorico, Emanuele Paolo Farina, Martin G. Haehnelt, Chiara Mazzucchelli, Jan-Torge Schindler, Fabian Walter, Yongda Zhu

Abstract: The Fe II/Mg II emission line flux ratio in quasar spectra serves as a proxy for the relative Fe to alpha-element abundances in the broad line regions of quasars. Due to the expected different enrichment timescales of the two elements, they can be used as a cosmic clock in the early Universe. We present a study of the Fe II/Mg II ratios in a sample of luminous quasars exploiting high-quality near-IR spectra taken primarily by the XQR-30 program with VLT XSHOOTER. These quasars have a median bolometric luminosity of log(L_bol[erg s^-1])~47.3 and cover a redshift range of z=6.0-6.6. The median value of the measured Fe II/Mg II ratios is ~7.9 with a normalized median absolute deviation of ~2.2. In order to trace the cosmic evolution of Fe II/Mg II in an unbiased manner, we select two comparison samples of quasars with similar luminosities and high-quality spectra from the literature, one at intermediate redshifts (z=3.5-4.8) and the other at low redshifts (z=1.0-2.0). We perform the same spectral analysis for all these quasars, including the usage of the same iron template, the same spectral fitting method, and the same wavelength fitting windows. We find no significant redshift evolution in the Fe II/Mg II ratio over the wide redshift range from z=1 to 6.6. The result is consistent with previous studies and supports the scenario of a rapid iron enrichment in the vicinity of accreting supermassive black holes at high redshift.

Submitted 9 September, 2024; originally announced September 2024.

Comments: 23 pages, 9 figures; Accepted for publication in ApJ. The online materials are available at https://github.com/DJiang-astro/online_materials_of_D.Jiang-2024

arXiv:2409.05617 [pdf, other]

G-NeLF: Memory- and Data-Efficient Hybrid Neural Light Field for Novel View Synthesis

Authors: Lutao Jiang, Lin Wang

Abstract: Following the burgeoning interest in implicit neural representation, Neural Light Field (NeLF) has been introduced to predict the color of a ray directly. Unlike Neural Radiance Field (NeRF), NeLF does not create a point-wise representation by predicting color and volume density for each point in space. However, the current NeLF methods face a challenge as they need to train a NeRF model first and then synthesize over 10K views to train NeLF for improved performance. Additionally, the rendering quality of NeLF methods is lower compared to NeRF methods. In this paper, we propose G-NeLF, a versatile grid-based NeLF approach that utilizes spatial-aware features to unleash the potential of the neural network's inference capability, and consequently overcome the difficulties of NeLF training. Specifically, we employ a spatial-aware feature sequence derived from a meticulously crafted grid as the ray's representation. Drawing from our empirical studies on the adaptability of multi-resolution hash tables, we introduce a novel grid-based ray representation for NeLF that can represent the entire space with a very limited number of parameters. To better utilize the sequence feature, we design a lightweight ray color decoder that simulates the ray propagation process, enabling a more efficient inference of the ray's color. G-NeLF can be trained without significant storage overhead and, with a model size of only 0.95 MB, surpasses the previous state-of-the-art NeLF. Moreover, compared with grid-based NeRF methods, e.g., Instant-NGP, we only utilize one-tenth of its parameters to achieve higher performance. Our code will be released upon acceptance.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.04558 [pdf]

Solve paint color effect prediction problem in trajectory optimization of spray painting robot using artificial neural network inspired by the Kubelka Munk model

Authors: Hexiang Wang, Zhiyuan Bi, Zhen Cheng, Xinru Li, Jiake Zhu, Liyuan Jiang, Hao Li, Shizhou Lu

Abstract: Currently, the spray-painting robot trajectory planning technology aiming at spray painting quality mainly applies to single-color spraying. Conventional methods of optimizing the spray gun trajectory based on simulated thickness can only qualitatively reflect the color distribution, and cannot simulate the color effect of spray painting at the pixel level. Therefore, it is not possible to accurately control the area covered by the color and the gradation of the edges of the area, and it is also difficult to deal with the situation where multiple colors of paint are sprayed in combination. To solve the above problems, this paper is inspired by the Kubelka-Munk model and combines the 3D machine vision method and artificial neural network to propose a spray painting color effect prediction method. The method can predict the execution effect of the spray gun trajectory with pixel-level accuracy from the dimension of the surface color of the workpiece after spray painting. On this basis, the method can be used to replace the traditional thickness simulation method to establish the objective function of the spray gun trajectory optimization problem, and thus solve the difficult problem of spray gun trajectory optimization for multi-color paint combination spraying. In this paper, the mathematical model of the spray painting color effect prediction problem is first determined through the analysis of the Kubelka-Munk paint film color rendering model, and at the same time, the spray painting color effect dataset is established with the help of the depth camera and point cloud processing algorithm. After that, the multilayer perceptron model was improved with the help of gating and residual structure and was used for the color prediction task. To verify ...

Submitted 6 September, 2024; originally announced September 2024.

arXiv:2409.03415 [pdf, other]

Anisotropic spin filtering by an altermagnetic barrier in magnetic tunnel junctions

Authors: Boyuan Chi, Leina Jiang, Yu Zhu, Guoqiang Yu, Caihua Wan, Xiufeng Han

Abstract: The spin filtering effect, distinct decaying lengths experienced by oppositely spin-polarized electrons in a magnetic barrier, generally occurs in ferromagnetic (FM) insulators or semiconductors. With the rise of altermagnetic (ALM) materials, which exhibit a capability of spin-polarizing electrons similar to that of ferromagnets, it is a natural question whether the ALM insulators or semiconductors can also act as unique barriers for the spin splitting effect. Here, through first-principles calculations, we investigated the complex band structure of the ALM insulator FeF$_2$ and found that it possesses an anisotropic spin filtering effect: along the [001] direction of FeF$_2$, a current remains spin-neutral but has locally nonvanishing spin polarizations in the momentum space; moreover, along the [110] direction of FeF$_2$, a current will be globally spin-polarized by different attenuation lengths of oppositely spin-polarized electrons. Leveraging this anisotropic spin filtering effect, we designed two types of MTJs with the ALM barrier: ALM electrode/ALM insulator barrier/non-magnetic (NM) electrode and FM electrode/ALM insulator barrier/NM electrode, using RuO$_2$(001)/FeF$_2$/IrO$_2$ and CrO$_2$(110)/FeF$_2$/IrO$_2$ as the corresponding prototypes, respectively. We found that these two proposed MTJs exhibited tunneling magnetoresistance (TMR) ratios of 216% and 3956%, by matching the conduction channels of the electrodes and the spin-resolved lowest decay rate of the barrier in the momentum space. Our work deepens and generalizes understanding toward the spin filtering effect for the rising ALM insulators and semiconductors, and broadens the applications of AFM spintronics.

Submitted 5 September, 2024; originally announced September 2024.

Comments: 6 pages, 4 figures

arXiv:2409.02418 [pdf, other]

MOSMOS: Multi-organ segmentation facilitated by medical report supervision

Authors: Weiwei Tian, Xinyu Huang, Junlin Hou, Caiyue Ren, Longquan Jiang, Rui-Wei Zhao, Gang Jin, Yuejie Zhang, Daoying Geng

Abstract: Owing to a large amount of multi-modal data in modern medical systems, such as medical images and reports, Medical Vision-Language Pre-training (Med-VLP) has demonstrated incredible achievements in coarse-grained downstream tasks (i.e., medical classification, retrieval, and visual question answering). However, the problem of transferring knowledge learned from Med-VLP to fine-grained multi-organ segmentation tasks has barely been investigated. Multi-organ segmentation is challenging mainly due to the lack of large-scale fully annotated datasets and the wide variation in the shape and size of the same organ between individuals with different diseases. In this paper, we propose a novel pre-training & fine-tuning framework for Multi-Organ Segmentation by harnessing Medical repOrt Supervision (MOSMOS). Specifically, we first introduce global contrastive learning to maximally align the medical image-report pairs in the pre-training stage. To remedy the granularity discrepancy, we further leverage multi-label recognition to implicitly learn the semantic correspondence between image pixels and organ tags. More importantly, our pre-trained models can be transferred to any segmentation model by introducing the pixel-tag attention maps. Different network settings, i.e., 2D U-Net and 3D UNETR, are utilized to validate the generalization. We have extensively evaluated our approach using different diseases and modalities on BTCV, AMOS, MMWHS, and BRATS datasets. Experimental results in various settings demonstrate the effectiveness of our framework. This framework can serve as the foundation to facilitate future research on automatic annotation tasks under the supervision of medical reports.

Submitted 3 September, 2024; originally announced September 2024.

Comments: 14 pages, 7 figures

arXiv:2409.01563 [pdf, other]

Blockchain-based Federated Recommendation with Incentive Mechanism

Authors: Jianhai Chen, Yanlin Wu, Dazhong Rong, Guoyao Yu, Lingqi Jiang, Zhenguang Liu, Peng Zhou, Rui Shen

Abstract: Nowadays, federated recommendation technology is rapidly evolving to help multiple organisations share data and train models while meeting user privacy, data security and government regulatory requirements. However, federated recommendation increases customer system costs such as power, computational and communication resources. Besides, federated recommendation systems are also susceptible to model attacks and data poisoning by participating malicious clients. Therefore, most customers are unwilling to participate in federated recommendation without any incentive. To address these problems, we propose a blockchain-based federated recommendation system with an incentive mechanism to promote more trustworthy, secure, and efficient federated recommendation service. First, we construct a federated recommendation system based on NeuMF and FedAvg. Then we introduce a reverse auction mechanism to select optimal clients that can maximize the social surplus. Finally, we employ blockchain for on-chain evidence storage of models to ensure the safety of the federated recommendation system. The experimental results show that our proposed incentive mechanism can attract clients with superior training data to engage in federated recommendation at a lower cost, which can increase the economic benefit of federated recommendation by 54.9% while improving the recommendation performance. Thus our work provides theoretical and technological support for the construction of a harmonious and healthy ecological environment for the application of federated recommendation.

Submitted 2 September, 2024; originally announced September 2024.

Comments: This paper has been accepted at the 2024 Blockchain and Web3 Technology Innovation and Application Exchange Conference (BWTAC 2024)

arXiv:2408.17224 [pdf, other]

Hadronic cross section measurements with the DAMPE space mission using 20GeV-10TeV cosmic-ray protons and $^4$He

Authors: F. Alemanno, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, I. Cagnoli, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, P. Coppin, M. Y. Cui, T. S. Cui, Y. X. Cui, H. T. Dai, A. De Benedittis, I. De Mitri, F. de Palma, A. Di Giovanni, Q. Ding, T. K. Dong , et al. (126 additional authors not shown)

Abstract: Precise direct cosmic-ray (CR) measurements provide an important probe to study the energetic particle sources in our Galaxy, and the interstellar environment through which these particles propagate. Uncertainties on hadronic models, ion-nucleon cross sections in particular, are currently the limiting factor towards obtaining more accurate CR ion flux measurements with calorimetric space-based experiments. We present an energy-dependent measurement of the inelastic cross section of protons and helium-4 nuclei (alpha particles) on a Bi$_4$Ge$_3$O$_{12}$ target, using 88 months of data collected by the DAMPE space mission. The kinetic energy range per nucleon of the measurement points ranges from 18 GeV to 9 TeV for protons, and from 5 GeV/n to 3 TeV/n for helium-4 nuclei. Our results lead to a significant improvement of the CR flux normalisation. In the case of helium-4, these results correspond to the first cross section measurements on a heavy target material at energies above 10 GeV/n.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 17 pages, submitted to PRD

arXiv:2408.16771 [pdf]

Navigating Governance Paradigms: A Cross-Regional Comparative Study of Generative AI Governance Processes & Principles

Authors: Jose Luna, Ivan Tan, Xiaofei Xie, Lingxiao Jiang

Abstract: As Generative Artificial Intelligence (GenAI) technologies evolve at an unprecedented rate, global governance approaches struggle to keep pace with the technology, highlighting a critical issue in the governance adaptation of significant challenges. Depicting the nuances of nascent and diverse governance approaches based on risks, rules, outcomes, principles, or a mix across different regions around the globe is fundamental to discern discrepancies and convergences and to shed light on specific limitations that need to be addressed, thereby facilitating the safe and trustworthy adoption of GenAI. In response to the need and the evolving nature of GenAI, this paper seeks to provide a collective view of different governance approaches around the world. Our research introduces a Harmonized GenAI Framework, "H-GenAIGF," based on the current governance approaches of six regions: European Union (EU), United States (US), China (CN), Canada (CA), United Kingdom (UK), and Singapore (SG). We have identified four constituents, fifteen processes, twenty-five sub-processes, and nine principles that aid the governance of GenAI, thus providing a comprehensive perspective on the current state of GenAI governance. In addition, we present a comparative analysis to facilitate the identification of common ground and distinctions based on the coverage of the processes by each region. The results show that risk-based approaches allow for better coverage of the processes, followed by mixed approaches. Other approaches lag behind, covering less than 50% of the processes. Most prominently, the analysis demonstrates that among the regions, only one process aligns across all approaches, highlighting the lack of consistent and executable provisions. Moreover, our case study on ChatGPT reveals process coverage deficiency, showing that harmonization of approaches is necessary to find alignment for GenAI governance.

Submitted 14 August, 2024; originally announced August 2024.

Comments: To appear at AIES 2024

ACM Class: K.5.2; K.4.1; H.1.2

arXiv:2408.16289 [pdf]

Convolutional Neural Network Compression Based on Low-Rank Decomposition

Authors: Yaping He, Linhao Jiang, Di Wu

Abstract: Deep neural networks typically impose significant computational loads and memory consumption. Moreover, the large parameters pose constraints on deploying the model on edge devices such as embedded systems. Tensor decomposition offers a clear advantage in compressing large-scale weight tensors. Nevertheless, direct utilization of low-rank decomposition typically leads to significant accuracy loss. This paper proposes a model compression method that integrates Variational Bayesian Matrix Factorization (VBMF) with orthogonal regularization. Initially, the model undergoes over-parameterization and training, with orthogonal regularization applied to enhance its likelihood of achieving the accuracy of the original model. Secondly, VBMF is employed to estimate the rank of the weight tensor at each layer. Our framework is sufficiently general to apply to other convolutional neural networks and easily adaptable to incorporate other tensor decomposition methods. Experimental results show that for both high and low compression ratios, our compression model exhibits advanced performance.

Submitted 29 August, 2024; originally announced August 2024.

Comments: 10 pages, 1 figures

arXiv:2408.16183 [pdf, other]

Galaxies Lighting Up: Discovery of Seventy New Turn-on Changing-look Quasars

Authors: Qian Yang, Paul J. Green, Xue-Bing Wu, Michael Eracleous, Linhua Jiang, Yuming Fu

Abstract: "Changing-look quasars" (CLQs), discovered less than a decade ago, show dramatic, rapid changes in optical/UV continuum and broad line emission. The majority of CLQs have been found dimming as "turn-off" CLQs because most selection methods start from samples of spectroscopically-confirmed quasars. We present here a sample of 82 spectroscopically confirmed "turn-on" CLQs, 70 of which are newly identified. The turn-on CLQs are selected from spectroscopically classified galaxies with subsequent significant and dramatic variability in both the optical and mid-infrared bands, indicating a mechanism of changing accretion rate of the supermassive black holes rather than variable obscuration. Based on their bright state Eddington ratios, turn-on CLQs are associated with lower accretion rates compared to turn-off CLQs or typical SDSS quasars with similar redshift and magnitude distributions, even though turn-on CLQs have lower black hole masses. Most turn-on CLQs reside in host galaxies that follow local relations between the central black hole mass and host galaxy properties, such as stellar mass and velocity dispersion. However, their host galaxies have higher mass than normal inactive galaxies, with star formation rates more similar to hosts of Type 2 AGN than to the overall galaxy population.

Submitted 28 August, 2024; originally announced August 2024.

Comments: Submitted to ApJ; the spectral fitting code QGfit will be available on GitHub; 36 pages, 20 figures

arXiv:2408.15548 [pdf, other]

ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model

Authors: Lifan Jiang, Zhihui Wang, Siqi Yin, Guangxiao Ma, Peng Zhang, Boxi Wu

Abstract: Multi-object tracking (MOT) is a critical technology in computer vision, designed to detect multiple targets in video sequences and assign each target a unique ID per frame. Existing MOT methods excel at accurately tracking multiple objects in real-time across various scenarios. However, these methods still face challenges such as poor noise resistance and frequent ID switches. In this research, we propose ConsistencyTrack, a novel joint detection and tracking (JDT) framework that formulates detection and association as a denoising diffusion process on perturbed bounding boxes. This progressive denoising strategy significantly improves the model's noise resistance. During the training phase, paired object boxes within two adjacent frames are diffused from ground-truth boxes to a random distribution, and then the model learns to detect and track by reversing this process. In inference, the model refines randomly generated boxes into detection and tracking results through minimal denoising steps. ConsistencyTrack also introduces an innovative target association strategy to address target occlusion. Experiments on the MOT17 and DanceTrack datasets demonstrate that ConsistencyTrack outperforms other compared methods, especially better than DiffusionTrack in inference speed and other performance metrics. Our code is available at https://github.com/Tankowa/ConsistencyTrack.

Submitted 28 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.09905 by other authors

arXiv:2408.12725 [pdf, other]

DUNE Phase II: Scientific Opportunities, Detector Concepts, Technological Solutions

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, C. Andreopoulos, M. Andreotti , et al. (1347 additional authors not shown)

Abstract: The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy toward the implementation of this leading-edge, large-scale science project. The 2023 report of the US Particle Physics Project Prioritization Panel (P5) reaffirmed this vision and strongly endorsed DUNE Phase I and Phase II, as did the European Strategy for Particle Physics. While the construction of the DUNE Phase I is well underway, this White Paper focuses on DUNE Phase II planning. DUNE Phase-II consists of a third and fourth far detector (FD) module, an upgraded near detector complex, and an enhanced 2.1 MW beam. The fourth FD module is conceived as a "Module of Opportunity", aimed at expanding the physics opportunities, in addition to supporting the core DUNE science program, with more advanced technologies. This document highlights the increased science opportunities offered by the DUNE Phase II near and far detectors, including long-baseline neutrino oscillation physics, neutrino astrophysics, and physics beyond the standard model. It describes the DUNE Phase II near and far detector technologies and detector design concepts that are currently under consideration. A summary of key R&D goals and prototyping phases needed to realize the Phase II detector technical designs is also provided. DUNE's Phase II detectors, along with the increased beam power, will complete the full scope of DUNE, enabling a multi-decadal program of groundbreaking science with neutrinos.

Submitted 22 August, 2024; originally announced August 2024.

Report number: FERMILAB-TM-2833-LBNF

arXiv:2408.12064 [pdf, other]

A Practical Introduction to Benchmarking and Characterization of Quantum Computers

Authors: Akel Hashim, Long B. Nguyen, Noah Goss, Brian Marinelli, Ravi K. Naik, Trevor Chistolini, Jordan Hines, J. P. Marceaux, Yosep Kim, Pranav Gokhale, Teague Tomesh, Senrui Chen, Liang Jiang, Samuele Ferracin, Kenneth Rudinger, Timothy Proctor, Kevin C. Young, Robin Blume-Kohout, Irfan Siddiqi

Abstract: Rapid progress in quantum technology has transformed quantum computing and quantum information science from theoretical possibilities into tangible engineering challenges. Breakthroughs in quantum algorithms, quantum simulations, and quantum error correction are bringing useful quantum computation closer to fruition. These remarkable achievements have been facilitated by advances in quantum characterization, verification, and validation (QCVV). QCVV methods and protocols enable scientists and engineers to scrutinize, understand, and enhance the performance of quantum information-processing devices. In this Tutorial, we review the fundamental principles underpinning QCVV, and introduce a diverse array of QCVV tools used by quantum researchers. We define and explain QCVV's core models and concepts -- quantum states, measurements, and processes -- and illustrate how these building blocks are leveraged to examine a target system or operation. We survey and introduce protocols ranging from simple qubit characterization to advanced benchmarking methods. Along the way, we provide illustrated examples and detailed descriptions of the protocols, highlight the advantages and disadvantages of each, and discuss their potential scalability to future large-scale quantum computers. This Tutorial serves as a guidebook for researchers unfamiliar with the benchmarking and characterization of quantum computers, and also as a detailed reference for experienced practitioners.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11982 [pdf, other]

AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

Authors: Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, Zicheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Xiaoheng Tan, Haiqiang Wang, Xiaozhong Xu , et al. (11 additional authors not shown)

Abstract: Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos, encoded with 14 codecs of various compression standards (AVC/H.264, HEVC/H.265, AV1, and VVC/H.266) and containing a comprehensive collection of compression artifacts. To measure the methods' performance, we employed traditional correlation coefficients between their predictions and subjective scores, which were collected via large-scale crowdsourced pairwise human comparisons. For training purposes, participants were provided with the Compressed Video Quality Assessment Dataset (CVQAD), a previously developed dataset of 1022 videos. Up to 30 participating teams registered for the challenge, while we report the results of 6 teams, which submitted valid final solutions and code for reproducing the results. Moreover, we calculated and present the performance of state-of-the-art VQA methods on the developed dataset, providing a comprehensive benchmark for future research. The dataset, results, and online leaderboard are publicly available at https://challenges.videoprocessing.ai/challenges/compressedvideo-quality-assessment.html.

Submitted 28 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11706 [pdf, other]

FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting

Authors: Liyao Jiang, Negar Hassanpour, Mohammad Salameh, Mohan Sai Singamsetti, Fengyu Sun, Wei Lu, Di Niu

Abstract: Text-to-image (T2I) diffusion models have demonstrated impressive capabilities in generating high-quality images given a text prompt. However, ensuring the prompt-image alignment remains a considerable challenge, i.e., generating images that faithfully align with the prompt's semantics. Recent works attempt to improve the faithfulness by optimizing the latent code, which potentially could cause the latent code to go out-of-distribution and thus produce unrealistic images. In this paper, we propose FRAP, a simple, yet effective approach based on adaptively adjusting the per-token prompt weights to improve prompt-image alignment and authenticity of the generated images. We design an online algorithm to adaptively update each token's weight coefficient, which is achieved by minimizing a unified objective function that encourages object presence and the binding of object-modifier pairs. Through extensive evaluations, we show FRAP generates images with significantly higher prompt-image alignment to prompts from complex datasets, while having a lower average latency compared to recent latent code optimization methods, e.g., 4 seconds faster than D&B on the COCO-Subject dataset. Furthermore, through visual comparisons and evaluation on the CLIP-IQA-Real metric, we show that FRAP not only improves prompt-image alignment but also generates more authentic images with realistic appearances. We also explore combining FRAP with prompt rewriting LLM to recover their degraded prompt-image alignment, where we observe improvements in both prompt-image alignment and image quality.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.09937 [pdf, other]

The curse of random quantum data

Authors: Kaining Zhang, Junyu Liu, Liu Liu, Liang Jiang, Min-Hsiu Hsieh, Dacheng Tao

Abstract: Quantum machine learning, which involves running machine learning algorithms on quantum devices, may be one of the most significant flagship applications for these devices. Unlike its classical counterparts, the role of data in quantum machine learning has not been fully understood. In this work, we quantify the performances of quantum machine learning in the landscape of quantum data. Provided that the encoding of quantum data is sufficiently random, we find that the training efficiency and generalization capabilities in quantum machine learning will be exponentially suppressed with the increase in the number of qubits, which we call "the curse of random quantum data". Our findings apply to both the quantum kernel method and the large-width limit of quantum neural networks. Conversely, we highlight that through meticulous design of quantum datasets, it is possible to avoid these curses, thereby achieving efficient convergence and robust generalization. Our conclusions are corroborated by extensive numerical simulations.

Submitted 19 August, 2024; originally announced August 2024.

Comments: 40 pages, 8 figures

arXiv:2408.09621 [pdf, other]

Refining Packing and Shuffling Strategies for Enhanced Performance in Generative Language Models

Authors: Yanbing Chen, Ruilin Wang, Zihao Yang, Lavender Yao Jiang, Eric Karl Oermann

Abstract: Packing and shuffling tokens is a common practice in training auto-regressive language models (LMs) to prevent overfitting and improve efficiency. Typically, documents are concatenated to chunks of maximum sequence length (MSL) and then shuffled. However, setting the atom size, the length for each data chunk accompanied by random shuffling, to MSL may lead to contextual incoherence due to tokens from different documents being packed into the same chunk. An alternative approach is to utilize padding, another common data packing strategy, to avoid contextual incoherence by only including one document in each shuffled chunk. To optimize both packing strategies (concatenation vs padding), we investigated the optimal atom size for shuffling and compared their performance and efficiency. We found that matching atom size to MSL optimizes performance for both packing methods (concatenation and padding), and padding yields lower final perplexity (higher performance) than concatenation at the cost of more training steps and lower compute efficiency. This trade-off informs the choice of packing methods in training language models.

Submitted 18 August, 2024; originally announced August 2024.

Comments: 11 pages (include appendix), 26 figures, submitted to ACL ARR Aug 2024

ACM Class: I.2.7

arXiv:2408.09327 [pdf, other]

Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs

Authors: Jiancheng Dong, Lei Jiang, Wei Jin, Lu Cheng

Abstract: Packing for Supervised Fine-Tuning (SFT) in autoregressive models involves concatenating data points of varying lengths until reaching the designed maximum length to facilitate GPU processing. However, randomly concatenating data points and feeding them into an autoregressive transformer can lead to cross-contamination of sequences due to the significant difference in their subject matter. The mainstream approaches in SFT ensure that each token in the attention calculation phase only focuses on tokens within its own short sequence, without providing additional learning signals for the preceding context. To address these challenges, we introduce Threshold Filtering Packing (TFP), a method that selects samples with related context while maintaining sufficient diversity within the same pack. Our experiments show that TFP offers a simple-to-implement and scalable approach that significantly enhances SFT performance, with observed improvements of up to 7% on GSM8K, 4% on HumanEval, and 15% on the adult-census-income dataset.

Submitted 17 August, 2024; originally announced August 2024.

Comments: 13 pages, 4 figures

arXiv:2408.08772 [pdf, other]

Vital: Vulnerability-Oriented Symbolic Execution via Type-Unsafe Pointer-Guided Monte Carlo Tree Search

Authors: Haoxin Tu, Lingxiao Jiang, Marcel Böhme

Abstract: How to find memory safety bugs efficiently when navigating a symbolic execution tree that suffers from path explosion? Existing solutions either adopt path search heuristics to maximize coverage rate or chopped symbolic execution to skip uninteresting code (i.e., manually labeled as vulnerability-unrelated) during path exploration. However, most existing search heuristics are not vulnerability-oriented, and manual labeling of irrelevant code-to-be-skipped relies heavily on prior expert knowledge, making it hard to detect vulnerabilities effectively in practice. This paper proposes Vital, a new vulnerability-oriented symbolic execution via type-unsafe pointer-guided Monte Carlo Tree Search (MCTS). A pointer that is type unsafe cannot be statically proven to be safely dereferenced without memory corruption. Our key hypothesis is that a path with more type unsafe pointers is more likely to contain vulnerabilities. Vital drives a guided MCTS to prioritize paths in the symbolic execution tree that contain a larger number of unsafe pointers and to effectively navigate the exploration-exploitation trade-off. We built Vital on top of KLEE and compared it with existing search strategies and chopped symbolic execution. In the former, the results demonstrate that Vital could cover up to 90.03% more unsafe pointers and detect up to 37.50% more unique memory errors. In the latter, the results show that Vital could achieve a speedup of up to 30x execution time and a reduction of up to 20x memory consumption on automatically detecting known vulnerabilities without prior expert knowledge.

Submitted 16 August, 2024; originally announced August 2024.

Comments: 12 pages

arXiv:2408.08476 [pdf, other]

Model free data assimilation with Takens embedding

Authors: Ziyi Wang, Lijian Jiang

Abstract: In many practical scenarios, the dynamical system is not available and standard data assimilation methods are not applicable. Our objective is to construct a data-driven model for state estimation without the underlying dynamics. Instead of directly modeling the observation operator with noisy observation, we establish the state space model of the denoised observation. Through data assimilation techniques, the denoised observation information could be used to recover the original model state. Takens' theorem shows that an embedding of the partial and denoised observation is diffeomorphic to the attractor. This gives a theoretical base for estimating the model state using the reconstruction map. To realize the idea, the procedure consists of an offline stage and an online stage. In the offline stage, we construct the surrogate dynamics using dynamic mode decomposition with noisy snapshots to learn the transition operator for the denoised observation. The filtering distribution of the denoised observation can be estimated using an adaptive ensemble Kalman filter, without knowledge of the model error and observation noise covariances. Then the reconstruction map can be established using the posterior mean of the embedding and its corresponding state. In the online stage, the observation is filtered with the surrogate dynamics. Then the online state estimation can be performed utilizing the reconstruction map and the filtered observation. Furthermore, the idea can be generalized to the nonparametric framework with nonparametric time series prediction methods for chaotic problems. The numerical results show the proposed method can estimate the state distribution without the physical dynamical system.

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.08193 [pdf, other]

doi 10.1145/3663548.3675620

"I Try to Represent Myself as I Am": Self-Presentation Preferences of People with Invisible Disabilities through Embodied Social VR Avatars

Authors: Ria J. Gualano, Lucy Jiang, Kexin Zhang, Tanisha Shende, Andrea Stevenson Won, Shiri Azenkot

Abstract: With the increasing adoption of social virtual reality (VR), it is critical to design inclusive avatars. While researchers have investigated how and why blind and d/Deaf people wish to disclose their disabilities in VR, little is known about the preferences of many others with invisible disabilities (e.g., ADHD, dyslexia, chronic conditions). We filled this gap by interviewing 15 participants, each with one to three invisible disabilities, who represented 22 different invisible disabilities in total. We found that invisibly disabled people approached avatar-based disclosure through contextualized considerations informed by their prior experiences. For example, some wished to use VR's embodied affordances, such as facial expressions and body language, to dynamically represent their energy level or willingness to engage with others, while others preferred not to disclose their disability identity in any context. We define a binary framework for embodied invisible disability expression (public and private) and discuss three disclosure patterns (Activists, Non-Disclosers, and Situational Disclosers) to inform the design of future inclusive VR experiences.

Submitted 15 August, 2024; originally announced August 2024.

Comments: To appear at ASSETS 2024

arXiv:2408.07613 [pdf, other]

Rethinking the Key Factors for the Generalization of Remote Sensing Stereo Matching Networks

Authors: Liting Jiang, Feng Wang, Wenyi Zhang, Peifeng Li, Hongjian You, Yuming Xiang

Abstract: Stereo matching, a critical step of 3D reconstruction, has fully shifted towards deep learning due to its strong feature representation of remote sensing images. However, ground truth for the stereo matching task relies on expensive airborne LiDAR data, thus making it difficult to obtain enough samples for supervised learning. To improve the generalization ability of stereo matching networks on cross-domain data from different sensors and scenarios, in this paper, we study key training factors from three perspectives. (1) For the selection of the training dataset, it is important to select data with a regional target distribution similar to that of the test set instead of utilizing data from the same sensor. (2) For model structure, a cascaded structure that flexibly adapts to different sizes of features is preferred. (3) For training manner, unsupervised methods generalize better than supervised methods, and we design an unsupervised early-stop strategy to help retain the best model with pre-trained weights as the basis. Extensive experiments are conducted to support the previous findings, on the basis of which we present an unsupervised stereo matching network with good generalization performance. We release the source code and the datasets at https://github.com/Elenairene/RKF_RSSM to reproduce the results and encourage future work.

Submitted 14 August, 2024; originally announced August 2024.

Comments: submitted to IEEE jstars

arXiv:2408.07419 [pdf, other]

Unsupervised Stereo Matching Network For VHR Remote Sensing Images Based On Error Prediction

Authors: Liting Jiang, Yuming Xiang, Feng Wang, Hongjian You

Abstract: Stereo matching in remote sensing has recently garnered increased attention, primarily focusing on supervised learning. However, datasets with ground truth generated by expensive airborne LiDAR exhibit limited quantity and diversity, constraining the effectiveness of supervised networks. In contrast, unsupervised learning methods can leverage the increasing availability of very-high-resolution (VHR) remote sensing images, offering considerable potential in the realm of stereo matching. Motivated by this intuition, we propose a novel unsupervised stereo matching network for VHR remote sensing images. A light-weight module to bridge confidence with predicted error is introduced to refine the core model. Robust unsupervised losses are formulated to enhance network convergence. The experimental results on US3D and WHU-Stereo datasets demonstrate that the proposed network achieves superior accuracy compared to other unsupervised networks and exhibits better generalization capabilities than supervised models. Our code will be available at https://github.com/Elenairene/CBEM.

Submitted 14 August, 2024; originally announced August 2024.

Comments: Accepted to International Geoscience and Remote Sensing Symposium (IGARSS), 2024

arXiv:2408.07096 [pdf, other]

doi 10.14778/3685800.3685900

OFL-W3: A One-shot Federated Learning System on Web 3.0

Authors: Linshan Jiang, Moming Duan, Bingsheng He, Yulin Sun, Peishen Yan, Yang Hua, Tao Song

Abstract: Federated Learning (FL) addresses the challenges posed by data silos, which arise from privacy, security regulations, and ownership concerns. Despite these barriers, FL enables these isolated data repositories to participate in collaborative learning without compromising privacy or security. Concurrently, the advancement of blockchain technology and decentralized applications (DApps) within Web 3.0 heralds a new era of transformative possibilities in web development. As such, incorporating FL into Web 3.0 paves the path for overcoming the limitations of data silos through collaborative learning. However, given the transaction speed constraints of core blockchains such as Ethereum (ETH) and the latency in smart contracts, employing one-shot FL, which minimizes client-server interactions in traditional FL to a single exchange, is considered more apt for Web 3.0 environments. This paper presents a practical one-shot FL system for Web 3.0, termed OFL-W3. OFL-W3 capitalizes on blockchain technology by utilizing smart contracts for managing transactions. Meanwhile, OFL-W3 utilizes the Inter-Planetary File System (IPFS) coupled with Flask communication, to facilitate backend server operations to use existing one-shot FL algorithms. With the integration of the incentive mechanism, OFL-W3 showcases an effective implementation of one-shot FL on Web 3.0, offering valuable insights and future directions for AI combined with Web 3.0 studies.

Submitted 12 August, 2024; originally announced August 2024.

Comments: VLDB 24 demo paper

arXiv:2408.05433 [pdf, ps, other]

Transfer and entanglement stability of property ($UW${\normalsize\it{E}})

Authors: Sinan Qiu, Lining Jiang

Abstract: An operator $T\in B(H)$ is said to satisfy property ($UW${\scriptsize \it{E}}) if the complement in the approximate point spectrum of the essential approximate point spectrum coincides with the isolated eigenvalues of the spectrum. Via the CI spectrum induced by consistent invertibility property of operators, we explore property ($UW${\scriptsize \it{E}}) for $T$ and $T^\ast$ simultaneously. Furthermore, the transfer of property ($UW${\scriptsize \it{E}}) from $T$ to $f(T)$ and $f(T^{\ast})$ is obtained, where $f$ is a function which is analytic in a neighborhood of the spectrum of $T$. At last, with the help of the so-called $(A,B)$ entanglement stable spectra, the entanglement stability of property ($UW${\scriptsize \it{E}}) for $2\times 2$ upper triangular operator matrices is investigated.

Submitted 10 August, 2024; originally announced August 2024.

arXiv:2408.04421 [pdf, other]

Dark spin-cats as biased qubits

Authors: Andreas Kruckenhauser, Ming Yuan, Han Zheng, Mikhail Mamaev, Pei Zeng, Xuanhui Mao, Qian Xu, Torsten V. Zache, Liang Jiang, Rick van Bijnen, Peter Zoller

Abstract: We present a biased atomic qubit, universally implementable across all atomic platforms, encoded as a `spin-cat' within ground state Zeeman levels. The key characteristic of our configuration is the coupling of the ground state spin manifold of size $F_g \gg 1$ to an excited Zeeman spin manifold of size $F_e = F_g - 1$ using light. This coupling results in eigenstates of the driven atom that include exactly two dark states in the ground state manifold, which are decoupled from light and immune to spontaneous emission from the excited states. These dark states constitute the `spin-cat', leading to the designation `dark spin-cat'. We demonstrate that under strong Rabi drive and for large $F_g$, the `dark spin-cat' is autonomously stabilized against common noise sources and encodes a qubit with significantly biased noise. Specifically, the bit-flip error rate decreases exponentially with $F_g$ relative to the dephasing rate. We provide an analysis of dark spin-cats, their robustness to noise, and discuss bias-preserving single qubit and entangling gates, exemplified on a Rydberg tweezer platform.

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2408.03376 [pdf, other]

Entanglement-enhanced learning of quantum processes at scale

Authors: Alireza Seif, Senrui Chen, Swarnadeep Majumder, Haoran Liao, Derek S. Wang, Moein Malekakhlagh, Ali Javadi-Abhari, Liang Jiang, Zlatko K. Minev

Abstract: Learning unknown processes affecting a quantum system reveals underlying physical mechanisms and enables suppression, mitigation, and correction of unwanted effects. Describing a general quantum process requires an exponentially large number of parameters. Measuring these parameters, when they are encoded in incompatible observables, is constrained by the uncertainty principle and requires exponentially many measurements. However, for Pauli channels, having access to an ideal quantum memory and entangling operations allows encoding parameters in commuting observables, thereby exponentially reducing measurement complexity. In practice, though, quantum memory and entangling operations are always noisy and introduce errors, making the advantage of using noisy quantum memory unclear. To address these challenges we introduce error-mitigated entanglement-enhanced learning and show, both theoretically and experimentally, that even with noise, there is a separation in efficiency between learning Pauli channels with and without entanglement with noisy quantum memory. We demonstrate our protocol's efficacy in examples including hypothesis testing with up to 64 qubits and learning inherent noise processes in a layer of parallel gates using up to 16 qubits on a superconducting quantum processor. Our protocol provides accurate and practical information about the process, with an overhead factor of $1.33 \pm 0.05$ per qubit, much smaller than the fundamental lower bound of 2 without entanglement with quantum memory.
Our study demonstrates that entanglement with auxiliary noisy quantum memory combined with error mitigation considerably enhances the learning of quantum processes.

Submitted 6 August, 2024; originally announced August 2024.

arXiv:2408.02878 [pdf]

Ultrahigh-speed thin-film lithium niobate optical coherent receiver

Authors: Xiaojun Xie, Chao Wei, Xingchen He, Yake Chen, Chenghao Wang, Jihui Sun, Lin Jiang, Jia Ye, Xihua Zou, Wei Pan, Lianshan Yan

Abstract: The rapid advancement of the thin-film lithium niobate platform has established it as a premier choice for high-performance photonics integration. High-speed optical coherent receivers are essential for supporting the large communication capacities required by data center interconnects. Although high-speed photodiodes have been demonstrated on the thin-film LiNbO3 platform, the development of an ultrahigh-speed optical coherent receiver on this platform has not yet been realized. Here, we propose and experimentally demonstrate an ultra-wideband PD and ultrahigh-speed optical coherent receiver on an InP-LiNbO3 wafer-level heterogeneous integration platform. The fabricated single PD exhibits a record-high bandwidth of 140 GHz and successfully receives a high-quality 100-Gbaud pulse amplitude modulation (PAM4) signal. Furthermore, a thin-film LiNbO3 optical coherent receiver, featuring a large balanced detection bandwidth of 60 GHz, a large common mode rejection ratio (CMRR) exceeding 20 dB, and a low energy consumption of 9.6 fJ per bit, enables an ultrahigh-speed coherent reception with advanced modulation formats. The single-polarization I-Q coherent receiver, incorporating a compact 2x4 90° optical hybrid and a balanced photodetector array, achieves a receiving capacity of 600 Gbps per channel with 100-Gbaud 64 quadrature amplitude modulation (QAM) signal and 512 Gbps per channel with 128-Gbaud 16 QAM signal.
Additionally, we demonstrate a long-distance reception of 100 Gbaud quadrature phase-shift keying (QPSK) and 16 QAM signals over transmission distances of 1040 km and 25 km. A seven-channel single-polarization I-Q coherent receiving chip achieves a total receiving capacity of 3.584 Tbps. This heterogeneous-integrated thin-film LiNbO3 optical coherent receiver shows the potential for Pbps-scale applications in future hyperscale data center interconnects.

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2408.01269 [pdf, other]

A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness

Authors: Lutao Jiang, Hangyu Li, Lin Wang

Abstract: Text-to-3D content creation has recently received much attention, especially with the prevalence of 3D Gaussian Splatting. In general, GS-based methods comprise two key stages: initialization and rendering optimization. To achieve initialization, existing works directly apply random sphere initialization or 3D diffusion models, e.g., Point-E, to derive the initial shapes. However, such strategies suffer from two critical yet challenging problems: 1) the final shapes are still similar to the initial ones even after training; 2) shapes can be produced only from simple texts, e.g., "a dog", not from lexically richer texts, e.g., "a dog is sitting on the top of the airplane". To address these problems, this paper proposes a novel general framework to boost the 3D GS initialization for text-to-3D generation based on lexical richness. Our key idea is to aggregate 3D Gaussians into spatially uniform voxels to represent complex shapes while enabling the spatial interaction among the 3D Gaussians and semantic interaction between Gaussians and texts. Specifically, we first construct a voxelized representation, where each voxel holds a 3D Gaussian with its position, scale, and rotation fixed while setting opacity as the sole factor to determine a position's occupancy. We then design an initialization network mainly consisting of two novel components: 1) Global Information Perception (GIP) block and 2) Gaussians-Text Fusion (GTF) block. Such a design enables each 3D Gaussian to assimilate the spatial information from other areas and semantic information from texts.
Extensive experiments on lexically simple, medium, and hard texts show the superiority of our framework for high-quality 3D GS initialization over existing methods, e.g., Shap-E. Also, our framework can be seamlessly plugged into SoTA training frameworks, e.g., LucidDreamer, for semantically consistent text-to-3D generation.

Submitted 2 August, 2024; originally announced August 2024.

Journal ref: ACM MM 2024

arXiv:2408.01191 [pdf, other]

A Weakly Supervised and Globally Explainable Learning Framework for Brain Tumor Segmentation

Authors: Ruitao Xie, Limai Jiang, Xiaoxi He, Yi Pan, Yunpeng Cai

Abstract: Machine-based brain tumor segmentation can help doctors make better diagnoses. However, the complex structure of brain tumors and expensive pixel-level annotations present challenges for automatic tumor segmentation. In this paper, we propose a counterfactual generation framework that not only achieves exceptional brain tumor segmentation performance without the need for pixel-level annotations, but also provides explainability. Our framework effectively separates class-related features from class-unrelated features of the samples, and generates new samples that preserve identity features while altering class attributes by embedding different class-related features. We perform topological data analysis on the extracted class-related features and obtain a globally explainable manifold. For each abnormal sample to be segmented, a meaningful normal sample can be generated under the guidance of rule-based paths designed within the manifold, and comparing the two identifies the tumor regions. We evaluate our proposed method on two datasets, where it demonstrates superior brain tumor segmentation performance. The code is available at https://github.com/xrt11/tumor-segmentation.

Submitted 2 August, 2024; originally announced August 2024.

Comments: 2024 IEEE International Conference on Multimedia and Expo

arXiv:2408.00582 [pdf, other]

First Measurement of the Total Inelastic Cross-Section of Positively-Charged Kaons on Argon at Energies Between 5.0 and 7.5 GeV

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, C. Andreopoulos, M. Andreotti, et al. (1341 additional authors not shown)

Abstract: ProtoDUNE Single-Phase (ProtoDUNE-SP) is a 770-ton liquid argon time projection chamber that operated in a hadron test beam at the CERN Neutrino Platform in 2018. We present a measurement of the total inelastic cross section of charged kaons on argon as a function of kaon energy using 6 and 7 GeV/$c$ beam momentum settings. The flux-weighted average of the extracted inelastic cross section at each beam momentum setting was measured to be 380$\pm$26 mbarns for the 6 GeV/$c$ setting and 379$\pm$35 mbarns for the 7 GeV/$c$ setting.

Submitted 1 August, 2024; originally announced August 2024.

Report number: CERN-EP-2024-211, FERMILAB-PUB-24-0216-V

arXiv:2407.21723 [pdf, other]

Coordinating Decisions via Quantum Telepathy

Authors: Dawei Ding, Liang Jiang

Abstract: Quantum telepathy is the phenomenon where two non-communicating parties can exhibit correlated behaviors that are impossible to achieve using classical mechanics. This is also known as Bell inequality violation and is made possible by quantum entanglement. In this work, we present a conceptual framework for applying quantum telepathy to real-world problems. In general, the problems involve coordinating decisions given a set of observations without being able to communicate. We argue this inability is actually quite prevalent in the modern era where the decision-making timescales of computer processors are so short that the speed of light delay is actually quite appreciable in comparison. We highlight the example of high-frequency trading (HFT), where trades are made at microsecond timescales, but the speed of light delay between different exchanges can range from the order of 100 microseconds to 10 milliseconds. Due to the maturity of Bell inequality violation experiments, experimental realization of quantum telepathy schemes that can attain a quantum advantage for real-world problems $\textit{is already almost immediately possible}$. We demonstrate this by conducting a case study for a concrete HFT scenario that gives rise to a generalization of the CHSH game and evaluate different possible physical implementations for achieving a quantum advantage.
It is well known that Bell inequality violation is a rigorous mathematical proof of a quantum advantage over any classical strategy and does not need any complexity-theoretic assumptions such as $\text{BQP}\neq\text{BPP}$. Moreover, fault tolerance is not necessary to realize a quantum advantage: for example, violating the CHSH inequality only requires single-qubit gates applied on two entangled physical qubits.

Submitted 10 September, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

Comments: 42 pages, 8 figures, v2 corrects an issue with the HFT example in v1

arXiv:2407.21654 [pdf, other]

MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment

Authors: Anurag Das, Xinting Hu, Li Jiang, Bernt Schiele

Abstract: Recent approaches have shown that large-scale vision-language models such as CLIP can improve semantic segmentation performance. These methods typically aim for pixel-level vision-language alignment, but often rely on low resolution image features from CLIP, resulting in class ambiguities along boundaries. Moreover, the global scene representations in CLIP text embeddings do not directly correlate with the local and detailed pixel-level features, making meaningful alignment more difficult. To address these limitations, we introduce MTA-CLIP, a novel framework employing mask-level vision-language alignment. Specifically, we first propose Mask-Text Decoder that enhances the mask representations using rich textual data with the CLIP language model. Subsequently, it aligns mask representations with text embeddings using Mask-to-Text Contrastive Learning. Furthermore, we introduce MaskText Prompt Learning, utilizing multiple context-specific prompts for text embeddings to capture diverse class representations across masks. Overall, MTA-CLIP achieves state-of-the-art performance, surpassing prior works by an average of 2.8% and 1.3% on the standard benchmark datasets ADE20k and Cityscapes, respectively.

Submitted 31 July, 2024; originally announced July 2024.

Comments: accepted at ECCV 2024

arXiv:2407.21415 [pdf, other]

In situ Qubit Frequency Tuning Circuit for Scalable Superconducting Quantum Computing: Scheme and Experiment

Authors: Lei Jiang, Yu Xu, Shaowei Li, Zhiguang Yan, Ming Gong, Tao Rong, Chenyin Sun, Tianzuo Sun, Tao Jiang, Hui Deng, Chen Zha, Jin Lin, Fusheng Chen, Qingling Zhu, Yangsen Ye, Hao Rong, Kai Yan, Sirui Cao, Yuan Li, Shaojun Guo, Haoran Qian, Yisen Hu, Yulin Wu, Yuhuai Li, Gang Wu, et al. (8 additional authors not shown)

Abstract: Frequency-tunable qubits play a significant role in scalable superconducting quantum processors. The state-of-the-art room-temperature electronics for tuning qubit frequency suffer from scalability limits, such as heating problems and linear growth in the number of control cables. Here we propose a scalable scheme to tune the qubit frequency using an in situ superconducting circuit based on a radio frequency superconducting quantum interference device (rf-SQUID). We demonstrate both theoretically and experimentally that the qubit frequency can be modulated by inputting several single pulses into the rf-SQUID. Compared with the traditional scheme, our scheme not only solves the heating problem, but also provides the potential to exponentially reduce the number of cables inside the dilution refrigerator and the room-temperature electronics resources for tuning qubit frequency, which is achieved by a time-division-multiplex (TDM) scheme combining the rf-SQUID with switch arrays. With such a TDM scheme, the number of cables could be reduced from the usual $\sim 3n$ to $\sim \log_2{(3n)} + 1$ for two-dimensional quantum processors comprising $n$ qubits and $\sim 2n$ couplers. Our work paves the way for large-scale control of superconducting quantum processors.

Submitted 31 July, 2024; originally announced July 2024.

Comments: 9 pages, 6 figures

arXiv:2407.21052 [pdf, other]

Table-Filling via Mean Teacher for Cross-domain Aspect Sentiment Triplet Extraction

Authors: Kun Peng, Lei Jiang, Qian Li, Haoran Li, Xiaoyan Yu, Li Sun, Shuo Sun, Yanxian Bi, Hao Peng

Abstract: Cross-domain Aspect Sentiment Triplet Extraction (ASTE) aims to extract fine-grained sentiment elements from target domain sentences by leveraging the knowledge acquired from the source domain. Due to the absence of labeled data in the target domain, recent studies tend to rely on pre-trained language models to generate large amounts of synthetic data for training purposes. However, these approaches entail additional computational costs associated with the generation process. Different from them, we discover a striking resemblance between table-filling methods in ASTE and two-stage Object Detection (OD) in computer vision, which inspires us to revisit the cross-domain ASTE task and approach it from an OD standpoint. This allows the model to benefit from the OD extraction paradigm and region-level alignment. Building upon this premise, we propose a novel method named Table-Filling via Mean Teacher (TFMT). Specifically, the table-filling methods encode the sentence into a 2D table to detect word relations, while TFMT treats the table as a feature map and utilizes a region consistency to enhance the quality of those generated pseudo labels. Additionally, considering the existence of the domain gap, a cross-domain consistency based on Maximum Mean Discrepancy is designed to alleviate domain shift problems. Our method achieves state-of-the-art performance with minimal parameters and computational costs, making it a strong baseline for cross-domain ASTE.

Submitted 23 July, 2024; originally announced July 2024.

Comments: Accepted by CIKM2024

arXiv:2407.20897 [pdf, other]

Distributed Adaptive Time-Varying Optimization with Global Asymptotic Convergence

Authors: Liangze Jiang, Zheng-Guang Wu, Lei Wang

Abstract: In this note, we study distributed time-varying optimization for a multi-agent system. We first focus on a class of time-varying quadratic cost functions, and develop a new distributed algorithm that integrates an average estimator and an adaptive optimizer, with both bridged by a Dead Zone Algorithm. Based on a composite Lyapunov function and finite escape-time analysis, we prove the closed-loop global asymptotic convergence to the optimal solution under mild assumptions. In particular, the introduction of the estimator relaxes the requirement for the Hessians of cost functions, and the integrated design eliminates the waiting time required in the relevant literature for estimating global parameters during algorithm implementation. We then extend this result to a more general class of time-varying cost functions. Two examples are used to verify the proposed designs.

Submitted 3 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

Comments: 11 pages, 7 figures

arXiv:2407.20689 [pdf, other]

Controlling superradiant phase transition in quantum Rabi model

Authors: Xuan Xie, Cheng Liu, Lin-Lin Jiang, Jin-Feng Huang

Abstract: In the ultrastrong-coupling regime, the quantum Rabi model can exhibit a quantum phase transition (QPT) when the ratio of the qubit transition frequency to the frequency of the cavity field approaches infinity. However, it is challenging to control the QPT in few-body systems because of the limited coupling strength and the $A^2$ terms. Here, we propose a practical scheme to manipulate the QPT of the quantum Rabi model in the strong-coupling regime. By applying a periodic frequency modulation to the two-level system in a standard quantum Rabi model in the strong-coupling regime, an anisotropic quantum Rabi model with ultrastrong and tunable coupling strengths for rotating and counter-rotating terms is obtained. The ground-state and excitation energies of this model are studied as functions of the modulation parameters. We find that the QPT of the quantum Rabi model can be observed in the strong-coupling regime and externally controlled by the modulation.

Submitted 30 July, 2024; originally announced July 2024.

Comments: 10 pages, 5 figures

arXiv:2407.20018 [pdf, other]

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey

Authors: Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, Xipeng Qiu, Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun

Abstract: Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with their sophisticated capabilities. Training these models requires vast GPU clusters and significant computing time, posing major challenges in terms of scalability, efficiency, and reliability. This survey explores recent advancements in training systems for LLMs, including innovations in training infrastructure with AI accelerators, networking, storage, and scheduling. Additionally, the survey covers parallelism strategies, as well as optimizations for computation, communication, and memory in distributed LLM training. It also includes approaches to maintaining system reliability over extended training periods. By examining current innovations and future directions, this survey aims to provide valuable insights towards improving LLM training systems and tackling ongoing challenges. Furthermore, traditional digital circuit-based computing systems face significant constraints in meeting the computational demands of LLMs, highlighting the need for innovative solutions such as optical computing and optical networks.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.19690 [pdf, other]

Synthetic monopole with half-integer magnetic charge in Bose-Einstein condensates

Authors: Xi-Yu Chen, Lijia Jiang, Wen-Kai Bai, Tao Yang, Jun-Hui Zheng

Abstract: We propose a scheme to create monopoles with half-integer magnetic charges in a spinful cold atom system. With a minimal monopole in the center, we derive the ground-state single-vortex wave function on the sphere and develop the vortex's kinematic equation in the presence of an external electromagnetic field. The vortex's trajectory is generally depicted by the precession of the system. We further formulate the inter-vortex interaction and build up a theory of multi-vortex dynamics in high-charge monopole systems. We predict the vortices' trajectory in the bi-vortex system and figure out stable vortex (line) patterns in multi-vortex systems. Our study provides deep insights into properties of magnetic monopoles and vortices and paves the way for experimental verification.

Submitted 29 July, 2024; originally announced July 2024.

Comments: 6+2+3 pages, 4+1 figures, 1 table

arXiv:2407.19397 [pdf, other]

Domain Adaptive Lung Nodule Detection in X-ray Image

Authors: Haifeng Zhao, Lixiang Jiang, Leilei Ma, Dengdi Sun, Yanping Fu

Abstract: Medical images from different healthcare centers exhibit varied data distributions, posing significant challenges for adapting lung nodule detection due to the domain shift between training and application phases. Traditional unsupervised domain adaptive detection methods often struggle with this shift, leading to suboptimal outcomes. To overcome these challenges, we introduce a novel domain adaptive approach for lung nodule detection that leverages mean teacher self-training and contrastive learning. First, we propose a hierarchical contrastive learning strategy to refine nodule representations and enhance the distinction between nodules and background. Second, we introduce a nodule-level domain-invariant feature learning (NDL) module to capture domain-invariant features through adversarial learning across different domains. Additionally, we propose a new annotated dataset of X-ray images to aid in advancing lung nodule detection research. Extensive experiments conducted on multiple X-ray datasets demonstrate the efficacy of our approach in mitigating domain shift impacts.

Submitted 2 August, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

Comments: This paper will be submitted to IEEE SMC 2024

arXiv:2407.19109 [pdf, other]

Microwave-Optical Entanglement from Pulse-pumped Electro-optomechanics

Authors: Changchun Zhong, Fangxin Li, Srujan Meesala, Steven Wood, David Lake, Oskar Painter, Liang Jiang

Abstract: Entangling microwave and optical photons is one of the promising ways to realize quantum transduction through quantum teleportation. This paper investigates the entanglement of microwave-optical photon pairs generated from an electro-optomechanical system driven by a blue-detuned pulsed Gaussian pump. The photon pairs are obtained through weak parametric-down-conversion, and their temporal correlation is revealed by the second-order correlation function. We then study the discrete variable entanglement encoded in the time bin degree of freedom, where entanglement is identified by Bell inequality violation. Furthermore, we estimate the laser-induced heating and show that the pulse-pumped system features lower heating effects while maintaining a reasonable coincidence photon counting rate.

Submitted 26 July, 2024; originally announced July 2024.

arXiv:2407.18490 [pdf, other]

Fast and Parallelizable Logical Computation with Homological Product Codes

Authors: Qian Xu, Hengyun Zhou, Guo Zheng, Dolev Bluvstein, J. Pablo Bonilla Ataides, Mikhail D. Lukin, Liang Jiang

Abstract: Quantum error correction is necessary to perform large-scale quantum computation, but requires extremely large overheads in both space and time. High-rate quantum low-density-parity-check (qLDPC) codes promise a route to reduce qubit numbers, but performing computation while maintaining low space cost has required serialization of operations and extra time costs. In this work, we design fast and parallelizable logical gates for qLDPC codes, and demonstrate their utility for key algorithmic subroutines such as the quantum adder. Our gate gadgets utilize transversal logical CNOTs between a data qLDPC code and a suitably constructed ancilla code to perform parallel Pauli product measurements (PPMs) on the data logical qubits. For hypergraph product codes, we show that the ancilla can be constructed by simply modifying the base classical codes of the data code, achieving parallel PPMs on a subgrid of the logical qubits with a lower space-time cost than existing schemes for an important class of circuits. Generalizations to 3D and 4D homological product codes further feature fast PPMs in constant depth. While prior work on qLDPC codes has focused on individual logical gates, we initiate the study of fault-tolerant compilation with our expanded set of native qLDPC code operations, constructing algorithmic primitives for preparing $k$-qubit GHZ states and distilling/teleporting $k$ magic states with $O(1)$ space overhead in $O(1)$ and $O(\sqrt{k} \log k)$ logical cycles, respectively. We further generalize this to key algorithmic subroutines, demonstrating the efficient implementation of quantum adders using parallel operations. Our constructions are naturally compatible with reconfigurable architectures such as neutral atom arrays, paving the way to large-scale quantum computation with low space and time overheads.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.18274 [pdf, other]

doi 10.1145/3627673.3679537

Adaptive Differentially Private Structural Entropy Minimization for Unsupervised Social Event Detection

Authors: Zhiwei Yang, Yuecen Wei, Haoran Li, Qian Li, Lei Jiang, Li Sun, Xiaoyan Yu, Chunming Hu, Hao Peng

Abstract: Social event detection refers to extracting relevant message clusters from social media data streams to represent specific events in the real world. Social event detection is important in numerous areas, such as opinion analysis, social safety, and decision-making. Most current methods are supervised and require access to large amounts of data. These methods need prior knowledge of the events and carry a high risk of leaking sensitive information in the messages, making them less applicable in open-world settings. Therefore, conducting unsupervised detection while fully utilizing the rich information in the messages and protecting data privacy remains a significant challenge. To this end, we propose a novel social event detection framework, ADP-SEMEvent, an unsupervised social event detection method that prioritizes privacy. Specifically, ADP-SEMEvent is divided into two stages, i.e., the construction stage of the private message graph and the clustering stage of the private message graph. In the first stage, an adaptive differential privacy approach is used to construct a private message graph. In this process, our method can adaptively apply differential privacy based on the events occurring each day in an open environment to maximize the use of the privacy budget. In the second stage, to address the reduction in data utility caused by noise, a novel 2-dimensional structural entropy minimization algorithm based on optimal subgraphs is used to detect events in the message graph. Notably, this process is unsupervised and does not compromise differential privacy. Extensive experiments on two public datasets demonstrate that ADP-SEMEvent can achieve detection performance comparable to state-of-the-art methods while maintaining reasonable privacy budget parameters.

Submitted 23 July, 2024; originally announced July 2024.

Comments: Accepted to ACM CIKM 2024

arXiv:2407.17468 [pdf, other]

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

Authors: Wenting Zhao, Tanya Goyal, Yu Ying Chiu, Liwei Jiang, Benjamin Newman, Abhilasha Ravichander, Khyathi Chandu, Ronan Le Bras, Claire Cardie, Yuntian Deng, Yejin Choi

Abstract: While hallucinations of large language models (LLMs) prevail as a major challenge, existing evaluation benchmarks on factuality do not cover the diverse domains of knowledge that the real-world users of LLMs seek information about. To bridge this gap, we introduce WildHallucinations, a benchmark that evaluates factuality. It does so by prompting LLMs to generate information about entities mined from user-chatbot conversations in the wild. These generations are then automatically fact-checked against a systematically curated knowledge source collected from web search. Notably, half of these real-world entities do not have associated Wikipedia pages. We evaluate 118,785 generations from 15 LLMs on 7,919 entities. We find that LLMs consistently hallucinate more on entities without Wikipedia pages and exhibit varying hallucination rates across different domains. Finally, given the same base models, adding a retrieval component only slightly reduces hallucinations but does not eliminate hallucinations.

Submitted 24 July, 2024; originally announced July 2024.

Showing 1–50 of 1,722 results for author: Jiang, L