Search | arXiv e-print repository

A complete waveform comparison of post-Newtonian and numerical relativity in eccentric orbits

Authors: Hao Wang, Yuan-Chuan Zou, Qing-Wen Wu, Xiaolin Liu, Zhao Li

Abstract: This study presents a thorough comparative analysis between post-Newtonian (PN) and numerical-relativity (NR) waveforms in eccentric orbits, covering nonspinning and spin-aligned configurations. The comparison examines frequency, amplitude, and phase characteristics of various harmonic modes, such as 22, 21, 33, 32, 44, 43, and 55 modes. The study utilizes eccentric PN waveforms based on 3PN quasi-Keplerian parameterization with 3PN radiative reaction, surpassing Newtonian quadrupole moment with higher-order moments. NR waveforms from RIT and SXS catalogs span mass ratios from 1/4 to 1, eccentricities up to 0.45, and durations exceeding $17000M$ across nonspinning and spin-aligned configurations. Focusing on the 22 mode, frequency comparisons between quadrupole and higher-order moments of $Ψ_4^{22}$ and $h^{22}$ were conducted. Amplitude comparisons revealed superior accuracy in quadrupole moments of $Ψ_4^{22}$. Analysis of a total of 180 sets of eccentric waveforms showed increasing fitting residuals with rising eccentricity, correlating with smaller mass ratios. Comparisons of initial eccentricity from PN fitting, 3PN quasi-Keplerian parameterization, and RIT/SXS catalogs revealed alignment discrepancies. Frequency, phase, and amplitude comparisons of 22 modes showed consistent inspiral behavior between PN and NR, with divergences near merger for nonspinning PN and pre-200M for spin-aligned PN.

Submitted 26 September, 2024; originally announced September 2024.

Comments: Comments are very welcome; it has been submitted to PRD with 18 figures

arXiv:2409.17029 [pdf, other]

EventHDR: from Event to High-Speed HDR Videos and Beyond

Authors: Yunhao Zou, Ying Fu, Tsuyoshi Takatani, Yinqiang Zheng

Abstract: Event cameras are innovative neuromorphic sensors that asynchronously capture the scene dynamics. Due to the event-triggering mechanism, such cameras record event streams with much shorter response latency and higher intensity sensitivity compared to conventional cameras. On the basis of these features, previous works have attempted to reconstruct high dynamic range (HDR) videos from events, but have either suffered from unrealistic artifacts or failed to provide sufficiently high frame rates. In this paper, we present a recurrent convolutional neural network that reconstructs high-speed HDR videos from event sequences, with key frame guidance to prevent potential error accumulation caused by the sparse event data. Additionally, to address the problem of severely limited real datasets, we develop a new optical system to collect a real-world dataset with paired high-speed HDR videos and event streams, facilitating future research in this field. Our dataset provides the first real paired dataset for event-to-HDR reconstruction, avoiding potential inaccuracies from simulation strategies. Experimental results demonstrate that our method can generate high-quality, high-speed HDR videos. We further explore the potential of our work in cross-camera reconstruction and downstream computer vision tasks, including object detection, panoramic segmentation, optical flow estimation, and monocular depth estimation under HDR scenarios.

Submitted 25 September, 2024; originally announced September 2024.

Comments: TPAMI 2024

arXiv:2409.15724 [pdf, other]

LLM-Cure: LLM-based Competitor User Review Analysis for Feature Enhancement

Authors: Maram Assi, Safwat Hassan, Ying Zou

Abstract: The exponential growth of the mobile app market underscores the importance of constant innovation and rapid response to user demands. As user satisfaction is paramount to the success of a mobile application (app), developers typically rely on user reviews, which represent user feedback that includes ratings and comments, to identify areas for improvement. However, the sheer volume of user reviews poses challenges in manual analysis, necessitating automated approaches. Existing automated approaches either analyze only the target app's reviews, neglecting the comparison of similar features to competitors, or fail to provide suggestions for feature enhancement. To address these gaps, we propose a Large Language Model (LLM)-based Competitive User Review Analysis for Feature Enhancement (LLM-Cure), an approach powered by LLMs to automatically generate suggestions for mobile app feature improvements. More specifically, LLM-Cure identifies and categorizes features within reviews by applying LLMs. When provided with a complaint in a user review, LLM-Cure curates highly rated (4 and 5 stars) reviews in competing apps related to the complaint and proposes potential improvements tailored to the target application. We evaluate LLM-Cure on 1,056,739 reviews of 70 popular Android apps. Our evaluation demonstrates that LLM-Cure significantly outperforms the state-of-the-art approaches in assigning features to reviews by up to 13% in F1-score, up to 16% in recall and up to 11% in precision. Additionally, LLM-Cure demonstrates its capability to provide suggestions for resolving user complaints. We verify the suggestions using the release notes that reflect the changes of features in the target mobile app. LLM-Cure achieves a promising average of 73% of the implementation of the provided suggestions.

Submitted 24 September, 2024; originally announced September 2024.

Comments: 25 pages

arXiv:2409.14968 [pdf, other]

Mutation-Based Deep Learning Framework Testing Method in JavaScript Environment

Authors: Yinglong Zou, Juan Zhai, Chunrong Fang, Jiawei Liu, Tao Zheng, Zhenyu Chen

Abstract: In recent years, Deep Learning (DL) applications in the JavaScript environment have become increasingly popular. As the infrastructure for DL applications, JavaScript DL frameworks play a crucial role in development and deployment. It is essential to ensure the quality of JavaScript DL frameworks. However, the bottleneck of limited computational resources in the JavaScript environment brings new challenges to framework testing. Specifically, JavaScript DL frameworks are equipped with various optimization mechanisms (e.g., cache reuse, inference acceleration) to overcome the bottleneck of limited computational resources. These optimization mechanisms are overlooked by existing methods, resulting in many bugs in JavaScript DL frameworks being missed. To address the above challenges, we propose a mutation-based JavaScript DL framework testing method named DLJSFuzzer. DLJSFuzzer designs 13 tensor mutation rules targeting the cache reuse mechanism to generate test input tensors. Besides, DLJSFuzzer designs eight model mutation rules targeting the inference acceleration mechanism to generate test input models. To evaluate the effectiveness of DLJSFuzzer, we conduct experiments on the most widely-used JavaScript DL framework, TensorFlow.js. The experimental results show that DLJSFuzzer outperforms state-of-the-art methods in both effectiveness and efficiency. DLJSFuzzer successfully detects 21 unique crashes and 126 unique NaN & Inconsistency bugs. All detected crashes have been reported to the open-source community, with 12 of them already confirmed by developers. Additionally, DLJSFuzzer has improved by over 47% in model generation efficiency and over 91% in bug detection efficiency compared to all baselines.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.13985 [pdf, other]

LiDAR-based Quadrotor for Slope Inspection in Dense Vegetation

Authors: Wenyi Liu, Yunfan Ren, Rui Guo, Vickie W. W. Kong, Anthony S. P. Hung, Fangcheng Zhu, Yixi Cai, Yuying Zou, Fu Zhang

Abstract: This work presents a LiDAR-based quadrotor system for slope inspection in dense vegetation environments. Cities like Hong Kong are vulnerable to climate hazards, which often result in landslides. To mitigate the landslide risks, the Civil Engineering and Development Department (CEDD) has constructed steel flexible debris-resisting barriers on vulnerable natural catchments to protect residents. However, it is necessary to carry out regular inspections to identify any anomalies, which may affect the proper functioning of the barriers. Traditional manual inspection methods face challenges and high costs due to steep terrain and dense vegetation. Compared to manual inspection, unmanned aerial vehicles (UAVs) equipped with LiDAR sensors and cameras have advantages such as maneuverability in complex terrain, and access to narrow areas and high spots. However, conducting slope inspections using UAVs in dense vegetation poses significant challenges. First, in terms of hardware, the overall design of the UAV must carefully consider its maneuverability in narrow spaces, flight time, and the types of onboard sensors required for effective inspection. Second, regarding software, navigation algorithms need to be designed to enable obstacle avoidance flight in dense vegetation environments. To overcome these challenges, we develop a LiDAR-based quadrotor, accompanied by a comprehensive software system. The goal is to deploy our quadrotor in field environments to achieve efficient slope inspection. To assess the feasibility of our hardware and software system, we conduct functional tests in non-operational scenarios. Subsequently, invited by CEDD, we deploy our quadrotor in six field environments, including five flexible debris-resisting barriers located in dense vegetation and one slope that experienced a landslide. These experiments demonstrated the superiority of our quadrotor in slope inspection.

Submitted 20 September, 2024; originally announced September 2024.

Comments: 36 pages

arXiv:2409.12678 [pdf, other]

PMR-Net: Parallel Multi-Resolution Encoder-Decoder Network Framework for Medical Image Segmentation

Authors: Xiaogang Du, Dongxin Gu, Tao Lei, Yipeng Jiao, Yibin Zou

Abstract: In recent years, encoder-decoder networks have focused on expanding receptive fields and incorporating multi-scale context to capture global features for objects of varying sizes. However, as networks deepen, they often discard fine spatial details, impairing precise object localization. Additionally, conventional decoders' use of interpolation for upsampling leads to a loss of global context, diminishing edge segmentation accuracy. To address the above problems, we propose a novel parallel multi-resolution encoder-decoder network, PMR-Net for short. First, we design a parallel multi-resolution encoder and a multi-resolution context encoder. The parallel multi-resolution encoder can extract and fuse multi-scale fine-grained local features in parallel for input images with different resolutions. The multi-resolution context encoder fuses the global context semantic features of different receptive fields from different encoder branches to effectively maintain the integrity of global information. Second, we design a parallel multi-resolution decoder symmetrical to the structure of the parallel multi-resolution encoder. The decoder can continuously supplement the global context features of low-resolution branches to the feature maps of high-resolution branches, and effectively solve the problem of global context feature loss caused by the upsampling operation in the decoding process. Extensive experimental results demonstrate that our proposed PMR-Net can achieve more accurate segmentation results than state-of-the-art methods on five publicly available datasets. Moreover, PMR-Net is also a flexible network framework, which can meet the requirements of different scenarios by adjusting the number of network layers and the number of parallel encoder-decoder branches.

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.10025 [pdf, other]

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval

Authors: Yifei Xin, Xuxin Cheng, Zhihong Zhu, Xusheng Yang, Yuexian Zou

Abstract: Existing audio-text retrieval (ATR) methods are essentially discriminative models that aim to maximize the conditional likelihood, represented as p(candidates|query). Nevertheless, this methodology fails to consider the intrinsic data distribution p(query), leading to difficulties in discerning out-of-distribution data. In this work, we attempt to tackle this constraint through a generative perspective and model the relationship between audio and text as their joint probability p(candidates,query). To this end, we present a diffusion-based ATR framework (DiffATR), which models ATR as an iterative procedure that progressively generates the joint distribution from noise. Throughout its training phase, DiffATR is optimized from both generative and discriminative viewpoints: the generator is refined through a generation loss, while the feature extractor benefits from a contrastive loss, thus combining the merits of both methodologies. Experiments on the AudioCaps and Clotho datasets, with superior performance, verify the effectiveness of our approach. Notably, without any alterations, our DiffATR consistently exhibits strong performance in out-of-domain retrieval settings.

Submitted 16 September, 2024; originally announced September 2024.

Comments: Accepted by Interspeech2024

arXiv:2409.09729 [pdf, other]

Quantum continual learning on a programmable superconducting processor

Authors: Chuanyu Zhang, Zhide Lu, Liangtian Zhao, Shibo Xu, Weikang Li, Ke Wang, Jiachen Chen, Yaozu Wu, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Zhengyi Cui, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Pengfei Zhang, et al. (10 additional authors not shown)

Abstract: Quantum computers may outperform classical computers on machine learning tasks. In recent years, a variety of quantum algorithms promising unparalleled potential to enhance, speed up, or innovate machine learning have been proposed. Yet, quantum learning systems, similar to their classical counterparts, may likewise suffer from the catastrophic forgetting problem, where training a model with new tasks would result in a dramatic performance drop for the previously learned ones. This problem is widely believed to be a crucial obstacle to achieving continual learning of multiple sequential tasks. Here, we report an experimental demonstration of quantum continual learning on a fully programmable superconducting processor. In particular, we sequentially train a quantum classifier with three tasks, two about identifying real-life images and the other on classifying quantum states, and demonstrate its catastrophic forgetting through experimentally observed rapid performance drops for prior tasks. To overcome this dilemma, we exploit the elastic weight consolidation strategy and show that the quantum classifier can incrementally learn and retain knowledge across the three distinct tasks, with an average prediction accuracy exceeding 92.3%. In addition, for sequential tasks involving quantum-engineered data, we demonstrate that the quantum classifier can achieve a better continual learning performance than a commonly used classical feedforward network with a comparable number of variational parameters. Our results establish a viable strategy for empowering quantum learning systems with desirable adaptability to multiple sequential tasks, marking an important primary experimental step towards the long-term goal of achieving quantum artificial general intelligence.

Submitted 15 September, 2024; originally announced September 2024.

Comments: 21 pages, 14 figures

arXiv:2409.09256 [pdf, other]

Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation

Authors: Yifei Xin, Zhihong Zhu, Xuxin Cheng, Xusheng Yang, Yuexian Zou

Abstract: Most existing audio-text retrieval (ATR) approaches typically rely on a single-level interaction to associate audio and text, limiting their ability to align different modalities and leading to suboptimal matches. In this work, we present a novel ATR framework that leverages two-stream Transformers in conjunction with a Hierarchical Alignment (THA) module to identify multi-level correspondences of different Transformer blocks between audio and text. Moreover, current ATR methods mainly focus on learning a global-level representation, missing out on intricate details to capture audio occurrences that correspond to textual semantics. To bridge this gap, we introduce a Disentangled Cross-modal Representation (DCR) approach that disentangles high-dimensional features into compact latent factors to grasp fine-grained audio-text semantic correlations. Additionally, we develop a confidence-aware (CA) module to estimate the confidence of each latent factor pair and adaptively aggregate cross-modal latent factors to achieve local semantic alignment. Experiments show that our THA effectively boosts ATR performance, with the DCR approach further contributing to consistent performance gains.

Submitted 13 September, 2024; originally announced September 2024.

Comments: Accepted by Interspeech2024

arXiv:2409.07896 [pdf, other]

Microscopic-Mamba: Revealing the Secrets of Microscopic Images with Just 4M Parameters

Authors: Shun Zou, Zhuo Zhang, Yi Zou, Guangwei Gao

Abstract: In the field of medical microscopic image classification (MIC), CNN-based and Transformer-based models have been extensively studied. However, CNNs struggle with modeling long-range dependencies, limiting their ability to fully utilize semantic information in images. Conversely, Transformers are hampered by the complexity of quadratic computations. To address these challenges, we propose a model based on the Mamba architecture: Microscopic-Mamba. Specifically, we designed the Partially Selected Feed-Forward Network (PSFFN) to replace the last linear layer of the Visual State Space Module (VSSM), enhancing Mamba's local feature extraction capabilities. Additionally, we introduced the Modulation Interaction Feature Aggregation (MIFA) module to effectively modulate and dynamically aggregate global and local features. We also incorporated a parallel VSSM mechanism to improve inter-channel information interaction while reducing the number of parameters. Extensive experiments have demonstrated that our method achieves state-of-the-art performance on five public datasets. Code is available at https://github.com/zs1314/Microscopic-Mamba

Submitted 12 September, 2024; originally announced September 2024.

Comments: 5 pages, 1 figure

arXiv:2409.07400 [pdf, other]

Validation of up to seven TESS planet candidates through multi-colour transit photometry using MuSCAT2 data

Authors: A. Peláez-Torres, E. Esparza-Borges, E. Pallé, H. Parviainen, F. Murgas, G. Morello, M. R. Zapatero-Osorio, J. Korth, N. Narita, A. Fukui, I. Carleo, R. Luque, N. Abreu García, K. Barkaoui, A. Boyle, V. J. S. Béjar, Y. Calatayud-Borras, D. V. Cheryasov, J. L. Christiansen, D. R. Ciardi, G. Enoc, Z. Essack, I. Fukuda, G. Furesz, D. Galán, et al. (40 additional authors not shown)

Abstract: The TESS mission searches for transiting exoplanets by monitoring the brightness of hundreds of thousands of stars across the entire sky. M-type planet hosts are ideal targets for this mission due to their smaller size and cooler temperatures, which make it easier to detect smaller planets near or within their habitable zones. Additionally, M dwarfs have a smaller contrast ratio between the planet and the star, making it easier to measure the planet's properties accurately. Here, we report the validation analysis of 13 TESS exoplanet candidates orbiting around M dwarfs. We studied the nature of these candidates through a multi-colour transit photometry analysis using several ground-based instruments (MuSCAT2, MuSCAT3, and LCO-SINISTRO), high-spatial resolution observations, and TESS light curves. We present the validation of five new planetary systems: TOI-1883b, TOI-2274b, TOI-2768b, TOI-4438b, and TOI-5319b, along with compelling evidence of a planetary nature for TOIs 2781b and 5486b. We also present an empirical definition for the Neptune desert boundaries. The remaining six systems could not be validated due to large true radius values overlapping with the brown dwarf regime or, alternatively, the presence of chromaticity in the MuSCAT2 light curves.

Submitted 11 September, 2024; originally announced September 2024.

arXiv:2409.05168 [pdf, other]

Magnetospheric control of ionospheric TEC perturbations via whistler-mode and ULF waves

Authors: Yangyang Shen, Olga P. Verkhoglyadova, Anton Artemyev, Michael D. Hartinger, Vassilis Angelopoulos, Xueling Shi, Ying Zou

Abstract: The weakly ionized plasma in the Earth's ionosphere is controlled by a complex interplay between solar and magnetospheric inputs from above, atmospheric processes from below, and plasma electrodynamics from within. This interaction results in ionosphere structuring and variability that pose major challenges for accurate ionosphere prediction for global navigation satellite system (GNSS) related applications and space weather research. The ionospheric structuring and variability are often probed using the total electron content (TEC) and its relative perturbations (dTEC). Among dTEC variations observed at high latitudes, a unique modulation pattern has been linked to magnetospheric ultra low frequency (ULF) waves, yet its underlying mechanisms remain unclear. Here, using magnetically-conjugate observations from the THEMIS spacecraft and a ground-based GPS receiver at Fairbanks, Alaska, we provide direct evidence that these dTEC modulations are driven by magnetospheric electron precipitation induced by ULF-modulated whistler-mode waves. We observed peak-to-peak dTEC amplitudes reaching ~0.5 TECU (1 TECU is equal to 10$^{16}$ electrons/m$^2$) with modulations spanning scales of ~5--100 km. The cross-correlation between our modeled and observed dTEC reached ~0.8 during the conjugacy period but decreased outside of it. The spectra of whistler-mode waves and dTEC also matched closely at ULF frequencies during the conjugacy period but diverged outside of it. Our findings elucidate the high-latitude dTEC generation from magnetospheric wave-induced precipitation, addressing a significant gap in current physics-based dTEC modeling. These results thus improve ionospheric dTEC prediction and enhance our understanding of magnetosphere-ionosphere coupling via ULF waves.

Submitted 8 September, 2024; originally announced September 2024.

Comments: 14 pages, 5 figures, manuscript under review in AGU Advances

arXiv:2409.02920 [pdf, other]

RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)

Authors: Yao Mu, Tianxing Chen, Shijia Peng, Zanxin Chen, Zeyu Gao, Yude Zou, Lunkai Lin, Zhiqiang Xie, Ping Luo

Abstract: Effective collaboration of dual-arm robots and their tool use capabilities are increasingly important areas in the advancement of robotics. These skills play a significant role in expanding robots' ability to operate in diverse real-world environments. However, progress is impeded by the scarcity of specialized training data. This paper introduces RoboTwin, a novel benchmark dataset combining real-world teleoperated data with synthetic data from digital twins, designed for dual-arm robotic scenarios. Using the COBOT Magic platform, we have collected diverse data on tool usage and human-robot interaction. We present an innovative approach to creating digital twins using AI-generated content, transforming 2D images into detailed 3D models. Furthermore, we utilize large language models to generate expert-level training data and task-specific pose sequences oriented toward functionality. Our key contributions are: 1) the RoboTwin benchmark dataset, 2) an efficient real-to-simulation pipeline, and 3) the use of language models for automatic expert-level data generation. These advancements are designed to address the shortage of robotic training data, potentially accelerating the development of more capable and versatile robotic systems for a wide range of real-world applications. The project page is available at https://robotwin-benchmark.github.io/early-version/

Submitted 4 September, 2024; originally announced September 2024.

Comments: Project page: https://robotwin-benchmark.github.io/early-version/

arXiv:2409.01893 [pdf, other]

What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices

Authors: Zhi Chen, Qiguang Chen, Libo Qin, Qipeng Guo, Haijun Lv, Yicheng Zou, Wanxiang Che, Hang Yan, Kai Chen, Dahua Lin

Abstract: Recent advancements in large language models (LLMs) with extended context windows have significantly improved tasks such as information extraction, question answering, and complex planning scenarios. In order to achieve success in long context tasks, a large amount of work has been done to enhance the long context capabilities of the model through synthetic data. Existing methods typically utilize the Self-Instruct framework to generate instruction tuning data for better long context capability improvement. However, our preliminary experiments indicate that less than 35% of generated samples are multi-hop, and more than 40% exhibit poor quality, limiting comprehensive understanding and further research. To improve the quality of synthetic data, we propose the Multi-agent Interactive Multi-hop Generation (MIMG) framework, incorporating a Quality Verification Agent, a Single-hop Question Generation Agent, a Multiple Question Sampling Strategy, and a Multi-hop Question Merger Agent. This framework improves the data quality, with the proportion of high-quality, multi-hop, and diverse data exceeding 85%. Furthermore, we systematically investigate strategies for document selection, question merging, and validation techniques through extensive experiments across various models. Our findings show that our synthetic high-quality long-context instruction data significantly enhances model performance, even surpassing models trained on larger amounts of human-annotated data. Our code is available at: https://github.com/WowCZ/LongMIT.

Submitted 3 September, 2024; originally announced September 2024.

Comments: Work in progress

arXiv:2408.14261 [pdf, other]

Securing FC-RIS and UAV Empowered Multiuser Communications Against a Randomly Flying Eavesdropper

Authors: Shuying Lin, Yulong Zou, Yuhan Jiang, Libao Yang, Zhe Cui, Le-Nam Tran

Abstract: This paper investigates a wireless network consisting of an unmanned aerial vehicle (UAV) base station (BS), a fully-connected reconfigurable intelligent surface (FC-RIS), and multiple users, where the downlink signal can simultaneously be captured by an aerial eavesdropper at a random location. To improve the physical-layer security (PLS) of the considered downlink multiuser communications, we propose the fully-connected reconfigurable intelligent surface aided round-robin scheduling (FCR-RS) and the FC-RIS and ground channel state information (CSI) aided proportional fair scheduling (FCR-GCSI-PFS) schemes. Thereafter, we derive closed-form expressions of the zero secrecy rate probability (ZSRP). Numerical results not only validate the closed-form ZSRP analysis, but also verify that the proposed GCSI-PFS scheme obtains the same performance gain as the full-CSI-aided PFS in FC-RIS-aided communications. Furthermore, optimizing the hovering altitude remarkably enhances the PLS of the FC-RIS and UAV empowered multiuser communications.

Submitted 26 August, 2024; originally announced August 2024.

Comments: Submitted to IEEE Wireless Communications Letters

arXiv:2408.14158 [pdf, other]

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic hardware-software co-design framework and its best practices. For DL training, we deployed the Fire-Flyer 2 with 10,000 PCIe A100 GPUs, achieving performance approximating the DGX-A100 while reducing costs by half and energy consumption by 40%. We specifically engineered HFReduce to accelerate allreduce communication and implemented numerous measures to keep our Computation-Storage Integrated Network congestion-free. Through our software stack, including HaiScale, 3FS, and HAI-Platform, we achieved substantial scalability by overlapping computation and communication. Our system-oriented experience from DL training provides valuable insights to drive future advancements in AI-HPC.

Submitted 31 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version.

arXiv:2408.13770 [pdf, other]

TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

Authors: Chuanrui Zhang, Yingshuang Zou, Zhuoling Li, Minmin Yi, Haoqian Wang

Abstract: Compared with previous 3D reconstruction methods like NeRF, recent Generalizable 3D Gaussian Splatting (G-3DGS) methods demonstrate impressive efficiency even in the sparse-view setting. However, the promising reconstruction performance of existing G-3DGS methods relies heavily on accurate multi-view feature matching, which is quite challenging. Especially for scenes that have many non-overlapping areas between views and contain numerous similar regions, the matching performance of existing methods is poor and the reconstruction precision is limited. To address this problem, we develop a strategy that utilizes a predicted depth confidence map to guide accurate local feature matching. In addition, we propose to utilize the knowledge of existing monocular depth estimation models as a prior to boost the depth estimation precision in non-overlapping areas between views. Combining the proposed strategies, we present a novel G-3DGS method named TranSplat, which obtains the best performance on both the RealEstate10K and ACID benchmarks while maintaining competitive speed and presenting strong cross-dataset generalization ability. Our code and demos will be available at: https://xingyoujun.github.io/transplat.

Submitted 25 August, 2024; originally announced August 2024.

arXiv:2408.13385 [pdf, other]

MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning

Authors: Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Zhimeng Huang, Yuhua Li, Ruixuan Li

Abstract: Humans exhibit a remarkable ability to learn quickly from a limited number of labeled samples, a capability that starkly contrasts with that of current machine learning systems. Unsupervised Few-Shot Learning (U-FSL) seeks to bridge this divide by reducing reliance on annotated datasets during initial training phases. In this work, we first quantitatively assess the impacts of Masked Image Modeling (MIM) and Contrastive Learning (CL) on few-shot learning tasks. Our findings highlight the respective limitations of MIM and CL in terms of discriminative and generalization abilities, which contribute to their underperformance in U-FSL contexts. To address these trade-offs between generalization and discriminability in unsupervised pretraining, we introduce a novel paradigm named Masked Image Contrastive Modeling (MICM). MICM creatively combines the targeted object learning strength of CL with the generalized visual feature learning capability of MIM, significantly enhancing its efficacy in downstream few-shot learning inference. Extensive experimental analyses confirm the advantages of MICM, demonstrating significant improvements in both generalization and discrimination capabilities for few-shot learning. Our comprehensive quantitative evaluations further substantiate the superiority of MICM, showing that our two-stage U-FSL framework based on MICM markedly outperforms existing leading baselines.

Submitted 23 August, 2024; originally announced August 2024.

Comments: ACMMM 2024 (Oral)

arXiv:2408.13373 [pdf, other]

Learning Unknowns from Unknowns: Diversified Negative Prototypes Generator for Few-Shot Open-Set Recognition

Authors: Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Yuhua Li, Ruixuan Li

Abstract: Few-shot open-set recognition (FSOR) is a challenging task that requires a model to recognize known classes and identify unknown classes with limited labeled data. Existing approaches, particularly Negative-Prototype-Based methods, generate negative prototypes based solely on known class data. However, as the unknown space is infinite while the known space is limited, these methods suffer from limited representation capability. To address this limitation, we propose a novel approach, termed Diversified Negative Prototypes Generator (DNPG), which adopts the principle of "learning unknowns from unknowns." Our method leverages the unknown space information learned from base classes to generate more representative negative prototypes for novel classes. During the pre-training phase, we learn the unknown space representation of the base classes. This representation, along with inter-class relationships, is then utilized in the meta-learning process to construct negative prototypes for novel classes. To prevent prototype collapse and ensure adaptability to varying data compositions, we introduce the Swap Alignment (SA) module. Our DNPG model, by learning from the unknown space, generates negative prototypes that cover a broader unknown space, thereby achieving state-of-the-art performance on three standard FSOR datasets.

Submitted 23 August, 2024; originally announced August 2024.

Comments: ACMMM 2024

arXiv:2408.12609 [pdf, ps, other]

Enhanced Prediction of Multi-Agent Trajectories via Control Inference and State-Space Dynamics

Authors: Yu Zhang, Yongxiang Zou, Haoyu Zhang, Zeyu Liu, Houcheng Li, Long Cheng

Abstract: In the field of autonomous systems, accurately predicting the trajectories of nearby vehicles and pedestrians is crucial for ensuring both safety and operational efficiency. This paper introduces a novel methodology for trajectory forecasting based on state-space dynamic system modeling, which endows agents with models that have tangible physical implications. To enhance the precision of state estimations within the dynamic system, the paper also presents a novel modeling technique for control variables. This technique utilizes a newly introduced model, termed "Mixed Mamba," to derive initial control states, thereby improving the predictive accuracy of these variables. Moreover, the proposed approach ingeniously integrates graph neural networks with state-space models, effectively capturing the complexities of multi-agent interactions. This combination provides a robust and scalable framework for forecasting multi-agent trajectories across a range of scenarios. Comprehensive evaluations demonstrate that this model outperforms several established benchmarks across various metrics and datasets, highlighting its significant potential to advance trajectory forecasting in autonomous systems.

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2408.11900 [pdf, other]

Quantum highway: Observation of minimal and maximal speed limits for few and many-body states

Authors: Zitian Zhu, Lei Gao, Zehang Bao, Liang Xiang, Zixuan Song, Shibo Xu, Ke Wang, Jiachen Chen, Feitong Jin, Xuhao Zhu, Yu Gao, Yaozu Wu, Chuanyu Zhang, Ning Wang, Yiren Zou, Ziqi Tan, Aosai Zhang, Zhengyi Cui, Fanhao Shen, Jiarun Zhong, Tingting Li, Jinfeng Deng, Xu Zhang, Hang Dong, Pengfei Zhang, et al. (8 additional authors not shown)

Abstract: Tracking the time evolution of a quantum state allows one to verify the thermalization rate or the propagation speed of correlations in generic quantum systems. Inspired by the energy-time uncertainty principle, bounds have been demonstrated on the maximal speed at which a quantum state can change, resulting in immediate and practical tasks. Based on a programmable superconducting quantum processor, we test the dynamics of various emulated quantum mechanical systems encompassing single- and many-body states. We show that one can test the known quantum speed limits and that modifying a single Hamiltonian parameter allows the observation of the crossover of the different bounds on the dynamics. We also unveil the observation of minimal quantum speed limits in addition to more common maximal ones, i.e., the lowest rate of change of a unitarily evolved quantum state. Our results establish a comprehensive experimental characterization of quantum speed limits and pave the way for their subsequent study in engineered non-unitary conditions.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 9 pages, 4 figures + supplementary information

arXiv:2408.09816 [pdf, ps, other]

Asymptotic Expansion of the Eigenvalues of a Bathtub Potential with Quadratic Ends

Authors: Yuzhou Zou

Abstract: We consider the eigenvalues of a one-dimensional semiclassical Schrödinger operator, where the potential consists of two quadratic ends (that is, looks like a harmonic oscillator at each infinite end), possibly with a flat region in the middle. Such a potential notably has a discontinuity in the second derivative. We derive an asymptotic expansion, valid either in the high energy regime or the semiclassical regime, with a leading order term given by the Bohr-Sommerfeld quantization condition, and an asymptotic expansion consisting of negative powers of the leading order term, with coefficients that are oscillatory in the leading order term. We apply this expansion to study the results of the Gutzwiller Trace formula and the heat kernel asymptotic for this class of potentials, giving an idea of what results to expect for such trace formulas for non-smooth potentials.

Submitted 19 August, 2024; originally announced August 2024.
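
As a hedged illustration of the leading-order term named in the abstract above (a sketch, not the paper's own derivation), consider an assumed model potential that vanishes on a flat region of width $L$ and is quadratic with frequency $\omega$ outside it. The symbols $m$, $L$, $\omega$, and the specific form of $V$ are introduced here for illustration only. The flat region is crossed twice per orbit, and the two quadratic ends together contribute one full harmonic-oscillator orbit, so the Bohr-Sommerfeld condition reads:

```latex
V(x) =
\begin{cases}
0, & |x| \le L/2, \\
\tfrac{1}{2} m \omega^2 \left(|x| - L/2\right)^2, & |x| > L/2,
\end{cases}
\qquad
\oint p \, dx = 2L\sqrt{2mE} + \frac{2\pi E}{\omega}
= 2\pi\hbar \left(n + \tfrac{1}{2}\right), \quad n = 0, 1, 2, \dots
```

For $L = 0$ this reduces to the harmonic-oscillator spectrum $E_n = \hbar\omega\,(n + \tfrac{1}{2})$; for $L > 0$ it gives only the leading-order approximation to the eigenvalues.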

arXiv:2408.00857 [pdf, other]

Petz map recovery for long-range entangled quantum many-body states

Authors: Yangrui Hu, Yijian Zou

Abstract: Given a tripartite quantum state on $A,B,C$ and the erasure channel on $C$, the rotated Petz map is a recovery channel that acts on $B$ to recover the erased quantum information. The infidelity of the best recovery is upper-bounded by the conditional mutual information (CMI). In this work, we study the infidelity of the rotated Petz map on several physically-relevant long-range entangled quantum states. Specifically, we study three classes of quantum phases: (i) steady states of measurement-induced phase transitions, (ii) critical ground state under local measurements, and (iii) chiral states under local measurements. We find that the average Petz map infidelity sharply distinguishes the three classes: (i) and (ii) are distinguished by the scaling of the infidelity with CMI and (iii) is characterized by an asymmetry of the infidelity with the rotation parameter. We also study Petz map recovery for topological order and find an operational interpretation of the topological entanglement entropy. Our result indicates that the Petz map fidelity is a useful diagnostic of quantum phases of matter.

Submitted 7 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

Comments: 9+8 pages, 8+1 figures

arXiv:2407.19870 [pdf, ps, other]

Optimal upper bounds for anti-canonical volumes of singular toric Fano varieties

Authors: Yu Zou

Abstract: Fix two positive integers $d\geq3$ and $q$. We give an upper bound for anti-canonical volumes of $d$-dimensional $\frac{1}{q}$-lc toric Fano varieties, which corresponds to an upper bound for the dual normalized volumes of the associated $d$-dimensional $\frac{1}{q}$-lc Fano polytopes. And we also construct examples to show that these upper bounds are optimal. Besides, we provide an optimal upper bound for volumes of $d$-dimensional lattice simplices $S$ such that $\frac{1}{q}S$ has exactly one interior lattice point.

Submitted 29 July, 2024; originally announced July 2024.

Comments: 27 pages, comments are welcome

MSC Class: 14J45 (Primary) 14M25; 52B20 (Secondary)

arXiv:2407.17338 [pdf]

Accurate Inverse Process Optimization Framework in Laser Directed Energy Deposition

Authors: Xiao Shang, Evelyn Li, Ajay Talbot, Haitao Wen, Tianyi Lyu, Jiahui Zhang, Yu Zou

Abstract: In additive manufacturing (AM), particularly for laser-based metal AM, process optimization is crucial to the quality of products and the efficiency of production. The identification of optimal process parameters out of a vast parameter space, however, is a daunting task. Despite advances in simulations, the process optimization for specific materials and geometries is developed through a time-consuming trial-and-error approach, which often lacks the versatility to address multiple optimization objectives. Machine learning (ML) provides a powerful tool to accelerate the optimization process, but most current studies focus on simple single-track prints, which hardly translate to manufacturing 3D components for engineering applications. In this study, we develop an Accurate Inverse process optimization framework in laser Directed Energy Deposition (AIDED), based on machine learning models and a genetic algorithm, to aid process optimization in laser DED processes. Using the AIDED, we demonstrate the following: (i) Accurately predict single-track (R2 score 0.995), multi-track (R2 score 0.969), and multi-layer (1.07% and 10.75% error in width and height, respectively) cross-sectional melt pool geometries directly from process parameters; (ii) Determine appropriate hatch spacing and layer thickness for fabricating fully dense (density > 99.9%) multi-track and multi-layer prints; (iii) Inversely identify optimal process parameters directly from customizable application objectives within 1-3 hours.
We also validate the effectiveness of the AIDED experimentally by achieving two exemplary targets: fast print speed and fine print resolution. Furthermore, we show the high transferability of the framework from stainless steel to pure nickel. With AIDED, we pave a new way for "aiding" the process optimization in the laser-based AM processes that is applicable to a wide range of materials.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.15914 [pdf, other]

Studying the 3d Ising surface CFTs on the fuzzy sphere

Authors: Zheng Zhou, Yijian Zou

Abstract: Boundaries not only are fundamental elements in nearly all realistic physical systems, but also greatly enrich the structure of quantum field theories. In this paper, we demonstrate that conformal field theory (CFT) with a boundary, known as surface CFT in three dimensions, can be studied with the setup of fuzzy sphere. We consider the example of surface criticality of the 3D Ising CFT. We propose two schemes by cutting a boundary in the orbital space or the real space to realise the ordinary and the normal surface CFTs on the fuzzy sphere. We obtain the operator spectra through state-operator correspondence. We observe integer spacing of the conformal multiplets, and thus provide direct evidence of conformal symmetry. We identify the ordinary surface primary $o$, the displacement operator $\mathrm{D}$ and their conformal descendants and extract their scaling dimensions. We also study the one-point and two-point correlation functions and extract the bulk-to-surface OPE coefficients, some of which are reported for the first time. In addition, using the overlap of the bulk CFT state and the polarised state, we calculate the boundary central charges of the 3D Ising surface CFTs non-perturbatively. Other conformal data obtained in this way also agrees with prior methods.

Submitted 16 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: 33 pages, 14+7 figures and 2+2 tables

arXiv:2407.15824 [pdf, other]

Unveiling the Multifaceted GRB 200613A: Prompt Emission Dynamics, Afterglow Evolution, and the Host Galaxy's Properties

Authors: Shao-Yu Fu, Dong Xu, Wei-Hua Lei, Antonio de Ugarte Postigo, D. Alexander Kann, Christina C. Thöne, José Feliciano Agüí Fernández, Yi Shuang-Xi, Wei Xie, Yuan-Chuan Zou, Xing Liu, Shuai-Qing Jiang, Tian-Hua Lu, Jie An, Zi-Pei Zhu, Jie Zheng, Qing-Wen Tang, Peng-Wei Zhao, Li-Ping Xin, Jian-Yan Wei

Abstract: We present our optical observations and multi-wavelength analysis of GRB 200613A, detected by the Fermi satellite. Time-resolved spectral analysis of the prompt $γ$-ray emission was conducted utilizing the Bayesian block method to determine statistically optimal time bins. Based on the Bayesian Information Criterion (BIC), the data generally favor the Band+Blackbody (BB for short) model. We speculate that the main Band component comes from the Blandford-Znajek mechanism, while the additional BB component comes from the neutrino annihilation process. The BB component becomes significant for a low-spin, high-accretion rate black hole central engine, as evidenced by our model comparison with the data. The afterglow light curve exhibits typical power-law decay, and its behavior can be explained by the collision between the ejecta and constant interstellar medium (ISM). Model fitting yields the following parameters: $E_{K,iso} = (2.04^{+11.8}_{-1.50})\times 10^{53}$ erg, $Γ_0=354^{+578}_{-217}$, $p=2.09^{+0.02}_{-0.03}$, $n_{18}=(2.04^{+9.71}_{-1.87})\times 10^{2}$ cm$^{-3}$, $θ_j=24.0^{+6.50}_{-5.54}$ degrees, $ε_e=(1.66^{+4.09}_{-1.39})\times 10^{-1}$ and $ε_B=(7.76^{+48.5}_{-5.9})\times 10^{-6}$. In addition, we employed the public Python package Prospector to perform spectral energy distribution (SED) modeling of the host galaxy. The results suggest that the host galaxy is a massive galaxy ($\log(M_\ast / M_\odot)=11.75^{+0.10}_{-0.09}$) with moderate star formation rate ($\mbox{SFR}=22.58^{+13.63}_{-7.22} M_{\odot}$/yr).
This SFR is consistent with the SFR of $\sim 34.2 M_{\odot}$ yr$^{-1}$ derived from the [OII] emission line in the observed spectrum.

Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: 30 pages, 16 figures, accepted by ApJ

arXiv:2407.12829 [pdf, other]

PICO-RAM: A PVT-Insensitive Analog Compute-In-Memory SRAM Macro with In-Situ Multi-Bit Charge Computing and 6T Thin-Cell-Compatible Layout

Authors: Zhiyu Chen, Ziyuan Wen, Weier Wan, Akhil Reddy Pakala, Yiwei Zou, Wei-Chen Wei, Zengyi Li, Yubei Chen, Kaiyuan Yang

Abstract: Analog compute-in-memory (CIM) in static random-access memory (SRAM) is promising for accelerating deep learning inference by circumventing the memory wall and exploiting ultra-efficient analog low-precision arithmetic. Latest analog CIM designs attempt bit-parallel schemes for multi-bit analog Matrix-Vector Multiplication (MVM), aiming at higher energy efficiency, throughput, and training simplicity and robustness over conventional bit-serial methods that digitally shift-and-add multiple partial analog computing results. However, bit-parallel operations require more complex analog computations and become more sensitive to well-known analog CIM challenges, including large cell areas, inefficient and inaccurate multi-bit analog operations, and vulnerability to PVT variations. This paper presents PICO-RAM, a PVT-insensitive and compact CIM SRAM macro with charge-domain bit-parallel computation. It adopts a multi-bit thin-cell Multiply-Accumulate (MAC) unit that shares the same transistor layout as the most compact 6T SRAM cell. All analog computing modules, including digital-to-analog converters (DACs), MAC units, analog shift-and-add, and analog-to-digital converters (ADCs) reuse one set of local capacitors inside the array, performing in-situ computation to save area and enhance accuracy. A compact 8.5-bit dual-threshold time-domain ADC power gates the main path most of the time, leading to a significant energy reduction.
Our 65-nm prototype achieves the highest weight storage density of 559 Kb/mm$^2$ and exceptional robustness to temperature and voltage variations (-40 to 105 $^{\circ}$C and 0.65 to 1.2 V) among SRAM-based analog CIM designs.

Submitted 2 July, 2024; originally announced July 2024.

Comments: This manuscript has been accepted to IEEE Journal of Solid-State Circuits (JSSC)

arXiv:2407.12347 [pdf, other]

Improved Nonlocality Certification via Bouncing between Bell Operators and Inequalities

Authors: Weikang Li, Mengyao Hu, Ke Wang, Shibo Xu, Zhide Lu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Zhengyi Cui, Aosai Zhang, Ning Wang, Yiren Zou, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Pengfei Zhang, Hekang Li, Qiujiang Guo, Zhen Wang, Dong-Ling Deng, Chao Song, et al. (3 additional authors not shown)

Abstract: Bell nonlocality is an intrinsic feature of quantum mechanics, which can be certified via the violation of Bell inequalities. It is therefore a fundamental question to certify Bell nonlocality from experimental data. Here, we present an optimization scheme to improve nonlocality certification by exploring flexible mappings between Bell inequalities and Hamiltonians corresponding to the Bell operators. We show that several Hamiltonian models can be mapped to new inequalities with better classical bounds than the original one, enabling a more robust detection of nonlocality. From the other direction, we investigate the mapping from fixed Bell inequalities to Hamiltonians, aiming to maximize quantum violations while considering experimental imperfections. As a practical demonstration, we apply this method to an XXZ-like honeycomb-lattice model utilizing over 70 superconducting qubits. The successful application of this technique, as well as combining the two directions to form an optimization loop, may open new avenues for developing more practical and noise-resilient nonlocality certification techniques and enable broader experimental explorations.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 11 pages, 5 figures, 1 table

arXiv:2407.10215 [pdf, other]

DMRIntTk: integrating different DMR sets based on density peak clustering

Authors: Wenjin Zhang, Wenlong Jie, Wanxin Cui, Guihua Duan, You Zou, Xiaoqing Peng

Abstract: Background: Identifying differentially methylated regions (DMRs) is a basic task in DNA methylation analysis. However, due to the different strategies adopted, different DMR sets will be predicted on the same dataset, which poses a challenge in selecting a reliable and comprehensive DMR set for downstream analysis. Results: Here, we develop DMRIntTk, a toolkit for integrating DMR sets predicted by different methods on the same dataset. In DMRIntTk, the genome is segmented into bins and the reliability of each DMR set at different methylation thresholds is evaluated. Then, the bins are weighted based on the covered DMR sets and integrated into DMRs by using a density peak clustering algorithm. To demonstrate the practicality of DMRIntTk, it was applied to different scenarios, including different tissues with relatively large methylation differences, cancer tissues versus normal tissues with medium methylation differences, and disease tissues versus normal tissues with subtle methylation differences. The results show that DMRIntTk can effectively trim the regions with small methylation differences in the original DMR sets and therefore it can enhance the proportion of DMRs with higher methylation differences. In addition, the overlap analysis suggests that the integrated DMR sets are quite comprehensive, and the functional analysis indicates the integrated disease-related DMR sets are significantly enriched in biological pathways, which are associated with the pathological mechanisms of the diseases.
Conclusions: In conclusion, DMRIntTk can help researchers obtain a reliable and comprehensive DMR set from many prediction methods. Keywords: Differentially methylated regions, Methylation array, Cancer-related differentially methylated regions, Tissue-specific differentially methylated regions, Density peak clustering.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 21 pages, 9 figures

arXiv:2407.09984 [pdf, ps, other]

Stabilizing Dynamic Systems through Neural Network Learning: A Robust Approach

Authors: Yu Zhang, Haoyu Zhang, Yongxiang Zou, Houcheng Li, Long Cheng

Abstract: Point-to-point and periodic motions are ubiquitous in the world of robotics. To master these motions, Autonomous Dynamic System (ADS) based algorithms are fundamental in the domain of Learning from Demonstration (LfD). However, these algorithms face the significant challenge of balancing precision in learning with the maintenance of system stability. This paper addresses this challenge by presenting a novel ADS algorithm that leverages neural network technology. The proposed algorithm is designed to distill essential knowledge from demonstration data, ensuring stability during the learning of both point-to-point and periodic motions. For point-to-point motions, a neural Lyapunov function is proposed to align with the provided demonstrations. In the case of periodic motions, the neural Lyapunov function is used with the transversal contraction to ensure that all generated motions converge to a stable limit cycle. The model utilizes a streamlined neural network architecture, adept at achieving dual objectives: optimizing learning accuracy while maintaining global stability. To thoroughly assess the efficacy of the proposed algorithm, rigorous evaluations are conducted using the LASA dataset and a manually designed dataset. These assessments are complemented by empirical validation through robotic experiments, providing robust evidence of the algorithm's performance.

Submitted 13 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2309.08849

arXiv:2407.03743 [pdf, other]

Determining the viewing angle from TeV light curve of GRB 221009A

Authors: Lin Zhou, Yuan-Chuan Zou

Abstract: Gamma-ray bursts (GRBs) are among the most powerful explosive events in the universe. LHAASO recently observed the most luminous one, GRB 221009A, and unveiled its TeV light curve. The light curve exhibits a distinct jet break at around 670 seconds, enabling the derivation of the viewing angle based on the smoothness of the jet break. To fit the TeV light curve, we constructed two models, with and without high-latitude radiation, in which the viewing angle was treated as a free parameter. We obtained viewing angles of $9.4 \times 10^{-4}$ radians and $5.9 \times 10^{-3}$ radians, respectively. These values closely resemble an on-axis scenario, given that the opening angle is $1.4 \times 10^{-2}$ radians.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.01364 [pdf]

Co-benefits of Agricultural Diversification and Technology for Food and Nutrition Security in China

Authors: Thomas Cherico Wanger, Estelle Raveloaritiana, Siyan Zeng, Haixiu Gao, Xueqing He, Yiwen Shao, Panlong Wu, Kris A. G. Wyckhuys, Wenwu Zhou, Yi Zou, Zengrong Zhu, Ling Li, Haiyan Cen, Yunhui Liu, Shenggen Fan

Abstract: China is the leading crop producer and has successfully implemented sustainable development programs related to agriculture. Sustainable agriculture has been promoted to achieve national food security targets such as food self-sufficiency through the well-facilitated farmland construction (WFFC) approach. The WFFC was introduced in China's current national 10-year plan to consolidate farmlands into large and simplified production areas to maximise automation and improve soil fertility and productivity. However, research suggests that diversified and smaller farms facilitate ecosystem services, can improve yield resilience, defuse human health threats, and increase farm profitability. Currently, WFFC has not considered ecological farmland improvements, and it may miss long-term environmental benefits, including ecosystem service preservation conducive to yields. Moreover, the nutritional status in China has changed in recent decades, with undernutrition dramatically reduced but the prevalence of overweight, obesity, and chronic diseases increased. While a strategic choice and management of crop and livestock species can improve nutrition, the environmental and production benefits of agricultural diversification are currently not well interlinked with China's food and nutrition security discussions. Lastly, the role of agricultural technology for socioeconomic benefits and the link with diversified agricultural production may provide vast benefits for food security.
Here, we focus on the opportunities and co-benefits of agricultural diversification and technology innovations to advance food and nutrition security in China through ecosystem service and yield benefits. Our applied five-point research agenda can provide evidence-based opportunities to support China in reaching its ambitious food security targets through agricultural diversification with global ramifications.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00700 [pdf, other]

Study of $τ^- \to ωπ^- ν_τ$ decay in resonance chiral theory with tensor sources

Authors: Feng-Zhi Chen, Xin-Qiang Li, Shi-Can Peng, Ya-Dong Yang, Yuan-He Zou

Abstract: In this work, we study the $τ^- \to ωπ^-ν_τ$ decay in the framework of low-energy effective field theory. The $J^{\mathcal{P}G}$ decompositions of the quark currents and the $ωπ$ final state show that, besides the Standard Model vector interaction, only the non-standard tensor interaction can have a non-zero contribution to the decay. To discuss its effect, a reliable calculation of the $ωπ$ tensor form factors is necessary. After constructing the Lagrangian of resonance chiral theory with external tensor sources, we calculate both the vector and tensor form factors with the relevant resonance couplings determined by combining the QCD short-distance constraints, the fit to the spectral function of $τ^- \to ωπ^-ν_τ$ decay, as well as the matching between the $\mathcal{O}(p^4)$ odd-intrinsic-parity operators after integrating out the vector resonances and the $\mathcal{O}(p^6)$ operators of chiral perturbation theory. The new physics effect is then investigated in the distributions of the spectral function and the forward-backward asymmetry of $τ^- \to ωπ^-ν_τ$ decay. We find that the spectral function is dominated by the Standard Model, and the non-standard tensor contribution is negligible. However, since the forward-backward asymmetry can only be generated with a non-zero tensor interaction, the observable is quite sensitive to this kind of new physics.
A future measurement of the observable at the Belle II experiment as well as at the proposed Tera-Z and STCF facilities is, therefore, strongly called for to check the existence of such a non-standard tensor interaction.

Submitted 6 September, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

Comments: 27 pages, 4 tables, and 2 figures; minor modification, final version published in the journal

arXiv:2406.17841 [pdf, other]

Probing many-body Bell correlation depth with superconducting qubits

Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang , et al. (10 additional authors not shown)

Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief in local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing to machine learning. Nevertheless, the detection of nonlocality, especially in quantum many-body systems, is notoriously challenging. Here, we report an experimental certification of genuine multipartite Bell correlations, which signal nonlocality in quantum many-body systems, up to 24 qubits with a fully programmable superconducting quantum processor. In particular, we employ energy as a Bell correlation witness and variationally decrease the energy of a many-body system across a hierarchy of thresholds, below which an increasing Bell correlation depth can be certified from experimental data. As an illustrative example, we variationally prepare the low-energy state of a two-dimensional honeycomb model with 73 qubits and certify its Bell correlations by measuring an energy that surpasses the corresponding classical bound by up to 48 standard deviations. In addition, we variationally prepare a sequence of low-energy states and certify their genuine multipartite Bell correlations up to 24 qubits via energies measured efficiently by parity oscillation and multiple quantum coherence techniques.
Our results establish a viable approach for preparing and certifying multipartite Bell correlations, which provide not only a finer benchmark beyond entanglement for quantum devices, but also a valuable guide towards exploiting multipartite Bell correlation in a wide spectrum of practical applications.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 11 pages, 6 figures + 14 pages, 6 figures

arXiv:2406.16722 [pdf, other]

Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba

Authors: Yuchen Zou, Yineng Chen, Zuchao Li, Lefei Zhang, Hai Zhao

Abstract: Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. Nevertheless, the recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper orchestrates a comprehensive discussion, diving into essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation on the principles of structured state space models; (ii) the proposed improvements and the integration of Mamba with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of Transformers and Mamba to compensate for each other's shortcomings. We have also made efforts to interpret Mamba and Transformer in the framework of kernel functions, allowing for a comparison of their mathematical nature within a unified context. Our paper encompasses the vast majority of improvements related to Mamba to date.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16487 [pdf, other]

Decomposing God Header File via Multi-View Graph Clustering

Authors: Yue Wang, Wenhui Chang, Tongwei Deng, Yanzhen Zou, Bing Xie

Abstract: God Header Files, just like God Classes, pose significant challenges for code comprehension and maintenance. Additionally, they increase the time required for code recompilation. However, existing refactoring methods for God Classes are inappropriate to deal with God Header Files because the code elements in header files are mostly short declaration types, and build dependencies of the entire system should be considered with the aim of improving compilation efficiency. Meanwhile, ensuring acyclic dependencies among the decomposed sub-header files is also crucial in the God Header File decomposition. This paper proposes a multi-view graph clustering based approach for decomposing God Header Files. It first constructs and coarsens the code element graph, then a novel multi-view graph clustering algorithm is applied to identify the clusters and a heuristic algorithm is introduced to address the cyclic dependencies in the clustering results. To evaluate our approach, we built both a synthetic dataset and a real-world God Header Files dataset. The results show that 1) Our approach could achieve 11.5% higher accuracy than existing God Class refactoring methods; 2) Our decomposition results attain better architecture on real-world God Header Files, evidenced by higher modularity and acyclic dependencies; 3) We can reduce 15% to 60% recompilation time for historical commits that require recompiling.

Submitted 19 September, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted by ICSME 2024

arXiv:2406.15339 [pdf, other]

Image Conductor: Precision Control for Interactive Video Synthesis

Authors: Yaowei Li, Xintao Wang, Zhaoyang Zhang, Zhouxia Wang, Ziyang Yuan, Liangbin Xie, Yuexian Zou, Ying Shan

Abstract: Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. A well-cultivated training strategy is proposed to separate distinct camera and object motion by camera LoRA weights and object LoRA weights. To further address cinematographic variations from ill-posed trajectories, we introduce a camera-free guidance technique during inference, enhancing object movements while eliminating camera transitions. Additionally, we develop a trajectory-oriented video motion data curation pipeline for training. Quantitative and qualitative experiments demonstrate our method's precision and fine-grained control in generating motion-controllable videos from images, advancing the practical application of interactive video synthesis. Project webpage available at https://liyaowei-stu.github.io/project/ImageConductor/

Submitted 21 June, 2024; originally announced June 2024.

Comments: Project webpage available at https://liyaowei-stu.github.io/project/ImageConductor/

arXiv:2406.14232 [pdf, other]

Enhancing robustness of data-driven SHM models: adversarial training with circle loss

Authors: Xiangli Yang, Xijie Deng, Hanwei Zhang, Yang Zou, Jianxi Yang

Abstract: Structural health monitoring (SHM) is critical to safeguarding the safety and reliability of aerospace, civil, and mechanical infrastructure. Machine learning-based data-driven approaches have gained popularity in SHM due to advancements in sensors and computational power. However, machine learning models used in SHM are vulnerable to adversarial examples -- even small changes in input can lead to different model outputs. This paper aims to address this problem by discussing adversarial defenses in SHM. In this paper, we propose an adversarial training method for defense, which uses circle loss to optimize the distance between features in training to keep examples away from the decision boundary. Through this simple yet effective constraint, our method demonstrates substantial improvements in model robustness, surpassing existing defense mechanisms.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 12 pages, 9 figures

arXiv:2406.13626 [pdf, other]

Fine-Tuning Gemma-7B for Enhanced Sentiment Analysis of Financial News Headlines

Authors: Kangtong Mo, Wenyan Liu, Xuanzhen Xu, Chang Yu, Yuelin Zou, Fangqing Xia

Abstract: In this study, we explore the application of sentiment analysis on financial news headlines to understand investor sentiment. By leveraging Natural Language Processing (NLP) and Large Language Models (LLM), we analyze sentiment from the perspective of retail investors. The FinancialPhraseBank dataset, which contains categorized sentiments of financial news headlines, serves as the basis for our analysis. We fine-tuned several models, including distilbert-base-uncased, Llama, and gemma-7b, to evaluate their effectiveness in sentiment classification. Our experiments demonstrate that the fine-tuned gemma-7b model outperforms others, achieving the highest precision, recall, and F1 score. Specifically, the gemma-7b model showed significant improvements in accuracy after fine-tuning, indicating its robustness in capturing the nuances of financial sentiment. This model can be instrumental in providing market insights, risk management, and aiding investment decisions by accurately predicting the sentiment of financial news. The results highlight the potential of advanced LLMs in transforming how we analyze and interpret financial information, offering a powerful tool for stakeholders in the financial industry.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13450 [pdf, other]

Federating to Grow Transformers with Constrained Resources without Model Sharing

Authors: Shikun Shen, Yifei Zou, Yuan Yuan, Yanwei Zheng, Peng Li, Xiuzhen Cheng, Dongxiao Yu

Abstract: The high resource consumption of large-scale models discourages resource-constrained users from developing their customized transformers. To this end, this paper considers a federated framework named Fed-Grow for multiple participants to cooperatively scale a transformer from their pre-trained small models. Under Fed-Grow, a Dual-LiGO (Dual Linear Growth Operator) architecture is designed to help participants expand their pre-trained small models to a transformer. In Dual-LiGO, the Local-LiGO part is used to address the heterogeneity problem caused by the various pre-trained models, and the Global-LiGO part is shared to exchange the implicit knowledge from the pre-trained models, local data, and training process of participants. Sharing only the Global-LiGO, instead of the models themselves, strengthens the privacy of our approach. Compared with several state-of-the-art methods in simulation, our approach has higher accuracy, better precision, and lower resource consumption on computations and communications. To the best of our knowledge, most of the previous model-scaling works are centralized, and our work is the first one that cooperatively grows a transformer from multiple pre-trained heterogeneous models with the user privacy protected in terms of local data and models. We hope that our approach can extend transformers to broadly distributed scenarios and encourage more resource-constrained users to enjoy the benefits brought by large-scale transformers.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13351 [pdf, other]

A Resource-Adaptive Approach for Federated Learning under Resource-Constrained Environments

Authors: Ruirui Zhang, Xingze Wu, Yifei Zou, Zhenzhen Xie, Peng Li, Xiuzhen Cheng, Dongxiao Yu

Abstract: The paper studies a fundamental federated learning (FL) problem involving multiple clients with heterogeneous constrained resources. Compared with the numerous training parameters, the computing and communication resources of clients are insufficient for fast local training and real-time knowledge sharing. Besides, training on clients with heterogeneous resources may result in the straggler problem. To address these issues, we propose Fed-RAA: a Resource-Adaptive Asynchronous Federated learning algorithm. Different from vanilla FL methods, where all parameters are trained by each participating client regardless of resource diversity, Fed-RAA adaptively allocates fragments of the global model to clients based on their computing and communication capabilities. Each client then individually trains its assigned model fragment and asynchronously uploads the updated result. Theoretical analysis confirms the convergence of our approach. Additionally, we design an online greedy-based algorithm for fragment allocation in Fed-RAA, achieving fairness comparable to an offline strategy. We present numerical results on MNIST, CIFAR-10, and CIFAR-100, along with necessary comparisons and ablation studies, demonstrating the advantages of our work. To the best of our knowledge, this paper represents the first resource-adaptive asynchronous method for fragment-based FL with guaranteed theoretical convergence.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.10744 [pdf, other]

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held at the CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.

Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

arXiv:2406.10534 [pdf, other]

A Finite Difference Informed Graph Network for Solving Steady-State Incompressible Flows on Block-Structured Grids

Authors: Yiye Zou, Tianyu Li, Shufan Zou, Jingyu Wang, Laiping Zhang, Xiaogang Deng

Abstract: Recently, advancements in deep learning have enabled physics-informed neural networks (PINNs) to solve partial differential equations (PDEs). Numerical differentiation (ND) using the finite difference (FD) method is efficient in physics-constrained designs, even in parameterized settings, often employing body-fitted block-structured grids for complex flow cases. However, convolution operators in CNNs for finite differences are typically limited to single-block grids. To address this, we use graphs and graph networks (GNs) to learn flow representations across multi-block structured grids. We propose a graph convolution-based finite difference method (GC-FDM) to train GNs in a physics-constrained manner, enabling differentiable finite difference operations on graph unstructured outputs. Our goal is to solve parametric steady incompressible Navier-Stokes equations for flows around a backward-facing step, a circular cylinder, and double cylinders, using multi-block structured grids. Comparing our method to a CFD solver under various boundary conditions, we demonstrate improved training efficiency and accuracy, achieving a minimum relative error of $10^{-3}$ in velocity field prediction and a 20% reduction in training cost compared to PINNs.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10248 [pdf, other]

On the Worst Prompt Performance of Large Language Models

Authors: Bowen Cao, Deng Cai, Zhisong Zhang, Yuexian Zou, Wai Lam

Abstract: The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts, which raises significant concerns about their reliability in real-world scenarios. Existing studies often divide prompts into task-level instructions and case-level inputs and primarily focus on evaluating and improving robustness against variations in task-level instructions. However, this setup fails to fully address the diversity of real-world user queries and assumes the existence of task-specific datasets. To address these limitations, we introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries and emphasizes the importance of using the worst prompt performance to gauge the lower bound of model performance. Extensive experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance; for instance, a difference of 45.48% between the worst and best performance for the Llama-2-70B-chat model, with its worst performance dipping as low as 9.38%. We further illustrate the difficulty in identifying the worst prompt from both model-agnostic and model-dependent perspectives, emphasizing the absence of a shortcut to characterize the worst prompt. We also attempt to enhance the worst prompt performance using existing prompt engineering and prompt consistency methods, but find that their impact is limited.
These findings underscore the need to create more resilient LLMs that can maintain high performance across diverse prompts. Data and code are available at https://github.com/cbwbuaa/On-the-Worst-Prompt-Performance-of-LLMs.

Submitted 21 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.10239 [pdf]

Predict Click-Through Rates with Deep Interest Network Model in E-commerce Advertising

Authors: Chang Zhou, Yang Zhao, Yuelin Zou, Jin Cao, Wenhan Fan, Yi Zhao, Chiyu Cheng

Abstract: This paper proposes new methods to enhance click-through rate (CTR) prediction models using the Deep Interest Network (DIN) model, specifically applied to the advertising system of Alibaba's Taobao platform. Unlike traditional deep learning approaches, this research focuses on localized user behavior activation for tailored ad targeting by leveraging extensive user behavior data. Compared to traditional models, this method demonstrates superior ability to handle diverse and dynamic user data, thereby improving the efficiency of ad systems and increasing revenue.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by the 5th International Conference on Information Science, Parallel and Distributed Systems (ISPDS 2024), 2024 IEEE

arXiv:2406.09683 [pdf, other]

Interstellar Nitrogen Isotope Ratios: Measurements on tracers of C$^{14}$N and C$^{15}$N

Authors: J. L. Chen, J. S. Zhang, C. Henkel, Y. T. Yan, H. Z. Yu, Y. X. Wang, Y. P. Zou, J. Y. Zhao, X. Y. Wang

Abstract: The nitrogen isotope ratio 14N/15N is a powerful tool to trace Galactic stellar nucleosynthesis and constrain Galactic chemical evolution. Previous observations have found lower 14N/15N ratios in the Galactic center and higher values in the Galactic disk. This is consistent with the inside-out formation scenario of our Milky Way. However, previous studies mostly utilized double isotope ratios also including 12C/13C, which introduces additional uncertainties. Here we therefore present observations of C14N and its rare isotopologue, C15N, toward a sample of star forming regions, measured by the IRAM 30 m and/or the ARO 12 m telescope at $λ$ ~3 mm wavelength. For those 35 sources detected in both isotopologues, physical parameters are determined. Furthermore, we have obtained nitrogen isotope ratios using the strongest hyperfine components of CN and C15N. For those sources showing small deviations from Local Thermodynamical Equilibrium and/or self-absorption, the weakest hyperfine component, likely free of the latter effect, was used to obtain reliable 14N/15N values. Our measured 14N/15N isotope ratios from C14N and C15N measurements are compatible with those from our earlier measurements of NH3 and 15NH3 (Paper I), i.e., increasing ratios to a Galactocentric distance of ~9 kpc. The unweighted second-order polynomial fit yields $\frac{{\rm C^{14}N}}{{\rm C^{15}N}} = (-4.85 \pm 1.89)\;{\rm kpc^{-2}} \times R_{\rm GC}^{2} + (82.11 \pm 31.93) \;{\rm kpc^{-1}} \times R_{\rm GC} - (28.12 \pm 126.62)$. Toward the outer Galaxy, the isotope ratio tends to decrease, supporting an earlier finding from H13CN/HC15N. Galactic chemical evolution models are consistent with our measurements of the 14N/15N isotope ratio, i.e., a rising trend from the Galactic center region to approximately 9 kpc, followed by a decreasing trend with increasing $R_{\rm GC}$ toward the outer Galaxy.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 34 pages, 9 figures, 6 tables

Journal ref: The Astrophysical Journal (2004)
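
The second-order fit quoted in the abstract above can be evaluated directly. The following is a minimal sketch that uses only the central values of the three coefficients and ignores their quoted uncertainties; the function names are hypothetical and not taken from the paper.

```python
# Central values of the fit quoted in the abstract:
# C14N/C15N = A * R_GC^2 + B * R_GC + C, with R_GC in kpc.
A = -4.85   # kpc^-2
B = 82.11   # kpc^-1
C = -28.12  # dimensionless

def cn_isotope_ratio(r_gc_kpc: float) -> float:
    """Fitted C14N/C15N ratio at Galactocentric distance r_gc_kpc (central values only)."""
    return A * r_gc_kpc ** 2 + B * r_gc_kpc + C

def peak_radius_kpc() -> float:
    """Radius where the fitted parabola peaks, from d(ratio)/dR = 2*A*R + B = 0."""
    return -B / (2.0 * A)

if __name__ == "__main__":
    print(f"ratio at 8 kpc: {cn_isotope_ratio(8.0):.2f}")
    print(f"fit peaks near R_GC = {peak_radius_kpc():.2f} kpc")
```

With these central values the parabola peaks between 8 and 9 kpc, in line with the abstract's description of ratios rising out to roughly 9 kpc and then declining. The large uncertainties on the coefficients mean the peak position is only loosely constrained.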

arXiv:2406.09555 [pdf, other]

Approximate quantum error correcting codes from conformal field theory

Authors: Shengqi Sang, Timothy H. Hsieh, Yijian Zou

Abstract: The low-energy subspace of a conformal field theory (CFT) can serve as a quantum error correcting code, with important consequences in holography and quantum gravity. We consider generic 1+1D CFT codes under extensive local dephasing channels and analyze their error correctability in the thermodynamic limit. We show that (i) there is a finite decoding threshold if and only if the minimal nonzero scaling dimension in the fusion algebra generated by the jump operator of the channel is larger than $1/2$ and (ii) the number of protected logical qubits $k \geq Ω( \log \log n)$, where $n$ is the number of physical qubits. As an application, we show that the one-dimensional quantum critical Ising model has a finite threshold for certain types of dephasing noise. Our general results also imply that a CFT code with continuous symmetry saturates a bound on the recovery fidelity for covariant codes.

Submitted 7 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 5+12 pages, 7 figures

arXiv:2406.08431 [pdf, other]

Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

Authors: Benjamin Biggs, Arjun Seshadri, Yang Zou, Achin Jain, Aditya Golatkar, Yusheng Xie, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto

Abstract: We present Diffusion Soup, a compartmentalization method for Text-to-Image Generation that averages the weights of diffusion models trained on sharded data. By construction, our approach enables training-free continual learning and unlearning with no additional memory or inference costs, since models corresponding to data shards can be added or removed by re-averaging. We show that Diffusion Soup samples from a point in weight space that approximates the geometric mean of the distributions of constituent datasets, which offers anti-memorization guarantees and enables zero-shot style mixing. Empirically, Diffusion Soup outperforms a paragon model trained on the union of all data shards and achieves a 30% improvement in Image Reward (.34 $\to$ .44) on domain sharded data, and a 59% improvement in IR (.37 $\to$ .59) on aesthetic data. In both cases, souping also prevails in TIFA score (respectively, 85.5 $\to$ 86.5 and 85.6 $\to$ 86.8). We demonstrate robust unlearning -- removing any individual domain shard only lowers performance by 1% in IR (.45 $\to$ .44) -- and validate our theoretical insights on anti-memorization using real data. Finally, we showcase Diffusion Soup's ability to blend the distinct styles of models finetuned on different shards, resulting in the zero-shot generation of hybrid styles.

Submitted 12 June, 2024; originally announced June 2024.
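
The weight averaging described in the abstract above can be illustrated with a toy sketch. It assumes a uniform average over per-shard models whose weights are plain lists of floats keyed by parameter name; the `soup` function and the `Weights` type are hypothetical, and the paper's actual averaging scheme and model format may differ.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]

def soup(models: List[Weights]) -> Weights:
    """Uniformly average the weights of models that share the same parameters."""
    if not models:
        raise ValueError("need at least one model to average")
    n = len(models)
    averaged: Weights = {}
    for name, reference in models[0].items():
        averaged[name] = [
            sum(model[name][i] for model in models) / n
            for i in range(len(reference))
        ]
    return averaged

if __name__ == "__main__":
    shard_a = {"w": [1.0, 3.0]}
    shard_b = {"w": [3.0, 5.0]}
    print(soup([shard_a, shard_b]))  # average over both shards
    print(soup([shard_a]))           # "unlearning" shard_b by re-averaging without it
```

Under this assumption, adding or removing a shard needs no retraining: the merged weights are recomputed from the per-shard models that remain.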

arXiv:2406.05685 [pdf, other]

Understanding Open Source Contributor Profiles in Popular Machine Learning Libraries

Authors: Jiawen Liu, Haoxiang Zhang, Ying Zou

Abstract: With the increasing popularity of machine learning (ML), many open-source software (OSS) contributors are attracted to developing and adopting ML approaches. A comprehensive understanding of ML contributors is crucial for successful ML OSS development and maintenance. Without such knowledge, there is a risk of inefficient resource allocation and hindered collaboration in ML OSS projects. Existing research focuses on understanding the difficulties and challenges perceived by ML contributors through user surveys. There is a lack of understanding of ML contributors based on their activities tracked from software repositories. In this paper, we aim to understand ML contributors by identifying contributor profiles in ML libraries. We further study contributors' OSS engagement from three aspects: workload composition, work preferences, and technical importance. By investigating 7,640 contributors from 6 popular ML libraries (TensorFlow, PyTorch, Keras, MXNet, Theano, and ONNX), we identify four contributor profiles: Core-Afterhour, Core-Workhour, Peripheral-Afterhour, and Peripheral-Workhour. We find that: 1) project experience, authored files, collaborations, and geographical location are significant features of all profiles; 2) contributors in Core profiles exhibit significantly different OSS engagement compared to Peripheral profiles; 3) contributors' work preferences and workload compositions significantly impact project popularity; 4) long-term contributors evolve towards making fewer, constant, balanced and less technical contributions.

Submitted 9 June, 2024; originally announced June 2024.

Showing 1–50 of 837 results for author: Zou, Y