Search | arXiv e-print repository

arXiv:2409.09806 [pdf]

Room-temperature valley-selective emission in Si-MoSe2 heterostructures enabled by high-quality-factor chiroptical cavities

Authors: Feng Pan, Xin Li, Amalya C. Johnson, Scott Dhuey, Ashley Saunders, Meng-Xia Hu, Jefferson P. Dixon, Sahil Dagli, Sze-Cheung Lau, Tingting Weng, Chih-Yi Chen, Jun-Hao Zeng, Rajas Apte, Tony F. Heinz, Fang Liu, Zi-Lan Deng, Jennifer A. Dionne

Abstract: Transition metal dichalcogenides (TMDCs) possess valley pseudospin, allowing photon spin to be coupled to electron spin and enabling initialization and readout of both classical and quantum information. Rapid valley-dephasing processes have impeded the development of scalable, high-performance valleytronic devices operating at room temperature. Here we demonstrate that a chiral resonant metasurface can enable room-temperature valley-selective emission, even with linearly polarized excitation. This platform provides circular eigen-polarization states with a high quality factor (Q-factor) and strong chiral near-field enhancement, resulting in unitary emission circular dichroism (i.e. single-handed circularly polarized emission). Our fabricated Si chiral metasurfaces exhibit chiral electromagnetic modes with Q-factors up to 450 at visible wavelengths, spectrally tuned to the exciton energy of MoSe2 monolayers. Using spatially- and spectrally-resolved mapping from temperatures of 100 K to 294 K, we demonstrate degrees of circular polarization (DOP) as high as 0.5 at room temperature. Reciprocal space mapping of the exciton emission reveals the chiral q-BIC localizes valley-selective emission in the vicinity of the photonic gamma-point. Photon-spin and time-resolved photoluminescence measurements show that the high DOP can be attributed to the significantly increased chiroptical local density of states provided by the metasurface, which enhances valley-specific radiative transition rates by a factor of approximately 13, with lifetimes as short as 189 ps. Our work could facilitate the development of compact chiral classical and quantum light sources and the creation of molecular chiral polaritons for quantum enantioselective synthesis.

Submitted 20 September, 2024; v1 submitted 15 September, 2024; originally announced September 2024.

arXiv:2409.05847 [pdf, other]

LSVOS Challenge Report: Large-scale Complex and Long Video Object Segmentation

Authors: Henghui Ding, Lingyi Hong, Chang Liu, Ning Xu, Linjie Yang, Yuchen Fan, Deshui Miao, Yameng Gu, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Jinming Chai, Qin Ma, Junpei Zhang, Licheng Jiao, Fang Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Xu Liu, LingLing Li, Hao Fang, Feiyu Pan, Xiankai Lu, et al. (8 additional authors not shown)

Abstract: Despite the promising performance of current video segmentation models on existing benchmarks, these models still struggle with complex scenes. In this paper, we introduce the 6th Large-scale Video Object Segmentation (LSVOS) challenge, held in conjunction with the ECCV 2024 workshop. This year's challenge includes two tasks: Video Object Segmentation (VOS) and Referring Video Object Segmentation (RVOS). This year, we replace the classic YouTube-VOS and YouTube-RVOS benchmarks with the latest datasets, MOSE, LVOS, and MeViS, to assess VOS in more challenging, complex environments. This year's challenge attracted 129 registered teams from more than 20 institutes across over 8 countries. This report includes the challenge and dataset introduction, and the methods used by the top 7 teams in the two tracks. More details can be found on our homepage https://lsvos.github.io/.

Submitted 9 September, 2024; originally announced September 2024.

Comments: ECCV 2024 LSVOS Challenge Report: https://lsvos.github.io/

arXiv:2409.04800 [pdf, ps, other]

doi 10.1021/jacs.4c04910

FePd2Te2: An Anisotropic Two-Dimensional Ferromagnet with One-Dimensional Fe Chains

Authors: Bingxian Shi, Yanyan Geng, Hengning Wang, Jianhui Yang, Chenglin Shang, Manyu Wang, Shuo Mi, Jiale Huang, Feihao Pan, Xuejuan Gui, Jinchen Wang, Juanjuan Liu, Daye Xu, Hongxia Zhang, Jianfei Qin, Hongliang Wang, Lijie Hao, Mingliang Tian, Zhihai Cheng, Guolin Zheng, Peng Cheng

Abstract: Two-dimensional (2D) magnets have attracted significant attention in recent years due to their importance in research on both fundamental physics and spintronic applications. Here, we report the discovery of a new ternary compound FePd2Te2. It features a layered quasi-2D crystal structure with one-dimensional Fe zigzag chains extending along the b-axis in the cleavage plane. Centimeter-size single crystals of FePd2Te2 can be grown. Density functional theory calculations, mechanical exfoliation and atomic force microscopy on these crystals reveal that they are 2D materials that can be thinned down to 5 nm. Magnetic characterization shows that FePd2Te2 is an easy-plane ferromagnet with Tc 183 K and strong in-plane uniaxial magnetic anisotropy. Magnetoresistance and anomalous Hall effect measurements demonstrate that ferromagnetism can be maintained in FePd2Te2 flakes with large coercivity. A crystal twinning effect is observed by scanning tunneling microscopy, which makes the Fe chains bend at right angles in the cleavage plane and creates an intriguing spin texture. Our results show that FePd2Te2 is a correlated anisotropic 2D magnet that may attract multidisciplinary research interest.

Submitted 7 September, 2024; originally announced September 2024.

Journal ref: J.Am.Chem.Soc.2024,146,21546-21554

arXiv:2408.10129 [pdf, other]

UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track

Authors: Hao Fang, Feiyu Pan, Xiankai Lu, Wei Zhang, Runmin Cong

Abstract: Referring video object segmentation (RVOS) relies on natural language expressions to segment target objects in video. This year, the LSVOS Challenge RVOS Track replaced the original YouTube-RVOS benchmark with MeViS. MeViS focuses on referring to the target object in a video through its motion descriptions instead of static attributes, posing a greater challenge to the RVOS task. In this work, we integrate the strengths of leading RVOS and VOS models to build a simple and effective pipeline for RVOS. Firstly, we fine-tune the state-of-the-art RVOS model to obtain mask sequences that are correlated with language descriptions. Secondly, based on reliable and high-quality key frames, we leverage a VOS model to enhance the quality and temporal consistency of the mask results. Finally, we further improve the performance of the RVOS model using semi-supervised learning. Our solution achieved 62.57 J&F on the MeViS test set and ranked 1st in the 6th LSVOS Challenge RVOS Track.

Submitted 24 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.10125 [pdf, other]

Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track

Authors: Feiyu Pan, Hao Fang, Runmin Cong, Wei Zhang, Xiankai Lu

Abstract: The Video Object Segmentation (VOS) task aims to segment a particular object instance throughout the entire video sequence given only the object mask of the first frame. Recently, Segment Anything Model 2 (SAM 2) was proposed, a foundation model towards solving promptable visual segmentation in images and videos. SAM 2 builds a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. SAM 2 is a simple transformer architecture with streaming memory for real-time video processing, which, trained on the data, provides strong performance across a wide range of tasks. In this work, we evaluate the zero-shot performance of SAM 2 on the more challenging VOS datasets MOSE and LVOS. Without fine-tuning on the training set, SAM 2 achieved 75.79 J&F on the test set and ranked 4th in the 6th LSVOS Challenge VOS Track.

Submitted 24 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2408.00714

arXiv:2408.02127 [pdf, other]

Automatic Platform Configuration and Software Integration for Software-Defined Vehicles

Authors: Fengjunjie Pan, Jianjie Lin, Markus Rickert

Abstract: In the automotive industry, platform configuration and software integration are mostly manual tasks performed during the development phase, requiring consideration of various safety and non-safety requirements. This manual process often leads to prolonged development cycles and provides limited flexibility. This paper introduces a novel approach to automate platform configuration and software integration for software-defined vehicles (SDVs), shifting these activities from the development phase to runtime. Our approach features an integration manager that combines model-based methods and virtualization technologies to generate and execute deployment plans. By leveraging model-based systems engineering (MBSE), our method automatically generates platform configuration and software integration plans, which are then converted into deployment-ready formats using code generation techniques. Utilizing virtualization and container orchestration technologies, the proposed system enables dynamic and flexible resource allocation while ensuring compliance with safety requirements. Communication between the development and runtime platforms is facilitated via a REST API. A proof of concept was implemented on a simulated SDV platform with the Intel Whiskey Lake Board. This demonstration showcases the integration manager on an SDV with a central computer, highlighting the potential to shorten development cycles and adapt to diverse vehicle configurations.

Submitted 4 August, 2024; originally announced August 2024.

Comments: 7 pages, 6 figures, preprint

arXiv:2407.09665 [pdf, other]

Coupling and Recoupling Coefficients for Wigner's U(4) Supermultiplet Symmetry

Authors: Phong Dang, Jerry P. Draayer, Feng Pan, Tomas Dytrych, Daniel Langr, David Kekejian, Kevin S. Becker, and Noah Thompson

Abstract: A novel procedure for evaluating Wigner coupling coefficients and Racah recoupling coefficients for U(4) in two group-subgroup chains is presented. The canonical U(4)->U(3)->U(2)->U(1) coupling and recoupling coefficients are applicable to any system that possesses U(4) symmetry, while the physical U(4)->SU_S(2)xSU_T(2) coupling coefficients are more specific to nuclear structure studies that utilize Wigner's Supermultiplet Symmetry concept. The procedure that is proposed sidesteps the use of binomial coefficients and alternating sum series, and consequently enables fast and accurate computation of any and all U(4)-underpinned features. The inner multiplicity of a (S,T) pair within a single U(4) irreducible representation is obtained from the dimension of the null space of the SU(2) raising generators; while the resolution for the outer multiplicity follows from the work of Alex et al. on U(N). It is anticipated that a C++ library will ultimately be available for determining generic coupling and recoupling coefficients associated with both the \textit{canonical} and the \textit{physical} group-subgroup chains of U(4).

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.02386 [pdf, other]

OpenSlot: Mixed Open-set Recognition with Object-centric Learning

Authors: Xu Yin, Fei Pan, Guoyuan An, Yuchi Huo, Zixuan Xie, Sung-Eui Yoon

Abstract: Existing open-set recognition (OSR) studies typically assume that each image contains only one class label, and the unknown test set (negative) has a disjoint label space from the known test set (positive), a scenario termed full-label shift. This paper introduces the mixed OSR problem, where test images contain multiple class semantics, with known and unknown classes co-occurring in negatives, leading to a more challenging super-label shift. Addressing the mixed OSR requires classification models to accurately distinguish different class semantics within images and measure their "knowness". In this study, we propose the OpenSlot framework, built upon object-centric learning. OpenSlot utilizes slot features to represent diverse class semantics and produce class predictions. Through our proposed anti-noise-slot (ANS) technique, we mitigate the impact of noise (invalid and background) slots during classification training, effectively addressing the semantic misalignment between class predictions and the ground truth. We conduct extensive experiments with OpenSlot on mixed & conventional OSR benchmarks. Without elaborate designs, OpenSlot not only exceeds existing OSR studies in detecting super-label shifts across single & multi-label mixed OSR tasks but also achieves state-of-the-art performance on conventional benchmarks. Remarkably, our method can localize class objects without using bounding boxes during training. The competitive performance in open-set object detection demonstrates OpenSlot's ability to explicitly explain label shifts and benefits in computational efficiency and generalization.

Submitted 2 July, 2024; originally announced July 2024.

Comments: This study is under IEEE TMM review

arXiv:2407.00769 [pdf, other]

Achieving Energetic Superiority Through System-Level Quantum Circuit Simulation

Authors: Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-Yang Lu, Jian-Wei Pan, Zhiling Pei, Xingcheng Zhang, Wanli Ouyang

Abstract: Quantum Computational Superiority boasts rapid computation and high energy efficiency. Despite recent advances in classical algorithms aimed at refuting the milestone claim of Google's Sycamore, challenges remain in generating uncorrelated samples of random quantum circuits. In this paper, we present a groundbreaking large-scale system technology that leverages optimization on global, node, and device levels to achieve unprecedented scalability for tensor networks. This enables the handling of large-scale tensor networks with memory capacities reaching tens of terabytes, surpassing memory space constraints on a single node. Our techniques enable accommodating large-scale tensor networks with up to tens of terabytes of memory, reaching up to 2304 GPUs with a peak computing power of 561 PFLOPS half-precision. Notably, we have achieved a time-to-solution of 14.22 seconds with energy consumption of 2.39 kWh at a fidelity of 0.002. Our most remarkable result is a time-to-solution of 17.18 seconds with energy consumption of only 0.29 kWh, which achieved an XEB of 0.002 after post-processing, outperforming Google's quantum processor Sycamore, which recorded 600 seconds and 4.3 kWh, in both speed and energy efficiency.

Submitted 30 June, 2024; originally announced July 2024.
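
Note on the figures above: the comparison in this abstract can be checked with simple arithmetic. The sketch below uses only the numbers quoted in the abstract (time-to-solution in seconds, energy in kWh). The dictionary keys and the `ratios` helper are illustrative names, not anything from the paper.

```python
# Figures quoted in the abstract: time-to-solution (s) and energy (kWh).
SYCAMORE = {"time_s": 600.0, "energy_kwh": 4.3}
RUNS = {
    "fidelity_run": {"time_s": 14.22, "energy_kwh": 2.39},
    "xeb_postprocessed_run": {"time_s": 17.18, "energy_kwh": 0.29},
}


def ratios(run, baseline=SYCAMORE):
    """Return (speed-up, energy saving) of `run` relative to `baseline`.

    Both values are plain ratios baseline/run, so a value above 1 means
    the classical run was faster or used less energy than the baseline.
    """
    speedup = baseline["time_s"] / run["time_s"]
    energy_saving = baseline["energy_kwh"] / run["energy_kwh"]
    return speedup, energy_saving


if __name__ == "__main__":
    for name, run in RUNS.items():
        speedup, saving = ratios(run)
        print(f"{name}: {speedup:.1f}x faster, {saving:.1f}x less energy")
```

Under these quoted figures, the post-processed run comes out roughly 35 times faster and uses roughly 15 times less energy than the Sycamore values given in the abstract.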

arXiv:2406.18889 [pdf, ps, other]

Leapfrogging Sycamore: Harnessing 1432 GPUs for 7$\times$ Faster Quantum Random Circuit Sampling

Authors: Xian-He Zhao, Han-Sen Zhong, Feng Pan, Zi-Han Chen, Rong Fu, Zhongling Su, Xiaotong Xie, Chaoxing Zhao, Pan Zhang, Wanli Ouyang, Chao-Yang Lu, Jian-Wei Pan, Ming-Cheng Chen

Abstract: Random quantum circuit sampling serves as a benchmark to demonstrate quantum computational advantage. Recent progress in classical algorithms, especially those based on tensor network methods, has significantly reduced the classical simulation time and challenged the claim of the first-generation quantum advantage experiments. However, in terms of generating uncorrelated samples, time-to-solution, and energy consumption, previous classical simulation experiments still underperform the \textit{Sycamore} processor. Here we report an energy-efficient classical simulation algorithm, using 1432 GPUs to simulate quantum random circuit sampling, which generates uncorrelated samples with a higher linear cross entropy score and is 7 times faster than the \textit{Sycamore} 53-qubit experiment. We propose a post-processing algorithm to reduce the overall complexity, and integrate state-of-the-art high-performance general-purpose GPUs to achieve two orders of magnitude lower energy consumption compared to previous works. Our work provides the first unambiguous experimental evidence to refute \textit{Sycamore}'s claim of quantum advantage, and redefines the boundary of quantum computational advantage using random circuit sampling.

Submitted 27 June, 2024; originally announced June 2024.

Comments: This work was completed in August 2023. A further 50x improvement has been achieved and will be posted on arXiv shortly

arXiv:2406.18013 [pdf, other]

Effects of model size in density-functional-theory study of alloys: A case study of CsPbBr$_2$Cl

Authors: Fang Pan, Lin Yang, Zhuangde Jiang, Wei Ren, Zuo-Guang Ye, Jingrui Li

Abstract: The primary challenge of density-functional-theory exploration of alloy systems concerns the size of the computational model. Small alloy models can hardly exhibit the chemical disorder properly, while large models induce difficulty in sampling the alignments within the massive material space. We study this problem with the γ phase of the mixed halide inorganic perovskite alloy CsPbBr$_2$Cl. The distribution of alloy formation energy becomes narrower when the size of the model system increases along $\sqrt{2}\times\sqrt{2}\times2$, $2\times2\times2$, and $2\sqrt{2}\times2\sqrt{2}\times2$ models. This is primarily because the distribution of Br distribution parameters, which plays a leading role in determining the formation energy range, is narrower for larger models. As a result, a larger entropy stability effect can be observed with larger models, especially at high temperatures, for which the approximation using mixing entropy based on the ideal solution model becomes better.

Submitted 25 June, 2024; originally announced June 2024.
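
Note on the ideal solution model mentioned above: as a textbook reference point, not a formula quoted from the paper, the ideal mixing entropy per halide site for a Br fraction $x$ can be written as below. Taking $x = 2/3$ from the CsPbBr$_2$Cl stoichiometry and assuming fully random occupation of the halide sites:

```latex
\[
  S_{\mathrm{mix}} = -k_{\mathrm{B}}\,\bigl[\,x \ln x + (1-x)\ln(1-x)\,\bigr],
  \qquad
  x = \tfrac{2}{3}
  \;\Rightarrow\;
  S_{\mathrm{mix}} \approx 0.637\,k_{\mathrm{B}} \text{ per halide site}
  \;(\approx 1.91\,k_{\mathrm{B}} \text{ per formula unit}).
\]
```

The corresponding free-energy contribution is $-T S_{\mathrm{mix}}$, which grows in magnitude with temperature, in line with the abstract's remark that the entropy effect matters most at high temperatures.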

arXiv:2406.17005 [pdf, other]

PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

Abstract: The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: the Complex Video Object Segmentation Track based on the MOSE dataset and the Motion Expression guided Video Segmentation track based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments in MOSE. Moreover, we provide a new motion expression guided video segmentation dataset, MeViS, to study natural language-guided video understanding in complex environments. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic scenarios. The MOSE challenge had 140 registered teams in total; 65 teams participated in the validation phase and 12 teams made valid submissions in the final challenge phase. The MeViS challenge had 225 registered teams in total; 50 teams participated in the validation phase and 5 teams made valid submissions in the final challenge phase.

Submitted 24 June, 2024; originally announced June 2024.

Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

arXiv:2406.15755 [pdf, other]

Fine-grained Background Representation for Weakly Supervised Semantic Segmentation

Authors: Xu Yin, Woobin Im, Dongbo Min, Yuchi Huo, Fei Pan, Sung-Eui Yoon

Abstract: Generating reliable pseudo masks from image-level labels is challenging in the weakly supervised semantic segmentation (WSSS) task due to the lack of spatial information. Prevalent class activation map (CAM)-based solutions are challenged to discriminate the foreground (FG) objects from the suspicious background (BG) pixels (a.k.a. co-occurring) and learn the integral object regions. This paper proposes a simple fine-grained background representation (FBR) method to discover and represent diverse BG semantics and address the co-occurring problems. We abandon using the class prototype or pixel-level features for BG representation. Instead, we develop a novel primitive, negative region of interest (NROI), to capture the fine-grained BG semantic information and conduct the pixel-to-NROI contrast to distinguish the confusing BG pixels. We also present an active sampling strategy to mine the FG negatives on-the-fly, enabling efficient pixel-to-pixel intra-foreground contrastive learning to activate the entire object region. Thanks to the simplicity of design and convenience in use, our proposed method can be seamlessly plugged into various models, yielding new state-of-the-art results under various WSSS settings across benchmarks. Leveraging solely image-level (I) labels as supervision, our method achieves 73.2 mIoU and 45.6 mIoU segmentation results on the Pascal VOC and MS COCO test sets, respectively. Furthermore, by incorporating saliency maps as an additional supervision signal (I+S), we attain 74.9 mIoU on the Pascal VOC test set. Concurrently, our FBR approach demonstrates meaningful performance gains in weakly-supervised instance segmentation (WSIS) tasks, showcasing its robustness and strong generalization capabilities across diverse domains.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.15000 [pdf, other]

Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

Authors: Lichao Zhang, Jia Yu, Shuai Zhang, Long Li, Yangyang Zhong, Guanbao Liang, Yuming Yan, Qing Ma, Fangsheng Weng, Fayu Pan, Jing Li, Renjun Xu, Zhenzhong Lan

Abstract: Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We conduct a comprehensive analysis using a diverse set of chatbots and real-user interaction data, employing metrics such as retention rate and conversation length to evaluate user engagement. Our findings reveal a significant enhancement in user engagement with multi-modal interactions compared to text-only dialogues. Notably, the incorporation of a third modality significantly amplifies engagement beyond the benefits observed with just two modalities. These results suggest that multi-modal interactions optimize cognitive processing and facilitate richer information comprehension. This study underscores the importance of multi-modality in chatbot design, offering valuable insights for creating more engaging and immersive AI communication experiences and informing the broader AI community about the benefits of multi-modal interactions in enhancing user engagement.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.06852 [pdf, other]

A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures

Authors: Shuai Zhao, Meihuizi Jia, Zhongliang Guo, Leilei Gan, Xiaoyu Xu, Xiaobao Wu, Jie Fu, Yichao Feng, Fengjun Pan, Luu Anh Tuan

Abstract: Large Language Models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks, particularly in few-shot and zero-shot settings. Despite the demonstrable efficacy of LLMs, due to constraints on computational resources, users have to engage with open-source language models or outsource the entire training process to third-party platforms. However, research has demonstrated that language models are susceptible to potential security vulnerabilities, particularly in backdoor attacks. Backdoor attacks are designed to introduce targeted vulnerabilities into language models by poisoning training samples or model weights, allowing attackers to manipulate model responses through malicious triggers. While existing surveys on backdoor attacks provide a comprehensive overview, they lack an in-depth examination of backdoor attacks specifically targeting LLMs. To bridge this gap and grasp the latest trends in the field, this paper presents a novel perspective on backdoor attacks for LLMs by focusing on fine-tuning methods. Specifically, we systematically classify backdoor attacks into three categories: full-parameter fine-tuning, parameter-efficient fine-tuning, and no fine-tuning. Based on insights from a substantial review, we also discuss crucial issues for future research on backdoor attacks, such as further exploring attack algorithms that do not require fine-tuning, or developing more covert attack algorithms.

Submitted 11 September, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.04842 [pdf, other]

3rd Place Solution for MeViS Track in CVPR 2024 PVUW workshop: Motion Expression guided Video Segmentation

Authors: Feiyu Pan, Hao Fang, Xiankai Lu

Abstract: Referring video object segmentation (RVOS) relies on natural language expressions to segment target objects in video, emphasizing modeling dense text-video relations. Current RVOS methods typically use independently pre-trained vision and language models as backbones, resulting in a significant domain gap between video and text. In cross-modal feature interaction, text features are only used as query initialization and do not fully utilize important information in the text. In this work, we propose using frozen pre-trained vision-language models (VLM) as backbones, with a specific emphasis on enhancing cross-modal feature interaction. Firstly, we use a frozen convolutional CLIP backbone to generate feature-aligned vision and text features, alleviating the issue of domain gap and reducing training costs. Secondly, we add more cross-modal feature fusion in the pipeline to enhance the utilization of multi-modal information. Furthermore, we propose a novel video query initialization method to generate higher quality video queries. Without bells and whistles, our method achieved 51.5 J&F on the MeViS test set and ranked 3rd in the MeViS Track of the CVPR 2024 PVUW workshop: Motion Expression guided Video Segmentation.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.06843 [pdf, other]

New Procedure for Evaluation of U(3) Coupling and Recoupling Coefficients

Authors: Phong Dang, Jerry P. Draayer, Feng Pan, Kevin S. Becker

Abstract: A simple method to calculate Wigner coupling coefficients and Racah recoupling coefficients for U(3) in two group-subgroup chains is presented. While the canonical U(3)->U(2)->U(1) coupling and recoupling coefficients are applicable to any system that respects U(3) symmetry, the U(3)->SO(3) coupling coefficients are more specific to nuclear structure studies. This new procedure precludes the use of binomial coefficients and alternating sums which were used in the 1973 formulation of Draayer and Akiyama, and hence provides faster and more accurate output of requested results. The resolution of the outer multiplicity is based on the null space concept of the U(3) generators proposed by Arne Alex et al., whereas the inner multiplicity in the angular momentum subgroup chain is obtained from the dimension of the null space of the SO(3) raising operator. A C++ library built on this new methodology will be published in a complementary journal that specializes in the management and distribution of such programs.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2404.16407 [pdf, other]

U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF

Authors: Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang

Abstract: Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy efficient path to even larger and more capable language models and this shift towards a new generation of foundation models is gaining momentum, particularly within the field of Automatic Speech Recognition (ASR). Recent works that incorporate MoE into ASR models have complex designs such as routing frames via supplementary embedding network, improving multilingual ability for the experts, and utilizing dedicated auxiliary losses for either expert load balancing or specific language handling. We found that delicate designs are not necessary, while an embarrassingly simple substitution of MoE layers for all Feed-Forward Network (FFN) layers is competent for the ASR task. To be more specific, we benchmark our proposed model on a large scale inner-source dataset (160k hours), the results show that we can scale our baseline Conformer (Dense-225M) to its MoE counterparts (MoE-1B) and achieve Dense-1B level Word Error Rate (WER) while maintaining a Dense-225M level Real Time Factor (RTF). Furthermore, by applying Unified 2-pass framework with bidirectional attention decoders (U2++), we achieve the streaming and non-streaming decoding modes in a single MoE based model, which we call U2++ MoE. We hope that our study can facilitate the research on scaling speech foundation models without sacrificing deployment efficiency.

Submitted 8 August, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

ACM Class: I.2.7

arXiv:2404.12683 [pdf, other]

A Containerized Microservice Architecture for a ROS 2 Autonomous Driving Software: An End-to-End Latency Evaluation

Authors: Tobias Betz, Long Wen, Fengjunjie Pan, Gemb Kaljavesi, Alexander Zuepke, Andrea Bastoni, Marco Caccamo, Alois Knoll, Johannes Betz

Abstract: The automotive industry is transitioning from traditional ECU-based systems to software-defined vehicles. A central role of this revolution is played by containers, lightweight virtualization technologies that enable the flexible consolidation of complex software applications on a common hardware platform. Despite their widespread adoption, the impact of containerization on fundamental real-time metrics such as end-to-end latency, communication jitter, as well as memory and CPU utilization has remained virtually unexplored. This paper presents a microservice architecture for a real-world autonomous driving application where containers isolate each service. Our comprehensive evaluation shows the benefits in terms of end-to-end latency of such a solution even over standard bare-Linux deployments. Specifically, in the case of the presented microservice architecture, the mean end-to-end latency can be improved by 5-8 %. Also, the maximum latencies were significantly reduced using container deployment.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.05508 [pdf, other]

Synergy of Large Language Model and Model Driven Engineering for Automated Development of Centralized Vehicular Systems

Authors: Nenad Petrovic, Fengjunjie Pan, Krzysztof Lebioda, Vahid Zolfaghari, Sven Kirchner, Nils Purschke, Muhammad Aqib Khan, Viktor Vorobev, Alois Knoll

Abstract: We present a prototype of a tool leveraging the synergy of model driven engineering (MDE) and Large Language Models (LLM) for the purpose of software development process automation in the automotive industry. In this approach, the user-provided input is free form textual requirements, which are first translated to Ecore model instance representation using an LLM, which is afterwards checked for consistency using Object Constraint Language (OCL) rules. After successful consistency check, the model instance is fed as input to another LLM for the purpose of code generation. The generated code is evaluated in a simulated environment using CARLA simulator connected to an example centralized vehicle architecture, in an emergency brake scenario.

Submitted 8 April, 2024; originally announced April 2024.

Report number: TUM-I24109

ACM Class: D.2.1; D.2.2; D.2.4; I.2.7; I.2.2; I.7.0

arXiv:2404.01144 [pdf]

doi 10.1021/acs.nanolett.4c00084

Electrical-controllable antiferromagnet-based tunnel junction

Authors: Lei Han, Xuming Luo, Yingqian Xu, Hua Bai, Wenxuan Zhu, Yuxiang Zhu, Guoqiang Yu, Cheng Song, Feng Pan

Abstract: Electrical-controllable antiferromagnet tunnel junction is a key goal in spintronics, holding immense promise for ultra-dense and ultra-stable antiferromagnetic memory with high processing speed for modern information technology. Here, we have advanced towards this goal by achieving an electrical-controllable antiferromagnet-based tunnel junction of Pt/Co/Pt/Co/IrMn/MgO/Pt. The exchange coupling between antiferromagnetic IrMn and Co/Pt perpendicular magnetic multilayers results in the formation of interfacial exchange bias and exchange spring in IrMn. Encoding information states 0 and 1 is realized through the exchange spring in IrMn, which can be electrically written by spin-orbit torque switching with high cyclability and electrically read by antiferromagnetic tunneling anisotropic magnetoresistance. Combining spin-orbit torque switching of both exchange spring and exchange bias, 16 Boolean logic operations are successfully demonstrated. With both memory and logic functionalities integrated into our electrical-controllable antiferromagnetic-based tunnel junction, we chart the course toward high-performance antiferromagnetic logic-in-memory.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 20 pages, 4 figures

arXiv:2404.00380 [pdf, other]

DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation

Authors: Sanghyun Jo, Fei Pan, In-Jae Yu, Kyungsu Kim

Abstract: Weakly-supervised semantic segmentation (WSS) ensures high-quality segmentation with limited data and excels when employed as input seed masks for large-scale vision models such as Segment Anything. However, WSS faces challenges related to minor classes since those are overlooked in images with adjacent multiple classes, a limitation originating from the overfitting of traditional expansion methods like Random Walk. We first address this by employing unsupervised and weakly-supervised feature maps instead of conventional methodologies, allowing for hierarchical mask enhancement. This method distinctly categorizes higher-level classes and subsequently separates their associated lower-level classes, ensuring all classes are correctly restored in the mask without losing minor ones. Our approach, validated through extensive experimentation, significantly improves WSS across five benchmarks (VOC: 79.8%, COCO: 53.9%, Context: 49.0%, ADE: 32.9%, Stuff: 37.4%), reducing the gap with fully supervised methods by over 84% on the VOC validation set. Code is available at https://github.com/shjo-april/DHR.

Submitted 19 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.18775 [pdf, other]

ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object

Authors: Chenshuang Zhang, Fei Pan, Junmo Kim, In So Kweon, Chengzhi Mao

Abstract: We establish rigorous benchmarks for visual perception robustness. Synthetic images such as ImageNet-C, ImageNet-9, and Stylized ImageNet provide specific type of evaluation over synthetic corruptions, backgrounds, and textures, yet those robustness benchmarks are restricted in specified variations and have low synthetic quality. In this work, we introduce generative model as a data source for synthesizing hard images that benchmark deep models' robustness. Leveraging diffusion models, we are able to generate images with more diversified backgrounds, textures, and materials than any prior work, where we term this benchmark as ImageNet-D. Experimental results show that ImageNet-D results in a significant accuracy drop to a range of vision models, from the standard ResNet visual classifier to the latest foundation models like CLIP and MiniGPT-4, significantly reducing their accuracy by up to 60%. Our work suggests that diffusion models can be an effective source to test vision models. The code and dataset are available at https://github.com/chenshuang-zhang/imagenet_d.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted at CVPR 2024

arXiv:2403.14460 [pdf, other]

Towards Single-System Illusion in Software-Defined Vehicles -- Automated, AI-Powered Workflow

Authors: Krzysztof Lebioda, Viktor Vorobev, Nenad Petrovic, Fengjunjie Pan, Vahid Zolfaghari, Alois Knoll

Abstract: We propose a novel model- and feature-based approach to development of vehicle software systems, where the end architecture is not explicitly defined. Instead, it emerges from an iterative process of search and optimization given certain constraints, requirements and hardware architecture, while retaining the property of single-system illusion, where applications run in a logically uniform environment. One of the key points of the presented approach is the inclusion of modern generative AI, specifically Large Language Models (LLMs), in the loop. With the recent advances in the field, we expect that the LLMs will be able to assist in processing of requirements, generation of formal system models, as well as generation of software deployment specification and test code. The resulting pipeline is automated to a large extent, with feedback being generated at each step.

Submitted 21 March, 2024; originally announced March 2024.

Report number: TUM-I24108

ACM Class: D.2.1; D.2.2; D.2.4; I.2.7; I.2.2; I.7.0

arXiv:2403.13427 [pdf]

Observation of non-volatile anomalous Nernst effect in altermagnet with collinear Néel vector

Authors: Lei Han, Xizhi Fu, Wenqing He, Yuxiang Zhu, Jiankun Dai, Wenfeng Yang, Wenxuan Zhu, Hua Bai, Chong Chen, Caihua Wan, Xiufeng Han, Cheng Song, Junwei Liu, Feng Pan

Abstract: Anomalous Nernst effect (ANE), a widely investigated transverse thermoelectric effect that converts waste heat into electrical energy with remarkable flexibility and integration capability, has been extended to antiferromagnets with non-collinear spin texture recently. ANE in compensated magnet with collinear Néel vector will bring more opportunities to construct magnetic-field-immune and ultrafast transverse thermoelectric converters, but remains unachieved for long. This is because the degenerate band structure of traditional collinear compensated magnets excludes non-zero Berry curvature. Here, we realize non-volatile ANE in altermagnet Mn5Si3 thin film with collinear Neel vector, whose unique alternating spin-splitting band structure plays vital role in creating non-zero Berry curvature and hotspots of anomalous Nernst conductivity near band intersections. Interestingly, ANE is relatively weak in stoichiometric Mn5Si3, but undergoes a sixfold enhancement through strategically raising the Fermi level by additional Mn doping, indicating sensitive intrinsic influence from specific location of the Fermi level on ANE in altermagnet. Moreover, our investigation reveals a unique Neel-vector-dependent temperature-scaling relationship of anomalous Nernst conductivity in Mn5Si3. Our work not only fills a longstanding gap by confirming the presence of non-volatile ANE in collinear compensated magnet, but also enlightens thermoelectric physics related to exotic spin-splitting band structure in altermagnet.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 25 pages, 4 figures

arXiv:2403.11358 [pdf]

Spin dissymmetry in optical cavities

Authors: Jefferson Dixon, Zachary N. Mauri, Christopher J. Ciccarino, Priyanuj Bordoloi, Feng Pan, Felipe H. da Jornada, Jennifer Dionne

Abstract: We introduce the spin dissymmetry factor, a measure of the spin-selectivity in the optical transition rate of quantum particles. This spin dissymmetry factor is valid locally, including at material interfaces and within optical cavities. We design and numerically demonstrate an optical cavity with three-fold rotational symmetry that maximizes spin dissymmetry, thereby minimizing the spin dephasing of a cavity-coupled quantum particle. Our approach emphasizes the difference between spin and chirality in the nearfield and reveals a classical parameter for designing more efficient quantum optical devices.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 11 pages, 1 figure, 1 table

arXiv:2403.07396 [pdf]

Crystal design of altermagnetism

Authors: Zhiyuan Zhou, Xingkai Cheng, Mengli Hu, Junwei Liu, Feng Pan, Cheng Song

Abstract: Symmetry plays a fundamental role in condensed matter. The unique entanglement between magnetic sublattices and alternating crystal environment in altermagnets provides a unique opportunity for designing magnetic space symmetry. There have been extensive experimental efforts concentrated on tuning the Neel vector to reconstruct altermagnetic symmetry. However, it remains challenging to modulate the altermagnetic symmetry through the crystal aspect. Here, the crystal design of altermagnetism is successfully realized, by breaking glide mirrors and magnetic mirrors of the (0001) crystallographic plane in CrSb films via crystal distortion. We establish a locking relationship between altermagnetic symmetry and the emergent Dzyaloshinskii-Moriya (DM) vectors in different CrSb films, realizing unprecedentedly room-temperature spontaneous anomalous Hall effect in an altermagnetic metal. The concept of exchange-coupling torques is broadened to include both antiferromagnetic exchange-coupling torque and DM torque. Their relationship is designable, determining electrical manipulation modes, e.g., field-assisted switching for CrSb(1-100)/Pt and field-free switching for W/CrSb(11-20). Particularly, the unprecedentedly field-free 100-percent switching of Neel vectors is realized by making these two torques parallel or antiparallel, dependent on Neel vector orientation.
Besides unravelling the rich mechanisms for electrical manipulation of altermagnetism rooted in broadened concept of exchange-coupling torques, we list other material candidates and propose that crystal design of altermagnetism would bring rich designability to magnonics, topology, etc.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 23 pages, 4 figures

arXiv:2402.19274 [pdf, other]

Mixed-halide perovskite alloys $\text{CsPb}(\text{I}_{1-x}\text{Br}_x)_3$ and $\text{CsPb}(\text{Br}_{1-x}\text{Cl}_x)_3$: New insight of configuration entropy effect from first principles and phase diagrams

Authors: Fang Pan, Junni Zhai, Jinyu Chen, Lin Yang, Hua Dong, Fang Yuan, Zhuangde Jiang, Wei Ren, Zuo-Guang Ye, Guo-Xu Zhang, Jingrui Li

Abstract: Stability is one of the key issues in mixed-halide perovskite alloys which are promising in emergent optoelectronics. Previous density-functional-theory (DFT) and machine learning studies indicate that the formation-energy convex hulls of these materials are very shallow, and stable alloy compositions are rare. In this work, we revisit this problem using DFT with special focus on the effects of configuration and vibration entropies. Allowed by the $20$-atomic models for the $\text{CsPb}(\text{I}_{1-x}\text{Br}_x)_3$ and $\text{CsPb}(\text{Br}_{1-x}\text{Cl}_x)_3$ series, the partition functions and therewith thermodynamic state functions are calculated by traversing all possible mixed-halide configurations. We can thus evaluate the temperature- and system-dependent configuration entropy, which largely corrects the conventional approach based on the ideal solution model. Finally, temperature-composition phase diagrams that include $α$, $β$, $γ$ and $δ$ phases of both alloys are constructed based on the free energy data, for which the contribution of phonon vibrations is included.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.02079 [pdf, other]

Prototypical Contrastive Learning through Alignment and Uniformity for Recommendation

Authors: Yangxun Ou, Lei Chen, Fenglin Pan, Yupeng Wu

Abstract: Graph Collaborative Filtering (GCF), one of the most widely adopted recommendation system methods, effectively captures intricate relationships between user and item interactions. Graph Contrastive Learning (GCL) based GCF has gained significant attention as it leverages self-supervised techniques to extract valuable signals from real-world scenarios. However, many methods usually learn the instances of discrimination tasks that involve the construction of contrastive pairs through random sampling. GCL approaches suffer from sampling bias issues, where the negatives might have a semantic structure similar to that of the positives, thus leading to a loss of effective feature representation. To address these problems, we present the Prototypical contrastive learning through Alignment and Uniformity for recommendation, which is called ProtoAU. Specifically, we first propose prototypes (cluster centroids) as a latent space to ensure consistency across different augmentations from the origin graph, aiming to eliminate the need for random sampling of contrastive pairs. Furthermore, the absence of explicit negatives means that directly optimizing the consistency loss between instance and prototype could easily result in dimensional collapse issues. Therefore, we propose aligning and maintaining uniformity in the prototypes of users and items as optimization objectives to prevent falling into trivial solutions.
Finally, we conduct extensive experiments on four datasets and evaluate their performance on the task of link prediction. Experimental results demonstrate that the proposed ProtoAU outperforms other representative methods. The source codes of our proposed ProtoAU are available at https://github.com/oceanlvr/ProtoAU.

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2402.00248 [pdf]

Ultrahigh Thermal Conductivity of Cubic Boron Arsenide with an Unexpectedly Strong Temperature Dependence

Authors: Songrui Hou, Fengjiao Pan, Xinping Shi, Zahra Ebrahim Nataj, Fariborz Kargar, Alexander A. Balandin, David G. Cahill, Chen Li, Zhifeng Ren, Richard B. Wilson

Abstract: Materials with high thermal conductivity are needed to conduct heat away from hot spots in power electronics and optoelectronic devices. Cubic boron arsenide (c-BAs) has a high thermal conductivity due to its special phonon dispersion relation. Previous experimental studies of c-BAs report a room-temperature thermal conductivity between 1000 and 1300 watts per meter-kelvin. We synthesized high purity c-BAs single crystals with room-temperature thermal conductivity of 1500 watts per meter-kelvin. We observed its thermal conductivity to be proportional to the inverse square of temperature between 300 and 600 kelvin, a stronger dependence than predicted by state-of-the-art theory.

Submitted 31 January, 2024; originally announced February 2024.

Comments: 14 pages, 3 figures

arXiv:2401.17608 [pdf]

Electrical 180° switching of Néel vector in spin-splitting antiferromagnet

Authors: Lei Han, Xizhi Fu, Rui Peng, Xingkai Cheng, Jiankun Dai, Liangyang Liu, Yidian Li, Yichi Zhang, Wenxuan Zhu, Hua Bai, Yongjian Zhou, Shixuan Liang, Chong Chen, Qian Wang, Xianzhe Chen, Luyi Yang, Yang Zhang, Cheng Song, Junwei Liu, Feng Pan

Abstract: Antiferromagnetic spintronics have attracted wide attention due to its great potential in constructing ultra-dense and ultra-fast antiferromagnetic memory that suits modern high-performance information technology. The electrical 180° switching of Néel vector is a long-term goal for developing electrical-controllable antiferromagnetic memory with opposite Néel vectors as binary "0" and "1". However, the state-of-art antiferromagnetic switching mechanisms have long been limited for 90° or 120° switching of Néel vector, which unavoidably require multiple writing channels that contradicts ultra-dense integration. Here, we propose a deterministic switching mechanism based on spin-orbit torque with asymmetric energy barrier, and experimentally achieve electrical 180° switching of spin-splitting antiferromagnet Mn5Si3. Such a 180° switching is read out by the Néel vector-induced anomalous Hall effect. Based on our writing and readout methods, we fabricate an antiferromagnet device with electrical-controllable high and low resistance states that accomplishes robust write and read cycles. Besides fundamental advance, our work promotes practical spin-splitting antiferromagnetic devices based on spin-splitting antiferromagnet.

Submitted 31 January, 2024; originally announced January 2024.

Comments: 19 pages, 4 figures

Journal ref: Sci. Adv. 10, eadn0479 (2024)

arXiv:2401.14113 [pdf, other]

On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling

Authors: Xiaobao Wu, Fengjun Pan, Thong Nguyen, Yichao Feng, Chaoqun Liu, Cong-Duy Nguyen, Anh Tuan Luu

Abstract: Hierarchical topic modeling aims to discover latent topics from a corpus and organize them into a hierarchy to understand documents with desirable semantic granularity. However, existing work struggles with producing topic hierarchies of low affinity, rationality, and diversity, which hampers document understanding. To overcome these challenges, we in this paper propose Transport Plan and Context-aware Hierarchical Topic Model (TraCo). Instead of early simple topic dependencies, we propose a transport plan dependency method. It constrains dependencies to ensure their sparsity and balance, and also regularizes topic hierarchy building with them. This improves affinity and diversity of hierarchies. We further propose a context-aware disentangled decoder. Rather than previously entangled decoding, it distributes different semantic granularity to topics at different levels by disentangled decoding. This facilitates the rationality of hierarchies. Experiments on benchmark datasets demonstrate that our method surpasses state-of-the-art baselines, effectively improving the affinity, rationality, and diversity of hierarchical topic modeling with better performance on downstream tasks.

Submitted 31 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: Accepted to AAAI2024 conference. Our code is available at https://github.com/bobxwu/TraCo

arXiv:2401.06741 [pdf, ps, other]

doi 10.1103/PhysRevMaterials.8.074006

Magnetic properties of van der Waals layered single crystals DyOBr and SmOCl

Authors: Feihao Pan, Daye Xu, Songnan Sun, Jiale Huang, Chenglin Shang, Bingxian Shi, Xuejuan Gui, Jianfei Qin, Hongliang Wang, Lijie Hao, Jinchen Wang, Juanjuan Liu, Hongxia Zhang, Peng Cheng

Abstract: Two-dimensional van der Waals single crystals DyOBr and SmOCl have been grown by flux method and their anisotropic magnetic properties are reported. DyOBr orders antiferromagnetically at T$_{N}$=9.5 K with magnetic moments lying along $a$-axis, similar as DyOCl. Its magnetic susceptibility shows an anomaly at T$^{*}$=30 K possibly due to the crystal field effect. Furthermore a 1/3 magnetization plateau is clearly observed under H$\parallel$a and H$\parallel$[110], which might be a field-induced spin-flop phase or some exotic quantum magnetic state. On the other hand, isostructural SmOCl undergoes an antiferromagnetic transition at T$_{N}$=7.1 K and exhibits a contrasting Ising-like perpendicular $c$-axis magnetic anisotropy, which could be well explained by our crystal field calculations. Both DyOBr and SmOCl are insulators with band gap of $\sim$5 eV, our results suggest they are promising in building van der Waals heterostructures and applications in multifunctional devices.

Submitted 7 July, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: 6 pages, 5 figures

arXiv:2401.05949 [pdf, other]

Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning

Authors: Shuai Zhao, Meihuizi Jia, Luu Anh Tuan, Fengjun Pan, Jinming Wen

Abstract: In-context learning, a paradigm bridging the gap between pre-training and fine-tuning, has demonstrated high efficacy in several NLP tasks, especially in few-shot settings. Despite being widely applied, in-context learning is vulnerable to malicious attacks. In this work, we raise security concerns regarding this paradigm. Our studies demonstrate that an attacker can manipulate the behavior of large language models by poisoning the demonstration context, without the need for fine-tuning the model. Specifically, we design a new backdoor attack method, named ICLAttack, to target large language models based on in-context learning. Our method encompasses two types of attacks: poisoning demonstration examples and poisoning demonstration prompts, which can make models behave in alignment with predefined intentions. ICLAttack does not require additional fine-tuning to implant a backdoor, thus preserving the model's generality. Furthermore, the poisoned examples are correctly labeled, enhancing the natural stealth of our attack method. Extensive experimental results across several language models, ranging in size from 1.3B to 180B parameters, demonstrate the effectiveness of our attack method, exemplified by a high average attack success rate of 95.0% across the three datasets on OPT models.

Submitted 16 February, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.01571 [pdf, other]

CodeFuse-Query: A Data-Centric Static Code Analysis System for Large-Scale Organizations

Authors: Xiaoheng Xie, Gang Fan, Xiaojun Lin, Ang Zhou, Shijie Li, Xunjin Zheng, Yinan Liang, Yu Zhang, Na Yu, Haokun Li, Xinyu Chen, Yingzhuang Chen, Yi Zhen, Dejun Dong, Xianjin Fu, Jinzhou Su, Fuxiong Pan, Pengshuai Luo, Youzheng Feng, Ruoxiang Hu, Jing Fan, Jinguo Zhou, Xiao Xiao, Peng Di

Abstract: In the domain of large-scale software development, the demands for dynamic and multifaceted static code analysis exceed the capabilities of traditional tools. To bridge this gap, we present CodeFuse-Query, a system that redefines static code analysis through the fusion of Domain Optimized System Design and Logic Oriented Computation Design. CodeFuse-Query reimagines code analysis as a data computation task, supporting scanning of over 10 billion lines of code daily and more than 300 different tasks. It optimizes resource utilization, prioritizes data reusability, applies incremental code extraction, and introduces task types specifically for Code Change, underscoring its domain-optimized design. The system's logic-oriented facet employs Datalog, utilizing a unique two-tiered schema, COREF, to convert source code into data facts. Through Godel, a distinctive language, CodeFuse-Query enables formulation of complex tasks as logical expressions, harnessing Datalog's declarative prowess. This paper provides empirical evidence of CodeFuse-Query's transformative approach, demonstrating its robustness, scalability, and efficiency. We also highlight its real-world impact and diverse applications, emphasizing its potential to reshape the landscape of static code analysis in the context of large-scale software development. Furthermore, in the spirit of collaboration and advancing the field, our project is open-sourced and the repository is available for public access.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.12479 [pdf, other]

Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models

Authors: Fei Pan, Sangryul Jeon, Brian Wang, Frank Mckenna, Stella X. Yu

Abstract: Existing building recognition methods, exemplified by BRAILS, utilize supervised learning to extract information from satellite and street-view images for classification and segmentation. However, each task module requires human-annotated data, hindering the scalability and robustness to regional variations and annotation imbalances. In response, we propose a new zero-shot workflow for building attribute extraction that utilizes large-scale vision and language models to mitigate reliance on external annotations. The proposed workflow contains two key components: image-level captioning and segment-level captioning for the building images based on the vocabularies pertinent to structural and civil engineering. These two components generate descriptive captions by computing feature representations of the image and the vocabularies, and facilitating a semantic match between the visual and textual representations. Consequently, our framework offers a promising avenue to enhance AI-driven captioning for building attribute extraction in the structural and civil engineering domains, ultimately reducing reliance on human annotations while bolstering performance and adaptability.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted to WACV 2024, Project Page: https://sites.google.com/view/zobae/home

arXiv:2312.07254 [pdf, other]

The GUA-Speech System Description for CNVSRC Challenge 2023

Authors: Shengqiang Li, Chao Lei, Baozhong Ma, Binbin Zhang, Fuping Pan

Abstract: This study describes our system for Task 1 Single-speaker Visual Speech Recognition (VSR) fixed track in the Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023. Specifically, we use intermediate connectionist temporal classification (Inter CTC) residual modules to relax the conditional independence assumption of CTC in our model. Then we use a bi-transformer decoder to enable the model to capture both past and future contextual information. In addition, we use Chinese characters as the modeling units to improve the recognition accuracy of our model. Finally, we use a recurrent neural network language model (RNNLM) for shallow fusion in the inference stage. Experiments show that our system achieves a character error rate (CER) of 38.09% on the Eval set which reaches a relative CER reduction of 21.63% over the official baseline, and obtains a second place in the challenge.

Submitted 12 December, 2023; originally announced December 2023.

Comments: CNVSRC 2023 Challenge

arXiv:2311.15033 [pdf, other]

Agent as Cerebrum, Controller as Cerebellum: Implementing an Embodied LMM-based Agent on Drones

Authors: Haoran Zhao, Fengxing Pan, Huqiuyue Ping, Yaoming Zhou

Abstract: In this study, we present a novel paradigm for industrial robotic embodied agents, encapsulating an 'agent as cerebrum, controller as cerebellum' architecture. Our approach harnesses the power of Large Multimodal Models (LMMs) within an agent framework known as AeroAgent, tailored for drone technology in industrial settings. To facilitate seamless integration with robotic systems, we introduce ROSchain, a bespoke linkage framework connecting LMM-based agents to the Robot Operating System (ROS). We report findings from extensive empirical research, including simulated experiments on Airgen and a real-world case study, particularly in individual search and rescue operations. The results demonstrate AeroAgent's superior performance in comparison to existing Deep Reinforcement Learning (DRL)-based agents, highlighting the advantages of the embodied LMM in complex, real-world scenarios.

Submitted 25 November, 2023; originally announced November 2023.

Comments: 17 pages, 12 figures

arXiv:2311.12067 [pdf, other]

Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design

Authors: Jia Yu, Lichao Zhang, Zijie Chen, Fayu Pan, MiaoMiao Wen, Yuming Yan, Fangsheng Weng, Shuai Zhang, Lili Pan, Zhenzhong Lan

Abstract: The fusion of AI and fashion design has emerged as a promising research area. However, the lack of extensive, interrelated data on clothing and try-on stages has hindered the full potential of AI in this domain. Addressing this, we present the Fashion-Diffusion dataset, a product of multiple years' rigorous effort. This dataset, the first of its kind, comprises over a million high-quality fashion images, paired with detailed text descriptions. Sourced from a diverse range of geographical locations and cultural backgrounds, the dataset encapsulates global fashion trends. The images have been meticulously annotated with fine-grained attributes related to clothing and humans, simplifying the fashion design process into a Text-to-Image (T2I) task. The Fashion-Diffusion dataset not only provides high-quality text-image pairs and diverse human-garment pairs but also serves as a large-scale resource about humans, thereby facilitating research in T2I generation. Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models. This work represents a significant leap forward in the realm of AI-driven fashion design, setting a new standard for future research in this field.

Submitted 18 March, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

arXiv:2311.01387 [pdf]

Millimeter-scale exfoliation of hBN with tunable flake thickness

Authors: Amy S. McKeown-Green, Helen J. Zeng, Ashley P. Saunders, Jiayi Li, Jenny Hu, Jiaojian Shi, Yuejun Shen, Feng Pan, Jennifer A. Dionne, Tony F. Heinz, Stephen Wu, Fan Zheng, Fang Liu

Abstract: As a two-dimensional (2D) dielectric material, hexagonal boron nitride (hBN) is in high demand for applications in photonics, nonlinear optics, and nanoelectronics. Unfortunately, the high-throughput preparation of macroscopic-scale, high-quality hBN flakes with controlled thickness is an ongoing challenge, limiting device fabrication and technological integration. Here, we present a metal thin-film exfoliation method to prepare hBN flakes with millimeter-scale dimension, near-unity yields, and tunable flake thickness distribution from 1-7 layers, a substantial improvement over scotch tape exfoliation. The single crystallinity and high quality of the exfoliated hBN are demonstrated with optical microscopy, atomic force microscopy, Raman spectroscopy, and second harmonic generation. We further explore a possible mechanism for the effectiveness and selectivity based on thin-film residual stress measurements, density functional theory calculations, and transmission electron microscopy imaging of the deposited metal films. We find that the magnitude of the residual tensile stress induced by thin film deposition plays a key role in determining exfoliated flake thickness in a manner which closely resembles 3D semiconductor spalling. Lastly, we demonstrate that our exfoliated, large-area hBN flakes can be readily incorporated as encapsulating layers for other 2D monolayers. Altogether, this method brings us one step closer to the high throughput, mass production of hBN-based 2D photonic, optoelectronic, and quantum devices.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 21 pages, 5 figures, work completed at Stanford University

arXiv:2310.10587 [pdf, ps, other]

A Tri-Level Optimization Model for Interdependent Infrastructure Network Resilience Against Compound Hazard Events

Authors: Matthew R. Oster, Ilya Amburg, Samrat Chatterjee, Daniel A. Eisenberg, Dennis G. Thomas, Feng Pan, Auroop R. Ganguly

Abstract: Resilient operation of interdependent infrastructures against compound hazard events is essential for maintaining societal well-being. To address consequence assessment challenges in this problem space, we propose a novel tri-level optimization model applied to a proof-of-concept case study with fuel distribution and transportation networks -- encompassing one realistic network; one fictitious, yet realistic network; as well as networks drawn from three synthetic distributions. Mathematically, our approach takes the form of a defender-attacker-defender (DAD) model -- a multi-agent tri-level optimization, comprised of a defender, attacker, and an operator acting in sequence. Here, our notional operator may choose proxy actions to operate an interdependent system comprised of fuel terminals and gas stations (functioning as supplies) and a transportation network with traffic flow (functioning as demand) to minimize unmet demand at gas stations. A notional attacker aims to hypothetically disrupt normal operations by reducing supply at the supply terminals, and the notional defender aims to identify best proxy defense policy options which include hardening supply terminals or allowing alternative distribution methods such as trucking reserve supplies. We solve our DAD formulation at a metropolitan scale and present practical defense policy insights against hypothetical compound hazards.
We demonstrate the generalizability of our framework by presenting results for a realistic network; a fictitious, yet realistic network; as well as for three networks drawn from synthetic distributions. Additionally, we demonstrate the scalability of the framework by investigating runtime performance as a function of the network size. Steps for future research are also discussed.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.03987 [pdf]

Antiferromagnetic magnonic charge current generation via ultrafast optical excitation

Authors: Lin Huang, Liyang Liao, Hongsong Qiu, Xianzhe Chen, Hua Bai, Lei Han, Yongjian Zhou, Yichen Su, Zhiyuan Zhou, Feng Pan, Biaobing Jin, Cheng Song

Abstract: Néel spin-orbit torque allows a charge current pulse to efficiently manipulate the Néel vector in antiferromagnets, which offers a unique opportunity for ultrahigh density information storage with high speed. However, the reciprocal process of Néel spin-orbit torque, the generation of ultrafast charge current in antiferromagnets has not been demonstrated. Here, we report the experimental observation of charge current generation in antiferromagnetic metallic Mn2Au thin films using ultrafast optical excitation. The ultrafast laser pulse excites antiferromagnetic magnons, resulting in instantaneous non-equilibrium spin polarization at the antiferromagnetic spin sublattices with broken spatial symmetry. Then the charge current is generated directly via spin-orbit fields at the two sublattices, which is termed as the reciprocal phenomenon of Néel spin-orbit torque, and the associated THz emission can be detected at room temperature. Besides the fundamental significance on the Onsager reciprocity, the observed magnonic charge current generation in antiferromagnet would advance the development of antiferromagnetic THz emitter.

Submitted 5 October, 2023; originally announced October 2023.

Comments: 15 pages, 4 figures, this work was submitted to Nature Communications on Jan. 4th, 2023, now is under the 3rd review process

arXiv:2310.03978 [pdf, other]

Efficient Quantum Circuit Simulation by Tensor Network Methods on Modern GPUs

Authors: Feng Pan, Hanfeng Gu, Lvlin Kuang, Bing Liu, Pan Zhang

Abstract: Efficient simulation of quantum circuits has become indispensable with the rapid development of quantum hardware. The primary simulation methods are based on state vectors and tensor networks. As the number of qubits and quantum gates grows larger in current quantum devices, traditional state-vector based quantum circuit simulation methods prove inadequate due to the overwhelming size of the Hilbert space and extensive entanglement. Consequently, brute-force tensor network simulation algorithms become the only viable solution in such scenarios. The two main challenges faced in tensor network simulation algorithms are optimal contraction path finding and efficient execution on modern computing devices, with the latter determining the actual efficiency. In this study, we investigate the optimization of such tensor network simulations on modern GPUs and propose general optimization strategies from two aspects: computational efficiency and accuracy. Firstly, we propose to transform critical Einstein summation operations into GEMM operations, leveraging the specific features of tensor network simulations to amplify the efficiency of GPUs. Secondly, by analyzing the data characteristics of quantum circuits, we employ extended precision to ensure the accuracy of simulation results and mixed precision to fully exploit the potential of GPUs, resulting in faster and more precise simulations.
Our numerical experiments demonstrate that our approach can achieve a 3.96x reduction in verification time for random quantum circuit samples in the 18-cycle case of Sycamore, with sustained performance exceeding 21 TFLOPS on one A100. This method can be easily extended to the 20-cycle case, maintaining the same performance, accelerating by 12.5x compared to the state-of-the-art CPU-based results and 4.48-6.78x compared to the state-of-the-art GPU-based results reported in the literature.

Submitted 12 August, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: 25 pages, 10 figures

arXiv:2309.16502 [pdf, ps, other]

doi 10.1103/PhysRevB.108.184431

FeGe1-xSbx: a series of novel kagome metals with noncollinear antiferromagnetism

Authors: Jiale Huang, Chenglin Shang, Jianfei Qin, Feihao Pan, Bingxian Shi, Jinchen Wang, Juanjuan Liu, Daye Xu, Hongxia Zhang, Hongliang Wang, Lijie Hao, Peng Cheng

Abstract: Kagome metals are important for exploring emergent phenomena due to the interplay between band topology and electron correlation. Motivated by the recent discovery of charge density wave in a kagome lattice antiferromagnetic FeGe, we investigate the impact of Sb doping on the structural, charge and magnetic order of FeGe. The charge density wave is rapidly suppressed by Sb doping (~1.5%) and the antiferromagnetic ordering temperature gradually shifts to 280 K for FeGe0.7Sb0.3. For FeGe1-xSbx with x>0.1, crystal structures with slightly distorted Fe kagome lattice are formed. Their magnetic anisotropy changes significantly; a temperature-driven spin reorientation and a field-induced spin-flop transition are identified from magnetization measurements. Interestingly, neutron diffraction reveals that noncollinear antiferromagnetic structures widely exist below TN for all samples with x>0.1. These noncollinear magnetic orders could possibly be unconventional and result from onsite repulsion and the filling condition of the kagome flat band, as predicted by a recent theoretical work.

Submitted 3 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

Journal ref: Physical Review B 108, 184431 (2023)

arXiv:2309.11711 [pdf, other]

MoDA: Leveraging Motion Priors from Videos for Advancing Unsupervised Domain Adaptation in Semantic Segmentation

Authors: Fei Pan, Xu Yin, Seokju Lee, Axi Niu, Sungeui Yoon, In So Kweon

Abstract: Unsupervised domain adaptation (UDA) has been a potent technique to handle the lack of annotations in the target domain, particularly in the semantic segmentation task. This study introduces a different UDA scenario where the target domain contains unlabeled video frames. Drawing upon recent advancements in self-supervised learning of object motion from unlabeled videos with geometric constraints, we design a Motion-guided Domain Adaptive semantic segmentation framework (MoDA). MoDA harnesses the self-supervised object motion cues to facilitate cross-domain alignment for the segmentation task. First, we present an object discovery module to localize and segment target moving objects using object motion information. Then, we propose a semantic mining module that takes the object masks to refine the pseudo labels in the target domain. Subsequently, these high-quality pseudo labels are used in the self-training loop to bridge the cross-domain gap. On domain adaptive video and image segmentation experiments, MoDA shows the effectiveness of utilizing object motion as guidance for domain alignment compared with optical flow information. Moreover, MoDA exhibits versatility as it can complement existing state-of-the-art UDA approaches. Code at https://github.com/feipanir/MoDA.

Submitted 15 April, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: CVPR 2024 Workshop on Learning with Limited Labelled Data for Image and Video Understanding. Best Paper Award

arXiv:2309.06908 [pdf, other]

Towards the TopMost: A Topic Modeling System Toolkit

Authors: Xiaobao Wu, Fengjun Pan, Anh Tuan Luu

Abstract: Topic models have a rich history with various applications and have recently been reinvigorated by neural topic modeling. However, these numerous topic models adopt totally distinct datasets, implementations, and evaluations. This impedes quick utilization and fair comparisons, and thereby hinders their research progress and applications. To tackle this challenge, we in this paper propose a Topic Modeling System Toolkit (TopMost). Compared to existing toolkits, TopMost stands out by supporting more extensive features. It covers a broader spectrum of topic modeling scenarios with their complete lifecycles, including datasets, preprocessing, models, training, and evaluations. Thanks to its highly cohesive and decoupled modular design, TopMost enables rapid utilization, fair comparisons, and flexible extensions of diverse cutting-edge topic models. Our code, tutorials, and documentation are available at https://github.com/bobxwu/topmost.

Submitted 14 June, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: Accepted to ACL 2024 System Demonstrations Track

arXiv:2309.01179 [pdf, other]

Cognition-Mode Aware Variational Representation Learning Framework for Knowledge Tracing

Authors: Moyu Zhang, Xinning Zhu, Chunhong Zhang, Feng Pan, Wenchen Qian, Hui Zhao

Abstract: The Knowledge Tracing (KT) task plays a crucial role in personalized learning, and its purpose is to predict student responses based on their historical practice behavior sequence. However, the KT task suffers from data sparsity, which makes it challenging to learn robust representations for students with few practice records and increases the risk of model overfitting. Therefore, in this paper, we propose a Cognition-Mode Aware Variational Representation Learning Framework (CMVF) that can be directly applied to existing KT methods. Our framework uses a probabilistic model to generate a distribution for each student, accounting for uncertainty in those with limited practice records, and estimates the student's distribution via variational inference (VI). In addition, we introduce a cognition-mode aware multinomial distribution as prior knowledge that constrains the learning of the posterior student distributions, so as to ensure that students with similar cognition modes have similar distributions, avoiding overwhelming personalization for students with few practice records. Finally, extensive experimental results confirm that CMVF can effectively aid existing KT methods in learning more robust student representations. Our code is available at https://github.com/zmy-9/CMVF.

Submitted 3 September, 2023; originally announced September 2023.

Comments: Accepted by ICDM 2023, 10 pages, 5 figures, 4 tables

Journal ref: 2023 ICDM

arXiv:2308.16569 [pdf, other]

doi 10.1109/ICASSP49357.2023.10096710

LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech

Authors: Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu

Abstract: Recent advances in neural text-to-speech (TTS) models bring thousands of TTS applications into daily life, where models are deployed in the cloud to provide services for customers. Among these models are diffusion probabilistic models (DPMs), which can be stably trained and are more parameter-efficient compared with other generative models. As transmitting data between customers and the cloud introduces high latency and the risk of exposing private data, deploying TTS models on edge devices is preferred. When implementing DPMs on edge devices, there are two practical problems. First, current DPMs are not lightweight enough for resource-constrained devices. Second, DPMs require many denoising steps in inference, which increases latency. In this work, we present LightGrad, a lightweight DPM for TTS. LightGrad is equipped with a lightweight U-Net diffusion decoder and a training-free fast sampling technique, reducing both model parameters and inference latency. Streaming inference is also implemented in LightGrad to reduce latency further. Compared with Grad-TTS, LightGrad achieves a 62.2% reduction in parameters and a 65.7% reduction in latency, while preserving comparable speech quality on both Chinese Mandarin and English in 4 denoising steps.

Submitted 31 August, 2023; originally announced August 2023.

Comments: Accepted by ICASSP 2023

arXiv:2308.05970 [pdf]

Focused Specific Objects NeRF

Authors: Yuesong Li, Feng Pan, Helong Yan, Xiuli Xin, Xiaoxue Feng

Abstract: Most NeRF-based models are designed for learning the entire scene, and complex scenes can lead to longer learning times and poorer rendering effects. This paper utilizes scene semantic priors to make improvements in fast training, allowing the network to focus on the specific targets and not be affected by complex backgrounds. The training speed can be increased by 7.78 times with better rendering effect, and small to medium sized targets can be rendered faster. In addition, this improvement applies to all NeRF-based models. Considering the inherent multi-view consistency and smoothness of NeRF, this paper also studies weak supervision by sparsely sampling negative ray samples. With this method, training can be further accelerated and rendering quality can be maintained. Finally, this paper extends pixel semantic and color rendering formulas and proposes a new scene editing technique that can achieve unique displays of the specific semantic targets or masking them in rendering. To address the problem of incorrect inferences in unsupervised regions of the scene, we also designed a self-supervised loop that combines morphological operations and clustering.

Submitted 11 August, 2023; originally announced August 2023.

Comments: 17 pages, 32 figures

arXiv:2308.03488 [pdf, other]

doi 10.1145/3583780.3614988

No Length Left Behind: Enhancing Knowledge Tracing for Modeling Sequences of Excessive or Insufficient Lengths

Authors: Moyu Zhang, Xinning Zhu, Chunhong Zhang, Feng Pan, Wenchen Qian, Hui Zhao

Abstract: Knowledge tracing (KT) aims to predict students' responses to practices based on their historical question-answering behaviors. However, most current KT methods focus on improving overall AUC, leaving ample room for optimization in modeling sequences of excessive or insufficient lengths. As sequences get longer, computational costs will increase exponentially. Therefore, KT methods usually truncate sequences to an acceptable length, which makes it difficult for models on online service systems to capture complete historical practice behaviors of students with too long sequences. Conversely, modeling students with short practice sequences using most KT methods may result in overfitting due to limited observation samples. To address the above limitations, we propose a model called Sequence-Flexible Knowledge Tracing (SFKT).

Submitted 7 August, 2023; originally announced August 2023.

Comments: Accepted by CIKM 2023, 10 pages, 8 figures, 5 tables

Journal ref: CIKM 2023

Showing 1–50 of 245 results for author: Pan, F