-
XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis
Authors:
Hao Li,
Ming Yuan,
Yan Zhang,
Chenming Wu,
Chen Zhao,
Chunyu Song,
Haocheng Feng,
Errui Ding,
Dingwen Zhang,
Jingdong Wang
Abstract:
Thoroughly testing autonomy systems is crucial in the pursuit of safe autonomous driving vehicles. It necessitates creating safety-critical scenarios that go beyond what can be safely collected from real-world data, as many of these scenarios occur infrequently on public roads. However, the evaluation of most existing novel view synthesis (NVS) methods relies on sporadically sampling image frames from the training data and comparing the rendered images with ground-truth images using metrics. Unfortunately, this evaluation protocol falls short of the actual requirements of closed-loop simulation. Specifically, the real application demands the ability to render novel views that extend beyond the original trajectory (such as cross-lane views), which are challenging to capture in the real world. To address this, this paper presents a novel driving view synthesis dataset and benchmark designed specifically for autonomous driving simulation. The dataset is unique in that it includes testing images captured by deviating from the training trajectory by 1-4 meters. It comprises six sequences covering various times of day and weather conditions. Each sequence contains 450 training images, 150 testing images, and their corresponding camera poses and intrinsic parameters. Leveraging this dataset, we establish the first realistic benchmark for evaluating existing NVS approaches under front-only and multi-camera settings. The experimental findings underscore the significant gap in current approaches, revealing their inadequate ability to meet the demanding requirements of cross-lane or closed-loop simulation. Our dataset is released publicly at the project page: https://3d-aigc.github.io/XLD/.
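The evaluation protocol described above scores rendered views against ground-truth frames with image metrics. As a minimal sketch, assuming PSNR as the metric (the abstract does not name the benchmark's metrics) and images given as flat lists of intensities in [0, 1], the snippet below computes PSNR and averages it per lateral offset, mirroring the 1-4 m deviations of the test views. The helper names and sample values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def psnr(rendered: Sequence[float], ground_truth: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized images given as flat pixel lists."""
    if len(rendered) != len(ground_truth) or not rendered:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - g) ** 2 for r, g in zip(rendered, ground_truth)) / len(rendered)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def mean_psnr_by_offset(
    samples: Iterable[Tuple[float, Sequence[float], Sequence[float]]],
) -> Dict[float, float]:
    """Hypothetical helper: average PSNR per lateral offset (metres) from the training trajectory.

    Each sample is (offset_m, rendered_pixels, ground_truth_pixels).
    """
    buckets: Dict[float, list] = defaultdict(list)
    for offset_m, rendered, truth in samples:
        buckets[offset_m].append(psnr(rendered, truth))
    return {off: sum(vals) / len(vals) for off, vals in sorted(buckets.items())}


# Hypothetical 2x2 grayscale renders at two lateral offsets
samples = [
    (1.0, [0.5, 0.5, 0.5, 0.5], [0.6, 0.4, 0.6, 0.4]),
    (4.0, [0.5, 0.5, 0.5, 0.5], [0.8, 0.2, 0.8, 0.2]),
]
print({k: round(v, 2) for k, v in mean_psnr_by_offset(samples).items()})  # → {1.0: 20.0, 4.0: 10.46}
```

Grouping by offset makes it easy to see how quality degrades as views move further from the recorded lane.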
Submitted 26 June, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
-
Probing many-body Bell correlation depth with superconducting qubits
Authors:
Ke Wang,
Weikang Li,
Shibo Xu,
Mengyao Hu,
Jiachen Chen,
Yaozu Wu,
Chuanyu Zhang,
Feitong Jin,
Xuhao Zhu,
Yu Gao,
Ziqi Tan,
Aosai Zhang,
Ning Wang,
Yiren Zou,
Tingting Li,
Fanhao Shen,
Jiarun Zhong,
Zehang Bao,
Zitian Zhu,
Zixuan Song,
Jinfeng Deng,
Hang Dong,
Xu Zhang,
Pengfei Zhang,
Wenjie Jiang
, et al. (10 additional authors not shown)
Abstract:
Quantum nonlocality describes a stronger form of quantum correlation than entanglement. It refutes Einstein's belief in local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing to machine learning. Nevertheless, the detection of nonlocality, especially in quantum many-body systems, is notoriously challenging. Here, we report an experimental certification of genuine multipartite Bell correlations, which signal nonlocality in quantum many-body systems, for up to 24 qubits with a fully programmable superconducting quantum processor. In particular, we employ energy as a Bell correlation witness and variationally decrease the energy of a many-body system across a hierarchy of thresholds, below which an increasing Bell correlation depth can be certified from experimental data. As an illustrative example, we variationally prepare the low-energy state of a two-dimensional honeycomb model with 73 qubits and certify its Bell correlations by measuring an energy that surpasses the corresponding classical bound by up to 48 standard deviations. In addition, we variationally prepare a sequence of low-energy states and certify their genuine multipartite Bell correlations for up to 24 qubits via energies measured efficiently by parity oscillation and multiple quantum coherence techniques. Our results establish a viable approach for preparing and certifying multipartite Bell correlations, which provide not only a finer benchmark beyond entanglement for quantum devices but also a valuable guide toward exploiting multipartite Bell correlations in a wide spectrum of practical applications.
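The certification logic described above (energy as a witness, with a hierarchy of thresholds) can be sketched as follows. This is a hypothetical illustration: the threshold values, the 3-sigma acceptance rule, and the function name are placeholders rather than the bounds derived in the paper; only the idea that a lower measured energy certifies a larger Bell correlation depth comes from the abstract.

```python
from typing import Dict, Tuple


def certify_bell_depth(
    energy: float,
    std_err: float,
    thresholds: Dict[int, float],
    min_sigma: float = 3.0,
) -> Tuple[int, float]:
    """Hypothetical helper returning (certified depth, margin in standard deviations).

    thresholds[k] is an energy bound below which Bell correlation depth >= k is
    witnessed; deeper levels have lower bounds. A level counts as certified only
    if the measured energy lies below its bound by at least `min_sigma` standard
    errors. Returns (0, 0.0) if no level is certified.
    """
    best_depth, best_margin = 0, 0.0
    for depth, bound in sorted(thresholds.items()):
        margin = (bound - energy) / std_err
        if margin >= min_sigma:
            best_depth, best_margin = depth, margin
    return best_depth, best_margin


# Hypothetical hierarchy of bounds and a hypothetical measurement
bounds = {2: -10.0, 3: -12.0, 4: -14.0}
print(certify_bell_depth(energy=-13.0, std_err=0.1, thresholds=bounds))  # → (3, 10.0)
```

The margin in standard deviations is the same kind of quantity as the "48 standard deviations" quoted for the 73-qubit honeycomb experiment.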
Submitted 25 June, 2024;
originally announced June 2024.
-
Terahertz photocurrent probe of quantum geometry and interactions in magic-angle twisted bilayer graphene
Authors:
Roshan Krishna Kumar,
Geng Li,
Riccardo Bertini,
Swati Chaudhary,
Krystian Nowakowski,
Jeong Min Park,
Sebastian Castilla,
Zhen Zhan,
Pierre A. Pantaleón,
Hitesh Agarwal,
Sergi Battle-Porro,
Eike Icking,
Matteo Ceccanti,
Antoine Reserbat-Plantey,
Giulia Piccinini,
Julien Barrier,
Ekaterina Khestanova,
Takashi Taniguchi,
Kenji Watanabe,
Christoph Stampfer,
Gil Refael,
Francisco Guinea,
Pablo Jarillo-Herrero,
Justin C. W. Song,
Petr Stepanov
, et al. (2 additional authors not shown)
Abstract:
Moiré materials represent strongly interacting electron systems bridging topological and correlated physics. Despite significant advances, decoding wavefunction properties underlying the quantum geometry remains challenging. Here, we utilize polarization-resolved photocurrent measurements to probe magic-angle twisted bilayer graphene, leveraging its sensitivity to the Berry connection that encompasses quantum "textures" of electron wavefunctions. Using terahertz light resonant with optical transitions of its flat bands, we observe bulk photocurrents driven by broken symmetries and reveal the interplay between electron interactions and quantum geometry. We observe inversion-breaking gapped states undetectable through quantum transport, sharp changes in the polarization axes caused by interaction-induced band renormalization, and recurring photocurrent patterns at integer fillings of the moiré unit cell that track the evolution of quantum geometry through the cascade of phase transitions. The large and tunable terahertz response intrinsic to flat-band systems offers direct insights into the quantum geometry of interacting electrons and paves the way for innovative terahertz quantum technologies.
Submitted 24 June, 2024;
originally announced June 2024.
-
Isolated singularities of 3-dimensional Yang-Mills-Higgs fields
Authors:
Bo Chen,
Chong Song
Abstract:
In this paper, we derive decay estimates near isolated singularities of 3-dimensional (3d) Yang-Mills-Higgs fields defined on a fiber bundle, where the fiber space is a compact Riemannian manifold and the structure group is a connected compact Lie group. As an application, we obtain removable singularity theorems for 3d Yang-Mills-Higgs fields under different types of energy conditions, which generalize classical removable singularity theorems for 3d Yang-Mills fields [S84, SS84] and 3d harmonic maps [L85].
Submitted 23 June, 2024;
originally announced June 2024.
-
Tailored topotactic chemistry unlocks heterostructures of magnetic intercalation compounds
Authors:
Samra Husremović,
Oscar Gonzalez,
Berit H. Goodge,
Lilia S. Xie,
Zhizhi Kong,
Wanlin Zhang,
Sae Hee Ryu,
Stephanie M. Ribet,
Karen C. Bustillo,
Chengyu Song,
Jim Ciston,
Takashi Taniguchi,
Kenji Watanabe,
Colin Ophus,
Chris Jozwiak,
Aaron Bostwick,
Eli Rotenberg,
D. Kwabena Bediako
Abstract:
The construction of thin film heterostructures has been a widely successful archetype for fabricating materials with emergent physical properties. This strategy is of particular importance for the design of multilayer magnetic architectures in which direct interfacial spin-spin interactions between magnetic phases in dissimilar layers lead to emergent and controllable magnetic behavior. However, crystallographic incommensurability and atomic-scale interfacial disorder can severely limit the types of materials amenable to this strategy, as well as the performance of these systems. Here, we demonstrate a method for synthesizing heterostructures comprising magnetic intercalation compounds of transition metal dichalcogenides (TMDs), through directed topotactic reaction of the TMD with a metal oxide. The mechanism of the intercalation reaction enables thermally initiated intercalation of the TMD from lithographically patterned oxide films, giving access to a new family of multi-component magnetic architectures through the combination of deterministic van der Waals assembly and directed intercalation chemistry.
Submitted 21 June, 2024;
originally announced June 2024.
-
One Fling to Goal: Environment-aware Dynamics for Goal-conditioned Fabric Flinging
Authors:
Linhan Yang,
Lei Yang,
Haoran Sun,
Zeqing Zhang,
Haibin He,
Fang Wan,
Chaoyang Song,
Jia Pan
Abstract:
Dynamic fabric manipulation is common in manufacturing and domestic settings. While dynamically manipulating a piece of fabric to reach a target state is highly efficient, the task presents considerable challenges due to the varying properties of different fabrics, the complex dynamics of interaction with the environment, and the need to meet the required goal conditions. To address these challenges, we present One Fling to Goal, an algorithm capable of handling fabric pieces with diverse shapes and physical properties across various scenarios. Our method learns a graph-based dynamics model equipped with environmental awareness. With this dynamics model, we devise a real-time controller that enables high-speed fabric manipulation in a single attempt, requiring less than 3 seconds to complete the goal-conditioned task. We experimentally validate our method on a goal-conditioned manipulation task in five diverse scenarios. Our method significantly improves performance on this goal-conditioned task, achieving an average error of 13.2 mm in complex scenarios. Our method can be seamlessly transferred to real-world robotic systems and generalizes to unseen scenarios in a zero-shot manner.
Submitted 20 June, 2024;
originally announced June 2024.
-
Towards a Client-Centered Assessment of LLM Therapists by Client Simulation
Authors:
Jiashuo Wang,
Yang Xiao,
Yanran Li,
Changhe Song,
Chunpu Xu,
Chenhao Tan,
Wenjie Li
Abstract:
Although there is a growing belief that LLMs can be used as therapists, research exploring LLMs' capabilities and limitations, particularly from the client's perspective, is limited. This work focuses on a client-centered assessment of LLM therapists involving simulated clients, a standard approach in clinical medical education. However, two challenges arise when applying this approach to assess LLM therapists at scale. Ethically, asking humans to frequently mimic clients and exposing them to potentially harmful LLM outputs can be risky and unsafe. Technically, it can be difficult to consistently compare the performance of different LLM therapists interacting with the same client. To this end, we use LLMs to simulate clients and propose ClientCAST, a client-centered approach to assessing LLM therapists by client simulation. Specifically, the simulated client interacts with LLM therapists and completes questionnaires about the interaction. Based on the questionnaire results, we assess LLM therapists from three client-centered aspects: session outcome, therapeutic alliance, and self-reported feelings. We conduct experiments to examine the reliability of ClientCAST and use it to evaluate LLM therapists implemented with Claude-3, GPT-3.5, LLaMA3-70B, and Mixtral 8x7B. Code is released at https://github.com/wangjs9/ClientCAST.
Submitted 20 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations
Authors:
Bo Sun,
Thibault Groueix,
Chen Song,
Qixing Huang,
Noam Aigerman
Abstract:
This work proposes a novel representation of injective deformations of 3D space, which overcomes existing limitations of injective methods: inaccuracy, lack of robustness, and incompatibility with general learning and optimization frameworks. The core idea is to reduce the problem to a deep composition of multiple 2D mesh-based piecewise-linear maps. Namely, we build differentiable layers that produce mesh deformations through Tutte's embedding (guaranteed to be injective in 2D), and compose these layers over different planes to create complex 3D injective deformations of the 3D volume. We show our method provides the ability to efficiently and accurately optimize and learn complex deformations, outperforming other injective approaches. As a main application, we produce complex and artifact-free NeRF and SDF deformations.
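Tutte's embedding, which the layers above build on, pins the boundary of a triangulated disk to a convex polygon and places every interior vertex at a convex combination (here, the plain average) of its neighbours; the resulting planar map is injective. The sketch below shows only this classical uniform-weight embedding, solved by Gauss-Seidel iteration, plus an orientation check; it is not TutteNet's differentiable layer, which the abstract describes as composing such maps over different planes. The mesh and function names are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def tutte_embedding(
    neighbors: Dict[int, List[int]],
    boundary: List[int],
    max_iters: int = 10_000,
    tol: float = 1e-13,
) -> Dict[int, Point]:
    """Uniform-weight Tutte embedding of a triangulated disk (illustrative sketch).

    Boundary vertices are pinned, in order, to a regular convex polygon; each
    interior vertex is then repeatedly moved to the average of its neighbours
    (Gauss-Seidel sweeps) until no coordinate changes by more than `tol`.
    """
    pos: Dict[int, Point] = {}
    for k, v in enumerate(boundary):
        angle = 2.0 * math.pi * k / len(boundary)
        pos[v] = (math.cos(angle), math.sin(angle))
    interior = [v for v in neighbors if v not in pos]
    for v in interior:
        pos[v] = (0.0, 0.0)
    for _ in range(max_iters):
        max_move = 0.0
        for v in interior:
            nbrs = neighbors[v]
            x = sum(pos[u][0] for u in nbrs) / len(nbrs)
            y = sum(pos[u][1] for u in nbrs) / len(nbrs)
            max_move = max(max_move, abs(x - pos[v][0]), abs(y - pos[v][1]))
            pos[v] = (x, y)
        if max_move < tol:
            break
    return pos


def signed_area(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


# Hypothetical mesh: square boundary 0-1-2-3 with interior vertices 4 and 5 (6 triangles).
NEIGHBORS = {
    0: [1, 3, 4], 1: [0, 2, 4, 5], 2: [1, 3, 5], 3: [0, 2, 4, 5],
    4: [0, 1, 5, 3], 5: [1, 2, 3, 4],
}
TRIANGLES = [(0, 1, 4), (1, 5, 4), (1, 2, 5), (2, 3, 5), (3, 4, 5), (3, 0, 4)]

pos = tutte_embedding(NEIGHBORS, boundary=[0, 1, 2, 3])
# Injectivity check: every triangle keeps the same (positive) orientation.
print(all(signed_area(*(pos[i] for i in tri)) > 0 for tri in TRIANGLES))  # → True
```

No triangle flips, which is the 2D guarantee that composing such maps relies on.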
Submitted 20 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
VideoLLM-online: Online Video Large Language Model for Streaming Video
Authors:
Joya Chen,
Zhaoyang Lv,
Shiwei Wu,
Kevin Qinghong Lin,
Chenan Song,
Difei Gao,
Jia-Wei Liu,
Ziteng Gao,
Dongxing Mao,
Mike Zheng Shou
Abstract:
Recent Large Language Models have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language content. However, the learning methods of these large multimodal models typically treat videos as predetermined clips, making them less effective and efficient at handling streaming video inputs. In this paper, we propose a novel Learning-In-Video-Stream (LIVE) framework, which enables temporally aligned, long-context, and real-time conversation within a continuous video stream. Our LIVE framework comprises comprehensive approaches to achieving video streaming dialogue, encompassing: (1) a training objective designed to perform language modeling on continuous streaming inputs, (2) a data generation scheme that converts offline temporal annotations into a streaming dialogue format, and (3) an optimized inference pipeline that speeds up model responses on real-world video streams. With our LIVE framework, we build the VideoLLM-online model upon Llama-2/Llama-3 and demonstrate its significant advantages in processing streaming videos. For instance, on average, our model can support streaming dialogue in a 5-minute video clip at over 10 FPS on an A100 GPU. Moreover, it also achieves state-of-the-art performance on public offline video benchmarks for tasks such as recognition, captioning, and forecasting. The code, model, data, and demo have been made available at https://showlab.github.io/videollm-online.
Submitted 17 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work, we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies within LHAASO's field of view. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter because they have low astrophysical γ-ray backgrounds but large amounts of dark matter. Analyzing more than 700 days of LHAASO observational data, we detect no significant dark matter signal from 1 TeV to 1 EeV. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. Constraints on the dark matter lifetime in the decay mode are also derived.
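For reference, limits of this kind are usually derived from the standard expected γ-ray flux expressions of indirect searches, reproduced below under the common assumption of self-conjugate dark matter of mass $m_\chi$, with $dN_\gamma/dE$ the photon spectrum per annihilation or decay and $\rho_\chi$ the dark matter density integrated along the line of sight (l.o.s.) over the solid angle $\Delta\Omega$. The abstract does not state the exact conventions used in the analysis (for example, an extra factor of 1/2 for non-self-conjugate particles).

```latex
\begin{aligned}
\frac{d\Phi_{\mathrm{ann}}}{dE} &= \frac{\langle \sigma v \rangle}{8\pi m_\chi^{2}}\,\frac{dN_\gamma}{dE}\,J,
  & J &= \int_{\Delta\Omega}\!\int_{\mathrm{l.o.s.}} \rho_\chi^{2}(l,\Omega)\,dl\,d\Omega,\\
\frac{d\Phi_{\mathrm{dec}}}{dE} &= \frac{1}{4\pi m_\chi \tau_\chi}\,\frac{dN_\gamma}{dE}\,D,
  & D &= \int_{\Delta\Omega}\!\int_{\mathrm{l.o.s.}} \rho_\chi(l,\Omega)\,dl\,d\Omega.
\end{aligned}
```

Because no signal is seen, an upper limit on the flux from each galaxy translates into an upper limit on $\langle \sigma v \rangle$ and a lower limit on the lifetime $\tau_\chi$.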
Submitted 12 June, 2024;
originally announced June 2024.
-
Frustrated phonon with charge density wave in vanadium Kagome metal
Authors:
Seung-Phil Heo,
Choongjae Won,
Heemin Lee,
Hanbyul Kim,
Eunyoung Park,
Sung Yun Lee,
Junha Hwang,
Hyeongi Choi,
Sang-Youn Park,
Byungjune Lee,
Woo-Suk Noh,
Hoyoung Jang,
Jae-Hoon Park,
Dongbin Shin,
Changyong Song
Abstract:
Crystals with unique ionic arrangements and strong electronic correlations serve as fertile ground for the emergence of exotic phases, as evidenced by the coexistence of charge density wave (CDW) order and superconductivity in vanadium Kagome metals, specifically AV3Sb5 (where A represents K, Rb, or Cs). The formation of a star-of-David CDW superstructure, resulting from the coordinated displacements of vanadium ions on a corner-sharing triangular lattice, has garnered significant attention in efforts to understand the influence of electron-phonon interaction within this geometrically intricate lattice. However, the mechanism underlying CDW formation, coupled with symmetry-protected lattice vibrations, remains elusive. In this study, we performed time-resolved X-ray scattering experiments using an X-ray free-electron laser. Our findings reveal that the phonon mode associated with the out-of-plane motion of Cs ions becomes frustrated in the CDW phase. Furthermore, we observed the photoinduced emergence of a metastable CDW phase, facilitated by the alleviation of frustration through nonadiabatic changes in free energy. By elucidating the longstanding puzzle surrounding the role of phonons in CDW ordering, this research offers fresh insights into the competition between phonons and periodic lattice distortions, a phenomenon widespread in other correlated quantum materials, including layered high-Tc superconductors.
Submitted 10 June, 2024;
originally announced June 2024.
-
Universal spatial inflation of human mobility
Authors:
Lu Zhong,
Lei Dong,
Qi Wang,
Chaoming Song,
Jianxi Gao
Abstract:
Understanding the interplay between egocentric preference and urban structure in shaping human mobility has profound implications for improving epidemic intervention, social equity, and urban resilience. However, numerous existing studies identify either the egocentric preference alone (the anchoring effect of home) or the impact of hierarchical urban structures alone. Here, we propose a network-based approach to represent human mobility in both its spatial and topological aspects within the urban system, using cell phone trajectory data from millions of users across three countries. By segmenting mobility trajectories into modules and examining their overlap with urban scales, we observe an inflation law: the geospatial extent of these modules increases sub-linearly with their distance from home. Moreover, this increase arises from an egocentric preference for higher urban levels. This universal finding indicates that home-based preferences distort the hierarchical scales of human mobility in the urban environment, regardless of demographics or geography.
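One simple way to quantify the sub-linear growth described above is to fit a power law, extent proportional to distance raised to β, on log-log axes and check that β < 1. This is a sketch under the assumption of a power-law form; the abstract says only that growth is sub-linear, and the function name and synthetic data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_scaling_exponent(
    distances: Sequence[float], extents: Sequence[float]
) -> Tuple[float, float]:
    """Hypothetical helper: least-squares fit of extent = c * distance**beta on log-log axes.

    Returns (beta, c). beta < 1 indicates sub-linear growth with distance from home.
    """
    if len(distances) != len(extents) or len(distances) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(d) for d in distances]
    ys = [math.log(e) for e in extents]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("distances must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return beta, math.exp(my - beta * mx)


# Hypothetical module extents (km) at increasing distances from home (km)
dists = [1, 2, 4, 8, 16]
exts = [2.0 * d ** 0.6 for d in dists]
beta, c = fit_scaling_exponent(dists, exts)
print(round(beta, 3), round(c, 3))  # → 0.6 2.0
```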
Submitted 10 June, 2024;
originally announced June 2024.
-
Cross-Modal Safety Alignment: Is textual unlearning all you need?
Authors:
Trishna Chakraborty,
Erfan Shayegani,
Zikui Cai,
Nael Abu-Ghazaleh,
M. Salman Asif,
Yue Dong,
Amit K. Roy-Chowdhury,
Chengyu Song
Abstract:
Recent studies reveal that integrating new modalities into Large Language Models (LLMs), as in Vision-Language Models (VLMs), creates a new attack surface that bypasses existing safety training techniques such as Supervised Fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). While further SFT- and RLHF-based safety training can be conducted in multi-modal settings, collecting multi-modal training datasets poses a significant challenge. Inspired by the structural design of recent multi-modal models, in which all inputs are ultimately fused into the language space regardless of the combination of input modalities, we explore whether unlearning solely in the textual domain can be effective for cross-modality safety alignment. Our evaluation across six datasets empirically demonstrates this transferability: textual unlearning in VLMs significantly reduces the Attack Success Rate (ASR) to less than 8%, and in some cases to nearly 2%, for both text-based and vision-text-based attacks, while preserving utility. Moreover, our experiments show that unlearning with a multi-modal dataset offers no additional benefit but incurs significantly higher computational demands, possibly up to 6 times higher.
Submitted 27 May, 2024;
originally announced June 2024.
-
Representing Animatable Avatar via Factorized Neural Fields
Authors:
Chunjin Song,
Zhijie Wu,
Bastian Wandt,
Leonid Sigal,
Helge Rhodin
Abstract:
For reconstructing high-fidelity 3D human models from monocular videos, it is crucial to maintain consistent large-scale body shapes along with finely matched subtle wrinkles. This paper explores the observation that per-frame rendering results can be factorized into a pose-independent component and a corresponding pose-dependent counterpart to facilitate frame consistency. Pose-adaptive textures can be further improved by restricting the frequency bands of these two components. Specifically, pose-independent outputs are expected to be low-frequency, while high-frequency information is linked to pose-dependent factors. We achieve coherent preservation of both coarse body contours across the entire input video and fine-grained, time-varying texture features with a dual-branch network whose branches capture distinct frequency components. The first branch takes coordinates in canonical space as input, while the second branch additionally considers features output by the first branch and the pose information of each frame. Our network integrates the information predicted by both branches and uses volume rendering to generate photo-realistic 3D human images. Through experiments, we demonstrate that our network surpasses state-of-the-art neural radiance field (NeRF)-based methods in preserving high-frequency details and ensuring consistent body contours.
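The abstract does not specify how the frequency bands of the two branches are restricted. One common mechanism, assumed here purely for illustration, is a NeRF-style Fourier positional encoding that gives the pose-independent branch fewer frequency bands than the pose-dependent one. The band counts and constant names below are hypothetical.

```python
import math
from typing import List, Sequence


def fourier_features(coords: Sequence[float], num_bands: int) -> List[float]:
    """NeRF-style encoding: raw coordinates, then sin/cos at frequencies 2^k * pi for k < num_bands.

    Fewer bands means a downstream MLP sees only low-frequency variation of the input.
    """
    features = list(coords)
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi
        features.extend(math.sin(freq * c) for c in coords)
        features.extend(math.cos(freq * c) for c in coords)
    return features


# Hypothetical band budgets for the two branches
LOW_FREQ_BANDS = 4    # pose-independent branch (canonical coordinates)
HIGH_FREQ_BANDS = 10  # pose-dependent branch

point = [0.1, -0.3, 0.7]
low = fourier_features(point, LOW_FREQ_BANDS)
high = fourier_features(point, HIGH_FREQ_BANDS)
print(len(low), len(high))  # → 27 63
```

With this layout the low-frequency features are an exact prefix of the high-frequency ones, so the band limit acts as a simple cap on the highest frequency each branch can represent.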
Submitted 2 June, 2024;
originally announced June 2024.
-
Supervised Contrastive Learning for Snapshot Spectral Imaging Face Anti-Spoofing
Authors:
Chuanbiao Song,
Yan Hong,
Jun Lan,
Huijia Zhu,
Weiqiang Wang,
Jianfu Zhang
Abstract:
This study presents a re-balanced contrastive learning strategy aimed at strengthening face anti-spoofing in facial recognition systems, with a focus on countering printed photos and highly realistic silicone or latex masks. Leveraging the HySpeFAS dataset, which uses Snapshot Spectral Imaging technology to provide hyperspectral images, our approach combines class-level contrastive learning with data resampling and a novel real-face-oriented reweighting technique. This method effectively mitigates dataset imbalance and reduces identity-related biases. Notably, our strategy achieved an Average Classification Error Rate (ACER) of 0.0000% on the HySpeFAS dataset, ranking first in the Chalearn Snapshot Spectral Imaging Face Anti-spoofing Challenge at CVPR 2024.
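Class-level contrastive learning of this kind is commonly built on the supervised contrastive (SupCon) loss. The sketch below implements that standard loss in plain Python and adds an optional per-anchor weight as a hypothetical hook for reweighting (for example, up-weighting real-face anchors); it is not the paper's re-balanced objective, and the names and embeddings are illustrative.

```python
import math
from typing import List, Optional, Sequence


def _l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,
    anchor_weights: Optional[Sequence[float]] = None,  # hypothetical reweighting hook
) -> float:
    """Supervised contrastive (SupCon) loss over one batch.

    For each anchor i with at least one same-label sample, average
    -log softmax(z_i . z_p / T) over its positives p, where the softmax runs over
    all other samples in the batch. Anchor losses are combined as a weighted mean.
    """
    z = [_l2_normalize(e) for e in embeddings]
    n = len(z)
    weights = [1.0] * n if anchor_weights is None else list(anchor_weights)
    total, weight_sum = 0.0, 0.0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp of the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        anchor_loss = -sum(logits[p] - log_denom for p in positives) / len(positives)
        total += weights[i] * anchor_loss
        weight_sum += weights[i]
    return total / weight_sum if weight_sum > 0 else 0.0


# Hypothetical 2-D embeddings: two real faces (label 0) and two spoofs (label 1)
emb = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]]
print(supcon_loss(emb, [0, 0, 1, 1]) < supcon_loss(emb, [0, 1, 0, 1]))  # → True
```

Embeddings that cluster by class yield a lower loss than the same embeddings under mismatched labels, which is the signal the training objective exploits.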
Submitted 29 May, 2024;
originally announced May 2024.
-
Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation
Authors:
Zhoujie Fu,
Jiacheng Wei,
Wenhao Shen,
Chaoyue Song,
Xiaofeng Yang,
Fayao Liu,
Xulei Yang,
Guosheng Lin
Abstract:
In this work, we introduce a novel approach for creating controllable dynamics in 3D-generated Gaussians using casually captured reference videos. Our method transfers the motion of objects from reference videos to a variety of generated 3D Gaussians across different categories, ensuring precise and customizable motion transfer. We achieve this by employing blend skinning-based non-parametric shape reconstruction to extract the shape and motion of reference objects. This process involves segmenting the reference objects into motion-related parts based on skinning weights and establishing shape correspondences with generated target shapes. To address shape and temporal inconsistencies prevalent in existing methods, we integrate physical simulation, driving the target shapes with matched motion. This integration is optimized through a displacement loss to ensure reliable and genuine dynamics. Our approach supports diverse reference inputs, including humans, quadrupeds, and articulated objects, and can generate dynamics of arbitrary length, providing enhanced fidelity and applicability. Unlike methods heavily reliant on diffusion video generation models, our technique offers specific and high-quality motion transfer, maintaining both shape integrity and temporal consistency.
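The abstract names blend skinning as the representation of shape and motion, with motion-related parts segmented by skinning weights. As a minimal sketch, assuming standard linear blend skinning (the paper may use a different skinning variant), each vertex is deformed by a weighted sum of per-bone rigid transforms, and a part label is read off as the bone with the largest weight. All names, bones, and values are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # 3x3 rotation, row-major


def _apply(R: Mat3, t: Vec3, v: Vec3) -> Vec3:
    """Rigid transform R @ v + t."""
    return tuple(sum(R[r][c] * v[c] for c in range(3)) + t[r] for r in range(3))


def linear_blend_skinning(
    vertices: Sequence[Vec3],
    weights: Sequence[Sequence[float]],
    rotations: Sequence[Mat3],
    translations: Sequence[Vec3],
) -> List[Vec3]:
    """Deform each vertex as v' = sum_b w_b * (R_b v + t_b).

    weights[i][b] is the skinning weight of vertex i for bone b (rows should sum to 1).
    """
    out: List[Vec3] = []
    for v, w in zip(vertices, weights):
        acc = [0.0, 0.0, 0.0]
        for w_b, R, t in zip(w, rotations, translations):
            moved = _apply(R, t, v)
            for k in range(3):
                acc[k] += w_b * moved[k]
        out.append((acc[0], acc[1], acc[2]))
    return out


def segment_by_weights(weights: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical part segmentation: assign each vertex to its highest-weight bone."""
    return [max(range(len(w)), key=lambda b: w[b]) for w in weights]


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
skin_w = [[1.0, 0.0], [0.5, 0.5]]
# Hypothetical pose: bone 0 stays put; bone 1 is lifted by 1 unit along z.
deformed = linear_blend_skinning(
    verts, skin_w, [IDENTITY, IDENTITY], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
)
print(deformed)  # → [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
print(segment_by_weights([[0.9, 0.1], [0.2, 0.8]]))  # → [0, 1]
```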
Submitted 6 June, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs
Authors:
Chenxi Sun,
Hongzhi Zhang,
Zijia Lin,
Jingyuan Zhang,
Fuzheng Zhang,
Zhongyuan Wang,
Bin Chen,
Chengru Song,
Di Zhang,
Kun Gai,
Deyi Xiong
Abstract:
Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for real-time applications. This paper introduces Lexical Unit Decoding (LUD), a novel decoding methodology implemented in a data-driven manner that accelerates decoding without sacrificing output quality. The core of our approach is the observation that a pre-trained language model can confidently predict multiple contiguous tokens; these form the basis of a lexical unit, whose contiguous tokens can be decoded in parallel. Extensive experiments validate that our method substantially reduces decoding time while maintaining generation quality: a 33% speedup on natural language generation with no quality loss, and a 30% speedup on code generation with a negligible quality loss of 3%. Distinctively, LUD requires no auxiliary models and no changes to existing architectures. It can also be integrated with other decoding acceleration methods to achieve an even more pronounced inference efficiency boost. We posit that the foundational principles of LUD could define a new decoding paradigm for future language models, broadening the range of applications they can serve. All code is publicly available at https://github.com/tjunlp-lab/Lexical-Unit-Decoding-LUD-.
Keywords: Parallel Decoding, Lexical Unit Decoding, Large Language Model
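The central idea, committing a whole run of contiguous tokens in one step when the model is confident about every one of them, can be illustrated with a simple acceptance rule. This is a minimal sketch, not the authors' implementation: the threshold `tau`, the function name, and the assumption that the model supplies one confidence score per upcoming candidate position are all hypothetical.

```python
from typing import List, Tuple


def accept_lexical_unit(tokens: List[str], confidences: List[float],
                        tau: float = 0.9) -> Tuple[List[str], int]:
    """Return the longest prefix of candidate tokens to commit in one step.

    The first token is always accepted, as in ordinary one-token-at-a-time
    decoding, so generation never stalls. Each further token is accepted only
    while its confidence stays at or above the (hypothetical) threshold tau.
    """
    if not tokens or len(tokens) != len(confidences):
        raise ValueError("need one confidence score per candidate token")
    n = 1  # fallback: behave like standard autoregressive decoding
    while n < len(tokens) and confidences[n] >= tau:
        n += 1
    return tokens[:n], n


unit, n = accept_lexical_unit(["New", " York", " City", " is"],
                              [0.97, 0.95, 0.93, 0.41])
print(unit, n)  # ['New', ' York', ' City'] 3
```

Under this rule, a confident multi-token unit such as a named entity is committed after a single forward pass instead of three, which is where a decoding speedup would come from. Low-confidence positions fall back to one token per step.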
Submitted 24 May, 2024;
originally announced May 2024.
-
HemSeg-200: A Voxel-Annotated Dataset for Intracerebral Hemorrhages Segmentation in Brain CT Scans
Authors:
Changwei Song,
Qing Zhao,
Jianqiang Li,
Xin Yue,
Ruoyun Gao,
Zhaoxuan Wang,
An Gao,
Guanghui Fu
Abstract:
Acute intracerebral hemorrhage is a life-threatening condition that demands immediate medical intervention. Intraparenchymal hemorrhage (IPH) and intraventricular hemorrhage (IVH) are critical subtypes of this condition. Clinically, when such hemorrhages are suspected, immediate CT scanning is essential to assess the extent of the bleeding and to facilitate the formulation of a targeted treatment plan. While current research in deep learning has largely focused on qualitative analyses, such as identifying subtypes of cerebral hemorrhages, there remains a significant gap in the quantitative analysis that is crucial for enhancing clinical treatment. Addressing this gap, our paper introduces a dataset comprising 222 CT annotations, sourced from the RSNA 2019 Brain CT Hemorrhage Challenge and meticulously annotated at the voxel level for precise IPH and IVH segmentation. This dataset was used to train and evaluate seven advanced medical image segmentation algorithms, with the goal of refining the segmentation accuracy for these hemorrhages. Our findings demonstrate that this dataset not only furthers the development of sophisticated segmentation algorithms but also substantially aids scientific research and clinical practice by improving the diagnosis and management of these severe hemorrhages. Our dataset and code are available at \url{https://github.com/songchangwei/3DCT-SD-IVH-ICH}.
Submitted 23 May, 2024;
originally announced May 2024.
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct the primary information of cosmic-ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic-ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, the stability of reconstructed parameters, and the performance of the array based on observations of the Crab Nebula and the Moon shadow. This paper introduces the control system and its application to the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. The results obtained with the two methods, one based on the Moon shadow and the other on the Crab Nebula, are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time-averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
The Red Supergiant Progenitor of Type II Supernova 2024ggi
Authors:
Danfeng Xiang,
Jun Mo,
Xiaofeng Wang,
Lingzhi Wang,
Jujia Zhang,
Han Lin,
Liyang Chen,
Cuiying Song,
Liang-Duan Liu,
Zhenyu Wang,
Gaici Li
Abstract:
We present a detailed analysis of the progenitor and its local environment for the recently discovered type II supernova (SN) 2024ggi at a distance of about 6.7~Mpc, using pre-explosion images from the Hubble Space Telescope (HST) and the \textit{Spitzer} Space Telescope. The progenitor is identified as a red, bright variable star, with absolute $F814W$-band magnitudes of $-$6.2 mag in 1995 and $-$7.2 mag in 2003, consistent with a normal red supergiant (RSG) star. From the historical mid-infrared light curves, a pulsational period of about 379~days can be inferred for the progenitor star. Fitting its spectral energy distribution with stellar spectral models yields a temperature, radius, and bolometric luminosity of $T_*=3290_{-27}^{+19}$~K, $R_*=887_{-51}^{+60}$~R$_{\odot}$, and log($L$/L$_{\odot}$)$=4.92_{-0.04}^{+0.05}$, respectively. These parameters indicate that the progenitor of SN 2024ggi is consistent with the stellar evolutionary track of a solar-metallicity massive star with an initial mass of $13_{-1}^{+1}$~M$_{\odot}$. Moreover, our analysis indicates a relatively low mass-loss rate (i.e., $< 3\times10^{-6}$~M$_{\odot}$~yr$^{-1}$) for the progenitor compared to that inferred from the flash spectra and X-ray detection (i.e., $10^{-5}$ to $10^{-2}$~M$_{\odot}$~yr$^{-1}$), implying a significant enhancement in mass loss within a few years prior to the explosion.
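As an arithmetic consistency check on the quoted best-fit parameters, the Stefan-Boltzmann law in solar units, $L/L_{\odot} = (R/R_{\odot})^2 (T/T_{\odot})^4$, should reproduce the reported luminosity from the reported temperature and radius. The sketch below assumes a solar effective temperature of 5772 K (the IAU nominal value); it checks the numbers only and is not part of the authors' SED fitting.

```python
import math

T_SUN_K = 5772.0  # assumed solar effective temperature (IAU 2015 nominal value), kelvin


def log_luminosity_solar(radius_rsun: float, teff_k: float) -> float:
    """log10(L/Lsun) from the Stefan-Boltzmann law, L proportional to R^2 T^4."""
    return math.log10(radius_rsun ** 2 * (teff_k / T_SUN_K) ** 4)


# Best-fit values quoted above: T = 3290 K, R = 887 Rsun.
print(round(log_luminosity_solar(887.0, 3290.0), 2))  # 4.92
```

The result agrees with the quoted log($L$/L$_{\odot}$) $= 4.92$, so the three fitted quantities are mutually consistent.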
Submitted 13 May, 2024;
originally announced May 2024.
-
Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variability analysis shows an indication of variability on timescales of a few months in the TeV band, which is consistent with low-frequency observations. Based on these observations, we report the detection of TeV $\gamma$-ray emission from the low-luminosity AGN NGC 4278. The LHAASO-WCDA observations during the active period have a significance level of 8.8\,$\sigma$, with a best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula flux. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.
Submitted 13 May, 2024;
originally announced May 2024.
-
Self-supervised Gait-based Emotion Representation Learning from Selective Strongly Augmented Skeleton Sequences
Authors:
Cheng Song,
Lu Lu,
Zhen Ke,
Long Gao,
Shuai Ding
Abstract:
Emotion recognition is an important part of affective computing. Extracting emotional cues from human gaits offers benefits such as natural interaction, nonintrusiveness, and remote detection. Recently, self-supervised learning techniques have offered a practical solution to the scarcity of labeled data in gait-based emotion recognition. However, due to the limited diversity of gaits and the incompleteness of feature representations for skeletons, existing contrastive learning methods are usually inefficient at learning gait emotion representations. In this paper, we propose a contrastive learning framework utilizing selective strong augmentation (SSA) for self-supervised gait-based emotion representation, which aims to derive effective representations from limited labeled gait data. First, we propose an SSA method for the gait emotion recognition task, which includes upper body jitter and random spatiotemporal masking. The goal of SSA is to generate more diverse and targeted positive samples and prompt the model to learn more distinctive and robust feature representations. Then, we design a complementary feature fusion network (CFFN) that integrates cross-domain information to acquire topological structural and global adaptive features. Finally, we apply a distributional divergence minimization loss to supervise the representation learning of the generally and strongly augmented queries. Our approach is validated on the Emotion-Gait (E-Gait) and Emilya datasets and outperforms state-of-the-art methods under different evaluation protocols.
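The random spatiotemporal masking named above can be illustrated by zeroing out a random subset of frames (temporal mask) and a random subset of joints across all frames (spatial mask). This is a minimal sketch under stated assumptions: the masking ratios, the zero fill value, the frames x joints x coordinates layout, and the function name are hypothetical, and the paper's full SSA procedure (including upper body jitter) is not reproduced.

```python
import random
from typing import List, Optional

# Assumed layout of one skeleton sequence: frames x joints x (x, y, z).
SkeletonSeq = List[List[List[float]]]


def random_spatiotemporal_mask(seq: SkeletonSeq, frame_ratio: float = 0.2,
                               joint_ratio: float = 0.2,
                               rng: Optional[random.Random] = None) -> SkeletonSeq:
    """Return a masked copy of seq; the input sequence is left unchanged."""
    if rng is None:
        rng = random.Random()
    n_frames, n_joints = len(seq), len(seq[0])
    # Temporal mask: whole frames are hidden.
    masked_frames = set(rng.sample(range(n_frames), int(frame_ratio * n_frames)))
    # Spatial mask: the same joints are hidden in every frame.
    masked_joints = set(rng.sample(range(n_joints), int(joint_ratio * n_joints)))
    out: SkeletonSeq = []
    for t, frame in enumerate(seq):
        new_frame = []
        for j, coords in enumerate(frame):
            if t in masked_frames or j in masked_joints:
                new_frame.append([0.0] * len(coords))
            else:
                new_frame.append(list(coords))
        out.append(new_frame)
    return out


seq = [[[1.0, 1.0, 1.0] for _ in range(16)] for _ in range(10)]
masked = random_spatiotemporal_mask(seq, rng=random.Random(0))
```

With 10 frames, 16 joints, and both ratios at 0.2, two whole frames and three joints are hidden. The masked copy would serve as a strongly augmented positive view in the contrastive setup.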
Submitted 8 May, 2024;
originally announced May 2024.
-
Fine-grained Speech Sentiment Analysis in Chinese Psychological Support Hotlines Based on Large-scale Pre-trained Model
Authors:
Zhonglong Chen,
Changwei Song,
Yining Chen,
Jianqiang Li,
Guanghui Fu,
Yongsheng Tong,
Qing Zhao
Abstract:
Suicide and suicidal behaviors remain significant challenges for public policy and healthcare. In response, psychological support hotlines have been established worldwide to provide immediate help to individuals in mental crises. The effectiveness of these hotlines largely depends on accurately identifying callers' emotional states, particularly underlying negative emotions indicative of increased suicide risk. However, the high demand for psychological interventions often results in a shortage of professional operators, highlighting the need for an effective speech emotion recognition model. Such a model would automatically detect and analyze callers' emotions, facilitating integration into hotline services. Additionally, it would enable large-scale data analysis of psychological support hotline interactions to explore psychological phenomena and behaviors across populations. Our study utilizes data from the Beijing psychological support hotline, the largest suicide hotline in China. We analyzed speech data from 105 callers, comprising 20,630 segments, and categorized them into 11 types of negative emotions. We developed a negative emotion recognition model and a fine-grained multi-label classification model using a large-scale pre-trained model. Our experiments indicate that the negative emotion recognition model achieves a maximum F1-score of 76.96%. However, it shows limited efficacy in the fine-grained multi-label classification task, with the best model achieving only a 41.74% weighted F1-score. We conducted an error analysis for this task, discussed potential future improvements, and considered the clinical application possibilities of our study. All code is publicly available.
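The weighted F1-score reported for the fine-grained task averages per-label F1 scores weighted by each label's support (its number of true instances), so frequent emotion labels dominate the average and rare ones contribute little. The sketch below follows that common definition (the same one behind scikit-learn's `average="weighted"`) for multi-label data. It is not the authors' evaluation code, and the emotion labels in the example are hypothetical.

```python
from typing import List, Set


def weighted_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Support-weighted mean of per-label F1 for multi-label predictions."""
    # Labels that never occur in the ground truth have zero support, so they
    # would carry zero weight anyway and can be skipped.
    labels = set().union(*y_true)
    total_support, score = 0, 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(label in t and label in p for t, p in pairs)
        fp = sum(label not in t and label in p for t, p in pairs)
        fn = sum(label in t and label not in p for t, p in pairs)
        support = tp + fn
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        score += support * f1
        total_support += support
    return score / total_support if total_support else 0.0


y_true = [{"anxiety"}, {"anxiety", "sadness"}, {"anger"}]
y_pred = [{"anxiety"}, {"sadness"}, {"anxiety"}]
print(weighted_f1(y_true, y_pred))  # 0.5
```

Here "anxiety" (support 2) has F1 0.5, "sadness" (support 1) has F1 1.0, and "anger" (support 1) has F1 0.0, which gives (2 x 0.5 + 1 x 1.0 + 1 x 0.0) / 4 = 0.5.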
Submitted 7 May, 2024;
originally announced May 2024.
-
DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands
Authors:
Hwayong Nam,
Seungmin Baek,
Minbok Wi,
Michael Jaemin Kim,
Jaehyun Park,
Chihun Song,
Nam Sung Kim,
Jung Ho Ahn
Abstract:
The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerabilities. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. Previous studies have also attempted to understand DRAM microarchitectures and the associated behaviors, but we have found some of their results to be skewed by inaccurate address mapping and internal data swizzling, or by the lack of a deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools that can be cross-validated: AIBs, retention time tests, and RowCopy. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on a microscopic view of the DRAM microarchitecture, such as the 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.
Submitted 3 May, 2024;
originally announced May 2024.
-
Performance of Superconducting Resonators Suspended on SiN Membranes
Authors:
Trevor Chistolini,
Kyunghoon Lee,
Archan Banerjee,
Mohammed Alghadeer,
Christian Jünger,
M. Virginia P. Altoé,
Chengyu Song,
Sudi Chen,
Feng Wang,
David I. Santiago,
Irfan Siddiqi
Abstract:
Correlated errors in superconducting circuits due to nonequilibrium quasiparticles are a notable concern in efforts to achieve fault-tolerant quantum computing. The propagation of the quasiparticles causing these correlated errors can potentially be mediated by phonons in the substrate. Therefore, methods that decouple devices from the substrate, such as isolating devices atop SiN membranes, are possible solutions. In this work, we validate the compatibility of SiN membrane technology with high-quality superconducting circuits, adding the technique to the community's fabrication toolbox. We do so by fabricating superconducting coplanar waveguide resonators entirely atop a thin ($\sim$110 nm) SiN layer from beneath which the supporting bulk Si has been etched away, yielding a suspended membrane whose ratio of shortest lateral dimension to thickness is approximately $7.4 \times 10^3$. We compare these membrane resonators to on-substrate resonators on the same chip, finding similar internal quality factors of $\sim$$10^5$ at single-photon levels. Furthermore, we confirm that these membranes do not adversely affect the resonator thermalization rate. With these important benchmarks validated, this technique can be extended to qubits.
Submitted 2 May, 2024;
originally announced May 2024.
-
Nonlinear Superconducting Magnetoelectric Effect
Authors:
Jin-Xin Hu,
Oles Matsyshyn,
Justin C. W. Song
Abstract:
A supercurrent flow can induce a nonvanishing spin magnetization in noncentrosymmetric superconductors with spin-orbit interaction. This effect, often known as the non-dissipative magnetoelectric effect, is most commonly found at linear order in the supercurrent. Here, we argue that a nonlinear superconducting magnetoelectric effect (NSM) can naturally manifest in altermagnet/superconductor (ALM/SC) heterostructures: NSM manifests as a spin polarization generated as a second-order response to a driving supercurrent. Strikingly, we find that NSM is the leading-order magnetization response in ALM/SC heterostructures and survives even in the presence of centrosymmetry: $C_4 \mathcal{T}$ symmetry in altermagnets zeroes both the equilibrium magnetization and the out-of-plane linear magnetoelectric response. This renders NSM a powerful electric and non-dissipative means of controlling magnetization in ALM/SC heterostructures, a promising platform for superconducting spintronics.
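The distinction between the linear and nonlinear effects can be made explicit by expanding the induced spin polarization $\delta\mathbf{S}$ in powers of the supercurrent density $\mathbf{J}$. The response tensors $\alpha_{ab}$ and $\chi_{abc}$ are generic notation introduced here for illustration, not the paper's own symbols.

```latex
\delta S_a = \alpha_{ab}\, J_b + \chi_{abc}\, J_b J_c + \mathcal{O}(J^3)
```

Under inversion, $\mathbf{J}$ is odd while the spin polarization, an axial vector, is even, so $\alpha_{ab}$ must vanish in a centrosymmetric system. The product $J_b J_c$ is inversion-even, however, so $\chi_{abc}$ remains allowed. Under time reversal, both $\mathbf{S}$ and $\mathbf{J}$ are odd, so $\chi_{abc}$ must itself be time-reversal odd, which requires a time-reversal-breaking ingredient such as the altermagnet.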
Submitted 13 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Simulating unsteady fluid flows on a superconducting quantum processor
Authors:
Zhaoyuan Meng,
Jiarun Zhong,
Shibo Xu,
Ke Wang,
Jiachen Chen,
Feitong Jin,
Xuhao Zhu,
Yu Gao,
Yaozu Wu,
Chuanyu Zhang,
Ning Wang,
Yiren Zou,
Aosai Zhang,
Zhengyi Cui,
Fanhao Shen,
Zehang Bao,
Zitian Zhu,
Ziqi Tan,
Tingting Li,
Pengfei Zhang,
Shiying Xiong,
Hekang Li,
Qiujiang Guo,
Zhen Wang,
Chao Song
, et al. (2 additional authors not shown)
Abstract:
Recent advancements in intermediate-scale quantum processors have triggered tremendous interest in the exploration of practical quantum advantage. The simulation of fluid dynamics, a highly challenging problem in classical physics but vital for practical applications, emerges as a good candidate for demonstrating quantum utility. Here, we report an experiment on the digital simulation of unsteady flows, which consists of quantum encoding, evolution, and detection of flow states, with a superconducting quantum processor. The quantum algorithm is based on Hamiltonian simulation using the hydrodynamic formulation of the Schrödinger equation. With median fidelities of 99.97% and 99.67% for parallel single- and two-qubit gates, respectively, we simulate the dynamics of a two-dimensional (2D) compressible diverging flow and a 2D decaying vortex with ten qubits. The experimental results capture well the temporal evolution of the averaged density and momentum profiles, and qualitatively reproduce the spatial flow fields with moderate noise. This work demonstrates the potential of quantum computing for simulating more complex flows, such as turbulence, in practical applications.
Submitted 24 April, 2024;
originally announced April 2024.
-
Complete CP-eigen Bases of Meson-Baryon Chiral Lagrangian up to $p^5$-order
Authors:
Chuan-Qiang Song,
Hao Sun,
Jiang-Hao Yu
Abstract:
Chiral perturbation theory describes the low-energy dynamics of mesons and baryons in terms of the nonlinear Goldstone boson and fermion degrees of freedom. Using the Young tensor technique, we construct the on-shell operator bases for the meson-baryon system up to $p^5$-order, using chiral dimension power counting and the heavy baryon expansion. For the Lorentz structure, off-shell external sources and operators with higher derivatives require additional treatment, while for the internal structure, the invariant tensor basis is equivalently converted into the trace basis, and Cayley-Hamilton relations are utilized to classify the different CP eigen-operators. Finally, we present the complete sets of $C$+$P$+, $C$+$P$-, $C$-$P$+, and $C$-$P$- eigen-operators at $p^5$-order, and obtain the operator counting from the Hilbert series.
Submitted 23 April, 2024;
originally announced April 2024.
-
Camera Agnostic Two-Head Network for Ego-Lane Inference
Authors:
Chaehyeon Song,
Sungho Yoon,
Minhyeok Heo,
Ayoung Kim,
Sujung Kim
Abstract:
Vision-based ego-lane inference using High-Definition (HD) maps is essential in autonomous driving and advanced driver assistance systems. The traditional approach requires well-calibrated cameras, which restricts the range of usable camera configurations, because the algorithm relies on intrinsic and extrinsic calibration. In this paper, we propose a learning-based ego-lane inference method that directly estimates the ego-lane index from a single image. To enhance robustness, our model incorporates a two-head structure that infers the ego-lane from two perspectives simultaneously. Furthermore, we utilize an attention mechanism guided by vanishing points and lines to adapt to changes in viewpoint without requiring accurate calibration. The high adaptability of our model was validated across diverse environments, devices, and camera mounting points and orientations.
Submitted 19 April, 2024;
originally announced April 2024.
-
AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts
Authors:
Meng Jiang,
Yi Jing Yu,
Qing Zhao,
Jianqiang Li,
Changwei Song,
Hongzhi Qi,
Wei Zhai,
Dan Luo,
Xiaoqin Wang,
Guanghui Fu,
Bing Xiang Yang
Abstract:
Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it requires precise identification of cognitive pathways to be successfully implemented in patient care. Today, individuals frequently express negative emotions about specific topics on social media, often exhibiting cognitive distortions, including suicidal behaviors in extreme cases. Yet, there is a notable absence of methodologies for analyzing cognitive pathways that could aid psychotherapists in conducting effective interventions online. In this study, we gathered data from social media and established the task of extracting cognitive pathways, annotating the data based on a cognitive theoretical framework. We first framed the extraction of cognitive pathways as a hierarchical text classification task with four main categories and nineteen subcategories. Following this, we structured a text summarization task to help psychotherapists quickly grasp the essential information. Our experiments evaluate the performance of deep learning and large language models (LLMs) on these tasks. The results demonstrate that our deep learning method achieved a micro-F1 score of 62.34% in the hierarchical text classification task. Meanwhile, in the text summarization task, GPT-4 attained a ROUGE-1 score of 54.92 and a ROUGE-2 score of 30.86, surpassing the performance of the deep learning model tested. However, GPT-4 may suffer from hallucination. We have made all models and code publicly available to support further research in this field.
Submitted 17 April, 2024;
originally announced April 2024.
-
REACTO: Reconstructing Articulated Objects from a Single Video
Authors:
Chaoyue Song,
Jiacheng Wei,
Chuan-Sheng Foo,
Guosheng Lin,
Fayao Liu
Abstract:
In this paper, we address the challenge of reconstructing general articulated 3D objects from a single video. Existing works employing dynamic neural radiance fields have advanced the modeling of articulated objects like humans and animals from videos, but face challenges with piece-wise rigid general articulated objects due to limitations in their deformation models. To tackle this, we propose Quasi-Rigid Blend Skinning, a novel deformation model that enhances the rigidity of each part while maintaining flexible deformation of the joints. Our primary insight combines three distinct approaches: 1) an enhanced bone rigging system for improved component modeling, 2) the use of quasi-sparse skinning weights to boost part rigidity and reconstruction fidelity, and 3) the application of geodesic point assignment for precise motion and seamless deformation. Our method outperforms previous works in producing higher-fidelity 3D reconstructions of general articulated objects, as demonstrated on both real and synthetic datasets. Project page: https://chaoyuesong.github.io/REACTO.
Submitted 17 April, 2024;
originally announced April 2024.
-
FedDistill: Global Model Distillation for Local Model De-Biasing in Non-IID Federated Learning
Authors:
Changlin Song,
Divya Saxena,
Jiannong Cao,
Yuqing Zhao
Abstract:
Federated Learning (FL) is a novel approach that enables collaborative machine learning while preserving data privacy by leveraging models trained on decentralized devices. However, FL faces challenges due to non-independent and identically distributed (non-iid) data across clients, which degrades model performance and generalization. To tackle the non-iid issue, recent efforts have used the global model as a teaching mechanism for local models. However, our pilot study shows that their effectiveness is constrained by imbalanced data distribution, which induces biases in local models and leads to a 'local forgetting' phenomenon, where the ability of models to generalize degrades over time, particularly for underrepresented classes. This paper introduces FedDistill, a framework that enhances knowledge transfer from the global model to local models, focusing on the issue of imbalanced class distribution. Specifically, FedDistill employs group distillation, segmenting classes based on their frequency in local datasets so that the distillation process focuses on classes with fewer samples. Additionally, FedDistill dissects the global model into a feature extractor and a classifier. This separation empowers local models with more generalized data representation capabilities and ensures more accurate classification across all classes. FedDistill mitigates the adverse effects of data imbalance, ensuring that local models do not forget underrepresented classes but instead become more adept at recognizing and classifying them accurately. Our comprehensive experiments demonstrate FedDistill's effectiveness, surpassing existing baselines in accuracy and convergence speed across several benchmark datasets.
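Group distillation begins by partitioning classes according to how often they appear in a client's local data. The sketch below shows one way to form that partition. The two-group split, the cutoff rule (a class counts as minority when its local count falls below a fraction of the mean per-class count), and the function name are hypothetical choices for illustration, since the abstract does not specify how the groups are formed.

```python
from collections import Counter
from typing import Dict, Iterable, List


def group_classes_by_frequency(local_labels: Iterable[int], num_classes: int,
                               minority_fraction: float = 0.5) -> Dict[str, List[int]]:
    """Split one client's classes into 'majority' and 'minority' groups.

    Hypothetical rule: a class is minority when its local count is below
    minority_fraction times the mean count over all classes. Classes absent
    from the client's data count as zero and land in the minority group.
    """
    counts = Counter(local_labels)
    mean_count = sum(counts.values()) / num_classes
    groups: Dict[str, List[int]] = {"majority": [], "minority": []}
    for c in range(num_classes):
        is_minority = counts.get(c, 0) < minority_fraction * mean_count
        groups["minority" if is_minority else "majority"].append(c)
    return groups


labels = [0] * 50 + [1] * 40 + [2] * 5  # class 3 never appears locally
print(group_classes_by_frequency(labels, num_classes=4))
# {'majority': [0, 1], 'minority': [2, 3]}
```

A distillation loss could then be computed separately for each group so that the soft targets for rare classes are not swamped by those of frequent classes. That is the intuition behind a focused distillation, though the paper's exact loss is not reproduced here.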
Submitted 14 April, 2024;
originally announced April 2024.
-
GNN-based Probabilistic Supply and Inventory Predictions in Supply Chain Networks
Authors:
Hyung-il Ahn,
Young Chol Song,
Santiago Olivar,
Hershel Mehta,
Naveen Tewari
Abstract:
Successful supply chain optimization must mitigate imbalances between supply and demand over time. While accurate demand prediction is essential for supply planning, it alone does not suffice. The key to successful supply planning for optimal and viable execution lies in maximizing predictability for both demand and supply throughout an execution horizon. Therefore, enhancing the accuracy of supply predictions is imperative to create an attainable supply plan that matches demand without overstocking or understocking. However, in complex supply chain networks with numerous nodes and edges, accurate supply predictions are challenging due to dynamic node interactions, cascading supply delays, resource availability, and production and logistics capabilities. Consequently, supply executions often deviate from their initial plans. To address this, we present the Graph-based Supply Prediction (GSP) probabilistic model. Our attention-based graph neural network (GNN) model predicts supplies, inventory, and imbalances using graph-structured historical data, demand forecasts, and original supply plan inputs. Experiments using historical data from a global consumer goods company's large-scale supply chain demonstrate that GSP significantly improves supply and inventory prediction accuracy, potentially offering supply plan corrections to optimize executions.
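The quantities GSP predicts are tied together by a stock-flow identity: inventory at the end of a period equals the previous inventory plus the supply received minus the demand served, $I_t = I_{t-1} + S_t - D_t$. The deterministic sketch below applies that identity at a single node. Treating negative inventory as backorders (understock) and the function name are illustrative assumptions, not details from the paper.

```python
from itertools import accumulate
from typing import List


def project_inventory(initial: float, supply: List[float],
                      demand: List[float]) -> List[float]:
    """End-of-period inventory under I_t = I_{t-1} + S_t - D_t.

    Positive values are stock on hand (possible overstock); negative values
    are read as backorders, i.e. unmet demand carried forward (understock).
    """
    if len(supply) != len(demand):
        raise ValueError("supply and demand must cover the same periods")
    net_flows = [s - d for s, d in zip(supply, demand)]
    # accumulate(..., initial=...) prepends the starting inventory; drop it.
    return list(accumulate(net_flows, initial=initial))[1:]


# A delayed shipment (zero supply in period 2) turns into a shortfall.
print(project_inventory(10, supply=[5, 0, 20], demand=[8, 12, 6]))  # [7, -5, 9]
```

A probabilistic model such as GSP would replace the planned supply and demand with predicted distributions. This deterministic version only shows how a single supply delay propagates into a later imbalance.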
Submitted 11 April, 2024;
originally announced April 2024.
-
Generative Probabilistic Planning for Optimizing Supply Chain Networks
Authors:
Hyung-il Ahn,
Santiago Olivar,
Hershel Mehta,
Young Chol Song
Abstract:
Supply chain networks in enterprises are typically composed of complex topological graphs involving various types of nodes and edges, accommodating numerous products with considerable demand and supply variability. However, as supply chain networks expand in size and complexity, traditional supply chain planning methods (e.g., those found in heuristic rule-based and operations research-based systems) tend to become locally optimal or lack computational scalability, resulting in substantial imbalances between supply and demand across nodes in the network. This paper introduces a novel Generative AI technique, which we call Generative Probabilistic Planning (GPP). GPP generates dynamic supply action plans that are globally optimized across all network nodes over the time horizon for changing objectives like maximizing profits or service levels, factoring in time-varying probabilistic demand, lead time, and production conditions. GPP leverages attention-based graph neural networks (GNN), offline deep reinforcement learning (Offline RL), and policy simulations to train generative policy models and create optimal plans through probabilistic simulations, effectively accounting for various uncertainties. Our experiments using historical data from a global consumer goods company with complex supply chain networks demonstrate that GPP accomplishes objective-adaptable, probabilistically resilient, and dynamic planning for supply chain networks, leading to significant improvements in performance and profitability for enterprises. Our work plays a pivotal role in shaping the trajectory of AI adoption within the supply chain domain.
Submitted 11 April, 2024;
originally announced April 2024.
-
Tianyu: search for the second solar system and explore the dynamic universe
Authors:
Fabo Feng,
Yicheng Rui,
Zhimao Du,
Qing Lin,
Congcong Zhang,
Dan Zhou,
Kaiming Cui,
Masahiro Ogihara,
Ming Yang,
Jie Lin,
Yongzhi Cai,
Taozhi Yang,
Xiaoying Pang,
Mingjie Jian,
Wenxiong Li,
Hengxiao Guo,
Xian Shi,
Jianchun Shi,
Jianyang Li,
Kangrou Guo,
Song Yao,
Aming Chen,
Peng Jia,
Xianyu Tan,
James S. Jenkins
, et al. (10 additional authors not shown)
Abstract:
Giant planets like Jupiter and Saturn play important roles in the formation and habitability of Earth-like planets. The detection of solar system analogs that have multiple cold giant planets is essential for our understanding of planet habitability and planet formation. Although transit surveys such as Kepler and TESS have discovered thousands of exoplanets, these missions are not sensitive to long-period planets due to their limited observation baselines. The Tianyu project, comprising two 1-meter telescopes (Tianyu-I and II), is designed to detect transiting cold giant planets in order to find solar system analogs. Featuring a large field of view and equipped with a high-speed CMOS camera, Tianyu-I will perform a high-precision photometric survey of about 100 million stars, measuring light curves at an hourly cadence. The candidates found by Tianyu-I will be confirmed by Tianyu-II and other surveys and follow-up facilities through multi-band photometry, spectroscopy, and high-resolution imaging. The Tianyu telescopes will be situated at an elevation of about 4,000 meters in Lenghu, China. With a photometric precision of 1% for stars with V < 18 mag, Tianyu is expected to find more than 300 transiting exoplanets, including about 12 cold giant planets, over five years. A five-year survey of Tianyu would discover 1-2 solar system analogs. Moreover, Tianyu is also designed for non-exoplanetary exploration, incorporating multiple survey modes covering timescales from sub-seconds to months, with a particular emphasis on events occurring within the sub-second to hour range. It is well suited to observing targets such as infant supernovae, rare variable stars and binaries, tidal disruption events, Be stars, cometary activity, and interstellar objects. These discoveries not only enhance our comprehension of the universe but also offer compelling opportunities for public engagement in scientific exploration.
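Here is a back-of-the-envelope check on why percent-level photometry suits giant-planet transit searches. To first order, the transit depth is $\delta \approx (R_p/R_\star)^2$. The sketch below plugs in the nominal radii of Jupiter and the Sun and ignores limb darkening and grazing geometry. It is only an illustrative estimate and is not taken from the Tianyu survey design.

```python
JUPITER_RADIUS_KM = 69_911  # mean radius of Jupiter
SUN_RADIUS_KM = 695_700     # IAU nominal solar radius

def transit_depth(planet_radius_km, star_radius_km):
    """Fractional flux drop for a central transit, ignoring limb darkening."""
    return (planet_radius_km / star_radius_km) ** 2

depth = transit_depth(JUPITER_RADIUS_KM, SUN_RADIUS_KM)
print(f"Jupiter transiting the Sun: {depth:.2%} dimming")
```

A Jupiter analog crossing a Sun-like star dims it by roughly 1%, the same order as the quoted precision. An Earth-sized planet would produce a signal about two orders of magnitude smaller.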
Submitted 10 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
pfl-research: simulation framework for accelerating research in Private Federated Learning
Authors:
Filip Granqvist,
Congzheng Song,
Áine Cahill,
Rogier van Dalen,
Martin Pelikan,
Yi Sheng Chan,
Xiaojun Feng,
Natarajan Krishnaswami,
Vojta Jina,
Mona Chitnis
Abstract:
Federated learning (FL) is an emerging machine learning (ML) training paradigm where clients own their data and collaborate to train a global model, without revealing any data to the server and other participants. Researchers commonly perform experiments in a simulation environment to quickly iterate on ideas. However, existing open-source tools do not offer the efficiency required to simulate FL on larger and more realistic FL datasets. We introduce pfl-research, a fast, modular, and easy-to-use Python framework for simulating FL. It supports TensorFlow, PyTorch, and non-neural network models, and is tightly integrated with state-of-the-art privacy algorithms. We study the speed of open-source FL frameworks and show that pfl-research is 7-72$\times$ faster than alternative open-source frameworks on common cross-device setups. Such speedup will significantly boost the productivity of the FL research community and enable testing hypotheses on realistic FL datasets that were previously too resource intensive. We release a suite of benchmarks that evaluates an algorithm's overall performance on a diverse set of realistic scenarios. The code is available on GitHub at https://github.com/apple/pfl-research.
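For readers new to FL, the core server-side step that such simulators repeat many times is aggregating client updates. The sketch below is generic federated averaging in plain Python, weighting each client by its number of local examples. It does not use or reproduce the pfl-research API. The optional Gaussian noise is only a stand-in for the privacy mechanisms the framework integrates: it applies no clipping and gives no formal privacy guarantee.

```python
import random

def federated_average(client_updates, client_sizes, noise_std=0.0, rng=None):
    """Weighted average of client model updates, optionally with Gaussian noise.

    client_updates: list of equal-length lists of floats (one per client)
    client_sizes:   number of local training examples per client (used as weights)
    noise_std:      std of Gaussian noise added to the aggregate (0 disables it);
                    illustrative only, since no clipping is applied
    """
    rng = rng or random.Random(0)
    total = sum(client_sizes)
    dim = len(client_updates[0])
    agg = [0.0] * dim
    for update, n in zip(client_updates, client_sizes):
        for i, value in enumerate(update):
            agg[i] += value * n / total
    if noise_std > 0:
        agg = [v + rng.gauss(0.0, noise_std) for v in agg]
    return agg

# Two hypothetical clients: the one with 3x more data pulls the average toward it.
print(federated_average([[1.0, 0.0], [5.0, 4.0]], [3, 1]))  # [2.0, 1.0]
```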
Submitted 9 April, 2024;
originally announced April 2024.
-
LHAASO-KM2A detector simulation using Geant4
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (254 additional authors not shown)
Abstract:
KM2A is one of the main sub-arrays of LHAASO, working on gamma-ray astronomy and cosmic-ray physics at energies above 10 TeV. Detector simulation is an important foundation for estimating detector performance and for data analysis. Simulating the KM2A detector in the framework of Geant4 is a major challenge because numerous photons must be tracked in a large number of detector units (>6000) spread over a large altitude difference (30 m) and a huge area (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A, based on Geant4, is introduced. G4KM2A is optimized mainly for memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least a factor of 30 compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.
Submitted 7 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
On Train-Test Class Overlap and Detection for Image Retrieval
Authors:
Chull Hwan Song,
Jooyoung Yoon,
Taebaek Hwang,
Shunghyun Choi,
Yeong Hyeon Gu,
Yannis Avrithis
Abstract:
How important is it for training and evaluation sets to not have class overlap in image retrieval? We revisit Google Landmarks v2 clean, the most popular training set, by identifying and removing class overlap with Revisited Oxford and Paris [34], the most popular evaluation set. By comparing the original and the new RGLDv2-clean on a benchmark of reproduced state-of-the-art methods, our findings are striking. Not only is there a dramatic drop in performance, but it is inconsistent across methods, changing the ranking. What does it take to focus on objects of interest and ignore background clutter when indexing? Do we need to train an object detector and the representation separately? Do we need location supervision? We introduce Single-stage Detect-to-Retrieve (CiDeR), an end-to-end, single-stage pipeline to detect objects of interest and extract a global image representation. We outperform previous state-of-the-art methods on both existing training sets and the new RGLDv2-clean. Our dataset is available at https://github.com/dealicious-inc/RGLDv2-clean.
Submitted 1 April, 2024;
originally announced April 2024.
-
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
Authors:
Chull Hwan Song,
Taebaek Hwang,
Jooyoung Yoon,
Shunghyun Choi,
Yeong Hyeon Gu
Abstract:
Vision-language models (VLMs) have made significant strides in cross-modal understanding through large-scale paired datasets. However, in the fashion domain, datasets often exhibit a disparity between the information conveyed in images and in text. This issue stems from datasets containing multiple images of a single fashion item, all paired with one text, so that some textual details are not visible in individual images. This mismatch, particularly when non-co-occurring elements are masked, undermines the training of conventional VLM objectives like Masked Language Modeling and Masked Image Modeling, thereby hindering the model's ability to accurately align fine-grained visual and textual features. To address this problem, we propose Synchronized attentional Masking (SyncMask), which generates masks that pinpoint the image patches and word tokens where the information co-occurs in both image and text. This synchronization is accomplished by harnessing cross-attentional features obtained from a momentum model, ensuring a precise alignment between the two modalities. Additionally, we enhance grouped batch sampling with semi-hard negatives, effectively mitigating false negative issues in the Image-Text Matching and Image-Text Contrastive learning objectives within fashion datasets. Our experiments demonstrate the effectiveness of the proposed approach, which outperforms existing methods in three downstream tasks.
Submitted 1 April, 2024;
originally announced April 2024.
-
Electrical-controllable antiferromagnet-based tunnel junction
Authors:
Lei Han,
Xuming Luo,
Yingqian Xu,
Hua Bai,
Wenxuan Zhu,
Yuxiang Zhu,
Guoqiang Yu,
Cheng Song,
Feng Pan
Abstract:
An electrically controllable antiferromagnet tunnel junction is a key goal in spintronics, holding immense promise for ultra-dense and ultra-stable antiferromagnetic memory with high processing speed for modern information technology. Here, we have advanced towards this goal by achieving an electrically controllable antiferromagnet-based tunnel junction of Pt/Co/Pt/Co/IrMn/MgO/Pt. The exchange coupling between antiferromagnetic IrMn and Co/Pt perpendicular magnetic multilayers results in the formation of interfacial exchange bias and an exchange spring in IrMn. Information states 0 and 1 are encoded through the exchange spring in IrMn, which can be electrically written by spin-orbit torque switching with high cyclability and electrically read by antiferromagnetic tunneling anisotropic magnetoresistance. By combining spin-orbit torque switching of both the exchange spring and the exchange bias, 16 Boolean logic operations are successfully demonstrated. With both memory and logic functionalities integrated into our electrically controllable antiferromagnet-based tunnel junction, we chart the course toward high-performance antiferromagnetic logic-in-memory.
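The count of 16 presumably refers to the complete set of two-input Boolean functions. A function of two bits is fixed by its outputs on the four input pairs, so there are $2^{2^2} = 16$ of them. The sketch below enumerates these truth tables and checks that familiar gates such as AND and XOR are among them. It says nothing about how the device realizes each operation.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # (A, B) pairs in order 00, 01, 10, 11

def all_two_input_functions():
    """Enumerate every two-input Boolean function as a truth table {(A, B): output}."""
    # Each function is one choice of outputs for the 4 input pairs: 2**4 = 16 choices.
    return [dict(zip(INPUTS, outputs)) for outputs in product((0, 1), repeat=len(INPUTS))]

funcs = all_two_input_functions()
AND = {(a, b): a & b for a, b in INPUTS}
XOR = {(a, b): a ^ b for a, b in INPUTS}
print(len(funcs), AND in funcs, XOR in funcs)
```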
Submitted 1 April, 2024;
originally announced April 2024.
-
S2RC-GCN: A Spatial-Spectral Reliable Contrastive Graph Convolutional Network for Complex Land Cover Classification Using Hyperspectral Images
Authors:
Renxiang Guan,
Zihao Li,
Chujia Song,
Guo Yu,
Xianju Li,
Ruyi Feng
Abstract:
Spatial correlations between different ground objects are an important feature of mining land cover research. Graph Convolutional Networks (GCNs) can effectively capture such spatial feature representations and have demonstrated promising results in hyperspectral imagery (HSI) classification of complex land cover. However, existing GCN-based HSI classification methods are prone to interference from redundant information when extracting complex features. To classify complex scenes more effectively, this study proposes a novel spatial-spectral reliable contrastive graph convolutional classification framework named S2RC-GCN. Specifically, we fused the spectral and spatial features extracted by 1D and 2D encoders, where the 2D encoder includes an attention model to automatically extract important information. We then leveraged the fused high-level features to construct graphs and fed the resulting graphs into the GCNs to determine more effective graph representations. Furthermore, a novel reliable contrastive graph convolution was proposed for reliable contrastive learning to learn and fuse robust features. Finally, to test the performance of the model on complex object classification, we used imagery taken by Gaofen-5 in the Jiang Xia area to construct complex land cover datasets. The test results show that, compared with other models, our model achieved the best results and effectively improved the classification performance of complex remote sensing imagery.
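For reference, the sketch below shows one layer of the widely used symmetric-normalized GCN, $H' = \mathrm{ReLU}(\hat D^{-1/2}(A + I)\hat D^{-1/2} H W)$, in plain Python. Here $\hat D$ is the degree matrix of $A + I$. This assumes the standard layer form. The abstract does not say which GCN variant S2RC-GCN uses, and the sketch leaves out its contrastive component entirely. The node features and weights are toy values.

```python
import math

def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj:    n x n symmetric 0/1 adjacency matrix (lists of lists)
    feats:  n x f node feature matrix
    weight: f x h weight matrix
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: a_hat[i][j] / sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, then project with the weight matrix.
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n)) for c in range(len(feats[0]))]
           for i in range(n)]
    out = [[sum(agg[i][c] * weight[c][h] for c in range(len(weight)))
            for h in range(len(weight[0]))] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]

# Two connected toy nodes (e.g., neighboring superpixels) with one feature each.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```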
Submitted 1 April, 2024;
originally announced April 2024.
-
Non-Abelian braiding of Fibonacci anyons with a superconducting processor
Authors:
Shibo Xu,
Zheng-Zhi Sun,
Ke Wang,
Hekang Li,
Zitian Zhu,
Hang Dong,
Jinfeng Deng,
Xu Zhang,
Jiachen Chen,
Yaozu Wu,
Chuanyu Zhang,
Feitong Jin,
Xuhao Zhu,
Yu Gao,
Aosai Zhang,
Ning Wang,
Yiren Zou,
Ziqi Tan,
Fanhao Shen,
Jiarun Zhong,
Zehang Bao,
Weikang Li,
Wenjie Jiang,
Li-Wei Yu,
Zixuan Song
, et al. (7 additional authors not shown)
Abstract:
Non-Abelian topological orders offer an intriguing path towards fault-tolerant quantum computation, where information can be encoded and manipulated in a topologically protected manner immune to arbitrary local noise and perturbations. However, realizing non-Abelian topologically ordered states is notoriously challenging in both condensed matter and programmable quantum systems, and it was not until recently that signatures of non-Abelian statistics were observed through digital quantum simulation approaches. Despite this exciting progress, none of these experiments has demonstrated the appropriate type of topological order and associated non-Abelian anyons whose braidings alone support universal quantum computation. Here, we report the realization of non-Abelian topologically ordered states of the Fibonacci string-net model and demonstrate braidings of Fibonacci anyons featuring universal computational power, with a superconducting quantum processor. We exploit efficient quantum circuits to prepare the desired states and verify their nontrivial topological nature by measuring the topological entanglement entropy. In addition, we create two pairs of Fibonacci anyons and demonstrate their fusion rule and non-Abelian braiding statistics by applying unitary gates on the underlying physical qubits. Our results establish a versatile digital approach to exploring exotic non-Abelian topological states and their associated braiding statistics with current noisy intermediate-scale quantum processors.
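Two quantities in this abstract can be computed from textbook Fibonacci data. First, the fusion rule $\tau \times \tau = 1 + \tau$ means the number of ways $n$ $\tau$ anyons can fuse to the vacuum follows the Fibonacci sequence. In the standard encoding, two pairs ($n = 4$) therefore span a two-dimensional space that can hold a qubit. Second, for a Levin-Wen string-net built on string types $\{1, \tau\}$ with quantum dimensions $1$ and $\varphi = (1+\sqrt{5})/2$, the total quantum dimension is $\mathcal{D} = \sum_s d_s^2 = 1 + \varphi^2$. The expected topological entanglement entropy is then $\gamma = \ln \mathcal{D} \approx 1.29$. The sketch below assumes these standard conventions; the paper's own normalization may differ.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # quantum dimension of the tau anyon

def fusion_channels(n):
    """Return (vacuum, tau) counts of fusion paths for n tau anyons, using tau x tau = 1 + tau."""
    vac, tau = 0, 1  # a single tau anyon has total charge tau
    for _ in range(n - 1):
        # Fuse one more tau: 1 x tau -> tau, and tau x tau -> 1 + tau.
        vac, tau = tau, vac + tau
    return vac, tau

for n in range(2, 7):
    print(n, fusion_channels(n))

# Levin-Wen string-net: total quantum dimension D = sum of d_s^2 over string types {1, tau}.
total_dim = 1 + PHI ** 2
gamma = math.log(total_dim)  # expected topological entanglement entropy
print(round(gamma, 4))
```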
Submitted 29 March, 2024;
originally announced April 2024.
-
3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization
Authors:
Yafeng Chen,
Siqi Zheng,
Hui Wang,
Luyao Cheng,
Tinglong Zhu,
Changhe Song,
Rongjie Huang,
Ziyang Ma,
Qian Chen,
Shiliang Zhang,
Xihao Li
Abstract:
This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization. It is designed for the needs of academic researchers and industrial practitioners. The 3D-Speaker-Toolkit adeptly leverages the combined strengths of acoustic, semantic, and visual data, seamlessly fusing these modalities to offer robust speaker recognition capabilities. The acoustic module extracts speaker embeddings from acoustic features, employing both fully-supervised and self-supervised learning approaches. The semantic module leverages advanced language models to apprehend the substance and context of spoken language, thereby augmenting the system's proficiency in distinguishing speakers through linguistic patterns. Finally, the visual module applies image processing technologies to scrutinize facial features, which bolsters the precision of speaker diarization in multi-speaker environments. Collectively, these modules empower the 3D-Speaker-Toolkit to attain elevated levels of accuracy and dependability in executing speaker-related tasks, establishing a new benchmark in multi-modal speaker analysis. The 3D-Speaker project also includes a handful of open-sourced state-of-the-art models and a large dataset containing over 10,000 speakers. The toolkit is publicly available at https://github.com/alibaba-damo-academy/3D-Speaker.
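Speaker verification back ends commonly compare two speaker embeddings with cosine similarity and accept a trial when the score exceeds a threshold. The sketch below shows only that scoring step. The embeddings and the 0.5 threshold are made up for illustration and are not taken from 3D-Speaker-Toolkit. In practice the threshold would be tuned on held-out trials.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_speaker(enroll_emb, test_emb, threshold=0.5):
    """Accept the trial if the embeddings point in similar directions (hypothetical threshold)."""
    return cosine_similarity(enroll_emb, test_emb) >= threshold

enroll = [0.9, 0.1, 0.3]  # toy enrollment embedding
print(same_speaker(enroll, [0.8, 0.2, 0.35]))   # similar direction -> True
print(same_speaker(enroll, [-0.1, 0.9, -0.4]))  # different direction -> False
```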
Submitted 29 March, 2024;
originally announced March 2024.
-
The recessionary pressures of generative AI: A threat to wellbeing
Authors:
Jo-An Occhipinti,
Ante Prodan,
William Hynes,
Roy Green,
Sharan Burrow,
Harris A Eyre,
Adam Skinner,
Goran Ujdur,
John Buchanan,
Ian B Hickie,
Mark Heffernan,
Christine Song,
Marcel Tanner
Abstract:
Generative Artificial Intelligence (AI) stands as a transformative force that presents a paradox; it offers unprecedented opportunities for productivity growth while potentially posing significant threats to economic stability and societal wellbeing. Many consider generative AI as akin to previous technological advancements, using historical precedent to argue that fears of widespread job displacement are unfounded, while others contend that generative AI's unique capacity to undertake non-routine cognitive tasks sets it apart from other forms of automation capital and presents a threat to the quality and availability of work that underpin stable societies. This paper explores the conditions under which both may be true. We posit the existence of an AI-capital-to-labour ratio threshold beyond which a self-reinforcing cycle of recessionary pressures could be triggered, exacerbating social disparities, reducing social cohesion, heightening tensions, and requiring sustained government intervention to maintain stability. To prevent this, the paper underscores the urgent need for proactive policy responses, making recommendations to reduce these risks through robust regulatory frameworks and a new social contract characterised by progressive social and economic policies. This approach aims to ensure a sustainable, inclusive, and resilient economic future where human contribution to the economy is retained and integrated with generative AI to enhance the Mental Wealth of nations.
Submitted 26 March, 2024;
originally announced March 2024.
-
Measuring Spectral Form Factor in Many-Body Chaotic and Localized Phases of Quantum Processors
Authors:
Hang Dong,
Pengfei Zhang,
Ceren B. Dag,
Yu Gao,
Ning Wang,
Jinfeng Deng,
Xu Zhang,
Jiachen Chen,
Shibo Xu,
Ke Wang,
Yaozu Wu,
Chuanyu Zhang,
Feitong Jin,
Xuhao Zhu,
Aosai Zhang,
Yiren Zou,
Ziqi Tan,
Zhengyi Cui,
Zitian Zhu,
Fanhao Shen,
Tingting Li,
Jiarun Zhong,
Zehang Bao,
Hekang Li,
Zhen Wang
, et al. (6 additional authors not shown)
Abstract:
The spectral form factor (SFF) captures universal spectral fluctuations as signatures of quantum chaos, and has been instrumental in advancing multiple frontiers of physics including the studies of black holes and quantum many-body systems. However, the measurement of SFF in many-body systems is challenging due to the difficulty in resolving level spacings that become exponentially small with increasing system size. Here we experimentally measure the SFF to probe the presence or absence of chaos in quantum many-body systems using a superconducting quantum processor with a randomized measurement protocol. For a Floquet chaotic system, we observe signatures of spectral rigidity of random matrix theory in SFF given by the ramp-plateau behavior. For a Hamiltonian system, we utilize SFF to distinguish the quantum many-body chaotic phase and the prethermal many-body localization. We observe the dip-ramp-plateau behavior of random matrix theory in the chaotic phase, and contrast the scaling of the plateau time in system size between the many-body chaotic and localized phases. Furthermore, we probe the eigenstate statistics by measuring a generalization of the SFF, known as the partial SFF, and observe distinct behaviors in the purities of the reduced density matrix in the two phases. This work unveils a new way of extracting the universal signatures of many-body quantum chaos in quantum devices by probing the correlations in eigenenergies and eigenstates.
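For reference, one common definition of the SFF for a spectrum $\{E_n\}$ of dimension $D$ is $K(t) = \left|\sum_n e^{-iE_n t}\right|^2 / D^2$. Under this definition $K(0) = 1$, and for a nondegenerate spectrum the long-time average of $K(t)$ approaches the plateau value $1/D$. The sketch below evaluates this definition directly from a toy list of eigenvalues. It is not the randomized measurement protocol used on the processor, which estimates the SFF without diagonalizing the Hamiltonian. Normalization conventions also vary across the literature.

```python
import cmath

def spectral_form_factor(energies, t):
    """K(t) = |sum_n exp(-i E_n t)|^2 / D^2 for a list of eigenvalues."""
    d = len(energies)
    z = sum(cmath.exp(-1j * e * t) for e in energies)
    return abs(z) ** 2 / d ** 2

def time_averaged_sff(energies, t_max, steps):
    """Average K(t) over evenly spaced times in (0, t_max]; tends to 1/D for a nondegenerate spectrum."""
    return sum(spectral_form_factor(energies, t_max * k / steps)
               for k in range(1, steps + 1)) / steps

levels = [0.0, 1.0, 2.7, 4.1]  # toy nondegenerate spectrum, D = 4
print(spectral_form_factor(levels, 0.0))  # 1.0 at t = 0
print(time_averaged_sff(levels, 2000.0, 20000))  # close to the plateau 1/D = 0.25
```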
Submitted 25 March, 2024;
originally announced March 2024.
-
Contribution of neutrino-dominated accretion flows to cosmic MeV neutrino background
Authors:
Yun-Feng Wei,
Tong Liu,
Cui-Ying Song
Abstract:
Neutrino-dominated accretion flows (NDAFs) are an important source of MeV neutrinos and contribute significantly to the cosmic diffuse neutrino background. In this paper, we investigate the spectrum of the diffuse NDAF neutrino background (DNNB) by fully considering the effects of the progenitor properties and initial explosion energies based on core-collapse supernova (CCSN) simulations, and we estimate the event rate detectable by the Super-Kamiokande detector. We find that the predicted background neutrino flux is mainly determined by the typical CCSN initial explosion energy and the progenitor metallicity. For the optimistic cases in which the typical initial explosion energy is low, the diffuse flux of the DNNB is comparable to the diffuse supernova neutrino background and might be detected by upcoming larger neutrino detectors such as Hyper-Kamiokande, JUNO, and DUNE. Moreover, strong outflows from NDAFs could dramatically decrease their contribution to the neutrino background.
Submitted 25 March, 2024;
originally announced March 2024.
-
An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes Using Pre-Trained Text-to-Image Models
Authors:
Zhengyi Zhao,
Chen Song,
Xiaodong Gu,
Yuan Dong,
Qi Zuo,
Weihao Yuan,
Zilong Dong,
Liefeng Bo,
Qixing Huang
Abstract:
A fundamental problem in the texturing of 3D meshes using pre-trained text-to-image models is to ensure multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, where common issues are the blurriness caused by the averaging operation in the aggregation step or inconsistencies in local features. This paper introduces an optimization framework that proceeds in four stages to achieve multi-view consistency. Specifically, the first stage generates an over-complete set of 2D textures from a predefined set of viewpoints using an MV-consistent diffusion process. The second stage selects a subset of views that are mutually consistent while covering the underlying 3D model. We show how to achieve this goal by solving semi-definite programs. The third stage performs non-rigid alignment to align the selected views across overlapping regions. The fourth stage solves an MRF problem to associate each mesh face with a selected view. In particular, the third and fourth stages are iterated, with the cuts obtained in the fourth stage encouraging non-rigid alignment in the third stage to focus on regions close to the cuts. Experimental results show that our approach significantly outperforms baseline approaches both qualitatively and quantitatively.
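The fourth stage assigns each mesh face to one selected view by solving an MRF. The abstract does not name the solver. To make the structure of such a labeling problem concrete, the sketch below minimizes a Potts energy $E(L) = \sum_f C_f(L_f) + \lambda \sum_{(f,g)} [L_f \neq L_g]$ with iterated conditional modes (ICM), a simple local optimizer. The per-face data costs and the face adjacency are hypothetical. Practical systems typically use stronger solvers such as graph cuts.

```python
def mrf_energy(labels, data_cost, edges, smoothness):
    """Potts energy: per-face data cost plus a penalty for each adjacent pair with different views."""
    unary = sum(data_cost[f][labels[f]] for f in range(len(labels)))
    pairwise = sum(smoothness for f, g in edges if labels[f] != labels[g])
    return unary + pairwise

def icm(data_cost, edges, smoothness, iters=10):
    """Assign a view label to each face with iterated conditional modes on a Potts MRF."""
    n_faces, n_views = len(data_cost), len(data_cost[0])
    neighbors = [[] for _ in range(n_faces)]
    for f, g in edges:
        neighbors[f].append(g)
        neighbors[g].append(f)
    # Start from the cheapest view per face, ignoring neighbors.
    labels = [min(range(n_views), key=lambda v: data_cost[f][v]) for f in range(n_faces)]
    for _ in range(iters):
        changed = False
        for f in range(n_faces):
            def local_cost(v):
                return data_cost[f][v] + smoothness * sum(labels[g] != v for g in neighbors[f])
            best = min(range(n_views), key=local_cost)
            if best != labels[f]:
                labels[f], changed = best, True
        if not changed:  # converged to a local minimum
            break
    return labels

# Hypothetical chain of three faces and two candidate views.
costs = [[0.0, 5.0], [1.0, 0.8], [0.0, 5.0]]
adjacency = [(0, 1), (1, 2)]
print(icm(costs, adjacency, smoothness=1.0))  # middle face switches to match its neighbors
```

With `smoothness=0.0`, each face simply keeps its cheapest view. Increasing it trades per-face data cost for fewer seams between views.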
Submitted 22 March, 2024;
originally announced March 2024.
-
Quantum spin driven Yu-Shiba-Rusinov multiplets and fermion-parity-preserving phase transition in K$_3$C$_{60}$
Authors:
Shu-Ze Wang,
Xue-Qing Yu,
Li-Xuan Wei,
Li Wang,
Qiang-Jun Cheng,
Kun Peng,
Fang-Jun Cheng,
Yu Liu,
Fang-Sen Li,
Xu-Cun Ma,
Qi-Kun Xue,
Can-Li Song
Abstract:
Magnetic impurities in superconductors are of increasing interest due to emergent Yu-Shiba-Rusinov (YSR) states and Majorana zero modes for fault-tolerant quantum computation. However, a direct relationship between YSR multiplets and the magnetic anisotropy splitting of quantum impurity spins remains poorly characterized. Using scanning tunneling microscopy, we systematically resolve the YSR multiplets induced by individual transition-metal (Fe, Cr, and Ni) impurities, as well as their Zeeman effects, in the K$_3$C$_{60}$ superconductor. The YSR multiplets show identical $d$-orbital-like wave functions that are symmetry-mismatched to the threefold K$_3$C$_{60}$(111) host surface, breaking the point-group symmetries of the spatial distribution of the YSR bound states in real space. Remarkably, we identify an unprecedented fermion-parity-preserving quantum phase transition between ground states with opposite signs of the uniaxial magnetic anisotropy, which can be manipulated by an external magnetic field. These findings can be readily understood in terms of the anisotropy splitting of quantum impurity spins, and thus elucidate the intricate interplay between magnetic anisotropy and YSR multiplets.
Submitted 22 March, 2024;
originally announced March 2024.
-
Observation of non-volatile anomalous Nernst effect in altermagnet with collinear Néel vector
Authors:
Lei Han,
Xizhi Fu,
Wenqing He,
Yuxiang Zhu,
Jiankun Dai,
Wenfeng Yang,
Wenxuan Zhu,
Hua Bai,
Chong Chen,
Caihua Wan,
Xiufeng Han,
Cheng Song,
Junwei Liu,
Feng Pan
Abstract:
The anomalous Nernst effect (ANE), a widely investigated transverse thermoelectric effect that converts waste heat into electrical energy with remarkable flexibility and integration capability, has recently been extended to antiferromagnets with non-collinear spin textures. ANE in compensated magnets with a collinear Néel vector would bring further opportunities to construct magnetic-field-immune and ultrafast transverse thermoelectric converters, but has long remained unachieved. This is because the degenerate band structure of conventional collinear compensated magnets excludes non-zero Berry curvature. Here, we realize non-volatile ANE in thin films of the altermagnet Mn5Si3 with a collinear Néel vector, whose unique alternating spin-splitting band structure plays a vital role in creating non-zero Berry curvature and hotspots of anomalous Nernst conductivity near band intersections. Interestingly, ANE is relatively weak in stoichiometric Mn5Si3 but undergoes a sixfold enhancement when the Fermi level is strategically raised by additional Mn doping, indicating that ANE in altermagnets is intrinsically sensitive to the specific location of the Fermi level. Moreover, our investigation reveals a unique Néel-vector-dependent temperature-scaling relationship of the anomalous Nernst conductivity in Mn5Si3. Our work not only fills a longstanding gap by confirming the presence of non-volatile ANE in a collinear compensated magnet, but also sheds light on thermoelectric physics related to the exotic spin-splitting band structures of altermagnets.
Submitted 20 March, 2024;
originally announced March 2024.