Showing 1–50 of 1,714 results for author: Cheng, Y

  1. arXiv:2409.09740  [pdf, other]

    cs.CV

    VGG-Tex: A Vivid Geometry-Guided Facial Texture Estimation Model for High Fidelity Monocular 3D Face Reconstruction

    Authors: Haoyu Wu, Ziqiao Peng, Xukun Zhou, Yunfei Cheng, Jun He, Hongyan Liu, Zhaoxin Fan

    Abstract: 3D face reconstruction from monocular images has promoted the development of various applications such as augmented reality. Though existing methods have made remarkable progress, most of them emphasize geometric reconstruction, while overlooking the importance of texture prediction. To address this issue, we propose VGG-Tex, a novel Vivid Geometry-Guided Facial Texture Estimation model designed f…

    Submitted 15 September, 2024; originally announced September 2024.

  2. arXiv:2409.09682  [pdf]

    cs.RO

    A Robust Probability-based Joint Registration Method of Multiple Point Clouds Considering Local Consistency

    Authors: Lingjie Su, Wei Xu, Shuyang Zhao, Yuqi Cheng, Wenlong Li

    Abstract: In robotic inspection, joint registration of multiple point clouds is an essential technique for estimating the transformation relationships between measured parts, such as multiple blades in a propeller. However, the presence of noise and outliers in the data can significantly impair the registration performance by affecting the correctness of correspondences. To address this issue, we incorporat…

    Submitted 15 September, 2024; originally announced September 2024.

    Comments: Submitted to ICRA 2025

  3. arXiv:2409.08898  [pdf, other]

    math.NA quant-ph

    Kraus is King: High-order Completely Positive and Trace Preserving (CPTP) Low Rank Method for the Lindblad Master Equation

    Authors: Daniel Appelo, Yingda Cheng

    Abstract: We design high-order accurate methods that exploit low-rank structure in the density matrix while respecting the essential structure of the Lindblad equation. Our methods preserve complete positivity and are trace preserving.

    Submitted 13 September, 2024; originally announced September 2024.

  4. arXiv:2409.08872  [pdf, other]

    cs.CL cs.SD eess.AS

    Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages

    Authors: Yao-Fei Cheng, Li-Wei Chen, Hung-Shin Lee, Hsin-Min Wang

    Abstract: This study investigates the efficacy of data augmentation techniques for low-resource automatic speech recognition (ASR), focusing on two endangered Austronesian languages, Amis and Seediq. Recognizing the potential of self-supervised learning (SSL) in low-resource settings, we explore the impact of data volume on the continued pre-training of SSL models. We propose a novel data-selection scheme l…

    Submitted 13 September, 2024; originally announced September 2024.

  5. arXiv:2409.07731  [pdf, other]

    quant-ph

    Group delay controlled by the decoherence of a single artificial atom

    Authors: Y. -T. Cheng, K. -M. Hsieh, B. -Y. Wu, Z. Q. Niu, F. Aziz, Y. -H. Huang, P. Y. Wen, K. -T. Lin, Y. -H. Lin, J. C. Chen, A. F. Kockum, G. -D. Lin, Z. -R. Lin, Y. Lu, I. -C. Hoi

    Abstract: The ability to slow down light at the single-photon level has applications in quantum information processing and other quantum technologies. We demonstrate two methods, both using just a single artificial atom, enabling dynamic control over microwave light velocities in waveguide quantum electrodynamics (waveguide QED). Our methods are based on two distinct mechanisms harnessing the balance betwee…

    Submitted 11 September, 2024; originally announced September 2024.

  6. arXiv:2409.06772  [pdf, other]

    astro-ph.GA

    Broad-Line AGN at $3.5<z<6$: The Black Hole Mass Function and a Connection with Little Red Dots

    Authors: Anthony J. Taylor, Steven L. Finkelstein, Dale D. Kocevski, Junehyoung Jeon, Volker Bromm, Ricardo O. Amorin, Pablo Arrabal Haro, Bren E. Backhaus, Micaela B. Bagley, Eduardo Bañados, Rachana Bhatawdekar, Madisyn Brooks, Antonello Calabro, Oscar A. Chavez Ortiz, Yingjie Cheng, Nikko J. Cleri, Justin W. Cole, Kelcey Davis, Mark Dickinson, Callum Donnan, James S. Dunlop, Richard S. Ellis, Vital Fernandez, Adriano Fontana, Seiji Fujimoto, et al. (26 additional authors not shown)

    Abstract: We present a sample of 50 H-alpha detected broad-line active galactic nuclei (BLAGN) at redshifts 3.5<z<6.8 using data from the CEERS and RUBIES surveys. We select these sources directly from JWST/NIRSpec G395M/F290LP spectra. We use a multi-step pre-selection and a Bayesian fitting procedure to ensure a high-quality sample of sources with broad Balmer lines and narrow forbidden lines. We compute…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 28 pages, 14 figures, 4 tables. Submitted to ApJ

  7. arXiv:2409.06216  [pdf, other]

    cs.CL

    SubRegWeigh: Effective and Efficient Annotation Weighing with Subword Regularization

    Authors: Kohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng, Tomoya Iwakura

    Abstract: Many natural language processing (NLP) datasets include annotation errors. Researchers have attempted to develop methods to reduce the adverse effect of errors in datasets automatically. However, an existing method is time-consuming because it requires many trained models to detect errors. We propose a novel method to reduce the time required for error detection. Specifically, we use a tokeniza…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: 14 pages, 1 figure, 10 tables

  8. arXiv:2409.06100  [pdf, other]

    astro-ph.GA

    The Abundance and Properties of Barred Galaxies out to $z \sim$ 4 Using $\textit{JWST}$ CEERS Data

    Authors: Yuchen Guo, Shardha Jogee, Eden Wise, Keith Pritchett Jr., Elizabeth J. McGrath, Steven L. Finkelstein, Kartheik G. Iyer, Pablo Arrabal Haro, Micaela B. Bagley, Mark Dickinson, Jeyhan S. Kartaltepe, Anton M. Koekemoer, Casey Papovich, Nor Pirzkal, L. Y. Aaron Yung, Bren E. Backhaus, Eric F. Bell, Rachana Bhatawdekar, Yingjie Cheng, Luca Costantin, Alexander de la Vega, Mauro Giavalisco, Nimish P. Hathi, Benne W. Holwerda, Peter Kurczynski, et al. (4 additional authors not shown)

    Abstract: We analyze $\textit{JWST}$ CEERS NIRCam images to present the first estimate of the observed fraction and properties of bars out to $z \sim 4$. We analyze a sample of 1770 galaxies with stellar mass $M_\star > 10^{10} M_\odot$ at $0.5 \leq z \leq 4$ and identify barred galaxies via ellipse fits and visual classification of both F200W and F444W images. Our results apply mainly to bars with projec…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 25 pages, 16 figures, submitted to ApJ, Comments are welcome

  9. arXiv:2409.05576  [pdf, other]

    cs.SE

    JavaVFC: Java Vulnerability Fixing Commits from Open-source Software

    Authors: Tan Bui, Yan Naing Tun, Yiran Cheng, Ivana Clairine Irsan, Ting Zhang, Hong Jin Kang

    Abstract: We present a comprehensive dataset of Java vulnerability-fixing commits (VFCs) to advance research in Java vulnerability analysis. Our dataset, derived from thousands of open-source Java projects on GitHub, comprises two variants: JavaVFC and JavaVFC-extended. The dataset was constructed through a rigorous process involving heuristic rules and multiple rounds of manual labeling. We initially used…

    Submitted 9 September, 2024; originally announced September 2024.

  10. arXiv:2409.04774  [pdf, other]

    cs.CL cs.AI

    Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models

    Authors: Junfeng Tian, Da Zheng, Yang Cheng, Rui Wang, Colin Zhang, Debing Zhang

    Abstract: Large language models (LLMs) have prioritized expanding the context window from which models can incorporate more information. However, training models to handle long contexts presents significant challenges. These include the scarcity of high-quality natural long-context data, the potential for performance degradation on short-context tasks, and the reduced training efficiency associated with atte…

    Submitted 7 September, 2024; originally announced September 2024.

  11. arXiv:2409.04761  [pdf]

    eess.SP

    Transformer Based Tissue Classification in Robotic Needle Biopsy

    Authors: Fanxin Wang, Yikun Cheng, Sudipta S Mukherjee, Rohit Bhargava, Thenkurussi Kesavadas

    Abstract: Image-guided minimally invasive robotic surgery is commonly employed for tasks such as needle biopsies or localized therapies. However, the nonlinear deformation of various tissue types presents difficulties for surgeons in achieving precise needle tip placement, particularly when relying on low-fidelity biopsy imaging systems. In this paper, we introduce a method to classify needle biopsy interve…

    Submitted 7 September, 2024; originally announced September 2024.

    Comments: 8 pages

    Journal ref: IEEE SMC 2024

  12. arXiv:2409.03381  [pdf, other]

    cs.CL cs.AI

    CogniDual Framework: Self-Training Large Language Models within a Dual-System Theoretical Framework for Improving Cognitive Tasks

    Authors: Yongxin Deng, Xihe Qiu, Xiaoyu Tan, Chao Qu, Jing Pan, Yuan Cheng, Yinghui Xu, Wei Chu

    Abstract: Cognitive psychology investigates perception, attention, memory, language, problem-solving, decision-making, and reasoning. Kahneman's dual-system theory elucidates the human decision-making process, distinguishing between the rapid, intuitive System 1 and the deliberative, rational System 2. Recent advancements have positioned large language models (LLMs) as formidable tools nearing human-level p…

    Submitted 6 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

  13. arXiv:2409.01526  [pdf, other]

    physics.optics

    Directional sources realised by toroidal dipoles

    Authors: Junho Jung, Yuqiong Cheng, Wanyue Xiao, Shubo Wang

    Abstract: Directional optical sources can give rise to the directional excitation and propagation of light. The directionality of conventional directional dipole (CDD) sources is attributed to the interference of the electric and/or magnetic dipoles, while the effect of the toroidal dipole on optical directionality remains unexplored. Here, we numerically and analytically investigate the directional p…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 21 pages, 6 figures

  14. ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers

    Authors: Luoyu Mei, Shuai Wang, Yun Cheng, Ruofeng Liu, Zhimeng Yin, Wenchao Jiang, Shuai Wang, Wei Gong

    Abstract: Semantic recognition is pivotal in virtual reality (VR) applications, enabling immersive and interactive experiences. A promising approach is utilizing millimeter-wave (mmWave) signals to generate point clouds. However, the high computational and memory demands of current mmWave point cloud models hinder their efficiency and reliability. To address this limitation, our paper introduces ESP-PCT, a…

    Submitted 2 September, 2024; originally announced September 2024.

    Journal ref: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024

  15. arXiv:2408.17150  [pdf, other]

    cs.CV cs.AI

    Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning

    Authors: Xiaoye Qu, Jiashuo Sun, Wei Wei, Yu Cheng

    Abstract: Recently, Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities in multi-modal context comprehension. However, they still suffer from hallucination problems, that is, generating outputs inconsistent with the image content. To mitigate hallucinations, previous studies mainly focus on retraining LVLMs with custom datasets. Although effective, they inherently come with add…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 13 pages, 7 tables, 7 figures

  16. arXiv:2408.16500  [pdf, other]

    cs.CV

    CogVLM2: Visual Language Models for Image and Video Understanding

    Authors: Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang

    Abstract: Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2…

    Submitted 29 August, 2024; originally announced August 2024.

  17. arXiv:2408.16030  [pdf]

    cs.SD cs.AI cs.LG eess.AS

    A Deep Learning Approach to Localizing Multi-level Airway Collapse Based on Snoring Sounds

    Authors: Ying-Chieh Hsu, Stanley Yung-Chuan Liu, Chao-Jung Huang, Chi-Wei Wu, Ren-Kai Cheng, Jane Yung-Jen Hsu, Shang-Ran Huang, Yuan-Ren Cheng, Fu-Shun Hsu

    Abstract: This study investigates the application of machine/deep learning to classify snoring sounds excited at different levels of the upper airway in patients with obstructive sleep apnea (OSA) using data from drug-induced sleep endoscopy (DISE). The snoring sounds of 39 subjects were analyzed and labeled according to the Velum, Oropharynx, Tongue Base, and Epiglottis (VOTE) classification system. The da…

    Submitted 28 August, 2024; originally announced August 2024.

  18. arXiv:2408.15632  [pdf, other]

    eess.SY cs.AI

    Structural Optimization of Lightweight Bipedal Robot via SERL

    Authors: Yi Cheng, Chenxi Han, Yuheng Min, Linqi Ye, Houde Liu, Hang Liu

    Abstract: Designing a bipedal robot is a complex and challenging task, especially when dealing with a multitude of structural parameters. Traditional design methods often rely on human intuition and experience. However, such approaches are time-consuming and labor-intensive, lack theoretical guidance, and struggle to obtain optimal design results within vast design spaces, thus failing to fully exploit the inherent…

    Submitted 28 August, 2024; originally announced August 2024.

  19. arXiv:2408.14757  [pdf, other]

    cs.CV cs.LG

    Learning effective pruning at initialization from iterative pruning

    Authors: Shengkai Liu, Yaofeng Cheng, Fusheng Zha, Wei Guo, Lining Sun, Zhenshan Bing, Chenguang Yang

    Abstract: Pruning at initialization (PaI) reduces training costs by removing weights before training, which becomes increasingly crucial with the growing network size. However, current PaI methods still have a large accuracy gap with iterative pruning, especially at high sparsity levels. This raises an intriguing question: can we get inspiration from iterative pruning to improve the PaI performance? In the…

    Submitted 26 August, 2024; originally announced August 2024.

  20. arXiv:2408.12352  [pdf, other]

    cs.CV

    GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections

    Authors: Shiyue Zhang, Zheng Chong, Xujie Zhang, Hanhui Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: General text-to-image models bring revolutionary innovation to the fields of arts, design, and media. However, when applied to garment generation, even the state-of-the-art text-to-image models suffer from fine-grained semantic misalignment, particularly concerning the quantity, position, and interrelations of garment components. Addressing this, we propose GarmentAligner, a text-to-garment diffus…

    Submitted 23 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  21. arXiv:2408.12076  [pdf, other]

    cs.CL cs.AI

    ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

    Authors: Zhaochen Su, Jun Zhang, Xiaoye Qu, Tong Zhu, Yanshu Li, Jiashuo Sun, Juntao Li, Min Zhang, Yu Cheng

    Abstract: Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. Only a few studies have explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge. However, a thorough assessment of knowledge conflict in LLMs is still missin…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Under Review

  22. arXiv:2408.11826  [pdf, other]

    cs.CY cs.AI

    Generative Organizational Behavior Simulation using Large Language Model based Autonomous Agents: A Holacracy Perspective

    Authors: Chen Zhu, Yihang Cheng, Jingshuai Zhang, Yusheng Qiu, Sitao Xia, Hengshu Zhu

    Abstract: In this paper, we present the technical details and periodic findings of our project, CareerAgent, which aims to build a generative simulation framework for a Holacracy organization using Large Language Model-based Autonomous Agents. Specifically, the simulation framework includes three phases: construction, execution, and evaluation, and it incorporates basic characteristics of individuals, organ…

    Submitted 5 August, 2024; originally announced August 2024.

  23. arXiv:2408.11405  [pdf, other]

    cs.SD eess.AS

    DDSP Guitar Amp: Interpretable Guitar Amplifier Modeling

    Authors: Yen-Tung Yeh, Yu-Hua Chen, Yuan-Chiao Cheng, Jui-Te Wu, Jun-Jie Fu, Yi-Fan Yeh, Yi-Hsuan Yang

    Abstract: Neural network models for guitar amplifier emulation, while being effective, often demand high computational cost and lack interpretability. Drawing ideas from physical amplifier design, this paper aims to address these issues with a new differentiable digital signal processing (DDSP)-based model, called "DDSP guitar amp," that models the four components of a guitar amp (i.e., preamp, tone stack…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Preprint paper

  24. arXiv:2408.11283  [pdf, other]

    cs.PL cs.AI

    Inference Plans for Hybrid Particle Filtering

    Authors: Ellie Y. Cheng, Eric Atkinson, Guillaume Baudart, Louis Mandel, Michael Carbin

    Abstract: Advanced probabilistic programming languages (PPLs) use hybrid inference systems to combine symbolic exact inference and Monte Carlo methods to improve inference performance. These systems use heuristics to partition random variables within the program into variables that are encoded symbolically and variables that are encoded with sampled values, and the heuristics are not necessarily aligned wit…

    Submitted 20 August, 2024; originally announced August 2024.

  25. arXiv:2408.10513  [pdf, other]

    physics.atom-ph

    Relativistic and Electron Correlation Effects in Static Dipole Polarizabilities for Main-Group Elements

    Authors: YingXing Cheng

    Abstract: In this study, I compute the static dipole polarizability of main-group elements using the finite-field method combined with relativistic coupled-cluster and configuration interaction simulations. The computational results closely align with the values recommended in the 2018 table of static dipole polarizabilities of neutral elements [Mol. Phys. 117, 1200 (2019)]. Additionally, I investigate the…

    Submitted 14 September, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  26. arXiv:2408.10199  [pdf, other]

    astro-ph.SR astro-ph.GA

    Magnetic Fields in Massive Star-forming Regions (MagMaR) IV: Tracing the Magnetic Fields in the O-type protostellar system IRAS 16547$-$4247

    Authors: Luis A. Zapata, Manuel Fernández-López, Patricio Sanhueza, Josep M. Girart, Luis F. Rodríguez, Paulo Cortes, Koch Patrick, María T. Beltrán, Kate Pattle, Henrik Beuther, Piyali Saha, Wenyu Jiao, Fengwei Xu, Xing Walker Lu, Fernando Olguin, Shanghuo Li, Ian W. Stephens, Ji-hyun Kang, Yu Cheng, Spandan Choudhury, Kaho Morii, Eun Jung Chung, Jia-Wei Wang, Jihye Hwang, A-Ran Lyo, et al. (2 additional authors not shown)

    Abstract: The formation of massive stars, and in particular the role that magnetic fields play in their early evolutionary phase, is still far from being completely understood. Here, we present Atacama Large Millimeter/Submillimeter Array (ALMA) 1.2 mm full polarized continuum, and H$^{13}$CO$^+$(3$-$2), CS(5$-$4), and HN$^{13}$C(3$-$2) line observations with a high angular resolution ($\sim$0.4…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted by the Astrophysical Journal, 13 pages

  27. arXiv:2408.07965  [pdf, other]

    quant-ph

    Efficient simulation of inhomogeneously correlated systems using block interaction product states

    Authors: Yifan Cheng, Zhaoxuan Xie, Xiaoyu Xie, Haibo Ma

    Abstract: The strength of DMRG lies in its treatment of identical sites that are energetically degenerate and spatially similar. However, this becomes a drawback when applied to quantum chemistry calculations for large systems, as entangled orbitals often span broad ranges in energy and space, with notably inhomogeneous interactions. In this study, we propose addressing strong intra-fragment and weak inter-…

    Submitted 15 August, 2024; originally announced August 2024.

  28. arXiv:2408.07490  [pdf, other]

    cs.CV

    Attention-Guided Perturbation for Unsupervised Image Anomaly Detection

    Authors: Tingfeng Huang, Yuxuan Cheng, Jingbo Xia, Rui Yu, Yuxuan Cai, Jinhai Xiang, Xinwei He, Xiang Bai

    Abstract: Reconstruction-based methods have significantly advanced modern unsupervised anomaly detection. However, the strong capacity of neural networks often violates the underlying assumptions by reconstructing abnormal samples well. To alleviate this issue, we present a simple yet effective reconstruction framework named Attention-Guided Perturbation Network (AGPNet), which learns to add perturbation nois…

    Submitted 14 August, 2024; originally announced August 2024.

  29. arXiv:2408.07321  [pdf, other]

    cs.SE cs.CR

    LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions

    Authors: Yiran Cheng, Lwin Khin Shar, Ting Zhang, Shouguo Yang, Chaopeng Dong, David Lo, Shichao Lv, Zhiqiang Shi, Limin Sun

    Abstract: Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature. However, the adoption of specific software versions in development projects may introduce security risks when these versions bring along vulnerabilities. Current methods of identifying vulnerable versions typically analyze and trace the code involved in vul…

    Submitted 14 August, 2024; originally announced August 2024.

  30. arXiv:2408.07262  [pdf, other]

    cs.CV cs.AI cs.LG

    Ensemble architecture in polyp segmentation

    Authors: Hao-Yun Hsu, Yi-Ching Cheng, Guan-Hua Huang

    Abstract: In this research, we revisit the architecture of semantic segmentation and evaluate the models excelling in polyp segmentation. We introduce an integrated framework that harnesses the advantages of different models to attain an optimal outcome. More specifically, we fuse the learned features from convolutional and transformer models for prediction, and we view this approach as an ensemble techniqu…

    Submitted 13 August, 2024; originally announced August 2024.

  31. arXiv:2408.06072  [pdf, other]

    cs.CV

    CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

    Authors: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang

    Abstract: We introduce CogVideoX, a large-scale diffusion transformer model designed for generating videos based on text prompts. To efficiently model video data, we propose to leverage a 3D Variational Autoencoder (VAE) to compress videos along both spatial and temporal dimensions. To improve the text-video alignment, we propose an expert transformer with the expert adaptive LayerNorm to facilitate the deep…

    Submitted 12 August, 2024; originally announced August 2024.

  32. arXiv:2408.05907  [pdf]

    physics.optics quant-ph

    Cryogenic nonlinear conversion processes in periodically-poled thin-film lithium niobate waveguides

    Authors: Yujie Cheng, Xiaoting Li, Lantian Feng, Haochuan Li, Wenzhao Sun, Xinyu Song, Yuyang Ding, Guangcan Guo, Cheng Wang, Xifeng Ren

    Abstract: Periodically poled thin-film lithium niobate (TFLN) waveguides, which enable efficient quadratic nonlinear processes, serve as a crucial foundation for classical and quantum signal processing with photonic integrated circuits. To expand their application scope, we provide, to the best of our knowledge, the first investigation of nonlinear conversion processes in periodically poled TFLN waveguides at cryoge…

    Submitted 11 August, 2024; originally announced August 2024.

  33. arXiv:2408.04808  [pdf, other]

    cs.DC cs.LG

    Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor

    Authors: Yiqi Liu, Yuqi Xue, Yu Cheng, Lingxiao Ma, Ziming Miao, Jilong Xue, Jian Huang

    Abstract: As AI chips incorporate numerous parallelized cores to scale deep learning (DL) computing, inter-core communication has recently been enabled by employing high-bandwidth and low-latency interconnect links on the chip (e.g., Graphcore IPU). It allows each core to directly access the fast scratchpad memory in other cores, which enables new parallel computing paradigms. However, without proper support for…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: This paper is accepted at The 30th ACM Symposium on Operating Systems Principles (SOSP'24)

  34. arXiv:2408.03847  [pdf, other]

    eess.SY

    GAIA -- A Large Language Model for Advanced Power Dispatch

    Authors: Yuheng Cheng, Huan Zhao, Xiyuan Zhou, Junhua Zhao, Yuji Cao, Chao Yang

    Abstract: Power dispatch is essential for providing stable, cost-effective, and eco-friendly electricity to society. However, traditional methods falter as power systems grow in scale and complexity, struggling with multitasking, swift problem-solving, and human-machine collaboration. This paper introduces GAIA, the pioneering Large Language Model (LLM) tailored for power dispatch tasks. We have developed a…

    Submitted 7 August, 2024; originally announced August 2024.

  35. arXiv:2408.03220  [pdf, other]

    cs.LG cs.DC

    Masked Random Noise for Communication Efficient Federated Learning

    Authors: Shiwei Li, Yingyi Cheng, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Dugang Liu, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed training paradigm that effectively safeguards data privacy. However, it may involve significant communication costs, which hinders training efficiency. In this paper, we aim to enhance communication efficiency from a new perspective. Specifically, we request the distributed clients to find optimal model updates relative to global model parameters withi…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by MM 2024

  36. arXiv:2408.00550  [pdf, other]

    cs.CV cs.AI cs.CL

    Mitigating Multilingual Hallucination in Large Vision-Language Models

    Authors: Xiaoye Qu, Mingyang Song, Wei Wei, Jianfeng Dong, Yu Cheng

    Abstract: While Large Vision-Language Models (LVLMs) have exhibited remarkable capabilities across a wide range of tasks, they suffer from hallucination problems, where models generate plausible yet incorrect answers given the input image-query pair. This hallucination phenomenon is even more severe when querying the image in non-English languages, while existing methods for mitigating hallucinations in LVL…

    Submitted 1 August, 2024; originally announced August 2024.

  37. arXiv:2408.00026  [pdf, other]

    astro-ph.HE astro-ph.IM

    Study of Wide-Field-of-View X-ray Observations of the Virgo Cluster Using the Lobster Eye Imager for Astronomy

    Authors: Wen-Cheng Feng, Shu-Mei Jia, Hai-Hui Zhao, Heng Yu, Hai-Wu Pan, Cheng-Kui Li, Yu-Lin Cheng, Shan-Shan Weng, Yong Chen, Yuan Liu, Zhi-Xing Ling, Chen Zhang

    Abstract: The Lobster Eye Imager for Astronomy (LEIA) is the pathfinder of the wide-field X-ray telescope used in the Einstein Probe mission. In this study, we present an image of the Virgo Cluster taken by LEIA in the 0.5-4.5 keV band with an exposure time of $\sim$17.3 ks in the central region. This extended emission is generally consistent with the results obtained by ROSAT. However, the field is affecte…

    Submitted 31 July, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures, 1 table

  38. arXiv:2407.20875  [pdf, other]

    nlin.SI

    Localized stem structures in quasi-resonant two-soliton solutions for the asymmetric Nizhnik-Novikov-Veselov system

    Authors: Feng Yuan, Jiguang Rao, Jingsong He, Yi Cheng

    Abstract: Elastic collisions of solitons generally have a finite phase shift. When the phase shift has a finitely large value, the two vertices of the (2+1)-dimensional 2-soliton are significantly separated due to the phase shift, accompanied by the formation of a local structure connecting the two V-shaped solitons. We define this local structure as the stem structure. This study systematically investigate…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 14 pages, 6 figures; accepted by Journal of Mathematical Physics (July 2024)

  39. arXiv:2407.19941  [pdf, other]

    cs.LG

    Boosting Graph Foundation Model from Structural Perspective

    Authors: Yao Cheng, Yige Zhao, Jianxiang Yu, Xiang Li

    Abstract: Graph foundation models have recently attracted significant attention due to their strong generalizability. Although existing methods resort to language models to learn unified semantic representations across domains, they disregard the unique structural characteristics of graphs from different domains. To address the problem, in this paper, we boost graph foundation model from structural perspectiv…

    Submitted 29 July, 2024; originally announced July 2024.

  40. arXiv:2407.19467  [pdf, other]

    cs.IR cs.LG

    Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights

    Authors: Xiang-Rong Sheng, Feifan Yang, Litong Gong, Biao Wang, Zhangming Chan, Yujing Zhang, Yueyao Cheng, Yong-Nan Zhu, Tiezheng Ge, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng

    Abstract: Despite the recognized potential of multimodal data to improve model accuracy, many large-scale industrial recommendation systems, including the Taobao display advertising system, predominantly depend on sparse ID features in their models. In this work, we explore approaches to leverage multimodal data to enhance the recommendation accuracy. We start by identifying the key challenges in adopting mul…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted at CIKM 2024

  41. arXiv:2407.19352  [pdf]

    cs.LG q-fin.RM

    Design and Optimization of Big Data and Machine Learning-Based Risk Monitoring System in Financial Markets

    Authors: Liyang Wang, Yu Cheng, Xingxin Gu, Zhizhong Wu

    Abstract: With the increasing complexity of financial markets and rapid growth in data volume, traditional risk monitoring methods no longer suffice for modern financial institutions. This paper designs and optimizes a risk monitoring system based on big data and machine learning. By constructing a four-layer architecture, it effectively integrates large-scale financial data and advanced machine learning al…

    Submitted 27 July, 2024; originally announced July 2024.

  42. arXiv:2407.18357  [pdf, other

    cs.RO

    Needle Segmentation Using GAN: Restoring Thin Instrument Visibility in Robotic Ultrasound

    Authors: Zhongliang Jiang, Xuesong Li, Xiangyu Chu, Angelos Karlas, Yuan Bi, Yingsheng Cheng, K. W. Samuel Au, Nassir Navab

    Abstract: Ultrasound-guided percutaneous needle insertion is a standard procedure employed in both biopsy and ablation in clinical practices. However, due to the complex interaction between tissue and instrument, the needle may deviate from the in-plane view, resulting in a lack of close monitoring of the percutaneous needle. To address this challenge, we introduce a robot-assisted ultrasound (US) imaging s…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE TIM. Code: https://github.com/noseefood/NeedleSegmentation-GAN; video: https://youtu.be/4WuEP9PACs0

  43. arXiv:2407.17892  [pdf, ps, other

    cs.LG cs.AI

    An Iterative Approach to Topic Modelling

    Authors: Albert Wong, Florence Wing Yau Cheng, Ashley Keung, Yamileth Hercules, Mary Alexandra Garcia, Yew-Wei Lim, Lien Pham

    Abstract: Topic modelling has become increasingly popular for summarizing text data, such as social media posts and articles. However, topic modelling is usually completed in one shot. Assessing the quality of resulting topics is challenging. No effective methods or measures have been developed for assessing the results or for making further enhancements to the topics. In this research, we propos…

    Submitted 25 July, 2024; originally announced July 2024.

  44. arXiv:2407.16654  [pdf, other

    astro-ph.GA

    Magnetic Fields in Massive Star-forming Regions (MagMaR): Unveiling an Hourglass Magnetic Field in G333.46-0.16 using ALMA

    Authors: Piyali Saha, Patricio Sanhueza, Marco Padovani, Josep M. Girart, Paulo Cortes, Kaho Morii, Junhao Liu, A. Sanchez-Monge, Daniele Galli, Shantanu Basu, Patrick M. Koch, Maria T. Beltran, Shanghuo Li, Henrik Beuther, Ian W. Stephens, Fumitaka Nakamura, Qizhou Zhang, Wenyu Jiao, M. Fernandez-Lopez, Jihye Hwang, Eun Jung Chung, Kate Pattle, Luis A. Zapata, Fengwei Xu, Fernando A. Olguin, et al. (11 additional authors not shown)

    Abstract: The contribution of the magnetic field to the formation of high-mass stars is poorly understood. We report the high-angular resolution ($\sim0.3^{\prime\prime}$, 870 au) map of the magnetic field projected on the plane of the sky (B$_\mathrm{POS}$) towards the high-mass star forming region G333.46$-$0.16 (G333), obtained with the Atacama Large Millimeter/submillimeter Array (ALMA) at 1.2 mm as par…

    Submitted 23 July, 2024; originally announced July 2024.

  45. arXiv:2407.16140  [pdf

    physics.optics

    Frequency stabilization based on H13C14N absorption in lithium niobate micro-disk laser

    Authors: Zhen Yi, Zhihao Zhang, Jianglin Guan, Guanghui Zhao, Renhong Gao, Botao Fu, Jintian Lin, Jinming Chen, Jian Liu, Yijie Pan, Ya Cheng

    Abstract: We demonstrate an on-chip lithium niobate micro-disk laser based on hydrogen cyanide (H13C14N) gas saturation absorption method for frequency stabilization. The laser chip consists of two main components: a micro-disk laser and a combined racetrack ring cavity. By operating on the H13C14N P12 absorption line at 1551.3 nm, the laser frequency can be precisely stabilized. The laser demonstrates rema…

    Submitted 22 July, 2024; originally announced July 2024.

  46. arXiv:2407.15795  [pdf, other

    cs.CV

    AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection

    Authors: Yunkang Cao, Jiangning Zhang, Luca Frittoli, Yuqi Cheng, Weiming Shen, Giacomo Boracchi

    Abstract: Zero-shot anomaly detection (ZSAD) targets the identification of anomalies within images from arbitrary novel categories. This study introduces AdaCLIP for the ZSAD task, leveraging a pre-trained vision-language model (VLM), CLIP. AdaCLIP incorporates learnable prompts into CLIP and optimizes them through training on auxiliary annotated anomaly detection data. Two types of learnable prompts are pr…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  47. arXiv:2407.15451  [pdf, other

    cs.CV

    Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions

    Authors: Yihao Ai, Yifei Qi, Bo Wang, Yu Cheng, Xinchao Wang, Robby T. Tan

    Abstract: Existing 2D human pose estimation research predominantly concentrates on well-lit scenarios, with limited exploration of poor lighting conditions, which are a prevalent aspect of daily life. Recent studies on low-light pose estimation require the use of paired well-lit and low-light images with ground truths for training, which are impractical due to the inherent challenges associated with annotat…

    Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 18 pages, 3 figures. Accepted by ECCV 2024

  48. arXiv:2407.15226  [pdf, other

    eess.SP eess.SY

    Variation Bayesian Interference for Multiple Extended Targets or Unresolved Group Targets Tracking

    Authors: Yuanhao Cheng, Yunhe Cao, Tat-Soon Yeo, Yulin Zhang, Fu Jie

    Abstract: In this work, we propose a tracking method for multiple extended targets or unresolvable group targets based on the Variational Bayesian Inference (VBI). Firstly, based on the most commonly used Random Matrix Model (RMM), the joint states of a single target are modeled as a Gamma Gaussian Inverse Wishart (GGIW) distribution, and the multi-target joint association variables are involved in the esti…

    Submitted 6 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: 21 pages, 15 figures, 3 tables

  49. arXiv:2407.15225  [pdf

    physics.optics physics.app-ph

    An electro-optically tunable arrayed waveguide grating fabricated on thin film lithium niobate

    Authors: Zhe Wang, Zhiwei Fang, Yiran Zhu, Jian Liu, Lang Gao, Jianping Yu, Haisu Zhang, Min Wang, Ya Cheng

    Abstract: We design and fabricate an 8-channel thin film lithium niobate (TFLN) arrayed-waveguide grating (AWG) and demonstrate the electro-optical tunability of the device. The monolithically integrated microelectrodes are designed for waveguide phase modulation and wavelength tuning. Experiments show that the fabricated electro-optically controlled TFLN AWG has a channel spacing of 200 GHz and a wavelen…

    Submitted 21 July, 2024; originally announced July 2024.

  50. arXiv:2407.14601  [pdf, other

    astro-ph.IM

    ANDES, the high resolution spectrograph for the ELT: science goals, project overview and future developments

    Authors: A. Marconi, M. Abreu, V. Adibekyan, V. Alberti, S. Albrecht, J. Alcaniz, M. Aliverti, C. Allende Prieto, J. D. Alvarado Gómez, C. S. Alves, P. J. Amado, M. Amate, M. I. Andersen, S. Antoniucci, E. Artigau, C. Bailet, C. Baker, V. Baldini, A. Balestra, S. A. Barnes, F. Baron, S. C. C. Barros, S. M. Bauer, M. Beaulieu, O. Bellido-Tirado, et al. (264 additional authors not shown)

    Abstract: The first generation of ELT instruments includes an optical-infrared high-resolution spectrograph, indicated as ELT-HIRES and recently christened ANDES (ArmazoNes high Dispersion Echelle Spectrograph). ANDES consists of three fibre-fed spectrographs ([U]BV, RIZ, YJH) providing a spectral resolution of $\sim$100,000 with a minimum simultaneous wavelength coverage of 0.4-1.8 $\mu$m with the goal of ex…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: SPIE astronomical telescope and instrumentation 2024, in press