Showing 1–50 of 691 results for author: Sun, S

Searching in archive cs.
  1. arXiv:2407.18690  [pdf, other]

    cs.AI

    Collaborative Evolving Strategy for Automatic Data-Centric Development

    Authors: Xu Yang, Haotian Chen, Wenjun Feng, Haoxue Wang, Zeqi Ye, Xinjie Shen, Xiao Yang, Shizhao Sun, Weiqing Liu, Jiang Bian

    Abstract: Artificial Intelligence (AI) significantly influences many fields, largely thanks to the vast amounts of high-quality data for machine learning models. The emphasis is now on a data-centric AI strategy, prioritizing data development over model design progress. Automating this process is crucial. In this paper, we serve as the first work to introduce the automatic data-centric development (AD^2) ta…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 23 pages, 7 figures

  2. arXiv:2407.18039  [pdf, other]

    cs.LG cs.AI

    Peak-Controlled Logits Poisoning Attack in Federated Distillation

    Authors: Yuhan Tang, Aoxu Zhang, Zhiyuan Wu, Bo Gao, Tian Wen, Yuwei Wang, Sheng Sun

    Abstract: Federated Distillation (FD) offers an innovative approach to distributed machine learning, leveraging knowledge distillation for efficient and flexible cross-device knowledge transfer without necessitating the upload of extensive model parameters to a central server. While FD has gained popularity, its vulnerability to poisoning attacks remains underexplored. To address this gap, we previously int…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.03685

  3. arXiv:2407.15309  [pdf, other]

    cs.DC cs.LG

    vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving

    Authors: Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, Junping Zhao, Ke Zhang, Minyi Guo, Jingwen Leng

    Abstract: Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests. This surge in demand poses significant challenges in optimizing throughput and latency while keeping costs manageable. The Key-Value (KV) cache, a standard method for retaining previous computations, makes LLM inference highly bounded by memory. While batching strategies can enhance performa…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 16 pages, 12 figures

  4. arXiv:2407.13976  [pdf, other]

    cs.CV

    PlacidDreamer: Advancing Harmony in Text-to-3D Generation

    Authors: Shuo Huang, Shikun Sun, Zixuan Wang, Xiaoyu Qin, Yanmin Xiong, Yuan Zhang, Pengfei Wan, Di Zhang, Jia Jia

    Abstract: Recently, text-to-3D generation has attracted significant attention, resulting in notable performance enhancements. Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms. However, these methods exhibit two limitations.…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

    ACM Class: I.4.0

  5. arXiv:2407.11998  [pdf, other]

    cs.HC

    Custom Cloth Creation and Virtual Try-on for Everyone

    Authors: Pei Chen, Heng Wang, Sainan Sun, Zhiyuan Chen, Zhenkun Liu, Shuhua Cao, Li Yang, Minghui Yang

    Abstract: This demo showcases a simple tool that utilizes AIGC technology, enabling both professional designers and regular users to easily customize clothing for their digital avatars. Customization options include changing clothing colors, textures, logos, and patterns. Compared with traditional 3D modeling processes, our approach significantly enhances efficiency and interactivity and reduces production…

    Submitted 13 June, 2024; originally announced July 2024.

  6. arXiv:2407.11440  [pdf, other]

    cs.SE

    End-user Comprehension of Transfer Risks in Smart Contracts

    Authors: Yustynn Panicker, Ezekiel Soremekun, Sumei Sun, Sudipta Chattopadhyay

    Abstract: Smart contracts are increasingly used in critical use cases (e.g., financial transactions). Thus, it is pertinent to ensure that end-users understand the transfer risks in smart contracts. To address this, we investigate end-user comprehension of risks in the most popular Ethereum smart contract (i.e., USD Tether (USDT)) and their prevalence in the top ERC-20 smart contracts. We focus on five tran…

    Submitted 16 July, 2024; originally announced July 2024.

  7. arXiv:2407.10172  [pdf, other]

    cs.CV

    Restoring Images in Adverse Weather Conditions via Histogram Transformer

    Authors: Shangquan Sun, Wenqi Ren, Xinwei Gao, Rui Wang, Xiaochun Cao

    Abstract: Transformer-based image restoration methods in adverse weather have achieved significant progress. Most of them use self-attention along the channel dimension or within spatially fixed-range blocks to reduce computational load. However, such a compromise results in limitations in capturing long-range spatial features. Inspired by the observation that the weather-induced degradation factors mainly…

    Submitted 25 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: 19 pages, 7 figures, 10MB

  8. arXiv:2407.09958  [pdf, other]

    cs.CR cs.LG

    Partner in Crime: Boosting Targeted Poisoning Attacks against Federated Learning

    Authors: Shihua Sun, Shridatt Sugrim, Angelos Stavrou, Haining Wang

    Abstract: Federated Learning (FL) exposes vulnerabilities to targeted poisoning attacks that aim to cause misclassification specifically from the source class to the target class. However, using well-established defense frameworks, the poisoning impact of these attacks can be greatly mitigated. We introduce a generalized pre-training stage approach to Boost Targeted Poisoning Attacks against FL, called BoTP…

    Submitted 13 July, 2024; originally announced July 2024.

  9. arXiv:2407.03741  [pdf, other]

    cs.IT

    A Unified Expression for Upper Bounds on the BLER of Spinal Codes over Fading Channels

    Authors: Aimin Li, Xiaomeng Chen, Shaohua Wu, Gary C. F. Lee, Sumei Sun

    Abstract: Performance evaluation of particular channel coding has been a significant topic in coding theory, often involving the use of bounding techniques. This paper focuses on the new family of capacity-achieving codes, Spinal codes, to provide a comprehensive analysis framework to tightly upper bound the block error rate (BLER) of Spinal codes in the finite block length (FBL) regime. First, we resort to…

    Submitted 4 July, 2024; originally announced July 2024.

  10. arXiv:2407.01046  [pdf, other]

    cs.AI cs.CL

    FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models

    Authors: Yiyuan Li, Shichao Sun, Pengfei Liu

    Abstract: Fuzzy reasoning is vital due to the frequent use of imprecise information in daily contexts. However, the ability of current large language models (LLMs) to handle such reasoning remains largely uncharted. In this paper, we introduce a new benchmark, FRoG, for fuzzy reasoning, featuring real-world mathematical word problems that incorporate generalized quantifiers. Our experimental findings reveal…

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Under review

  11. arXiv:2406.19651  [pdf, other]

    cs.DB cs.AI

    CANDY: A Benchmark for Continuous Approximate Nearest Neighbor Search with Dynamic Data Ingestion

    Authors: Xianzhi Zeng, Zhuoyan Wu, Xinjing Hu, Xuanhua Shi, Shixuan Sun, Shuhao Zhang

    Abstract: Approximate K Nearest Neighbor (AKNN) algorithms play a pivotal role in various AI applications, including information retrieval, computer vision, and natural language processing. Although numerous AKNN algorithms and benchmarks have been developed recently to evaluate their effectiveness, the dynamic nature of real-world data presents significant challenges that existing benchmarks fail to addres…

    Submitted 28 June, 2024; originally announced June 2024.

  12. arXiv:2406.19371  [pdf, other]

    cs.CL

    Suri: Multi-constraint Instruction Following for Long-form Text Generation

    Authors: Chau Minh Pham, Simeng Sun, Mohit Iyyer

    Abstract: Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with 20K human-written long-form texts paired with LLM-generated backtranslated instructions that contain multiple complex constraints. Because of prohibitive challe…

    Submitted 27 June, 2024; originally announced June 2024.

  13. arXiv:2406.18862  [pdf, other]

    cs.SD eess.AS

    Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study

    Authors: Peikun Chen, Sining Sun, Changhao Shan, Qing Yang, Lei Xie

    Abstract: Unified speech-text models like SpeechGPT, VioLA, and AudioPaLM have shown impressive performance across various speech-related tasks, especially in Automatic Speech Recognition (ASR). These models typically adopt a unified method to model discrete speech and text tokens, followed by training a decoder-only transformer. However, they are all designed for non-streaming ASR tasks, where the entire s…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted for Interspeech 2024

  14. arXiv:2406.16020  [pdf, other]

    cs.SD cs.CL eess.AS

    AudioBench: A Universal Benchmark for Audio Large Language Models

    Authors: Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen

    Abstract: We introduce AudioBench, a new benchmark designed to evaluate audio large language models (AudioLLMs). AudioBench encompasses 8 distinct tasks and 26 carefully selected or newly curated datasets, focusing on speech understanding, voice interpretation, and audio scene understanding. Despite the rapid advancement of large language models, including multimodal versions, a significant gap exists in co…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: 20 pages; v2 - typo update; Code: https://github.com/AudioLLMs/AudioBench

  15. arXiv:2406.14117  [pdf, other]

    cs.IR cs.CL

    An Investigation of Prompt Variations for Zero-shot LLM-based Rankers

    Authors: Shuoqi Sun, Shengyao Zhuang, Shuai Wang, Guido Zuccon

    Abstract: We provide a systematic understanding of the impact of specific components and wordings used in prompts on the effectiveness of rankers based on zero-shot Large Language Models (LLMs). Several zero-shot ranking methods based on LLMs have recently been proposed. Among many aspects, methods differ across (1) the ranking algorithm they implement, e.g., pointwise vs. listwise, (2) the backbone LLMs us…

    Submitted 20 June, 2024; originally announced June 2024.

  16. arXiv:2406.12753  [pdf, other]

    cs.CL cs.AI

    OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

    Authors: Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, et al. (3 additional authors not shown)

    Abstract: The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoni…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 44 pages

  17. arXiv:2406.11704  [pdf, other]

    cs.CL cs.AI cs.LG

    Nemotron-4 340B Technical Report

    Authors: Nvidia: Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, et al. (58 additional authors not shown)

    Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation be…

    Submitted 17 June, 2024; originally announced June 2024.

  18. arXiv:2406.10391  [pdf, other]

    q-bio.QM cs.LG

    BEACON: Benchmark for Comprehensive RNA Tasks and Language Models

    Authors: Yuchen Ren, Zhiyuan Chen, Lifeng Qiao, Hongtai Jing, Yuchen Cai, Sheng Xu, Peng Ye, Xinzhu Ma, Siqi Sun, Hongliang Yan, Dong Yuan, Wanli Ouyang, Xihui Liu

    Abstract: RNA plays a pivotal role in translating genetic instructions into functional outcomes, underscoring its importance in biological processes and disease mechanisms. Despite the emergence of numerous deep learning approaches for RNA, particularly universal RNA language models, there remains a significant lack of standardized benchmarks to assess the effectiveness of these methods. In this study, we i…

    Submitted 14 June, 2024; originally announced June 2024.

  19. arXiv:2406.10118  [pdf, other]

    cs.CL

    SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

    Authors: Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Railey Montalan, Ryan Ignatius, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze Gao, Patrick Amadeus, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, et al. (36 additional authors not shown)

    Abstract: Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due t…

    Submitted 8 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: https://github.com/SEACrowd

  20. arXiv:2406.07399  [pdf, other]

    cs.LG eess.SP

    Redefining Automotive Radar Imaging: A Domain-Informed 1D Deep Learning Approach for High-Resolution and Efficient Performance

    Authors: Ruxin Zheng, Shunqiao Sun, Holger Caesar, Honglei Chen, Jian Li

    Abstract: Millimeter-wave (mmWave) radars are indispensable for perception tasks of autonomous vehicles, thanks to their resilience in challenging weather conditions. Yet, their deployment is often limited by insufficient spatial resolution for precise semantic scene interpretation. Classical super-resolution techniques adapted from optical imaging inadequately address the distinct characteristics of radar…

    Submitted 11 June, 2024; originally announced June 2024.

  21. arXiv:2406.05962  [pdf, other]

    cs.DC cs.DB

    Data Caching for Enterprise-Grade Petabyte-Scale OLAP

    Authors: Chunxu Tang, Bin Fan, Jing Zhao, Chen Liang, Yi Wang, Beinan Wang, Ziyue Qiu, Lu Qiu, Bowen Ding, Shouzhuo Sun, Saiguang Che, Jiaming Mai, Shouwei Chen, Yu Zhu, Jianjian Xie, Yutian, Sun, Yao Li, Yangjun Zhang, Ke Wang, Mingmin Chen

    Abstract: With the exponential growth of data and evolving use cases, petabyte-scale OLAP data platforms are increasingly adopting a model that decouples compute from storage. This shift, evident in organizations like Uber and Meta, introduces operational challenges including massive, read-heavy I/O traffic with potential throttling, as well as skewed and fragmented data access patterns. Addressing these ch…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to the USENIX Annual Technical Conference (USENIX ATC) 2024

  22. arXiv:2406.05477  [pdf, other]

    cs.CV cs.LG

    Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals

    Authors: Susu Sun, Stefano Woerner, Andreas Maier, Lisa M. Koch, Christian F. Baumgartner

    Abstract: Interpretability is crucial for machine learning algorithms in high-stakes medical applications. However, high-performing neural networks typically cannot explain their predictions. Post-hoc explanation methods provide a way to understand neural networks but have been shown to suffer from conceptual problems. Moreover, current research largely focuses on providing local explanations for individual…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Extension of paper: Inherently Interpretable Multi-Label Classification Using Class-Specific Counterfactuals (Sun et al., MIDL 2023)

  23. arXiv:2406.04485  [pdf, other]

    cs.AI cs.CV

    GenAI Arena: An Open Evaluation Platform for Generative Models

    Authors: Dongfu Jiang, Max Ku, Tianle Li, Yuansheng Ni, Shizhuo Sun, Rongqi Fan, Wenhu Chen

    Abstract: Generative AI has made remarkable strides to revolutionize fields such as image and video generation. These advancements are driven by innovative algorithms, architecture, and data. However, the rapid proliferation of generative models has highlighted a critical gap: the absence of trustworthy evaluation metrics. Current automatic assessments such as FID, CLIP, FVD, etc often fail to capture the n…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 9 pages, 7 figures

  24. arXiv:2406.02622  [pdf, other]

    cs.CR cs.AI

    Safeguarding Large Language Models: A Survey

    Authors: Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, Changshun Wu, Gaojie Jin, Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang

    Abstract: In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced int…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: under review. arXiv admin note: text overlap with arXiv:2402.01822

  25. arXiv:2406.01937  [pdf, other]

    cs.IT eess.SP

    Cramér-Rao Bound Analysis and Beamforming Design for Integrated Sensing and Communication with Extended Targets

    Authors: Yiqiu Wang, Meixia Tao, Shu Sun

    Abstract: This paper studies an integrated sensing and communication (ISAC) system, where a multi-antenna base station transmits beamformed signals for joint downlink multi-user communication and radar sensing of an extended target (ET). By considering echo signals as reflections from valid elements on the ET contour, a set of novel Cramér-Rao bounds (CRBs) is derived for parameter estimation of the ET, inc…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2312.10641

  26. arXiv:2406.00507  [pdf, other]

    cs.CL cs.AI

    Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization

    Authors: Shichao Sun, Ruifeng Yuan, Ziqiang Cao, Wenjie Li, Pengfei Liu

    Abstract: Large language models (LLMs) have demonstrated the capacity to improve summary quality by mirroring a human-like iterative process of critique and refinement starting from the initial draft. Two strategies are designed to perform this iterative process: Prompt Chaining and Stepwise Prompt. Prompt chaining orchestrates the drafting, critiquing, and refining phases through a series of three discrete…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to Findings of ACL 2024

  27. arXiv:2405.19567  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding

    Authors: Shenghuan Sun, Gregory M. Goldgof, Alexander Schubert, Zhiqing Sun, Thomas Hartvigsen, Atul J. Butte, Ahmed Alaa

    Abstract: Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions to assist in diagnostic and treatment tasks. However, VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information. This challenge is particularly pronounced in the medical domain, where we do not only require VL…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Code available at: https://github.com/AlaaLab/Dr-LLaVA

  28. arXiv:2405.18844  [pdf, other]

    cs.IT eess.SP

    Optical IRS for Visible Light Communication: Modeling, Design, and Open Issues

    Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

    Abstract: Optical intelligent reflecting surface (OIRS) offers a new and effective approach to resolving the line-of-sight blockage issue in visible light communication (VLC) by enabling redirection of light to bypass obstacles, thereby dramatically enhancing indoor VLC coverage and reliability. This article provides a comprehensive overview of OIRS for VLC, including channel modeling, design techniques, an…

    Submitted 29 May, 2024; originally announced May 2024.

  29. arXiv:2405.18536  [pdf, other]

    cs.LG

    Data-Driven Simulator for Mechanical Circulatory Support with Domain Adversarial Neural Process

    Authors: Sophia Sun, Wenyuan Chen, Zihao Zhou, Sonia Fereidooni, Elise Jortberg, Rose Yu

    Abstract: We propose a data-driven simulator for Mechanical Circulatory Support (MCS) devices, implemented as a probabilistic deep sequence model. Existing mechanical simulators for MCS rely on oversimplifying assumptions and are insensitive to patient-specific behavior, limiting their applicability to real-world treatment scenarios. To address these shortcomings, our model Domain Adversarial Neural Process (DANP) employs a neural process archit…

    Submitted 28 May, 2024; originally announced May 2024.

  30. arXiv:2405.16893  [pdf, other]

    cs.IT eess.SP

    Cross Far- and Near-Field Channel Measurement and Modeling in Extremely Large-scale Antenna Array (ELAA) Systems

    Authors: Yiqin Wang, Chong Han, Shu Sun, Jianhua Zhang

    Abstract: Technologies like ultra-massive multiple-input-multiple-output (UM-MIMO) and reconfigurable intelligent surfaces (RISs) are of special interest to meet the key performance indicators of future wireless systems including ubiquitous connectivity and lightning-fast data rates. One of their common features, the extremely large-scale antenna array (ELAA) systems with hundreds or thousands of antennas,…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 14 pages, 33 figures

  31. arXiv:2405.16557  [pdf, other]

    cs.LG cs.AI

    Scalable Numerical Embeddings for Multivariate Time Series: Enhancing Healthcare Data Representation Learning

    Authors: Chun-Kai Huang, Yi-Hsien Hsieh, Ta-Jung Chien, Li-Cheng Chien, Shao-Hua Sun, Tung-Hung Su, Jia-Horng Kao, Che Lin

    Abstract: Multivariate time series (MTS) data, when sampled irregularly and asynchronously, often present extensive missing values. Conventional methodologies for MTS analysis tend to rely on temporal embeddings based on timestamps that necessitate subsequent imputations, yet these imputed values frequently deviate substantially from their actual counterparts, thereby compromising prediction accuracy. Furth…

    Submitted 26 May, 2024; originally announced May 2024.

  32. arXiv:2405.16450  [pdf, other]

    cs.LG cs.AI cs.PL

    Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

    Authors: Max Liu, Chan-Hung Yu, Wei-Hsu Lee, Cheng-Wei Hung, Yen-Chun Chen, Shao-Hua Sun

    Abstract: Programmatic reinforcement learning (PRL) has been explored for representing policies through programs as a means to achieve interpretability and generalization. Despite promising outcomes, current state-of-the-art PRL methods are hindered by sample inefficiency, necessitating tens of millions of program-environment interactions. To tackle this challenge, we introduce a novel LLM-guided search fra…

    Submitted 26 May, 2024; originally announced May 2024.

  33. arXiv:2405.16194  [pdf, other]

    cs.LG cs.AI cs.RO

    Diffusion-Reward Adversarial Imitation Learning

    Authors: Chun-Mao Lai, Hsiang-Chun Wang, Ping-Chun Hsieh, Yu-Chiang Frank Wang, Min-Hung Chen, Shao-Hua Sun

    Abstract: Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning, employing a generator policy learning to imitate expert behaviors and discriminator learning to distinguish the expert demonstrations from agent trajectories. Despit…

    Submitted 25 May, 2024; originally announced May 2024.

  34. arXiv:2405.14796  [pdf, ps, other]

    cs.CV cs.AI q-bio.QM

    Generative Plant Growth Simulation from Sequence-Informed Environmental Conditions

    Authors: Mohamed Debbagh, Yixue Liu, Zhouzhou Zheng, Xintong Jiang, Shangpeng Sun, Mark Lefsrud

    Abstract: A plant growth simulation can be characterized as a reconstructed visual representation of a plant or plant system. The phenotypic characteristics and plant structures are controlled by the scene environment and other contextual attributes. Considering the temporal dependencies and compounding effects of various factors on growth trajectories, we formulate a probabilistic approach to the simulatio…

    Submitted 9 July, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  35. arXiv:2405.13378  [pdf, other]

    cs.LG

    FedCache 2.0: Exploiting the Potential of Distilled Data in Knowledge Cache-driven Federated Learning

    Authors: Quyang Pan, Sheng Sun, Zhiyuan Wu, Yuwei Wang, Min Liu, Bo Gao

    Abstract: Federated Edge Learning (FEL) has emerged as a promising approach for enabling edge devices to collaboratively train machine learning models while preserving data privacy. Despite its advantages, practical FEL deployment faces significant challenges related to device constraints and device-server interactions, necessitating heterogeneous, user-adaptive model training with limited and uncertain com…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 20 pages, 8 figures, 10 tables

  36. arXiv:2405.11629  [pdf, other]

    cs.CV cs.AI

    Searching Realistic-Looking Adversarial Objects For Autonomous Driving Systems

    Authors: Shengxiang Sun, Shenzhe Zhu

    Abstract: Numerous studies on adversarial attacks targeting self-driving policies fail to incorporate realistic-looking adversarial objects, limiting real-world applicability. Building upon prior research that facilitated the transition of adversarial objects from simulations to practical applications, this paper discusses a modified gradient-based texture optimization method to discover realistic-looking a…

    Submitted 19 May, 2024; originally announced May 2024.

  37. arXiv:2405.06373  [pdf, other]

    cs.CL cs.AI

    LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play

    Authors: Li-Chun Lu, Shou-Jen Chen, Tsung-Min Pai, Chan-Hung Yu, Hung-yi Lee, Shao-Hua Sun

    Abstract: Large language models (LLMs) have shown exceptional proficiency in natural language processing but often fall short of generating creative and original responses to open-ended questions. To enhance LLM creativity, our key insight is to emulate the human process of inducing collective creativity through engaging discussions with participants from diverse backgrounds and perspectives. To this end, w…

    Submitted 18 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, Under review as a conference paper at COLM 2024

  38. arXiv:2405.05130  [pdf, other]

    cs.CV cs.MM

    Multi-scale Bottleneck Transformer for Weakly Supervised Multimodal Violence Detection

    Authors: Shengyang Sun, Xiaojin Gong

    Abstract: Weakly supervised multimodal violence detection aims to learn a violence detection model by leveraging multiple modalities such as RGB, optical flow, and audio, while only video-level annotations are available. In the pursuit of effective multimodal violence detection (MVD), information redundancy, modality imbalance, and modality asynchrony are identified as three key challenges. In this work, we…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME 2024

  39. Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture

    Authors: Yitong Jin, Zhiping Qiu, Yi Shi, Shuangpeng Sun, Chongwu Wang, Donghao Pan, Jiachen Zhao, Zhenghao Liang, Yuan Wang, Xiaobing Li, Feng Yu, Tao Yu, Qionghai Dai

    Abstract: In this paper, we touch on the problem of markerless multi-modal human motion capture especially for string performance capture which involves inherently subtle hand-string contacts and intricate movements. To fulfill this goal, we first collect a dataset, named String Performance Dataset (SPD), featuring cello and violin performances. The dataset includes videos captured from up to 23 different v…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024

  40. arXiv:2405.03524  [pdf, other]

    cs.AI

    Exploring knowledge graph-based neural-symbolic system from application perspective

    Authors: Shenzhe Zhu, Shengxiang Sun

    Abstract: Advancements in Artificial Intelligence (AI) and deep neural networks have driven significant progress in vision and text processing. However, achieving human-like reasoning and interpretability in AI systems remains a substantial challenge. The Neural-Symbolic paradigm, which integrates neural networks with symbolic systems, presents a promising pathway toward more interpretable AI. Within this p…

    Submitted 29 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  41. arXiv:2405.02042  [pdf, other]

    cs.IT

    Sampling to Achieve the Goal: An Age-aware Remote Markov Decision Process

    Authors: Aimin Li, Shaohua Wu, Gary C. F. Lee, Xiaomeng Cheng, Sumei Sun

    Abstract: Age of Information (AoI) has been recognized as an important metric to measure the freshness of information. Central to this consensus is that minimizing AoI can enhance the freshness of information, thereby facilitating the accuracy of subsequent decision-making processes. However, to date the direct causal relationship that links AoI to the utility of the decision-making process is unexplored. T…

    Submitted 11 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures

  42. arXiv:2405.01481  [pdf, other]

    cs.CL cs.AI cs.LG

    NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

    Authors: Gerald Shen, Zhilin Wang, Olivier Delalleau, Jiaqi Zeng, Yi Dong, Daniel Egert, Shengyang Sun, Jimmy Zhang, Sahil Jain, Ali Taghibakhshi, Markel Sanz Ausin, Ashwath Aithal, Oleksii Kuchaiev

    Abstract: Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment can be challenging, especially for the largest and most competent LLMs which often contain tens or hundreds of billions of parameters. We create NeMo-Aligner, a toolkit for model alignment that can efficiently scale to using h…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 13 pages, 4 figures

  43. arXiv:2405.00900  [pdf, other]

    cs.CV

    LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes

    Authors: Shanlin Sun, Bingbing Zhuang, Ziyu Jiang, Buyu Liu, Xiaohui Xie, Manmohan Chandraker

    Abstract: Photorealistic simulation plays a crucial role in applications such as autonomous driving, where advances in neural radiance fields (NeRFs) may allow better scalability through the automatic creation of digital 3D assets. However, reconstruction quality suffers on street scenes due to largely collinear camera motions and sparser samplings at higher speeds. On the other hand, the application often…

    Submitted 4 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: CVPR2024 Highlights

  44. arXiv:2405.00797  [pdf, other]

    cs.RO cs.CV

    ADM: Accelerated Diffusion Model via Estimated Priors for Robust Motion Prediction under Uncertainties

    Authors: Jiahui Li, Tianle Shen, Zekai Gu, Jiawei Sun, Chengran Yuan, Yuhang Han, Shuo Sun, Marcelo H. Ang Jr

    Abstract: Motion prediction is a challenging problem in autonomous driving as it demands the system to comprehend stochastic dynamics and the multi-modal nature of real-world agent interactions. Diffusion models have recently risen to prominence, and have proven particularly effective in pedestrian motion prediction tasks. However, the significant time consumption and sensitivity to noise have limited the r…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures

  45. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  46. arXiv:2404.15010  [pdf, other]

    cs.CV

    X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition

    Authors: Shuofeng Sun, Yongming Rao, Jiwen Lu, Haibin Yan

    Abstract: Numerous prior studies predominantly emphasize constructing relation vectors for individual neighborhood points and generating dynamic kernels for each vector and embedding these into high-dimensional spaces to capture implicit local structures. However, we contend that such implicit high-dimensional structure modeling approach inadequately represents the local geometric structure of point clouds d…

    Submitted 23 April, 2024; originally announced April 2024.

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  47. arXiv:2404.14778  [pdf, other]

    cs.IT eess.SP

    Channel Estimation for Optical Intelligent Reflecting Surface-Assisted VLC System: A Joint Space-Time Sampling Approach

    Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

    Abstract: Optical intelligent reflecting surface (OIRS) has attracted increasing attention due to its capability of overcoming signal blockages in visible light communication (VLC), an emerging technology for the next-generation advanced transceivers. However, current works on OIRS predominantly assume known channel state information (CSI), which is essential to practical OIRS configuration. To bridge such…

    Submitted 23 April, 2024; originally announced April 2024.

  48. arXiv:2404.14706  [pdf, other]

    cs.IT eess.SP

    Channel Estimation for Optical IRS-Assisted VLC System via Spatial Coherence

    Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

    Abstract: Optical intelligent reflecting surface (OIRS) has been considered a promising technology for visible light communication (VLC) by constructing visual line-of-sight propagation paths to address the signal blockage issue. However, the existing works on OIRSs are mostly based on perfect channel state information (CSI), whose acquisition appears to be challenging due to the passive nature of the OIRS.…

    Submitted 22 April, 2024; originally announced April 2024.

  49. arXiv:2404.13573  [pdf, other]

    cs.CV

    Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap

    Authors: Bowen Qu, Xiaoyu Liang, Shangkun Sun, Wei Gao

    Abstract: The recent advancements in Text-to-Video Artificial Intelligence Generated Content (AIGC) have been remarkable. Compared with traditional videos, the assessment of AIGC videos encounters various challenges: visual inconsistencies that defy common sense, discrepancies between content and the textual prompt, and distribution gaps between various generative models, etc. Targeting these challenges, in th…

    Submitted 27 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 figures, 3 tables. Accepted by CVPR2024 Workshop (3rd place winner of NTIRE2024 Quality Assessment for AI-Generated Content - Track 2 Video)

  50. arXiv:2404.10556  [pdf, other]

    cs.NI eess.SP

    Generative AI for Advanced UAV Networking

    Authors: Geng Sun, Wenwen Xie, Dusit Niyato, Hongyang Du, Jiawen Kang, Jing Wu, Sumei Sun, Ping Zhang

    Abstract: With the impressive achievements of ChatGPT and Sora, generative artificial intelligence (GAI) has received increasing attention. Not limited to the field of content generation, GAI is also widely used to solve the problems in wireless communication scenarios due to its powerful learning and generalization capabilities. Therefore, we discuss key applications of GAI in improving unmanned aerial veh…

    Submitted 16 April, 2024; originally announced April 2024.