
Showing 1–50 of 5,409 results for author: Chang

Searching in archive cs.
  1. arXiv:2409.03421  [pdf]

    cs.RO

    F3T: A soft tactile unit with 3D force and temperature mathematical decoupling ability for robots

    Authors: Xiong Yang, Hao Ren, Dong Guo, Zhengrong Ling, Tieshan Zhang, Gen Li, Yifeng Tang, Haoxiang Zhao, Jiale Wang, Hongyuan Chang, Jia Dong, Yajing Shen

    Abstract: The human skin exhibits remarkable capability to perceive contact forces and environmental temperatures, providing intricate information essential for nuanced manipulation. Despite recent advancements in soft tactile sensors, a significant challenge remains in accurately decoupling signals - specifically, separating force from directional orientation and temperature - resulting in fail to meet the…

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2409.03363  [pdf, other]

    cs.CL

    Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding

    Authors: Cheng Wang, Yiwei Wang, Bryan Hooi, Yujun Cai, Nanyun Peng, Kai-Wei Chang

    Abstract: The training data in large language models is key to their success, but it also presents privacy and security risks, as it may contain sensitive information. Detecting pre-training data is crucial for mitigating these concerns. Existing methods typically analyze target text in isolation or solely with non-member contexts, overlooking potential insights from simultaneously considering both member a…

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.02802  [pdf, other]

    cs.LG cs.CR stat.ML

    Boosting Certificate Robustness for Time Series Classification with Efficient Self-Ensemble

    Authors: Chang Dong, Zhengyang Li, Liangwei Zheng, Weitong Chen, Wei Emma Zhang

    Abstract: Recently, the issue of adversarial robustness in the time series domain has garnered significant attention. However, the available defense mechanisms remain limited, with adversarial training being the predominant approach, though it does not provide theoretical guarantees. Randomized Smoothing has emerged as a standout method due to its ability to certify a provable lower bound on robustness radi…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 6 figures, 4 tables, 10 pages

    ACM Class: H.3.3

  4. arXiv:2409.02795  [pdf, other]

    cs.CL

    Towards a Unified View of Preference Learning for Large Language Models: A Survey

    Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

    Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to unde…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Initial Commit, 21 pages

  5. arXiv:2409.02771  [pdf, other]

    cs.PL cs.GR

    CoolerSpace: A Language for Physically Correct and Computationally Efficient Color Programming

    Authors: Ethan Chen, Jiwon Chang, Yuhao Zhu

    Abstract: Color programmers manipulate lights, materials, and the resulting colors from light-material interactions. Existing libraries for color programming provide only a thin layer of abstraction around matrix operations. Color programs are, thus, vulnerable to bugs arising from mathematically permissible but physically meaningless matrix computations. Correct implementations are difficult to write and o…

    Submitted 4 September, 2024; originally announced September 2024.

  6. arXiv:2409.02512  [pdf, other]

    cs.LG cs.AI

    Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

    Authors: Jifeng Hu, Li Shen, Sili Huang, Zhejian Yang, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

    Abstract: Artificial neural networks, especially recent diffusion-based models, have shown remarkable superiority in gaming, control, and QA systems, where the training tasks' datasets are usually static. However, in real-world applications, such as robotic control of reinforcement learning (RL), the tasks are changing, and new tasks arise in a sequential order. This situation poses the new challenge of pla…

    Submitted 4 September, 2024; originally announced September 2024.

  7. arXiv:2409.01556  [pdf, other]

    cs.CL cs.AI

    Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture

    Authors: Chen-Chi Chang, Ching-Yuan Chen, Hung-Shin Lee, Chih-Cheng Lee

    Abstract: This study introduces a comprehensive benchmark designed to evaluate the performance of large language models (LLMs) in understanding and processing cultural knowledge, with a specific focus on Hakka culture as a case study. Leveraging Bloom's Taxonomy, the study develops a multi-dimensional framework that systematically assesses LLMs across six cognitive domains: Remembering, Understanding, Apply…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Submitted to O-COCOSDA 2024

  8. arXiv:2409.01548  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka

    Authors: Li-Wei Chen, Hung-Shin Lee, Chen-Chi Chang

    Abstract: This paper introduces VoxHakka, a text-to-speech (TTS) system designed for Taiwanese Hakka, a critically under-resourced language spoken in Taiwan. Leveraging the YourTTS framework, VoxHakka achieves high naturalness and accuracy and low real-time factor in speech synthesis while supporting six distinct Hakka dialects. This is achieved by training the model with dialect-specific data, allowing for…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Submitted to O-COCOSDA 2024

  9. arXiv:2409.01541  [pdf, other]

    cs.CV cs.CR

    Purification-Agnostic Proxy Learning for Agentic Copyright Watermarking against Adversarial Evidence Forgery

    Authors: Erjin Bao, Ching-Chun Chang, Hanrui Wang, Isao Echizen

    Abstract: With the proliferation of AI agents in various domains, protecting the ownership of AI models has become crucial due to the significant investment in their development. Unauthorized use and illegal distribution of these models pose serious threats to intellectual property, necessitating effective copyright protection measures. Model watermarking has emerged as a key technique to address this issue…

    Submitted 2 September, 2024; originally announced September 2024.

  10. arXiv:2409.01037  [pdf, other]

    cs.CL

    NYK-MS: A Well-annotated Multi-modal Metaphor and Sarcasm Understanding Benchmark on Cartoon-Caption Dataset

    Authors: Ke Chang, Hao Li, Junzhao Zhang, Yunfang Wu

    Abstract: Metaphor and sarcasm are common figurative expressions in people's communication, especially on the Internet or the memes popular among teenagers. We create a new benchmark named NYK-MS (NewYorKer for Metaphor and Sarcasm), which contains 1,583 samples for metaphor understanding tasks and 1,578 samples for sarcasm understanding tasks. These tasks include whether it contains metaphor/sarcasm, which…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 13 pages, 6 figures

  11. arXiv:2409.01007  [pdf, other]

    cs.AI

    Unlocking the Wisdom of Large Language Models: An Introduction to The Path to Artificial General Intelligence

    Authors: Edward Y. Chang

    Abstract: This booklet, "Unlocking the Wisdom of Large Language Models," serves as an introduction to the comprehensive work "The Path to Artificial General Intelligence." Through a series of nine aphorisms, we distill key insights and principles that underpin the larger exploration of AI's future through adversarial LLM dialogue. We propose this approach as a potential path to realizing artificial general…

    Submitted 2 September, 2024; originally announced September 2024.

    ACM Class: I.2.7

  12. arXiv:2409.00690  [pdf, other]

    cs.CV

    Decoupled and Interactive Regression Modeling for High-performance One-stage 3D Object Detection

    Authors: Weiping Xiao, Yiqiang Wu, Chang Liu, Yu Qin, Xiaomao Li, Liming Xin

    Abstract: Inadequate bounding box modeling in regression tasks constrains the performance of one-stage 3D object detection. Our study reveals that the primary reason lies in two aspects: (1) The limited center-offset prediction seriously impairs the bounding box localization since many highest response positions significantly deviate from object centers. (2) The low-quality sample ignored in regression task…

    Submitted 1 September, 2024; originally announced September 2024.

  13. arXiv:2409.00499  [pdf, other]

    cs.RO cs.CV

    DAP: Diffusion-based Affordance Prediction for Multi-modality Storage

    Authors: Haonan Chang, Kowndinya Boyalakuntla, Yuhan Liu, Xinyu Zhang, Liam Schramm, Abdeslam Boularias

    Abstract: Solving storage problem: where objects must be accurately placed into containers with precise orientations and positions, presents a distinct challenge that extends beyond traditional rearrangement tasks. These challenges are primarily due to the need for fine-grained 6D manipulation and the inherent multi-modality of solution spaces, where multiple viable goal configurations exist for the same st…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: Paper Accepted by IROS2024. Arxiv version is 8 pages

  14. arXiv:2409.00250  [pdf, other]

    cs.CV

    Medical Report Generation Is A Multi-label Classification Problem

    Authors: Yijian Fan, Zhenbang Yang, Rui Liu, Mingjie Li, Xiaojun Chang

    Abstract: Medical report generation is a critical task in healthcare that involves the automatic creation of detailed and accurate descriptions from medical images. Traditionally, this task has been approached as a sequence generation problem, relying on vision-and-language techniques to generate coherent and contextually relevant reports. However, in this paper, we propose a novel perspective: rethinking m…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: Accepted to 2024 IEEE International Conference on Medical Artificial Intelligence

  15. arXiv:2409.00121  [pdf, other]

    eess.SP cs.AI cs.LG eess.AS

    BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding

    Authors: Jinzhao Zhou, Yiqun Duan, Fred Chang, Thomas Do, Yu-Kai Wang, Chin-Teng Lin

    Abstract: The remarkable success of large language models (LLMs) across various multi-modality applications is well established. However, integrating large language models with humans, or brain dynamics, remains relatively unexplored. In this paper, we introduce BELT-2, a pioneering multi-task model designed to enhance both encoding and decoding performance from EEG signals. To bolster the quality of the EE…

    Submitted 28 August, 2024; originally announced September 2024.

  16. arXiv:2408.17397  [pdf, other]

    cs.IT eess.SP

    End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework

    Authors: Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

    Abstract: This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formu…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: major revision in IEEE JSAC

  17. arXiv:2408.16633  [pdf]

    cs.RO cs.AI

    Optimizing Automated Picking Systems in Warehouse Robots Using Machine Learning

    Authors: Keqin Li, Jin Wang, Xubo Wu, Xirui Peng, Runmian Chang, Xiaoyu Deng, Yiwen Kang, Yue Yang, Fanghao Ni, Bo Hong

    Abstract: With the rapid growth of global e-commerce, the demand for automation in the logistics industry is increasing. This study focuses on automated picking systems in warehouses, utilizing deep learning and reinforcement learning technologies to enhance picking efficiency and accuracy while reducing system failure rates. Through empirical analysis, we demonstrate the effectiveness of these technologies…

    Submitted 29 August, 2024; originally announced August 2024.

  18. arXiv:2408.16629  [pdf, other]

    cs.CY cs.AI cs.SI

    LLMs generate structurally realistic social networks but overestimate political homophily

    Authors: Serina Chang, Alicja Chaszczewicz, Emma Wang, Maya Josifovska, Emma Pierson, Jure Leskovec

    Abstract: Generating social networks is essential for many applications, such as epidemic modeling and social simulations. Prior approaches either involve deep learning models, which require many observed networks for training, or stylized models, which are limited in their realism and flexibility. In contrast, LLMs offer the potential for zero-shot and flexible network generation. However, two key question…

    Submitted 29 August, 2024; originally announced August 2024.

  19. arXiv:2408.16343  [pdf, other]

    cs.CV cs.AI

    Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach

    Authors: Yifei Chen, Shenghao Zhu, Zhaojie Fang, Chang Liu, Binfeng Zou, Yuhe Wang, Shuo Chang, Fan Jia, Feiwei Qin, Jin Fan, Yong Peng, Changmiao Wang

    Abstract: Alzheimer's Disease (AD) is a complex neurodegenerative disorder marked by memory loss, executive dysfunction, and personality changes. Early diagnosis is challenging due to subtle symptoms and varied presentations, often leading to misdiagnosis with traditional unimodal diagnostic methods due to their limited scope. This study introduces an advanced multimodal classification model that integrates…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 5 pages, 2 figures

  20. arXiv:2408.15777  [pdf, other]

    cs.CV

    A Survey on Facial Expression Recognition of Static and Dynamic Emotions

    Authors: Yan Wang, Shaoqi Yan, Yang Liu, Wei Song, Jing Liu, Yang Chang, Xinji Mai, Xiping Hu, Wenqiang Zhang, Zhongxue Gan

    Abstract: Facial expression recognition (FER) aims to analyze emotional states from static images and dynamic sequences, which is pivotal in enhancing anthropomorphic communication among humans, robots, and digital avatars by leveraging AI technologies. As the FER field evolves from controlled laboratory environments to more complex in-the-wild scenarios, advanced methods have been rapidly developed and new…

    Submitted 28 August, 2024; originally announced August 2024.

  21. arXiv:2408.15609  [pdf, other]

    cs.NI cs.LG

    Statistical QoS Provision in Business-Centric Networks

    Authors: Chang Wu, Yuang Chen, Hancheng Lu

    Abstract: More refined resource management and Quality of Service (QoS) provisioning is a critical goal of wireless communication technologies. In this paper, we propose a novel Business-Centric Network (BCN) aimed at enabling scalable QoS provisioning, based on a cross-layer framework that captures the relationship between application, transport parameters, and channels. We investigate both continuous flow…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 13 pages

  22. arXiv:2408.15425  [pdf, other]

    cs.RO cs.AI cs.SE

    Fast and Modular Autonomy Software for Autonomous Racing Vehicles

    Authors: Andrew Saba, Aderotimi Adetunji, Adam Johnson, Aadi Kothari, Matthew Sivaprakasam, Joshua Spisak, Prem Bharatia, Arjun Chauhan, Brendan Duff Jr., Noah Gasparro, Charles King, Ryan Larkin, Brian Mao, Micah Nye, Anjali Parashar, Joseph Attias, Aurimas Balciunas, Austin Brown, Chris Chang, Ming Gao, Cindy Heredia, Andrew Keats, Jose Lavariega, William Muckelroy III, Andre Slavescu, et al. (5 additional authors not shown)

    Abstract: Autonomous motorsports aim to replicate the human racecar driver with software and sensors. As in traditional motorsports, Autonomous Racing Vehicles (ARVs) are pushed to their handling limits in multi-agent scenarios at extremely high ($\geq 150$ mph) speeds. This Operational Design Domain (ODD) presents unique challenges across the autonomy stack. The Indy Autonomous Challenge (IAC) is an interna…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Published in Journal of Field Robotics

    Journal ref: Field Robotics Volume 4 (2024) 1-45

  23. arXiv:2408.15235  [pdf, other]

    cs.CV

    Learning-based Multi-View Stereo: A Survey

    Authors: Fangjinhua Wang, Qingtian Zhu, Di Chang, Quankai Gao, Junlin Han, Tong Zhang, Richard Hartley, Marc Pollefeys

    Abstract: 3D reconstruction aims to recover the dense 3D structure of a scene. It plays an essential role in various applications such as Augmented/Virtual Reality (AR/VR), autonomous driving and robotics. Leveraging multiple views of a scene captured from different viewpoints, Multi-View Stereo (MVS) algorithms synthesize a comprehensive 3D representation, enabling precise reconstruction in complex environ…

    Submitted 27 August, 2024; originally announced August 2024.

  24. arXiv:2408.14575  [pdf, other]

    cs.AI

    EVINCE: Optimizing Adversarial LLM Dialogues via Conditional Statistics and Information Theory

    Authors: Edward Y. Chang

    Abstract: This paper introduces EVINCE (Entropy and Variation IN Conditional Exchanges), a dialogue framework advancing Artificial General Intelligence (AGI) by enhancing versatility, adaptivity, and reasoning in large language models (LLMs). Leveraging adversarial debate and a novel dual entropy theory, EVINCE improves prediction accuracy, robustness, and stability in LLMs by integrating statistical modeli…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 19 pages, 7 figures, four tables

    ACM Class: I.2.7

  25. arXiv:2408.14453  [pdf]

    cs.LG eess.IV eess.SP

    Reconstructing physiological signals from fMRI across the adult lifespan

    Authors: Shiyu Wang, Ziyuan Xu, Yamin Li, Mara Mather, Roza G. Bayrak, Catie Chang

    Abstract: Interactions between the brain and body are of fundamental importance for human behavior and health. Functional magnetic resonance imaging (fMRI) captures whole-brain activity noninvasively, and modeling how fMRI signals interact with physiological dynamics of the body can provide new insight into brain function and offer potential biomarkers of disease. However, physiological recordings are not a…

    Submitted 26 August, 2024; originally announced August 2024.

  26. arXiv:2408.14267  [pdf, other]

    cs.LG cs.CV

    1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit

    Authors: Chang Gao, Jianfei Chen, Kang Zhao, Jiaqi Wang, Liping Jing

    Abstract: Fully quantized training (FQT) accelerates the training of deep neural networks by quantizing the activations, weights, and gradients into lower precision. To explore the ultimate limit of FQT (the lowest achievable precision), we make a first attempt to 1-bit FQT. We provide a theoretical analysis of FQT based on Adam and SGD, revealing that the gradient variance influences the convergence of FQT…

    Submitted 26 August, 2024; originally announced August 2024.

  27. arXiv:2408.14262  [pdf]

    cs.CL cs.SD eess.AS

    Self-supervised Speech Representations Still Struggle with African American Vernacular English

    Authors: Kalvin Chang, Yi-Hui Chou, Jiatong Shi, Hsuan-Ming Chen, Nicole Holliday, Odette Scharenborg, David R. Mortensen

    Abstract: Underperformance of ASR systems for speakers of African American Vernacular English (AAVE) and other marginalized language varieties is a well-documented phenomenon, and one that reinforces the stigmatization of these varieties. We investigate whether or not the recent wave of Self-Supervised Learning (SSL) speech models can close the gap in ASR performance between AAVE and Mainstream American Eng…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: INTERSPEECH 2024

  28. arXiv:2408.14009  [pdf]

    cs.RO cs.AI

    Optimizing TD3 for 7-DOF Robotic Arm Grasping: Overcoming Suboptimality with Exploration-Enhanced Contrastive Learning

    Authors: Wen-Han Hsieh, Jen-Yuan Chang

    Abstract: In actor-critic-based reinforcement learning algorithms such as Twin Delayed Deep Deterministic policy gradient (TD3), insufficient exploration of the spatial space can result in suboptimal policies when controlling 7-DOF robotic arms. To address this issue, we propose a novel Exploration-Enhanced Contrastive Learning (EECL) module that improves exploration by providing additional rewards for enco…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 4 pages, 2 figures, IEEE-ICKII-2024

  29. arXiv:2408.13980  [pdf, other]

    cs.CV

    FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation

    Authors: Daixun Li, Weiying Xie, Mingxiang Cao, Yunke Wang, Jiaqing Zhang, Yunsong Li, Leyuan Fang, Chang Xu

    Abstract: Multimodal image fusion and segmentation enhance scene understanding in autonomous driving by integrating data from various sensors. However, current models struggle to efficiently segment densely packed elements in such scenes, due to the absence of comprehensive fusion features that can guide mid-process fine-tuning and focus attention on relevant areas. The Segment Anything Model (SAM) has emer…

    Submitted 25 August, 2024; originally announced August 2024.

  30. arXiv:2408.13906  [pdf, other]

    cs.CV cs.AI cs.LG

    ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models

    Authors: Yeji Park, Deokyeong Lee, Junsuk Choe, Buru Chang

    Abstract: Hallucinations in Multimodal Large Language Models (MLLMs) where generated responses fail to accurately reflect the given image pose a significant challenge to their reliability. To address this, we introduce ConVis, a novel training-free contrastive decoding method. ConVis leverages a text-to-image (T2I) generation model to semantically reconstruct the given image from hallucinated captions. By c…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: First two authors contributed equally. Source code is available at https://github.com/yejipark-m/ConVis

  31. arXiv:2408.13772  [pdf, ps, other]

    cs.OS

    FRAP: A Flexible Resource Accessing Protocol for Multiprocessor Real-Time Systems

    Authors: Shuai Zhao, Hanzhi Xu, Nan Chen, Ruoxian Su, Wanli Chang

    Abstract: Fully-partitioned fixed-priority scheduling (FP-FPS) multiprocessor systems are widely found in real-time applications, where spin-based protocols are often deployed to manage the mutually exclusive access of shared resources. Unfortunately, existing approaches either enforce rigid spin priority rules for resource accessing or carry significant pessimism in the schedulability analysis, imposing su…

    Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

  32. arXiv:2408.13464  [pdf, other]

    cs.AI cs.CL cs.LG

    Uncovering Biases with Reflective Large Language Models

    Authors: Edward Y. Chang

    Abstract: Biases inherent in human endeavors pose significant challenges for machine learning, particularly in supervised learning that relies on potentially biased "ground truth" data. This reliance, coupled with models' tendency to generalize based on statistical maximal likelihood, can propagate and amplify biases, exacerbating societal issues. To address this, our study proposes a reflective methodology…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 16 pages, 3 figures, 8 tables

    ACM Class: I.2.7

  33. arXiv:2408.13423  [pdf, other]

    cs.CV

    Training-free Long Video Generation with Chain of Diffusion Model Experts

    Authors: Wenhao Li, Yichao Cao, Xiu Su, Xi Lin, Shan You, Mingkai Zheng, Yi Chen, Chang Xu

    Abstract: Video generation models hold substantial potential in areas such as filmmaking. However, current video diffusion models need high computational costs and produce suboptimal results due to high complexity of video generation task. In this paper, we propose ConFiner, an efficient high-quality video generation framework that decouples video generation into easier subtasks: structure …

    Submitted 2 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  34. arXiv:2408.13040  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks

    Authors: Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-Wen Li, Hung-yi Lee

    Abstract: Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address va…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

    Journal ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3730-3744, 2024

  35. arXiv:2408.12991  [pdf, other]

    cs.CE q-fin.TR

    Controllable Financial Market Generation with Diffusion Guided Meta Agent

    Authors: Yu-Hao Huang, Chang Xu, Yang Liu, Weiqing Liu, Wu-Jun Li, Jiang Bian

    Abstract: Order flow modeling stands as the most fundamental and essential financial task, as orders embody the minimal unit within a financial market. However, current approaches often result in unsatisfactory fidelity in generating order flow, and their generation lacks controllability, thereby limiting their application scenario. In this paper, we advocate incorporating controllability into the market ge…

    Submitted 1 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  36. arXiv:2408.12957  [pdf, other]

    cs.CV

    Image Segmentation in Foundation Model Era: A Survey

    Authors: Tianfei Zhou, Fei Zhang, Boyu Chang, Wenguan Wang, Ye Yuan, Ender Konukoglu, Daniel Cremers

    Abstract: Image segmentation is a long-standing challenge in computer vision, studied continuously over several decades, as evidenced by seminal algorithms such as N-Cut, FCN, and MaskFormer. With the advent of foundation models (FMs), contemporary segmentation methodologies have embarked on a new epoch by either adapting FMs (e.g., CLIP, Stable Diffusion, DINO) for image segmentation or developing dedicate…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: A comprehensive survey of image segmentation in foundation model era (work in progress)

  37. arXiv:2408.12307  [pdf]

    cs.LG

    Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning

    Authors: Yen-Ru Lai, Fu-Chieh Chang, Pei-Yuan Wu

    Abstract: Offline reinforcement learning (RL) learns policies from a fixed dataset, but often requires large amounts of data. The challenge arises when labeled datasets are expensive, especially when rewards have to be provided by human labelers for large datasets. In contrast, unlabelled data tends to be less expensive. This situation highlights the importance of finding effective ways to use unlabelled da…

    Submitted 22 August, 2024; originally announced August 2024.

  38. arXiv:2408.11791  [pdf, other]

    cs.LG

    Critique-out-Loud Reward Models

    Authors: Zachary Ankner, Mansheej Paul, Brandon Cui, Jonathan D. Chang, Prithviraj Ammanabrolu

    Abstract: Traditionally, reward models used for reinforcement learning from human feedback (RLHF) are trained to directly predict preference scores without leveraging the generation capabilities of the underlying large language model (LLM). This limits the capabilities of reward models as they must reason implicitly about the quality of a response, i.e., preference modeling must be performed in a single for…

    Submitted 21 August, 2024; originally announced August 2024.

  39. arXiv:2408.11451  [pdf, other]

    cs.AI

    Bidirectional Gated Mamba for Sequential Recommendation

    Authors: Ziwei Liu, Qidong Liu, Yejing Wang, Wanyu Wang, Pengyue Jia, Maolin Wang, Zitao Liu, Yi Chang, Xiangyu Zhao

    Abstract: In various domains, Sequential Recommender Systems (SRS) have become essential due to their superior capability to discern intricate user preferences. Typically, SRS utilize transformer-based architectures to forecast the subsequent item within a sequence. Nevertheless, the quadratic computational complexity inherent in these models often leads to inefficiencies, hindering the achievement of real-…

    Submitted 22 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  40. Audio Description Customization

    Authors: Rosiana Natalie, Ruei-Che Chang, Smitha Sheshadri, Anhong Guo, Kotaro Hara

    Abstract: Blind and low-vision (BLV) people use audio descriptions (ADs) to access videos. However, current ADs are unalterable by end users, thus are incapable of supporting BLV individuals' potentially diverse needs and preferences. This research investigates if customizing AD could improve how BLV individuals consume videos. We conducted an interview study (Study 1) with fifteen BLV participants, which r…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: ASSETS 2024

  41. arXiv:2408.11276  [pdf, other]

    math.PR cs.LG math.DG stat.ML

    Chernoff Bounds for Tensor Expanders on Riemannian Manifolds Using Graph Laplacian Approximation

    Authors: Shih-Yu Chang

    Abstract: This paper addresses the advancement of probability tail bound analysis, a crucial statistical tool for assessing the probability of large deviations of random variables from their expected values. Traditional tail bounds, such as Markov's, Chebyshev's, and Chernoff bounds, have proven valuable across numerous scientific and engineering fields. However, as data complexity grows, there is a pressin…

    Submitted 20 August, 2024; originally announced August 2024.

  42. arXiv:2408.11264  [pdf, other]

    cs.LG cs.CR

    Correlation Analysis of Adversarial Attack in Time Series Classification

    Authors: Zhengyang Li, Wenhao Liang, Chang Dong, Weitong Chen, Dong Huang

    Abstract: This study investigates the vulnerability of time series classification models to adversarial attacks, with a focus on how these models process local versus global information under such conditions. By leveraging the Normalized Auto Correlation Function (NACF), an exploration into the inclination of neural networks is conducted. It is demonstrated that regularization techniques, particularly those…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 15 pages, 7 figures

    ACM Class: I.2.0

  43. arXiv:2408.11194  [pdf, other]

    cs.CV

    Compress Guidance in Conditional Diffusion Sampling

    Authors: Anh-Dung Dinh, Daochang Liu, Chang Xu

    Abstract: Enforcing guidance throughout the entire sampling process often proves counterproductive due to the model-fitting issue, where samples are generated to match the classifier's parameters rather than generalizing the expected condition. This work identifies and quantifies the problem, demonstrating that reducing or excluding guidance at numerous timesteps can mitigate this issue. By distributing th…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, Computer Vision and Machine Learning

    ACM Class: I.4

  44. arXiv:2408.10908  [pdf, other]

    cs.RO cs.HC

    Enhancing End-to-End Autonomous Driving Systems Through Synchronized Human Behavior Data

    Authors: Yiqun Duan, Zhuoli Zhuang, Jinzhao Zhou, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin

    Abstract: This paper presents a pioneering exploration into the integration of fine-grained human supervision within the autonomous driving domain to enhance system performance. Current advances in End-to-End autonomous driving are typically data-driven and rely on given expert trials. However, this reliance limits the systems' generalizability and their ability to earn human trust. Addressing this gap,…

    Submitted 20 August, 2024; originally announced August 2024.

  45. arXiv:2408.10899  [pdf, other]

    cs.RO

    All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

    Authors: Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang

    Abstract: Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by of…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project website: https://imaei.github.io/project_pages/ario/

  46. arXiv:2408.10764  [pdf, other]

    cs.CL

    Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model

    Authors: Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou

    Abstract: Transformer-based large language models (LLMs) exhibit limitations such as generating unsafe responses, unreliable reasoning, etc. Existing inference intervention approaches attempt to mitigate these issues by finetuning additional models to produce calibration signals (such as rewards) that guide the LLM's decoding process. However, this solution introduces substantial time and space overhead due…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 16 pages

  47. arXiv:2408.10668  [pdf, other]

    cs.CR cs.AI

    Probing the Safety Response Boundary of Large Language Models via Unsafe Decoding Path Generation

    Authors: Haoyu Wang, Bingzhe Wu, Yatao Bian, Yongzhe Chang, Xueqian Wang, Peilin Zhao

    Abstract: Large Language Models (LLMs) are implicit troublemakers. While they provide valuable insights and assist in problem-solving, they can also potentially serve as a resource for malicious activities. Implementing safety alignment could mitigate the risk of LLMs generating harmful responses. We argue that even when an LLM appears to successfully block harmful queries, there may still be hidden vulner…

    Submitted 26 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  48. arXiv:2408.10504  [pdf, other]

    cs.AI

    QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

    Authors: Yilun Kong, Hangyu Mao, Qi Zhao, Bin Zhang, Jingqing Ruan, Li Shen, Yongzhe Chang, Xueqian Wang, Rui Zhao, Dacheng Tao

    Abstract: Prompt engineering has demonstrated remarkable success in enhancing the performance of large language models (LLMs) across diverse tasks. However, most existing prompt optimization methods focus only on task-level performance, overlooking the importance of query-preferred prompts, which leads to suboptimal performance. Additionally, these methods rely heavily on frequent interactions with LLM…

    Submitted 19 August, 2024; originally announced August 2024.

  49. arXiv:2408.10441  [pdf, other]

    cs.CL

    Goldfish: Monolingual Language Models for 350 Languages

    Authors: Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen

    Abstract: For many low-resource languages, the only available language models are large multilingual models trained on many languages simultaneously. However, using FLORES perplexity as a metric, we find that these models perform worse than bigrams for many languages (e.g. 24% of languages in XGLM 4.5B; 43% in BLOOM 7.1B). To facilitate research that focuses on low-resource languages, we pre-train and relea…

    Submitted 19 August, 2024; originally announced August 2024.

  50. arXiv:2408.10099  [pdf, other]

    cs.GR

    Neural Representation of Shape-Dependent Laplacian Eigenfunctions

    Authors: Yue Chang, Otman Benchekroun, Maurizio M. Chiaramonte, Peter Yichen Chen, Eitan Grinspun

    Abstract: The eigenfunctions of the Laplace operator are essential in mathematical physics, engineering, and geometry processing. Typically, these are computed by discretizing the domain and performing eigendecomposition, tying the results to a specific mesh. However, this method is unsuitable for continuously-parameterized shapes. We propose a novel representation for eigenfunctions in continuously-param…

    Submitted 19 August, 2024; originally announced August 2024.