Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 208 results for author: Wan, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.18139  [pdf, other

    cs.CL cs.CV

    LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

    Authors: Zhongwei Wan, Ziang Wu, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, Li Yuan

    Abstract: Long-context Multimodal Large Language Models (MLLMs) demand substantial computational resources for inference as the growth of their multimodal Key-Value (KV) cache, in response to increasing input lengths, challenges memory and time efficiency. Unlike single-modality LLMs that manage only textual contexts, the KV cache of long-context MLLMs includes representations from multiple images with temp… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.14278  [pdf, ps, other

    cs.DS

    Efficient Deterministic Algorithms for Maximizing Symmetric Submodular Functions

    Authors: Zongqi Wan, Jialin Zhang, Xiaoming Sun, Zhijie Zhang

    Abstract: Symmetric submodular maximization is an important class of combinatorial optimization problems, including MAX-CUT on graphs and hyper-graphs. The state-of-the-art algorithm for the problem over general constraints has an approximation ratio of $0.432$. The algorithm applies the canonical continuous greedy technique that involves a sampling process. It, therefore, suffers from high query complexity… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.13060  [pdf, other

    cs.LG cs.AI stat.AP

    Scale-Translation Equivariant Network for Oceanic Internal Solitary Wave Localization

    Authors: Zhang Wan, Shuo Wang, Xudong Zhang

    Abstract: Internal solitary waves (ISWs) are gravity waves that are often observed in the interior ocean rather than the surface. They hold significant importance due to their capacity to carry substantial energy, thus influence pollutant transport, oil platform operations, submarine navigation, etc. Researchers have studied ISWs through optical images, synthetic aperture radar (SAR) images, and altimeter d… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages, 5 figures

  4. arXiv:2406.13035  [pdf, other

    cs.CL

    D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models

    Authors: Zhongwei Wan, Xinjian Wu, Yu Zhang, Yi Xin, Chaofan Tao, Zhihong Zhu, Xin Wang, Siqi Luo, Jing Xiong, Mi Zhang

    Abstract: Efficient inference in Large Language Models (LLMs) is impeded by the growing memory demands of key-value (KV) caching, especially for longer sequences. Traditional KV cache eviction strategies, which prioritize less critical KV-pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations. To address this, we introduce Dynamic Discrimi… ▽ More

    Submitted 23 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  5. arXiv:2406.07146  [pdf, other

    cs.CV cs.AI

    Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images

    Authors: Che Liu, Zhongwei Wan, Yuqi Wang, Hui Shen, Haozhe Wang, Kangyu Zheng, Mi Zhang, Rossella Arcucci

    Abstract: Automatic radiology report generation can significantly benefit the labor-intensive process of report writing by radiologists, especially for 3D radiographs like CT scans, which are crucial for broad clinical diagnostics yet underexplored compared to 2D radiographs. Existing methods often handle 3D volumes either slice-wise or with aggressive downsampling due to current GPU memory limitations, whi… ▽ More

    Submitted 12 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  6. arXiv:2406.04835  [pdf, other

    cs.RO

    SLR: Learning Quadruped Locomotion without Privileged Information

    Authors: Shiyi Chen, Zeyu Wan, Shiyang Yan, Chun Zhang, Weiyi Zhang, Qiang Li, Debing Zhang, Fasih Ud Din Farrukh

    Abstract: Traditional reinforcement learning control for quadruped robots often relies on privileged information, demanding meticulous selection and precise estimation, thereby imposing constraints on the development process. This work proposes a Self-learning Latent Representation (SLR) method, which achieves high-performance control policy learning without the need for privileged information. To enhance t… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  7. arXiv:2406.03403  [pdf, other

    cs.LG cs.AI q-bio.QM

    Structure-based Drug Design Benchmark: Do 3D Methods Really Dominate?

    Authors: Kangyu Zheng, Yingzhou Lu, Zaixi Zhang, Zhongwei Wan, Yao Ma, Marinka Zitnik, Tianfan Fu

    Abstract: Currently, the field of structure-based drug design is dominated by three main types of algorithms: search-based algorithms, deep generative models, and reinforcement learning. While existing works have typically focused on comparing models within a single algorithmic category, cross-algorithm comparisons remain scarce. In this paper, to fill the gap, we establish a benchmark to evaluate the perfo… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  8. arXiv:2406.01601  [pdf, other

    cs.DC cs.AI cs.LG

    Backpropogation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration

    Authors: Wei Ji, Li Li, Zheqi Lv, Wenqiao Zhang, Mengze Li, Zhen Wan, Wenqiang Lei, Roger Zimmermann

    Abstract: In our increasingly interconnected world, where intelligent devices continually amass copious personalized multi-modal data, a pressing need arises to deliver high-quality, personalized device-aware services. However, this endeavor presents a multifaceted challenge to prevailing artificial intelligence (AI) systems primarily rooted in the cloud. As these systems grapple with shifting data distribu… ▽ More

    Submitted 21 May, 2024; originally announced June 2024.

  9. arXiv:2405.18132  [pdf, other

    cs.CV

    EG4D: Explicit Generation of 4D Object without Score Distillation

    Authors: Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li

    Abstract: In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  10. arXiv:2405.15821  [pdf, other

    cs.AI cs.LG

    Reinforcing Language Agents via Policy Optimization with Action Decomposition

    Authors: Muning Wen, Ziyu Wan, Weinan Zhang, Jun Wang, Ying Wen

    Abstract: Language models as intelligent agents push the boundaries of sequential decision-making agents but struggle with limited knowledge of environmental dynamics and exponentially huge action space. Recent efforts like GLAM and TWOSOME manually constrain the action space to a restricted subset and employ reinforcement learning to align agents' knowledge with specific environments. However, they overloo… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 24 pages with 9 pages are main context

  11. arXiv:2405.12757  [pdf, other

    cs.CV

    BIMM: Brain Inspired Masked Modeling for Video Representation Learning

    Authors: Zhifan Wan, Jie Zhang, Changzhen Li, Shiguang Shan

    Abstract: The visual pathway of human brain includes two sub-pathways, ie, the ventral pathway and the dorsal pathway, which focus on object identification and dynamic information modeling, respectively. Both pathways comprise multi-layer structures, with each layer responsible for processing different aspects of visual information. Inspired by visual information processing mechanism of the human brain, we… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  12. arXiv:2405.11916  [pdf, ps, other

    cs.LG cs.CR

    Information Leakage from Embedding in Large Language Models

    Authors: Zhipeng Wan, Anda Cheng, Yinggui Wang, Lei Wang

    Abstract: The widespread adoption of large language models (LLMs) has raised concerns regarding data privacy. This study aims to investigate the potential for privacy invasion through input reconstruction attacks, in which a malicious model provider could potentially recover user inputs from embeddings. We first propose two base methods to reconstruct original texts from a model's hidden states. We find tha… ▽ More

    Submitted 22 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  13. arXiv:2405.10767  [pdf, other

    cs.HC cs.AI

    Evaluating Saliency Explanations in NLP by Crowdsourcing

    Authors: Xiaotian Lu, Jiyi Li, Zhen Wan, Xiaofeng Lin, Koh Takeuchi, Hisashi Kashima

    Abstract: Deep learning models have performed well on many NLP tasks. However, their internal mechanisms are typically difficult for humans to understand. The development of methods to explain models has become a key issue in the reliability of deep learning models in many important applications. Various saliency explanation methods, which give each feature of input a score proportional to the contribution… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 13 pages, 4 figures, Accepted for LREC-Coling 2024 (Oral)

  14. arXiv:2404.15700  [pdf, other

    cs.CV cs.RO

    MAS-SAM: Segment Any Marine Animal with Aggregated Features

    Authors: Tianyu Yan, Zifu Wan, Xinhao Deng, Pingping Zhang, Yang Liu, Huchuan Lu

    Abstract: Recently, Segment Anything Model (SAM) shows exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained with large-scale natural light images. In underwater scenes, it exhibits substantial performance degradation due to the light scattering and absorption. Meanwhile, the simplicity of th… ▽ More

    Submitted 9 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI2024 as Poster

  15. arXiv:2404.15261  [pdf, other

    math.OC cs.DM cs.LG math.PR

    All You Need is Resistance: On the Equivalence of Effective Resistance and Certain Optimal Transport Problems on Graphs

    Authors: Sawyer Robertson, Zhengchao Wan, Alexander Cloninger

    Abstract: The fields of effective resistance and optimal transport on graphs are filled with rich connections to combinatorics, geometry, machine learning, and beyond. In this article we put forth a bold claim: that the two fields should be understood as one and the same, up to a choice of $p$. We make this claim precise by introducing the parameterized family of $p$-Beckmann distances for probability measu… ▽ More

    Submitted 26 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: 35 pages, 7 figures

    MSC Class: 65K10; 05C21; 90C25; 68R10; 05C50

  16. arXiv:2404.12083  [pdf, other

    cs.CV

    MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking

    Authors: Zhong Wang, Zengyu Wan, Han Han, Bohao Liao, Yuliang Wu, Wei Zhai, Yang Cao, Zheng-jun Zha

    Abstract: Event-based eye tracking has shown great promise with the high temporal resolution and low redundancy provided by the event camera. However, the diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization. To achieve a stable event-based eye-tracking system, this paper proposes a bidirectional long-… ▽ More

    Submitted 30 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024 Workshop (AIS: Vision, Graphics and AI for Streaming), top solution of challenge Event-based Eye Tracking, see https://www.kaggle.com/competitions/event-based-eye-tracking-ais2024

  17. arXiv:2404.11770  [pdf, other

    cs.CV cs.AI

    Event-Based Eye Tracking. AIS 2024 Challenge Survey

    Authors: Zuowen Wang, Chang Gao, Zongwei Wu, Marcos V. Conde, Radu Timofte, Shih-Chii Liu, Qinyu Chen, Zheng-jun Zha, Wei Zhai, Han Han, Bohao Liao, Yuliang Wu, Zengyu Wan, Zhong Wang, Yang Cao, Ganchao Tan, Jinze Chen, Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Oliver Coenen, Baoheng Zhang, Yizhao Gao, Jingyuan Li , et al. (14 additional authors not shown)

    Abstract: This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movement recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve good task accuracy and efficiency trade-off. During the challenge period, 38 participants registered for the Kaggl… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Qinyu Chen is the corresponding author

  18. Understanding the Role of Temperature in Diverse Question Generation by GPT-4

    Authors: Arav Agarwal, Karthik Mittal, Aidan Doyle, Pragnya Sridhar, Zipiao Wan, Jacob Arthur Doughty, Jaromir Savelka, Majd Sakr

    Abstract: We conduct a preliminary study of the effect of GPT's temperature parameter on the diversity of GPT4-generated questions. We find that using higher temperature values leads to significantly higher diversity, with different temperatures exposing different types of similarity between generated sets of questions. We also demonstrate that diverse question generation is especially difficult for questio… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

  19. arXiv:2404.04256  [pdf, other

    cs.CV

    Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

    Authors: Zifu Wan, Yuhao Wang, Silong Yong, Pingping Zhang, Simon Stepputtis, Katia Sycara, Yaqi Xie

    Abstract: Multi-modal semantic segmentation significantly enhances AI agents' perception and scene understanding, especially under adverse conditions like low-light or overexposed environments. Leveraging additional modalities (X-modality) like thermal and depth alongside traditional RGB provides complementary information, enabling more robust and reliable segmentation. In this work, we introduce Sigma, a S… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  20. arXiv:2404.04173  [pdf, other

    cs.AR cs.LG

    H3DFact: Heterogeneous 3D Integrated CIM for Factorization with Holographic Perceptual Representations

    Authors: Zishen Wan, Che-Kai Liu, Mohamed Ibrahim, Hanchen Yang, Samuel Spetalnick, Tushar Krishna, Arijit Raychowdhury

    Abstract: Disentangling attributes of various sensory signals is central to human-like perception and reasoning and a critical task for higher-order cognitive and neuro-symbolic AI systems. An elegant approach to represent this intricate factorization is via high-dimensional holographic vectors drawing on brain-inspired vector symbolic architectures. However, holographic factorization involves iterative com… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 2024 Design Automation and Test in Europe (DATE); The first two authors have equal contributions

  21. arXiv:2404.03654  [pdf, other

    cs.CV

    RaFE: Generative Radiance Fields Restoration

    Authors: Zhongkai Wu, Ziyu Wan, Jing Zhang, Jing Liao, Dong Xu

    Abstract: NeRF (Neural Radiance Fields) has demonstrated tremendous potential in novel view synthesis and 3D reconstruction, but its performance is sensitive to input image quality, which struggles to achieve high-fidelity rendering when provided with low-quality sparse input viewpoints. Previous methods for NeRF restoration are tailored for specific degradation type, ignoring the generality of restoration.… ▽ More

    Submitted 7 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://zkaiwu.github.io/RaFE

  22. arXiv:2403.12268  [pdf, other

    cs.IT eess.SP

    Near-Field Channel Modeling for Electromagnetic Information Theory

    Authors: Zhongzhichao Wan, Jieao Zhu, Linglong Dai

    Abstract: Electromagnetic information theory (EIT) is one of the emerging topics for 6G communication due to its potential to reveal the performance limit of wireless communication systems. For EIT, the research foundation is reasonable and accurate channel modeling. Existing channel modeling works for EIT in non-line-of-sight (NLoS) scenario focus on far-field modeling, which can not accurately capture the… ▽ More

    Submitted 26 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: In this paper, we propose the near-field channel model for EIT based on electromagnetic scattering theory. Then, we derive the analytical expression of the correlation function of the fields and analyze the characteristics of it. Finally, we design a channel estimation scheme for near-field scenario

  23. arXiv:2403.07378  [pdf, other

    cs.CL cs.LG

    SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression

    Authors: Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang

    Abstract: The advancements in Large Language Models (LLMs) have been hindered by their substantial sizes, which necessitate LLM compression methods for practical deployment. Singular Value Decomposition (SVD) offers a promising solution for LLM compression. However, state-of-the-art SVD-based LLM compression methods have two key limitations: truncating smaller singular values may lead to higher compression… ▽ More

    Submitted 28 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Code available at: https://github.com/AIoT-MLSys-Lab/SVD-LLM

  24. arXiv:2403.07153  [pdf, other

    cs.CV

    2023 Low-Power Computer Vision Challenge (LPCVC) Summary

    Authors: Leo Chen, Benjamin Boardley, Ping Hu, Yiru Wang, Yifan Pu, Xin Jin, Yongqiang Yao, Ruihao Gong, Bo Li, Gao Huang, Xianglong Liu, Zifu Wan, Xinwang Chen, Ning Liu, Ziyi Zhang, Dongping Liu, Ruijie Shan, Zhengping Che, Fachao Zhang, Xiaofeng Mou, Jian Tang, Maxim Chuprov, Ivan Malofeev, Alexander Goncharenko, Andrey Shcherbin , et al. (5 additional authors not shown)

    Abstract: This article describes the 2023 IEEE Low-Power Computer Vision Challenge (LPCVC). Since 2015, LPCVC has been an international competition devoted to tackling the challenge of computer vision (CV) on edge devices. Most CV researchers focus on improving accuracy, at the expense of ever-growing sizes of machine models. LPCVC balances accuracy with resource requirements. Winners must achieve high accu… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: LPCVC 2023, website: https://lpcv.ai/

  25. arXiv:2403.06659  [pdf, other

    eess.SP cs.AI cs.LG

    Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement

    Authors: Che Liu, Zhongwei Wan, Cheng Ouyang, Anand Shah, Wenjia Bai, Rossella Arcucci

    Abstract: Electrocardiograms (ECGs) are non-invasive diagnostic tools crucial for detecting cardiac arrhythmic diseases in clinical practice. While ECG Self-supervised Learning (eSSL) methods show promise in representation learning from unannotated ECG data, they often overlook the clinical knowledge that can be found in reports. This oversight and the requirement for annotated samples for downstream tasks… ▽ More

    Submitted 6 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by ICML2024

  26. arXiv:2403.05465  [pdf, other

    cs.AR cs.AI cs.LG cs.NE

    Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference

    Authors: Akshat Ramachandran, Zishen Wan, Geonhwa Jeong, John Gustafson, Tushar Krishna

    Abstract: Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and intensive quantization-aware training. In this study, we introduce Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits that dynamica… ▽ More

    Submitted 26 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 2024 61st IEEE/ACM Design Automation Conference (DAC)

  27. arXiv:2403.04945  [pdf, other

    cs.CL cs.LG eess.SP

    MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation

    Authors: Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Zhenwu Peng, Jie Fu, Rossella Arcucci, Huaxiu Yao, Mi Zhang

    Abstract: Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the… ▽ More

    Submitted 18 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Under review

  28. arXiv:2403.03690  [pdf

    cs.CL cs.AI

    Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese

    Authors: Yikun Sun, Zhen Wan, Nobuhiro Ueda, Sakiko Yahata, Fei Cheng, Chenhui Chu, Sadao Kurohashi

    Abstract: The creation of instruction data and evaluation benchmarks for serving Large language models often involves enormous human annotation. This issue becomes particularly pronounced when rapidly developing such resources for a non-English language like Japanese. Instead of following the popular practice of directly translating existing English resources into Japanese (e.g., Japanese-Alpaca), we propos… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: COLING 2024. Our code are available here: \href{https://github.com/hitoshizuku7/awesome-Ja-self-instruct}{self-instruct data} and \href{https://github.com/ku-nlp/ja-vicuna-qa-benchmark}{evaluation benchmark}

  29. arXiv:2402.15398  [pdf, other

    cs.LG cs.AI cs.CY

    TransFlower: An Explainable Transformer-Based Model with Flow-to-Flow Attention for Commuting Flow Prediction

    Authors: Yan Luo, Zhuoyue Wan, Yuzhong Chen, Gengchen Mai, Fu-lai Chung, Kent Larson

    Abstract: Understanding the link between urban planning and commuting flows is crucial for guiding urban development and policymaking. This research, bridging computer science and urban studies, addresses the challenge of integrating these fields with their distinct focuses. Traditional urban studies methods, like the gravity and radiation models, often underperform in complex scenarios due to their limited… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  30. arXiv:2402.14202  [pdf, other

    cs.LG

    Comparing Graph Transformers via Positional Encodings

    Authors: Mitchell Black, Zhengchao Wan, Gal Mishne, Amir Nayyeri, Yusu Wang

    Abstract: The distinguishing power of graph transformers is closely tied to the choice of positional encoding: features used to augment the base transformer with information about the graph. There are two primary types of positional encoding: absolute positional encodings (APEs) and relative positional encodings (RPEs). APEs assign features to each node and are given as input to the transformer. RPEs instea… ▽ More

    Submitted 4 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: accepted to ICML 2024

  31. arXiv:2402.13607  [pdf, other

    cs.CV cs.CL

    CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models

    Authors: Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, Peng Li, Ning Ma, Maosong Sun, Yang Liu

    Abstract: Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language. As these models become more integral to research and applications, conducting comprehensive evaluations of their capabilities has grown increasingly important. However, most existing benchmarks fail to consider that, in certain situations, images need to be interpret… ▽ More

    Submitted 4 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  32. arXiv:2402.07157  [pdf, other

    cs.CL cs.AI cs.LG

    Natural Language Reinforcement Learning

    Authors: Xidong Feng, Ziyu Wan, Mengyue Yang, Ziyan Wang, Girish A. Koushik, Yali Du, Ying Wen, Jun Wang

    Abstract: Reinforcement Learning (RL) has shown remarkable abilities in learning policies for decision-making tasks. However, RL is often hindered by issues such as low sample efficiency, lack of interpretability, and sparse supervision signals. To tackle these limitations, we take inspiration from the human learning process and introduce Natural Language Reinforcement Learning (NLRL), which innovatively co… ▽ More

    Submitted 14 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: Work in Progress

  33. arXiv:2402.06023  [pdf, other

    cs.LG cs.AI cs.GT

    Decision Theory-Guided Deep Reinforcement Learning for Fast Learning

    Authors: Zelin Wan, Jin-Hee Cho, Mu Zhu, Ahmed H. Anwar, Charles Kamhoua, Munindar P. Singh

    Abstract: This paper introduces a novel approach, Decision Theory-guided Deep Reinforcement Learning (DT-guided DRL), to address the inherent cold start problem in DRL. By integrating decision theory principles, DT-guided DRL enhances agents' initial performance and robustness in complex environments, enabling more efficient and reliable convergence during learning. Our investigation encompasses two primary… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

  34. arXiv:2402.04467  [pdf, other

    cs.LG math.DS

    DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems

    Authors: Yair Schiff, Zhong Yi Wan, Jeffrey B. Parker, Stephan Hoyer, Volodymyr Kuleshov, Fei Sha, Leonardo Zepeda-Núñez

    Abstract: Learning dynamics from dissipative chaotic systems is notoriously difficult due to their inherent instability, as formalized by their positive Lyapunov exponents, which exponentially amplify errors in the learned dynamics. However, many of these systems exhibit ergodicity and an attractor: a compact and highly complex manifold, to which trajectories converge in finite-time, that supports an invari… ▽ More

    Submitted 5 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ICML 2024; Code to reproduce our experiments is available at https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/ergodic

  35. arXiv:2401.16193  [pdf, other

    cs.LG cs.DB

    Contributing Dimension Structure of Deep Feature for Coreset Selection

    Authors: Zhijing Wan, Zhixiang Wang, Yuran Wang, Zheng Wang, Hongyuan Zhu, Shin'ichi Satoh

    Abstract: Coreset selection seeks to choose a subset of crucial training samples for efficient learning. It has gained traction in deep learning, particularly with the surge in training dataset sizes. Sample selection hinges on two main aspects: a sample's representation in enhancing performance and the role of sample diversity in averting overfitting. Existing methods typically measure both the representat… ▽ More

    Submitted 2 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 13 pages,11 figures, to be published in AAAI2024

  36. arXiv:2401.11713  [pdf, other

    cs.CV cs.AI

    Medical Image Debiasing by Learning Adaptive Agreement from a Biased Council

    Authors: Luyang Luo, Xin Huang, Minghao Wang, Zhuoyue Wan, Hao Chen

    Abstract: Deep learning could be prone to learning shortcuts raised by dataset bias and result in inaccurate, unreliable, and unfair models, which impedes its adoption in real-world clinical applications. Despite its significance, there is a dearth of research in the medical image classification domain to address dataset bias. Furthermore, the bias labels are often agnostic, as identifying biases can be lab… ▽ More

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 10 pages, 5 figures, 3 tables. Code and benchmark will be released via https://github.com/LLYXC/Ada-ABC/tree/main

  37. arXiv:2401.10443  [pdf, other

    cs.SE

    Towards Automated Driving Violation Cause Analysis in Scenario-Based Testing for Autonomous Driving Systems

    Authors: Ziwen Wan, Yuqi Huai, Yuntianyi Chen, Joshua Garcia, Qi Alfred Chen

    Abstract: The rapid advancement of Autonomous Vehicles (AVs), exemplified by companies like Waymo and Cruise offering 24/7 paid taxi services, highlights the paramount importance of ensuring AVs' compliance with various policies, such as safety regulations, traffic rules, and mission directives. Despite significant progress in the development of Autonomous Driving System (ADS) testing tools, there has been… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

  38. arXiv:2401.08330  [pdf, other

    cs.LG cs.AI math.OC

    Boosting Gradient Ascent for Continuous DR-submodular Maximization

    Authors: Qixin Zhang, Zongqi Wan, Zengde Deng, Zaiyi Chen, Xiaoming Sun, Jialin Zhang, Yu Yang

    Abstract: Projected Gradient Ascent (PGA) is the most commonly used optimization scheme in machine learning and operations research areas. Nevertheless, numerous studies and examples have shown that the PGA methods may fail to achieve the tight approximation ratio for continuous DR-submodular maximization problems. To address this challenge, we present a boosting technique in this paper, which can efficient… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 74 pages, 6 figures and 9 tables. An extended version of Stochastic Continuous Submodular Maximization: Boosting via Non-oblivious Function (ICML 2022)

  39. arXiv:2401.07584  [pdf, other

    cs.CV

    Collaboratively Self-supervised Video Representation Learning for Action Recognition

    Authors: Jie Zhang, Zhifan Wan, Lanqing Hu, Stephen Lin, Shuzhe Wu, Shiguang Shan

    Abstract: Considering the close connection between action recognition and human pose estimation, we design a Collaboratively Self-supervised Video Representation (CSVR) learning framework specific to action recognition by jointly considering generative pose prediction and discriminative context matching as pretext tasks. Specifically, our CSVR consists of three branches: a generative pose prediction branch,… ▽ More

    Submitted 15 January, 2024; originally announced January 2024.

  40. arXiv:2401.01923  [pdf, other

    cs.DC cs.LG cs.NI

    IoT in the Era of Generative AI: Vision and Challenges

    Authors: Xin Wang, Zhongwei Wan, Arvin Hekmati, Mingyu Zong, Samiul Alam, Mi Zhang, Bhaskar Krishnamachari

    Abstract: Equipped with sensing, networking, and computing capabilities, Internet of Things (IoT) such as smartphones, wearables, smart speakers, and household robots have been seamlessly weaved into our daily lives. Recent advancements in Generative AI exemplified by GPT, LLaMA, DALL-E, and Stable Difussion hold immense promise to push IoT to the next level. In this article, we share our vision and views o… ▽ More

    Submitted 5 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: 8 pages, 3 figures

  41. arXiv:2401.01040  [pdf, other

    cs.AI cs.AR

    Towards Cognitive AI Systems: a Survey and Prospective on Neuro-Symbolic AI

    Authors: Zishen Wan, Che-Kai Liu, Hanchen Yang, Chaojian Li, Haoran You, Yonggan Fu, Cheng Wan, Tushar Krishna, Yingyan Lin, Arijit Raychowdhury

    Abstract: The remarkable advancements in artificial intelligence (AI), primarily driven by deep neural networks, have significantly impacted various aspects of our lives. However, the current challenges surrounding unsustainable computational trajectories, limited robustness, and a lack of explainability call for the development of next-generation AI systems. Neuro-symbolic AI (NSAI) emerges as a promising… ▽ More

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Workshop on Systems for Next-Gen AI Paradigms, 6th Conference on Machine Learning and Systems (MLSys), June 4-8, 2023, Miami, FL, USA

  42. arXiv:2401.00283  [pdf, other

    cs.IT eess.SP

    Near-Space Communications: the Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle

    Authors: Hongshan Liu, Tong Qin, Zhen Gao, Tianqi Mao, Keke Ying, Ziwei Wan, Li Qiao, Rui Na, Zhongxiang Li, Chun Hu, Yikun Mei, Tuan Li, Guanghui Wen, Lei Chen, Zhonghuai Wu, Ruiqi Liu, Gaojie Chen, Shuo Wang, Dezhi Zheng

    Abstract: This article presents a comprehensive study on the emerging near-space communications (NS-COM) within the context of space-air-ground-sea integrated network (SAGSIN). Specifically, we firstly explore the recent technical developments of NS-COM, followed by the discussions about motivations behind integrating NS-COM into SAGSIN. To further demonstrate the necessity of NS-COM, a comparative analysis… ▽ More

    Submitted 4 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: 28 pages, 8 figures, 2 tables

  43. arXiv:2312.13131  [pdf, other

    cs.LG cs.AI cs.CR

    Scaling Compute Is Not All You Need for Adversarial Robustness

    Authors: Edoardo Debenedetti, Zishen Wan, Maksym Andriushchenko, Vikash Sehwag, Kshitij Bhardwaj, Bhavya Kailkhura

    Abstract: The last six years have witnessed significant progress in adversarially robust deep learning. As evidenced by the CIFAR-10 dataset category in RobustBench benchmark, the accuracy under $\ell_\infty$ adversarial perturbations improved from 44\% in \citet{Madry2018Towards} to 71\% in \citet{peng2023robust}. Although impressive, existing state-of-the-art is still far from satisfactory. It is further… ▽ More

    Submitted 20 December, 2023; originally announced December 2023.

  44. arXiv:2312.11302  [pdf, other

    cs.IT eess.SP

    AFDM-SCMA: A Promising Waveform for Massive Connectivity over High Mobility Channels

    Authors: Qu Luo, Pei Xiao, Zilong Liu, Ziwei Wan, Thomos Nikolaos, Zhen Gao, Ziming He

    Abstract: This paper studies the affine frequency division multiplexing (AFDM)-empowered sparse code multiple access (SCMA) system, referred to as AFDM-SCMA, for supporting massive connectivity in high-mobility environments. First, by placing the sparse codewords on the AFDM chirp subcarriers, the input-output (I/O) relation of AFDM-SCMA systems is presented. Next, we delve into the generalized receiver des… ▽ More

    Submitted 11 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  45. arXiv:2312.09685  [pdf, other

    cs.SE

    When Contracts Meets Crypto: Exploring Developers' Struggles with Ethereum Cryptographic APIs

    Authors: Jiashuo Zhang, Jiachi Chen, Zhiyuan Wan, Ting Chen, Jianbo Gao, Zhong Chen

    Abstract: To empower smart contracts with the promising capabilities of cryptography, Ethereum officially introduced a set of cryptographic APIs that facilitate basic cryptographic operations within smart contracts, such as elliptic curve operations. However, since developers are not necessarily cryptography experts, requiring them to directly interact with these basic APIs has caused real-world security is… ▽ More

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: To appear at ICSE'24

  46. arXiv:2312.07539  [pdf, other

    cs.CV

    HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation

    Authors: Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen

    Abstract: This work presents HeadArtist for 3D head generation from text descriptions. With a landmark-guided ControlNet serving as the generative prior, we come up with an efficient pipeline that optimizes a parameterized 3D head model under the supervision of the prior distillation itself. We call such a process self score distillation (SSD). In detail, given a sampled camera pose, we first render an imag… ▽ More

    Submitted 8 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Amazing results are shown in https://kumapowerliu.github.io/HeadArtist. Accepted by SIGGRAPH 2024

  47. arXiv:2312.06663  [pdf, other

    cs.CV cs.GR

    CAD: Photorealistic 3D Generation via Adversarial Distillation

    Authors: Ziyu Wan, Despoina Paschalidou, Ian Huang, Hongyu Liu, Bokui Shen, Xiaoyu Xiang, Jing Liao, Leonidas Guibas

    Abstract: The increased demand for 3D data in AR/VR, robotics and gaming applications, gave rise to powerful generative pipelines capable of synthesizing high-quality 3D objects. Most of these models rely on the Score Distillation Sampling (SDS) algorithm to optimize a 3D representation such that the rendered image maintains a high likelihood as evaluated by a pre-trained diffusion model. However, finding a… ▽ More

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: http://raywzy.com/CAD/

  48. arXiv:2312.03863  [pdf, other

    cs.CL cs.AI

    Efficient Large Language Models: A Survey

    Authors: Zhongwei Wan, Xin Wang, Che Liu, Samiul Alam, Yu Zheng, Jiachen Liu, Zhongnan Qu, Shen Yan, Yi Zhu, Quanlu Zhang, Mosharaf Chowdhury, Mi Zhang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding and language generation, and thus have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency ch… ▽ More

    Submitted 23 May, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Camera ready version of Transactions on Machine Learning Research (TMLR)

  49. arXiv:2312.03173  [pdf, other

    cs.CY cs.AI cs.CL

    A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education

    Authors: Jacob Doughty, Zipiao Wan, Anishka Bompelli, Jubahed Qayum, Taozhi Wang, Juran Zhang, Yujia Zheng, Aidan Doyle, Pragnya Sridhar, Arav Agarwal, Christopher Bogart, Eric Keylor, Can Kultur, Jaromir Savelka, Majd Sakr

    Abstract: There is a constant need for educators to develop and maintain effective up-to-date assessments. While there is a growing body of research in computing education on utilizing large language models (LLMs) in generation and engagement with coding exercises, the use of LLMs for generating programming MCQs has not been extensively explored. We analyzed the capability of GPT-4 to produce multiple-choic… ▽ More

    Submitted 5 December, 2023; originally announced December 2023.

  50. arXiv:2310.14214  [pdf, other

    cs.CV cs.MM

    TransY-Net:Learning Fully Transformer Networks for Change Detection of Remote Sensing Images

    Authors: Tianyu Yan, Zifu Wan, Pingping Zhang, Gong Cheng, Huchuan Lu

    Abstract: In the remote sensing field, Change Detection (CD) aims to identify and localize the changed regions from dual-phase images over the same places. Recently, it has achieved great progress with the advances of deep learning. However, current methods generally deliver incomplete CD regions and irregular CD boundaries due to the limited representation ability of the extracted visual features. To relie… ▽ More

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: This work is accepted by TGRS2023. It is an extension of our ACCV2022 paper and arXiv:2210.00757