Showing 1–50 of 956 results for author: Sun, H

Searching in archive cs.
  1. arXiv:2406.14136  [pdf, other]

    cs.RO

    One Fling to Goal: Environment-aware Dynamics for Goal-conditioned Fabric Flinging

    Authors: Linhan Yang, Lei Yang, Haoran Sun, Zeqing Zhang, Haibin He, Fang Wan, Chaoyang Song, Jia Pan

    Abstract: Fabric manipulation dynamically is commonly seen in manufacturing and domestic settings. While dynamically manipulating a fabric piece to reach a target state is highly efficient, this task presents considerable challenges due to the varying properties of different fabrics, complex dynamics when interacting with environments, and meeting required goal conditions. To address these challenges, we pr…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.12300  [pdf]

    eess.IV cs.CV q-bio.NC

    IR2QSM: Quantitative Susceptibility Mapping via Deep Neural Networks with Iterative Reverse Concatenations and Recurrent Modules

    Authors: Min Li, Chen Chen, Zhuang Xiong, Ying Liu, Pengfei Rong, Shanshan Shan, Feng Liu, Hongfu Sun, Yang Gao

    Abstract: Quantitative susceptibility mapping (QSM) is an MRI phase-based post-processing technique to extract the distribution of tissue susceptibilities, demonstrating significant potential in studying neurological diseases. However, the ill-conditioned nature of dipole inversion makes QSM reconstruction from the tissue field prone to noise and artifacts. In this work, we propose a novel deep learning-bas…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 9 figures

  3. arXiv:2406.11890  [pdf, other]

    cs.LG cs.AI cs.CL

    Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning

    Authors: Hui Liu, Wenya Wang, Hao Sun, Chris Xing Tian, Chenqi Kong, Xin Dong, Haoliang Li

    Abstract: Large Language Models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities from few-shot demonstration exemplars. While recent learning-based demonstration selection methods have proven beneficial to ICL by choosing more useful exemplars, their underlying mechanisms are opaque, hindering efforts to address limitations such as high training costs and poor generalization across…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2406.11474  [pdf, other]

    cs.CL cs.AI

    How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment

    Authors: Heyan Huang, Yinghao Li, Huashan Sun, Yu Bai, Yang Gao

    Abstract: Recent studies have demonstrated that In-Context Learning (ICL), through the use of specific demonstrations, can align Large Language Models (LLMs) with human preferences known as In-Context Alignment (ICA), indicating that models can comprehend human instructions without requiring parameter adjustments. However, the exploration of the mechanism and applicability of ICA remains limited. In this pa…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 22 pages, 6 figures, work in progress

  5. arXiv:2406.10539  [pdf, other]

    cs.CV

    Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On

    Authors: Lingxiao Lu, Shengyi Wu, Haoxuan Sun, Junhong Gou, Jianlou Si, Chen Qian, Jianfu Zhang, Liqing Zhang

    Abstract: Virtual clothes try-on has emerged as a vital feature in online shopping, offering consumers a critical tool to visualize how clothing fits. In our research, we introduce an innovative approach for virtual clothes try-on, utilizing a self-supervised Vision Transformer (ViT) coupled with a diffusion model. Our method emphasizes detail enhancement by contrasting local clothing image embeddings, gene…

    Submitted 15 June, 2024; originally announced June 2024.

  6. arXiv:2406.08098  [pdf, other]

    cs.SE

    Scalable Defect Detection via Traversal on Code Graph

    Authors: Zhengyao Liu, Xitong Zhong, Xingjing Deng, Shuo Hong, Xiang Gao, Hailong Sun

    Abstract: Detecting defects and vulnerabilities in the early stage has long been a challenge in software engineering. Static analysis, a technique that inspects code without execution, has emerged as a key strategy to address this challenge. Among recent advancements, the use of graph-based representations, particularly Code Property Graph (CPG), has gained traction due to its comprehensive depiction of cod…

    Submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  8. arXiv:2406.06607  [pdf, other]

    cs.LG cs.AI

    Continuous Test-time Domain Adaptation for Efficient Fault Detection under Evolving Operating Conditions

    Authors: Han Sun, Kevin Ammann, Stylianos Giannoulakis, Olga Fink

    Abstract: Fault detection is crucial in industrial systems to prevent failures and optimize performance by distinguishing abnormal from normal operating conditions. Data-driven methods have been gaining popularity for fault detection tasks as the amount of condition monitoring data from complex industrial systems increases. Despite these advances, early fault detection remains a challenge under real-world s…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 11 pages including references

  9. arXiv:2406.06048  [pdf, other]

    cs.CV cs.AI cs.MM

    Robust Latent Representation Tuning for Image-text Classification

    Authors: Hao Sun, Yu Song

    Abstract: Large models have demonstrated exceptional generalization capabilities in computer vision and natural language processing. Recent efforts have focused on enhancing these models with multimodal processing abilities. However, addressing the challenges posed by scenarios where one modality is absent remains a significant hurdle. In response to this issue, we propose a robust latent representation tun…

    Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  10. arXiv:2406.04875  [pdf, other]

    cs.CV

    3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

    Authors: Xiaobiao Du, Haiyang Sun, Shuyun Wang, Zhuojie Wu, Hongwei Sheng, Jiaying Ying, Ming Lu, Tianqing Zhu, Kun Zhan, Xin Yu

    Abstract: 3D cars are commonly used in self-driving systems, virtual/augmented reality, and games. However, existing 3D car datasets are either synthetic or low-quality, presenting a significant gap toward the high-quality real-world 3D car datasets and limiting their applications in practical scenarios. In this paper, we propose the first large-scale 3D real car dataset, termed 3DRealCar, offering three di…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Project Page: https://xiaobiaodu.github.io/3drealcar

  11. arXiv:2406.04523  [pdf, other]

    cs.CL cs.LG

    Proofread: Fixes All Errors with One Tap

    Authors: Renjie Liu, Yanxiang Zhang, Yun Zhu, Haicheng Sun, Yuanbo Zhang, Michael Xuelin Huang, Shanqing Cai, Lei Meng, Shumin Zhai

    Abstract: The impressive capabilities in Large Language Models (LLMs) provide a powerful approach to reimagine users' typing experience. This paper demonstrates Proofread, a novel Gboard feature powered by a server-side LLM in Gboard, enabling seamless sentence-level and paragraph-level corrections with a single tap. We describe the complete system in this paper, from data generation, metrics design to mode…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures, 2 tables

  12. arXiv:2406.04158  [pdf, other]

    cs.CV eess.IV

    Sparse Multi-baseline SAR Cross-modal 3D Reconstruction of Vehicle Targets

    Authors: Da Li, Guoqiang Zhao, Houjun Sun, Jiacheng Bao

    Abstract: Multi-baseline SAR 3D imaging faces significant challenges due to data sparsity. In recent years, deep learning techniques have achieved notable success in enhancing the quality of sparse SAR 3D imaging. However, previous work typically rely on full-aperture high-resolution radar images to supervise the training of deep neural networks (DNNs), utilizing only single-modal information from radar dat…

    Submitted 6 June, 2024; originally announced June 2024.

  13. arXiv:2406.03814  [pdf, other]

    cs.CL cs.SD eess.AS

    Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

    Authors: Jiaming Zhou, Shiwan Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin

    Abstract: The kNN-CTC model has proven to be effective for monolingual automatic speech recognition (ASR). However, its direct application to multilingual scenarios like code-switching, presents challenges. Although there is potential for performance improvement, a kNN-CTC model utilizing a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address thi…

    Submitted 13 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  14. arXiv:2406.03152  [pdf, other]

    cs.DS cs.LG

    Dynamic Spectral Clustering with Provable Approximation Guarantee

    Authors: Steinar Laenen, He Sun

    Abstract: This paper studies clustering algorithms for dynamically evolving graphs $\{G_t\}$, in which new edges (and potential new vertices) are added into a graph, and the underlying cluster structure of the graph can gradually change. The paper proves that, under some mild condition on the cluster-structure, the clusters of the final graph $G_T$ of $n_T$ vertices at time $T$ can be well approximated by a…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: This work is accepted at the 41st International Conference on Machine Learning (ICML'24)

  15. arXiv:2406.02888  [pdf, other]

    cs.CL cs.AI cs.LG

    HYDRA: Model Factorization Framework for Black-Box LLM Personalization

    Authors: Yuchen Zhuang, Haotian Sun, Yue Yu, Rushi Qiang, Qifan Wang, Chao Zhang, Bo Dai

    Abstract: Personalization has emerged as a critical research area in modern intelligent systems, focusing on mining users' behavioral history and adapting to their preferences for delivering tailored experiences. Despite the remarkable few-shot capabilities exhibited by black-box large language models (LLMs), the inherent opacity of their model parameters presents significant challenges in aligning the gene…

    Submitted 10 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 24 pages, 6 figures, work in progress

  16. arXiv:2406.02642  [pdf, other]

    cs.LG cs.AI

    E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory

    Authors: Zhou Yang, Zhaochun Ren, Chenglong Ye, Yufeng Wang, Haizhou Sun, Chao Chen, Xiaofei Zhu, Yunbing Wu, Xiangwen Liao

    Abstract: In-context learning (ICL) achieves remarkable performance in various domains such as knowledge acquisition, commonsense reasoning, and semantic understanding. However, its performance significantly deteriorates for emotion detection tasks, especially fine-grained emotion recognition. The underlying reasons for this remain unclear. In this paper, we identify the reasons behind ICL's poor performanc…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 16 pages, 7 figures, 5 tables

  17. arXiv:2406.02539  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Parrot: Multilingual Visual Instruction Tuning

    Authors: Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

    Abstract: The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has marked a significant step towards artificial general intelligence. Existing methods mainly focus on aligning vision encoders with LLMs through supervised fine-tuning (SFT) to endow LLMs with multimodal abilities, making MLLMs' inherent ability to react to multiple languages progressively deteriorate as the training p…

    Submitted 4 June, 2024; originally announced June 2024.

  18. arXiv:2406.02147  [pdf, other]

    cs.CV

    UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

    Authors: Lijun Zhou, Tao Tang, Pengkun Hao, Zihang He, Kalok Ho, Shuo Gu, Wenbo Hou, Zhihui Hao, Haiyang Sun, Kun Zhan, Peng Jia, Xianpeng Lang, Xiaodan Liang

    Abstract: 3D multiple object tracking (MOT) plays a crucial role in autonomous driving perception. Recent end-to-end query-based trackers simultaneously detect and track objects, which have shown promising potential for the 3D MOT task. However, existing methods overlook the uncertainty issue, which refers to the lack of precise confidence about the state and location of tracked objects. Uncertainty arises…

    Submitted 4 June, 2024; originally announced June 2024.

  19. arXiv:2406.02079  [pdf, ps, other]

    cs.CL

    Assessing the Performance of Chinese Open Source Large Language Models in Information Extraction Tasks

    Authors: Yida Cai, Hao Sun, Hsiu-Yuan Huang, Yunfang Wu

    Abstract: Information Extraction (IE) plays a crucial role in Natural Language Processing (NLP) by extracting structured information from unstructured text, thereby facilitating seamless integration with various real-world applications that rely on structured data. Despite its significance, recent experiments focusing on English IE tasks have shed light on the challenges faced by Large Language Models (LLMs…

    Submitted 4 June, 2024; originally announced June 2024.

  20. arXiv:2406.01349  [pdf, other]

    cs.CV

    Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation

    Authors: Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, Di Lin, Kaicheng Yu

    Abstract: Using generative models to synthesize new data has become a de-facto standard in autonomous driving to address the data scarcity issue. Though existing approaches are able to boost perception models, we discover that these approaches fail to improve the performance of planning of end-to-end autonomous driving models as the generated videos are usually less than 8 frames and the spatial and tempora…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Project Page: https://westlake-autolab.github.io/delphi.github.io/, 8 figures

  21. arXiv:2406.01306  [pdf, other]

    cs.CL

    Unsupervised Distractor Generation via Large Language Model Distilling and Counterfactual Contrastive Decoding

    Authors: Fanyi Qu, Hao Sun, Yunfang Wu

    Abstract: Within the context of reading comprehension, the task of Distractor Generation (DG) aims to generate several incorrect options to confuse readers. Traditional supervised methods for DG rely heavily on expensive human-annotated distractor labels. In this paper, we propose an unsupervised DG framework, leveraging Large Language Models (LLMs) as cost-effective annotators to enhance the DG capability…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted as a long paper in ACL 2024 findings

  22. arXiv:2406.01078  [pdf, other]

    cs.CV

    CUT: A Controllable, Universal, and Training-Free Visual Anomaly Generation Framework

    Authors: Han Sun, Yunkang Cao, Olga Fink

    Abstract: Visual anomaly detection (AD) inherently faces significant challenges due to the scarcity of anomalous data. Although numerous works have been proposed to synthesize anomalous samples, the generated samples often lack authenticity or can only reflect the distribution of the available training data samples. In this work, we propose CUT: a Controllable, Universal and Training-free visual anomaly gen…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 9 pages excluding appendix

  23. arXiv:2406.00163  [pdf, other]

    cs.IT eess.SY

    A Stochastic Incentive-based Demand Response Program for Virtual Power Plant with Solar, Battery, Electric Vehicles, and Controllable Loads

    Authors: Pratik Harsh, Hongjian Sun, Debapriya Das, Goyal Awagan, Jing Jiang

    Abstract: The growing integration of distributed energy resources (DERs) into the power grid necessitates an effective coordination strategy to maximize their benefits. Acting as an aggregator of DERs, a virtual power plant (VPP) facilitates this coordination, thereby amplifying their impact on the transmission level of the power grid. Further, a demand response program enhances the scheduling approach by m…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 11 pages, 8 figures, submitted to IEEE Transactions on Industry Applications for potential publication

  24. arXiv:2405.18113  [pdf, other]

    cs.CL cs.AI

    Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting

    Authors: Hongda Sun, Hongzhan Lin, Haiyu Yan, Chen Zhu, Yang Song, Xin Gao, Shuo Shang, Rui Yan

    Abstract: The emergence of online recruitment services has revolutionized the traditional landscape of job seeking and recruitment, necessitating the development of high-quality industrial applications to improve person-job fitting. Existing methods generally rely on modeling the latent semantics of resumes and job descriptions and learning a matching function between them. Inspired by the powerful role-pla…

    Submitted 28 May, 2024; originally announced May 2024.

  25. arXiv:2405.18023  [pdf, ps, other]

    cs.IT

    Generator polynomials of cyclic expurgated or extended Goppa codes

    Authors: Xue Jia, Fengwei Li, Huan Sun, Qin Yue

    Abstract: Classical Goppa codes are a well-known class of codes with applications in code-based cryptography, which are a special case of alternant codes. Many papers are devoted to the search for Goppa codes with a cyclic extension or with a cyclic parity-check subcode. Let $\Bbb F_q$ be a finite field with $q=2^l$ elements, where $l$ is a positive integer. In this paper, we determine all the generator pol…

    Submitted 28 May, 2024; originally announced May 2024.

  26. arXiv:2405.17903  [pdf, other]

    cs.CV q-bio.NC

    Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion

    Authors: Hongze Sun, Rui Liu, Wuque Cai, Jun Wang, Yue Wang, Huajin Tang, Yan Cui, Dezhong Yao, Daqing Guo

    Abstract: Visual object tracking, which is primarily based on visible light image sequences, encounters numerous challenges in complicated scenarios, such as low light conditions, high dynamic ranges, and background clutter. To address these challenges, incorporating the advantages of multiple visual modalities is a promising solution for achieving reliable object tracking. However, the existing approaches…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 16 pages, 7 figures, 9 tables; This work has been submitted for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2405.17755  [pdf, other]

    cs.CL cs.AI

    XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference

    Authors: Shengnan Wang, Youhui Bai, Lin Zhang, Pingyi Zhou, Shixiong Zhao, Gong Zhang, Sen Wang, Renhai Chen, Hua Xu, Hongwei Sun

    Abstract: Length generalization failure problem, namely the large language model (LLM) fails to generalize to texts longer than its maximum training length, greatly restricts the application of LLM in the scenarios with streaming long inputs. To address this problem, the existing methods either require substantial costs or introduce precision loss. In this paper, we empirically find that the accuracy of the…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  28. arXiv:2405.16276  [pdf, ps, other]

    cs.GT

    Mechanism Design for LLM Fine-tuning with Multiple Reward Models

    Authors: Haoran Sun, Yurong Chen, Siwei Wang, Wei Chen, Xiaotie Deng

    Abstract: Recent research on fine-tuning large language models (LLMs) through the aggregation of multiple preferences has attracted considerable attention. However, the existing literature predominantly focuses on the empirical performance of aggregation algorithms, while neglecting the underlying motivation for agents to misreport their preferences. In this paper, we formalize this as a multi-parameter mec…

    Submitted 25 May, 2024; originally announced May 2024.

  29. arXiv:2405.16240  [pdf, other]

    cs.LG

    Analytic Federated Learning

    Authors: Huiping Zhuang, Run He, Kai Tong, Di Fang, Han Sun, Haoran Li, Tianyi Chen, Ziqian Zeng

    Abstract: In this paper, we introduce analytic federated learning (AFL), a new training paradigm that brings analytical (i.e., closed-form) solutions to the federated learning (FL) community. Our AFL draws inspiration from analytic learning -- a gradient-free technique that trains neural networks with analytical solutions in one epoch. In the local client training stage, the AFL facilitates a one-epoch trai…

    Submitted 25 May, 2024; originally announced May 2024.

  30. arXiv:2405.16093  [pdf, other]

    cs.CV

    Diverse Teacher-Students for Deep Safe Semi-Supervised Learning under Class Mismatch

    Authors: Qikai Wang, Rundong He, Yongshun Gong, Chunxiao Ren, Haoliang Sun, Xiaoshui Huang, Yilong Yin

    Abstract: Semi-supervised learning can significantly boost model performance by leveraging unlabeled data, particularly when labeled data is scarce. However, real-world unlabeled data often contain unseen-class samples, which can hinder the classification of seen classes. To address this issue, mainstream safe SSL methods suggest detecting and discarding unseen-class samples from unlabeled data. Nevertheles…

    Submitted 25 May, 2024; originally announced May 2024.

  31. arXiv:2405.15624  [pdf, other]

    cs.LG cs.AI

    Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment

    Authors: Hao Sun, Mihaela van der Schaar

    Abstract: Aligning Large Language Models (LLMs) is crucial for enhancing their safety and utility. However, existing methods, primarily based on preference datasets, face challenges such as noisy labels, high annotation costs, and privacy concerns. In this work, we introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges. We form…

    Submitted 24 May, 2024; originally announced May 2024.

  32. arXiv:2405.15071  [pdf, other]

    cs.CL

    Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

    Authors: Boshi Wang, Xiang Yue, Yu Su, Huan Sun

    Abstract: We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two representative reasoning types, composition and comparison, we consistently find that transformers can learn implicit reasoning, but only through grokking, i.e., extended training far beyond overfitting. The levels of generalizati…

    Submitted 26 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages, 16 figures. Code and data: https://github.com/OSU-NLP-Group/GrokkedTransformer

  33. Survey on Visual Signal Coding and Processing with Generative Models: Technologies, Standards and Optimization

    Authors: Zhibo Chen, Heming Sun, Li Zhang, Fan Zhang

    Abstract: This paper provides a survey of the latest developments in visual signal coding and processing with generative models. Specifically, our focus is on presenting the advancement of generative models and their influence on research in the domain of visual signal coding and processing. This survey study begins with a brief introduction of well-established generative models, including the Variational A…

    Submitted 23 May, 2024; originally announced May 2024.

  34. arXiv:2405.12954  [pdf, other]

    cs.LG cs.AI

    A Method on Searching Better Activation Functions

    Authors: Haoyuan Sun, Zihao Wu, Bo Xia, Pu Chang, Zibin Dong, Yifu Yuan, Yongzhe Chang, Xueqian Wang

    Abstract: The success of artificial neural networks (ANNs) hinges greatly on the judicious selection of an activation function, introducing non-linearity into network and enabling them to model sophisticated relationships in data. However, the search of activation functions has largely relied on empirical knowledge in the past, lacking theoretical guidance, which has hindered the identification of more effe…

    Submitted 22 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: 16 pages, 3 figures

  35. arXiv:2405.12432  [pdf, ps, other]

    cs.IT eess.SP

    Power Measurement Based Channel Estimation for IRS-Enhanced Wireless Coverage

    Authors: He Sun, Lipeng Zhu, Weidong Mei, Rui Zhang

    Abstract: In this paper, we study an IRS-assisted coverage enhancement problem for a given region, aiming to optimize the passive reflection of the IRS for improving the average communication performance in the region by accounting for both deterministic and random channels in the environment. To this end, we first derive the closed-form expression of the average received signal power in terms of the determ…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2309.08275

  36. arXiv:2405.12245  [pdf, ps, other]

    cs.IT

    Low Complexity Successive Cancellation Decoding of Polar Codes based on Pruning Strategy in Deletion Error Channels

    Authors: He Sun, Rongke Liu, Bin Dai

    Abstract: A novel SC decoding method of polar codes is proposed in $d$-deletion channels, where a new pruning strategy is designed to reduce decoding complexity. Considering the difference of the scenario weight distributions, pruning thresholds for each node are designed separately according to a uniform constraint on the pruning error probability, which further reduce the number of scenarios that need to…

    Submitted 18 May, 2024; originally announced May 2024.

  37. arXiv:2405.11739  [pdf]

    cs.LG cs.AI cs.CY

    Contactless Polysomnography: What Radio Waves Tell Us about Sleep

    Authors: Hao He, Chao Li, Wolfgang Ganglberger, Kaileigh Gallagher, Rumen Hristov, Michail Ouroutzoglou, Haoqi Sun, Jimeng Sun, Brandon Westover, Dina Katabi

    Abstract: The ability to assess sleep at home, capture sleep stages, and detect the occurrence of apnea (without on-body sensors) simply by analyzing the radio waves bouncing off people's bodies while they sleep is quite powerful. Such a capability would allow for longitudinal data collection in patients' homes, informing our understanding of sleep and its interaction with various diseases and their therape…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: The first two authors contributed equally to this work

  38. arXiv:2405.09554  [pdf, ps, other]

    eess.SP cs.IT

    Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto Prior

    Authors: Yongfeng Huang, Zhendong Chen, Kun Ye, Lang Zhou, Haixin Sun

    Abstract: In this letter, we investigate a new generalized double Pareto based on off-grid sparse Bayesian learning (GDPOGSBL) approach to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to mod…

    Submitted 17 May, 2024; v1 submitted 18 April, 2024; originally announced May 2024.

  39. arXiv:2405.09220  [pdf, other]

    cs.LG cs.AI cs.CL

    ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

    Authors: Siwei Wang, Yifei Shen, Shi Feng, Haoran Sun, Shang-Hua Teng, Wei Chen

    Abstract: In this paper, we present the findings of our Project ALPINE which stands for "Autoregressive Learning for Planning In NEtworks." Project ALPINE initiates a theoretical investigation into the development of planning capabilities in Transformer-based language models through their autoregressive learning mechanisms, aiming to identify any potential limitations in their planning abilities. We abstra…

    Submitted 27 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  40. arXiv:2405.09059  [pdf, other]

    cs.CV

    Task-adaptive Q-Face

    Authors: Haomiao Sun, Mingjie He, Shiguang Shan, Hu Han, Xilin Chen

    Abstract: Although face analysis has achieved remarkable improvements in the past few years, designing a multi-task face analysis model is still challenging. Most face analysis tasks are studied as separate problems and do not benefit from the synergy among related tasks. In this work, we propose a novel task-adaptive multi-task face analysis method named as Q-Face, which simultaneously performs multiple fa…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Previously submitted to ECCV 2024

  41. arXiv:2405.08931  [pdf, ps, other]

    cs.DS

    A QPTAS for Facility Location on Unit Disk graphs

    Authors: Zachary Friggstad, Mohsen Rezapour, Mohammad R. Salavatipour, Hao Sun

    Abstract: We study the classic \textsc{(Uncapacitated) Facility Location} problem on Unit Disk Graphs (UDGs). For a given point set $P$ in the plane, the unit disk graph UDG(P) on $P$ has vertex set $P$ and an edge between two distinct points $p, q \in P$ if and only if their Euclidean distance $|pq|$ is at most 1. The weight of the edge $pq$ is equal to their distance $|pq|$. An instance of \fl on UDG(P) c…

    Submitted 14 May, 2024; originally announced May 2024.

  42. arXiv:2405.08668  [pdf, other]

    cs.CV cs.AI cs.LG stat.AP

    Promoting AI Equity in Science: Generalized Domain Prompt Learning for Accessible VLM Research

    Authors: Qinglong Cao, Yuntian Chen, Lu Lu, Hao Sun, Zhenzhong Zeng, Xiaokang Yang, Dongxiao Zhang

    Abstract: Large-scale Vision-Language Models (VLMs) have demonstrated exceptional performance in natural vision tasks, motivating researchers across domains to explore domain-specific VLMs. However, the construction of powerful domain-specific VLMs demands vast amounts of annotated data, substantial electrical energy, and computing resources, primarily accessible to industry, yet hindering VLM research in a…

    Submitted 14 May, 2024; originally announced May 2024.

  43. arXiv:2405.06342  [pdf, other]

    cs.CV eess.IV

    Compression-Realized Deep Structural Network for Video Quality Enhancement

    Authors: Hanchi Sun, Xiaohong Liu, Xinyang Jiang, Yifei Shen, Dongsheng Li, Xiongkuo Min, Guangtao Zhai

    Abstract: This paper focuses on the task of quality enhancement for compressed videos. Although deep network-based video restorers achieve impressive progress, most of the existing methods lack a structured design to optimally leverage the priors within compression codecs. Since the quality degradation of the video is primarily induced by the compression algorithm, a new paradigm is urgently needed for a mo…

    Submitted 10 May, 2024; originally announced May 2024.

  44. arXiv:2405.04307  [pdf, other]

    cs.RO cs.AI cs.LG

    Improving Offline Reinforcement Learning with Inaccurate Simulators

    Authors: Yiwen Hou, Haoyuan Sun, Jinming Ma, Feng Wu

    Abstract: Offline reinforcement learning (RL) provides a promising approach to avoid costly online interaction with the real environment. However, the performance of offline RL highly depends on the quality of the datasets, which may cause extrapolation error in the learning process. In many robotic applications, an inaccurate simulator is often available. However, the data directly collected from the inacc…

    Submitted 7 May, 2024; originally announced May 2024.

  45. LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model

    Authors: Haowen Sun, Ruikun Zheng, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu

    Abstract: In this paper, we introduce LGTM, a novel Local-to-Global pipeline for Text-to-Motion generation. LGTM utilizes a diffusion-based architecture and aims to address the challenge of accurately translating textual descriptions into semantically coherent human motion in computer animation. Specifically, traditional methods often struggle with semantic discrepancies, particularly in aligning specific m…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 9 pages, 7 figures, SIGGRAPH 2024

  46. arXiv:2405.02914  [pdf, other]

    cs.RO

    Simulation of Optical Tactile Sensors Supporting Slip and Rotation using Path Tracing and IMPM

    Authors: Zirong Shen, Yuhao Sun, Shixin Zhang, Zixi Chen, Heyi Sun, Fuchun Sun, Bin Fang

    Abstract: Optical tactile sensors are extensively utilized in intelligent robot manipulation due to their ability to acquire high-resolution tactile information at a lower cost. However, achieving adequate reality and versatility in simulating optical tactile sensors is challenging. In this paper, we propose a simulation method and validate its effectiveness through experiments. We utilize path tracing for…

    Submitted 5 May, 2024; originally announced May 2024.

  47. arXiv:2405.01515  [pdf, other]

    cs.IT eess.SP

    Model-based Deep Learning for Rate Split Multiple Access in Vehicular Communications

    Authors: Hanwen Zhang, Mingzhe Chen, Alireza Vahid, Haijian Sun

    Abstract: Rate split multiple access (RSMA) has been proven as an effective communication scheme for 5G and beyond, especially in vehicular scenarios. However, RSMA requires complicated iterative algorithms for proper resource allocation, which cannot fulfill the stringent latency requirement in resource constrained vehicles. Although data driven approaches can alleviate this issue, they suffer from poor ge…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: submitted to IEEE conference

  48. arXiv:2405.00719  [pdf, other]

    eess.SP cs.LG q-bio.NC

    EEG-Deformer: A Dense Convolutional Transformer for Brain-computer Interfaces

    Authors: Yi Ding, Yong Li, Hao Sun, Rui Liu, Chengxuan Tong, Cuntai Guan

    Abstract: Effectively learning the temporal dynamics in electroencephalogram (EEG) signals is challenging yet essential for decoding brain activities using brain-computer interfaces (BCIs). Although Transformers are popular for their long-term sequential learning ability in the BCI field, most methods combining Transformers with convolutional neural networks (CNNs) fail to capture the coarse-to-fine tempora…

    Submitted 25 April, 2024; originally announced May 2024.

    Comments: 10 pages, 9 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  49. arXiv:2404.19026  [pdf, other]

    cs.CV

    MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing

    Authors: Cong Wang, Di Kang, He-Yi Sun, Shen-Han Qian, Zi-Xuan Wang, Linchao Bao, Song-Hai Zhang

    Abstract: Creating high-fidelity head avatars from multi-view videos is a core issue for many AR/VR applications. However, existing methods usually struggle to obtain high-quality renderings for all different head components simultaneously since they use one single representation to model components with drastically different characteristics (e.g., skin vs. hair). In this paper, we propose a Hybrid Mesh-Gau…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Project page: https://conallwang.github.io/MeGA_Pages/

  50. arXiv:2404.18394  [pdf, other]

    cs.CV

    Reconstructing Satellites in 3D from Amateur Telescope Images

    Authors: Zhiming Chang, Boyang Liu, Yifei Xia, Youming Guo, Boxin Shi, He Sun

    Abstract: This paper proposes a framework for the 3D reconstruction of satellites in low-Earth orbit, utilizing videos captured by small amateur telescopes. The video data obtained from these telescopes differ significantly from data for standard 3D reconstruction tasks, characterized by intense motion blur, atmospheric turbulence, pervasive background light pollution, extended focal length and constrained…

    Submitted 28 April, 2024; originally announced April 2024.