Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 545 results for author: Shen, L

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.06083  [pdf, other

    cs.CV

    Towards Robust Monocular Depth Estimation in Non-Lambertian Surfaces

    Authors: Junrui Zhang, Jiaqi Li, Yachuan Huang, Yiran Wang, Jinghong Zheng, Liao Shen, Zhiguo Cao

    Abstract: In the field of monocular depth estimation (MDE), many models with excellent zero-shot performance in general scenes emerge recently. However, these methods often fail in predicting non-Lambertian surfaces, such as transparent or mirror (ToM) surfaces, due to the unique reflective properties of these regions. Previous methods utilize externally provided ToM masks and aim to obtain correct depth ma… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  2. arXiv:2408.05502  [pdf, other

    cs.CV cs.CL

    GEM: Context-Aware Gaze EstiMation with Visual Search Behavior Matching for Chest Radiograph

    Authors: Shaonan Liu, Wenting Chen, Jie Liu, Xiaoling Luo, Linlin Shen

    Abstract: Gaze estimation is pivotal in human scene comprehension tasks, particularly in medical diagnostic analysis. Eye-tracking technology facilitates the recording of physicians' ocular movements during image interpretation, thereby elucidating their visual attention patterns and information-processing strategies. In this paper, we initially define the context-aware gaze estimation problem in medical ra… ▽ More

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 9 figures

  3. Towards High-resolution 3D Anomaly Detection via Group-Level Feature Contrastive Learning

    Authors: Hongze Zhu, Guoyang Xie, Chengbin Hou, Tao Dai, Can Gao, Jinbao Wang, Linlin Shen

    Abstract: High-resolution point clouds~(HRPCD) anomaly detection~(AD) plays a critical role in precision machining and high-end equipment manufacturing. Despite considerable 3D-AD methods that have been proposed recently, they still cannot meet the requirements of the HRPCD-AD task. There are several challenges: i) It is difficult to directly capture HRPCD information due to large amounts of points at the s… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: ACMMM24, 12 pages, 5 figures

  4. arXiv:2408.03876  [pdf, other

    cs.HC

    From Data to Story: Towards Automatic Animated Data Video Creation with LLM-based Multi-Agent Systems

    Authors: Leixian Shen, Haotian Li, Yun Wang, Huamin Qu

    Abstract: Creating data stories from raw data is challenging due to humans' limited attention spans and the need for specialized skills. Recent advancements in large language models (LLMs) offer great opportunities to develop systems with autonomous agents to streamline the data storytelling workflow. Though multi-agent systems have benefits such as fully realizing LLM potentials with decomposed tasks for i… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 6 pages, 2 figures, IEEE VIS 2024 Gen4DS Workshop

  5. arXiv:2408.02801  [pdf, ps, other

    cs.LG math.OC stat.ML

    Sparse Deep Learning Models with the $\ell_1$ Regularization

    Authors: Lixin Shen, Rui Wang, Yuesheng Xu, Mingsong Yan

    Abstract: Sparse neural networks are highly desirable in deep learning in reducing its complexity. The goal of this paper is to study how choices of regularization parameters influence the sparsity level of learned neural networks. We first derive the $\ell_1$-norm sparsity-promoting deep learning models including single and multiple regularization parameters models, from a statistical viewpoint. We then ch… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

  6. NotePlayer: Engaging Jupyter Notebooks for Dynamic Presentation of Analytical Processes

    Authors: Yang Ouyang, Leixian Shen, Yun Wang, Quan Li

    Abstract: Diverse presentation formats play a pivotal role in effectively conveying code and analytical processes during data analysis. One increasingly popular format is tutorial videos, particularly those based on Jupyter notebooks, which offer an intuitive interpretation of code and vivid explanations of analytical procedures. However, creating such videos requires a diverse skill set and significant man… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 20 pages, UIST 2024

  7. arXiv:2407.18627  [pdf, ps, other

    cs.LG eess.SP

    Multi-Agent Deep Reinforcement Learning for Energy Efficient Multi-Hop STAR-RIS-Assisted Transmissions

    Authors: Pei-Hsiang Liao, Li-Hsiang Shen, Po-Chen Wu, Kai-Ten Feng

    Abstract: Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) provides a promising way to expand coverage in wireless communications. However, limitation of single STAR-RIS inspire us to integrate the concept of multi-hop transmissions, as focused on RIS in existing research. Therefore, we propose the novel architecture of multi-hop STAR-RISs to achieve a wider range of… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted by Proc. IEEE VTC-fall

  8. arXiv:2407.17584  [pdf

    cs.CY

    Theorizing neuro-induced relationships between cognitive diversity, motivation, grit and academic performance in multidisciplinary engineering education context

    Authors: Duy Duong-Tran, Siqing Wei, Li Shen

    Abstract: Nowadays, engineers need to tackle many unprecedented challenges that are often complex, and, most importantly, cannot be exhaustively compartmentalized into a single engineering discipline. In other words, most engineering problems need to be solved from a multidisciplinary approach. However, conventional engineering programs usually adopt pedagogical approaches specifically tailored to tradition… ▽ More

    Submitted 13 May, 2024; originally announced July 2024.

    Comments: 10 pages, 2 figures

  9. arXiv:2407.17412  [pdf, other

    cs.CV cs.AI

    (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork

    Authors: Tianjin Huang, Fang Meng, Li Shen, Fan Liu, Yulong Pei, Mykola Pechenizkiy, Shiwei Liu, Tianlong Chen

    Abstract: Large-scale neural networks have demonstrated remarkable performance in different domains like vision and language processing, although at the cost of massive computation resources. As illustrated by compression literature, structural model pruning is a prominent algorithm to encourage model efficiency, thanks to its acceleration-friendly sparsity patterns. One of the key questions of structural p… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Under review

  10. arXiv:2407.17126  [pdf

    cs.CL cs.AI

    SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)

    Authors: Bernardo Consoli, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding

    Abstract: Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method leveraging contrastive examples and concise instructions to extract SDoH without relying… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  11. arXiv:2407.13426  [pdf, other

    cs.CV

    WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration

    Authors: Xinxing Cheng, Xi Jia, Wenqi Lu, Qiufu Li, Linlin Shen, Alexander Krull, Jinming Duan

    Abstract: Deep image registration has demonstrated exceptional accuracy and fast inference. Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner. However, due to the cascaded nature and repeated composition/warping operations on feature maps, these methods negatively increase memory usage during training and testing. M… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  12. arXiv:2407.12709  [pdf, other

    cs.CV

    MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

    Authors: Leyang Shen, Gongwei Chen, Rui Shao, Weili Guan, Liqiang Nie

    Abstract: Multimodal large language models (MLLMs) have demonstrated impressive capabilities across various vision-language tasks. However, a generalist MLLM typically underperforms compared with a specialist MLLM on most VL tasks, which can be attributed to task interference. In this paper, we propose a mixture of multimodal experts (MoME) to mitigate task interference and obtain a generalist MLLM. Our MoM… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Github: https://github.com/JiuTian-VL/MoME

  13. arXiv:2407.12676  [pdf, other

    cs.CV eess.IV

    CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems

    Authors: Jiankun Zhao, Bowen Song, Liyue Shen

    Abstract: Diffusion models have been demonstrated as strong priors for solving general inverse problems. Most existing Diffusion model-based Inverse Problem Solvers (DIS) employ a plug-and-play approach to guide the sampling trajectory with either projections or gradients. Though effective, these methods generally necessitate hundreds of sampling steps, posing a dilemma between inference time and reconstruc… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  14. arXiv:2407.11188  [pdf, other

    cs.CV

    Efficient In-Context Medical Segmentation with Meta-driven Visual Prompt Selection

    Authors: Chenwei Wu, David Restrepo, Zitao Shuai, Zhongming Liu, Liyue Shen

    Abstract: In-context learning (ICL) with Large Vision Models (LVMs) presents a promising avenue in medical image segmentation by reducing the reliance on extensive labeling. However, the ICL performance of LVMs highly depends on the choices of visual prompts and suffers from domain shifts. While existing works leveraging LVMs for medical tasks have focused mainly on model-centric approaches like fine-tuning… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  15. Segmenting Medical Images with Limited Data

    Authors: Zhaoshan Liua, Qiujie Lv, Chau Hung Lee, Lei Shen

    Abstract: While computer vision has proven valuable for medical image segmentation, its application faces challenges such as limited dataset sizes and the complexity of effectively leveraging unlabeled images. To address these challenges, we present a novel semi-supervised, consistency-based approach termed the data-efficient medical segmenter (DEMS). The DEMS features an encoder-decoder architecture and in… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Neural Networks Accepted

  16. arXiv:2407.08466  [pdf, other

    eess.IV cs.CV

    Global Spatial-Temporal Information-based Residual ConvLSTM for Video Space-Time Super-Resolution

    Authors: Congrui Fu, Hui Yuan, Shiqi Jiang, Guanghui Zhang, Liquan Shen, Raouf Hamzaoui

    Abstract: By converting low-frame-rate, low-resolution videos into high-frame-rate, high-resolution ones, space-time video super-resolution techniques can enhance visual experiences and facilitate more efficient information dissemination. We propose a convolutional neural network (CNN) for space-time video super-resolution, namely GIRNet. To generate highly accurate features and thus improve performance, th… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  17. arXiv:2407.07089  [pdf, other

    cs.LG

    Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic

    Authors: Ruochen Jin, Bojian Hou, Jiancong Xiao, Weijie Su, Li Shen

    Abstract: Task arithmetic has recently emerged as a cost-effective and scalable approach to edit pre-trained models directly in weight space, by adding the fine-tuned weights of different tasks. The performance has been further improved by a linear property which is illustrated by weight disentanglement. Yet, conventional linearization methods (e.g., NTK linearization) not only double the time and training… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  18. arXiv:2407.04489  [pdf, other

    cs.CV

    Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model

    Authors: Duy M. H. Nguyen, An T. Le, Trung Q. Nguyen, Nghiem T. Diep, Tai Nguyen, Duy Duong-Tran, Jan Peters, Li Shen, Mathias Niepert, Daniel Sonntag

    Abstract: Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we c… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Version 1

  19. arXiv:2407.01552  [pdf

    cs.NI physics.optics

    High Spectral-Efficiency, Ultra-low MIMO SDM Transmission over a Field-Deployed Multi-Core OAM Fiber

    Authors: Junyi Liu, Zengquan Xu, Shuqi Mo, Yuming Huang, Yining Huang, Zhenhua Li, Yuying Guo, Lei Shen, Shuo Xu, Ran Gao, Cheng Du, Qian Feng, Jie Luo, Jie Liu, Siyuan Yu

    Abstract: Few-mode multi-core fiber (FM-MCF) based Space-Division Multiplexing (SDM) systems possess the potential to maximize the number of multiplexed spatial channels per fiber by harnessing both the space (fiber cores) and mode (optical mode per core) dimensions. However, to date, no SDM transmissions over field-deployed FM-MCFs in realistic outdoor settings have been reported, which contrasts with SDM… ▽ More

    Submitted 29 April, 2024; originally announced July 2024.

    Comments: 17 pages, 8 figures

  20. arXiv:2406.20078  [pdf, other

    cs.CV

    GM-DF: Generalized Multi-Scenario Deepfake Detection

    Authors: Yingxin Lai, Zitong Yu, Jing Yang, Bin Li, Xiangui Kang, Linlin Shen

    Abstract: Existing face forgery detection usually follows the paradigm of training models in a single domain, which leads to limited generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets. We first find a rapid degradation of de… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  21. arXiv:2406.18146  [pdf, other

    cs.CV

    A Refer-and-Ground Multimodal Large Language Model for Biomedicine

    Authors: Xiaoshuang Huang, Haifeng Huang, Lingdong Shen, Yehui Yang, Fangxin Shang, Junwei Liu, Jia Liu

    Abstract: With the rapid development of multimodal large language models (MLLMs), especially their capabilities in visual chat through refer and ground functionalities, their significance is increasingly recognized. However, the biomedical field currently exhibits a substantial gap in this area, primarily due to the absence of a dedicated refer and ground dataset for biomedical images. To address this chall… ▽ More

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI2024

  22. Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning

    Authors: Tianfu Wang, Li Shen, Qilin Fan, Tong Xu, Tongliang Liu, Hui Xiong

    Abstract: As an essential resource management problem in network virtualization, virtual network embedding (VNE) aims to allocate the finite resources of physical network to sequentially arriving virtual network requests (VNRs) with different resource demands. Since this is an NP-hard combinatorial optimization problem, many efforts have been made to provide viable solutions. However, most existing approach… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Services Computing (TSC)

    Journal ref: IEEE Transactions on Services Computing ( Volume: 17, Issue: 3, May-June 2024)

  23. arXiv:2406.16518  [pdf

    cs.CV

    Vision Mamba-based autonomous crack segmentation on concrete, asphalt, and masonry surfaces

    Authors: Zhaohui Chen, Elyas Asadi Shamsabadi, Sheng Jiang, Luming Shen, Daniel Dias-da-Costa

    Abstract: Convolutional neural networks (CNNs) and Transformers have shown advanced accuracy in crack detection under certain conditions. Yet, the fixed local attention can compromise the generalisation of CNNs, and the quadratic complexity of the global self-attention restricts the practical deployment of Transformers. Given the emergence of the new-generation architecture of Mamba, this paper proposes a V… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 23 pages, 9 figures

  24. arXiv:2406.14794  [pdf, other

    eess.IV cs.CV cs.LG

    ImageFlowNet: Forecasting Multiscale Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images

    Authors: Chen Liu, Ke Xu, Liangbo L. Shen, Guillaume Huguet, Zilong Wang, Alexander Tong, Danilo Bzdok, Jay Stewart, Jay C. Wang, Lucian V. Del Priore, Smita Krishnaswamy

    Abstract: The forecasting of disease progression from images is a holy grail for clinical decision making. However, this task is complicated by the inherent high dimensionality, temporal sparsity and sampling irregularity in longitudinal image acquisitions. Existing methods often rely on extracting hand-crafted features and performing time-series analysis in this vector space, leading to a loss of rich spat… ▽ More

    Submitted 12 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Fixed some typos. Merged multibib

  25. arXiv:2406.13975  [pdf, other

    cs.CL cs.AI

    MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

    Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

    Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we pr… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  26. arXiv:2406.13963  [pdf, ps, other

    cs.CV

    SSAD: Self-supervised Auxiliary Detection Framework for Panoramic X-ray based Dental Disease Diagnosis

    Authors: Zijian Cai, Xinquan Yang, Xuguang Li, Xiaoling Luo, Xuechen Li, Linlin Shen, He Meng, Yongqiang Deng

    Abstract: Panoramic X-ray is a simple and effective tool for diagnosing dental diseases in clinical practice. When deep learning models are developed to assist dentist in interpreting panoramic X-rays, most of their performance suffers from the limited annotated data, which requires dentist's expertise and a lot of time cost. Although self-supervised learning (SSL) has been proposed to address this challeng… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  27. MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representation

    Authors: Chao Ni, Liyu Shen, Xiaohu Yang, Yan Zhu, Shaohua Wang

    Abstract: We constructed a newly large-scale and comprehensive C/C++ vulnerability dataset named MegaVul by crawling the Common Vulnerabilities and Exposures (CVE) database and CVE-related open-source projects. Specifically, we collected all crawlable descriptive information of the vulnerabilities from the CVE database and extracted all vulnerability-related code changes from 28 Git-based websites. We adopt… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4figures

  28. arXiv:2406.11637  [pdf, other

    cs.HC

    PyGWalker: On-the-fly Assistant for Exploratory Visual Data Analysis

    Authors: Yue Yu, Leixian Shen, Fei Long, Huamin Qu, Hao Chen

    Abstract: Exploratory visual data analysis tools empower data analysts to efficiently and intuitively explore data insights throughout the entire analysis cycle. However, the gap between common programmatic analysis (e.g., within computational notebooks) and exploratory visual analysis leads to a disjointed and inefficient data analysis experience. To bridge this gap, we developed PyGWalker, a Python librar… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: To appear at the IEEE VIS Conference 2024

  29. arXiv:2406.10225  [pdf, other

    cs.CV

    SatDiffMoE: A Mixture of Estimation Method for Satellite Image Super-resolution with Latent Diffusion Models

    Authors: Zhaoxu Luo, Bowen Song, Liyue Shen

    Abstract: During the acquisition of satellite images, there is generally a trade-off between spatial resolution and temporal resolution (acquisition frequency) due to the onboard sensors of satellite imaging systems. High-resolution satellite images are very important for land crop monitoring, urban planning, wildfire management and a variety of applications. It is a significant yet challenging task to achi… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  30. arXiv:2406.10211  [pdf, other

    cs.CV

    DiffusionBlend: Learning 3D Image Prior through Position-aware Diffusion Score Blending for 3D Computed Tomography Reconstruction

    Authors: Bowen Song, Jason Hu, Zhaoxu Luo, Jeffrey A. Fessler, Liyue Shen

    Abstract: Diffusion models face significant challenges when employed for large-scale medical image reconstruction in real practice such as 3D Computed Tomography (CT). Due to the demanding memory, time, and data requirements, it is difficult to train a diffusion model directly on the entire volume of high-dimensional data to obtain an efficient 3D diffusion prior. Existing works utilizing diffusion priors o… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  31. arXiv:2406.09900  [pdf, other

    cs.CL

    GEB-1.3B: Open Lightweight Large Language Model

    Authors: Jie Wu, Yufeng Zhu, Lei Shen, Xuqing Lu

    Abstract: Recently developed large language models (LLMs) such as ChatGPT, Claude, and Llama have demonstrated impressive abilities, and even surpass human-level performance in several tasks. Despite their success, the resource-intensive demands of these models, requiring significant computational power for both training and inference, limit their deployment to high-performance servers. Additionally, the ex… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: GEB-1.3B technical report

  32. arXiv:2406.09770  [pdf, other

    cs.LG cs.AI

    Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Shiwei Liu, Han Hu, Bo Du

    Abstract: Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost of training and evaluating models. Efficient Pareto front approximation of large models enables multi-objective optimization for various tasks such as multi-task learning and trade-off analysis. Existing algorithms for l… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: code is available at https://github.com/tanganke/pareto_set_learning

  33. arXiv:2406.09643  [pdf, other

    cs.LG

    Reinforced Decoder: Towards Training Recurrent Neural Networks for Time Series Forecasting

    Authors: Qi Sima, Xinze Zhang, Yukun Bao, Siyue Yang, Liang Shen

    Abstract: Recurrent neural network-based sequence-to-sequence models have been extensively applied for multi-step-ahead time series forecasting. These models typically involve a decoder trained using either its previous forecasts or the actual observed values as the decoder inputs. However, relying on self-generated predictions can lead to the rapid accumulation of errors over multiple steps, while using th… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 12 pages,8 figures

  34. arXiv:2406.08372  [pdf, other

    cs.CV

    APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

    Authors: Weizhao He, Yang Zhang, Wei Zhuo, Linlin Shen, Jiaqi Yang, Songhe Deng, Liang Sun

    Abstract: Few-shot semantic segmentation (FSS) endeavors to segment unseen classes with only a few labeled samples. Current FSS methods are commonly built on the assumption that their training and application scenarios share similar domains, and their performances degrade significantly while applied to a distinct domain. To this end, we propose to leverage the cutting-edge foundation model, the Segment Anyt… ▽ More

    Submitted 12 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 9 figures

  35. arXiv:2406.07971  [pdf, other

    cs.CL cs.AI cs.LG

    It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF

    Authors: Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao

    Abstract: Reinforcement Learning from Human Feedback (RLHF) involves training policy models (PMs) and reward models (RMs) to align language models with human preferences. Instead of focusing solely on PMs and RMs independently, we propose to examine their interactions during fine-tuning, introducing the concept of seamlessness. Our study starts with observing the saturation phenomenon, where continual impro… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  36. arXiv:2406.07890  [pdf, other

    eess.AS cs.CL cs.LG

    Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions

    Authors: Anfeng Xu, Kevin Huang, Tiantian Feng, Lue Shen, Helen Tager-Flusberg, Shrikanth Narayanan

    Abstract: Speech foundation models, trained on vast datasets, have opened unique opportunities in addressing challenging low-resource speech understanding, such as child speech. In this work, we explore the capabilities of speech foundation models on child-adult speaker diarization. We show that exemplary foundation models can achieve 39.5% and 62.3% relative reductions in Diarization Error Rate and Speaker… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  37. arXiv:2406.04603  [pdf, ps, other

    cs.CV

    Simplify Implant Depth Prediction as Video Grounding: A Texture Perceive Implant Depth Prediction Network

    Authors: Xinquan Yang, Xuguang Li, Xiaoling Luo, Leilei Zeng, Yudi Zhang, Linlin Shen, Yongqiang Deng

    Abstract: Surgical guide plate is an important tool for the dental implant surgery. However, the design process heavily relies on the dentist to manually simulate the implant angle and depth. When deep neural networks have been applied to assist the dentist quickly locates the implant position, most of them are not able to determine the implant depth. Inspired by the video grounding task which localizes the… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Journal ref: MICCAI'2024

  38. arXiv:2406.03280  [pdf, other

    cs.LG cs.AI cs.CL

    FusionBench: A Comprehensive Benchmark of Deep Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Han Hu, Bo Du, Dacheng Tao

    Abstract: Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single model in a cost-effective and data-efficient manner. This enables the unified model to take advantage of the original models' strengths, potentially exceeding their performance. Although a variety of deep model fusion techniques have been introduced, their evaluations… ▽ More

    Submitted 14 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Project homepage: https://github.com/tanganke/fusion_bench

  39. arXiv:2406.02462  [pdf, other

    cs.CV cs.AI

    Learning Image Priors through Patch-based Diffusion Models for Solving Inverse Problems

    Authors: Jason Hu, Bowen Song, Xiaojian Xu, Liyue Shen, Jeffrey A. Fessler

    Abstract: Diffusion models can learn strong image priors from underlying data distribution and use them to solve inverse problems, but the training process is computationally expensive and requires lots of data. Such bottlenecks prevent most existing works from being feasible for high-dimensional and high-resolution data such as 3D images. This paper proposes a method to learn an efficient data prior for th… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  40. arXiv:2406.01417  [pdf, other

    cs.LG cs.CV

    Mixup Augmentation with Multiple Interpolations

    Authors: Lifeng Shen, Jincheng Yu, Hansi Yang, James T. Kwok

    Abstract: Mixup and its variants form a popular class of data augmentation techniques.Using a random sample pair, it generates a new sample by linear interpolation of the inputs and labels. However, generating only one single interpolation may limit its augmentation ability. In this paper, we propose a simple yet effective extension called multi-mix, which generates multiple interpolations from a sample pai… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  41. arXiv:2405.19346  [pdf, other

    eess.SP cs.AI cs.LG

    Subject-Adaptive Transfer Learning Using Resting State EEG Signals for Cross-Subject EEG Motor Imagery Classification

    Authors: Sion An, Myeongkyun Kang, Soopil Kim, Philip Chikontwe, Li Shen, Sang Hyun Park

    Abstract: Electroencephalography (EEG) motor imagery (MI) classification is a fundamental, yet challenging task due to the variation of signals between individuals i.e., inter-subject variability. Previous approaches try to mitigate this using task-specific (TS) EEG signals from the target subject in training. However, recording TS EEG signals requires time and limits its applicability in various fields. In… ▽ More

    Submitted 9 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Early Accepted at MICCAI 2024

  42. arXiv:2405.18435  [pdf, other

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de… ▽ More

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  43. arXiv:2405.18187  [pdf, other

    cs.LG

    AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization

    Authors: Longxiang He, Li Shen, Junbo Tan, Xueqian Wang

    Abstract: Implicit Q-learning (IQL) serves as a strong baseline for offline RL, which learns the value function using only dataset actions through quantile regression. However, it is unclear how to recover the implicit policy from the learned implicit Q-function and why IQL can utilize weighted regression for policy extraction. IDQL reinterprets IQL as an actor-critic method and gets weights of implicit pol… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 19 pages, 3 figures, 4 tables

  44. arXiv:2405.18080  [pdf, other

    cs.LG

    HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

    Authors: Shengchao Hu, Ziqing Fan, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao

    Abstract: The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction. Recent advancements approach this through sequence modeling, leveraging the Transformer architecture's scalability and the benefits of parameter sharing to exploit task similarities. However, variations in task content and… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Published at ICML 2024

  45. arXiv:2405.17876  [pdf, other

    cs.LG cs.DC math.OC

    Decentralized Directed Collaboration for Personalized Federated Learning

    Authors: Yingqi Liu, Yifan Shi, Qinglun Li, Baoyuan Wu, Xueqian Wang, Li Shen

    Abstract: Personalized Federated Learning (PFL) is proposed to find the greatest personalized models for each client. To avoid the central failure and communication bottleneck in the server-based FL, we concentrate on the Decentralized Personalized Federated Learning (DPFL) that performs distributed model training in a Peer-to-Peer (P2P) manner. Most personalized works in DPFL are based on undirected and sy… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: CVPR 2024. arXiv admin note: text overlap with arXiv:2305.15157

  46. arXiv:2405.17098  [pdf, other

    cs.LG

    Q-value Regularized Transformer for Offline Reinforcement Learning

    Authors: Shengchao Hu, Ziqing Fan, Chaoqin Huang, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao

    Abstract: Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Conditional Sequence Modeling (CSM), a paradigm that learns the action distribution based on history trajectory and target returns for each state. However, these methods often struggle with stitching together optimal trajectories from sub-optimal ones due to the inconsistency between the sampled returns… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Published at ICML 2024

  47. arXiv:2405.17079  [pdf, other

    stat.ML cs.LG

    Learning with User-Level Local Differential Privacy

    Authors: Puning Zhao, Li Shen, Rongfei Fan, Qingming Li, Huiwen Wu, Jiafei Wu, Zhe Liu

    Abstract: User-level privacy is important in distributed systems. Previous research primarily focuses on the central model, while the local models have received much less attention. Under the central model, user-level DP is strictly stronger than the item-level one. However, under the local model, the relationship between user-level and item-level LDP becomes more complex, thus the analysis is crucially dif… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  48. arXiv:2405.16560  [pdf, other

    cs.LG

    Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models

    Authors: Yongxian Wei, Zixuan Hu, Li Shen, Zhenyi Wang, Yu Li, Chun Yuan, Dacheng Tao

    Abstract: Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data, enabling the rapid adaptation to new unseen tasks. Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts. In this paper, we empirically and theoretically identify and analyze the mode… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  49. arXiv:2405.13771  [pdf, other

    eess.IV cs.CV cs.LG

    Multi-Dataset Multi-Task Learning for COVID-19 Prognosis

    Authors: Filippo Ruffini, Lorenzo Tronchin, Zhuoru Wu, Wenting Chen, Paolo Soda, Linlin Shen, Valerio Guarrasi

    Abstract: In the fight against the COVID-19 pandemic, leveraging artificial intelligence to predict disease outcomes from chest radiographic images represents a significant scientific aim. The challenge, however, lies in the scarcity of large, labeled datasets with compatible tasks for training deep learning models without leading to overfitting. Addressing this issue, we introduce a novel multi-dataset mul… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  50. arXiv:2405.13694  [pdf, other

    cs.CV

    Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances

    Authors: Licheng Shen, Ho Ngai Chow, Lingyun Wang, Tong Zhang, Mengqiu Wang, Yuxing Han

    Abstract: Recent advancements in neural rendering techniques have significantly enhanced the fidelity of 3D reconstruction. Notably, the emergence of 3D Gaussian Splatting (3DGS) has marked a significant milestone by adopting a discrete scene representation, facilitating efficient training and real-time rendering. Several studies have successfully extended the real-time rendering capability of 3DGS to dynam… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 14 pages, 6 figures