
Showing 1–50 of 612 results for author: Chen, F

Searching in archive cs. Search in all archives.
  1. arXiv:2409.01207  [pdf, other]

    cs.LG

    Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models

    Authors: Jiao Chen, Jiayi He, Fangfang Chen, Zuohong Lv, Jianhua Tang, Weihua Li, Zuozhu Liu, Howard H. Yang, Guangjie Han

    Abstract: Currently, most applications in the Industrial Internet of Things (IIoT) still rely on CNN-based neural networks. Although Transformer-based large models (LMs), including language, vision, and multimodal models, have demonstrated impressive capabilities in AI-generated content (AIGC), their application in industrial domains, such as detection, planning, and control, remains relatively limited. Dep…

    Submitted 2 September, 2024; originally announced September 2024.

  2. arXiv:2409.00973  [pdf, other]

    cs.CV

    IVGF: The Fusion-Guided Infrared and Visible General Framework

    Authors: Fangcen Liu, Chenqiang Gao, Fang Chen, Pengcheng Li, Junjie Guo, Deyu Meng

    Abstract: Infrared and visible dual-modality tasks such as semantic segmentation and object detection can achieve robust performance even in extreme scenes by fusing complementary information. Most current methods design task-specific frameworks, which are limited in generalization across multiple tasks. In this paper, we propose a fusion-guided infrared and visible general framework, IVGF, which can be eas…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 11 pages, 8 figures

  3. arXiv:2409.00694  [pdf, other]

    cs.CV

    IAFI-FCOS: Intra- and across-layer feature interaction FCOS model for lesion detection of CT images

    Authors: Qiu Guan, Mengjie Pan, Feng Chen, Zhiqiang Yang, Zhongwen Yu, Qianwei Zhou, Haigen Hu

    Abstract: Effective lesion detection in medical images relies not only on the features of the lesion region, but also on the surrounding information. However, most current methods have not fully utilized it. What is more, the multi-scale feature fusion mechanisms of most traditional detectors are unable to transmit detail information without loss, which makes it hard to detect small and boundary ambiguous l…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 2024 IJCNN

  4. arXiv:2408.16245  [pdf, other]

    cs.LG q-bio.BM

    Large-Scale Multi-omic Biosequence Transformers for Modeling Peptide-Nucleotide Interactions

    Authors: Sully F. Chen, Robert J. Steele, Beakal Lemeneh, Shivanand P. Lad, Eric Oermann

    Abstract: The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. Almost all research on large-scale biosequence transformers has focused on one domain at a time (single-omic), usually nucleotides or peptides. These models have seen incredible success in downstream tasks in each domain and have achieved particu…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 27 pages, 5 figures

  5. arXiv:2408.16235  [pdf, other]

    cs.CV

    LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement

    Authors: Ye Yu, Fengxin Chen, Jun Yu, Zhen Kan

    Abstract: While recent low-light image enhancement (LLIE) methods have made significant advancements, they still face challenges in terms of low visual quality and weak generalization ability when applied to complex scenarios. To address these issues, we propose a semi-supervised method based on latent mean-teacher and Gaussian process, named LMT-GP. We first design a latent mean-teacher framework that inte…

    Submitted 28 August, 2024; originally announced August 2024.

  6. arXiv:2408.14418  [pdf, other]

    cs.CL cs.AI

    MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues

    Authors: Kuluhan Binici, Abhinav Ramesh Kashyap, Viktor Schlegel, Andy T. Liu, Vijay Prakash Dwivedi, Thanh-Tung Nguyen, Xiaoxue Gao, Nancy F. Chen, Stefan Winkler

    Abstract: Automatic Speech Recognition (ASR) systems are pivotal in transcribing speech into text, yet the errors they introduce can significantly degrade the performance of downstream tasks like summarization. This issue is particularly pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, necessitating the use of ASR models as black-box solut…

    Submitted 26 August, 2024; originally announced August 2024.

  7. arXiv:2408.14087  [pdf, other]

    cs.CV

    LSM-YOLO: A Compact and Effective ROI Detector for Medical Detection

    Authors: Zhongwen Yu, Qiu Guan, Jianmin Yang, Zhiqiang Yang, Qianwei Zhou, Yang Chen, Feng Chen

    Abstract: In existing medical Region of Interest (ROI) detection, no algorithm simultaneously satisfies both real-time performance and accuracy, leaving the growing demand for automatic detection in medicine unmet. Although the basic YOLO framework ensures real-time detection due to its fast speed, it still faces challenges in maintaining precision concurrently. To alleviate the above probl…

    Submitted 26 August, 2024; originally announced August 2024.

  8. arXiv:2408.12673  [pdf, other]

    cs.AI

    Enhancing Transferability of Adversarial Attacks with GE-AdvGAN+: A Comprehensive Framework for Gradient Editing

    Authors: Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Yuchen Zhang, Jiahao Huang, Jianlong Zhou, Fang Chen

    Abstract: Transferable adversarial attacks pose significant threats to deep neural networks, particularly in black-box scenarios where internal model information is inaccessible. Studying adversarial attack methods helps advance the performance of defense mechanisms and explore model vulnerabilities. These methods can uncover and exploit weaknesses in models, promoting the development of more robust archite…

    Submitted 31 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  9. arXiv:2408.11480  [pdf, other]

    eess.IV cs.CV

    OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal

    Authors: Qiao Mo, Yukang Ding, Jinhua Hao, Qiang Zhu, Ming Sun, Chao Zhou, Feiyu Chen, Shuyuan Zhu

    Abstract: Deep learning-based methods have shown remarkable performance in the single JPEG artifacts removal task. However, existing methods tend to degrade on double JPEG images, which are prevalent in real-world scenarios. To address this issue, we propose Offset-Aware Partition Transformer for double JPEG artifacts removal, termed OAPT. We conduct an analysis of double JPEG compression that results in up…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 14 pages, 9 figures. Codes and models are available at https://github.com/QMoQ/OAPT.git

  10. arXiv:2408.10119  [pdf, other]

    cs.CV cs.AI

    Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data

    Authors: Tao Yang, Yangming Shi, Yunwen Huang, Feng Chen, Yin Zheng, Lei Zhang

    Abstract: Text-to-video (T2V) generation has gained significant attention due to its wide applications to video generation, editing, enhancement and translation, etc. However, high-quality (HQ) video synthesis is extremely challenging because of the diverse and complex motions that exist in the real world. Most existing works struggle to address this problem by collecting large-scale HQ videos, which are inaccess…

    Submitted 19 August, 2024; originally announced August 2024.

  11. arXiv:2408.09972  [pdf, other]

    cs.RO cs.AI

    Edge-Cloud Collaborative Motion Planning for Autonomous Driving with Large Language Models

    Authors: Jiao Chen, Suyan Dai, Fangfang Chen, Zuohong Lv, Jianhua Tang

    Abstract: Integrating large language models (LLMs) into autonomous driving enhances personalization and adaptability in open-world scenarios. However, traditional edge computing models still face significant challenges in processing complex driving data, particularly regarding real-time performance and system efficiency. To address these challenges, this study introduces EC-Drive, a novel edge-cloud collabo…

    Submitted 19 August, 2024; originally announced August 2024.

  12. arXiv:2408.09248  [pdf, other]

    cs.CV

    MagicID: Flexible ID Fidelity Generation System

    Authors: Zhaoli Deng, Wen Liu, Fanyi Wang, Junkang Zhang, Fan Chen, Meng Zhang, Wendong Zhang, Zhenpeng Mi

    Abstract: Portrait Fidelity Generation is a prominent research area in generative models, with a primary focus on enhancing both controllability and fidelity. Current methods face challenges in generating high-fidelity portrait results when faces occupy a small portion of the image with a low resolution, especially in multi-person group photo settings. To tackle these issues, we propose a systematic solutio…

    Submitted 20 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

  13. arXiv:2408.08656  [pdf, other]

    cs.CL

    LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs

    Authors: Do Xuan Long, Hai Nguyen Ngoc, Tiviatis Sim, Hieu Dao, Shafiq Joty, Kenji Kawaguchi, Nancy F. Chen, Min-Yen Kan

    Abstract: We present the first systematic evaluation examining format bias in performance of large language models (LLMs). Our approach distinguishes between two categories of an evaluation metric under format constraints to reliably and accurately assess performance: one measures performance when format constraints are adhered to, while the other evaluates performance regardless of constraint adherence. We…

    Submitted 16 August, 2024; originally announced August 2024.

  14. arXiv:2408.08554  [pdf, other]

    cs.LG

    ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models

    Authors: Chao Zeng, Songwei Liu, Yusheng Xie, Hong Liu, Xiaojian Wang, Miao Wei, Shu Yang, Fangmin Chen, Xing Mei

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their practical application is constrained by substantial memory and computational demands. Post-training quantization (PTQ) is considered an effective method to accelerate LLM inference. Despite its growing popularity in LLM model compression, PTQ deployment faces two major challenges. First, low-bit quan…

    Submitted 22 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  15. arXiv:2408.07733  [pdf, other]

    cs.LG cs.CR

    Enhancing Adversarial Attacks via Parameter Adaptive Adversarial Attack

    Authors: Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Chenyu Zhang, Jiahao Huang, Jianlong Zhou, Fang Chen

    Abstract: In recent times, the swift evolution of adversarial attacks has captured widespread attention, particularly concerning their transferability and other performance attributes. These techniques are primarily executed at the sample level, frequently overlooking the intrinsic parameters of models. Such neglect suggests that the perturbations introduced in adversarial samples might have the potential f…

    Submitted 14 August, 2024; originally announced August 2024.

  16. arXiv:2408.06827  [pdf, other]

    eess.AS cs.LG

    PRESENT: Zero-Shot Text-to-Prosody Control

    Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans

    Abstract: Current strategies for achieving fine-grained prosody control in speech synthesis entail extracting additional style embeddings or adopting more complex architectures. To enable zero-shot application of pretrained text-to-speech (TTS) models, we present PRESENT (PRosody Editing without Style Embeddings or New Training), which exploits explicit prosody prediction in FastSpeech2-based models by modi…

    Submitted 13 August, 2024; originally announced August 2024.

  17. arXiv:2408.06646  [pdf, other]

    cs.CV

    Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models

    Authors: Chenqian Yan, Songwei Liu, Hongjian Liu, Xurui Peng, Xiaojian Wang, Fangming Chen, Lean Fu, Xing Mei

    Abstract: Stable Diffusion Models (SDMs) have shown remarkable proficiency in image synthesis. However, their broad application is impeded by their large model sizes and intensive computational requirements, which typically require expensive cloud servers for deployment. On the flip side, while there are many compact models tailored for edge devices that can reduce these demands, they often compromise on se…

    Submitted 13 August, 2024; originally announced August 2024.

  18. arXiv:2408.04236  [pdf, other]

    cs.LG cs.AI

    Cluster-Wide Task Slowdown Detection in Cloud System

    Authors: Feiyi Chen, Yingying Zhang, Lunting Fan, Yuxuan Liang, Guansong Pang, Qingsong Wen, Shuiguang Deng

    Abstract: Slow task detection is a critical problem in cloud operation and maintenance since it is highly related to user experience and can bring substantial liquidated damages. Most anomaly detection methods detect it from a single-task aspect. However, considering millions of concurrent tasks in large-scale cloud computing clusters, it becomes impractical and inefficient. Moreover, single-task slowdowns…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by KDD2024

  19. arXiv:2408.02561  [pdf, other]

    cs.CV

    HQOD: Harmonious Quantization for Object Detection

    Authors: Long Huang, Zhiwei Dong, Song-Lu Chen, Ruiyao Zhang, Shutong Ti, Feng Chen, Xu-Cheng Yin

    Abstract: The task inharmony problem commonly occurs in modern object detectors, leading to inconsistent qualities between classification and regression tasks. The predicted boxes with high classification scores but poor localization positions or low classification scores but accurate localization positions will worsen the performance of detectors after Non-Maximum Suppression. Furthermore, when object detector…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo (ICME), July 15 - July 19, 2024, Niagara Falls, Ontario, Canada

  20. Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning

    Authors: Xuri Ge, Junchen Fu, Fuhai Chen, Shan An, Nicu Sebe, Joemon M. Jose

    Abstract: Facial action units (AUs), as defined in the Facial Action Coding System (FACS), have received significant research interest owing to their diverse range of applications in facial state analysis. Current mainstream FAU recognition models have a notable limitation, i.e., focusing only on the accuracy of AU recognition and overlooking explanations of corresponding AU states. In this paper, we propos…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, 4 tables

    Journal ref: ACM Multimedia 2024

  21. arXiv:2407.20283  [pdf, other]

    cs.LG physics.ao-ph

    Spatial Temporal Approach for High-Resolution Gridded Wind Forecasting across Southwest Western Australia

    Authors: Fuling Chen, Kevin Vinsen, Arthur Filoche

    Abstract: Accurate wind speed and direction forecasting is paramount across many sectors, spanning agriculture, renewable energy generation, and bushfire management. However, conventional forecasting models encounter significant challenges in precisely predicting wind conditions at high spatial resolutions for individual locations or small geographical areas (< 20 km²) and capturing medium to long-range tem…

    Submitted 26 July, 2024; originally announced July 2024.

  22. arXiv:2407.19507  [pdf, other]

    cs.CV cs.AI

    WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting

    Authors: Jingjing Wu, Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Fanglin Chen, Guangming Lu, Wenjie Pei

    Abstract: Transcription-only Supervised Text Spotting aims to learn text spotters relying only on transcriptions but no text boundaries for supervision, thus eliminating expensive boundary annotation. The crux of this task lies in locating each transcription in scene text images without location annotations. In this work, we formulate this challenging problem as a Weakly Supervised Cross-modality Contrastiv…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  23. arXiv:2407.18392  [pdf, other]

    cs.CV

    A Reference-Based 3D Semantic-Aware Framework for Accurate Local Facial Attribute Editing

    Authors: Yu-Kai Huang, Yutong Zheng, Yen-Shuo Su, Anudeepsekhar Bolimera, Han Zhang, Fangyi Chen, Marios Savvides

    Abstract: Facial attribute editing plays a crucial role in synthesizing realistic faces with specific characteristics while maintaining realistic appearances. Despite advancements, challenges persist in achieving precise, 3D-aware attribute modifications, which are crucial for consistent and accurate representations of faces from different angles. Current methods struggle with semantic entanglement and lack…

    Submitted 28 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  24. arXiv:2407.17477  [pdf]

    cs.CY cs.CL cs.LG

    Toward Automated Detection of Biased Social Signals from the Content of Clinical Conversations

    Authors: Feng Chen, Manas Satish Bedmutha, Ray-Yuan Chung, Janice Sabin, Wanda Pratt, Brian R. Wood, Nadir Weibel, Andrea L. Hartzler, Trevor Cohen

    Abstract: Implicit bias can impede patient-provider interactions and lead to inequities in care. Raising awareness is key to reducing such bias, but its manifestations in the social dynamics of patient-provider communication are difficult to detect. In this study, we used automated speech recognition (ASR) and natural language processing (NLP) to identify social signals in patient-provider interactions. We…

    Submitted 30 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by AMIA 2024 Annual Symposium

  25. arXiv:2407.16161  [pdf, other]

    cs.LG

    TransFeat-TPP: An Interpretable Deep Covariate Temporal Point Processes

    Authors: Zizhuo Meng, Boyu Li, Xuhui Fan, Zhidong Li, Yang Wang, Fang Chen, Feng Zhou

    Abstract: The classical temporal point process (TPP) constructs an intensity function by taking the occurrence times into account. Nevertheless, occurrence time may not be the only relevant factor; other contextual data, termed covariates, may also impact the event evolution. Incorporating such covariates into the model is beneficial, while distinguishing their relevance to the event dynamics is of great pr…

    Submitted 23 July, 2024; originally announced July 2024.

  26. arXiv:2407.14006  [pdf, other]

    eess.AS cs.SD

    MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis

    Authors: Qian Yang, Jialong Zuo, Zhe Su, Ziyue Jiang, Mingze Li, Zhou Zhao, Feiyang Chen, Zhefeng Wang, Baoxing Huai

    Abstract: We introduce an open source high-quality Mandarin TTS dataset MSceneSpeech (Multiple Scene Speech Dataset), which is intended to provide resources for expressive speech synthesis. MSceneSpeech comprises numerous audio recordings and texts performed and recorded according to daily life scenarios. Each scenario includes multiple speakers and a diverse range of prosodic styles, making it suitable for…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by INTERSPEECH 2024

  27. arXiv:2407.13943  [pdf, other]

    cs.CL cs.AI

    Werewolf Arena: A Case Study in LLM Evaluation via Social Deduction

    Authors: Suma Bailis, Jane Friedhoff, Feiyang Chen

    Abstract: This paper introduces Werewolf Arena, a novel framework for evaluating large language models (LLMs) through the lens of the classic social deduction game, Werewolf. In Werewolf Arena, LLMs compete against each other, navigating the game's complex dynamics of deception, deduction, and persuasion. The framework introduces a dynamic turn-taking system based on bidding, mirroring real-world discussion…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 13 pages, 10 figures

  28. arXiv:2407.10903  [pdf, other]

    cs.CE

    Hedging Beyond the Mean: A Distributional Reinforcement Learning Perspective for Hedging Portfolios with Structured Products

    Authors: Anil Sharma, Freeman Chen, Jaesun Noh, Julio DeJesus, Mario Schlener

    Abstract: Research in quantitative finance has demonstrated that reinforcement learning (RL) methods have delivered promising outcomes in the context of hedging financial portfolios. For example, hedging a portfolio of European options using RL achieves better $PnL$ distribution than the trading hedging strategies like Delta neutral and Delta-Gamma neutral [Cao et al. 2020]. There is great attention given…

    Submitted 15 July, 2024; originally announced July 2024.

  29. arXiv:2407.10061  [pdf, other]

    cs.CV

    InfiniMotion: Mamba Boosts Memory in Transformer for Arbitrary Long Motion Generation

    Authors: Zeyu Zhang, Akide Liu, Qi Chen, Feng Chen, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang

    Abstract: Text-to-motion generation holds potential for film, gaming, and robotics, yet current methods often prioritize short motion generation, making it challenging to produce long motion sequences effectively: (1) Current methods struggle to handle long motion sequences as a single input due to prohibitively high computational cost; (2) Breaking down the generation of long motion sequences into shorter…

    Submitted 13 July, 2024; originally announced July 2024.

  30. arXiv:2407.08926  [pdf, other]

    cs.IR

    Toward Automatic Group Membership Annotation for Group Fairness Evaluation

    Authors: Fumian Chen, Dayu Yang, Hui Fang

    Abstract: With the increasing research attention on fairness in information retrieval systems, more and more fairness-aware algorithms have been proposed to ensure fairness for a sustainable and healthy retrieval ecosystem. However, as the most adopted measurement of fairness-aware algorithms, group fairness evaluation metrics require group membership information that needs massive human annotations and is…

    Submitted 11 July, 2024; originally announced July 2024.

    Journal ref: NLDB2024

  31. arXiv:2407.03655  [pdf, other]

    eess.IV cs.CV

    Pathological Semantics-Preserving Learning for H&E-to-IHC Virtual Staining

    Authors: Fuqiang Chen, Ranran Zhang, Boyun Zheng, Yiwen Sun, Jiahui He, Wenjian Qin

    Abstract: Conventional hematoxylin-eosin (H&E) staining is limited to revealing cell morphology and distribution, whereas immunohistochemical (IHC) staining provides precise and specific visualization of protein activation at the molecular level. Virtual staining technology has emerged as a solution for highly efficient IHC examination, which directly transforms H&E-stained images to IHC-stained images. How…

    Submitted 28 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  32. arXiv:2407.00928  [pdf, other]

    cs.LG cs.CL

    FoldGPT: Simple and Effective Large Language Model Compression Scheme

    Authors: Songwei Liu, Chao Zeng, Lianqiang Li, Chenqian Yan, Lean Fu, Xing Mei, Fangmin Chen

    Abstract: The demand for deploying large language models (LLMs) on mobile devices continues to increase, driven by escalating data security concerns and cloud costs. However, network bandwidth and memory limitations pose challenges for deploying billion-level models on mobile devices. In this study, we investigate the outputs of different layers across various scales of LLMs and find that the outputs of mos…

    Submitted 30 June, 2024; originally announced July 2024.

  33. arXiv:2406.19791  [pdf, other]

    cs.RO

    Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding

    Authors: Yifan Tang, Cong Tai, Fangxing Chen, Wanting Zhang, Tao Zhang, Xueping Liu, Yongjin Liu, Long Zeng

    Abstract: Most existing robotic datasets capture static scene data and thus are limited in evaluating robots' dynamic performance. To address this, we present a mobile robot oriented large-scale indoor dataset, denoted as THUD (Tsinghua University Dynamic) robotic dataset, for training and evaluating their dynamic scene understanding algorithms. Specifically, the THUD dataset construction is first detailed,…

    Submitted 30 June, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This version has been accepted by ICRA2024 and the dataset has been published, where the link can be found in the paper

    Journal ref: IEEE International Conference on Robotics & Automation, 2024

  34. arXiv:2406.18579  [pdf, other]

    cs.CV cs.IR

    Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching

    Authors: Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Jie Wang, Joemon M. Jose

    Abstract: Image-text matching (ITM) is a fundamental problem in computer vision. The key issue lies in jointly learning the visual and textual representation to estimate their similarity accurately. Most existing methods focus on feature enhancement within modality or feature interaction across modalities, which, however, neglects the contextual information of the object representation based on the inter-ob…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 22 pages, 5 figures, 6 tables, the extension of CMSEI in WACV23, and submitted to ACM TIST. arXiv admin note: text overlap with arXiv:2210.08908

  35. arXiv:2406.17274  [pdf, other]

    cs.CL cs.LG

    Can We Trust the Performance Evaluation of Uncertainty Estimation Methods in Text Summarization?

    Authors: Jianfeng He, Runing Yang, Linlin Yu, Changbin Li, Ruoxi Jia, Feng Chen, Ming Jin, Chang-Tien Lu

    Abstract: Text summarization, a key natural language generation (NLG) task, is vital in various domains. However, the high cost of inaccurate summaries in risk-critical applications, particularly those involving human-in-the-loop decision-making, raises concerns about the reliability of uncertainty estimation on text summarization (UE-TS) evaluation methods. This concern stems from the dependency of uncerta…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 63 pages, 41 figures, 11 tables

  36. arXiv:2406.17109  [pdf, other]

    cs.CV

    GMT: Guided Mask Transformer for Leaf Instance Segmentation

    Authors: Feng Chen, Sotirios A. Tsaftaris, Mario Valerio Giuffrida

    Abstract: Leaf instance segmentation is a challenging multi-instance segmentation task, aiming to separate and delineate each leaf in an image of a plant. The delineation of each leaf is a necessary prerequisite task for several biology-related applications such as the fine-grained monitoring of plant growth, and crop yield estimation. The task is challenging because self-similarity of instances is high (si…

    Submitted 24 June, 2024; originally announced June 2024.

  37. arXiv:2406.16020  [pdf, other]

    cs.SD cs.CL eess.AS

    AudioBench: A Universal Benchmark for Audio Large Language Models

    Authors: Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen

    Abstract: We introduce AudioBench, a universal benchmark designed to evaluate Audio Large Language Models (AudioLLMs). It encompasses 8 distinct tasks and 26 datasets, among which, 7 are newly proposed datasets. The evaluation targets three main aspects: speech understanding, audio scene understanding, and voice understanding (paralinguistic). Despite recent advancements, there lacks a comprehensive benchma…

    Submitted 2 September, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: v3 - Abundant update on models and evaluation details; Code: https://github.com/AudioLLMs/AudioBench

  38. arXiv:2406.12219  [pdf, other]

    cs.CV

    PCIE_EgoHandPose Solution for EgoExo4D Hand Pose Challenge

    Authors: Feng Chen, Ling Ding, Kanokphan Lertniphonphan, Jian Li, Kaer Huang, Zhepeng Wang

    Abstract: This report presents our team's 'PCIE_EgoHandPose' solution for the EgoExo4D Hand Pose Challenge at CVPR2024. The main goal of the challenge is to accurately estimate hand poses, which involve 21 3D joints, using an RGB egocentric video image provided for the task. This task is particularly challenging due to the subtle movements and occlusions. To handle the complexity of the task, we propose the…

    Submitted 17 June, 2024; originally announced June 2024.

  39. arXiv:2406.12211  [pdf, other]

    cs.CV

    PCIE_LAM Solution for Ego4D Looking At Me Challenge

    Authors: Kanokphan Lertniphonphan, Jun Xie, Yaqing Meng, Shijing Wang, Feng Chen, Zhepeng Wang

    Abstract: This report presents our team's 'PCIE_LAM' solution for the Ego4D Looking At Me Challenge at CVPR2024. The main goal of the challenge is to accurately determine if a person in the scene is looking at the camera wearer, based on a video where the faces of social partners have been localized. Our proposed solution, InternLSTM, consists of an InternVL image encoder and a Bi-LSTM network. The InternVL…

    Submitted 17 June, 2024; originally announced June 2024.

  40. arXiv:2406.11434  [pdf, other]

    cs.DB

    DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models

    Authors: Fan Zhou, Siqiao Xue, Danrui Qi, Wenhui Shi, Wang Zhao, Ganglin Wei, Hongyang Zhang, Caigai Jiang, Gangwei Jiang, Zhixuan Chu, Faqiang Chen

    Abstract: Large language models (LLMs) have become the dominant paradigm for the challenging task of text-to-SQL. LLM-empowered text-to-SQL methods are typically categorized into prompting-based and tuning approaches. Compared to prompting-based methods, benchmarking fine-tuned LLMs for text-to-SQL is important yet under-explored, partially attributed to the prohibitively high computational cost. In this paper,…

    Submitted 17 June, 2024; originally announced June 2024.

  41. arXiv:2406.10484  [pdf, other]

    cs.CV

    Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

    Authors: Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen

    Abstract: The emerging video LMMs (Large Multimodal Models) have achieved significant improvements on generic video understanding in the form of VQA (Visual Question Answering), where the raw videos are captured by cameras. However, a large portion of videos in real-world applications are edited videos, e.g., users usually cut and add effects/modifications to the raw video before publishing it on s…

    Submitted 14 June, 2024; originally announced June 2024.

  42. arXiv:2406.07920  [pdf, ps, other]

    cs.LG cs.AI cs.CC math.ST stat.ML

    Near-Optimal Learning and Planning in Separated Latent MDPs

    Authors: Fan Chen, Constantinos Daskalakis, Noah Golowich, Alexander Rakhlin

    Abstract: We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs). In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs. To sidestep known impossibility results, we consider several notions of separation of the constituent MDPs. The main thrust of this paper is in establishing a nearly-sharp *statist…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: COLT 2024

  43. arXiv:2406.07294  [pdf, other]

    cs.RO cs.CV

    OTO Planner: An Efficient Only Travelling Once Exploration Planner for Complex and Unknown Environments

    Authors: Bo Zhou, Chuanzhao Lu, Yan Pan, Fu Chen

    Abstract: Autonomous exploration in complex and cluttered environments is essential for various applications. However, there are many challenges due to the lack of global heuristic information. Existing exploration methods suffer from repeated paths and considerable computational resource requirements in large-scale environments. To address the above issues, this letter proposes an efficient exploration…

    Submitted 11 June, 2024; originally announced June 2024.

  44. arXiv:2406.06158  [pdf, other]

    cs.LG cs.AI stat.ML

    Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

    Authors: Daniel Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew Saxe, Surya Ganguli

    Abstract: While the impressive performance of modern neural networks is often attributed to their capacity to efficiently extract task-relevant features from data, the mechanisms underlying this rich feature learning regime remain elusive, with much of our theoretical understanding stemming from the opposing lazy regime. In this work, we derive exact solutions to a minimal model that transitions between laz…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 40 pages, 12 figures

  45. arXiv:2406.03744  [pdf, other]

    cs.CV cs.LG

    ReDistill: Residual Encoded Distillation for Peak Memory Reduction

    Authors: Fang Chen, Gourav Datta, Mujahid Al Rafi, Hyeran Jeon, Meng Tang

    Abstract: The expansion of neural network sizes and the enhancement of image resolution through modern camera sensors result in heightened memory and power demands for neural networks. Reducing peak memory, which is the maximum memory consumed during the execution of a neural network, is critical to deploy neural networks on edge devices with limited memory budget. A naive approach to reducing peak memory i…

    Submitted 6 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  46. arXiv:2406.03345  [pdf, other]

    cs.LG cs.AI

    Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize

    Authors: Tianren Zhang, Chujie Zhao, Guanyu Chen, Yizhou Jiang, Feng Chen

    Abstract: Learning representations that generalize under distribution shifts is critical for building robust machine learning models. However, despite significant efforts in recent years, algorithmic advances in this direction have been limited. In this work, we seek to understand the fundamental difficulty of out-of-distribution generalization with deep neural networks. We first empirically show that perha…

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  47. arXiv:2406.02963  [pdf, other]

    cs.SD eess.AS

    Dataset-Distillation Generative Model for Speech Emotion Recognition

    Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng

    Abstract: Deep learning models for speech rely on large datasets, presenting computational challenges. Yet, performance hinges on training data size. Dataset Distillation (DD) aims to learn a smaller dataset without much performance degradation when training with it. DD has been investigated in computer vision but not yet in speech. This paper presents the first approach for DD to speech targeting Speech Em…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  48. arXiv:2405.20986  [pdf, other]

    cs.LG cs.CV

    Uncertainty Quantification for Bird's Eye View Semantic Segmentation: Methods and Benchmarks

    Authors: Linlin Yu, Bowen Yang, Tianhao Wang, Kangshuo Li, Feng Chen

    Abstract: The fusion of raw features from multiple sensors on an autonomous vehicle to create a Bird's Eye View (BEV) representation is crucial for planning and control systems. There is growing interest in using deep learning models for BEV semantic segmentation. Anticipating segmentation errors and improving the explainability of DNNs is essential for autonomous driving, yet it is under-studied. This pape…

    Submitted 31 May, 2024; originally announced May 2024.

  49. arXiv:2405.20562  [pdf, other]

    cs.LG cs.AI

    Can Machine Learning Assist in Diagnosis of Primary Immune Thrombocytopenia? A feasibility study

    Authors: Haroon Miah, Dimitrios Kollias, Giacinto Luca Pedone, Drew Provan, Frederick Chen

    Abstract: Primary Immune thrombocytopenia (ITP) is a rare autoimmune disease characterised by immune-mediated destruction of peripheral blood platelets in patients leading to low platelet counts and bleeding. The diagnosis and effective management of ITP is challenging because there is no established test to confirm the disease and no biomarker with which one can predict the response to treatment and outcom…

    Submitted 30 May, 2024; originally announced May 2024.

  50. arXiv:2405.19854  [pdf, other]

    cs.CV

    RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection

    Authors: Fangyi Chen, Han Zhang, Zhantao Yang, Hao Chen, Kai Hu, Marios Savvides

    Abstract: Open-vocabulary object detection (OVD) requires solid modeling of the region-semantic relationship, which could be learned from massive region-text pairs. However, such data is limited in practice due to significant annotation costs. In this work, we propose RTGen to generate scalable open-vocabulary region-text pairs and demonstrate its capability to boost the performance of open-vocabulary objec…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Technical report