Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 850 results for author: Ma, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.19174  [pdf, other

    cs.CV

    Reducing Spurious Correlation for Federated Domain Generalization

    Authors: Shuran Ma, Weiying Xie, Daixun Li, Haowei Li, Yunsong Li

    Abstract: The rapid development of multimedia has provided a large amount of data with different distributions for visual tasks, forming different domains. Federated Learning (FL) can efficiently use this diverse data distributed on different client media in a decentralized manner through model sharing. However, in open-world scenarios, there is a challenge: global models may struggle to predict well on ent… ▽ More

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 10 pages, 4 figures

  2. arXiv:2407.18961  [pdf, other

    cs.AI

    MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

    Authors: Guoli Yin, Haoping Bai, Shuang Ma, Feng Nan, Yanchao Sun, Zhaoyang Xu, Shen Ma, Jiarui Lu, Xiang Kong, Aonan Zhang, Dian Ang Yap, Yizhe zhang, Karsten Ahnert, Vik Kamath, Mathias Berglund, Dominic Walsh, Tobias Gindele, Juergen Wiest, Zhengfeng Lai, Xiaoming Wang, Jiulong Shan, Meng Cao, Ruoming Pang, Zirui Wang

    Abstract: Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion but failing to dissect the underlying skills that drive these outcomes. This lack of granularity makes it difficult to deeply discern… ▽ More

    Submitted 30 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2407.15498  [pdf, other

    cs.CL

    Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction

    Authors: Dingyao Yu, Yang An, Wei Ye, Xiongfeng Xiao, Shaoguang Mao, Tao Ge, Shikun Zhang

    Abstract: Chinese Spelling Correction (CSC) commonly lacks large-scale high-quality corpora, due to the labor-intensive labeling of spelling errors in real-life human writing or typing scenarios. Two data augmentation methods are widely adopted: (1) \textit{Random Replacement} with the guidance of confusion sets and (2) \textit{OCR/ASR-based Generation} that simulates character misusing. However, both metho… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  4. arXiv:2407.14814  [pdf, other

    cs.LG

    FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting

    Authors: Shusen Ma, Yu Kang, Peng Bai, Yun-Bo Zhao

    Abstract: In multivariate time-series forecasting (MTSF), extracting the temporal correlations of the input sequences is crucial. While popular Transformer-based predictive models can perform well, their quadratic computational complexity results in inefficiency and high overhead. The recently emerged Mamba, a selective state space model, has shown promising results in many fields due to its strong temporal… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

  5. arXiv:2407.14192  [pdf, other

    cs.CL cs.AI

    LeKUBE: A Legal Knowledge Update BEnchmark

    Authors: Changyue Wang, Weihang Su, Hu Yiran, Qingyao Ai, Yueyue Wu, Cheng Luo, Yiqun Liu, Min Zhang, Shaoping Ma

    Abstract: Recent advances in Large Language Models (LLMs) have significantly shaped the applications of AI in multiple fields, including the studies of legal intelligence. Trained on extensive legal texts, including statutes and legal documents, the legal LLMs can capture important legal knowledge/concepts effectively and provide important support for downstream legal applications such as legal consultancy.… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  6. arXiv:2407.14009  [pdf, other

    cs.CV

    Scale Disparity of Instances in Interactive Point Cloud Segmentation

    Authors: Chenrui Han, Xuan Yu, Yuxuan Xie, Yili Liu, Sitong Mao, Shunbo Zhou, Rong Xiong, Yue Wang

    Abstract: Interactive point cloud segmentation has become a pivotal task for understanding 3D scenes, enabling users to guide segmentation models with simple interactions such as clicks, therefore significantly reducing the effort required to tailor models to diverse scenarios and new categories. However, in the realm of interactive segmentation, the meaning of instance diverges from that in instance segmen… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems

  7. arXiv:2407.13228  [pdf

    cs.CL cs.CY cs.ET cs.LG

    Evaluating Large Language Models for Anxiety and Depression Classification using Counseling and Psychotherapy Transcripts

    Authors: Junwei Sun, Siqi Ma, Yiran Fan, Peter Washington

    Abstract: We aim to evaluate the efficacy of traditional machine learning and large language models (LLMs) in classifying anxiety and depression from long conversational transcripts. We fine-tune both established transformer models (BERT, RoBERTa, Longformer) and more recent large models (Mistral-7B), trained a Support Vector Machine with feature engineering, and assessed GPT models through prompting. We ob… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  8. arXiv:2407.12070  [pdf, other

    cs.LG cs.AI

    Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment

    Authors: Yuhao Ji, Chao Fang, Shaobo Ma, Haikuo Shao, Zhongfeng Wang

    Abstract: Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices. While binarized Transformers offer a promising solution by significantly reducing model size, existing approaches suffer from algorithm-hardware mismatches with limited co-design exploration, leading to suboptimal performance on edge devices… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: This paper is accepted by ICCAD 2024

  9. arXiv:2407.11449  [pdf, other

    cs.CV cs.AI

    Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights

    Authors: Shunqi Mao, Chaoyi Zhang, Hang Su, Hwanjun Song, Igor Shalyminov, Weidong Cai

    Abstract: Contextualized Image Captioning (CIC) evolves traditional image captioning into a more complex domain, necessitating the ability for multimodal reasoning. It aims to generate image captions given specific contextual information. This paper further introduces a novel domain of Controllable Contextualized Image Captioning (Ctrl-CIC). Unlike CIC, which solely relies on broad context, Ctrl-CIC accentu… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  10. arXiv:2407.11372  [pdf, other

    cs.CR cs.CV

    UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening

    Authors: Siyuan Cheng, Guangyu Shen, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Hanxi Guo, Shiqing Ma, Xiangyu Zhang

    Abstract: Deep neural networks (DNNs) have demonstrated effectiveness in various fields. However, DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called trigger, into the input to cause misclassification to an attack-chosen target label. While existing works have proposed various methods to mitigate backdoor effects in poisoned models, they tend to be less effective against recent ad… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: The 18th European Conference on Computer Vision ECCV 2024

  11. arXiv:2407.11282  [pdf, other

    cs.CL

    Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models

    Authors: Qingcheng Zeng, Mingyu Jin, Qinkai Yu, Zhenting Wang, Wenyue Hua, Zihao Zhou, Guangyan Sun, Yanda Meng, Shiqing Ma, Qifan Wang, Felix Juefei-Xu, Kaize Ding, Fan Yang, Ruixiang Tang, Yongfeng Zhang

    Abstract: Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial. One commonly used method to assess the reliability of LLMs' responses is uncertainty estimation, which gauges the likelihood of their answers being correct. While many studies focus on improving the accuracy of uncertainty estimations for LLMs, our research investigates… ▽ More

    Submitted 19 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  12. arXiv:2407.10969  [pdf, other

    cs.CL cs.LG

    Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

    Authors: Hongyu Wang, Shuming Ma, Ruiping Wang, Furu Wei

    Abstract: We introduce, Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs which can bring significant efficiency gains in inference. This is achieved by applying top-K sparsification to the activations and the straight-through-estimator to the training. We also introduce Block Q-Sparse for batch traini… ▽ More

    Submitted 24 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Work in progress

  13. arXiv:2407.10805  [pdf, other

    cs.CL cs.AI

    Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval

    Authors: Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Jian Guo

    Abstract: Retrieval-augmented generation (RAG) has significantly advanced large language models (LLMs) by enabling dynamic information retrieval to mitigate knowledge gaps and hallucinations in generated content. However, these systems often falter with complex reasoning and consistency across diverse queries. In this work, we present Think-on-Graph 2.0, an enhanced RAG framework that aligns questions with… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  14. arXiv:2407.10131  [pdf, other

    cs.CV

    WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models

    Authors: Xinjian Wu, Ruisong Zhang, Jie Qin, Shijie Ma, Cheng-Lin Liu

    Abstract: Segmenting and recognizing diverse object parts is crucial in computer vision and robotics. Despite significant progress in object segmentation, part-level segmentation remains underexplored due to complex boundaries and scarce annotated data. To address this, we propose a novel Weakly-supervised Part Segmentation (WPS) setting and an approach called WPS-SAM, built on the large-scale pre-trained v… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  15. arXiv:2407.08919  [pdf, other

    cs.NI cs.ET eess.SP

    Redefinition of Digital Twin and its Situation Awareness Framework Designing Towards Fourth Paradigm for Energy Internet of Things

    Authors: Xing He, Yuezhong Tang, Shuyan Ma, Qian Ai, Fei Tao, Robert Qiu

    Abstract: Traditional knowledge-based situation awareness (SA) modes struggle to adapt to the escalating complexity of today's Energy Internet of Things (EIoT), necessitating a pivotal paradigm shift. In response, this work introduces a pioneering data-driven SA framework, termed digital twin-based situation awareness (DT-SA), aiming to bridge existing gaps between data and demands, and further to enhance S… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 16 pages, 15 figures Accepted by IEEE Transactions on Systems, Man and Cybernetics: Systems

  16. arXiv:2407.07587  [pdf, other

    cs.CV

    Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction

    Authors: Yili Liu, Linzhan Mou, Xuan Yu, Chenrui Han, Sitong Mao, Rong Xiong, Yue Wang

    Abstract: Accurate perception of the dynamic environment is a fundamental task for autonomous driving and robot systems. This paper introduces Let Occ Flow, the first self-supervised work for joint 3D occupancy and occupancy flow prediction using only camera inputs, eliminating the need for 3D annotations. Utilizing TPV for unified scene representation and deformable attention layers for feature aggregation… ▽ More

    Submitted 18 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  17. arXiv:2407.06159  [pdf, other

    cs.CV

    A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion

    Authors: Xiaoli Zhang, Liying Wang, Libo Zhao, Xiongfei Li, Siwei Ma

    Abstract: Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images. To tackle the problem of insufficient feature extraction and lack of semantic awareness for complex scenes, this paper focuses on how to model correlation-driven decomposing features and reason high-level graph representation by efficiently extracting complementary features and mult… ▽ More

    Submitted 11 June, 2024; originally announced July 2024.

  18. arXiv:2407.06112  [pdf, other

    cs.CL

    Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning

    Authors: Yadong Zhang, Shaoguang Mao, Wenshan Wu, Yan Xia, Tao Ge, Man Lan, Furu Wei

    Abstract: This paper introduces BI-Directional DEliberation Reasoning (BIDDER), a novel reasoning approach to enhance the decision rationality of language models. Traditional reasoning methods typically rely on historical information and employ uni-directional (left-to-right) reasoning strategy. This lack of bi-directional deliberation reasoning results in limited awareness of potential future outcomes and… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  19. arXiv:2407.04675  [pdf, other

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor… ▽ More

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  20. arXiv:2407.02805  [pdf, other

    cs.SE cs.AI

    Efficient DNN-Powered Software with Fair Sparse Models

    Authors: Xuanqi Gao, Weipeng Jiang, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen

    Abstract: With the emergence of the Software 3.0 era, there is a growing trend of compressing and integrating large models into software systems, with significant societal implications. Regrettably, in numerous instances, model compression techniques impact the fairness performance of these models and thus the ethical behavior of DNN-powered software. One of the most notable example is the Lottery Ticket Hy… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  21. arXiv:2407.01896  [pdf, other

    cs.CL cs.IR

    LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis

    Authors: Tianyu Cui, Shiyu Ma, Ziang Chen, Tong Xiao, Shimin Tao, Yilun Liu, Shenglin Zhang, Duoming Lin, Changchang Liu, Yuzhe Cai, Weibin Meng, Yongqian Sun, Dan Pei

    Abstract: Log analysis is crucial for ensuring the orderly and stable operation of information systems, particularly in the field of Artificial Intelligence for IT Operations (AIOps). Large Language Models (LLMs) have demonstrated significant potential in natural language processing tasks. In the AIOps domain, they excel in tasks such as anomaly detection, root cause analysis of faults, operations and maint… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  22. arXiv:2407.01537  [pdf, other

    cs.RO cs.CV

    WaveShot: A Compact Portable Unmanned Surface Vessel for Dynamic Water Surface Videography and Media Production

    Authors: Shijian Ma, Shicong Ma, Weize Ma

    Abstract: This paper presents WaveShot, an innovative portable unmanned surface vessel that aims to transform water surface videography by offering a highly maneuverable, cost-effective, and safe alternative to traditional filming methods. WaveShot is specially designed for the modern demands of film production, advertising, documentaries, and visual arts, equipped with professional-grade waterproof cameras… ▽ More

    Submitted 12 March, 2024; originally announced July 2024.

  23. arXiv:2407.01349  [pdf, other

    cs.CV cs.RO

    PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction

    Authors: Xuan Yu, Yili Liu, Chenrui Han, Sitong Mao, Shunbo Zhou, Rong Xiong, Yiyi Liao, Yue Wang

    Abstract: Panoptic reconstruction is a challenging task in 3D scene understanding. However, most existing methods heavily rely on pre-trained semantic segmentation models and known 3D object bounding boxes for 3D panoptic segmentation, which is not available for in-the-wild scenes. In this paper, we propose a novel zero-shot panoptic reconstruction method from RGB-D images of scenes. For zero-shot segmentat… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  24. arXiv:2407.00466  [pdf, other

    cs.CL cs.AI

    BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

    Authors: Xinna Lin, Siqi Ma, Junjie Shan, Xiaojing Zhang, Shell Xu Hu, Tiannan Guo, Stan Z. Li, Kaicheng Yu

    Abstract: Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people either rely on direct Question-Answering (QA) to the LLM itself, or in a biomedical experimental manner. How to precisely benchmark biomedical agents from an… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  25. arXiv:2407.00247  [pdf, other

    cs.CV

    Prompt Refinement with Image Pivot for Text-to-Image Generation

    Authors: Jingtao Zhan, Qingyao Ai, Yiqun Liu, Yingwei Pan, Ting Yao, Jiaxin Mao, Shaoping Ma, Tao Mei

    Abstract: For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from "user languages" into "system languages". However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement mod… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024

  26. arXiv:2406.19581  [pdf, ps, other

    cs.HC cs.LG

    HarmonICA: Neural non-stationarity correction and source separation for motor neuron interfaces

    Authors: Alexander Kenneth Clarke, Agnese Grison, Irene Mendez Guerra, Pranav Mamidanna, Shihan Ma, Silvia Muceli, Dario Farina

    Abstract: A major outstanding problem when interfacing with spinal motor neurons is how to accurately compensate for non-stationary effects in the signal during source separation routines, particularly when they cannot be estimated in advance. This forces current systems to instead use undifferentiated bulk signal, which limits the potential degrees of freedom for control. In this study we propose a potenti… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  27. arXiv:2406.14367  [pdf, other

    cs.CV cs.AI

    PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions

    Authors: Sihan Ma, Jing Zhang, Qiong Cao, Dacheng Tao

    Abstract: Pose estimation aims to accurately identify anatomical keypoints in humans and animals using monocular images, which is crucial for various applications such as human-machine interaction, embodied AI, and autonomous driving. While current models show promising results, they are typically trained and tested on clean data, potentially overlooking the corruption during real-world deployment and thus… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Technical report. Project page: https://xymsh.github.io/PoseBench/

  28. arXiv:2406.13117  [pdf, other

    cs.AI

    State-of-the-Art Review: The Use of Digital Twins to Support Artificial Intelligence-Guided Predictive Maintenance

    Authors: Sizhe Ma, Katherine A. Flanigan, Mario Bergés

    Abstract: In recent years, predictive maintenance (PMx) has gained prominence for its potential to enhance efficiency, automation, accuracy, and cost-effectiveness while reducing human involvement. Importantly, PMx has evolved in tandem with digital advancements, such as Big Data and the Internet of Things (IOT). These technological strides have enabled Artificial Intelligence (AI) to revolutionize PMx proc… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to Springer for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  29. arXiv:2406.12196  [pdf, other

    cs.SE

    CITADEL: Context Similarity Based Deep Learning Framework Bug Finding

    Authors: Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Shiwei Wang, Chao Shen

    Abstract: With deep learning (DL) technology becoming an integral part of the new intelligent software, tools of DL framework testing and bug-finding are in high demand. Existing DL framework testing tools have limited coverage on bug types. For example, they lack the capability of finding performance bugs, which are critical for DL model training and inference regarding performance, economics, and the envi… ▽ More

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 12 pages, 10 figures

  30. arXiv:2406.11931  [pdf, other

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen , et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  31. arXiv:2406.11698  [pdf, other

    cs.CL

    Meta Reasoning for Large Language Models

    Authors: Peizhong Gao, Ao Xie, Shaoguang Mao, Wenshan Wu, Yan Xia, Haipeng Mi, Furu Wei

    Abstract: We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs) inspired by human meta-reasoning. Traditional in-context learning-based reasoning techniques, such as Tree-of-Thoughts, show promise but lack consistent state-of-the-art performance across diverse tasks due to their specialized nature. MRP addresses this limitation by guiding… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  32. arXiv:2406.11633  [pdf, other

    cs.CV

    DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

    Authors: Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao

    Abstract: Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extract… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Homepage of DocGenome: https://unimodal4reasoning.github.io/DocGenome_page 22 pages, 11 figures

  33. arXiv:2406.09627  [pdf, other

    cs.CV cs.AI eess.IV

    RobustSAM: Segment Anything Robustly on Degraded Images

    Authors: Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhuo Ma, Jian Wang

    Abstract: Segment Anything Model (SAM) has emerged as a transformative approach in image segmentation, acclaimed for its robust zero-shot segmentation capabilities and flexible prompting system. Nonetheless, its performance is challenged by images with degraded quality. Addressing this limitation, we propose the Robust Segment Anything Model (RobustSAM), which enhances SAM's performance on low-quality image… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR2024 (Highlight); Project Page: https://robustsam.github.io/

  34. arXiv:2406.09622  [pdf, other

    cs.CV cs.AI eess.IV

    DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer

    Authors: Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao, Sy-Yen Kuo, Sizhuo Ma, Jian Wang

    Abstract: Generic Face Image Quality Assessment (GFIQA) evaluates the perceptual quality of facial images, which is crucial in improving image restoration algorithms and selecting high-quality face images for downstream tasks. We present a novel transformer-based method for GFIQA, which is aided by two unique mechanisms. First, a Dual-Set Degradation Representation Learning (DSL) mechanism uses facial image… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024, Project Page: https://dsl-fiqa.github.io/

  35. arXiv:2406.09389  [pdf, other

    eess.IV cs.CV

    Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

    Authors: Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen

    Abstract: Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color mapping, which enhances the visual representation by expanding the image's color range and adjusting the brightness… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: https://sagiri0208.github.io

  36. arXiv:2406.08851  [pdf, other

    cs.LG

    Inverse Probability of Treatment Weighting with Deep Sequence Models Enables Accurate treatment effect Estimation from Electronic Health Records

    Authors: Junghwan Lee, Simin Ma, Nicoleta Serban, Shihao Yang

    Abstract: Observational data have been actively used to estimate treatment effect, driven by the growing availability of electronic health records (EHRs). However, EHRs typically consist of longitudinal records, often introducing time-dependent confoundings that hinder the unbiased estimation of treatment effect. Inverse probability of treatment weighting (IPTW) is a widely used propensity score method sinc… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  37. arXiv:2406.07411  [pdf, other

    cs.SE cs.CL

    VersiCode: Towards Version-controllable Code Generation

    Authors: Tongtong Wu, Weigang Wu, Xingyu Wang, Kang Xu, Suyu Ma, Bo Jiang, Ping Yang, Zhenchang Xing, Yuan-Fang Li, Gholamreza Haffari

    Abstract: Significant research has focused on improving the performance of large language model on code-related tasks due to their practical importance. Although performance is typically evaluated using public benchmark datasets, the existing datasets do not account for the concept of \emph{version}, which is crucial in professional software development. In this paper, we introduce VersiCode, the first comp… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  38. arXiv:2406.05927  [pdf, other

    cs.CV cs.CR cs.LG

    MeanSparse: Post-Training Robustness Enhancement Through Mean-Centered Feature Sparsification

    Authors: Sajjad Amini, Mohammadreza Teymoorianfard, Shiqing Ma, Amir Houmansadr

    Abstract: We present a simple yet effective method to improve the robustness of Convolutional Neural Networks (CNNs) against adversarial examples by post-processing an adversarially trained model. Our technique, MeanSparse, cascades the activation functions of a trained model with novel operators that sparsify mean-centered feature vectors. This is equivalent to reducing feature variations around the mean,… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  39. arXiv:2406.05688  [pdf, other

    cs.CL cs.AI cs.LG

    Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

    Authors: Cheng Tan, Dongxin Lyu, Siyuan Li, Zhangyang Gao, Jingxuan Wei, Siqi Ma, Zicheng Liu, Stan Z. Li

    Abstract: Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fail to capture the dynamic and iterative nature of real-world peer reviews. In this paper, we reformulate the peer-r… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Under review

  40. arXiv:2406.01593  [pdf, other

    cs.CV

    Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting

    Authors: Shaojie Ma, Yawei Luo, Yi Yang

    Abstract: 3D reconstruction and simulation, while interrelated, have distinct objectives: reconstruction demands a flexible 3D representation adaptable to diverse scenes, whereas simulation requires a structured representation to model motion principles effectively. This paper introduces the Mesh-adsorbed Gaussian Splatting (MaGS) method to resolve such a dilemma. MaGS constrains 3D Gaussians to hover on th… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project Page: see https://wcwac.github.io/MaGS-page/

  41. arXiv:2406.00699  [pdf, other

    cs.CV

    Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation

    Authors: Yuan Xiao, Shiqing Ma, Juan Zhai, Chunrong Fang, Jinyuan Jia, Zhenyu Chen

    Abstract: The robustness of convolutional neural networks (CNNs) is vital to modern AI-driven systems. It can be quantified by formal verification by providing a certified lower bound, within which any perturbation does not alter the original input's classification result. It is challenging due to nonlinear components, such as MaxPool. At present, many verification methods are sound but risk losing some pre… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR2024. Project page: https://github.com/xiaoyuanpigo/maxlin

  42. arXiv:2406.00602  [pdf, other

    cs.SE cs.PL

    From Effectiveness to Efficiency: Comparative Evaluation of Code Generated by LCGMs for Bilingual Programming Questions

    Authors: Weipeng Jiang, Xuanqi Gao, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen

    Abstract: Large Code Generation Models (LCGMs) have garnered significant attention and achieved promising results across various programming tasks. However, concerns arise regarding performance when using non-English prompts, as these models are primarily trained on English-centric corpora, and most programming language tokens resemble English. Existing benchmarks often rely on English programming questions… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 10 and a quarter pages, 6 figures

  43. arXiv:2405.20773  [pdf, other

    cs.CR cs.AI

    Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character

    Authors: Siyuan Ma, Weidi Luo, Yu Wang, Xiaogeng Liu

    Abstract: With the advent and widespread deployment of Multimodal Large Language Models (MLLMs), ensuring their safety has become increasingly critical. To achieve this objective, it requires us to proactively discover the vulnerability of MLLMs by exploring the attack methods. Thus, structure-based jailbreak attacks, where harmful semantic content is embedded within images, have been proposed to mislead th… ▽ More

    Submitted 12 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  44. arXiv:2405.20568  [pdf, other

    cs.LG cs.NI

    Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases

    Authors: Geng Sun, Wenwen Xie, Dusit Niyato, Fang Mei, Jiawen Kang, Hongyang Du, Shiwen Mao

    Abstract: As a form of artificial intelligence (AI) technology based on interactive learning, deep reinforcement learning (DRL) has been widely applied across various fields and has achieved remarkable accomplishments. However, DRL faces certain limitations, including low sample efficiency and poor generalization. Therefore, we present how to leverage generative AI (GAI) to address these issues above and en… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  45. Correctable Landmark Discovery via Large Models for Vision-Language Navigation

    Authors: Bingqian Lin, Yunshuang Nie, Ziming Wei, Yi Zhu, Hang Xu, Shikui Ma, Jianzhuang Liu, Xiaodan Liang

    Abstract: Vision-Language Navigation (VLN) requires the agent to follow language instructions to reach a target position. A key factor for successful navigation is to align the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment especially in unexplored scenes, since they learn from limited navigation data and lack s… ▽ More

    Submitted 5 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by TPAMI 2024

  46. arXiv:2405.18240  [pdf, other

    cs.CV

    MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution

    Authors: Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu

    Abstract: Although Vision Transformers (ViTs) have recently advanced computer vision tasks significantly, an important real-world problem was overlooked: adapting to variable input resolutions. Typically, images are resized to a fixed resolution, such as 224x224, for efficiency during training and inference. However, uniform input size conflicts with real-world scenarios where images naturally vary in resol… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  47. arXiv:2405.18058  [pdf, other

    cs.IR

    ReChorus2.0: A Modular and Task-Flexible Recommendation Library

    Authors: Jiayu Li, Hanyu Li, Zhiyu He, Weizhi Ma, Peijie Sun, Min Zhang, Shaoping Ma

    Abstract: With the applications of recommendation systems rapidly expanding, an increasing number of studies have focused on every aspect of recommender systems with different data inputs, models, and task settings. Therefore, a flexible library is needed to help researchers implement the experimental strategies they require. Existing open libraries for recommendation scenarios have enabled reproducing vari… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures. Under review

  48. arXiv:2405.14866  [pdf, other

    cs.CV

    Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras

    Authors: Hanzhang Tu, Ruizhi Shao, Xue Dong, Shunyuan Zheng, Hao Zhang, Lili Chen, Meili Wang, Wenyu Li, Siyan Ma, Shengping Zhang, Boyao Zhou, Yebin Liu

    Abstract: In this paper, we present a low-budget and high-authenticity bidirectional telepresence system, Tele-Aloha, targeting peer-to-peer communication scenarios. Compared to previous systems, Tele-Aloha utilizes only four sparse RGB cameras, one consumer-grade GPU, and one autostereoscopic screen to achieve high-resolution (2048x2048), real-time (30 fps), low-latency (less than 150ms) and robust distant… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Paper accepted by SIGGRAPH 2024. Project page: http://118.178.32.38/c/Tele-Aloha/

  49. arXiv:2405.14672  [pdf, other

    cs.CV

    Towards Imperceptible Backdoor Attack in Self-supervised Learning

    Authors: Hanrong Zhang, Zhenting Wang, Tingxu Han, Mingyu Jin, Chenlu Zhan, Mengnan Du, Hongwei Wang, Shiqing Ma

    Abstract: Self-supervised learning models are vulnerable to backdoor attacks. Existing backdoor attacks that are effective in self-supervised learning often involve noticeable triggers, like colored patches, which are vulnerable to human inspection. In this paper, we propose an imperceptible and effective backdoor attack against self-supervised models. We first find that existing imperceptible triggers desi… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  50. arXiv:2405.14431  [pdf, other

    cs.CL cs.AI cs.IR

    RaFe: Ranking Feedback Improves Query Rewriting for RAG

    Authors: Shengyu Mao, Yong Jiang, Boli Chen, Xiao Li, Peng Wang, Xinyu Wang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang

    Abstract: As Large Language Models (LLMs) and Retrieval Augmentation Generation (RAG) techniques have evolved, query rewriting has been widely incorporated into the RAG system for downstream tasks like open-domain QA. Many works have attempted to utilize small models with reinforcement learning rather than costly LLMs to improve query rewriting. However, current methods require annotations (e.g., labeled re… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 16 pages