
Showing 1–50 of 209 results for author: Peng, W

Searching in archive cs.
  1. arXiv:2406.18129  [pdf, other]

    cs.CV cs.LG

    CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection

    Authors: Meiying Zhang, Weiyuan Peng, Guangyao Ding, Chenyang Lei, Chunlin Ji, Qi Hao

    Abstract: Simulation data can be accurately labeled and have been expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real), cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been d…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.18116  [pdf, other]

    cs.CL cs.AI cs.HC

    BADGE: BADminton report Generation and Evaluation with LLM

    Authors: Shang-Hsuan Chiang, Lin-Wei Chao, Kuang-Da Wang, Chih-Chuan Wang, Wen-Chih Peng

    Abstract: Badminton enjoys widespread popularity, and reports on matches generally include details such as player names, game scores, and ball types, providing audiences with a comprehensive view of the games. However, writing these reports can be a time-consuming task. This challenge led us to explore whether a Large Language Model (LLM) could automate the generation and evaluation of badminton reports. We…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI 2024 Workshop: The 2nd International Workshop on Intelligent Technologies for Precision Sports Science (IT4PSS)

  3. arXiv:2406.11483  [pdf]

    cs.CE

    Analysis of water injection heat recovery potential of abandoned oil wells to geothermal wells in northern Shaanxi

    Authors: Yu Huagui, Liu Shi, Pang Yanyan, Wang Peng, Gao Qian

    Abstract: The Chang 2 bottom water reservoir area in the western part of northern Shaanxi is one of the core oil-producing areas in the Ordos Basin. One of the main reservoirs is the Chang 2 reservoir of the Triassic Yanchang Formation, which has good physical conditions, active edge and bottom water, and high geothermal gradient. In this paper, the reservoir numerical simulation software CMG is used to simu…

    Submitted 17 June, 2024; originally announced June 2024.

    Journal ref: Modern Electric Power, 2023, 1-9

  4. arXiv:2406.11176  [pdf, other]

    cs.CL cs.AI

    Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement

    Authors: Weimin Xiong, Yifan Song, Xiutian Zhao, Wenhao Wu, Xun Wang, Ke Wang, Cheng Li, Wei Peng, Sujian Li

    Abstract: Large language model agents have exhibited exceptional performance across a range of complex interactive tasks. Recent approaches have utilized tuning with expert trajectories to enhance agent performance, yet they primarily concentrate on outcome rewards, which may lead to errors or suboptimal actions due to the absence of process supervision signals. In this paper, we introduce the Iterative ste…

    Submitted 16 June, 2024; originally announced June 2024.

  5. arXiv:2406.10744  [pdf, other]

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Workshop - PBDL Challenge Report

  6. arXiv:2406.09265  [pdf, other]

    cs.CL

    Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs

    Authors: Weixuan Wang, Barry Haddow, Wei Peng, Alexandra Birch

    Abstract: Multilingual large language models (LLMs) have greatly increased the ceiling of performance on non-English tasks. However, the mechanisms behind multilingualism in these LLMs are poorly understood. Of particular interest is the degree to which internal representations are shared between languages. Recent work on neuron analysis of LLMs has focused on the monolingual case, and the limited work on th…

    Submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2405.15299  [pdf, other]

    cs.CV

    Transparent Object Depth Completion

    Authors: Yifan Zhou, Wanli Peng, Zhongyu Yang, He Liu, Yi Sun

    Abstract: The perception of transparent objects for grasp and manipulation remains a major challenge, because existing robotic grasp methods which heavily rely on depth maps are not suitable for transparent objects due to their unique visual properties. These properties lead to gaps and inaccuracies in the depth maps of the transparent objects captured by depth sensors. To address this issue, we propose an…

    Submitted 24 May, 2024; originally announced May 2024.

  8. arXiv:2405.10305  [pdf, other]

    cs.CV cs.AI

    4D Panoptic Scene Graph Generation

    Authors: Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu

    Abstract: We are living in a three-dimensional space while moving forward through a fourth dimension: time. To allow artificial intelligence to develop a comprehensive understanding of such a 4D environment, we introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding. Specifically, PSG-4D abstracts r…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted as NeurIPS 2023. Code: https://github.com/Jingkang50/PSG4D Previous Series: PSG https://github.com/Jingkang50/OpenPSG and PVSG https://github.com/Jingkang50/OpenPVSG

  9. arXiv:2405.06964  [pdf, other]

    cs.RO cs.AI

    ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots

    Authors: Zhixuan Xu, Chongkai Gao, Zixuan Liu, Gang Yang, Chenrui Tie, Haozhuo Zheng, Haoyu Zhou, Weikun Peng, Debang Wang, Tianyi Chen, Zhouliang Yu, Lin Shao

    Abstract: To substantially enhance robot intelligence, there is a pressing need to develop a large model that enables general-purpose robots to proficiently undertake a broad spectrum of manipulation tasks, akin to the versatile task-planning ability exhibited by LLMs. The vast diversity in objects, robots, and manipulation tasks presents huge challenges. Our work introduces a comprehensive framework to dev…

    Submitted 11 May, 2024; originally announced May 2024.

  10. arXiv:2404.18527  [pdf]

    cs.LG cs.AI cs.CR stat.AP

    Bridging Data Barriers among Participants: Assessing the Potential of Geoenergy through Federated Learning

    Authors: Weike Peng, Jiaxin Gao, Yuntian Chen, Shengwei Wang

    Abstract: Machine learning algorithms emerge as a promising approach in energy fields, but their practical application is hindered by data barriers, stemming from high collection costs and privacy concerns. This study introduces a novel federated learning (FL) framework based on XGBoost models, enabling safe collaborative modeling with accessible yet concealed data from multiple parties. Hyperparameter tuning of the mode…

    Submitted 29 April, 2024; originally announced April 2024.

  11. arXiv:2404.10413  [pdf, other]

    cs.DB cs.LG cs.PF

    VDTuner: Automated Performance Tuning for Vector Data Management Systems

    Authors: Tiannuo Yang, Wen Hu, Wangqi Peng, Yusen Li, Jianguo Li, Gang Wang, Xiaoguang Liu

    Abstract: Vector data management systems (VDMSs) have become an indispensable cornerstone in large-scale information retrieval and machine learning systems like large language models. To enhance the efficiency and flexibility of similarity search, VDMS exposes many tunable index parameters and system parameters for users to specify. However, due to the inherent characteristics of VDMS, automatic performance…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by ICDE 2024

  12. arXiv:2404.10229  [pdf, other]

    cs.CL

    Generative Text Steganography with Large Language Model

    Authors: Jiaxuan Wu, Zhengxian Wu, Yiming Xue, Juan Wen, Wanli Peng

    Abstract: Recent advances in large language models (LLMs) have blurred the boundary of high-quality text generation between humans and machines, which is favorable for generative text steganography. While, current advanced steganographic mapping is not suitable for LLMs since most users are restricted to accessing only the black-box API or user interface of the LLMs, thereby lacking access to the training v…

    Submitted 15 April, 2024; originally announced April 2024.

  13. arXiv:2404.07200  [pdf, other]

    cs.LG

    Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective

    Authors: Shaoxiang Qin, Fuyuan Lyu, Wenhui Peng, Dingyang Geng, Ju Wang, Naiping Gao, Xue Liu, Liangzhu Leon Wang

    Abstract: In solving partial differential equations (PDEs), Fourier Neural Operators (FNOs) have exhibited notable effectiveness compared to Convolutional Neural Networks (CNNs). This paper presents clear empirical evidence through spectral analysis to elucidate the superiority of FNO over CNNs: FNO is significantly more capable of learning low-frequencies. This empirical evidence also unveils FNO's distinc…

    Submitted 10 April, 2024; originally announced April 2024.

  14. arXiv:2403.12406  [pdf, other]

    cs.AI cs.LG

    Offline Imitation of Badminton Player Behavior via Experiential Contexts and Brownian Motion

    Authors: Kuang-Da Wang, Wei-Yao Wang, Ping-Chun Hsieh, Wen-Chih Peng

    Abstract: In the dynamic and rapid tactic involvements of turn-based sports, badminton stands out as an intrinsic paradigm that requires alter-dependent decision-making of players. While the advancement of learning from offline expert data in sequential decision-making has been witnessed in various domains, how to rally-wise imitate the behaviors of human players from offline badminton matches has remained…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Preprint

  15. arXiv:2403.10281  [pdf, other]

    cs.CL cs.AI cs.LG

    Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-Tuning

    Authors: Shang-Hsuan Chiang, Ming-Chih Lo, Lin-Wei Chao, Wen-Chih Peng

    Abstract: In this paper, we present Pre-CoFactv3, a comprehensive framework comprised of Question Answering and Text Classification components for fact verification. Leveraging In-Context Learning, Fine-tuned Large Language Models (LLMs), and the FakeNet model, we address the challenges of fact verification. Our experiments explore diverse approaches, comparing different Pre-trained LLMs, introducing FakeNe…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI 2024 Workshop: FACTIFY 3.0 - Workshop Series on Multimodal Fact-Checking and Hate Speech Detection

  16. arXiv:2403.04785  [pdf, other]

    cs.CL cs.AI

    Large Language Multimodal Models for 5-Year Chronic Disease Cohort Prediction Using EHR Data

    Authors: Jun-En Ding, Phan Nguyen Minh Thao, Wen-Chih Peng, Jian-Zhe Wang, Chun-Cheng Chug, Min-Chen Hsieh, Yun-Chien Tseng, Ling Chen, Dongsheng Luo, Chi-Te Wang, Pei-fu Chen, Feng Liu, Fang-Ming Hung

    Abstract: Chronic diseases such as diabetes are the leading causes of morbidity and mortality worldwide. Numerous research studies have been attempted with various deep learning models in diagnosis. However, most previous studies had certain limitations, including using publicly available datasets (e.g. MIMIC), and imbalanced data. In this study, we collected five-year electronic health records (EHRs) from…

    Submitted 2 March, 2024; originally announced March 2024.

  17. arXiv:2402.01204  [pdf, other]

    cs.LG cs.AI

    A Survey on Self-Supervised Learning for Non-Sequential Tabular Data

    Authors: Wei-Yao Wang, Wei-Wei Du, Derek Xu, Wei Wang, Wen-Chih Peng

    Abstract: Self-supervised learning (SSL) has been incorporated into many state-of-the-art models in various domains, where SSL defines pretext tasks based on unlabeled datasets to learn contextualized and robust representations. Recently, SSL has been a new trend in exploring the representation learning capability in the realm of tabular data, which is more challenging due to not having explicit relations f…

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: The paper list can be found at https://github.com/wwweiwei/awesome-self-supervised-learning-for-tabular-data

  18. arXiv:2402.01140  [pdf, other]

    cs.LG cs.AI cs.DC

    Root Cause Analysis In Microservice Using Neural Granger Causal Discovery

    Authors: Cheng-Ming Lin, Ching Chang, Wei-Yao Wang, Kuang-Da Wang, Wen-Chih Peng

    Abstract: In recent years, microservices have gained widespread adoption in IT operations due to their scalability, maintenance, and flexibility. However, it becomes challenging for site reliability engineers (SREs) to pinpoint the root cause due to the complex relationships in microservices when facing system malfunctions. Previous research employed structured learning methods (e.g., PC-algorithm) to estab…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: AAAI 2024 Main Track

  19. arXiv:2402.00253  [pdf, other]

    cs.CV cs.CL cs.LG

    A Survey on Hallucination in Large Vision-Language Models

    Authors: Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiutian Zhao, Ke Wang, Liping Hou, Rongjun Li, Wei Peng

    Abstract: Recent development of Large Vision-Language Models (LVLMs) has attracted growing attention within the AI landscape for its practical implementation potential. However, "hallucination", or more specifically, the misalignment between factual visual content and corresponding textual generation, poses a significant challenge of utilizing LVLMs. In this comprehensive survey, we dissect LVLM-related h…

    Submitted 5 May, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

  20. arXiv:2401.15509  [pdf, other]

    cs.CL cs.AI cs.SI

    Style-News: Incorporating Stylized News Generation and Adversarial Verification for Neural Fake News Detection

    Authors: Wei-Yao Wang, Yu-Chieh Chang, Wen-Chih Peng

    Abstract: With the improvements in generative models, the issues of producing hallucinations in various domains (e.g., law, writing) have been brought to people's attention due to concerns about misinformation. In this paper, we focus on neural fake news, which refers to content generated by neural networks aiming to mimic the style of real news to deceive people. To prevent harmful disinformation spreading…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: EACL 2024 Main Track

  21. arXiv:2401.09025  [pdf, other]

    cs.HC cs.CY

    Exploring the Diversity of Music Experiences for Deaf and Hard of Hearing People

    Authors: Kyrie Zhixuan Zhou, Weirui Peng, Yuhan Liu, Rachel F. Adler

    Abstract: Sensory substitution or enhancement techniques have been proposed to enable deaf or hard of hearing (DHH) people to listen to and even compose music. However, little is known about how such techniques enhance DHH people's music experience. Since deafness is a spectrum -- as are DHH people's preferences and perceptions of music -- a more situated understanding of their interaction with music is nee…

    Submitted 17 January, 2024; originally announced January 2024.

  22. arXiv:2401.08053  [pdf, other]

    cs.CV

    SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

    Authors: Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu, Wenxuan Peng, Youngsik Yun, Andrew Hundt, Jihie Kim, Jean Oh

    Abstract: Accurate representation in media is known to improve the well-being of the people who consume it. Generative image models trained on large web-crawled datasets such as LAION are known to produce images with harmful stereotypes and misrepresentations of cultures. We improve inclusive representation in generated images by (1) engaging with communities to collect a culturally representative dataset t…

    Submitted 15 January, 2024; originally announced January 2024.

  23. arXiv:2401.06775  [pdf, other]

    cs.CL cs.AI

    Large language models in healthcare and medical domain: A review

    Authors: Zabir Al Nazi, Wei Peng

    Abstract: The deployment of large language models (LLMs) within the healthcare sector has sparked both enthusiasm and apprehension. These models exhibit the remarkable capability to provide proficient responses to free-text queries, demonstrating a nuanced understanding of professional medical knowledge. This comprehensive survey delves into the functionalities of existing LLMs designed for healthcare appli…

    Submitted 12 December, 2023; originally announced January 2024.

  24. arXiv:2401.00652  [pdf, other]

    cs.CV

    From Covert Hiding to Visual Editing: Robust Generative Video Steganography

    Authors: Xueying Mao, Xiaoxiao Hu, Wanli Peng, Zhenliang Gan, Qichao Ying, Zhenxing Qian, Sheng Li, Xinpeng Zhang

    Abstract: Traditional video steganography methods are based on modifying the covert space for embedding, whereas we propose an innovative approach that embeds secret message within semantic feature for steganography during the video editing process. Although existing traditional video steganography methods display a certain level of security and embedding capacity, they lack adequate robustness against comm…

    Submitted 31 December, 2023; originally announced January 2024.

    Comments: Under Review

  25. arXiv:2312.17617  [pdf, other]

    cs.CL

    Large Language Models for Generative Information Extraction: A Survey

    Authors: Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Yang Wang, Enhong Chen

    Abstract: Information extraction (IE) aims to extract structural knowledge (such as entities, relations, and events) from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation, allowing for generalization across various domains and tasks. As a result, numerous works have been proposed to harness abilitie…

    Submitted 4 June, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: v2: Updated 100+ new papers, 5 technical categories

  26. arXiv:2312.11553  [pdf, other]

    cs.SI cs.AI cs.CL cs.LG

    SeGA: Preference-Aware Self-Contrastive Learning with Prompts for Anomalous User Detection on Twitter

    Authors: Ying-Ying Chang, Wei-Yao Wang, Wen-Chih Peng

    Abstract: In the dynamic and rapidly evolving world of social media, detecting anomalous users has become a crucial task to address malicious activities such as misinformation and cyberbullying. As the increasing number of anomalous users improves the ability to mimic normal users and evade detection, existing methods only focusing on bot detection are ineffective in terms of capturing subtle distinctions b…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: AAAI 2024 Main Track

  27. arXiv:2312.10942  [pdf, other]

    cs.AI cs.LG

    ShuttleSHAP: A Turn-Based Feature Attribution Approach for Analyzing Forecasting Models in Badminton

    Authors: Wei-Yao Wang, Wen-Chih Peng, Wei Wang, Philip S. Yu

    Abstract: Agent forecasting systems have been explored to investigate agent patterns and improve decision-making in various domains, e.g., pedestrian predictions and marketing bidding. Badminton represents a fascinating example of a multifaceted turn-based sport, requiring both sophisticated tactic developments and alternate-dependent decision-making. Recent deep learning approaches for player tactic foreca…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Preprint

  28. arXiv:2312.06372  [pdf, other]

    cs.CV

    Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks

    Authors: Yufei Guo, Yuanpei Chen, Xiaode Liu, Weihang Peng, Yuhan Zhang, Xuhui Huang, Zhe Ma

    Abstract: The Spiking Neural Network (SNN), as one of the biologically inspired neural network infrastructures, has drawn increasing attention recently. It adopts binary spike activations to transmit information, thus the multiplications of activations and weights can be substituted by additions, which brings high energy efficiency. However, in the paper, we theoretically and experimentally prove that the b…

    Submitted 16 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  29. arXiv:2312.04142  [pdf, other]

    cs.LG cs.AI

    TimeDRL: Disentangled Representation Learning for Multivariate Time-Series

    Authors: Ching Chang, Chiao-Tung Chan, Wei-Yao Wang, Wen-Chih Peng, Tien-Fu Chen

    Abstract: Multivariate time-series data in numerous real-world applications (e.g., healthcare and industry) are informative but challenging due to the lack of labels and high dimensionality. Recent studies in self-supervised learning have shown their potential in learning rich representations without relying on labels, yet they fall short in learning disentangled embeddings and addressing issues of inductiv…

    Submitted 13 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: This paper has been accepted by the International Conference on Data Engineering (ICDE) 2024

  30. arXiv:2312.02366  [pdf, other]

    cs.CV cs.AI

    Towards General Purpose Vision Foundation Models for Medical Image Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks

    Authors: Mohammed Baharoon, Waseem Qureshi, Jiahong Ouyang, Yanwu Xu, Abdulrhman Aljouie, Wei Peng

    Abstract: The integration of deep learning systems into healthcare has been hindered by the resource-intensive process of data annotation and the inability of these systems to generalize to different data distributions. Foundation models, which are models pre-trained on large datasets, have emerged as a solution to reduce reliance on annotated data and enhance model generalizability and robustness. DINOv2 i…

    Submitted 28 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

  31. arXiv:2312.00081  [pdf, other]

    cs.CV

    Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

    Authors: Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu

    Abstract: Vision language models (VLM) have demonstrated remarkable performance across various downstream tasks. However, understanding fine-grained visual-linguistic concepts, such as attributes and inter-object relationships, remains a significant challenge. While several benchmarks aim to evaluate VLMs in finer granularity, their primary focus remains on the linguistic aspect, neglecting the visual dimen…

    Submitted 30 March, 2024; v1 submitted 29 November, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  32. arXiv:2311.17058  [pdf, other]

    cs.CV cs.AI

    Panoptic Video Scene Graph Generation

    Authors: Jingkang Yang, Wenxuan Peng, Xiangtai Li, Zujin Guo, Liangyu Chen, Bo Li, Zheng Ma, Kaiyang Zhou, Wayne Zhang, Chen Change Loy, Ziwei Liu

    Abstract: Towards building comprehensive real-world visual perception systems, we propose and study a new problem called panoptic scene graph generation (PVSG). PVSG relates to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos. However, the limitation of bounding boxes in detecting non-rigid ob…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Accepted to CVPR 2023. Project Page: https://jingkang50.github.io/PVSG/. Codebase: https://github.com/LilyDaytoy/OpenPVSG. We provide 400 long videos with frame-level panoptic segmentation, scene graph, dense captions, and QA annotations

  33. arXiv:2311.16113  [pdf, other]

    cs.CR

    BAGEL: Backdoor Attacks against Federated Contrastive Learning

    Authors: Yao Huang, Kongyang Chen, Jiannong Cao, Jiaxing Shen, Shaowei Wang, Yun Peng, Weilong Peng, Kechao Cai

    Abstract: Federated Contrastive Learning (FCL) is an emerging privacy-preserving paradigm in distributed learning for unlabeled data. In FCL, distributed parties collaboratively learn a global encoder with unlabeled data, and the global encoder could be widely used as a feature extractor to build models for many downstream tasks. However, FCL is also vulnerable to many security threats (e.g., backdoor attac…

    Submitted 14 September, 2023; originally announced November 2023.

  34. arXiv:2311.15619  [pdf, other]

    cs.CV cs.AI

    Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition

    Authors: Yifei Chen, Dapeng Chen, Ruijin Liu, Sai Zhou, Wenyuan Xue, Wei Peng

    Abstract: Large-scale visual-language pre-trained models have achieved significant success in various video tasks. However, most existing methods follow an "adapt then align" paradigm, which adapts pre-trained image encoders to model video-level representations and utilizes one-hot or text embedding of the action labels for supervision. This paradigm overlooks the challenge of mapping from static images to…

    Submitted 20 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted at CVPR 2024

  35. arXiv:2311.14091  [pdf, other]

    cs.HC cs.AI cs.CY cs.MM

    PortfolioMentor: Multimodal Generative AI Companion for Learning and Crafting Interactive Digital Art Portfolios

    Authors: Tao Long, Weirui Peng

    Abstract: Digital art portfolios serve as impactful mediums for artists to convey their visions, weaving together visuals, audio, interactions, and narratives. However, without technical backgrounds, design students often find it challenging to translate creative ideas into tangible codes and designs, given the lack of tailored resources for the non-technical, academic support in art schools, and a comprehe…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: 3 pages, 1 figure, work in progress

  36. arXiv:2311.05876  [pdf, other]

    cs.CL

    Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications

    Authors: Zhangyin Feng, Weitao Ma, Weijiang Yu, Lei Huang, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu

    Abstract: Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations. In order to address these challenges, researchers have pursued two primary strategies, knowledge editing and retrieval augmentation, to enhance LLMs by incorporating external information from different aspects.…

    Submitted 7 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: Work in progress; 22 pages. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  37. arXiv:2311.05232  [pdf, other]

    cs.CL

    A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

    Authors: Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu

    Abstract: The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses subs…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Work in progress; 49 pages

  38. arXiv:2311.03758  [pdf, other]

    cs.IR

    Large Language Model based Long-tail Query Rewriting in Taobao Search

    Authors: Wenjun Peng, Guiyang Li, Yue Jiang, Zilong Wang, Dan Ou, Xiaoyi Zeng, Derong Xu, Tong Xu, Enhong Chen

    Abstract: In the realm of e-commerce search, the significance of semantic matching cannot be overstated, as it directly impacts both user experience and company revenue. Along this line, query rewriting, serving as an important technique to bridge the semantic gaps inherent in the semantic matching process, has attracted wide attention from the industry and academia. However, existing query rewriting methods…

    Submitted 4 March, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: WWW Industry

  39. arXiv:2311.00246  [pdf, ps, other]

    cs.CV eess.IV

    RAUNE-Net: A Residual and Attention-Driven Underwater Image Enhancement Method

    Authors: Wangzhen Peng, Chenghao Zhou, Runze Hu, Jingchao Cao, Yutao Liu

    Abstract: Underwater image enhancement (UIE) poses challenges due to distinctive properties of the underwater environment, including low contrast, high turbidity, visual blurriness, and color distortion. In recent years, the application of deep learning has quietly revolutionized various areas of scientific research, including UIE. However, existing deep learning-based UIE methods generally suffer from issu…

    Submitted 31 October, 2023; originally announced November 2023.

  40. arXiv:2310.09820  [pdf, other]

    cs.CL

    Assessing the Reliability of Large Language Model Knowledge

    Authors: Weixuan Wang, Barry Haddow, Alexandra Birch, Wei Peng

    Abstract: Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks. LLMs are typically evaluated using accuracy, yet this metric does not capture the vulnerability of LLMs to hallucination-inducing factors like prompt and context variability. How do we evaluate the capabilities of LLMs to consistently produce factually correct answers? In t…

    Submitted 15 October, 2023; originally announced October 2023.

  41. arXiv:2310.09773  [pdf, other]

    cs.CL

    RSVP: Customer Intent Detection via Agent Response Contrastive and Generative Pre-Training

    Authors: Yu-Chien Tang, Wei-Yao Wang, An-Zi Yen, Wen-Chih Peng

    Abstract: The dialogue systems in customer services have been developed with neural models to provide users with precise answers and round-the-clock support in task-oriented conversations by detecting customer intents based on their utterances. Existing intent detection approaches have highly relied on adaptively pre-training language models with large-scale datasets, yet the predominant cost of data collec…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023 Findings

  42. arXiv:2310.06232  [pdf, other]

    cs.CV

    Spiking PointNet: Spiking Neural Networks for Point Clouds

    Authors: Dayong Ren, Zhe Ma, Yuanpei Chen, Weihang Peng, Xiaode Liu, Yuhan Zhang, Yufei Guo

    Abstract: Recently, Spiking Neural Networks (SNNs), enjoying extreme energy efficiency, have drawn much research attention on 2D visual recognition and shown gradually increasing application potential. However, it still remains underexplored whether SNNs can be generalized to 3D recognition. To this end, we present Spiking PointNet in the paper, the first spiking neural model for efficient deep learning on…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS

  43. arXiv:2310.05010  [pdf, other]

    cs.CV

    Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

    Authors: Zuxuan Wu, Zejia Weng, Wujian Peng, Xitong Yang, Ang Li, Larry S. Davis, Yu-Gang Jiang

    Abstract: Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made exploring its potential for zero-shot video recognition. This paper presents Open-VCLIP++, a simple yet effective framework that adapts CLIP to a strong zero-shot video classifier, capable of identifying novel actions and events during testing. Open-VCL…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2302.00624

  44. arXiv:2310.04630  [pdf, other]

    eess.IV cs.CV

    Metadata-Conditioned Generative Models to Synthesize Anatomically-Plausible 3D Brain MRIs

    Authors: Wei Peng, Tomas Bosschieter, Jiahong Ouyang, Robert Paul, Ehsan Adeli, Qingyu Zhao, Kilian M. Pohl

    Abstract: Generative AI models hold great potential in creating synthetic brain MRIs that advance neuroimaging studies by, for example, enriching data diversity. However, the mainstay of AI research only focuses on optimizing the visual quality (such as signal-to-noise ratio) of the synthetic MRIs while lacking insights into their relevance to neuroscience. To gain these insights with respect to T1-weighted…

    Submitted 6 October, 2023; originally announced October 2023.

  45. arXiv:2310.03559  [pdf, other]

    eess.IV cs.CV

    MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images

    Authors: Yanwu Xu, Li Sun, Wei Peng, Shyam Visweswaran, Kayhan Batmanghelich

    Abstract: This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by pr…

    Submitted 18 June, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  46. arXiv:2310.02264  [pdf, other]

    cs.RO cs.CL cs.CV cs.LG

    Generalizable Long-Horizon Manipulations with Large Language Models

    Authors: Haoyu Zhou, Mingyu Ding, Weikun Peng, Masayoshi Tomizuka, Lin Shao, Chuang Gan

    Abstract: This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations with novel objects and unseen tasks. These task conditions serve as guides for the generation and adjustment of Dynamic Movement Primitives (DMP) trajectories for long-horizon task execution. We further create a challenging…

    Submitted 3 October, 2023; originally announced October 2023.

  47. arXiv:2310.00213  [pdf, other]

    cs.CV

    LSOR: Longitudinally-Consistent Self-Organized Representation Learning

    Authors: Jiahong Ouyang, Qingyu Zhao, Ehsan Adeli, Wei Peng, Greg Zaharchuk, Kilian M. Pohl

    Abstract: Interpretability is a key issue when applying deep learning models to longitudinal brain MRIs. One way to address this issue is by visualizing the high-dimensional latent spaces generated by deep learning via self-organizing maps (SOM). SOM separates the latent space into clusters and then maps the cluster centers to a discrete (typically 2D) grid preserving the high-dimensional relationship betwe…

    Submitted 29 September, 2023; originally announced October 2023.

    Journal ref: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2023

  48. arXiv:2309.15402  [pdf, other]

    cs.CL cs.AI

    Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future

    Authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Weijiang Yu, Tao He, Haotian Wang, Weihua Peng, Ming Liu, Bing Qin, Ting Liu

    Abstract: Reasoning, a fundamental cognitive process integral to human intelligence, has garnered substantial interest within artificial intelligence. Notably, recent studies have revealed that chain-of-thought prompting significantly enhances LLM's reasoning capabilities, which attracts widespread attention from both academics and industry. In this paper, we systematically investigate relevant research, su…

    Submitted 5 June, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted to ACL 2024

  49. arXiv:2309.12717  [pdf, other]

    cs.CV cs.MM

    Transformer-based Image Compression with Variable Image Quality Objectives

    Authors: Chia-Hao Kao, Yi-Hsin Chen, Cheng Chien, Wei-Chen Chiu, Wen-Hsiao Peng

    Abstract: This paper presents a Transformer-based image compression system that allows for a variable image quality objective according to the user's preference. Optimizing a learned codec for different quality objectives leads to reconstructed images with varying visual characteristics. Our method provides the user with the flexibility to choose a trade-off between two image quality objectives using a sing…

    Submitted 22 September, 2023; originally announced September 2023.

    Journal ref: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

  50. arXiv:2309.07594  [pdf, other]

    cs.AI cs.IR

    Neuro-Symbolic Recommendation Model based on Logic Query

    Authors: Maonian Wu, Bang Chen, Shaojun Zhu, Bo Zheng, Wei Peng, Mingyi Zhang

    Abstract: A recommendation system assists users in finding items that are relevant to them. Existing recommendation models are primarily based on predicting relationships between users and items and use complex matching models or incorporate extensive external information to capture association patterns in data. However, recommendation is not only a problem of inductive statistics using data; it is also a c…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 17 pages, 6 figures