
Showing 1–50 of 2,132 results for author: Yang, Z

Searching in archive cs.
  1. arXiv:2409.03319  [pdf, other]

    cs.ET

    Semantic Communication for Efficient Point Cloud Transmission

    Authors: Shangzhuo Xie, Qianqian Yang, Yuyi Sun, Tianxiao Han, Zhaohui Yang, Zhiguo Shi

    Abstract: As three-dimensional acquisition technologies like LiDAR cameras advance, the need for efficient transmission of 3D point clouds is becoming increasingly important. In this paper, we present a novel semantic communication (SemCom) approach for efficient 3D point cloud transmission. Different from existing methods that rely on downsampling and feature extraction for compression, our approach utiliz…

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2409.02795  [pdf, other]

    cs.CL

    Towards a Unified View of Preference Learning for Large Language Models: A Survey

    Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

    Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to unde…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Initial Commit, 21 pages

  3. arXiv:2409.02512  [pdf, other]

    cs.LG cs.AI

    Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

    Authors: Jifeng Hu, Li Shen, Sili Huang, Zhejian Yang, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

    Abstract: Artificial neural networks, especially recent diffusion-based models, have shown remarkable superiority in gaming, control, and QA systems, where the training tasks' datasets are usually static. However, in real-world applications, such as robotic control of reinforcement learning (RL), the tasks are changing, and new tasks arise in a sequential order. This situation poses the new challenge of pla…

    Submitted 4 September, 2024; originally announced September 2024.

  4. Multi-Sources Fusion Learning for Multi-Points NLOS Localization in OFDM System

    Authors: Bohao Wang, Zitao Shuai, Chongwen Huang, Qianqian Yang, Zhaohui Yang, Richeng Jin, Ahmed Al Hammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: Accurate localization of mobile terminals is a pivotal aspect of integrated sensing and communication systems. Traditional fingerprint-based localization methods, which infer coordinates from channel information within pre-set rectangular areas, often face challenges due to the heterogeneous distribution of fingerprints inherent in non-line-of-sight (NLOS) scenarios, particularly within orthogonal…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 12 pages, 14 figures, accepted by IEEE Journal of Selected Topics in Signal Processing (JSTSP). arXiv admin note: substantial text overlap with arXiv:2401.12538

  5. arXiv:2409.02145  [pdf, other]

    cs.LG cs.AI

    A Multimodal Object-level Contrast Learning Method for Cancer Survival Risk Prediction

    Authors: Zekang Yang, Hong Liu, Xiangdong Wang

    Abstract: Computer-aided cancer survival risk prediction plays an important role in the timely treatment of patients. This is a challenging weakly supervised ordinal regression task associated with multiple clinical factors involved such as pathological images, genomic data, etc. In this paper, we propose a new training method, multimodal object-level contrast learning, for cancer survival risk predictio…

    Submitted 3 September, 2024; originally announced September 2024.

  6. arXiv:2409.02143  [pdf, other]

    q-bio.GN cs.LG

    CMOB: Large-Scale Cancer Multi-Omics Benchmark with Open Datasets, Tasks, and Baselines

    Authors: Ziwei Yang, Rikuto Kotoge, Zheng Chen, Xihao Piao, Yasuko Matsubara, Yasushi Sakurai

    Abstract: Machine learning has shown great potential in the field of cancer multi-omics studies, offering incredible opportunities for advancing precision medicine. However, the challenges associated with dataset curation and task formulation pose significant hurdles, especially for researchers lacking a biomedical background. Here, we introduce the CMOB, the first large-scale cancer multi-omics benchmark i…

    Submitted 2 September, 2024; originally announced September 2024.

  7. arXiv:2409.01502  [pdf, other]

    cs.CV cs.AI cs.GR

    AMG: Avatar Motion Guided Video Generation

    Authors: Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang

    Abstract: The human video generation task has gained significant attention with the advancement of deep generative models. Generating realistic videos with human movements is challenging in nature, due to the intricacies of human body topology and sensitivity to visual artifacts. The extensively studied 2D media generation methods take advantage of massive human media datasets, but struggle with 3D-aware contro…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: The project page is at https://github.com/zshyang/amg

  8. A Survey and Comparison of Post-quantum and Quantum Blockchains

    Authors: Zebo Yang, Haneen Alfauri, Behrooz Farkiani, Raj Jain, Roberto Di Pietro, Aiman Erbad

    Abstract: Blockchains have gained substantial attention from academia and industry for their ability to facilitate decentralized trust and communications. However, the rapid progress of quantum computing poses a significant threat to the security of existing blockchain technologies. Notably, the emergence of Shor's and Grover's algorithms raises concerns regarding the compromise of the cryptographic systems…

    Submitted 2 September, 2024; originally announced September 2024.

    Journal ref: IEEE Communications Surveys & Tutorials, vol. 26, no. 2, pp. 967-1002, Secondquarter 2024

  9. arXiv:2409.01309  [pdf, other]

    cs.IT

    Refined Statistical Bounds for Classification Error Mismatches with Constrained Bayes Error

    Authors: Zijian Yang, Vahe Eminyan, Ralf Schlüter, Hermann Ney

    Abstract: In statistical classification/multiple hypothesis testing and machine learning, a model distribution estimated from the training data is usually applied to replace the unknown true distribution in the Bayes decision rule, which introduces a mismatch between the Bayes error and the model-based classification error. In this work, we derive the classification error bound to study the relationship bet…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: accepted at 2024 IEEE Information Theory Workshop

  10. arXiv:2409.01012  [pdf, other]

    cs.IR cs.LG

    Improved Diversity-Promoting Collaborative Metric Learning for Recommendation

    Authors: Shilong Bao, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao, Qingming Huang

    Abstract: Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and collaborative filtering. Following the convention of RS, existing practices exploit unique user representation in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests. Under this settin…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: arXiv admin note: text overlap with arXiv:2209.15292

  11. arXiv:2409.00918  [pdf, other]

    cs.DC

    LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs

    Authors: Mo Sun, Zihan Yang, Changyue Liao, Yingtao Li, Fei Wu, Zeke Wang

    Abstract: The recent progress made in large language models (LLMs) has brought tremendous application prospects to the world. The growing model size demands LLM training on multiple GPUs, while data parallelism is the most popular distributed training strategy due to its simplicity, efficiency, and scalability. Current systems adopt the model-sharded data parallelism to enable memory-efficient training, how…

    Submitted 1 September, 2024; originally announced September 2024.

  12. arXiv:2409.00839  [pdf, other]

    cs.CV cs.AI cs.IT

    Entropy Loss: An Interpretability Amplifier of 3D Object Detection Network for Intelligent Driving

    Authors: Haobo Yang, Shiyan Zhang, Zhuoyi Yang, Xinyu Zhang, Li Wang, Yifan Tang, Jilong Guo, Jun Li

    Abstract: With the increasing complexity of the traffic environment, the significance of safety perception in intelligent driving is intensifying. Traditional methods in the field of intelligent driving perception rely on deep learning, which suffers from limited interpretability, often described as a "black box." This paper introduces a novel type of loss function, termed "Entropy Loss," along with an inno…

    Submitted 1 September, 2024; originally announced September 2024.

  13. arXiv:2409.00694  [pdf, other]

    cs.CV

    IAFI-FCOS: Intra- and across-layer feature interaction FCOS model for lesion detection of CT images

    Authors: Qiu Guan, Mengjie Pan, Feng Chen, Zhiqiang Yang, Zhongwen Yu, Qianwei Zhou, Haigen Hu

    Abstract: Effective lesion detection in medical images relies not only on the features of the lesion region but also heavily on the surrounding information. However, most current methods have not fully utilized it. What is more, the multi-scale feature fusion mechanisms of most traditional detectors are unable to transmit detail information without loss, which makes it hard to detect small and boundary ambiguous l…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 2024 IJCNN

  14. arXiv:2409.00250  [pdf, other]

    cs.CV

    Medical Report Generation Is A Multi-label Classification Problem

    Authors: Yijian Fan, Zhenbang Yang, Rui Liu, Mingjie Li, Xiaojun Chang

    Abstract: Medical report generation is a critical task in healthcare that involves the automatic creation of detailed and accurate descriptions from medical images. Traditionally, this task has been approached as a sequence generation problem, relying on vision-and-language techniques to generate coherent and contextually relevant reports. However, in this paper, we propose a novel perspective: rethinking m…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: Accepted to 2024 IEEE International Conference on Medical Artificial Intelligence

  15. arXiv:2409.00099  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Query-by-Example Keyword Spotting Using Spectral-Temporal Graph Attentive Pooling and Multi-Task Learning

    Authors: Zhenyu Wang, Shuyu Kong, Li Wan, Biqiao Zhang, Yiteng Huang, Mumin Jin, Ming Sun, Xin Lei, Zhaojun Yang

    Abstract: Existing keyword spotting (KWS) systems primarily rely on predefined keyword phrases. However, the ability to recognize customized keywords is crucial for tailoring interactions with intelligent devices. In this paper, we present a novel Query-by-Example (QbyE) KWS system that employs spectral-temporal graph attentive pooling and multi-task learning. This framework aims to effectively learn speake…

    Submitted 26 August, 2024; originally announced September 2024.

    Journal ref: INTERSPEECH 2024

  16. arXiv:2409.00009  [pdf, other]

    cs.IR cs.AI

    Web Retrieval Agents for Evidence-Based Misinformation Detection

    Authors: Jacob-Junqi Tian, Hao Yu, Yury Orlovskiy, Tyler Vergho, Mauricio Rivera, Mayank Goel, Zachary Yang, Jean-Francois Godbout, Reihaneh Rabbany, Kellin Pelrine

    Abstract: This paper develops an agent-based automated fact-checking approach for detecting misinformation. We demonstrate that combining a powerful LLM agent, which does not have access to the internet for searches, with an online web search agent yields better results than when each tool is used independently. Our approach is robust across multiple models, outperforming alternatives and increasing the mac…

    Submitted 15 August, 2024; originally announced September 2024.

    Comments: 1 main figure, 8 tables, 10 pages, 12 figures in Appendix, 7 tables in Appendix

  17. arXiv:2408.16500  [pdf, other]

    cs.CV

    CogVLM2: Visual Language Models for Image and Video Understanding

    Authors: Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang

    Abstract: Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2…

    Submitted 29 August, 2024; originally announced August 2024.

  18. arXiv:2408.15991  [pdf, other]

    cs.CV

    Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

    Authors: Shengyuan Zhang, Ling Yang, Zejian Li, An Zhao, Chenye Meng, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun

    Abstract: Accelerating the sampling speed of diffusion models remains a significant challenge. Recent score distillation methods distill a heavy teacher model into a one-step student generator, which is optimized by calculating the difference between the two score functions on the samples generated by the student model. However, there is a score mismatch issue in the early stage of the distillation process…

    Submitted 28 August, 2024; originally announced August 2024.

  19. arXiv:2408.15708  [pdf, other]

    cs.CV

    Towards Realistic Example-based Modeling via 3D Gaussian Stitching

    Authors: Xinyu Gao, Ziyi Yang, Bingchen Gong, Xiaoguang Han, Sipeng Yang, Xiaogang Jin

    Abstract: Using parts of existing models to rebuild new models, commonly termed example-based modeling, is a classical methodology in the realm of computer graphics. Previous works mostly focus on shape composition, making them very hard to use for realistic composition of 3D objects captured from real-world scenes. This leads to combining multiple NeRFs into a single 3D scene to achieve seamless appeara…

    Submitted 28 August, 2024; originally announced August 2024.

  20. arXiv:2408.14805  [pdf, other]

    cs.CV

    Platypus: A Generalized Specialist Model for Reading Text in Various Forms

    Authors: Peng Wang, Zhaohai Li, Jun Tang, Humen Zhong, Fei Huang, Zhibo Yang, Cong Yao

    Abstract: Reading text from images (either natural scenes or documents) has been a long-standing research topic for decades, due to the high technical challenge and wide application range. Previously, individual specialist models were developed to tackle the sub-tasks of text reading (e.g., scene text recognition, handwritten text recognition and mathematical expression recognition). However, such specialist…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV2024

  21. arXiv:2408.14562  [pdf, other]

    cs.CV cs.AI

    A Survey of Camouflaged Object Detection and Beyond

    Authors: Fengyang Xiao, Sujie Hu, Yuqi Shen, Chengyu Fang, Jinfa Huang, Chunming He, Longxiang Tang, Ziyun Yang, Xiu Li

    Abstract: Camouflaged Object Detection (COD) refers to the task of identifying and segmenting objects that blend seamlessly into their surroundings, posing a significant challenge for computer vision systems. In recent years, COD has garnered widespread attention due to its potential applications in surveillance, wildlife conservation, autonomous systems, and more. While several surveys on COD exist, they o…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 26 pages, 10 figures, 8 tables

  22. arXiv:2408.14511  [pdf, other]

    cs.AI cs.CL cs.LG math.ST stat.ML

    Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods

    Authors: Xinyang Hu, Fengzhuo Zhang, Siyu Chen, Zhuoran Yang

    Abstract: Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems using pretrained large language models (LLMs). In this work, we analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity. To this end, we introduce a multi-step latent variable model that…

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 150 pages, 18 figures, 3 tables

  23. arXiv:2408.14416  [pdf, ps, other]

    cs.LG cs.DC

    Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse

    Authors: Yahao Ding, Wen Shang, Minrui Xu, Zhaohui Yang, Ye Hu, Dusit Niyato, Mohammad Shikh-Bahaei

    Abstract: The Metaverse, a burgeoning collective virtual space merging augmented reality and persistent virtual worlds, necessitates advanced artificial intelligence (AI) and communication technologies to support immersive and interactive experiences. Federated learning (FL) has emerged as a promising technique for collaboratively training AI models while preserving data privacy. However, FL faces challenge…

    Submitted 26 August, 2024; originally announced August 2024.

  24. arXiv:2408.14087  [pdf, other]

    cs.CV

    LSM-YOLO: A Compact and Effective ROI Detector for Medical Detection

    Authors: Zhongwen Yu, Qiu Guan, Jianmin Yang, Zhiqiang Yang, Qianwei Zhou, Yang Chen, Feng Chen

    Abstract: In existing medical Region of Interest (ROI) detection, no algorithm simultaneously satisfies both real-time performance and accuracy, which fails to meet the growing demand for automatic detection in medicine. Although the basic YOLO framework ensures real-time detection due to its fast speed, it still faces challenges in maintaining precision concurrently. To alleviate the above probl…

    Submitted 26 August, 2024; originally announced August 2024.

  25. arXiv:2408.14014  [pdf, ps, other]

    cs.LG

    Category-Theoretical and Topos-Theoretical Frameworks in Machine Learning: A Survey

    Authors: Yiyang Jia, Guohong Peng, Zheng Yang, Tianhao Chen

    Abstract: In this survey, we provide an overview of category theory-derived machine learning from four mainstream perspectives: gradient-based learning, probability-based learning, invariance and equivalence-based learning, and topos-based learning. For the first three topics, we primarily review research in the past five years, updating and expanding on the previous survey by Shiebler et al. The fourth to…

    Submitted 29 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  26. arXiv:2408.13898  [pdf, other]

    cs.CV

    Evaluating Attribute Comprehension in Large Vision-Language Models

    Authors: Haiwen Zhang, Zixi Yang, Yuanzhi Liu, Xinran Wang, Zheqi He, Kongming Liang, Zhanyu Ma

    Abstract: Currently, large vision-language models have made promising progress on many downstream tasks. However, they still face many challenges in fine-grained visual understanding tasks, such as object attribute comprehension. Besides, there have been growing efforts on the evaluations of large vision-language models, but there is a lack of in-depth study of attribute comprehension and the visual language fine-…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 15 pages, 4 figures

  27. arXiv:2408.13788  [pdf, other]

    cs.CV

    3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing

    Authors: Shichao Dong, Ze Yang, Guosheng Lin

    Abstract: Data augmentation plays a crucial role in deep learning, enhancing the generalization and robustness of learning-based models. Standard approaches involve simple transformations like rotations and flips for generating extra data. However, these augmentations are limited by their initial dataset, lacking high-level diversity. Recently, large models such as language models and diffusion models have…

    Submitted 25 August, 2024; originally announced August 2024.

  28. arXiv:2408.13711  [pdf, other]

    cs.CV cs.MM

    SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting

    Authors: Wenrui Li, Yapeng Mi, Fucheng Cai, Zhe Yang, Wangmeng Zuo, Xingtao Wang, Xiaopeng Fan

    Abstract: Text-driven 3D scene generation has seen significant advancements recently. However, most existing methods generate single-view images using generative models and then stitch them together in 3D space. This independent generation for each view often results in spatial inconsistency and implausibility in the 3D scenes. To address this challenge, we proposed a novel text-driven 3D-consistent scene g…

    Submitted 24 August, 2024; originally announced August 2024.

  29. arXiv:2408.13674  [pdf, other]

    cs.CV

    GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

    Authors: Keqiang Sun, Amin Jourabloo, Riddhish Bhalodia, Moustafa Meshry, Yu Rong, Zhengyu Yang, Thu Nguyen-Phuoc, Christian Haene, Jiu Xu, Sam Johnson, Hongsheng Li, Sofien Bouaziz

    Abstract: Photo-realistic and controllable 3D avatars are crucial for various applications such as virtual and mixed reality (VR/MR), telepresence, gaming, and film production. Traditional methods for avatar creation often involve time-consuming scanning and reconstruction processes for each avatar, which limits their scalability. Furthermore, these methods do not offer the flexibility to sample new identit…

    Submitted 24 August, 2024; originally announced August 2024.

  30. arXiv:2408.13546  [pdf, other]

    eess.SP cs.AI

    Synesthesia of Machines (SoM)-Enhanced ISAC Precoding for Vehicular Networks with Double Dynamics

    Authors: Zonghui Yang, Shijian Gao, Xiang Cheng, Liuqing Yang

    Abstract: Integrated sensing and communication (ISAC) technology plays a crucial role in vehicular networks. However, the communication channel within this context exhibits time-varying characteristics, and potential targets may move rapidly, resulting in double dynamics. These present significant challenges for real-time ISAC precoding design that have not been thoroughly explored. While optimization-base…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 13 pages, 17 figures, 4 tables

  31. arXiv:2408.13355  [pdf, other]

    cs.SD cs.AI eess.AS

    Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword Spotting

    Authors: Zhenyu Wang, Li Wan, Biqiao Zhang, Yiteng Huang, Shang-Wen Li, Ming Sun, Xin Lei, Zhaojun Yang

    Abstract: A keyword spotting (KWS) engine that is continuously running on device is exposed to various speech signals that are usually unseen. It is a challenging problem to build a small-footprint and high-performing KWS model with robustness under different acoustic environments. In this paper, we explore how to effectively apply adversarial examples to improve KWS robustness. We propose datasource…

    Submitted 23 August, 2024; originally announced August 2024.

    Journal ref: ICASSP 2023

  32. arXiv:2408.12948  [pdf, other]

    cs.SE

    E-code: Mastering Efficient Code Generation through Pretrained Models and Expert Encoder Group

    Authors: Yue Pan, Chen Lyu, Zhenyu Yang, Lantian Li, Qi Liu, Xiuting Shao

    Abstract: Context: With the waning of Moore's Law, the software industry is placing increasing importance on finding alternative solutions for continuous performance enhancement. The significance and research results of software performance optimization have been on the rise in recent years, especially with the advancement propelled by Large Language Models (LLMs). However, traditional strategies for rectify…

    Submitted 23 August, 2024; originally announced August 2024.

  33. arXiv:2408.12821  [pdf, other]

    cs.CV cs.AI

    Examining the Commitments and Difficulties Inherent in Multimodal Foundation Models for Street View Imagery

    Authors: Zhenyuan Yang, Xuhui Lin, Qinyi He, Ziye Huang, Zhengliang Liu, Hanqi Jiang, Peng Shu, Zihao Wu, Yiwei Li, Stephen Law, Gengchen Mai, Tianming Liu, Tao Yang

    Abstract: The emergence of Large Language Models (LLMs) and multimodal foundation models (FMs) has generated heightened interest in their applications that integrate vision and language. This paper investigates the capabilities of ChatGPT-4V and Gemini Pro for Street View Imagery, Built Environment, and Interior by evaluating their performance across various tasks. The assessments include street furniture i…

    Submitted 22 August, 2024; originally announced August 2024.

  34. arXiv:2408.12528  [pdf, other]

    cs.CV

    Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

    Authors: Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou

    Abstract: We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various and mixed modalities. The unified model flexibly supports a wide range of vision-language tasks including visual question-answering, text-to-image…

    Submitted 25 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: Technical Report

  35. arXiv:2408.12056  [pdf, other]

    cs.SE cs.AI

    Enhancing LLM-Based Automated Program Repair with Design Rationales

    Authors: Jiuang Zhao, Donghao Yang, Li Zhang, Xiaoli Lian, Zitian Yang

    Abstract: Automatic Program Repair (APR) endeavors to autonomously rectify issues within specific projects, which generally encompasses three categories of tasks: bug resolution, new feature development, and feature enhancement. Despite extensive research proposing various methodologies, their efficacy in addressing real issues remains unsatisfactory. It's worth noting that, typically, engineers have design…

    Submitted 21 August, 2024; originally announced August 2024.

  36. arXiv:2408.11564  [pdf, other]

    cs.CV

    AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

    Authors: Minheng Ni, Chenfei Wu, Huaying Yuan, Zhengyuan Yang, Ming Gong, Lijuan Wang, Zicheng Liu, Wangmeng Zuo, Nan Duan

    Abstract: With the advancement of generative models, the synthesis of different sensory elements such as music, visuals, and speech has achieved significant realism. However, the approach to generate multi-sensory outputs has not been fully explored, limiting the application on high-value scenarios such as directing a film. Developing a movie director agent faces two major challenges: (1) Lack of paralle…

    Submitted 21 August, 2024; originally announced August 2024.

  37. arXiv:2408.11446  [pdf, other]

    cs.ET

    Green Probabilistic Semantic Communication over Wireless Networks

    Authors: Ruopeng Xu, Zhaohui Yang, Yijie Mao, Chongwen Huang, Qianqian Yang, Lexi Xu, Wei Xu, Zhaoyang Zhang

    Abstract: In this paper, we propose a multi-user green semantic communication system facilitated by a probabilistic knowledge graph (PKG). By integrating probability into the knowledge graph, we enable probabilistic semantic communication (PSC) and represent semantic information accordingly. On this basis, a semantic compression model designed for multi-user downlink task-oriented communication is introduce…

    Submitted 21 August, 2024; originally announced August 2024.

  38. arXiv:2408.10681  [pdf, other]

    cs.CL cs.LG

    HMoE: Heterogeneous Mixture of Experts for Language Modeling

    Authors: An Wang, Xingwu Sun, Ruobing Xie, Shuaipeng Li, Jiaqi Zhu, Zhen Yang, Pinxue Zhao, J. N. Han, Zhanhui Kang, Di Wang, Naoaki Okazaki, Cheng-zhong Xu

    Abstract: Mixture of Experts (MoE) offers remarkable performance and computational efficiency by selectively activating subsets of model parameters. Traditionally, MoE models use homogeneous experts, each with identical capacity. However, varying complexity in input data necessitates experts with diverse capabilities, while homogeneous MoE hinders effective expert specialization and efficient parameter util…

    Submitted 20 August, 2024; originally announced August 2024.

  39. arXiv:2408.10468  [pdf, other]

    cs.LG cs.CL cs.CR

    Tracing Privacy Leakage of Language Models to Training Data via Adjusted Influence Functions

    Authors: Jinxin Liu, Zao Yang

    Abstract: The responses generated by Large Language Models (LLMs) can include sensitive information from individuals and organizations, leading to potential privacy leakage. This work implements Influence Functions (IFs) to trace privacy leakage back to the training data, thereby mitigating privacy concerns of Language Models (LMs). However, we notice that current IFs struggle to accurately estimate the inf…

    Submitted 5 September, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  40. arXiv:2408.09807  [pdf, other]

    cs.AI

    World Models Increase Autonomy in Reinforcement Learning

    Authors: Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat, Edward S. Hu

    Abstract: Reinforcement learning (RL) is an appealing paradigm for training intelligent agents, enabling policy acquisition from the agent's own autonomously acquired experience. However, the training process of RL is far from automatic, requiring extensive human effort to reset the agent and environments. To tackle the challenging reset-free setting, we first demonstrate the superiority of model-based (MB)…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  41. arXiv:2408.09621  [pdf, other]

    cs.CL

    Refining Packing and Shuffling Strategies for Enhanced Performance in Generative Language Models

    Authors: Yanbing Chen, Ruilin Wang, Zihao Yang, Lavender Yao Jiang, Eric Karl Oermann

    Abstract: Packing and shuffling tokens is a common practice in training auto-regressive language models (LMs) to prevent overfitting and improve efficiency. Typically, documents are concatenated into chunks of maximum sequence length (MSL) and then shuffled. However, setting the atom size, the length for each data chunk accompanied by random shuffling, to MSL may lead to contextual incoherence due to tokens fro…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 11 pages (include appendix), 26 figures, submitted to ACL ARR Aug 2024

    ACM Class: I.2.7

  42. arXiv:2408.08978  [pdf, other]

    cs.CL

    See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses

    Authors: Yulong Chen, Yang Liu, Jianhao Yan, Xuefeng Bai, Ming Zhong, Yinghao Yang, Ziyi Yang, Chenguang Zhu, Yue Zhang

    Abstract: The impressive performance of Large Language Models (LLMs) has consistently surpassed numerous human-designed benchmarks, presenting new challenges in assessing the shortcomings of LLMs. Designing tasks and finding LLMs' limitations are becoming increasingly important. In this paper, we investigate the question of whether an LLM can discover its own limitations from the errors it makes. To this en…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: COLM 2024

  43. arXiv:2408.08913  [pdf, other

    cs.IR

    MLoRA: Multi-Domain Low-Rank Adaptive Network for CTR Prediction

    Authors: Zhiming Yang, Haining Gao, Dehong Gao, Luwei Yang, Libin Yang, Xiaoyan Cai, Wei Ning, Guannan Zhang

    Abstract: Click-through rate (CTR) prediction is one of the fundamental tasks in the industry, especially in e-commerce, social media, and streaming media. It directly impacts website revenues, user satisfaction, and user retention. However, real-world production platforms often encompass various domains to cater for diverse customer needs. Traditional CTR prediction models struggle in multi-domain recommen…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 11 pages. Accepted by RecSys'2024, full paper

  44. arXiv:2408.08570  [pdf, other

    cs.CV

    EraW-Net: Enhance-Refine-Align W-Net for Scene-Associated Driver Attention Estimation

    Authors: Jun Zhou, Chunsheng Liu, Faliang Chang, Wenqian Wang, Penghui Hao, Yiming Huang, Zhiqiang Yang

    Abstract: Associating driver attention with the driving scene across two fields of view (FOVs) is a hard cross-domain perception problem, which requires comprehensive consideration of cross-view mapping, dynamic driving scene analysis, and driver status tracking. Previous methods typically focus on a single view or map attention to the scene via estimated gaze, failing to exploit the implicit connection betwee…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures

  45. arXiv:2408.08533  [pdf, ps, other

    stat.ML cs.LG

    Unsupervised Transfer Learning via Adversarial Contrastive Training

    Authors: Chenguang Duan, Yuling Jiao, Huazhen Lin, Wensen Ma, Jerry Zhijian Yang

    Abstract: Learning a data representation for downstream supervised learning tasks in unlabeled scenarios is both critical and challenging. In this paper, we propose a novel unsupervised transfer learning approach using adversarial contrastive training (ACT). Our experimental results demonstrate outstanding classification accuracy with both fine-tuned linear probe and K-NN protocol across various datasets,…

    Submitted 16 August, 2024; originally announced August 2024.

  46. arXiv:2408.08338  [pdf, ps, other

    cs.LG

    Activation Space Selectable Kolmogorov-Arnold Networks

    Authors: Zhuoqin Yang, Jiansong Zhang, Xiaoling Luo, Zheng Lu, Linlin Shen

    Abstract: The multilayer perceptron (MLP), a fundamental paradigm in current artificial intelligence, is widely applied in fields such as computer vision and natural language processing. However, the recently proposed Kolmogorov-Arnold Network (KAN), based on nonlinear additive connections, has been proven to achieve performance comparable to MLPs with significantly fewer parameters. Despite this potential,…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 12 pages, 6 figures. The code for this work will be released soon

  47. arXiv:2408.08072  [pdf, other

    cs.CL

    I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

    Authors: Yiming Liang, Ge Zhang, Xingwei Qu, Tianyu Zheng, Jiawei Guo, Xinrun Du, Zhenzhu Yang, Jiaheng Liu, Chenghua Lin, Lei Ma, Wenhao Huang, Jiajun Zhang

    Abstract: Large Language Models (LLMs) have achieved significant advancements; however, the common learning paradigm treats LLMs as passive information repositories, neglecting their potential for active learning and alignment. Some approaches train LLMs using their own generated synthetic data, exploring the possibility of active alignment. However, there is still a huge gap between these one-time alignmen…

    Submitted 27 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  48. arXiv:2408.08050  [pdf, other

    cs.CV

    CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection

    Authors: Xunfa Lai, Zhiyu Yang, Jie Hu, Shengchuan Zhang, Liujuan Cao, Guannan Jiang, Zhiyu Wang, Songan Zhang, Rongrong Ji

    Abstract: Existing camouflaged object detection (COD) methods depend heavily on large-scale pixel-level annotations. However, acquiring such annotations is laborious due to the inherent camouflage characteristics of the objects. Semi-supervised learning offers a promising solution to this challenge. Yet, its application in COD is hindered by significant pseudo-label noise, both pixel-level and instance-level. W…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024

  49. arXiv:2408.07990  [pdf, other

    cs.CL

    FuseChat: Knowledge Fusion of Chat Models

    Authors: Fanqi Wan, Longguang Zhong, Ziyi Yang, Ruijun Chen, Xiaojun Quan

    Abstract: While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, it incurs substantial costs and may lead to redundancy in competencies. Knowledge fusion aims to integrate existing LLMs of diverse architectures and capabilities into a more potent LLM through lightweight continual training, thereby reducing the need for costly LLM developm…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Work in progress

  50. arXiv:2408.07422  [pdf, other

    cs.CV cs.AI

    LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image

    Authors: Fan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen, Wenbo Tang, Haonan Lu, Pengfei Xu, Zhenyu Yang, Jungong Han, Guiguang Ding

    Abstract: Recent advancements in autonomous driving, augmented reality, robotics, and embodied intelligence have necessitated 3D perception algorithms. However, current 3D perception methods, particularly small models, struggle with processing logical reasoning, question-answering, and handling open scenario categories. On the other hand, generative multimodal large language models (MLLMs) excel in general…

    Submitted 14 August, 2024; originally announced August 2024.