Showing 1–50 of 433 results for author: Jin, X

Searching in archive cs.
  1. arXiv:2407.20947  [pdf, other]

    cs.NE

    An Asynchronous Multi-core Accelerator for SNN inference

    Authors: Zhuo Chen, De Ma, Xiaofei Jin, Qinghui Xing, Ouwen Jin, Xin Du, Shuibing He, Gang Pan

    Abstract: Spiking Neural Networks (SNNs) are extensively utilized in brain-inspired computing and neuroscience research. To enhance the speed and energy efficiency of SNNs, several many-core accelerators have been developed. However, maintaining the accuracy of SNNs often necessitates frequent explicit synchronization among all cores, which presents a challenge to overall efficiency. In this paper, we propo…

    Submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2407.20761  [pdf, other]

    cs.AI

    OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance

    Authors: Yongqiang Yao, Jingru Tan, Jiahao Hu, Feizhao Zhang, Xin Jin, Bo Li, Ruihao Gong, Pengfei Liu

    Abstract: Recently, vision-language instruct-tuning models have made significant progress due to their more comprehensive understanding of the world. In this work, we discovered that large-scale 3D parallel training on those models leads to an imbalanced computation load across different devices. The vision and language parts are inherently heterogeneous: their data distribution and model architecture diffe…

    Submitted 30 July, 2024; originally announced July 2024.

  3. arXiv:2407.20018  [pdf, other]

    cs.DC

    Efficient Training of Large Language Models on Distributed Infrastructures: A Survey

    Authors: Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, Xipeng Qiu, Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun

    Abstract: Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with their sophisticated capabilities. Training these models requires vast GPU clusters and significant computing time, posing major challenges in terms of scalability, efficiency, and reliability. This survey explores recent advancements in training systems for LLMs, including innovations in training infrastructur…

    Submitted 29 July, 2024; originally announced July 2024.

  4. arXiv:2407.18999  [pdf, other]

    cs.CV cs.LG

    Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models

    Authors: Baao Xie, Qiuyu Chen, Yunnan Wang, Zequn Zhang, Xin Jin, Wenjun Zeng

    Abstract: Disentangled representation learning (DRL) aims to identify and decompose underlying factors behind observations, thus facilitating data perception and generation. However, current DRL approaches often rely on the unrealistic assumption that semantic factors are statistically independent. In reality, these factors may exhibit correlations, which off-the-shelf solutions have yet to properly address…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures

  5. arXiv:2407.18957  [pdf, other]

    q-fin.TR cs.AI cs.MA

    When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments

    Authors: Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhenting Wang, Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, Sujian Li, Mengnan Du, Yongfeng Zhang

    Abstract: Can AI Agents simulate real-world trading environments to investigate the impact of external factors on stock trading activities (e.g., macroeconomics, policy changes, company fundamentals, and global events)? These factors, which frequently influence trading behaviors, are critical elements in the quest for maximizing investors' profits. Our work attempts to solve this problem through large langu…

    Submitted 30 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 33 pages, 10 figures

  6. arXiv:2407.18556  [pdf, other]

    cs.LG cs.AI

    Look Globally and Reason: Two-stage Path Reasoning over Sparse Knowledge Graphs

    Authors: Saiping Guan, Jiyao Wei, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

    Abstract: Sparse Knowledge Graphs (KGs), frequently encountered in real-world applications, contain fewer facts in the form of (head entity, relation, tail entity) compared to more populated KGs. The sparse KG completion task, which reasons answers for given queries in the form of (head entity, relation, ?) for sparse KGs, is particularly challenging due to the necessity of reasoning missing facts based on…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted to CIKM 2024

  7. arXiv:2407.15173  [pdf, other]

    cs.CV

    Rethinking Domain Adaptation and Generalization in the Era of CLIP

    Authors: Ruoyu Feng, Tao Yu, Xin Jin, Xiaoyuan Yu, Lei Xiao, Zhibo Chen

    Abstract: In recent studies on domain adaptation, significant emphasis has been placed on the advancement of learning shared knowledge from a source domain to a target domain. Recently, the large vision-language pre-trained model CLIP has shown strong ability in zero-shot recognition, and parameter-efficient tuning can further improve its performance on specific tasks. This work demonstrates that a s…

    Submitted 21 July, 2024; originally announced July 2024.

  8. arXiv:2407.14266  [pdf, other]

    cs.IR cs.LG

    L^2CL: Embarrassingly Simple Layer-to-Layer Contrastive Learning for Graph Collaborative Filtering

    Authors: Xinzhou Jin, Jintang Li, Liang Chen, Chenyun Yu, Yuanzhen Xie, Tao Xie, Chengxiang Zhuo, Zang Li, Zibin Zheng

    Abstract: Graph neural networks (GNNs) have recently emerged as an effective approach to model neighborhood signals in collaborative filtering. Towards this research line, graph contrastive learning (GCL) demonstrates robust capabilities to address the supervision label shortage issue through generating massive self-supervised signals. Despite its effectiveness, GCL for recommendation suffers seriously from…

    Submitted 19 July, 2024; originally announced July 2024.

  9. arXiv:2407.12371  [pdf, other]

    cs.CV cs.AI

    HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects

    Authors: Xintao Lv, Liang Xu, Yichao Yan, Xin Jin, Congsheng Xu, Shuwen Wu, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng, Xiaokang Yang

    Abstract: Generating human-object interactions (HOIs) is critical with the tremendous advances of digital avatars. Existing datasets are typically limited to humans interacting with a single object while neglecting the ubiquitous manipulation of multiple objects. Thus, we propose HIMO, a large-scale MoCap dataset of full-body human interacting with multiple objects, containing 3.3K 4D HOI sequences and 4.08…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Project page: https://lvxintao.github.io/himo, accepted by ECCV 2024

  10. arXiv:2407.11700  [pdf, other]

    cs.CV eess.IV

    Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

    Authors: Jinming Liu, Ruoyu Feng, Yunpeng Qi, Qiuyu Chen, Zhibo Chen, Wenjun Zeng, Xin Jin

    Abstract: Recently, the field of Image Coding for Machines (ICM) has garnered heightened interest and significant advances thanks to the rapid progress of learning-based techniques for image compression and analysis. Previous studies often require training separate codecs to support various bitrate levels, machine tasks, and networks, thus lacking both flexibility and practicality. To address these challeng…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  11. arXiv:2407.08936  [pdf, ps, other]

    cs.LO

    HHLPar: Automated Theorem Prover for Parallel Hybrid Communicating Sequential Processes

    Authors: Xiangyu Jin, Bohua Zhan, Shuling Wang, Naijun Zhan

    Abstract: We present a tool called HHLPar for verifying hybrid systems modelled in Hybrid Communicating Sequential Processes (HCSP). HHLPar is built upon a Hybrid Hoare Logic for HCSP, which is able to reason about continuous-time properties of differential equations, as well as communication and parallel composition of parallel HCSP processes with the help of parameterised trace assertions and their synchr…

    Submitted 11 July, 2024; originally announced July 2024.

  12. arXiv:2407.07805  [pdf, other]

    cs.CV

    SUMix: Mixup with Semantic and Uncertain Information

    Authors: Huafeng Qin, Xin Jin, Hongyu Zhu, Hongchao Liao, Mounîm A. El-Yacoubi, Xinbo Gao

    Abstract: Mixup data augmentation approaches have been applied for various tasks of deep learning to improve the generalization ability of deep neural networks. Some existing approaches, such as CutMix and SaliencyMix, randomly replace a patch in one image with patches from another to generate the mixed image. Similarly, the corresponding labels are linearly combined by a fixed ratio $\lambda$. The objects in two i…

    Submitted 17 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024 [Camera Ready] (19 pages, 7 figures) with the source code at https://github.com/JinXins/SUMix

  13. arXiv:2407.07771  [pdf, other]

    cs.CL cs.CV cs.MM

    Multi-task Prompt Words Learning for Social Media Content Generation

    Authors: Haochen Xue, Chong Zhang, Chengzhi Liu, Fangyu Wu, Xiaobo Jin

    Abstract: The rapid development of the Internet has profoundly changed human life. Humans are increasingly expressing themselves and interacting with others on social media platforms. However, although artificial intelligence technology has been widely used in many aspects of life, its application in social media content creation remains unexplored. To solve this problem, we propose a new prompt word generation…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures

    Journal ref: International Joint Conference on Neural Networks 2024

  14. arXiv:2407.06127  [pdf, other]

    cs.CV

    Better Sampling, towards Better End-to-end Small Object Detection

    Authors: Zile Huang, Chong Zhang, Mingyu Jin, Fangyu Wu, Chengzhi Liu, Xiaobo Jin

    Abstract: While deep learning-based general object detection has made significant strides in recent years, the effectiveness and efficiency of small object detection remain unsatisfactory. This is primarily attributed not only to the limited characteristics of such small targets but also to the high density and mutual overlap among these targets. The existing transformer-based small object detectors do not…

    Submitted 17 May, 2024; originally announced July 2024.

    Comments: 14 pages, 5 figures

  15. arXiv:2407.04697  [pdf, other]

    cs.CV cs.MM

    VCoME: Verbal Video Composition with Multimodal Editing Effects

    Authors: Weibo Gong, Xiaojie Jin, Xin Li, Dongliang He, Xinglong Wu

    Abstract: Verbal videos, featuring voice-overs or text overlays, provide valuable content but present significant challenges in composition, especially when incorporating editing effects to enhance clarity and visual appeal. In this paper, we introduce the novel task of verbal video composition with editing effects. This task aims to generate coherent and visually appealing verbal videos by integrating mult…

    Submitted 5 July, 2024; originally announced July 2024.

  16. arXiv:2407.03757  [pdf, other]

    cs.CV

    DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts

    Authors: Zheng-Peng Duan, Jiawei Zhang, Zheng Lin, Xin Jin, Dongqing Zou, Chunle Guo, Chongyi Li

    Abstract: Image retouching aims to enhance the visual quality of photos. Considering the different aesthetic preferences of users, the target of retouching is subjective. However, current retouching methods mostly adopt deterministic models, which not only neglect the style diversity in the expert-retouched results and tend to learn an average style during training, but also lack sample diversity during…

    Submitted 4 July, 2024; originally announced July 2024.

  17. arXiv:2407.02345  [pdf, other]

    cs.CL

    MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space

    Authors: Yihong Tang, Bo Wang, Dongming Zhao, Xiaojia Jin, Jijun Zhang, Ruifang He, Yuexian Hou

    Abstract: Personalized Dialogue Generation (PDG) aims to create coherent responses according to roles or personas. Traditional PDG relies on external role data, which can be scarce and raise privacy concerns. Existing approaches address these issues by extracting role information from dialogue history, but often fail to generically model roles in continuous space. To overcome these limitations, we introduce a nove…

    Submitted 2 July, 2024; originally announced July 2024.

  18. arXiv:2407.02077  [pdf, other]

    cs.CV

    Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion

    Authors: Bohan Li, Jiajun Deng, Wenyao Zhang, Zhujin Liang, Dalong Du, Xin Jin, Wenjun Zeng

    Abstract: Camera-based 3D semantic scene completion (SSC) is pivotal for predicting complicated 3D layouts with limited 2D image observations. The existing mainstream solutions generally leverage temporal information by roughly stacking history frames to supplement the current frame; such straightforward temporal modeling inevitably diminishes valid clues and increases learning difficulty. To address this p…

    Submitted 16 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  19. arXiv:2407.00603  [pdf, other]

    cs.CV

    Hierarchical Memory for Long Video QA

    Authors: Yiqin Wang, Haoji Zhang, Yansong Tang, Yong Liu, Jiashi Feng, Jifeng Dai, Xiaojie Jin

    Abstract: This paper describes our champion solution to the LOVEU Challenge @ CVPR'24, Track 1 (Long Video VQA). Processing long sequences of visual tokens is computationally expensive and memory-intensive, making long video question-answering a challenging task. The key is to compress visual tokens effectively, reducing memory footprint and decoding latency, while preserving the essential information for a…

    Submitted 30 June, 2024; originally announced July 2024.

  20. arXiv:2406.18485  [pdf, other]

    cs.DC

    LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism

    Authors: Diandian Gu, Peng Sun, Qinghao Hu, Ting Huang, Xun Chen, Yingtong Xiong, Guoteng Wang, Qiaoling Chen, Shangchun Zhao, Jiarui Fang, Yonggang Wen, Tianwei Zhang, Xin Jin, Xuanzhe Liu

    Abstract: Efficiently training LLMs with long sequences is important yet challenged by the massive computation and memory requirements. Sequence parallelism has been proposed to tackle these problems, but existing methods suffer from scalability or efficiency issues. We propose LoongTrain, a novel system to efficiently train LLMs with long sequences at scale. The core of LoongTrain is the 2D-Attention mecha…

    Submitted 26 June, 2024; originally announced June 2024.

  21. arXiv:2406.14191  [pdf, other]

    cs.CL cs.AI cs.LG

    Temporal Knowledge Graph Question Answering: A Survey

    Authors: Miao Su, Zixuan Li, Zhuo Chen, Long Bai, Xiaolong Jin, Jiafeng Guo

    Abstract: Knowledge Base Question Answering (KBQA) has been a long-standing field to answer questions based on knowledge bases. Recently, the evolving dynamics of knowledge have attracted a growing interest in Temporal Knowledge Graph Question Answering (TKGQA), an emerging task to answer temporal questions. However, this field grapples with ambiguities in defining temporal questions and lacks a systematic…

    Submitted 5 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures

  22. arXiv:2406.14026  [pdf, other]

    cs.LG cs.CL stat.ML

    Demystifying Forgetting in Language Model Fine-Tuning with Statistical Analysis of Example Associations

    Authors: Xisen Jin, Xiang Ren

    Abstract: Language models (LMs) are known to suffer from forgetting of previously learned examples when fine-tuned, breaking stability of deployed LM systems. Despite efforts on mitigating forgetting, few have investigated whether and how forgotten upstream examples are associated with newly learned tasks. Insights on such associations enable efficient and targeted mitigation of forgetting. In this paper,…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 5 pages

  23. arXiv:2406.08855  [pdf, other]

    cs.RO

    Trajectory Planning for Autonomous Driving in Unstructured Scenarios Based on Graph Neural Network and Numerical Optimization

    Authors: Sumin Zhang, Kuo Li, Rui He, Zhiwei Meng, Yupeng Chang, Xiaosong Jin, Ri Bai

    Abstract: In unstructured environments, obstacles are diverse and lane markings are absent, making trajectory planning for intelligent vehicles a challenging task. Traditional trajectory planning methods typically involve multiple stages, including path planning, speed planning, and trajectory optimization. These methods require the manual design of numerous parameters for each stage, resulting in significant wor…

    Submitted 13 June, 2024; originally announced June 2024.

  24. arXiv:2406.08155  [pdf, other]

    cs.LG cs.AI cs.CL

    Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark

    Authors: Pingzhi Li, Xiaolong Jin, Yu Cheng, Tianlong Chen

    Abstract: Large Language Models (LLMs) have become foundational in the realm of natural language processing, demonstrating performance improvements as model sizes increase. The Mixture-of-Experts (MoE) approach offers a promising way to scale LLMs more efficiently by using fewer computational FLOPs through sparse activation. However, it suffers from significant memory overheads, necessitating model compress…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Our code for reproducing all our experiments is provided at https://github.com/UNITES-Lab/moe-quantization

  25. arXiv:2406.08085  [pdf, other]

    cs.CV

    Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

    Authors: Haoji Zhang, Yiqin Wang, Yansong Tang, Yong Liu, Jiashi Feng, Jifeng Dai, Xiaojie Jin

    Abstract: Benefiting from the advancements in large language models and cross-modal alignment, existing multi-modal video understanding methods have achieved prominent performance in offline scenarios. However, online video streams, as one of the most common media forms in the real world, have seldom received attention. Compared to offline videos, the 'dynamic' nature of online video streams poses challenges…

    Submitted 30 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  26. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  27. arXiv:2406.06858  [pdf, other]

    cs.LG cs.DC

    FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

    Authors: Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Ziheng Jiang, Haibin Lin, Xin Jin, Xin Liu

    Abstract: Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models typically require training and inference to be distributed. Tensor parallelism is a common technique partitioning computation of an operation or layer across devices to overcome the memory capacity limitation of a single processor, and/or to accelerate computation…

    Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  28. arXiv:2406.06216  [pdf, other]

    cs.CV

    Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis

    Authors: Xin Jin, Pengyi Jiao, Zheng-Peng Duan, Xingchao Yang, Chun-Le Guo, Bo Ren, Chongyi Li

    Abstract: Volumetric rendering based methods, like NeRF, excel in HDR view synthesis from RAW images, especially for nighttime scenes. Nevertheless, they suffer from long training times and cannot perform real-time rendering due to dense sampling requirements. The advent of 3D Gaussian Splatting (3DGS) enables real-time rendering and faster training. However, implementing RAW image-based view synthesis directly usi…

    Submitted 10 June, 2024; originally announced June 2024.

  29. arXiv:2406.05392  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas

    Authors: Chengyuan Deng, Yiqun Duan, Xin Jin, Heng Chang, Yijun Tian, Han Liu, Henry Peng Zou, Yiqiao Jin, Yijia Xiao, Yichen Wang, Shenghao Wu, Zongxing Xie, Kuofeng Gao, Sihong He, Jun Zhuang, Lu Cheng, Haohan Wang

    Abstract: Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, an…

    Submitted 8 June, 2024; originally announced June 2024.

  30. arXiv:2406.02929  [pdf, other]

    cs.CV cs.LG

    Exploring Data Efficiency in Zero-Shot Learning with Diffusion Models

    Authors: Zihan Ye, Shreyank N. Gowda, Xiaobo Jin, Xiaowei Huang, Haotian Xu, Yaochu Jin, Kaizhu Huang

    Abstract: Zero-Shot Learning (ZSL) aims to enable classifiers to identify unseen classes by enhancing data efficiency at the class level. This is achieved by generating image features from pre-defined semantics of unseen classes. However, most current approaches heavily depend on the number of samples from seen classes, i.e. they do not consider instance-level effectiveness. In this paper, we demonstrate th…

    Submitted 5 June, 2024; originally announced June 2024.

  31. arXiv:2406.00943  [pdf, other]

    cs.LG cs.AI

    State Space Models on Temporal Graphs: A First-Principles Study

    Authors: Jintang Li, Ruofan Wu, Xinzhou Jin, Boqun Ma, Liang Chen, Zibin Zheng

    Abstract: Over the past few years, research on deep graph learning has shifted from static graphs to temporal graphs in response to real-world complex systems that exhibit dynamic behaviors. In practice, temporal graphs are formalized as an ordered sequence of static graph snapshots observed at discrete time points. Sequence models such as RNNs or Transformers have long been the predominant backbone network…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Preprint; Code will be made available at https://github.com/EdisonLeeeee/GraphSSM

  32. arXiv:2406.00275  [pdf, other]

    cs.CV cs.LG

    StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization

    Authors: Songhua Liu, Xin Jin, Xingyi Yang, Jingwen Ye, Xinchao Wang

    Abstract: Single domain generalization (single DG) aims at learning a robust model generalizable to unseen domains from only one training domain, making it a highly ambitious and challenging task. State-of-the-art approaches have mostly relied on data augmentations, such as adversarial perturbation and style enhancement, to synthesize new data and thus increase robustness. Nevertheless, they have largely ov…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024; Work in 2022 spring

  33. arXiv:2405.19946  [pdf, other]

    cs.AI

    Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

    Authors: Xuanfa Jin, Ziyan Wang, Yali Du, Meng Fang, Haifeng Zhang, Jun Wang

    Abstract: Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built with these often neglect the control over discussion tactics, which are essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Were…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 27 pages, 5 figures

  34. arXiv:2405.19850  [pdf, other]

    cs.AI

    Deciphering Human Mobility: Inferring Semantics of Trajectories with Large Language Models

    Authors: Yuxiao Luo, Zhongcai Cao, Xin Jin, Kang Liu, Ling Yin

    Abstract: Understanding human mobility patterns is essential for various applications, from urban planning to public safety. Individual trajectory data, such as mobile phone location data, while rich in spatio-temporal information, often lacks semantic detail, limiting its utility for in-depth mobility analysis. Existing methods can infer basic routine activity sequences from this data, lacking depth in under…

    Submitted 30 May, 2024; originally announced May 2024.

  35. arXiv:2405.19548  [pdf, other]

    cs.LG

    RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

    Authors: Mingqi Yuan, Roger Creus Castanyer, Bo Li, Xin Jin, Glen Berseth, Wenjun Zeng

    Abstract: Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks. However, extrinsic rewards frequently fall short in complex environments due to the significant human effort needed for their design and annotation. This limitation underscores the necessity for intrinsic rewards, which offer auxiliary and dense signals and can enable agents to learn in an unsupervised ma…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 25 pages, 19 figures

  36. arXiv:2405.17188  [pdf, other]

    cs.CV

    The SkatingVerse Workshop & Challenge: Methods and Results

    Authors: Jian Zhao, Lei Jin, Jianshu Li, Zheng Zhu, Yinglei Teng, Jiaojiao Zhao, Sadaf Gulshad, Zheng Wang, Bo Zhao, Xiangbo Shu, Yunchao Wei, Xuecheng Nie, Xiaojie Jin, Xiaodan Liang, Shin'ichi Satoh, Yandong Guo, Cewu Lu, Junliang Xing, Jane Shen Shengmei

    Abstract: The SkatingVerse Workshop & Challenge aims to encourage research in developing novel and accurate methods for human action understanding. The SkatingVerse dataset used for the SkatingVerse Challenge has been publicly released. There are two subsets in the dataset, i.e., the training subset and testing subset. The training subset consists of 19,993 RGB video sequences, and the testing subset cons…

    Submitted 27 May, 2024; originally announced May 2024.

  37. arXiv:2405.17176  [pdf, other]

    cs.GR cs.AI

    DreamMat: High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models

    Authors: Yuqing Zhang, Yuan Liu, Zhiyu Xie, Lei Yang, Zhongyuan Liu, Mengzhou Yang, Runze Zhang, Qilong Kou, Cheng Lin, Wenping Wang, Xiaogang Jin

    Abstract: 2D diffusion model, which often contains unwanted baked-in shading effects and results in unrealistic rendering effects in the downstream applications. Generating Physically Based Rendering (PBR) materials instead of just RGB textures would be a promising solution. However, directly distilling the PBR material parameters from 2D diffusion models still suffers from incorrect material decomposition,…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted to SIGGRAPH 2024

  38. arXiv:2405.12721  [pdf, other]

    cs.CV

    StarLKNet: Star Mixup with Large Kernel Networks for Palm Vein Identification

    Authors: Xin Jin, Hongyu Zhu, Mounîm A. El Yacoubi, Hongchao Liao, Huafeng Qin, Yun Jiang

    Abstract: As a representative of a new generation of biometrics, vein identification technology offers a high level of security and convenience. Convolutional neural networks (CNNs), a prominent class of deep learning architectures, have been extensively utilized for vein identification. Since their performance and robustness are limited by small Effective Receptive Fields (e.g. 3$\times$3 kernels) and insu…

    Submitted 16 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: 7 pages, 6 figures

  39. arXiv:2405.11913  [pdf, other]

    cs.CV

    Diff-BGM: A Diffusion Model for Video Background Music Generation

    Authors: Sizhe Li, Yiming Qin, Minghang Zheng, Xin Jin, Yang Liu

    Abstract: When editing a video, a piece of attractive background music is indispensable. However, video background music generation tasks face several challenges, for example, the lack of suitable training datasets, and the difficulties in flexibly controlling the music generation process and sequentially aligning the video and music. In this work, we first propose a high-quality music-video dataset BGM909…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024 (Poster)

  40. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  41. arXiv:2405.02982  [pdf, other]

    cs.CV

    Paintings and Drawings Aesthetics Assessment with Rich Attributes for Various Artistic Categories

    Authors: Xin Jin, Qianqian Qiao, Yi Lu, Shan Gao, Heng Huang, Guangdong Li

    Abstract: Image aesthetic evaluation is a highly prominent research domain in the field of computer vision. In recent years, there has been a proliferation of datasets and corresponding evaluation methodologies for assessing the aesthetic quality of photographic works, leading to the establishment of a relatively mature research environment. However, in contrast to the extensive research in photographic aes…

    Submitted 5 May, 2024; originally announced May 2024.

  42. arXiv:2405.02972  [pdf, other]

    cs.NI cs.AI

    Multi-Agent RL-Based Industrial AIGC Service Offloading over Wireless Edge Networks

    Authors: Siyuan Li, Xi Lin, Hansong Xu, Kun Hua, Xiaomin Jin, Gaolei Li, Jianhua Li

    Abstract: Currently, the generative model has garnered considerable attention due to its application in addressing the challenge of scarcity of abnormal samples in the industrial Internet of Things (IoT). However, challenges persist regarding the edge deployment of generative models and the optimization of joint edge AI-generated content (AIGC) tasks. In this paper, we focus on the edge optimization of AIGC…

    Submitted 5 May, 2024; originally announced May 2024.

  43. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  44. arXiv:2404.12457  [pdf, other]

    cs.DC cs.CL cs.LG

    RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation

    Authors: Chao Jin, Zili Zhang, Xuanlin Jiang, Fangyue Liu, Xin Liu, Xuanzhe Liu, Xin Jin

    Abstract: Retrieval-Augmented Generation (RAG) has shown significant improvements in various natural language processing tasks by integrating the strengths of large language models (LLMs) and external knowledge databases. However, RAG introduces long sequence generation and leads to high computation and memory costs. We propose RAGCache, a novel multilevel dynamic caching system tailored for RAG. Our analys…

    Submitted 25 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  45. arXiv:2404.11887  [pdf, other]

    cs.AR

    EN-TensorCore: Advancing TensorCores Performance through Encoder-Based Methodology

    Authors: Qizhe Wu, Yuchen Gui, Zhichen Zeng, Xiaotian Wang, Huawen Liang, Xi Jin

    Abstract: Tensor computations, with matrix multiplication being the primary operation, serve as the fundamental basis for data analysis, physics, machine learning, and deep learning. As the scale and complexity of data continue to grow rapidly, the demand for tensor computations has also increased significantly. To meet this demand, several research institutions have started developing dedicated hardware fo…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 7 pages, 6 figures

  46. arXiv:2404.10394  [pdf, other]

    cs.CV

    Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior

    Authors: Yiqian Wu, Hao Xu, Xiangjun Tang, Xien Chen, Siyu Tang, Zhebin Zhang, Chen Li, Xiaogang Jin

    Abstract: Existing neural rendering-based text-to-3D-portrait generation methods typically make use of human geometry prior and diffusion models to obtain guidance. However, relying solely on geometry information introduces issues such as the Janus problem, over-saturation, and over-smoothing. We present Portrait3D, a novel neural rendering-based framework with a novel joint geometry-appearance prior to ach…

    Submitted 16 April, 2024; originally announced April 2024.

  47. arXiv:2404.09526  [pdf, other]

    cs.DC cs.LG

    LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism

    Authors: Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin

    Abstract: The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between different requests as well as between different phases of the same request. Restricted by static parallelism strategies, existing LLM serving systems cannot efficiently utilize the underlying resources to serve variable-length requests in different phases. To address this…

    Submitted 15 April, 2024; originally announced April 2024.

  48. arXiv:2404.07234  [pdf, other]

    cs.CR cs.AI cs.CL

    Goal-guided Generative Prompt Injection Attack on Large Language Models

    Authors: Chong Zhang, Mingyu Jin, Qinkai Yu, Chengzhi Liu, Haochen Xue, Xiaobo Jin

    Abstract: Current large language models (LLMs) provide a strong foundation for large-scale user-oriented natural language tasks. A large number of users can easily inject adversarial text or instructions through the user interface, thus causing LLMs model security challenges. Although there is currently a large amount of research on prompt injection attacks, most of these black-box attacks use heuristic str…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 22 pages, 8 figures

  49. arXiv:2404.01767  [pdf, other]

    cs.CL

    Class-Incremental Few-Shot Event Detection

    Authors: Kailin Zhao, Xiaolong Jin, Long Bai, Jiafeng Guo, Xueqi Cheng

    Abstract: Event detection is one of the fundamental tasks in information extraction and knowledge graph. However, a realistic event detection system often needs to deal with new event classes constantly. These new classes usually have only a few labeled instances as it is time-consuming and labor-intensive to annotate a large number of unlabeled instances. Therefore, this paper proposes a new task, called c…

    Submitted 2 April, 2024; originally announced April 2024.

  50. arXiv:2404.01720  [pdf, other]

    cs.CL

    Self-Improvement Programming for Temporal Knowledge Graph Question Answering

    Authors: Zhuo Chen, Zhao Zhang, Zixuan Li, Fei Wang, Yutao Zeng, Xiaolong Jin, Yongjun Xu

    Abstract: Temporal Knowledge Graph Question Answering (TKGQA) aims to answer questions with temporal intent over Temporal Knowledge Graphs (TKGs). The core challenge of this task lies in understanding the complex semantic information regarding multiple types of time constraints (e.g., before, first) in questions. Existing end-to-end methods implicitly model the time constraints by learning time-aware embedd…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024 (long paper)