
Showing 1–50 of 600 results for author: Guo, D

  1. arXiv:2407.08126  [pdf, other]

    cs.AI cs.CV cs.MM

    Label-anticipated Event Disentanglement for Audio-Visual Video Parsing

    Authors: Jinxing Zhou, Dan Guo, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang

    Abstract: Audio-Visual Video Parsing (AVVP) task aims to detect and temporally locate events within audio and visual modalities. Multiple events can overlap in the timeline, making identification challenging. While traditional methods usually focus on improving the early audio-visual encoders to embed more effective features, the decoding phase -- crucial for final event classification, often receives less…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  2. arXiv:2407.07510  [pdf, other]

    cs.CR cs.CV eess.SY

    Invisible Optical Adversarial Stripes on Traffic Sign against Autonomous Vehicles

    Authors: Dongfang Guo, Yuting Wu, Yimin Dai, Pengfei Zhou, Xin Lou, Rui Tan

    Abstract: Camera-based computer vision is essential to autonomous vehicle's perception. This paper presents an attack that uses light-emitting diodes and exploits the camera's rolling shutter effect to create adversarial stripes in the captured images to mislead traffic sign recognition. The attack is stealthy because the stripes on the traffic sign are invisible to humans. For the attack to be threatening,…

    Submitted 10 July, 2024; originally announced July 2024.

    Journal ref: In Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services (MobiSys 2024), 534-546

  3. arXiv:2407.05364  [pdf, other]

    cs.LG

    PTaRL: Prototype-based Tabular Representation Learning via Space Calibration

    Authors: Hangting Ye, Wei Fan, Xiaozhuang Song, Shun Zheng, He Zhao, Dandan Guo, Yi Chang

    Abstract: Tabular data have been playing a mostly important role in diverse real-world fields, such as healthcare, engineering, finance, etc. With the recent success of deep learning, many tabular machine learning (ML) methods based on deep networks (e.g., Transformer, ResNet) have achieved competitive performance on tabular benchmarks. However, existing deep tabular ML methods suffer from the representatio…

    Submitted 15 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ICLR 2024

  4. arXiv:2407.05311  [pdf, other]

    cs.CV

    MMAD: Multi-label Micro-Action Detection in Videos

    Authors: Kun Li, Dan Guo, Pengyu Liu, Guoliang Chen, Meng Wang

    Abstract: Human body actions are an important form of non-verbal communication in social interactions. This paper focuses on a specific subset of body actions known as micro-actions, which are subtle, low-intensity body movements that provide a deeper understanding of inner human feelings. In real-world scenarios, human micro-actions often co-occur, with multiple micro-actions overlapping in time, such as s…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Work in Progress

  5. arXiv:2407.04490  [pdf, other]

    cs.CV

    Micro-gesture Online Recognition using Learnable Query Points

    Authors: Pengyu Liu, Fei Wang, Kun Li, Guoliang Chen, Yanyan Wei, Shengeng Tang, Zhiliang Wu, Dan Guo

    Abstract: In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the Micro-gesture Online Recognition track in the MiGA challenge at IJCAI 2024. The Micro-gesture Online Recognition task involves identifying the category and locating the start and end times of micro-gestures in video clips. Compared to the typical Temporal Action Detection task, the Micro-gesture Online Recogn…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Technical Report of HFUT-VUT for the MiGA challenge at IJCAI 2024

  6. arXiv:2407.00046  [pdf, other]

    cs.DC cs.GR

    Barrier-Augmented Lagrangian for GPU-based Elastodynamic Contact

    Authors: Dewen Guo, Minchen Li, Yin Yang, Guoping Wang, Sheng Li

    Abstract: We propose a GPU-based iterative method for accelerated elastodynamic simulation with the log-barrier-based contact model. While Newton's method is a conventional choice for solving the interior-point system, the presence of ill-conditioned log barriers often necessitates a direct solution at each linearized substep and costs substantial storage and computational overhead. Moreover, constraint set…

    Submitted 4 June, 2024; originally announced July 2024.

    Comments: 17 pages, 30 figures

  7. arXiv:2406.18605  [pdf, other]

    physics.ins-det nucl-ex

    The neutron array of the compact spectrometer for heavy ion experiments in Fermi energy region

    Authors: Dawei Si, Sheng Xiao, Yuhao Qin, Yijie Wang, Junhuai Xu, Baiting Tian, Boyuan Zhang, Dong Guo, Qin Zhi, Xiaobao Wei, Yibo Hao, Zengxiang Wang, Tianren Zhuo, Yuansheng Yang, Xianglun Wei, Herun Yang, Peng Ma, Limin Duan, Fangfang Duan, Junbing Ma, Shiwei Xu, Zhen Bai, Guo Yang, Yanyun Yang, Zhigang Xiao

    Abstract: The emission of neutrons from heavy ion reactions is an important observable for studying the asymmetric nuclear equation of state and the reaction dynamics. A 20-unit neutron array has been developed and mounted on the compact spectrometer for heavy ion experiments (CSHINE) to measure the neutron spectra, neutron-neutron and neutron-proton correlation functions. Each unit consists of a…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 8 pages, 11 figures

  8. arXiv:2406.12878  [pdf, other]

    physics.ins-det hep-ex nucl-ex

    Beam test results of the prototype of the multi wire drift chamber for the CSR external-target experiment

    Authors: Zhi Qin, Zhoubo He, Zhe Cao, Tao Chen, Zhi Deng, Limin Duan, Dong Guo, Rongjiang Hu, Jie Kong, Canwen Liu, Peng Ma, Xianglun Wei, Shihai Wen, Xiangjie Wen, Junwei Yan, Herun Yang, Zuoqiao Yang, Yuhong Yu, Zhigang Xiao

    Abstract: The half-size prototype of the multi wire drift chamber (MWDC) for the cooling storage ring (CSR) external-target experiment (CEE) was assembled and tested in 350 MeV/u Kr+Fe reactions on the heavy ion research facility in Lanzhou (HIRFL). The prototype consists of 6 sense layers, where the sense wires are stretched in three directions X, U and V, meeting $0^\circ$, $30^\circ$ and $-30^\circ$ with…

    Submitted 15 May, 2024; originally announced June 2024.

  9. arXiv:2406.12224  [pdf, other]

    cs.RO

    Leveraging Large Language Model for Heterogeneous Ad Hoc Teamwork Collaboration

    Authors: Xinzhu Liu, Peiyan Li, Wenju Yang, Di Guo, Huaping Liu

    Abstract: Compared with the widely investigated homogeneous multi-robot collaboration, heterogeneous robots with different capabilities can provide a more efficient and flexible collaboration for more complex tasks. In this paper, we consider a more challenging heterogeneous ad hoc teamwork collaboration problem where an ad hoc robot joins an existing heterogeneous team for a shared goal. Specifically, the…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 20 pages

  10. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen , et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.11429  [pdf, other]

    cs.CL cs.AI

    Fusion Makes Perfection: An Efficient Multi-Grained Matching Approach for Zero-Shot Relation Extraction

    Authors: Shilong Li, Ge Bai, Zhang Zhang, Ying Liu, Chenji Lu, Daichi Guo, Ruifang Liu, Yong Sun

    Abstract: Predicting unseen relations that cannot be observed during the training phase is a challenging task in relation extraction. Previous works have made progress by matching the semantics between input instances and label descriptions. However, fine-grained matching often requires laborious manual annotation, and rich interactions between instances and label descriptions come with significant computat…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted to the main conference of NAACL2024

  12. arXiv:2406.11266  [pdf, ps, other]

    cs.CV

    DRIP: Discriminative Rotation-Invariant Pole Landmark Descriptor for 3D LiDAR Localization

    Authors: Dingrui Li, Dedi Guo, Kanji Tanaka

    Abstract: In 3D LiDAR-based robot self-localization, pole-like landmarks are gaining popularity as lightweight and discriminative landmarks. This work introduces a novel approach called "discriminative rotation-invariant poles," which enhances the discriminability of pole-like landmarks while maintaining their lightweight nature. Unlike conventional methods that model a pole landmark as a 3D line segment pe…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 4 pages, 1 table

  13. arXiv:2406.11247  [pdf, other]

    cs.CV

    STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft

    Authors: Zhonghan Zhao, Wenhao Chai, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tian Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang

    Abstract: Building an embodied agent system with a large language model (LLM) as its core is a promising direction. Due to the significant costs and uncontrollable factors associated with deploying and training such agents in the real world, we have decided to begin our exploration within the Minecraft environment. Our STEVE Series agents can complete basic tasks in a virtual environment and more challengin…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Embodied AI Workshop

  14. arXiv:2406.09175  [pdf, other]

    cs.CV cs.CL

    ReMI: A Dataset for Reasoning with Multiple Images

    Authors: Mehran Kazemi, Nishanth Dikkala, Ankit Anand, Petar Devic, Ishita Dasgupta, Fangyu Liu, Bahare Fatemi, Pranjal Awasthi, Dee Guo, Sreenivas Gollapudi, Ahmed Qureshi

    Abstract: With the continuous advancement of large language models (LLMs), it is essential to create new benchmarks to effectively evaluate their expanding capabilities and identify areas for improvement. This work focuses on multi-image reasoning, an emerging capability in state-of-the-art LLMs. We introduce ReMI, a dataset designed to assess LLMs' ability to Reason with Multiple Images. This dataset encom…

    Submitted 13 June, 2024; originally announced June 2024.

  15. arXiv:2406.07670  [pdf]

    cs.RO

    Design and Control of a Compact Series Elastic Actuator Module for Robots in MRI Scanners

    Authors: Binghan He, Naichen Zhao, David Y. Guo, Charles H. Paxson, Ronald S. Fearing

    Abstract: In this study, we introduce a novel MRI-compatible rotary series elastic actuator module utilizing velocity-sourced ultrasonic motors for force-controlled robots operating within MRI scanners. Unlike previous MRI-compatible SEA designs, our module incorporates a transmission force sensing series elastic actuator structure, with four off-the-shelf compression springs strategically placed between th…

    Submitted 11 June, 2024; originally announced June 2024.

  16. arXiv:2406.06498  [pdf, other]

    cs.RO cs.HC

    Demonstrating HumanTHOR: A Simulation Platform and Benchmark for Human-Robot Collaboration in a Shared Workspace

    Authors: Chenxu Wang, Boyuan Du, Jiaxin Xu, Peiyan Li, Di Guo, Huaping Liu

    Abstract: Human-robot collaboration (HRC) in a shared workspace has become a common pattern in real-world robot applications and has garnered significant research interest. However, most existing studies for human-in-the-loop (HITL) collaboration with robots in a shared workspace evaluate in either simplified game environments or physical platforms, falling short in limited realistic significance or limited…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: In RSS 2024

  17. arXiv:2406.05763  [pdf, other]

    eess.AS

    WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

    Authors: Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, Binbin Zhang, Lei Xie

    Abstract: With the development of large text-to-speech (TTS) models and scale-up of the training data, state-of-the-art TTS systems have achieved impressive performance. In this paper, we present WenetSpeech4TTS, a multi-domain Mandarin corpus derived from the open-sourced WenetSpeech dataset. Tailored for the text-to-speech tasks, we refined WenetSpeech by adjusting segment boundaries, enhancing the audio…

    Submitted 19 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  18. arXiv:2406.05672  [pdf, other]

    eess.AS

    Text-aware and Context-aware Expressive Audiobook Speech Synthesis

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Yongmao Zhang, Wenjie Tian, Lei Xie

    Abstract: Recent advances in text-to-speech have significantly improved the expressiveness of synthetic speech. However, a major challenge remains in generating speech that captures the diverse styles exhibited by professional narrators in audiobooks without relying on manually labeled data or reference speech. To address this problem, we propose a text-aware and context-aware (TACA) style modeling approach…

    Submitted 12 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  19. arXiv:2406.04942  [pdf, other]

    cs.CV

    Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement

    Authors: Wei Qian, Qi Li, Kun Li, Xinke Wang, Xiao Sun, Meng Wang, Dan Guo

    Abstract: This paper briefly introduces the solutions developed by our team, HFUT-VUT, for Track 1 of self-supervised heart rate measurement in the 3rd Vision-based Remote Physiological Signal Sensing (RePSS) Challenge hosted at IJCAI 2024. The goal is to develop a self-supervised learning algorithm for heart rate (HR) estimation using unlabeled facial videos. To tackle this task, we present two self-superv…

    Submitted 7 June, 2024; originally announced June 2024.

  20. arXiv:2406.02035  [pdf, other]

    cs.LG cs.AI

    A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

    Authors: Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana Borsa, Arthur Guez, Will Dabney

    Abstract: Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model for self-predictive representation le…

    Submitted 4 June, 2024; originally announced June 2024.

  21. arXiv:2406.00919  [pdf, other]

    cs.CV cs.MM

    Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling

    Authors: Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang

    Abstract: The Audio-Visual Video Parsing task aims to identify and temporally localize the events that occur in either or both the audio and visual streams of audible videos. It often performs in a weakly-supervised manner, where only video event labels are provided, i.e., the modalities and the timestamps of the labels are unknown. Due to the lack of densely annotated labels, recent work attempts to leverag…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: IJCV 2024 Accepted. arXiv admin note: substantial text overlap with arXiv:2303.02344

  22. arXiv:2405.19730  [pdf]

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun , et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat…

    Submitted 29 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  23. arXiv:2405.19107  [pdf, ps, other]

    cs.LG cs.AI

    Offline Regularised Reinforcement Learning for Large Language Models Alignment

    Authors: Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Avila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Remi Munos, Bilal Piot

    Abstract: The dominant framework for alignment of large language models (LLM), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is a quadruplet composed of a prompt, two independent responses (completions of the prompt) and a human preference between the two independent responses…

    Submitted 29 May, 2024; originally announced May 2024.

  24. arXiv:2405.18300  [pdf, other]

    cs.AI

    CompetEvo: Towards Morphological Evolution from Competition

    Authors: Kangyao Huang, Di Guo, Xinyu Zhang, Xiangyang Ji, Huaping Liu

    Abstract: Training an agent to adapt to specific tasks through co-optimization of morphology and control has widely attracted attention. However, whether there exists an optimal configuration and tactics for agents in a multiagent competition scenario is still an issue that is challenging to definitively conclude. In this context, we propose competitive evolution (CompetEvo), which co-evolves agents' design…

    Submitted 28 May, 2024; originally announced May 2024.

  25. Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion

    Authors: Hongze Sun, Rui Liu, Wuque Cai, Jun Wang, Yue Wang, Huajin Tang, Yan Cui, Dezhong Yao, Daqing Guo

    Abstract: Visual object tracking, which is primarily based on visible light image sequences, encounters numerous challenges in complicated scenarios, such as low light conditions, high dynamic ranges, and background clutter. To address these challenges, incorporating the advantages of multiple visual modalities is a promising solution for achieving reliable object tracking. However, the existing approaches…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 16 pages, 7 figures, 9 tables; This work has been submitted for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  26. arXiv:2405.15619  [pdf, other]

    cs.CV

    DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation

    Authors: Xiankang He, Guangkai Xu, Bo Zhang, Hao Chen, Ying Cui, Dongyan Guo

    Abstract: Monocular camera calibration is a key precondition for numerous 3D vision applications. Despite considerable advancements, existing methods often hinge on specific assumptions and struggle to generalize across varied real-world scenarios, and the performance is limited by insufficient training data. Recently, diffusion models trained on expansive datasets have been confirmed to maintain the capabi…

    Submitted 24 May, 2024; originally announced May 2024.

  27. arXiv:2405.14333  [pdf, other]

    cs.AI

    DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

    Authors: Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, Xiaodan Liang

    Abstract: Proof assistants like Lean have revolutionized mathematical proof verification, ensuring high accuracy and reliability. Although large language models (LLMs) show promise in mathematical reasoning, their advancement in formal theorem proving is hindered by a lack of training data. To address this issue, we introduce an approach to generate extensive Lean 4 proof data derived from high-school and u…

    Submitted 23 May, 2024; originally announced May 2024.

  28. arXiv:2405.14212  [pdf, other]

    cs.CR cs.CL

    Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data

    Authors: Haoran Li, Xinyuan Zhao, Dadi Guo, Hanlin Gu, Ziqian Zeng, Yuxing Han, Yangqiu Song, Lixin Fan, Qiang Yang

    Abstract: As large language models (LLMs) demonstrate unparalleled performance and generalization ability, LLMs are widely used and integrated into various applications. When it comes to sensitive domains, as commonly described in federated learning scenarios, directly using external LLMs on private data is strictly prohibited by stringent data security and privacy regulations. For local clients, the utiliz…

    Submitted 23 May, 2024; originally announced May 2024.

  29. arXiv:2405.10521  [pdf, other]

    cs.CR

    Generative AI for Secure and Privacy-Preserving Mobile Crowdsensing

    Authors: Yaoqi Yang, Bangning Zhang, Daoxing Guo, Hongyang Du, Zehui Xiong, Dusit Niyato, Zhu Han

    Abstract: Recently, generative AI has attracted much attention from both academic and industrial fields, which has shown its potential, especially in the data generation and synthesis aspects. Simultaneously, secure and privacy-preserving mobile crowdsensing (SPPMCS) has been widely applied in data collection/acquirement due to an advantage on low deployment cost, flexible implementation, and high adaptabi…

    Submitted 17 May, 2024; originally announced May 2024.

  30. arXiv:2405.08448  [pdf, other]

    cs.LG cs.AI

    Understanding the performance gap between online and offline alignment algorithms

    Authors: Yunhao Tang, Daniel Zhaohan Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, Eugene Tarassov, Rémi Munos, Bernardo Ávila Pires, Michal Valko, Yong Cheng, Will Dabney

    Abstract: Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, rising popularity in offline alignment algorithms challenges the need for on-policy sampling in RLHF. Within the context of reward over-optimization, we start with an opening set of experiments that demonstrate the clear advantage of online methods over offline methods. This pro…

    Submitted 14 May, 2024; originally announced May 2024.

  31. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  32. arXiv:2404.17511  [pdf, other]

    cs.LG cs.CY cs.SI

    Bridging the Fairness Divide: Achieving Group and Individual Fairness in Graph Neural Networks

    Authors: Duna Zhan, Dongliang Guo, Pengsheng Ji, Sheng Li

    Abstract: Graph neural networks (GNNs) have emerged as a powerful tool for analyzing and learning from complex data structured as graphs, demonstrating remarkable effectiveness in various applications, such as social network analysis, recommendation systems, and drug discovery. However, despite their impressive performance, the fairness problem has increasingly gained attention as a crucial aspect to consid…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 16 pages, 3 figures

  33. arXiv:2404.15034  [pdf, other]

    cs.LG cs.AI

    Deep Multi-View Channel-Wise Spatio-Temporal Network for Traffic Flow Prediction

    Authors: Hao Miao, Senzhang Wang, Meiyue Zhang, Diansheng Guo, Funing Sun, Fan Yang

    Abstract: Accurately forecasting traffic flows is critically important to many real applications including public safety and intelligent transportation systems. The challenges of this problem include both the dynamic mobility patterns of the people and the complex spatial-temporal correlations of the urban traffic data. Meanwhile, most existing models ignore the diverse impacts of the various traffic observ…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by AAAI2020 workshop

  34. arXiv:2404.13513  [pdf]

    cond-mat.mtrl-sci

    Ferroelectricity in an Antiferromagnetic Vanadium Trichloride Monolayer

    Authors: Jinghao Deng, Deping Guo, Yao Wen, Shuangzan Lu, Zhengbo Cheng, Zemin Pan, Tao Jian, Yusong Bai, Hui Zhang, Wei Ji, Jun He, Chendong Zhang

    Abstract: Multiferroicity allows magnetism to be controlled using electric fields or vice versa, which has gained tremendous interest in both fundamental research and device applications. A reduced dimensionality of multiferroic materials is highly desired for device miniaturization, but the coexistence of ferroelectricity and magnetism at the two-dimensional limit is still debated. Here, we used a NbSe2 su…

    Submitted 20 April, 2024; originally announced April 2024.

  35. arXiv:2404.12448  [pdf, other]

    nucl-th hep-ph

    Study of the electromagnetic form factors of the nucleons in the timelike region

    Authors: Qin-He Yang, Di Guo, Ming-Yan Li, Ling-Yun Dai, Johann Haidenbauer, Ulf-G. Meißner

    Abstract: The electromagnetic form factors $G_{\rm E}$ and $G_{\rm M}$ of the proton and neutron in the timelike region are extracted in a study of the processes $e^+ e^-\to \bar{p}p$ and $e^+ e^-\to \bar{n}n$. The reaction amplitude is evaluated within the distorted wave Born approximation, with the interaction of the antinucleon-nucleon ($\bar{N}N$) pair taken into account. The latter is constructed withi…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 37 pages, 10 figures

  36. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  37. arXiv:2404.09231  [pdf, other]

    cs.CV

    Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms

    Authors: Diandian Guo, Manxi Lin, Jialun Pei, He Tang, Yueming Jin, Pheng-Ann Heng

    Abstract: A comprehensive understanding of surgical scenes allows for monitoring of the surgical process, reducing the occurrence of accidents and enhancing efficiency for medical professionals. Semantic modeling within operating rooms, as a scene graph generation (SGG) task, is challenging since it involves consecutive recognition of subtle surgical actions over prolonged periods. To address this challenge…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures, 3 tables

  38. arXiv:2404.06795  [pdf, other]

    cs.LG

    Extracting Clean and Balanced Subset for Noisy Long-tailed Classification

    Authors: Zhuo Li, He Zhao, Zhen Li, Tongliang Liu, Dandan Guo, Xiang Wan

    Abstract: Real-world datasets usually are class-imbalanced and corrupted by label noise. To solve the joint issue of long-tailed distribution and label noise, most previous works usually aim to design a noise detector to distinguish the noisy and clean samples. Despite their effectiveness, they may be limited in handling the joint issue effectively in a unified way. In this work, we develop a novel pseudo l…

    Submitted 10 April, 2024; originally announced April 2024.

  39. arXiv:2404.06191  [pdf, other]

    hep-ph nucl-th

    Study of the timelike electromagnetic form factors of the $\Lambda_c$

    Authors: Di Guo, Qin-He Yang, Ling-Yun Dai

    Abstract: In this paper, the reaction of electron-positron annihilation into $\Lambda_c^+\bar{\Lambda}_c^-$ is investigated. The $\Lambda_c^+\bar{\Lambda}_c^-$ scattering amplitudes are obtained by solving the Lippmann-Schwinger equation. The contact, annihilation, and two pseudoscalar-exchange potentials are taken into account in the spirit of the chiral effective field theory. The amplitudes of $e^+e^-\to \Lambda_c^+\bar{\Lambda}_c^-$ are constru…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 9 pages, 7 figures

  40. arXiv:2404.04971  [pdf, other]

    cs.CV

    FPL+: Filtered Pseudo Label-based Unsupervised Cross-Modality Adaptation for 3D Medical Image Segmentation

    Authors: Jianghao Wu, Dong Guo, Guotai Wang, Qiang Yue, Huijun Yu, Kang Li, Shaoting Zhang

    Abstract: Adapting a medical image segmentation model to a new domain is important for improving its cross-domain transferability, and due to the expensive annotation process, Unsupervised Domain Adaptation (UDA) is appealing where only unlabeled images are needed for the adaptation. Existing UDA methods are mainly based on image or feature alignment with adversarial training for regularization, and they ar…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 12 pages, 7 figures

  41. arXiv:2404.04619  [pdf, other]

    cs.AI cs.CV

    Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model

    Authors: Zhonghan Zhao, Ke Ma, Wenhao Chai, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang

    Abstract: With the power of large language models (LLMs), open-ended embodied agents can flexibly understand human instructions, generate interpretable guidance strategies, and output executable actions. Nowadays, Multi-modal Language Models (MLMs) integrate multi-modal signals into LLMs, further bringing richer perception to entity agents and allowing embodied agents to perceive world-understanding tasks m…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.08282

  42. arXiv:2404.03819  [pdf, other]

    cs.CV

    Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer

    Authors: Qinji Yu, Yirui Wang, Ke Yan, Haoshen Li, Dazhou Guo, Li Zhang, Le Lu, Na Shen, Qifeng Wang, Xiaowei Ding, Xianghua Ye, Dakai Jin

    Abstract: Lymph node (LN) assessment is a critical, indispensable yet very challenging task in the routine clinical workflow of radiology and oncology. Accurate LN analysis is essential for cancer diagnosis, staging, and treatment planning. Finding scattered, low-contrast, clinically relevant LNs in 3D CT is difficult even for experienced physicians under high inter-observer variations. Previou…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Technical report

  43. arXiv:2403.19923  [pdf, other]

    physics.flu-dyn

    On the Preprocessing of Physics-informed Neural Networks: How to Better Utilize Data in Fluid Mechanics

    Authors: Shengfeng Xu, Chang Yan, Zhenxu Sun, Renfang Huang, Dilong Guo, Guowei Yang

    Abstract: Physics-Informed Neural Networks (PINNs) serve as a flexible alternative for tackling forward and inverse problems in differential equations, displaying impressive advancements in diverse areas of applied mathematics. Despite integrating both data and underlying physics to enrich the neural network's understanding, concerns regarding the effectiveness and practicality of PINNs persist. Over the pa…

    Submitted 11 July, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  44. arXiv:2403.15063  [pdf, other]

    cs.CV

    Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans

    Authors: Heng Guo, Jianfeng Zhang, Jiaxing Huang, Tony C. W. Mok, Dazhou Guo, Ke Yan, Le Lu, Dakai Jin, Minfeng Xu

    Abstract: Segment anything model (SAM) demonstrates strong generalization ability on natural image segmentation. However, its direct adaptation to medical image segmentation tasks shows significant performance drops with inferior accuracy and unstable results. It may also require an excessive number of prompt points to obtain a reasonable accuracy. For segmenting 3D radiological CT or MRI scans, a 2D SAM mod…

    Submitted 22 March, 2024; originally announced March 2024.

  45. arXiv:2403.14174  [pdf, other]

    cs.CV

    Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding

    Authors: Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Xiaojun Chang, Meng Wang

    Abstract: Inspired by the activity-silent and persistent activity mechanisms in human visual perception biology, we design a Unified Static and Dynamic Network (UniSDNet), to learn the semantic association between the video and text/audio queries in a cross-modal environment for efficient video grounding. For static modeling, we devise a novel residual structure (ResMLP) to boost the global comprehensive in…

    Submitted 21 March, 2024; originally announced March 2024.

  46. arXiv:2403.11150  [pdf, other]

    cs.CV

    Training A Small Emotional Vision Language Model for Visual Art Comprehension

    Authors: Jing Zhang, Liang Zheng, Meng Wang, Dan Guo

    Abstract: This paper develops small vision language models to understand visual art, which, given an artwork, aims to identify its emotion category and explain this prediction with natural language. While small models are computationally efficient, their capacity is much more limited than that of large models. To break this trade-off, this paper builds a small emotional vision language model (SEVLM) by emotion…

    Submitted 10 July, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: 16 pages

  47. arXiv:2403.10487  [pdf, other]

    cs.RO cs.AI

    Stimulate the Potential of Robots via Competition

    Authors: Kangyao Huang, Di Guo, Xinyu Zhang, Xiangyang Ji, Huaping Liu

    Abstract: It is common for us to feel pressure in a competitive environment, which arises from the desire to succeed compared with other individuals or opponents. Although we might get anxious under the pressure, it can also drive us to realize our full potential in order to keep up with others. Inspired by this, we propose a competitive learning framework which is able to help…

    Submitted 15 March, 2024; originally announced March 2024.

  48. arXiv:2403.09071  [pdf, ps, other]

    math.AP

    Long time dynamics for helical vortex filament in Euler flows

    Authors: Dengjun Guo, Lifeng Zhao

    Abstract: We consider the three-dimensional incompressible Euler equation \begin{equation*}\left\{\begin{aligned} &\partial_t Ω+U \cdot \nabla Ω-Ω\cdot \nabla U=0 \\ &Ω(x,0)=Ω_0(x) \end{aligned}\right. \end{equation*} under the assumption that $Ω^z$ is helical and in the absence of vorticity stretching. Assuming that the initial vorticity $Ω_0$ is primarily concentrated within an $ε$ neighborhood of a helix…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 30 pages

  49. arXiv:2403.08635  [pdf, other]

    cs.LG cs.AI stat.ML

    Human Alignment of Large Language Models through Online Preference Optimisation

    Authors: Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

    Abstract: Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Policy Optimisation (DPO) and Sequence Likelihood Calibration (SLiC) have emerged. In this paper, our contributio…

    Submitted 13 March, 2024; originally announced March 2024.

  50. arXiv:2403.08282  [pdf, other]

    cs.CV

    Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation

    Authors: Zhonghan Zhao, Kewei Chen, Dongxu Guo, Wenhao Chai, Tian Ye, Yanting Zhang, Gaoang Wang

    Abstract: Due to the dynamic and unpredictable open-world setting, navigating complex environments in Minecraft poses significant challenges for multi-agent systems. Agents must interact with the environment and coordinate their actions with other agents to achieve common objectives. However, traditional approaches often struggle to efficiently manage inter-agent communication and task distribution, crucial…

    Submitted 18 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: ICLR 2024 Workshop on LLM Agents