Showing 1–50 of 407 results for author: Gu, Z

  1. arXiv:2409.04431  [pdf, other]

    cs.LG

    Theory, Analysis, and Best Practices for Sigmoid Self-Attention

    Authors: Jason Ramapuram, Federico Danieli, Eeshan Dhekane, Floris Weers, Dan Busbridge, Pierre Ablin, Tatiana Likhomanenko, Jagrit Digani, Zijin Gu, Amitis Shidani, Russ Webb

    Abstract: Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as the softmax of dot products between keys and queries. Recent work has explored alternatives to softmax attention in transformers, such as ReLU and sigmoid activations. In this work, we revisit sigmoi…

    Submitted 6 September, 2024; originally announced September 2024.

  2. arXiv:2409.03643  [pdf, other]

    cs.CV cs.CL

    CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation

    Authors: Bin Wang, Fan Wu, Linke Ouyang, Zhuangcheng Gu, Rui Zhang, Renqiu Xia, Bo Zhang, Conghui He

    Abstract: Formula recognition presents significant challenges due to the complicated structure and varied notation of mathematical expressions. Despite continuous advancements in formula recognition models, the evaluation metrics employed by these models, such as BLEU and Edit Distance, still exhibit notable limitations. They overlook the fact that the same formula has diverse representations and is highly…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Project Website: https://github.com/opendatalab/UniMERNet/tree/main/cdm

  3. arXiv:2409.01787  [pdf, other]

    cs.CL

    LLM-GAN: Construct Generative Adversarial Network Through Large Language Models For Explainable Fake News Detection

    Authors: Yifeng Wang, Zhouhong Gu, Siwei Zhang, Suhang Zheng, Tao Wang, Tianyu Li, Hongwei Feng, Yanghua Xiao

    Abstract: Explainable fake news detection predicts the authenticity of news items with annotated explanations. Today, Large Language Models (LLMs) are known for their powerful natural language understanding and explanation generation abilities. However, presenting LLMs for explainable fake news detection remains two main challenges. Firstly, fake news appears reasonable and could easily mislead LLMs, leavin…

    Submitted 3 September, 2024; originally announced September 2024.

  4. arXiv:2408.12209  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Zeroth-Order Stochastic Mirror Descent Algorithms for Minimax Excess Risk Optimization

    Authors: Zhihao Gu, Zi Xu

    Abstract: The minimax excess risk optimization (MERO) problem is a new variation of the traditional distributionally robust optimization (DRO) problem, which achieves uniformly low regret across all test distributions under suitable conditions. In this paper, we propose a zeroth-order stochastic mirror descent (ZO-SMD) algorithm available for both smooth and non-smooth MERO to estimate the minimal risk of e…

    Submitted 22 August, 2024; originally announced August 2024.

  5. arXiv:2408.01349  [pdf, other]

    cs.MM cs.AI cs.CV cs.IR cs.LG

    PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval

    Authors: Yue Duan, Zhangxuan Gu, Zhenzhe Ying, Lei Qi, Changhua Meng, Yinghuan Shi

    Abstract: In the realm of cross-modal retrieval, seamlessly integrating diverse modalities within multimedia remains a formidable challenge, especially given the complexities introduced by noisy correspondence learning (NCL). Such noise often stems from mismatched data pairs, which is a significant obstacle distinct from traditional noisy labels. This paper introduces Pseudo-Classification based Pseudo-Capt…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  6. arXiv:2408.00598  [pdf, other]

    math.OC

    HOT: An Efficient Halpern Accelerating Algorithm for Optimal Transport Problems

    Authors: Guojun Zhang, Zhexuan Gu, Yancheng Yuan, Defeng Sun

    Abstract: This paper proposes an efficient HOT algorithm for solving the optimal transport (OT) problems with finite supports. We particularly focus on an efficient implementation of the HOT algorithm for the case where the supports are in $\mathbb{R}^2$ with ground distances calculated by $L_2^2$-norm. Specifically, we design a Halpern accelerating algorithm to solve the equivalent reduced model of the dis…

    Submitted 1 August, 2024; originally announced August 2024.

  7. arXiv:2407.20804  [pdf, other]

    math.AP

    The incompressible Navier-Stokes limit from the lattice BGK Boltzmann equation

    Authors: Zhongyang Gu, Xin Hu, Pritpal Matharu, Bartosz Protas, Makiko Sasada, Tsuyoshi Yoneda

    Abstract: In this paper, we prove that a local weak solution to the $d$-dimensional incompressible Navier-Stokes equations ($d \geq 2$) can be constructed by taking the hydrodynamic limit of a velocity-discretized Boltzmann equation with a simplified BGK collision operator. Moreover, in the case when the dimension is $d=2,3$, we characterize the combinations of finitely many particle velocities and probabil…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 49 pages, 3 figures

    MSC Class: 35Q30; 76D05; 76P05; 76M28

  8. arXiv:2407.20679  [pdf, other]

    cs.CE

    Online Prediction-Assisted Safe Reinforcement Learning for Electric Vehicle Charging Station Recommendation in Dynamically Coupled Transportation-Power Systems

    Authors: Qionghua Liao, Guilong Li, Jiajie Yu, Ziyuan Gu, Wei Ma

    Abstract: With the proliferation of electric vehicles (EVs), the transportation network and power grid become increasingly interdependent and coupled via charging stations. The concomitant growth in charging demand has posed challenges for both networks, highlighting the importance of charging coordination. Existing literature largely overlooks the interactions between power grid security and traffic effici…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 33 pages, 31 figures

  9. arXiv:2407.17768  [pdf, ps, other]

    math.PR

    $G$-BSDEs with mean constraints in time-dependent intervals

    Authors: Zihao Gu, Hui Zhao

    Abstract: In this paper, we study a collection of mean-reflected backward stochastic differential equations driven by $G$-Brownian motions ($G$-BSDEs), where $G$-expectations are constrained in some time-dependent intervals. To establish well-posedness results, we firstly construct a backward Skorokhod problem with sublinear expectation, and then apply that in the study of doubly mean-reflected $G$-BSDEs in…

    Submitted 25 July, 2024; originally announced July 2024.

  10. arXiv:2407.15835  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    dMel: Speech Tokenization made Simple

    Authors: He Bai, Tatiana Likhomanenko, Ruixiang Zhang, Zijin Gu, Zakaria Aldeneh, Navdeep Jaitly

    Abstract: Large language models have revolutionized natural language processing by leveraging self-supervised pretraining on vast textual data. Inspired by this success, researchers have investigated complicated speech tokenization methods to discretize continuous speech signals so that language modeling techniques can be applied to speech data. However, existing approaches either model semantic tokens, pot…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: under review

  11. arXiv:2407.12005  [pdf, other]

    cs.MM cs.CV

    VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It

    Authors: Xiaoxuan Zhu, Zhouhong Gu, Sihang Jiang, Zhixu Li, Hongwei Feng, Yanghua Xiao

    Abstract: Online courses have significantly lowered the barrier to accessing education, yet the varying content quality of these videos poses challenges. In this work, we focus on the task of automatically evaluating the quality of video course content. We have constructed a dataset with a substantial collection of video courses and teaching materials. We propose three evaluation principles and design a new…

    Submitted 15 June, 2024; originally announced July 2024.

  12. arXiv:2407.05909  [pdf, other]

    cs.CV

    Multi-clue Consistency Learning to Bridge Gaps Between General and Oriented Object in Semi-supervised Detection

    Authors: Chenxu Wang, Chunyan Xu, Ziqi Gu, Zhen Cui

    Abstract: While existing semi-supervised object detection (SSOD) methods perform well in general scenes, they encounter challenges in handling oriented objects in aerial images. We experimentally find three gaps between general and oriented object detection in semi-supervised learning: 1) Sampling inconsistency: the common center sampling is not suitable for oriented objects with larger aspect ratios when s…

    Submitted 8 July, 2024; originally announced July 2024.

  13. arXiv:2407.04895  [pdf, ps, other]

    math.AT

    Retractive spaces and Bousfield-Kan completions

    Authors: Zeshen Gu, John E. Harper

    Abstract: In this short paper we apply some recent techniques developed by Schonsheck, and subsequently Carr-Harper, in the context of operadic algebras in spectra -- on convergence of Bousfield-Kan completions and comparisons with convergence of the Taylor tower of the identity functor in Goodwillie's functor calculus -- to the setting of retractive spaces: this arises when working with spaces centered awa…

    Submitted 5 July, 2024; originally announced July 2024.

  14. arXiv:2407.04845  [pdf, other]

    cs.NI

    Poster: Flexible Scheduling of Network and Computing Resources for Distributed AI Tasks

    Authors: Ruikun Wang, Jiawei Zhang, Qiaolun Zhang, Bojun Zhang, Zhiqun Gu, Aryanaz Attarpour, Yuefeng Ji, Massimo Tornatore

    Abstract: Many emerging Artificial Intelligence (AI) applications require on-demand provisioning of large-scale computing, which can only be enabled by leveraging distributed computing services interconnected through networking. To address such increasing demand for networking to serve AI tasks, we investigate new scheduling strategies to improve communication efficiency and test them on a programmable test…

    Submitted 5 July, 2024; originally announced July 2024.

  15. arXiv:2407.03942  [pdf, other]

    cs.AI cs.CL cs.HC

    Diverse and Fine-Grained Instruction-Following Ability Exploration with Synthetic Data

    Authors: Zihui Gu, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Cheng-Zhong Xu, Ju Fan

    Abstract: Instruction-following is particularly crucial for large language models (LLMs) to support diverse user requests. While existing work has made progress in aligning LLMs with human preferences, evaluating their capabilities on instruction following remains a challenge due to complexity and diversity of real-world user instructions. While existing evaluation methods focus on general skills, they suff…

    Submitted 4 July, 2024; originally announced July 2024.

    Journal ref: AAAI 2024

  16. arXiv:2407.02730  [pdf, other]

    cs.CV cs.AI

    MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context

    Authors: Zishan Gu, Changchang Yin, Fenglin Liu, Ping Zhang

    Abstract: Large Vision Language Models (LVLMs) have recently achieved superior performance in various tasks on natural image and text data, which inspires a large amount of studies for LVLMs fine-tuning and training. Despite their advancements, there has been scant research on the robustness of these models against hallucination when fine-tuned on smaller datasets. In this study, we introduce a new benchmar…

    Submitted 2 July, 2024; originally announced July 2024.

  17. arXiv:2406.14250  [pdf, other]

    cs.CV cs.HC

    E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion

    Authors: Ke Wang, Tianyu Xia, Zhangxuan Gu, Yi Zhao, Shuheng Shen, Changhua Meng, Weiqiang Wang, Ke Xu

    Abstract: Online GUI navigation on mobile devices has driven a lot of attention recent years since it contributes to many real-world applications. With the rapid development of large language models (LLM), multimodal large language models (MLLM) have tremendous potential on this task. However, existing MLLMs need high quality data to improve its abilities of making the correct navigation decisions according…

    Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures, Under review

  18. arXiv:2406.13726  [pdf, other]

    math.OC cs.LG econ.GN

    Global Solutions to Master Equations for Continuous Time Heterogeneous Agent Macroeconomic Models

    Authors: Zhouzhou Gu, Mathieu Laurière, Sebastian Merkel, Jonathan Payne

    Abstract: We propose and compare new global solution algorithms for continuous time heterogeneous agent economies with aggregate shocks. First, we approximate the agent distribution so that equilibrium in the economy can be characterized by a high, but finite, dimensional non-linear partial differential equation. We consider different approximations: discretizing the number of agents, discretizing the agent…

    Submitted 19 June, 2024; originally announced June 2024.

  19. arXiv:2406.12641  [pdf, other]

    cs.CL

    DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

    Authors: Zhouhong Gu, Lin Zhang, Xiaoxuan Zhu, Jiangjie Chen, Wenhao Huang, Yikai Zhang, Shusen Wang, Zheyu Ye, Yan Gao, Hongwei Feng, Yanghua Xiao

    Abstract: Detecting evidence within the context is a key step in the process of reasoning task. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice…

    Submitted 18 June, 2024; originally announced June 2024.

  20. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  21. arXiv:2406.11374  [pdf, other]

    cond-mat.str-el

    Pseudogap with Fermi arcs and Fermi pockets in half-filled twisted transition metal dichalcogenides

    Authors: Yong-Yue Zong, Zhao-Long Gu, Jian-Xin Li

    Abstract: Twisted transition metal dichalcogenides are a new platform for realizing strongly correlated physics with high tunability. Recent transport experiments have reported the realization of a Mott insulator, its bandwidth-driven evolution to a metal, and the strange metal behavior in proximity to the transition via the tuning of a displacement field in twisted $\mathrm{WSe_2}$ ($\mathrm{tWSe_2}$) fixed…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 8+11 pages, 4+8 figures

  22. arXiv:2406.10621  [pdf, other]

    cs.CL cs.AI

    StrucText-Eval: An Autogenerated Benchmark for Evaluating Large Language Model's Ability in Structure-Rich Text Understanding

    Authors: Zhouhong Gu, Haoning Ye, Zeyang Zhou, Hongwei Feng, Yanghua Xiao

    Abstract: Given the substantial volumes of structured data held by many companies, enabling Large Language Models (LLMs) to directly understand structured text in non-structured forms could significantly enhance their capabilities across various business scenarios. To this end, we propose evaluation data generation method for assessing LLM's ability in understanding the structure-rich text, which generates…

    Submitted 30 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  23. arXiv:2405.19707  [pdf, other]

    cs.CV

    DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark

    Authors: Haoxing Chen, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Yaohui Li, Jun Lan, Huijia Zhu, Jianfu Zhang, Weiqiang Wang, Huaxiong Li

    Abstract: Recently, video generation techniques have advanced rapidly. Given the popularity of video content on social media platforms, these models intensify concerns about the spread of fake information. Therefore, there is a growing demand for detectors capable of distinguishing between fake AI-generated videos and mitigating the potential harm caused by fake information. However, the lack of large-scale…

    Submitted 22 August, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  24. arXiv:2405.15216  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition

    Authors: Zijin Gu, Tatiana Likhomanenko, He Bai, Erik McDermott, Ronan Collobert, Navdeep Jaitly

    Abstract: Language models (LMs) have long been used to improve results of automatic speech recognition (ASR) systems, but they are unaware of the errors that ASR systems make. Error correction models are designed to fix ASR errors, however, they showed little improvement over traditional LMs mainly due to the lack of supervised training data. In this paper, we present Denoising LM (DLM), which is a…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: under review

  25. arXiv:2405.12247  [pdf, other]

    cs.CV

    Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation

    Authors: Zejun Gu, Zhong-Qiu Zhao, Hao Shen, Zhao Zhang

    Abstract: In real-world applications of human pose estimation, low-resolution input images are frequently encountered when the performance of the image acquisition equipment is limited or the shooting distance is too far. However, existing state-of-the-art models for human pose estimation perform poorly on low-resolution images. One key reason is the presence of downsampling layers in these models, e.g., st…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures, conference

  26. arXiv:2405.11640  [pdf, other]

    cs.AI cs.CL cs.CV

    Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning

    Authors: Zishan Gu, Fenglin Liu, Changchang Yin, Ping Zhang

    Abstract: The adoption of large language models (LLMs) in healthcare has attracted significant research interest. However, their performance in healthcare remains under-investigated and potentially limited, due to i) they lack rich domain-specific knowledge and medical reasoning skills; and ii) most state-of-the-art LLMs are unimodal, text-only models that cannot directly process multimodal inputs. To this…

    Submitted 19 May, 2024; originally announced May 2024.

  27. arXiv:2405.11448  [pdf, other]

    cs.CV

    Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation

    Authors: Zejun Gu, Zhong-Qiu Zhao, Henghui Ding, Hao Shen, Zhao Zhang, De-Shuang Huang

    Abstract: In practical applications of human pose estimation, low-resolution inputs frequently occur, and existing state-of-the-art models perform poorly with low-resolution images. This work focuses on boosting the performance of low-resolution models by distilling knowledge from a high-resolution model. However, we face the challenge of feature size mismatch and class number mismatch when applying knowled…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  28. arXiv:2405.10691  [pdf, other]

    eess.IV cs.CV

    LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

    Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

    Abstract: The infant brain undergoes rapid development in the first few years after birth. Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants brain development with higher accuracy, statistical power and flexibility. However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets wit…

    Submitted 17 May, 2024; originally announced May 2024.

  29. arXiv:2405.10316  [pdf, other]

    cs.CV cs.GR

    Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model

    Authors: Zheng Gu, Shiyuan Yang, Jing Liao, Jing Huo, Yang Gao

    Abstract: Visual In-Context Learning (ICL) has emerged as a promising research area due to its capability to accomplish various tasks with limited example pairs through analogical reasoning. However, training-based visual ICL has limitations in its ability to generalize to unseen tasks and requires the collection of a diverse task dataset. On the other hand, existing methods in the inference-based visual IC…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Project page: https://analogist2d.github.io

  30. arXiv:2405.08447  [pdf, other]

    cs.HC

    AI-Resilient Interfaces

    Authors: Elena L. Glassman, Ziwei Gu, Jonathan K. Kummerfeld

    Abstract: AI is powerful, but it can make choices that result in objective errors, contextually inappropriate outputs, and disliked options. We need AI-resilient interfaces that help people be resilient to the AI choices that are not right, or not right for them. To support this goal, interfaces need to help users notice and have the context to appropriately judge those AI choices. Existing human-AI interac…

    Submitted 14 May, 2024; originally announced May 2024.

  31. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  32. arXiv:2405.01882  [pdf, other]

    cs.RO cs.AI eess.SP

    Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot

    Authors: Zhanzhong Gu, Xiangjian He, Gengfa Fang, Chengpei Xu, Feng Xia, Wenjing Jia

    Abstract: Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter cha…

    Submitted 3 May, 2024; originally announced May 2024.

  33. arXiv:2405.00797  [pdf, other]

    cs.RO cs.CV

    ADM: Accelerated Diffusion Model via Estimated Priors for Robust Motion Prediction under Uncertainties

    Authors: Jiahui Li, Tianle Shen, Zekai Gu, Jiawei Sun, Chengran Yuan, Yuhang Han, Shuo Sun, Marcelo H. Ang Jr

    Abstract: Motion prediction is a challenging problem in autonomous driving as it demands the system to comprehend stochastic dynamics and the multi-modal nature of real-world agent interactions. Diffusion models have recently risen to prominence, and have proven particularly effective in pedestrian motion prediction tasks. However, the significant time consumption and sensitivity to noise have limited the r…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures

  34. arXiv:2404.16770  [pdf, other]

    cond-mat.str-el

    Pseudogap phase as fluctuating pair density wave

    Authors: Zheng-Yuan Yue, Zheng-Tao Xu, Shuo Yang, Zheng-Cheng Gu

    Abstract: The physical nature of pseudogap phase is one of the most important and intriguing problems towards understanding the key mechanism of high temperature superconductivity in cuprates. Theoretically, the square-lattice $t$-$J$ model is widely believed to be the simplest toy model that captures the essential physics of cuprate superconductors. We employ the Grassmann tensor product state approach to…

    Submitted 15 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 10 pages, 13 figures, references added

  35. arXiv:2404.15254  [pdf, other]

    cs.CV

    UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

    Authors: Bin Wang, Zhuangcheng Gu, Guang Liang, Chao Xu, Bo Zhang, Botian Shi, Conghui He

    Abstract: The paper introduces the UniMER dataset, marking the first study on Mathematical Expression Recognition (MER) targeting complex real-world scenarios. The UniMER dataset includes a large-scale training set, UniMER-1M, which offers unprecedented scale and diversity with one million training instances to train high-quality, robust models. Additionally, UniMER features a meticulously designed, diverse…

    Submitted 5 September, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Project Website: https://github.com/opendatalab/UniMERNet

  36. arXiv:2404.13671  [pdf, other]

    cs.CV cs.LG

    FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization

    Authors: Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang

    Abstract: Zero-shot anomaly detection (ZSAD) methods entail detecting anomalies directly without access to any known normal or abnormal samples within the target item categories. Existing approaches typically rely on the robust generalization capabilities of multimodal pretrained models, computing similarities between manually crafted textual features representing "normal" or "abnormal" semantics and image…

    Submitted 25 July, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by ACM MM 2024

  37. arXiv:2404.09872  [pdf, other]

    cs.CV

    Conditional Prototype Rectification Prompt Learning

    Authors: Haoxing Chen, Yaohui Li, Zizheng Huang, Yan Hong, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang

    Abstract: Pre-trained large-scale vision-language models (VLMs) have acquired profound understanding of general visual concepts. Recent advancements in efficient transfer learning (ETL) have shown remarkable success in fine-tuning VLMs within the scenario of limited data, introducing only a few parameters to harness task-specific insights from VLMs. Despite significant progress, current leading ETL methods…

    Submitted 20 August, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  38. arXiv:2404.07598  [pdf, other]

    physics.optics physics.app-ph

    Electro-optically Modulated Nonlinear Metasurfaces

    Authors: Zhengqing He, Lun Qu, Wei Wu, Jikun Liu, Jingfei You, Weiye Liu, Lu Bai, Chunyan Jin, Chenxiong Wang, Zhidong Gu, Wei Cai, Mengxin Ren, Jingjun Xu

    Abstract: Tunable nonlinearity facilitates the creation of reconfigurable nonlinear metasurfaces, enabling innovative applications in signal processing, light switching, and sensing. This paper presents a novel approach to electrically modulate SHG from a lithium niobate (LN) metasurface, exploiting the electro-optical (EO) effect. By fabricating a nanohole array metasurface on a thin LN film and applying a…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 4 pages, 4 figures

  39. arXiv:2404.05685  [pdf, other]

    cond-mat.str-el quant-ph

    Global phase diagram of doped quantum spin liquid on the Kagome lattice

    Authors: Zheng-Tao Xu, Zheng-Cheng Gu, Shuo Yang

    Abstract: It has long been believed that doped quantum spin liquids (QSLs) can give rise to fascinating quantum phases, including the possibility of high-temperature superconductivity (SC) as proposed by P. W. Anderson's resonating valence bond (RVB) scenario. The Kagome lattice $t$-$J$ model is known to exhibit spin liquid behavior at half-filling, making it an ideal system for studying the properties of d…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 11 pages, 17 figures

  40. arXiv:2403.16062  [pdf]

    eess.SP

    Holography inspired self-controlled reconfigurable intelligent surface

    Authors: Jieao Zhu, Ze Gu, Qian Ma, Linglong Dai, Tie Jun Cui

    Abstract: Among various promising candidate technologies for the sixth-generation (6G) wireless communications, recent advances in microwave metasurfaces have sparked a new research area of reconfigurable intelligent surfaces (RISs). By controllably reprogramming the wireless propagation channel, RISs are envisioned to achieve low-cost wireless capacity boosting, coverage extension, and enhanced energy effi…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Traditional BS-controlled RISs suffer from complicated control cables. To "cut" the control cables, we propose a self-controlled RIS by leveraging the holographic interference principle, thus realizing autonomous RIS beamforming

  41. arXiv:2403.15993  [pdf, other]

    cs.RO

    Robust-Locomotion-by-Logic: Perturbation-Resilient Bipedal Locomotion via Signal Temporal Logic Guided Model Predictive Control

    Authors: Zhaoyuan Gu, Yuntian Zhao, Yipu Chen, Rongming Guo, Jennifer K. Leestma, Gregory S. Sawicki, Ye Zhao

    Abstract: This study introduces a robust planning framework that utilizes a model predictive control (MPC) approach, enhanced by incorporating signal temporal logic (STL) specifications. This marks the first-ever study to apply STL-guided trajectory optimization for bipedal locomotion, specifically designed to handle both translational and orientational perturbations. Existing recovery strategies often stru…

    Submitted 23 March, 2024; originally announced March 2024.

  42. arXiv:2403.15718  [pdf, other]

    math.AP

    On a dryout point for a stationary incompressible thermal fluid with phase transition in a pipe

    Authors: Yoshikazu Giga, Zhongyang Gu

    Abstract: A dryout point is recognized as the position where the phase transition from liquid to vapor occurs. In the one-dimensional case, by solving the stationary incompressible Navier-Stokes-Fourier equations with phase transition, we derive a necessary and sufficient condition for a dryout point to exist when the temperature at the liquid-vapor interface is given. In addition, we show by considering th…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 20 pages, 12 figures

    MSC Class: 35Q79; 76D05; 80A22

  43. arXiv:2403.13433  [pdf, other

    cs.AI cs.CL cs.CY

    AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior

    Authors: Zhouhong Gu, Xiaoxuan Zhu, Haoran Guo, Lin Zhang, Yin Cai, Hao Shen, Jiangjie Chen, Zheyu Ye, Yifei Dai, Yan Gao, Yao Hu, Hongwei Feng, Yanghua Xiao

    Abstract: Language significantly influences the formation and evolution of Human emergent behavior, which is crucial in understanding collective intelligence within human societies. Considering that the study of how language affects human behavior needs to put it into the dynamic scenarios in which it is used, we introduce AgentGroupChat in this paper, a simulation that delves into the complex role of langu…

    Submitted 4 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  44. arXiv:2403.12580  [pdf, other

    cs.CV

    Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection

    Authors: Chengjie Wang, Wenbing Zhu, Bin-Bin Gao, Zhenye Gan, Jianning Zhang, Zhihao Gu, Shuguang Qian, Mingang Chen, Lizhuang Ma

    Abstract: Industrial anomaly detection (IAD) has garnered significant attention and experienced rapid development. However, the recent development of IAD approach has encountered certain difficulties due to dataset limitations. On the one hand, most of the state-of-the-art methods have achieved saturation (over 99% in AUROC) on mainstream datasets such as MVTec, and the differences of methods cannot be well…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: It is accepted by CVPR2024

  45. arXiv:2403.07825  [pdf, other

    cs.CL

    The Missing Piece in Model Editing: A Deep Dive into the Hidden Damage Brought By Model Editing

    Authors: Jianchen Wang, Zhouhong Gu, Xiaoxuan Zhu, Lin Zhang, Haoning Ye, Zhuozhi Xiong, Hongwei Feng, Yanghua Xiao

    Abstract: Large Language Models have revolutionized numerous tasks with their remarkable efficacy. However, editing these models, crucial for rectifying outdated or erroneous information, often leads to a complex issue known as the ripple effect in the hidden space. While difficult to detect, this effect can significantly impede the efficacy of model editing tasks and deteriorate model performance. This pap…

    Submitted 2 July, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  46. arXiv:2403.05644  [pdf, other

    stat.ME stat.AP

    TSSS: A Novel Triangulated Spherical Spline Smoothing for Surface-based Data

    Authors: Zhiling Gu, Shan Yu, Guannan Wang, Ming-Jun Lai, Li Wang

    Abstract: Surface-based data is commonly observed in diverse practical applications spanning various fields. In this paper, we introduce a novel nonparametric method to discover the underlying signals from data distributed on complex surface-based domains. Our approach involves a penalized spline estimator defined on a triangulation of surface patches, which enables effective signal extraction and recovery.…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 56 pages, 16 figures

    MSC Class: 62G05; 62G08

  47. arXiv:2403.04652  [pdf, other

    cs.CL cs.AI

    Yi: Open Foundation Models by 01.AI

    Authors: 01. AI, :, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, et al. (7 additional authors not shown)

    Abstract: We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU,…

    Submitted 7 March, 2024; originally announced March 2024.

  48. arXiv:2403.01704  [pdf

    physics.optics

    Giant second harmonic generation in supertwisted WS2 spirals grown in step edge particle induced non-Euclidean surfaces

    Authors: Tong Tong, Ruijie Chen, Yuxuan Ke, Qian Wang, Xinchao Wang, Qinjun Sun, Jie Chen, Zhiyuan Gu, Ying Yu, Hongyan Wei, Yuying Hao, Xiaopeng Fan, Qing Zhang

    Abstract: In moiré crystals resulting from the stacking of twisted two-dimensional (2D) layered materials, a subtle adjustment in the twist angle surprisingly gives rise to a wide range of correlated optical and electrical properties. Herein, we report the synthesis of supertwisted WS2 spirals and the observation of giant second harmonic generation (SHG) in these spirals. Supertwisted WS2 spirals featuring…

    Submitted 19 July, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: 26 pages, 4 figures

  49. arXiv:2402.19270  [pdf, other

    cs.CV

    Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching

    Authors: Rui Gong, Weide Liu, Zaiwang Gu, Xulei Yang, Jun Cheng

    Abstract: Geometric knowledge has been shown to be beneficial for the stereo matching task. However, prior attempts to integrate geometric insights into stereo matching algorithms have largely focused on geometric knowledge from single images while crucial cross-view factors such as occlusion and matching uniqueness have been overlooked. To address this gap, we propose a novel Intra-view and Cross-view Geom…

    Submitted 6 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR2024

  50. arXiv:2402.18986  [pdf, other

    cs.CR

    Always be Pre-Training: Representation Learning for Network Intrusion Detection with GNNs

    Authors: Zhengyao Gu, Diego Troy Lopez, Lilas Alrahis, Ozgur Sinanoglu

    Abstract: Graph neural network-based network intrusion detection systems have recently demonstrated state-of-the-art performance on benchmark datasets. Nevertheless, these methods suffer from a reliance on target encoding for data pre-processing, limiting widespread adoption due to the associated need for annotated labels--a cost-prohibitive requirement. In this work, we propose a solution involving in-cont…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Will appear in the 2024 International Symposium on Quality Electronic Design (ISQED'24)