Showing 1–50 of 90 results for author: Fei, Z

Searching in archive cs. Search in all archives.
  1. arXiv:2406.14887  [pdf, other]

    cs.CL

    InternLM-Law: An Open Source Chinese Legal Large Language Model

    Authors: Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge

    Abstract: While large language models (LLMs) have showcased impressive capabilities, they struggle with addressing legal queries due to the intricate complexities and specialized expertise required in the legal field. In this paper, we introduce InternLM-Law, a specialized LLM tailored for addressing diverse legal queries related to Chinese laws, spanning from responding to standard legal questions (e.g., l…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Our dataset, code and models will be released at https://github.com/InternLM/InternLM-Law

  2. arXiv:2406.03040  [pdf, other]

    cs.SE

    Correlation of Software-in-the-Loop Simulation with Physical Testing for Autonomous Driving

    Authors: Zhennan Fei, Mikael Andersson, Andreas Tingberg

    Abstract: Software-in-the-loop (SIL) simulation is a widely used method for the rapid development and testing of autonomous vehicles because of its flexibility and efficiency. This paper presents a case study on the validation of an in-house developed SIL simulation toolchain. The presented validation process involves the design and execution of a set of representative scenarios on the test track. To align…

    Submitted 5 June, 2024; originally announced June 2024.

  3. arXiv:2406.01159  [pdf, other]

    cs.CV

    Dimba: Transformer-Mamba Diffusion Models

    Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Youqiang Zhang, Junshi Huang

    Abstract: This paper unveils Dimba, a new text-to-image diffusion model that employs a distinctive hybrid architecture combining Transformer and Mamba elements. Specifically, Dimba sequentially stacked blocks alternate between Transformer and Mamba layers, and integrate conditional information through the cross-attention layer, thus capitalizing on the advantages of both architectural paradigms. We investig…

    Submitted 3 June, 2024; originally announced June 2024.

  4. arXiv:2405.12209  [pdf, other]

    cs.CL

    MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark

    Authors: Hongwei Liu, Zilong Zheng, Yuxuan Qiao, Haodong Duan, Zhiwei Fei, Fengzhe Zhou, Wenwei Zhang, Songyang Zhang, Dahua Lin, Kai Chen

    Abstract: Recent advancements in large language models (LLMs) have showcased significant improvements in mathematics. However, traditional math benchmarks like GSM8k offer a unidimensional perspective, falling short in providing a holistic assessment of the LLMs' math capabilities. To address this gap, we introduce MathBench, a new benchmark that rigorously assesses the mathematical capabilities of large la…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Project: https://github.com/open-compass/MathBench

  5. arXiv:2404.13358  [pdf, other]

    cs.SD cs.AI eess.AS

    Music Consistency Models

    Authors: Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps. It has proven to be advantageous in mitigating the computational burdens associated with diffusion models. Nevertheless, the application of consistency models in music generation remains largely unexplored. To address this gap, we present Music…

    Submitted 20 April, 2024; originally announced April 2024.

  6. arXiv:2404.04478  [pdf, other]

    cs.CV

    Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

    Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang

    Abstract: Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields. However, substantial computational complexity poses limitations for their application in long-context tasks, such as high-resolution image generation. This paper introduces a series of architectures adapted from the RWKV model used in the NLP, with requisite modifications tailored for diffusio…

    Submitted 5 April, 2024; originally announced April 2024.

  7. arXiv:2404.01059  [pdf, ps, other]

    cs.IT eess.SP

    STAR-RIS Aided Secure MIMO Communication Systems

    Authors: Xiequn Dong, Zesong Fei, Xinyi Wang, Meng Hua, Qingqing Wu

    Abstract: This paper investigates simultaneous transmission and reflection reconfigurable intelligent surface (STAR-RIS) aided physical layer security (PLS) in multiple-input multiple-output (MIMO) systems, where the base station (BS) transmits secrecy information with the aid of STAR-RIS against multiple eavesdroppers equipped with multiple antennas. We aim to maximize the secrecy rate by jointly optimizin…

    Submitted 1 April, 2024; originally announced April 2024.

  8. arXiv:2403.17297  [pdf, other]

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang , et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m…

    Submitted 25 March, 2024; originally announced March 2024.

  9. arXiv:2402.19282  [pdf, other]

    cs.CL

    WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset

    Authors: Jiantao Qiu, Haijun Lv, Zhenjiang Jin, Rui Wang, Wenchang Ning, Jia Yu, ChaoBin Zhang, Zhenxiang Li, Pei Chu, Yuan Qu, Jin Shi, Lindong Lu, Runyu Peng, Zhiyuan Zeng, Huanze Tang, Zhikai Lei, Jiawei Hong, Keyu Chen, Zhaoye Fei, Ruiliang Xu, Wei Li, Zhongying Tu, Lin Dahua, Yu Qiao, Hang Yan , et al. (1 additional authors not shown)

    Abstract: This paper presents WanJuan-CC, a safe and high-quality open-sourced English webtext dataset derived from Common Crawl data. The study addresses the challenges of constructing large-scale pre-training datasets for language models, which require vast amounts of high-quality data. A comprehensive process was designed to handle Common Crawl data, including extraction, heuristic rule filtering, fuzzy…

    Submitted 17 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  10. arXiv:2402.14526  [pdf, other]

    cs.CL cs.AI

    Balanced Data Sampling for Language Model Training with Clustering

    Authors: Yunfan Shao, Linyang Li, Zhaoye Fei, Hang Yan, Dahua Lin, Xipeng Qiu

    Abstract: Data plays a fundamental role in the training of Large Language Models (LLMs). While attention has been paid to the collection and composition of datasets, determining the data sampling strategy in training remains an open question. Most LLMs are trained with a simple strategy, random sampling. However, this sampling strategy ignores the unbalanced nature of training data distribution, which can b…

    Submitted 3 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: ACL 2024 (findings), Code is released at https://github.com/choosewhatulike/cluster-clip

  11. arXiv:2402.12399  [pdf, other]

    cs.LG cs.AI cs.CL

    Turn Waste into Worth: Rectifying Top-$k$ Router of MoE

    Authors: Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu

    Abstract: Sparse Mixture of Experts (MoE) models are popular for training large language models due to their computational efficiency. However, the commonly used top-$k$ routing mechanism suffers from redundancy computation and memory costs due to the unbalanced routing. Some experts are overflow, where the exceeding tokens are dropped. While some experts are vacant, which are padded with zeros, negatively…

    Submitted 21 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  12. arXiv:2402.06332  [pdf, other]

    cs.CL

    InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

    Authors: Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin

    Abstract: The math abilities of large language models can represent their abstract reasoning ability. In this paper, we introduce and open-source our math reasoning LLMs InternLM-Math which is continue pre-trained from InternLM2. We unify chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and code interpreter in a unified seq2seq format and supervise our model to be a versatil…

    Submitted 24 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

  13. arXiv:2402.05608  [pdf, other]

    cs.CV cs.MM

    Scalable Diffusion Models with State Space Backbone

    Authors: Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang

    Abstract: This paper presents a new exploration into a category of diffusion models built upon state space architecture. We endeavor to train diffusion models for image data, wherein the traditional U-Net backbone is supplanted by a state space backbone, functioning on raw patches or latent space. Given its notable efficacy in accommodating long-range dependencies, Diffusion State Space Models (DiS) are dis…

    Submitted 28 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  14. arXiv:2401.14666  [pdf, other]

    cs.IT eess.SP

    Joint Transmitter Design for Robust Secure Radar-Communication Coexistence Systems

    Authors: Peng Liu, Zesong Fei, Xinyi Wang, Zhong Zheng, Xiangnan Li, Jie Xu

    Abstract: This paper investigates the spectrum sharing between a multiple-input single-output (MISO) secure communication system and a multiple-input multiple-output (MIMO) radar system in the presence of one suspicious eavesdropper. We jointly design the radar waveform and communication beamforming vector at the two systems, such that the interference between the base station (BS) and radar is reduced, and…

    Submitted 26 January, 2024; originally announced January 2024.

  15. arXiv:2401.14624  [pdf, other]

    cs.CL

    Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora

    Authors: Zhaoye Fei, Yunfan Shao, Linyang Li, Zhiyuan Zeng, Conghui He, Hang Yan, Dahua Lin, Xipeng Qiu

    Abstract: Large language models have demonstrated remarkable potential in various tasks, however, there remains a significant scarcity of open-source models and data for specific domains. Previous works have primarily focused on manually specifying resources and collecting high-quality data on specific domains, which significantly consume time and effort. To address this limitation, we propose an efficient…

    Submitted 4 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: We have released the full data (total of 735GB) in https://huggingface.co/datasets/Query-of-CC/knowledge_pile_full and partial data (about 40GB) in https://huggingface.co/datasets/Query-of-CC/knowledge_pile

  16. arXiv:2401.02071  [pdf, other]

    cs.IT eess.SP

    Joint Beamforming and Offloading Design for Integrated Sensing, Communication and Computation System

    Authors: Peng Liu, Zesong Fei, Xinyi Wang, Yiqing Zhou, Yan Zhang, Fan Liu

    Abstract: Mobile edge computing (MEC) is powerful to alleviate the heavy computing tasks in integrated sensing and communication (ISAC) systems. In this paper, we investigate joint beamforming and offloading design in a three-tier integrated sensing, communication and computation (ISCC) framework comprising one cloud server, multiple mobile edge servers, and multiple terminals. While executing sensing tasks…

    Submitted 26 January, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, submitted to IEEE journals for possible publication

  17. arXiv:2312.14611  [pdf, other]

    cs.CV

    Tuning-Free Inversion-Enhanced Control for Consistent Image Editing

    Authors: Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: Consistent editing of real images is a challenging task, as it requires performing non-rigid edits (e.g., changing postures) to the main objects in the input image without changing their identity or attributes. To guarantee consistent attributes, some existing methods fine-tune the entire model or the textual embedding for structural consistency, but they are time-consuming and fail to perform non…

    Submitted 22 December, 2023; originally announced December 2023.

  18. arXiv:2311.15830  [pdf, other]

    cs.SD cs.CV eess.AS

    A-JEPA: Joint-Embedding Predictive Architecture Can Listen

    Authors: Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: This paper presents that the masked-modeling principle driving the success of large foundational vision models can be effectively applied to audio by making predictions in a latent space. We introduce Audio-based Joint-Embedding Predictive Architecture (A-JEPA), a simple extension method for self-supervised learning from the audio spectrum. Following the design of I-JEPA, our A-JEPA encodes visibl…

    Submitted 11 January, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2207.06405 by other authors

  19. arXiv:2310.11227  [pdf, other]

    cs.CL cs.AI

    RealBehavior: A Framework for Faithfully Characterizing Foundation Models' Human-like Behavior Mechanisms

    Authors: Enyu Zhou, Rui Zheng, Zhiheng Xi, Songyang Gao, Xiaoran Fan, Zichu Fei, Jingting Ye, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Reports of human-like behaviors in foundation models are growing, with psychological theories providing enduring tools to investigate these behaviors. However, current research tends to directly apply these human-oriented tools without verifying the faithfulness of their outcomes. In this paper, we introduce a framework, RealBehavior, which is designed to characterize the humanoid behaviors of mod…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of EMNLP 2023

  20. arXiv:2309.16289  [pdf, other]

    cs.CL cs.AI cs.LG

    LawBench: Benchmarking Legal Knowledge of Large Language Models

    Authors: Zhiwei Fei, Xiaoyu Shen, Dawei Zhu, Fengzhe Zhou, Zhuo Han, Songyang Zhang, Kai Chen, Zongwen Shen, Jidong Ge

    Abstract: Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safe-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks. To address this gap, we propose a comprehensive evaluation benchmark LawBench. LawBench has been meticulously crafted t…

    Submitted 28 September, 2023; originally announced September 2023.

  21. arXiv:2309.04965  [pdf, other]

    cs.CV cs.AI cs.CL

    Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning

    Authors: Guisheng Liu, Yi Li, Zhengcong Fei, Haiyan Fu, Xiangyang Luo, Yanqing Guo

    Abstract: While impressive performance has been achieved in image captioning, the limited diversity of the generated captions and the large parameter scale remain major barriers to the real-word application of these systems. In this work, we propose a lightweight image captioning network in combination with continuous diffusion, called Prefix-diffusion. To achieve diversity, we design an efficient method th…

    Submitted 16 October, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

    Comments: 11 pages, 4 figures, 6 tables

  22. arXiv:2309.03118  [pdf, other]

    cs.CL

    Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs

    Authors: Chao Feng, Xinyu Zhang, Zichu Fei

    Abstract: Large language models (LLMs), such as ChatGPT and GPT-4, are versatile and can solve different tasks due to their emergent ability and generalizability. However, LLMs sometimes lack domain-specific knowledge to perform tasks, which would also cause hallucination during inference. In some previous works, additional modules like graph neural networks (GNNs) are trained on retrieved knowledge from ex…

    Submitted 6 September, 2023; originally announced September 2023.

  23. arXiv:2308.03409  [pdf, other]

    cs.CV

    DiT: Efficient Vision Transformers with Dynamic Token Routing

    Authors: Yuchen Ma, Zhengcong Fei, Junshi Huang

    Abstract: Recently, the tokens of images share the same static data flow in many dense networks. However, challenges arise from the variance among the objects in images, such as large variations in the spatial scale and difficulties of recognition for visual entities. In this paper, we propose a data-dependent token routing strategy to elaborate the routing paths of image tokens for Dynamic Vision Transform…

    Submitted 11 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

  24. arXiv:2308.03283  [pdf, other]

    quant-ph cs.LG

    High-rate discretely-modulated continuous-variable quantum key distribution using quantum machine learning

    Authors: Qin Liao, Jieyu Liu, Anqi Huang, Lei Huang, Zhuoying Fei, Xiquan Fu

    Abstract: We propose a high-rate scheme for discretely-modulated continuous-variable quantum key distribution (DM CVQKD) using quantum machine learning technologies, which divides the whole CVQKD system into three parts, i.e., the initialization part that is used for training and estimating quantum classifier, the prediction part that is used for generating highly correlated raw keys, and the data-postproce…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 18 pages, 17 figures

  25. arXiv:2308.01117  [pdf]

    cs.RO eess.SY

    Optimization-Based Motion Planning for Autonomous Agricultural Vehicles Turning in Constrained Headlands

    Authors: Chen Peng, Peng Wei, Zhenghao Fei, Yuankai Zhu, Stavros G. Vougioukas

    Abstract: Headland maneuvering is a crucial aspect of unmanned field operations for autonomous agricultural vehicles (AAVs). While motion planning for headland turning in open fields has been extensively studied and integrated into commercial auto-guidance systems, the existing methods primarily address scenarios with ample headland space and thus may not work in more constrained headland geometries. Commer…

    Submitted 11 June, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

  26. arXiv:2307.11345  [pdf, other]

    cs.IT eess.SP

    Sensing Aided Covert Communications: Turning Interference into Allies

    Authors: Xinyi Wang, Zesong Fei, Peng Liu, J. Andrew Zhang, Qingqing Wu, Nan Wu

    Abstract: In this paper, we investigate the realization of covert communication in a general radar-communication cooperation system, which includes integrated sensing and communications as a special example. We explore the possibility of utilizing the sensing ability of radar to track and jam the aerial adversary target attempting to detect the transmission. Based on the echoes from the target, the extended…

    Submitted 3 January, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: 13 pages, 12 figures, submitted to IEEE journals for potential publication

  27. arXiv:2307.10953  [pdf, other]

    cs.CV cs.AI

    PE-YOLO: Pyramid Enhancement Network for Dark Object Detection

    Authors: Xiangchen Yin, Zhenda Yu, Zetao Fei, Wenjun Lv, Xin Gao

    Abstract: Current object detection models have achieved good results on many benchmark datasets, detecting objects in dark conditions remains a large challenge. To address this issue, we propose a pyramid enhanced network (PENet) and joint it with YOLOv3 to build a dark object detection framework named PE-YOLO. Firstly, PENet decomposes the image into four components of different resolutions using the Lapla…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted at ICANN 2023

  28. arXiv:2307.09232  [pdf, ps, other]

    cs.IT eess.SP

    Intelligent Reflecting Surface Assisted Localization: Performance Analysis and Algorithm Design

    Authors: Meng Hua, Qingqing Wu, Wen Chen, Zesong Fei, Hing Cheung So, Chau Yuen

    Abstract: The target sensing/localization performance is fundamentally limited by the line-of-sight link and severe signal attenuation over long distances. This paper considers a challenging scenario where the direct link between the base station (BS) and the target is blocked due to the surrounding blockages and leverages the intelligent reflecting surface (IRS) with some active sensors, termed as \textit{…

    Submitted 25 September, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: The paper has been submitted to IEEE journal for possible publication

  29. arXiv:2307.06023  [pdf, other]

    cs.IT

    On the Uplink Distributed Detection in UAV-enabled Aerial Cell-Free mMIMO Systems

    Authors: Xuesong Pan, Zhong Zheng, Xueqing Huang, Zesong Fei

    Abstract: In this paper, we investigate the uplink signal detection approaches in the cell-free massive MIMO systems with unmanned aerial vehicles (UAVs) serving as aerial access points (APs). The ground users are equipped with multiple antennas and the ground-to-air propagation channels are subject to correlated Rician fading. To overcome huge signaling overhead in the fully-centralized detection, we propo…

    Submitted 12 July, 2023; originally announced July 2023.

  30. arXiv:2307.01727  [pdf, other]

    cs.IT eess.SP

    Mutual Information Analysis for Factor Graph-based MIMO Iterative Detections through Error Functions

    Authors: Huan Li, Jingxuan Huang, Zesong Fei

    Abstract: The factor graph (FG) based iterative detection is considered an effective and practical method for multiple-input and multiple-out (MIMO), particularly massive MIMO (m-MIMO) systems. However, the convergence analysis for the FG-based iterative MIMO detection is insufficient, which is of great significance to the performance evaluation and algorithm design of detection methods. This paper investig…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 11 pages, 12 figures

  31. arXiv:2307.01525  [pdf, other]

    cs.IT eess.SP

    OTFS-based Robust MMSE Precoding Design in Over-the-air Computation

    Authors: Dongkai Zhou, Jing Guo, Siqiang Wang, Zhong Zheng, Zesong Fei, Weijie Yuan, Xinyi Wang

    Abstract: Over-the-air computation (AirComp), as a data aggregation method that can improve network efficiency by exploiting the superposition characteristics of wireless channels, has received much attention recently. Meanwhile, the orthogonal time frequency space (OTFS) modulation can provide a strong Doppler resilience and facilitate reliable transmission for high-mobility communications. Hence, in this…

    Submitted 26 March, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

  32. FlexEdge: Digital Twin-Enabled Task Offloading for UAV-Aided Vehicular Edge Computing

    Authors: Bin Li, Wancheng Xie, Yinghui Ye, Lei Liu, Zesong Fei

    Abstract: Integrating unmanned aerial vehicles (UAVs) into vehicular networks have shown high potentials in affording intensive computing tasks. In this paper, we study the digital twin driven vehicular edge computing networks for adaptively computing resource management where an unmanned aerial vehicle (UAV) named FlexEdge acts as a flying server. In particular, we first formulate an energy consumption min…

    Submitted 16 April, 2023; originally announced May 2023.

    Comments: 6 pages, 6 figures

    Journal ref: IEEE Transactions on Vehicular Technology (2023)1-6

  33. arXiv:2304.05818  [pdf, other]

    cs.CV

    Gradient-Free Textual Inversion

    Authors: Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: Recent works on personalized text-to-image generation usually learn to bind a special token with specific subjects or styles of a few given images by tuning its embedding through gradient descent. It is natural to question whether we can optimize the textual inversions by only accessing the process of model inference. As only requiring the forward computation to determine the textual inversion ret…

    Submitted 12 April, 2023; originally announced April 2023.

  34. arXiv:2301.12144  [pdf, other]

    cs.IT

    On the Mutual Information of Multi-RIS Assisted MIMO: From Operator-Valued Free Probability Aspect

    Authors: Zhong Zheng, Siqiang Wang, Zesong Fei, Zhi Sun, Jinhong Yuan

    Abstract: The reconfigurable intelligent surface (RIS) is useful to effectively improve the coverage and data rate of end-to-end communications. In contrast to the well-studied coverage-extension use case, in this paper, multiple RIS panels are introduced, aiming to enhance the data rate of multi-input multi-output (MIMO) channels in presence of insufficient scattering. Specifically, via the operator-valued…

    Submitted 28 January, 2023; originally announced January 2023.

    Comments: 30 pages, 5 figures

  35. arXiv:2211.16769  [pdf, other]

    cs.CV

    Uncertainty-Aware Image Captioning

    Authors: Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei

    Abstract: It is well believed that the higher uncertainty in a word of the caption, the more inter-correlated context information is required to determine it. However, current image captioning methods usually consider the generation of all words in a sentence sequentially and equally. In this paper, we propose an uncertainty-aware image captioning framework, which parallelly and iteratively operates inserti…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: Accepted by AAAI2023

  36. arXiv:2210.02291  [pdf, other]

    cs.CV

    Progressive Text-to-Image Generation

    Authors: Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang

    Abstract: Recently, Vector Quantized AutoRegressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by equally predicting discrete image tokens from the top left to bottom right in the latent space. Although the simple generative process surprisingly works well, is this the best way to generate the image? For instance, human creation is more inclined to the outline-to-fine of an imag…

    Submitted 20 September, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: Technique report

  37. arXiv:2210.01973  [pdf, other]

    cs.CV cs.LG

    Meta-Ensemble Parameter Learning

    Authors: Zhengcong Fei, Shuman Tian, Junshi Huang, Xiaoming Wei, Xiaolin Wei

    Abstract: Ensemble of machine learning models yields improved performance as well as robustness. However, their memory requirements and inference costs can be prohibitively high. Knowledge distillation is an approach that allows a single model to efficiently capture the approximate performance of an ensemble while showing poor scalability as demand for re-training when introducing new teacher models. In thi…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: technique report

  38. arXiv:2209.07697  [pdf, other]

    cs.CL cs.AI

    Selecting Stickers in Open-Domain Dialogue through Multitask Learning

    Authors: Zhexin Zhang, Yeshuang Zhu, Zhengcong Fei, Jinchao Zhang, Jie Zhou

    Abstract: With the increasing popularity of online chatting, stickers are becoming important in our online communication. Selecting appropriate stickers in open-domain dialogue requires a comprehensive understanding of both dialogues and stickers, as well as the relationship between the two types of modalities. To tackle these challenges, we propose a multitask learning method comprised of three auxiliary t…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: ACL 2022 findings, camera-ready

  39. arXiv:2209.06583  [pdf, other]

    cs.IR cs.AI cs.CL

    Pre-training for Information Retrieval: Are Hyperlinks Fully Explored?

    Authors: Jiawen Wu, Xinyu Zhang, Yutao Zhu, Zheng Liu, Zikai Guo, Zhaoye Fei, Ruofei Lai, Yongkang Wu, Zhao Cao, Zhicheng Dou

    Abstract: Recent years have witnessed great progress on applying pre-trained language models, e.g., BERT, to information retrieval (IR) tasks. Hyperlinks, which are commonly used in Web pages, have been leveraged for designing pre-training objectives. For example, anchor texts of the hyperlinks have been used for simulating queries, thus constructing tremendous query-document pairs for pre-training. However…

    Submitted 14 September, 2022; originally announced September 2022.

    Comments: work in progress

  40. arXiv:2208.09129  [pdf, other]

    cs.CL

    Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding

    Authors: Zhaoye Fei, Yu Tian, Yongkang Wu, Xinyu Zhang, Yutao Zhu, Zheng Liu, Jiawen Wu, Dejiang Kong, Ruofei Lai, Zhao Cao, Zhicheng Dou, Xipeng Qiu

    Abstract: Generalized text representations are the foundation of many natural language understanding tasks. To fully utilize the different corpus, it is inevitable that models need to understand the relevance among them. However, many methods ignore the relevance and adopt a single-channel model (a coarse paradigm) directly for all tasks, which lacks enough rationality and interpretation. In addition, some…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: Accepted by COLING 2022

  41. End-to-end deep learning for directly estimating grape yield from ground-based imagery

    Authors: Alexander G. Olenskyj, Brent S. Sams, Zhenghao Fei, Vishal Singh, Pranav V. Raja, Gail M. Bornhorst, J. Mason Earles

    Abstract: Yield estimation is a powerful tool in vineyard management, as it allows growers to fine-tune practices to optimize yield and quality. However, yield estimation is currently performed using manual sampling, which is time-consuming and imprecise. This study demonstrates the application of proximal imaging combined with deep learning for yield estimation in vineyards. Continuous data collection usin…

    Submitted 3 August, 2022; originally announced August 2022.

    Journal ref: Comput. Electron. Agric. 198 (2022)

  42. arXiv:2207.10897  [pdf, other]

    cs.CV

    Efficient Modeling of Future Context for Image Captioning

    Authors: Zhengcong Fei, Junshi Huang, Xiaoming Wei, Xiaolin Wei

    Abstract: Existing approaches to image captioning usually generate the sentence word-by-word from left to right, with the constraint of conditioned on local context including the given image and history generated words. There have been many studies target to make use of global information during decoding, e.g., iterative refinement. However, it is still under-explored how to effectively and efficiently inco…

    Submitted 18 October, 2022; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: ACM Multimedia 2022

  43. arXiv:2207.04363  [pdf, other]

    cs.IT eess.SP

    Secure UAV-to-Ground MIMO Communications: Joint Transceiver and Location Optimization

    Authors: Zhong Zheng, Xinyao Wang, Zesong Fei, Qingqing Wu, Bin Li, Lajos Hanzo

    Abstract: Unmanned aerial vehicles (UAVs) are foreseen to constitute promising airborne communication devices as a benefit of their superior channel quality. But UAV-to-ground (U2G) communications are vulnerable to eavesdropping. Hence, we conceive a sophisticated physical layer security solution for improving the secrecy rate of multi-antenna aided U2G systems. Explicitly, the secrecy rate of the U2G MIMO…

    Submitted 9 July, 2022; originally announced July 2022.

    Comments: 15 pages, 11 figures. To appear in IEEE Transactions on Vehicular Technology

  44. arXiv:2206.00806  [pdf, other]

    cs.CV cs.AI

    XBound-Former: Toward Cross-scale Boundary Modeling in Transformers

    Authors: Jiacheng Wang, Fei Chen, Yuxi Ma, Liansheng Wang, Zhaodong Fei, Jianwei Shuai, Xiangdong Tang, Qichao Zhou, Jing Qin

    Abstract: Skin lesion segmentation from dermoscopy images is of great significance in the quantitative analysis of skin cancers, which is yet challenging even for dermatologists due to the inherent issues, i.e., considerable size, shape and color variation, and ambiguous boundaries. Recent vision transformers have shown promising performance in handling the variation through global context modeling. Still,…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: https://github.com/jcwang123/xboundformer

  45. arXiv:2111.14623  [pdf, other]

    cs.LG cs.CY stat.AP

    An Overview of Healthcare Data Analytics With Applications to the COVID-19 Pandemic

    Authors: Zhe Fei, Yevgen Ryeznik, Oleksandr Sverdlov, Chee Wei Tan, Weng Kee Wong

    Abstract: In the era of big data, standard analysis tools may be inadequate for making inference and there is a growing need for more efficient and innovative ways to collect, process, analyze and interpret the massive and complex data. We provide an overview of challenges in big data problems and describe how innovative analytical methods, machine learning tools and metaheuristics can tackle general health…

    Submitted 25 November, 2021; originally announced November 2021.

    Journal ref: IEEE TRANSACTIONS ON BIG DATA, 12 August 2021

  46. arXiv:2111.10146  [pdf, other]

    cs.CV

    DVCFlow: Modeling Information Flow Towards Human-like Video Captioning

    Authors: Xu Yan, Zhengcong Fei, Shuhui Wang, Qingming Huang, Qi Tian

    Abstract: Dense video captioning (DVC) aims to generate multi-sentence descriptions to elucidate the multiple events in the video, which is challenging and demands visual consistency, discoursal coherence, and linguistic diversity. Existing methods mainly generate captions from individual video segments, lacking adaptation to the global visual context and progressive alignment between the fast-evolved visua…

    Submitted 19 November, 2021; originally announced November 2021.

  47. arXiv:2111.05066  [pdf]

    cs.CV

    Deep Convolution Network Based Emotion Analysis for Automatic Detection of Mild Cognitive Impairment in the Elderly

    Authors: Zixiang Fei, Erfu Yang, Leijian Yu, Xia Li, Huiyu Zhou, Wenju Zhou

    Abstract: A significant number of people are suffering from cognitive impairment all over the world. Early detection of cognitive impairment is of great importance to both patients and caregivers. However, existing approaches have their shortages, such as time consumption and financial expenses involved in clinics and the neuroimaging stage. It has been found that patients with cognitive impairment show abn…

    Submitted 9 November, 2021; originally announced November 2021.

    Comments: 17 pages

  48. arXiv:2111.01988  [pdf, other]

    cs.IT

    Belief Propagation based Joint Detection and Decoding for Resistive Random Access Memories

    Authors: Ce Sun, Kui Cai, Guanghui Song, Tony Q. S. Quek, Zesong Fei

    Abstract: Despite the great promises that the resistive random access memory (ReRAM) has shown as the next generation of non-volatile memory technology, its crossbar array structure leads to a severe sneak path interference to the signal read back from the memory cell. In this paper, we first propose a novel belief propagation (BP) based detector for the sneak path interference in ReRAM. Based on the condit…

    Submitted 3 November, 2021; v1 submitted 2 November, 2021; originally announced November 2021.

    Comments: 34 pages, 17 figures

  49. arXiv:2110.07431  [pdf, other]

    cs.CL

    Towards More Effective and Economic Sparsely-Activated Model

    Authors: Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, Xinyu Zhang, Lei Chen, Zhicheng Dou, Xipeng Qiu, Zikai Guo, Ruofei Lai, Jiawen Wu, Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Zhao Cao

    Abstract: The sparsely-activated models have achieved great success in natural language processing through large-scale parameters and relatively low computational cost, and gradually become a feasible technique for training and implementing extremely large models. Due to the limit of communication cost, activating multiple experts is hardly affordable during training and inference. Therefore, previous work…

    Submitted 14 October, 2021; originally announced October 2021.

  50. arXiv:2110.05342  [pdf, other]

    cs.CV

    Semi-Autoregressive Image Captioning

    Authors: Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, Qi Tian

    Abstract: Current state-of-the-art approaches for image captioning typically adopt an autoregressive manner, i.e., generating descriptions word by word, which suffers from slow decoding issue and becomes a bottleneck in real-time applications. Non-autoregressive image captioning with continuous iterative refinement, which eliminates the sequential dependence in a sentence generation, can achieve comparable…

    Submitted 13 October, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: ACM MM2021 Oral