
Showing 1–50 of 1,946 results for author: Yang, X

Searching in archive cs. Search in all archives.
  1. arXiv:2407.04242  [pdf, other]

    cs.CV

    Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction

    Authors: Zhongnuo Yan, Xin Yang, Mingyuan Luo, Jiongquan Chen, Rusi Chen, Lian Liu, Dong Ni

    Abstract: Fine-grained spatio-temporal learning is crucial for freehand 3D ultrasound reconstruction. Previous works mainly resorted to the coarse-grained spatial features and the separated temporal dependency learning and struggle with fine-grained spatio-temporal learning. Mining spatio-temporal information in fine-grained scales is extremely challenging due to learning difficulties in long-range dependen…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted at MICCAI 2024. This is the submitted manuscript and the preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections

  2. arXiv:2407.03695  [pdf, other]

    cs.CV

    M^3:Manipulation Mask Manufacturer for Arbitrary-Scale Super-Resolution Mask

    Authors: Xinyu Yang, Xiaochen Ma, Xuekang Zhu, Bo Du, Lei Su, Bingkui Tong, Zeyu Lei, Jizhe Zhou

    Abstract: In the field of image manipulation localization (IML), the small quantity and poor quality of existing datasets have always been major issues. A dataset containing various types of manipulations will greatly help improve the accuracy of IML models. Images on the internet (such as those on Baidu Tieba's PS Bar) are manipulated using various techniques, and creating a dataset from these images will…

    Submitted 4 July, 2024; originally announced July 2024.

  3. arXiv:2407.03590  [pdf, other]

    cs.RO

    A Fast Dynamic Point Detection Method for LiDAR-Inertial Odometry in Driving Scenarios

    Authors: Zikang Yuan, Xiaoxiang Wang, Jingying Wu, Junda Cheng, Xin Yang

    Abstract: Existing 3D point-based dynamic point detection and removal methods have a significant time overhead, making them difficult to adapt to LiDAR-inertial odometry systems. This paper proposes a label consistency based dynamic point detection and removal method for handling moving vehicles and pedestrians in autonomous driving scenarios, and embeds the proposed dynamic point detection and removal meth…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, submitted to RA-L

  4. arXiv:2407.02846  [pdf, other]

    cs.CV

    Multi-Task Domain Adaptation for Language Grounding with 3D Objects

    Authors: Penglei Sun, Yaoxian Song, Xinglin Pan, Peijie Dong, Xiaofei Yang, Qiang Wang, Zhixu Li, Tiefeng Li, Xiaowen Chu

    Abstract: The existing works on object-level language grounding with 3D objects mostly focus on improving performance by utilizing the off-the-shelf pre-trained models to capture features, such as viewpoint selection or geometric priors. However, they have failed to consider exploring the cross-modal representation of language-vision alignment in the cross-domain field. To answer this problem, we propose a…

    Submitted 5 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  5. arXiv:2407.02616  [pdf]

    eess.IV cs.CV

    Deep Learning Based Apparent Diffusion Coefficient Map Generation from Multi-parametric MR Images for Patients with Diffuse Gliomas

    Authors: Zach Eidex, Mojtaba Safari, Jacob Wynne, Richard L. J. Qiu, Tonghe Wang, David Viar Hernandez, Hui-Kuo Shu, Hui Mao, Xiaofeng Yang

    Abstract: Purpose: Apparent diffusion coefficient (ADC) maps derived from diffusion weighted (DWI) MRI provide functional measurements about the water molecules in tissues. However, DWI is time consuming and very susceptible to image artifacts, leading to inaccurate ADC measurements. This study aims to develop a deep learning framework to synthesize ADC maps from multi-parametric MR images. Methods: We pro…

    Submitted 4 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.15044

  6. Parameter Tuning of the Firefly Algorithm by Standard Monte Carlo and Quasi-Monte Carlo Methods

    Authors: Geethu Joy, Christian Huyck, Xin-She Yang

    Abstract: Almost all optimization algorithms have algorithm-dependent parameters, and the setting of such parameter values can significantly influence the behavior of the algorithm under consideration. Thus, proper parameter tuning should be carried out to ensure that the algorithm used for optimization performs well and is sufficiently robust for solving different types of optimization problems. In this st…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: International Conference on Computational Science (ICCS2024)

    MSC Class: 68T20; 68W50

  7. arXiv:2407.02505  [pdf, other]

    cs.CE cs.LG physics.flu-dyn

    A MgNO Method for Multiphase Flow in Porous Media

    Authors: Xinliang Liu, Xia Yang, Chen-Song Zhang, Lian Zhang, Li Zhao

    Abstract: This research investigates the application of Multigrid Neural Operator (MgNO), a neural operator architecture inspired by multigrid methods, in the simulation for multiphase flow within porous media. The architecture is adjusted to manage a variety of crucial factors, such as permeability and porosity heterogeneity. The study extends MgNO to time-dependent porous media flow problems and validate…

    Submitted 16 June, 2024; originally announced July 2024.

  8. arXiv:2407.02280  [pdf, other]

    cs.CV cs.AI

    FedIA: Federated Medical Image Segmentation with Heterogeneous Annotation Completeness

    Authors: Yangyang Xiang, Nannan Wu, Li Yu, Xin Yang, Kwang-Ting Cheng, Zengqiang Yan

    Abstract: Federated learning has emerged as a compelling paradigm for medical image segmentation, particularly in light of increasing privacy concerns. However, most of the existing research relies on relatively stringent assumptions regarding the uniformity and completeness of annotations across clients. Contrary to this, this paper highlights a prevalent challenge in medical practice: incomplete annotatio…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Early accepted by MICCAI 2024

  9. A Generalized Evolutionary Metaheuristic (GEM) Algorithm for Engineering Optimization

    Authors: Xin-She Yang

    Abstract: Many optimization problems in engineering and industrial design applications can be formulated as optimization problems with highly nonlinear objectives, subject to multiple complex constraints. Solving such optimization problems requires sophisticated algorithms and optimization techniques. A major trend in recent years is the use of nature-inspired metaheuristic algorithms (NIMA). Despite the popu…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 17 pages, 2 figures and 4 tables

    MSC Class: 68T20; 90C26

    Journal ref: Cogent Engineering, vol. 11, no. 1, (2024)

  10. arXiv:2407.02098  [pdf, other]

    cs.CV

    DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection

    Authors: Kaixin Xu, Qingtian Feng, Hao Chen, Zhe Wang, Xue Geng, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: Applying deep neural networks to 3D point cloud processing has attracted increasing attention due to its advanced performance in many areas, such as AR/VR, autonomous driving, and robotics. However, as neural network models and 3D point clouds expand in size, it becomes a crucial challenge to reduce the computational and memory overhead to meet latency and energy constraints in real-world applicat…

    Submitted 2 July, 2024; originally announced July 2024.

  11. arXiv:2407.02068  [pdf, other]

    cs.CV

    LPViT: Low-Power Semi-structured Pruning for Vision Transformers

    Authors: Kaixin Xu, Zhe Wang, Chunyun Chen, Xue Geng, Jie Lin, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: Vision transformers have emerged as a promising alternative to convolutional neural networks for various image analysis tasks, offering comparable or superior performance. However, one significant drawback of ViTs is their resource-intensive nature, leading to increased memory footprint, computation complexity, and power consumption. To democratize this high-performance technology and make it more…

    Submitted 2 July, 2024; originally announced July 2024.

  12. arXiv:2407.02034  [pdf, other]

    cs.CV

    TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation

    Authors: Chaofan Luo, Donglin Di, Yongjia Ma, Zhou Xue, Chen Wei, Xun Yang, Yebin Liu

    Abstract: Despite significant strides in the field of 3D scene editing, current methods encounter substantial challenges, particularly in preserving 3D consistency in the multi-view editing process. To tackle this challenge, we propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS) with a dual-branch editing mechanism. Specifically, TAS facilitates a…

    Submitted 2 July, 2024; originally announced July 2024.

  13. arXiv:2407.01899  [pdf, other]

    cs.CL

    Scope-enhanced Compositional Semantic Parsing for DRT

    Authors: Xiulin Yang, Jonas Groschwitz, Alexander Koller, Johan Bos

    Abstract: Discourse Representation Theory (DRT) distinguishes itself from other semantic representation frameworks by its ability to model complex semantic and discourse phenomena through structural nesting and variable binding. While seq2seq models hold the state of the art on DRT parsing, their accuracy degrades with the complexity of the sentence, and they sometimes struggle to produce well-formed DRT re…

    Submitted 1 July, 2024; originally announced July 2024.

  14. arXiv:2407.01837  [pdf, ps, other]

    stat.ML cs.IT cs.LG

    To Switch or Not to Switch? Balanced Policy Switching in Offline Reinforcement Learning

    Authors: Tao Ma, Xuzhi Yang, Zoltan Szabo

    Abstract: Reinforcement learning (RL) -- finding the optimal behaviour (also referred to as policy) maximizing the collected long-term cumulative reward -- is among the most influential approaches in machine learning with a large number of successful applications. In several decision problems, however, one faces the possibility of policy switching -- changing from the current policy to a new one -- which in…

    Submitted 1 July, 2024; originally announced July 2024.

  15. arXiv:2407.00467  [pdf, other]

    cs.LG cs.DC eess.IV

    VcLLM: Video Codecs are Secretly Tensor Codecs

    Authors: Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills

    Abstract: As the parameter size of large language models (LLMs) continues to expand, the need for a large memory footprint and high communication bandwidth have become significant bottlenecks for the training and inference of LLMs. To mitigate these bottlenecks, various tensor compression techniques have been proposed to reduce the data size, thereby alleviating memory requirements and communication pressur…

    Submitted 29 June, 2024; originally announced July 2024.

  16. Personalized Federated Continual Learning via Multi-granularity Prompt

    Authors: Hao Yu, Xin Yang, Xin Gao, Yan Kang, Hao Wang, Junbo Zhang, Tianrui Li

    Abstract: Personalized Federated Continual Learning (PFCL) is a new practical scenario that poses greater challenges in sharing and personalizing knowledge. PFCL not only relies on knowledge fusion for server aggregation at the global spatial-temporal perspective but also needs model improvement for each client according to the local requirements. Existing methods, whether in Personalized Federated Learning…

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024 Research Track

  17. arXiv:2406.18995  [pdf, other]

    cs.LG cs.AI

    FedMLP: Federated Multi-Label Medical Image Classification under Task Heterogeneity

    Authors: Zhaobin Sun, Nannan Wu, Junjie Shi, Li Yu, Xin Yang, Kwang-Ting Cheng, Zengqiang Yan

    Abstract: Cross-silo federated learning (FL) enables decentralized organizations to collaboratively train models while preserving data privacy and has made significant progress in medical image classification. One common assumption is task homogeneity where each client has access to all classes during training. However, in clinical practice, given a multi-label classification task, constrained by the level…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Early accepted by MICCAI 2024

  18. arXiv:2406.17507  [pdf, other]

    cs.IR

    ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

    Authors: Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

    Abstract: Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity between queries and candidates, generative retrieval surpasses dual-tower models in both speed and accuracy on large-scale corpora, providing new insights…

    Submitted 25 June, 2024; originally announced June 2024.

  19. arXiv:2406.17309  [pdf, other]

    cs.CV

    Zero-Shot Long-Form Video Understanding through Screenplay

    Authors: Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

    Abstract: The Long-form Video Question-Answering task requires the comprehension and analysis of extended video content to respond accurately to questions by utilizing both temporal and contextual information. In this paper, we present MM-Screenplayer, an advanced video understanding system with multi-modal perception capabilities that can convert any video into textual screenplay representations. Unlike pr…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Highest Score Award to the CVPR'2024 LOVEU Track 1 Challenge

  20. arXiv:2406.17289  [pdf, other]

    cs.IR cs.AI

    Hyperbolic Knowledge Transfer in Cross-Domain Recommendation System

    Authors: Xin Yang, Heng Chang, Zhijian Lai, Jinze Yang, Xingrun Li, Yu Lu, Shuaiqiang Wang, Dawei Yin, Erxue Min

    Abstract: Cross-Domain Recommendation (CDR) seeks to utilize knowledge from different domains to alleviate the problem of data sparsity in the target recommendation domain, and it has been gaining more attention in recent years. Although there have been notable advancements in this area, most current methods represent users and items in Euclidean space, which is not ideal for handling long-tail distributed…

    Submitted 4 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  21. arXiv:2406.17219  [pdf, other]

    cs.CV

    Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction

    Authors: Zhenzhong Kuang, Xiaochen Yang, Yingjie Shen, Chao Hu, Jun Yu

    Abstract: The unprecedented capture and application of face images raise increasing concerns on anonymization to fight against privacy disclosure. Most existing methods may suffer from the problem of excessive change of the identity-independent information or insufficient identity protection. In this paper, we present a new face anonymization approach by distracting the intrinsic and extrinsic identity atte…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024: 12406-12415

  22. arXiv:2406.16986  [pdf, ps, other]

    cs.LG cs.AI cs.CR

    Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios

    Authors: Tao Huang, Ziyang Chen, Jiayang Meng, Qingyu Huang, Xu Yang, Xun Yi, Ibrahim Khalil

    Abstract: In the context of machine unlearning, the primary challenge lies in effectively removing traces of private data from trained models while maintaining model performance and security against privacy attacks like membership inference attacks. Traditional gradient-based unlearning methods often rely on extensive historical gradients, which becomes impractical with high unlearning ratios and may reduce…

    Submitted 23 June, 2024; originally announced June 2024.

  23. arXiv:2406.16562  [pdf, other]

    cs.CV cs.CL

    EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models

    Authors: Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Mengping Yang, Cheng Zhang, Hao Li

    Abstract: The recent advancements in text-to-image generative models have been remarkable. Yet, the field suffers from a lack of evaluation metrics that accurately reflect the performance of these models, particularly lacking fine-grained metrics that can guide the optimization of the models. In this paper, we propose EvalAlign, a metric characterized by its accuracy, stability, and fine granularity. Our ap…

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Github Repository: https://github.com/SAIS-FUXI/EvalAlign

  24. arXiv:2406.16260  [pdf, other]

    cs.CV cs.AI

    Video-Infinity: Distributed Long Video Generation

    Authors: Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao Wang

    Abstract: Diffusion models have recently achieved remarkable results for video generation. Despite the encouraging performances, the generated videos are typically constrained to a small number of frames, resulting in clips lasting merely a few seconds. The primary challenges in producing longer videos include the substantial memory requirements and the extended processing time required on a single GPU. A s…

    Submitted 23 June, 2024; originally announced June 2024.

  25. arXiv:2406.16170  [pdf, other]

    cs.IR cs.AI

    SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering

    Authors: Xiaodong Yang, Huiyuan Chen, Yuchen Yan, Yuxin Tang, Yuying Zhao, Eric Xu, Yiwei Cai, Hanghang Tong

    Abstract: The learning objective is integral to collaborative filtering systems, where the Bayesian Personalized Ranking (BPR) loss is widely used for learning informative backbones. However, BPR often experiences slow convergence and suboptimal local optima, partially because it only considers one negative item for each positive item, neglecting the potential impacts of other unobserved items. To address t…

    Submitted 23 June, 2024; originally announced June 2024.

  26. arXiv:2406.15743  [pdf, other]

    cs.SE

    CasModaTest: A Cascaded and Model-agnostic Self-directed Framework for Unit Test Generation

    Authors: Chao Ni, Xiaoya Wang, Liushan Chen, Dehai Zhao, Zhengong Cai, Shaohua Wang, Xiaohu Yang

    Abstract: Though many machine learning (ML)-based unit testing generation approaches have been proposed and indeed achieved remarkable performance, they still have several limitations in effectiveness and practical usage. More precisely, existing ML-based approaches (1) generate partial content of a unit test, mainly focusing on test oracle generation; (2) mismatch the test prefix with the test oracle seman…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages, 7 figures

  27. arXiv:2406.15713  [pdf, other]

    math.OC cs.LG

    Efficient Low-rank Identification via Accelerated Iteratively Reweighted Nuclear Norm Minimization

    Authors: Hao Wang, Ye Wang, Xiangyu Yang

    Abstract: This paper considers the problem of minimizing the sum of a smooth function and the Schatten-$p$ norm of the matrix. Our contribution involves proposing accelerated iteratively reweighted nuclear norm methods designed for solving the nonconvex low-rank minimization problem. Two major novelties characterize our approach. Firstly, the proposed method possesses a rank identification property, enablin…

    Submitted 26 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Copyright may be transferred without notice, after which this version may no longer be accessible

  28. arXiv:2406.15656  [pdf, other]

    eess.IV cs.CV

    Adaptive Self-Supervised Consistency-Guided Diffusion Model for Accelerated MRI Reconstruction

    Authors: Mojtaba Safari, Zach Eidex, Shaoyan Pan, Richard L. J. Qiu, Xiaofeng Yang

    Abstract: Purpose: To propose a self-supervised deep learning-based compressed sensing MRI (DL-based CS-MRI) method named "Adaptive Self-Supervised Consistency Guided Diffusion Model (ASSCGD)" to accelerate data acquisition without requiring fully sampled datasets. Materials and Methods: We used the fastMRI multi-coil brain axial T2-weighted (T2-w) dataset from 1,376 cases and single-coil brain quantitative…

    Submitted 21 June, 2024; originally announced June 2024.

  29. arXiv:2406.14711  [pdf, other]

    cs.CL cs.AI cs.MA

    MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate

    Authors: Alfonso Amayuelas, Xianjun Yang, Antonis Antoniades, Wenyue Hua, Liangming Pan, William Wang

    Abstract: Large Language Models (LLMs) have shown exceptional results on current benchmarks when working individually. The advancement in their capabilities, along with a reduction in parameter size and inference times, has facilitated the use of these models as agents, enabling interactions among multiple models to execute complex tasks. Such collaborations offer several advantages, including the use of sp…

    Submitted 26 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  30. arXiv:2406.14232  [pdf, other]

    cs.LG cs.AI

    Enhancing robustness of data-driven SHM models: adversarial training with circle loss

    Authors: Xiangli Yang, Xijie Deng, Hanwei Zhang, Yang Zou, Jianxi Yang

    Abstract: Structural health monitoring (SHM) is critical to safeguarding the safety and reliability of aerospace, civil, and mechanical infrastructure. Machine learning-based data-driven approaches have gained popularity in SHM due to advancements in sensors and computational power. However, machine learning models used in SHM are vulnerable to adversarial examples -- even small changes in input can lead to…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures

  31. arXiv:2406.13963  [pdf, ps, other]

    cs.CV

    SSAD: Self-supervised Auxiliary Detection Framework for Panoramic X-ray based Dental Disease Diagnosis

    Authors: Zijian Cai, Xinquan Yang, Xuguang Li, Xiaoling Luo, Xuechen Li, Linlin Shen, He Meng, Yongqiang Deng

    Abstract: Panoramic X-ray is a simple and effective tool for diagnosing dental diseases in clinical practice. When deep learning models are developed to assist dentists in interpreting panoramic X-rays, most of their performance suffers from the limited annotated data, which requires dentists' expertise and a lot of time. Although self-supervised learning (SSL) has been proposed to address this challeng…

    Submitted 19 June, 2024; originally announced June 2024.

  32. arXiv:2406.13294  [pdf, other]

    cs.MM cs.LG

    Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens

    Authors: Xikang Yang, Xuehai Tang, Fuqing Zhu, Jizhong Han, Songlin Hu

    Abstract: Vision-language models (VLMs) seamlessly integrate visual and textual data to perform tasks such as image classification, caption generation, and visual question answering. However, adversarial images often struggle to deceive all prompts effectively in the context of cross-prompt migration attacks, as the probability distribution of the tokens in these images tends to favor the semantics of the o…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages

  33. arXiv:2406.13185  [pdf, other]

    cs.CL

    Learnable In-Context Vector for Visual Question Answering

    Authors: Yingzhe Peng, Chenduo Hao, Xu Yang, Jiawei Peng, Xinting Hu, Xin Geng

    Abstract: As language models continue to scale, Large Language Models (LLMs) have exhibited emerging capabilities in In-Context Learning (ICL), enabling them to solve language tasks by prefixing a few in-context demonstrations (ICDs) as context. Inspired by these advancements, researchers have extended these techniques to develop Large Multimodal Models (LMMs) with ICL capabilities. However, applying ICL us…

    Submitted 18 June, 2024; originally announced June 2024.

  34. arXiv:2406.13170  [pdf, other]

    cs.AI cs.CL

    Amphista: Accelerate LLM Inference with Bi-directional Multiple Drafting Heads in a Non-autoregressive Style

    Authors: Zeping Li, Xinlong Yang, Ziheng Gao, Ji Liu, Zhuang Liu, Dong Li, Jinzhang Peng, Lu Tian, Emad Barsoum

    Abstract: Large Language Models (LLMs) inherently use autoregressive decoding, which lacks parallelism in inference and results in significantly slow inference speeds, especially when hardware parallel accelerators and memory bandwidth are not fully utilized. In this work, we propose Amphista, a speculative decoding algorithm that adheres to a non-autoregressive decoding paradigm. Owing to the increased par…

    Submitted 18 June, 2024; originally announced June 2024.

  35. arXiv:2406.12793  [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 18 June, 2024; originally announced June 2024.

  36. arXiv:2406.12769  [pdf, other]

    cs.AI cs.CV

    Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video

    Authors: Xiangming Zhu, Huayu Deng, Haochen Yuan, Yunbo Wang, Xiaokang Yang

    Abstract: We introduce latent intuitive physics, a transfer learning framework for physics simulation that can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes. Our key insight is to use latent features drawn from a learnable prior distribution conditioned on the underlying particle states to capture the invisible and complex physical properties. To ac…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Published as a conference paper at ICLR 2024

    Journal ref: ICLR 2024

  37. MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representation

    Authors: Chao Ni, Liyu Shen, Xiaohu Yang, Yan Zhu, Shaohua Wang

    Abstract: We constructed a new large-scale and comprehensive C/C++ vulnerability dataset named MegaVul by crawling the Common Vulnerabilities and Exposures (CVE) database and CVE-related open-source projects. Specifically, we collected all crawlable descriptive information of the vulnerabilities from the CVE database and extracted all vulnerability-related code changes from 28 Git-based websites. We adopt…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures

  38. arXiv:2406.12186  [pdf, ps, other]

    eess.IV cs.CV

    Unlocking the Potential of Early Epochs: Uncertainty-aware CT Metal Artifact Reduction

    Authors: Xinquan Yang, Guanqun Zhou, Wei Sun, Youjian Zhang, Zhongya Wang, Jiahui He, Zhicheng Zhang

    Abstract: In computed tomography (CT), the presence of metallic implants in patients often leads to disruptive artifacts in the reconstructed images, hindering accurate diagnosis. Recently, a large number of supervised deep learning-based approaches have been proposed for metal artifact reduction (MAR). However, these methods neglect the influence of initial training weights. In this paper, we have discover…

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  39. arXiv:2406.11131  [pdf, other]

    cs.CL cs.AI cs.DB

    Are Large Language Models a Good Replacement of Taxonomies?

    Authors: Yushi Sun, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen

    Abstract: Large language models (LLMs) demonstrate an impressive ability to internalize knowledge and answer natural language questions. Although previous studies validate that LLMs perform well on general knowledge while presenting poor performance on long-tail nuanced knowledge, the community is still doubtful about whether the traditional knowledge graphs should be replaced by LLMs. In this paper, we ask…

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted by VLDB 2024

  40. arXiv:2406.10580  [pdf, other]

    cs.CV

    IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization

    Authors: Xiaochen Ma, Xuekang Zhu, Lei Su, Bo Du, Zhuohang Jiang, Bingkui Tong, Zeyu Lei, Xinyu Yang, Chi-Man Pun, Jiancheng Lv, Jizhe Zhou

    Abstract: A comprehensive benchmark is yet to be established in the Image Manipulation Detection & Localization (IMDL) field. The absence of such a benchmark leads to insufficient and misleading model evaluations, severely undermining the development of this field. However, the scarcity of open-sourced baseline models and inconsistent training and evaluation protocols make conducting rigorous experiments a…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Technical report

  41. arXiv:2406.10573  [pdf, other]

    cs.LG cs.AI cs.CR

    Graph Neural Backdoor: Fundamentals, Methodologies, Applications, and Future Directions

    Authors: Xiao Yang, Gaolei Li, Jianhua Li

    Abstract: Graph Neural Networks (GNNs) have significantly advanced various downstream graph-relevant tasks, encompassing recommender systems, molecular structure prediction, social media analysis, etc. Despite the boosts of GNN, recent research has empirically demonstrated its potential vulnerability to backdoor attacks, wherein adversaries employ triggers to poison input samples, inducing GNN to adversary-…

    Submitted 15 June, 2024; originally announced June 2024.

  42. arXiv:2406.10511  [pdf, other]

    cs.DC cs.AR cs.PF math.NA

    Efficient Hardware Accelerator Based on Medium Granularity Dataflow for SpTRSV

    Authors: Qian Chen, Xiaofeng Yang, Shengli Lu

    Abstract: Sparse triangular solve (SpTRSV) is widely used in various domains. Numerous studies have been conducted using CPUs, GPUs, and specific hardware accelerators, where dataflow can be categorized into coarse and fine granularity. Coarse dataflow offers good spatial locality but suffers from low parallelism, while fine dataflow provides high parallelism but disrupts the spatial structure, leading to i…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  43. arXiv:2406.09442  [pdf]

    physics.med-ph cs.LG physics.app-ph physics.bio-ph

    An insertable glucose sensor using a compact and cost-effective phosphorescence lifetime imager and machine learning

    Authors: Artem Goncharov, Zoltan Gorocs, Ridhi Pradhan, Brian Ko, Ajmal Ajmal, Andres Rodriguez, David Baum, Marcell Veszpremi, Xilin Yang, Maxime Pindrys, Tianle Zheng, Oliver Wang, Jessica C. Ramella-Roman, Michael J. McShane, Aydogan Ozcan

    Abstract: Optical continuous glucose monitoring (CGM) systems are emerging for personalized glucose management owing to their lower cost and prolonged durability compared to conventional electrochemical CGMs. Here, we report a computational CGM system, which integrates a biocompatible phosphorescence-based insertable biosensor and a custom-designed phosphorescence lifetime imager (PLI). This compact and cos…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 24 Pages, 4 Figures

  44. arXiv:2406.09410  [pdf, other]

    cs.CV cs.AI

    STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery

    Authors: Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

    Abstract: Scene graph generation (SGG) in satellite imagery (SAI) helps promote understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it attractive to holistically conduct SGG in large-size very-high-resolution (VHR…

    Submitted 3 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 18 pages, 11 figures

  45. arXiv:2406.09385  [pdf, other]

    cs.CV

    Towards Vision-Language Geo-Foundation Model: A Survey

    Authors: Yue Zhou, Litong Feng, Yiping Ke, Xue Jiang, Junchi Yan, Xue Yang, Wayne Zhang

    Abstract: Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding. However, most methods rely on training with general image datasets, and the lack of geospatial data leads to poor performance on earth observation. Numerous geospatial image-text pair datasets and VLFMs…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 18 pages, 4 figures

  46. arXiv:2406.09178  [pdf, other]

    cs.RO

    AutomaChef: A Physics-informed Demonstration-guided Learning Framework for Granular Material Manipulation

    Authors: Minglun Wei, Xintong Yang, Yu-Kun Lai, Seyed Amir Tafrishi, Ze Ji

    Abstract: Due to the complex physical properties of granular materials, research on robot learning for manipulating such materials predominantly either disregards the consideration of their physical characteristics or uses surrogate models to approximate their physical properties. Learning to manipulate granular materials based on physical information obtained through precise modelling remains an unsolved p…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 8 pages

  47. arXiv:2406.09008  [pdf, other]

    cs.CL

    LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models

    Authors: Xiaohao Yang, He Zhao, Dinh Phung, Wray Buntine, Lan Du

    Abstract: Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging. Existing evaluation methods are either less comparable across different models (e.g., perplexity) or focus on only one specific aspect of a model (e.g., topic quality or document representation quality) at a time, which is insufficient to reflect the ov…

    Submitted 13 June, 2024; originally announced June 2024.

  48. arXiv:2406.08713  [pdf, other]

    cs.AI cs.CV

    Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis

    Authors: Xinrui Yang, Zhuohan Wang, Anthony Hu

    Abstract: Text-to-image models have shown remarkable progress in generating high-quality images from user-provided prompts. Despite this, the quality of these images varies due to the models' sensitivity to human language nuances. With advancements in large language models, there are new opportunities to enhance prompt design for image generation tasks. Existing research primarily focuses on optimizing prom…

    Submitted 12 June, 2024; originally announced June 2024.

  49. arXiv:2406.08413  [pdf, other]

    cs.AR cs.LG

    Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference

    Authors: Christopher Wolters, Xiaoxuan Yang, Ulf Schlichtmann, Toyotaro Suzumura

    Abstract: Large language models (LLMs) have recently transformed natural language processing, enabling machines to generate human-like text and engage in meaningful conversations. This development necessitates speed, efficiency, and accessibility in LLM inference as the computational and memory requirements of these systems grow exponentially. Meanwhile, advancements in computing and memory capabilities are…

    Submitted 12 June, 2024; originally announced June 2024.

  50. arXiv:2406.08037  [pdf, other]

    cs.CV

    Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking

    Authors: Xiangyang Yang, Dan Zeng, Xucheng Wang, You Wu, Hengzhou Ye, Qijun Zhao, Shuiwang Li

    Abstract: Empowered by transformer-based models, visual tracking has advanced significantly. However, the slow speed of current trackers limits their applicability on devices with constrained computational resources. To address this challenge, we introduce ABTrack, an adaptive computation framework that adaptively bypasses transformer blocks for efficient visual tracking. The rationale behind ABTrack is ro…

    Submitted 1 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.