Showing 1–50 of 994 results for author: Xu, S

Searching in archive cs.
  1. Does the Vulnerability Threaten Our Projects? Automated Vulnerable API Detection for Third-Party Libraries

    Authors: Fangyuan Zhang, Lingling Fan, Sen Chen, Miaoying Cai, Sihan Xu, Lida Zhao

    Abstract: Developers usually use third-party libraries (TPLs) to facilitate project development and avoid reinventing the wheel; however, vulnerable TPLs cause severe security threats. The majority of existing research only considered whether projects used vulnerable TPLs but neglected whether the vulnerable code of the TPLs was indeed used by the projects, which inevitably results in false positives and fur…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 15 pages, 4 figures

  2. arXiv:2409.02508  [pdf, other]

    cs.CV

    TLD: A Vehicle Tail Light signal Dataset and Benchmark

    Authors: Jinhao Chai, Shiyi Mu, Shugong Xu

    Abstract: Understanding other drivers' intentions is crucial for safe driving. The role of taillights in conveying these intentions is underemphasized in current autonomous driving systems. Accurately identifying taillight signals is essential for predicting vehicle behavior and preventing collisions. Open-source taillight datasets are scarce, often small and inconsistently annotated. To address this gap, w…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2409.02497  [pdf, other]

    eess.IV cs.CV

    A Learnable Color Correction Matrix for RAW Reconstruction

    Authors: Anqi Liu, Shiyi Mu, Shugong Xu

    Abstract: Autonomous driving algorithms usually employ sRGB images as model input due to their compatibility with the human visual system. However, visually pleasing sRGB images are possibly sub-optimal for downstream tasks when compared to RAW images. The availability of RAW images is constrained by the difficulties in collecting real-world driving data and the associated challenges of annotation. To addre…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Accepted by BMVC2024

  4. arXiv:2409.01856  [pdf, other]

    cs.CV

    Explicit Second-order LiDAR Bundle Adjustment Algorithm Using Mean Squared Group Metric

    Authors: Tingchen Ma, Yongsheng Ou, Sheng Xu

    Abstract: The Bundle Adjustment (BA) algorithm is a widely used nonlinear optimization technique in the backend of Simultaneous Localization and Mapping (SLAM) systems. By leveraging the co-view relationships of landmarks from multiple perspectives, it constructs a joint estimation model for both poses and landmarks, enabling the system to generate refined maps and reduce front-end localization errors. Howe…

    Submitted 3 September, 2024; originally announced September 2024.

  5. arXiv:2409.01315  [pdf, other]

    physics.comp-ph cs.AI cs.LG

    Multi-frequency Neural Born Iterative Method for Solving 2-D Inverse Scattering Problems

    Authors: Daoqi Liu, Tao Shan, Maokun Li, Fan Yang, Shenheng Xu

    Abstract: In this work, we propose a deep learning-based imaging method for addressing the multi-frequency electromagnetic (EM) inverse scattering problem (ISP). By combining deep learning technology with EM physical laws, we have successfully developed a multi-frequency neural Born iterative method (NeuralBIM), guided by the principles of the single-frequency NeuralBIM. This method integrates multitask lea…

    Submitted 2 September, 2024; originally announced September 2024.

    MSC Class: 35Q61 ACM Class: I.2.6; G.1.8; G.1.3

  6. arXiv:2409.01212  [pdf, other]

    cs.CV

    MobileIQA: Exploiting Mobile-level Diverse Opinion Network For No-Reference Image Quality Assessment Using Knowledge Distillation

    Authors: Zewen Chen, Sunhan Xu, Yun Zeng, Haochen Guo, Jian Guo, Shuai Liu, Juan Wang, Bing Li, Weiming Hu, Dehua Liu, Hesong Li

    Abstract: With the rising demand for high-resolution (HR) images, No-Reference Image Quality Assessment (NR-IQA) gains more attention, as it can evaluate image quality in real-time on mobile devices and enhance user experience. However, existing NR-IQA methods often resize or crop the HR images into small resolution, which leads to a loss of important details. And most of them are of high computational comp…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV Workshop 2024

  7. arXiv:2408.17325  [pdf, other]

    cs.CL cond-mat.dis-nn cond-mat.stat-mech

    Impact of ChatGPT on the writing style of condensed matter physicists

    Authors: Shaojun Xu, Xiaohui Ye, Mengqi Zhang, Pei Wang

    Abstract: We apply a state-of-the-art difference-in-differences approach to estimate the impact of ChatGPT's release on the writing style of condensed matter papers on arXiv. Our analysis reveals a statistically significant improvement in the English quality of abstracts written by non-native English speakers. Importantly, this improvement remains robust even after accounting for other potential factors, co…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 9 pages, 1 figure, 7 tables

  8. arXiv:2408.14925  [pdf, other]

    cs.NE cs.AI

    Distance-Forward Learning: Enhancing the Forward-Forward Algorithm Towards High-Performance On-Chip Learning

    Authors: Yujie Wu, Siyuan Xu, Jibin Wu, Lei Deng, Mingkun Xu, Qinghao Wen, Guoqi Li

    Abstract: The Forward-Forward (FF) algorithm was recently proposed as a local learning method to address the limitations of backpropagation (BP), offering biological plausibility along with memory-efficient and highly parallelized computational benefits. However, it suffers from suboptimal performance and poor generalization, largely due to inadequate theoretical support and a lack of effective learning str…

    Submitted 27 August, 2024; originally announced August 2024.

  9. arXiv:2408.13681  [pdf, other]

    cs.CE cs.SI

    Smart Home Cyber Insurance Pricing

    Authors: Xiaoyu Zhang, Maochao Xu, Shouhuai Xu

    Abstract: Our homes are increasingly employing various kinds of Internet of Things (IoT) devices, leading to the notion of smart homes. While this trend brings convenience to our daily life, it also introduces cyber risks. To mitigate such risks, the demand for smart home cyber insurance has been growing rapidly. However, there are no studies on analyzing the competency of smart home cyber insurance policie…

    Submitted 24 August, 2024; originally announced August 2024.

  10. arXiv:2408.13473  [pdf, other]

    cs.CL

    Why Antiwork: A RoBERTa-Based System for Work-Related Stress Identification and Leading Factor Analysis

    Authors: Tao Lu, Muzhe Wu, Xinyi Lu, Siyuan Xu, Shuyu Zhan, Anuj Tambwekar, Emily Mower Provost

    Abstract: Harsh working environments and work-related stress have been known to contribute to mental health problems such as anxiety, depression, and suicidal ideation. As such, it is paramount to create solutions that can both detect employee unhappiness and find the root cause of the problem. While prior works have examined causes of mental health using machine learning, they typically focus on general me…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 13 pages, 8 figures

  11. arXiv:2408.12217  [pdf, other]

    cs.CR

    Quantifying Psychological Sophistication of Malicious Emails

    Authors: Theodore Longtchi, Rosana Montañez Rodriguez, Kora Gwartney, Ekzhin Ear, David P. Azari, Christopher P. Kelley, Shouhuai Xu

    Abstract: Malicious emails including Phishing, Spam, and Scam are one significant class of cyber social engineering attacks. Despite numerous defenses to counter them, the problem remains largely open. The ineffectiveness of current defenses can be attributed to our superficial understanding of the psychological properties that make these attacks successful. This problem motivates us to investigate the psyc…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 22 pages, 15 figures, 4 tables

  12. arXiv:2408.11586  [pdf, other]

    cs.CR

    Characterizing the Evolution of Psychological Tactics and Techniques Exploited by Malicious Emails

    Authors: Theodore Longtchi, Shouhuai Xu

    Abstract: The landscape of malicious emails, and of cyber social engineering attacks in general, is constantly evolving. In order to design effective defenses against these attacks, we must deeply understand the Psychological Tactics, PTacs, and Psychological Techniques, PTechs, that are exploited by these attacks. In this paper we present a methodology for characterizing the evolution of PTacs and PTechs explo…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 20 pages, 9 figures, 2 tables

  13. arXiv:2408.11584  [pdf, other]

    cs.CR

    Characterizing the Evolution of Psychological Factors Exploited by Malicious Emails

    Authors: Theodore Longtchi, Shouhuai Xu

    Abstract: Cyber attacks, including cyber social engineering attacks, such as malicious emails, are always evolving with time. Thus, it is important to understand their evolution. In this paper we characterize the evolution of malicious emails through the lens of Psychological Factors, PFs, which are human psychological attributes that can be exploited by malicious emails. That is, attackers who send them.…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 20 pages, 8 figures, 2 tables

  14. arXiv:2408.10958  [pdf, other]

    physics.ao-ph cs.LG

    Kilometer-Scale Convection Allowing Model Emulation using Generative Diffusion Modeling

    Authors: Jaideep Pathak, Yair Cohen, Piyush Garg, Peter Harrington, Noah Brenowitz, Dale Durran, Morteza Mardani, Arash Vahdat, Shaoming Xu, Karthik Kashinath, Michael Pritchard

    Abstract: Storm-scale convection-allowing models (CAMs) are an important tool for predicting the evolution of thunderstorms and mesoscale convective systems that result in damaging extreme weather. By explicitly resolving convective dynamics within the atmosphere they afford meteorologists the nuance needed to provide outlook on hazard. Deep learning models have thus far not proven skilful at km-scale atmos…

    Submitted 20 August, 2024; originally announced August 2024.

  15. arXiv:2408.10679  [pdf, other]

    cs.CV

    DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba

    Authors: Shuning Xu, Xina Liu, Binbin Song, Xiangyu Chen, Qiubo Chen, Jiantao Zhou

    Abstract: Moire patterns arise when two similar repetitive patterns interfere, a phenomenon frequently observed during the capture of images or videos on screens. The color, shape, and location of moire patterns may differ across video frames, posing a challenge in learning information from adjacent frames and preserving temporal consistency. Previous video demoireing methods heavily rely on well-designed a…

    Submitted 20 August, 2024; originally announced August 2024.

  16. arXiv:2408.09149  [pdf, other]

    physics.soc-ph cs.CY

    Uncovering key predictors of high-growth firms via explainable machine learning

    Authors: Yiwei Huang, Shuqi Xu, Linyuan Lü, Andrea Zaccaria, Manuel Sebastian Mariani

    Abstract: Predicting high-growth firms has attracted increasing interest from the technological forecasting and machine learning communities. Most existing studies primarily utilize financial data for these predictions. However, research suggests that a firm's research and development activities and its network position within technological ecosystems may also serve as valuable predictors. To unpack the rel…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 26 pages, 9 figures

  17. arXiv:2408.08188  [pdf, other]

    cs.RO cs.AI cs.LO

    Scaling Up Natural Language Understanding for Multi-Robots Through the Lens of Hierarchy

    Authors: Shaojun Xu, Xusheng Luo, Yutong Huang, Letian Leng, Ruixuan Liu, Changliu Liu

    Abstract: Long-horizon planning is hindered by challenges such as uncertainty accumulation, computational complexity, delayed rewards and incomplete information. This work proposes an approach to exploit the task hierarchy from human instructions to facilitate multi-robot planning. Using Large Language Models (LLMs), we propose a two-step approach to translate multi-sentence instructions into a structured l…

    Submitted 15 August, 2024; originally announced August 2024.

  18. arXiv:2408.06693  [pdf, other]

    cs.CV cs.AI cs.CG

    DC3DO: Diffusion Classifier for 3D Objects

    Authors: Nursena Koprucu, Meher Shashwat Nigam, Shicheng Xu, Biruk Abere, Gabriele Dominici, Andrew Rodriguez, Sharvaree Vadgam, Berfin Inal, Alberto Tono

    Abstract: Inspired by Geoffrey Hinton's emphasis on generative modeling, "To recognize shapes, first learn to generate them," we explore the use of 3D diffusion models for object classification. Leveraging the density estimates from these models, our approach, the Diffusion Classifier for 3D Objects (DC3DO), enables zero-shot classification of 3D shapes without additional training. On average, our method achiev…

    Submitted 13 August, 2024; originally announced August 2024.

  19. arXiv:2408.06574  [pdf, other]

    cs.CL

    SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model

    Authors: Dayong Wu, Jiaqi Li, Baoxin Wang, Honghong Zhao, Siyuan Xue, Yanjie Yang, Zhijun Chang, Rui Zhang, Li Qian, Bo Wang, Shijin Wang, Zhixiong Zhang, Guoping Hu

    Abstract: Large language models (LLMs) have shown remarkable achievements across various language tasks. To enhance the performance of LLMs in scientific literature services, we developed the scientific literature LLM (SciLit-LLM) through pre-training and supervised fine-tuning on scientific literature, building upon the iFLYTEK Spark LLM. Furthermore, we present a knowledge service system Spark Research Ass…

    Submitted 12 August, 2024; originally announced August 2024.

  20. arXiv:2408.06273  [pdf, other]

    cs.CL

    FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data

    Authors: Haoran Sun, Renren Jin, Shaoyang Xu, Leiyu Pan, Supryadi, Menglong Cui, Jiangcun Du, Yikun Lei, Lei Yang, Ling Shi, Juesi Xiao, Shaolin Zhu, Deyi Xiong

    Abstract: Large language models (LLMs) have demonstrated prowess in a wide range of tasks. However, many LLMs exhibit significant performance discrepancies between high- and low-resource languages. To mitigate this challenge, we present FuxiTranyu, an open-source multilingual LLM, which is designed to satisfy the need of the research community for balanced and high-performing multilingual capabilities. Fuxi…

    Submitted 13 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  21. arXiv:2408.05751  [pdf, other]

    cs.IR cs.CV

    Advancing Re-Ranking with Multimodal Fusion and Target-Oriented Auxiliary Tasks in E-Commerce Search

    Authors: Enqiang Xu, Xinhui Li, Zhigong Zhou, Jiahao Ji, Jinyuan Zhao, Dadong Miao, Songlin Wang, Lin Liu, Sulong Xu

    Abstract: In the rapidly evolving field of e-commerce, the effectiveness of search re-ranking models is crucial for enhancing user experience and driving conversion rates. Despite significant advancements in feature representation and model architecture, the integration of multimodal information remains underexplored. This study addresses this gap by investigating the computation and fusion of textual and v…

    Submitted 11 August, 2024; originally announced August 2024.

  22. arXiv:2408.05082  [pdf, other]

    cs.LG cs.AI

    Generalizing Few Data to Unseen Domains Flexibly Based on Label Smoothing Integrated with Distributionally Robust Optimization

    Authors: Yangdi Wang, Zhi-Hai Zhang, Su Xiu Xu, Wenming Guo

    Abstract: Overfitting commonly occurs when applying deep neural networks (DNNs) on small-scale datasets, where DNNs do not generalize well from existing data to unseen data. The main reason resulting in overfitting is that small-scale datasets cannot reflect the situations of the real world. Label smoothing (LS) is an effective regularization method to prevent overfitting, avoiding it by mixing one-hot labe…

    Submitted 9 August, 2024; originally announced August 2024.

  23. Self-Supervised Contrastive Graph Clustering Network via Structural Information Fusion

    Authors: Xiaoyang Ji, Yuchen Zhou, Haofu Yang, Shiyue Xu, Jiahao Li

    Abstract: Graph clustering, a classical task in graph learning, involves partitioning the nodes of a graph into distinct clusters. This task has applications in various real-world scenarios, such as anomaly detection, social network analysis, and community discovery. Current graph clustering methods commonly rely on module pre-training to obtain a reliable prior distribution for the model, which is then use…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 6 pages, 3 figures

    Journal ref: 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Tianjin, China, 2024, pp. 254-259

  24. MetaDragonBoat: Exploring Paddling Techniques of Virtual Dragon Boating in a Metaverse Campus

    Authors: Wei He, Xiang Li, Shengtian Xu, Yuzheng Chen, Chan-In Sio, Ge Lin Kan, Lik-Hang Lee

    Abstract: The preservation of cultural heritage, as mandated by the United Nations Sustainable Development Goals (SDGs), is integral to sustainable urban development. This paper focuses on the Dragon Boat Festival, a prominent event in Chinese cultural heritage, and proposes leveraging Virtual Reality (VR) to enhance its preservation and accessibility. Traditionally, participation in the festival's dragon…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 10 pages, accepted at ACM MM 2024

  25. arXiv:2408.03220  [pdf, other]

    cs.LG cs.DC

    Masked Random Noise for Communication Efficient Federated Learning

    Authors: Shiwei Li, Yingyi Cheng, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Dugang Liu, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed training paradigm that effectively safeguards data privacy. However, it may involve significant communication costs, which hinders training efficiency. In this paper, we aim to enhance communication efficiency from a new perspective. Specifically, we request the distributed clients to find optimal model updates relative to global model parameters withi…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by MM 2024

  26. arXiv:2408.03215  [pdf, other]

    cs.LG cs.DC

    FedBAT: Communication-Efficient Federated Learning via Learnable Binarization

    Authors: Shiwei Li, Wenchao Xu, Haozhao Wang, Xing Tang, Yining Qi, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed machine learning paradigm that can effectively exploit large-scale data without exposing users' privacy. However, it may incur significant communication overhead, thereby potentially impairing the training efficiency. To address this challenge, numerous studies suggest binarizing the model updates. Nonetheless, traditional methods usually binarize mode…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by ICML 2024

  27. arXiv:2408.02907  [pdf, other]

    cs.CL

    Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering

    Authors: Tiezheng Guo, Chen Wang, Yanyi Liu, Jiawei Tang, Pan Li, Sai Xu, Qingwen Yang, Xianlin Gao, Zhi Li, Yingyou Wen

    Abstract: Retrieving external knowledge and prompting large language models with relevant information is an effective paradigm to enhance the performance of question-answering tasks. Previous research typically handles paragraphs from external documents in isolation, resulting in a lack of context and ambiguous references, particularly in multi-document and complex tasks. To overcome these challenges, we pr…

    Submitted 5 August, 2024; originally announced August 2024.

  28. arXiv:2408.01291  [pdf, other]

    cs.CV

    TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

    Authors: Dong Huo, Zixin Guo, Xinxin Zuo, Zhihao Shi, Juwei Lu, Peng Dai, Songcen Xu, Li Cheng, Yee-Hong Yang

    Abstract: Given a 3D mesh, we aim to synthesize 3D textures that correspond to arbitrary textual descriptions. Current methods for generating and assembling textures from sampled views often result in prominent seams or excessive smoothing. To tackle these issues, we present TexGen, a novel multi-view sampling and resampling framework for texture generation leveraging a pre-trained text-to-image diffusion m…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: European Conference on Computer Vision (ECCV) 2024

  29. arXiv:2408.00297  [pdf, other]

    cs.CV

    EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head

    Authors: Qianyun He, Xinya Ji, Yicheng Gong, Yuanxun Lu, Zhengyu Diao, Linjia Huang, Yao Yao, Siyu Zhu, Zhan Ma, Songcen Xu, Xiaofei Wu, Zixiao Zhang, Xun Cao, Hao Zhu

    Abstract: We present a novel approach for synthesizing 3D talking heads with controllable emotion, featuring enhanced lip synchronization and rendering quality. Despite significant progress in the field, prior methods still suffer from multi-view consistency and a lack of emotional expressiveness. To address these issues, we collect EmoTalk3D dataset with calibrated multi-view videos, emotional annotations,…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  30. arXiv:2407.21424  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Cost-Effective Hallucination Detection for LLMs

    Authors: Simon Valentin, Jinmiao Fu, Gianluca Detommaso, Shaoyuan Xu, Giovanni Zappella, Bryan Wang

    Abstract: Large language models (LLMs) can be prone to hallucinations - generating unreliable outputs that are unfaithful to their inputs, external facts or internally inconsistent. In this work, we address several challenges for post-hoc hallucination detection in production settings. Our pipeline for hallucination detection entails: first, producing a confidence score representing the likelihood that a ge…

    Submitted 9 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted to GenAI Evaluation Workshop at KDD 2024

  31. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  32. arXiv:2407.20852  [pdf, other]

    cs.NI cs.MM eess.SY

    Optimizing 5G-Advanced Networks for Time-critical Applications: The Role of L4S

    Authors: Guangjin Pan, Shugong Xu, Pin Jiang

    Abstract: As 5G networks strive to support advanced time-critical applications, such as immersive Extended Reality (XR), cloud gaming, and autonomous driving, the demand for Real-time Broadband Communication (RTBC) grows. In this article, we present the main mechanisms of Low Latency, Low Loss, and Scalable Throughput (L4S). Subsequently, we investigate the support and challenges of L4S technology in the la…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 7 pages, 3 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  33. arXiv:2407.20518  [pdf, other]

    eess.IV cs.AI cs.CV

    High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

    Authors: Zhiceng Shi, Shuailin Xue, Fangfang Zhu, Wenwen Min

    Abstract: Spatial transcriptomics (ST) is a groundbreaking genomic technology that enables spatial localization analysis of gene expression within tissue sections. However, it is significantly limited by high costs and sparse spatial resolution. An alternative, more cost-effective strategy is to use deep learning methods to predict high-density gene expression profiles from histological images. However, exi…

    Submitted 29 July, 2024; originally announced July 2024.

  34. arXiv:2407.19409  [pdf, other]

    cs.CL cs.CV

    LLAVADI: What Matters For Multimodal Large Language Models Distillation

    Authors: Shilin Xu, Xiangtai Li, Haobo Yuan, Lu Qi, Yunhai Tong, Ming-Hsuan Yang

    Abstract: The recent surge in Multimodal Large Language Models (MLLMs) has showcased their remarkable potential for achieving generalized intelligence by integrating visual understanding into Large Language Models. Nevertheless, the sheer model size of MLLMs leads to substantial memory and computational demands that hinder their widespread deployment. In this work, we do not propose a new efficient model str…

    Submitted 28 July, 2024; originally announced July 2024.

  35. arXiv:2407.17879  [pdf, other]

    cs.AR cs.AI

    HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline

    Authors: Qingyu Guo, Jiayong Wan, Songqiang Xu, Meng Li, Yuan Wang

    Abstract: Vision Transformer (ViT) acceleration with field programmable gate array (FPGA) is promising but challenging. Existing FPGA-based ViT accelerators mainly rely on temporal architectures, which process different operators by reusing the same hardware blocks and suffer from extensive memory access overhead. Pipelined architectures, either coarse-grained or fine-grained, unroll the ViT computation spa…

    Submitted 1 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by ICCAD 2024

    MSC Class: 68T07

  36. arXiv:2407.15476  [pdf, other]

    cs.LG cs.IR

    MODRL-TA: A Multi-Objective Deep Reinforcement Learning Framework for Traffic Allocation in E-Commerce Search

    Authors: Peng Cheng, Huimu Wang, Jinyuan Zhao, Yihao Wang, Enqiang Xu, Yu Zhao, Zhuojian Xiao, Songlin Wang, Guoyu Tang, Lin Liu, Sulong Xu

    Abstract: Traffic allocation is a process of redistributing natural traffic to products by adjusting their positions in the post-search phase, aimed at effectively fostering merchant growth, precisely meeting customer demands, and ensuring the maximization of interests across various parties within e-commerce platforms. Existing methods based on learning to rank neglect the long-term value of traffic alloca…

    Submitted 22 July, 2024; originally announced July 2024.

  37. arXiv:2407.15354  [pdf, other]

    cs.CV cs.RO

    Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection

    Authors: Zhili Chen, Shuangjie Xu, Maosheng Ye, Zian Qian, Xiaoyi Zou, Dit-Yan Yeung, Qifeng Chen

    Abstract: The Bird's-Eye-View (BEV) representation is a critical factor that directly impacts the 3D object detection performance, but the traditional BEV grid representation induces quadratic computational cost as the spatial resolution grows. To address this limitation, we present a new camera-based 3D object detector with high-resolution vector representation: VectorFormer. The presented high-resolution…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. Project page: https://github.com/zlichen/VectorFormer

  38. arXiv:2407.15026  [pdf, other]

    cs.AR cs.AI

    Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

    Authors: Zhihai Wang, Zijie Geng, Zhaojie Tu, Jie Wang, Yuxi Qian, Zhexuan Xu, Ziyan Liu, Siyuan Xu, Zhentao Tang, Shixiong Kai, Mingxuan Yuan, Jianye Hao, Bin Li, Yongdong Zhang, Feng Wu

    Abstract: The increasing complexity of modern very-large-scale integration (VLSI) design highlights the significance of Electronic Design Automation (EDA) technologies. Chip placement is a critical step in the EDA workflow, which positions chip modules on the canvas with the goal of optimizing performance, power, and area (PPA) metrics of final chip designs. Recent advances have demonstrated the great poten…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: A comprehensive benchmark for AI-based chip placement algorithms using end-to-end performance metrics

  39. arXiv:2407.14381  [pdf, other]

    cs.LG

    Improving GBDT Performance on Imbalanced Datasets: An Empirical Study of Class-Balanced Loss Functions

    Authors: Jiaqi Luo, Yuan Yuan, Shixin Xu

    Abstract: Class imbalance remains a significant challenge in machine learning, particularly for tabular data classification tasks. While Gradient Boosting Decision Trees (GBDT) models have proven highly effective for such tasks, their performance can be compromised when dealing with imbalanced datasets. This paper presents the first comprehensive study on adapting class-balanced loss functions to three GBDT…

    Submitted 19 July, 2024; originally announced July 2024.

  40. arXiv:2407.12821  [pdf, other]

    cs.CL cs.AI cs.LG

    AutoFlow: Automated Workflow Generation for Large Language Model Agents

    Authors: Zelong Li, Shuyuan Xu, Kai Mei, Wenyue Hua, Balaji Rama, Om Raheja, Hao Wang, He Zhu, Yongfeng Zhang

    Abstract: Recent advancements in Large Language Models (LLMs) have shown significant progress in understanding complex natural language. One important application of LLM is LLM-based AI Agent, which leverages the ability of LLM as well as external tools for complex-task solving. To make sure LLM Agents follow an effective and reliable procedure to solve the given task, manually designed workflows are usuall…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Open source code available at https://github.com/agiresearch/AutoFlow

  41. arXiv:2407.12798  [pdf, other]

    cs.CV

    Multi-Granularity and Multi-modal Feature Interaction Approach for Text Video Retrieval

    Authors: Wenjun Li, Shudong Wang, Dong Zhao, Shenghui Xu, Zhaoming Pan, Zhimin Zhang

    Abstract: The key of the text-to-video retrieval (TVR) task lies in learning the unique similarity between each pair of text (consisting of words) and video (consisting of audio and image frames) representations. However, some problems exist in the representation alignment of video and text, such as a text, and further each word, are of different importance for video frames. Besides, audio usually carries a…

    Submitted 20 June, 2024; originally announced July 2024.

  42. arXiv:2407.12291  [pdf, other]

    cs.CV

    JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

    Authors: Chenhan Jiang, Yihan Zeng, Tianyang Hu, Songcun Xu, Wei Zhang, Hang Xu, Dit-Yan Yeung

    Abstract: Score Distillation Sampling (SDS) by well-trained 2D diffusion models has shown great promise in text-to-3D generation. However, this paradigm distills view-agnostic 2D image distributions into the rendering distribution of 3D representation for each view independently, overlooking the coherence across views and yielding 3D inconsistency in generations. In this work, we propose \textbf{J}oint \tex…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 29 pages, ECCV2024

  43. arXiv:2407.08509  [pdf, other]

    eess.IV cs.CV

    Haar Nuclear Norms with Applications to Remote Sensing Imagery Restoration

    Authors: Shuang Xu, Chang Yu, Jiangjun Peng, Xiangyong Cao

    Abstract: Remote sensing image restoration aims to reconstruct missing or corrupted areas within images. To date, low-rank based models have garnered significant interest in this field. This paper proposes a novel low-rank regularization term, named the Haar nuclear norm (HNN), for efficient and effective remote sensing image restoration. It leverages the low-rank properties of wavelet coefficients derive…

    Submitted 11 July, 2024; originally announced July 2024.

  44. arXiv:2407.08377  [pdf, other]

    cs.CV

    Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework

    Authors: Shengqi Xu, Run Sun, Yi Chang, Shuning Cao, Xueyao Xiao, Luxin Yan

    Abstract: Long-range imaging inevitably suffers from atmospheric turbulence with severe geometric distortions due to random refraction of light. The further the distance, the more severe the disturbance. Although existing research has achieved great progress in tackling short-range turbulence, less attention has been paid to long-range turbulence with significant distortions. To address this dilemma and adva…

    Submitted 17 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper is accepted by ECCV 2024

  45. arXiv:2407.08224  [pdf, other]

    q-bio.QM cs.AI

    stEnTrans: Transformer-based deep learning for spatial transcriptomics enhancement

    Authors: Shuailin Xue, Fangfang Zhu, Changmiao Wang, Wenwen Min

    Abstract: The spatial location of cells within tissues and organs is crucial for the manifestation of their specific functions. Spatial transcriptomics technology enables comprehensive measurement of the gene expression patterns in tissues while retaining spatial information. However, current popular spatial transcriptomics techniques either have shallow sequencing depth or low resolution. We present stEnTra…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ISBRA2024, Code: https://github.com/shuailinxue/stEnTrans

  46. arXiv:2407.08200  [pdf, other]

    cs.CV

    Deep Understanding of Soccer Match Videos

    Authors: Shikun Xu, Yandong Zhu, Gen Li, Changhu Wang

    Abstract: Soccer is one of the most popular sports worldwide, with live broadcasts frequently available for major matches. However, extracting detailed, frame-by-frame information on player actions from these videos remains a challenge. Utilizing state-of-the-art computer vision technologies, our system can detect key objects such as soccer balls, players and referees. It also tracks the movements of players…

    Submitted 11 July, 2024; originally announced July 2024.

  47. arXiv:2407.08156  [pdf, other]

    cs.CV

    AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization

    Authors: Shixiong Xu, Chenghao Zhang, Lubin Fan, Gaofeng Meng, Shiming Xiang, Jieping Ye

    Abstract: In this study, we introduce a new problem raised by social media and photojournalism, named Image Address Localization (IAL), which aims to predict the readable textual address where an image was taken. Existing two-stage approaches involve predicting geographical coordinates and converting them into human-readable addresses, which can lead to ambiguity and be resource-intensive. In contrast, we p…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  48. arXiv:2407.06192  [pdf, other]

    cs.CV cs.AI cs.CL

    Multi-Object Hallucination in Vision-Language Models

    Authors: Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

    Abstract: Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates multi-object hallucination, examining how models misperceive (e.g., invent nonexistent o…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to ALVR @ ACL 2024 | Project page: https://multi-object-hallucination.github.io/

  49. arXiv:2407.06064  [pdf, other]

    eess.IV cs.CV

    Pan-denoising: Guided Hyperspectral Image Denoising via Weighted Represent Coefficient Total Variation

    Authors: Shuang Xu, Qiao Ke, Jiangjun Peng, Xiangyong Cao, Zixiang Zhao

    Abstract: This paper introduces a novel paradigm for hyperspectral image (HSI) denoising, which is termed \textit{pan-denoising}. In a given scene, panchromatic (PAN) images capture similar structures and textures to HSIs but with less noise. This enables the utilization of PAN images to guide the HSI denoising process. Consequently, pan-denoising, which incorporates an additional prior, has the potential t…

    Submitted 8 July, 2024; originally announced July 2024.

  50. arXiv:2407.05647  [pdf, other]

    cs.CV

    Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification

    Authors: Jiaying Shi, Xuetong Xue, Shenghui Xu

    Abstract: Recent CLIP-based methods have shown promising zero-shot and few-shot performance on image classification tasks. Existing approaches such as CoOp and Tip-Adapter only focus on high-level visual features that are fully aligned with textual features representing the "Summary" of the image. However, the goal of few-shot learning is to classify unseen images of the same category with few labeled…

    Submitted 8 July, 2024; originally announced July 2024.