
Showing 1–50 of 1,298 results for author: Zha, X

Searching in archive cs.
  1. arXiv:2409.01145  [pdf, other]

    cs.SI cs.AI

    LATEX-GCL: Large Language Models (LLMs)-Based Data Augmentation for Text-Attributed Graph Contrastive Learning

    Authors: Haoran Yang, Xiangyu Zhao, Sirui Huang, Qing Li, Guandong Xu

    Abstract: Graph Contrastive Learning (GCL) is a potent paradigm for self-supervised graph learning that has attracted attention across various application scenarios. However, GCL for learning on Text-Attributed Graphs (TAGs) has yet to be explored, because conventional augmentation techniques like feature embedding masking cannot directly process textual attributes on TAGs. A naive strategy for applying GCL…

    Submitted 2 September, 2024; originally announced September 2024.

  2. arXiv:2409.00904  [pdf, other]

    cs.CV cs.AI

    Multi-scale Temporal Fusion Transformer for Incomplete Vehicle Trajectory Prediction

    Authors: Zhanwen Liu, Chao Li, Yang Wang, Nan Yang, Xing Fan, Jiaqi Ma, Xiangmo Zhao

    Abstract: Motion prediction plays an essential role in autonomous driving systems, enabling autonomous vehicles to achieve more accurate local-path planning and driving decisions based on predictions of the surrounding vehicles. However, existing methods neglect the potential missing values caused by object occlusion, perception failures, etc., which inevitably degrades the trajectory prediction performance…

    Submitted 1 September, 2024; originally announced September 2024.

  3. arXiv:2409.00426  [pdf, other]

    cs.CR

    Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks

    Authors: Yu He, Boheng Li, Yao Wang, Mengda Yang, Juan Wang, Hongxin Hu, Xingyu Zhao

    Abstract: The vulnerability of machine learning models to Membership Inference Attacks (MIAs) has garnered considerable attention in recent years. These attacks determine whether a data sample belongs to the model's training set or not. Recent research has focused on reference-based attacks, which leverage difficulty calibration with independently trained reference models. While empirical studies have demon…

    Submitted 4 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: Accepted by ACM CCS 2024

  4. arXiv:2408.16200  [pdf, other]

    cs.CV cs.AI

    PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View

    Authors: Zichen Yu, Quanli Liu, Wei Wang, Liyong Zhang, Xiaoguang Zhao

    Abstract: Recently, LSS-based multi-view 3D object detection provides an economical and deployment-friendly solution for autonomous driving. However, all the existing LSS-based methods transform multi-view image features into a Cartesian Bird's-Eye-View (BEV) representation, which does not take into account the non-uniform image information distribution and hardly exploits the view symmetry. In this paper, i…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 11 pages, 6 figures

  5. arXiv:2408.15657  [pdf, other]

    cs.CV cs.RO

    TeFF: Tracking-enhanced Forgetting-free Few-shot 3D LiDAR Semantic Segmentation

    Authors: Junbao Zhou, Jilin Mei, Pengze Wu, Liang Chen, Fangzhou Zhao, Xijun Zhao, Yu Hu

    Abstract: In autonomous driving, 3D LiDAR plays a crucial role in understanding the vehicle's surroundings. However, newly emerged, unannotated objects present a few-shot learning problem for semantic segmentation. This paper addresses the limitations of current few-shot semantic segmentation by exploiting the temporal continuity of LiDAR data. Employing a tracking model to generate pseudo-ground-truths…

    Submitted 28 August, 2024; originally announced August 2024.

  6. arXiv:2408.14626  [pdf]

    cs.LG cs.AI

    Hybrid Deep Convolutional Neural Networks Combined with Autoencoders And Augmented Data To Predict The Look-Up Table 2006

    Authors: Messaoud Djeddou, Aouatef Hellal, Ibrahim A. Hameed, Xingang Zhao, Djehad Al Dallal

    Abstract: This study explores the development of a hybrid deep convolutional neural network (DCNN) model enhanced by autoencoders and data augmentation techniques to predict critical heat flux (CHF) with high accuracy. By augmenting the original input features using three different autoencoder configurations, the model's predictive capabilities were significantly improved. The hybrid models were trained and…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 11 pages, 6 figures

  7. arXiv:2408.14126  [pdf, other]

    cs.LG cs.CY

    Enhancing Fairness through Reweighting: A Path to Attain the Sufficiency Rule

    Authors: Xuan Zhao, Klaus Broelemann, Salvatore Ruggieri, Gjergji Kasneci

    Abstract: We introduce an innovative approach to enhancing the empirical risk minimization (ERM) process in model training through a refined reweighting scheme of the training data to enhance fairness. This scheme aims to uphold the sufficiency rule in fairness by ensuring that optimal predictors maintain consistency across diverse sub-groups. We employ a bilevel formulation to address this challenge, where…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: accepted at ECAI 2024

  8. arXiv:2408.13802  [pdf, other]

    cs.CV cs.RO

    TripleMixer: A 3D Point Cloud Denoising Model for Adverse Weather

    Authors: Xiongwei Zhao, Congcong Wen, Yang Wang, Haojie Bai, Wenhao Dou

    Abstract: LiDAR sensors are crucial for providing high-resolution 3D point cloud data in autonomous driving systems, enabling precise environmental perception. However, real-world adverse weather conditions, such as rain, fog, and snow, introduce significant noise and interference, degrading the reliability of LiDAR data and the performance of downstream tasks like semantic segmentation. Existing datasets o…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 15 pages, submit to IEEE TIP

  9. arXiv:2408.13452  [pdf, other]

    cs.LG

    Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory

    Authors: Sihao Wu, Xingyu Zhao, Xiaowei Huang

    Abstract: Data efficiency of learning, which plays a key role in the Reinforcement Learning (RL) training process, becomes even more important in continual RL with sequential environments. In continual RL, the learner interacts with non-stationary, sequential tasks and is required to learn new tasks without forgetting previous knowledge. However, there is little work on implementing data augmentation for co…

    Submitted 26 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  10. arXiv:2408.13357  [pdf, other]

    cs.IR

    SEQ+MD: Learning Multi-Task as a SEQuence with Multi-Distribution Data

    Authors: Siqi Wang, Audrey Zhijiao Chen, Austin Clapp, Sheng-Min Shih, Xiaoting Zhao

    Abstract: In e-commerce, the order in which search results are displayed when a customer tries to find relevant listings can significantly impact their shopping experience and search efficiency. Tailored re-ranking systems based on relevance and engagement signals in e-commerce have often shown improvements in sales and gross merchandise value (GMV). Designing algorithms for this purpose is even more challengi…

    Submitted 23 August, 2024; originally announced August 2024.

  11. arXiv:2408.13214  [pdf, other]

    q-fin.CP cs.AI cs.CE cs.CL

    EUR-USD Exchange Rate Forecasting Based on Information Fusion with Large Language Models and Deep Learning Methods

    Authors: Hongcheng Ding, Xuanze Zhao, Zixiao Jiang, Shamsul Nahar Abdullah, Deshinta Arrova Dewi

    Abstract: Accurate forecasting of the EUR/USD exchange rate is crucial for investors, businesses, and policymakers. This paper proposes a novel framework, IUS, that integrates unstructured textual data from news and analysis with structured data on exchange rates and financial indicators to enhance exchange rate prediction. The IUS framework employs large language models for sentiment polarity scoring and e…

    Submitted 23 August, 2024; originally announced August 2024.

  12. arXiv:2408.12832  [pdf, other]

    cs.CL

    LIMP: Large Language Model Enhanced Intent-aware Mobility Prediction

    Authors: Songwei Li, Jie Feng, Jiawei Chi, Xinyuan Hu, Xiaomeng Zhao, Fengli Xu

    Abstract: Human mobility prediction is essential for applications like urban planning and transportation management, yet it remains challenging due to the complex, often implicit, intentions behind human behavior. Existing models predominantly focus on spatiotemporal patterns, paying less attention to the underlying intentions that govern movements. Recent advancements in large language models (LLMs) offer…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 13 pages

  13. arXiv:2408.12588  [pdf, other]

    cs.CV cs.DC

    Real-Time Video Generation with Pyramid Attention Broadcast

    Authors: Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You

    Abstract: We present Pyramid Attention Broadcast (PAB), a real-time, high quality and training-free approach for DiT-based video generation. Our method is founded on the observation that attention difference in the diffusion process exhibits a U-shaped pattern, indicating significant redundancy. We mitigate this by broadcasting attention outputs to subsequent steps in a pyramid style. It applies different b…

    Submitted 22 August, 2024; originally announced August 2024.

  14. arXiv:2408.11856  [pdf, other]

    cs.CL cs.AI

    Dynamic Adaptive Optimization for Effective Sentiment Analysis Fine-Tuning on Large Language Models

    Authors: Hongcheng Ding, Xuanze Zhao, Shamsul Nahar Abdullah, Deshinta Arrova Dewi, Zixiao Jiang

    Abstract: Sentiment analysis plays a crucial role in various domains, such as business intelligence and financial forecasting. Large language models (LLMs) have become a popular paradigm for sentiment analysis, leveraging multi-task learning to address specific tasks concurrently. However, LLMs fine-tuned for sentiment analysis often underperform due to the inherent challenges in managing diverse tas…

    Submitted 15 August, 2024; originally announced August 2024.

  15. arXiv:2408.11494  [pdf, ps, other]

    cs.AI

    Mutagenesis screen to map the functionals of parameters of Large Language Models

    Authors: Yue Hu, Kai Hu, Patrick X. Zhao, Javed Khan, Chengming Xu

    Abstract: Large Language Models (LLMs) have significantly advanced artificial intelligence, excelling in numerous tasks. Although the functionality of a model is inherently tied to its parameters, a systematic method for exploring the connections between the parameters and the functionality is lacking. Models sharing similar structure and parameter counts exhibit significant performance disparities across…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 10 pages, 6 figures, supplementary material available online

    ACM Class: I.2.0

  16. arXiv:2408.11451  [pdf, other]

    cs.AI

    Bidirectional Gated Mamba for Sequential Recommendation

    Authors: Ziwei Liu, Qidong Liu, Yejing Wang, Wanyu Wang, Pengyue Jia, Maolin Wang, Zitao Liu, Yi Chang, Xiangyu Zhao

    Abstract: In various domains, Sequential Recommender Systems (SRS) have become essential due to their superior capability to discern intricate user preferences. Typically, SRS utilize transformer-based architectures to forecast the subsequent item within a sequence. Nevertheless, the quadratic computational complexity inherent in these models often leads to inefficiencies, hindering the achievement of real-…

    Submitted 22 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  17. arXiv:2408.11312  [pdf, other]

    cs.CV cs.AI

    Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework

    Authors: Xiao Han, Chen Zhu, Xiangyu Zhao, Hengshu Zhu

    Abstract: Visual geo-localization demands in-depth knowledge and advanced reasoning skills to associate images with real-world geographic locations precisely. In general, traditional methods based on data-matching are hindered by the impracticality of storing adequate visual records of global landmarks. Recently, Large Vision-Language Models (LVLMs) have demonstrated the capability of geo-localization throu…

    Submitted 20 August, 2024; originally announced August 2024.

  18. arXiv:2408.11305  [pdf, other]

    cs.CV cs.AI

    UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation

    Authors: Xiangyu Zhao, Yuehan Zhang, Wenlong Zhang, Xiao-Ming Wu

    Abstract: The fashion domain encompasses a variety of real-world multimodal tasks, including multimodal retrieval and multimodal generation. The rapid advancements in artificial intelligence generated content, particularly in technologies like large language models for text generation and diffusion models for visual generation, have sparked widespread research interest in applying these multimodal models in…

    Submitted 20 August, 2024; originally announced August 2024.

  19. arXiv:2408.11172  [pdf, other]

    cs.LG cs.AI cs.CL cs.LO

    SubgoalXL: Subgoal-based Expert Learning for Theorem Proving

    Authors: Xueliang Zhao, Lin Zheng, Haige Bo, Changran Hu, Urmish Thakker, Lingpeng Kong

    Abstract: Formal theorem proving, a field at the intersection of mathematics and computer science, has seen renewed interest with advancements in large language models (LLMs). This paper introduces SubgoalXL, a novel approach that synergizes subgoal-based proofs with expert learning to enhance LLMs' capabilities in formal theorem proving within the Isabelle environment. SubgoalXL addresses two critical chal…

    Submitted 20 August, 2024; originally announced August 2024.

  20. arXiv:2408.11126  [pdf, other]

    cs.CV cs.LG

    Binocular Model: A deep learning solution for online melt pool temperature analysis using dual-wavelength Imaging Pyrometry

    Authors: Javid Akhavan, Chaitanya Krishna Vallabh, Xiayun Zhao, Souran Manoochehri

    Abstract: In metal Additive Manufacturing (AM), monitoring the temperature of the Melt Pool (MP) is crucial for ensuring part quality, process stability, defect prevention, and overall process optimization. Traditional methods are slow to converge and require extensive manual effort to translate data into actionable insights, rendering them impractical for real-time monitoring and control. To address this…

    Submitted 26 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  21. arXiv:2408.10473  [pdf, other]

    cs.CL cs.LG

    Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism

    Authors: Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum

    Abstract: Pre-trained language models (PLMs) are engineered to be robust in contextual understanding and exhibit outstanding performance in various natural language processing tasks. However, their considerable size incurs significant computational and storage costs. Modern pruning strategies employ one-shot techniques to compress PLMs without the need for retraining on task-specific or otherwise general da…

    Submitted 19 August, 2024; originally announced August 2024.

  22. arXiv:2408.10286  [pdf, other]

    cs.LG cs.AI

    GPT-Augmented Reinforcement Learning with Intelligent Control for Vehicle Dispatching

    Authors: Xiao Han, Zijian Zhang, Xiangyu Zhao, Guojiang Shen, Xiangjie Kong, Xuetao Wei, Liqiang Nie, Jieping Ye

    Abstract: As urban residents demand higher travel quality, vehicle dispatch has become a critical component of online ride-hailing services. However, current vehicle dispatch systems struggle to navigate the complexities of urban traffic dynamics, including unpredictable traffic conditions, diverse driver behaviors, and fluctuating supply and demand patterns. These challenges have resulted in travel difficu…

    Submitted 19 August, 2024; originally announced August 2024.

  23. Revisiting Reciprocal Recommender Systems: Metrics, Formulation, and Method

    Authors: Chen Yang, Sunhao Dai, Yupeng Hou, Wayne Xin Zhao, Jun Xu, Yang Song, Hengshu Zhu

    Abstract: Reciprocal recommender systems (RRS), conducting bilateral recommendations between two involved parties, have gained increasing attention for enhancing matching efficiency. However, the majority of existing methods in the literature still reuse conventional ranking metrics to separately assess the performance on each side of the recommendation process. These methods overlook the fact that the rank…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: KDD 2024

  24. arXiv:2408.09665  [pdf, other]

    cs.CV

    SG-GS: Photo-realistic Animatable Human Avatars with Semantically-Guided Gaussian Splatting

    Authors: Haoyu Zhao, Chen Yang, Hao Wang, Xingyue Zhao, Wei Shen

    Abstract: Reconstructing photo-realistic animatable human avatars from monocular videos remains challenging in computer vision and graphics. Recently, methods using 3D Gaussians to represent the human body have emerged, offering faster optimization and real-time rendering. However, due to ignoring the crucial role of human body semantic information which represents the intrinsic structure and connections wi…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 12 pages, 5 figures

  25. arXiv:2408.09613  [pdf, other]

    cs.SI cs.CY

    How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis

    Authors: Herun Wan, Minnan Luo, Zihan Ma, Guang Dai, Xiang Zhao

    Abstract: Information spreads faster through social media platforms than traditional media, thus becoming an ideal medium to spread misinformation. Meanwhile, automated accounts, known as social bots, contribute more to the misinformation dissemination. In this paper, we explore the interplay between social bots and misinformation on the Sina Weibo platform. We propose a comprehensive and large-scale misinf…

    Submitted 18 August, 2024; originally announced August 2024.

  26. arXiv:2408.09122  [pdf, other]

    cs.CV

    MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation

    Authors: Xiao Zhao, Xukun Zhang, Dingkang Yang, Mingyang Sun, Mingcheng Li, Shunli Wang, Lihua Zhang

    Abstract: Accurate and robust multimodal multi-task perception is crucial for modern autonomous driving systems. However, current multimodal perception research follows independent paradigms designed for specific perception tasks, leading to a lack of complementary learning among tasks and decreased performance in multi-task learning (MTL) due to joint training. In this paper, we propose MaskBEV, a masked a…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024

  27. HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction

    Authors: Xiao Zhao, Bo Chen, Mingyang Sun, Dingkang Yang, Youxing Wang, Xukun Zhang, Mingcheng Li, Dongliang Kou, Xiaoyi Wei, Lihua Zhang

    Abstract: Vision-based 3D semantic scene completion (SSC) describes autonomous driving scenes through 3D volume representations. However, the occlusion of invisible voxels by scene surfaces poses challenges to current SSC methods in hallucinating refined 3D geometry. This paper proposes HybridOcc, a hybrid 3D volume query proposal method generated by Transformer framework and NeRF representation and refined…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted to IEEE RAL

  28. arXiv:2408.06840  [pdf, other]

    cs.CV

    Dynamic and Compressive Adaptation of Transformers From Images to Videos

    Authors: Guozhen Zhang, Jingyu Liu, Shengming Cao, Xiaotong Zhao, Kevin Zhao, Kai Ma, Limin Wang

    Abstract: Recently, the remarkable success of pre-trained Vision Transformers (ViTs) from image-text matching has sparked an interest in image-to-video adaptation. However, most current approaches retain the full forward pass for each frame, leading to a high computation overhead for processing entire videos. In this paper, we present InTI, a novel approach for compressive image-to-video adaptation using dy…

    Submitted 13 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  29. arXiv:2408.06577  [pdf, other]

    cs.IR

    Prompt Tuning as User Inherent Profile Inference Machine

    Authors: Yusheng Lu, Zhaocheng Du, Xiangyang Li, Xiangyu Zhao, Weiwen Liu, Yichao Wang, Huifeng Guo, Ruiming Tang, Zhenhua Dong, Yongrui Duan

    Abstract: Large Language Models (LLMs) have exhibited significant promise in recommender systems by empowering user profiles with their extensive world knowledge and superior reasoning capabilities. However, LLMs face challenges like unstable instruction compliance, modality gaps, and high inference latency, leading to textual noise and limiting their effectiveness in recommender systems. To address these c…

    Submitted 12 August, 2024; originally announced August 2024.

  30. arXiv:2408.05645  [pdf]

    eess.IV cs.CV cs.LG

    BeyondCT: A deep learning model for predicting pulmonary function from chest CT scans

    Authors: Kaiwen Geng, Zhiyi Shi, Xiaoyan Zhao, Alaa Ali, Jing Wang, Joseph Leader, Jiantao Pu

    Abstract: Background: Pulmonary function tests (PFTs) and computed tomography (CT) imaging are vital in diagnosing, managing, and monitoring lung diseases. A common issue in practice is the lack of access to recorded pulmonary functions despite available chest CT scans. Purpose: To develop and validate a deep learning algorithm for predicting pulmonary function directly from chest CT scans. M…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 5 tables, 7 figures, 22 pages

  31. arXiv:2408.04158  [pdf, other]

    eess.IV cs.CV

    Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation

    Authors: Xiaole Zhao, Linze Li, Chengxing Xie, Xiaoming Zhang, Ting Jiang, Wenjie Lin, Shuaicheng Liu, Tianrui Li

    Abstract: Transformer-based deep models for single image super-resolution (SISR) have greatly improved the performance of lightweight SISR tasks in recent years. However, they often suffer from heavy computational burden and slow inference due to the complex calculation of multi-head self-attention (MSA), seriously hindering their practical application and deployment. In this work, we present an efficient S…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024

  32. arXiv:2408.02006  [pdf, other]

    cs.CL

    LLaSA: Large Language and E-Commerce Shopping Assistant

    Authors: Shuo Zhang, Boci Peng, Xinping Zhao, Boren Hu, Yun Zhu, Yanjia Zeng, Xuming Hu

    Abstract: The e-commerce platform has evolved rapidly due to its widespread popularity and convenience. Developing an e-commerce shopping assistant for customers is crucial to aiding them in quickly finding desired products and recommending precisely what they need. However, most previous shopping assistants face two main problems: (1) task-specificity, which necessitates the development of different models…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by KDD 2024 Workshop (Oral)

  33. arXiv:2408.01423  [pdf, other]

    cs.CL cs.AI

    Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting

    Authors: Xiangyu Zhao, Chengqian Ma

    Abstract: Large Language Models (LLMs) exhibit remarkable proficiency in addressing a diverse array of tasks within the Natural Language Processing (NLP) domain, with various prompt design strategies significantly augmenting their capabilities. However, these prompts, while beneficial, each possess inherent limitations. The primary prompt design methodologies are twofold: The first, exemplified by the Chain…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures

  34. arXiv:2408.00545  [pdf, other]

    cs.RO

    Collecting Larg-Scale Robotic Datasets on a High-Speed Mobile Platform

    Authors: Yuxin Lin, Jiaxuan Ma, Sizhe Gu, Jipeng Kong, Bowen Xu, Xiting Zhao, Dengji Zhao, Wenhan Cao, Sören Schwertfeger

    Abstract: Mobile robotics datasets are essential for robotics research, for example on Simultaneous Localization and Mapping (SLAM). Therefore, the ShanghaiTech Mapping Robot was constructed, featuring a multitude of high-performance sensors and a 16-node cluster to collect all this data. That robot is based on a Clearpath Husky mobile base with a maximum speed of 1 meter per second. This is…

    Submitted 1 August, 2024; originally announced August 2024.

  35. arXiv:2408.00279  [pdf, other]

    cs.CV

    DMESA: Densely Matching Everything by Segmenting Anything

    Authors: Yesheng Zhang, Xu Zhao

    Abstract: We propose MESA and DMESA as novel feature matching methods, which utilize the Segment Anything Model (SAM) to effectively mitigate matching redundancy. The key insight of our methods is to establish implicit-semantic area matching prior to point matching, based on the advanced image understanding of SAM. Then, informative area matches with consistent internal semantics are able to undergo dense feature co…

    Submitted 1 August, 2024; originally announced August 2024.

  36. arXiv:2407.19316  [pdf]

    eess.IV cs.AI cs.CV

    AResNet-ViT: A Hybrid CNN-Transformer Network for Benign and Malignant Breast Nodule Classification in Ultrasound Images

    Authors: Xin Zhao, Qianqian Zhu, Jialing Wu

    Abstract: To address the challenges of similarity between lesions and surrounding tissues, overlapping appearances of partially benign and malignant nodules, and difficulty in classification, a deep learning network that integrates CNN and Transformer is proposed for the classification of benign and malignant breast lesions in ultrasound images. This network adopts a dual-branch architecture for local-globa…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 12 pages, 3 figures

  37. arXiv:2407.18743  [pdf, other]

    cs.CL

    Towards Effective and Efficient Continual Pre-training of Large Language Models

    Authors: Jie Chen, Zhipeng Chen, Jiapeng Wang, Kun Zhou, Yutao Zhu, Jinhao Jiang, Yingqian Min, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, Ji-Rong Wen

    Abstract: Continual pre-training (CPT) has been an important approach for adapting language models to specific domains or tasks. To make the CPT approach more traceable, this paper presents a technical report for continually pre-training Llama-3 (8B), which significantly enhances the Chinese language ability and scientific reasoning ability of the backbone model. To enhance the new abilities while retaining…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 16 pages, 10 figures, 16 tables

    MSC Class: 68T50; ACM Class: I.2.7

  38. arXiv:2407.18272  [pdf, other]

    cs.AR cs.AI cs.LG

    AICircuit: A Multi-Level Dataset and Benchmark for AI-Driven Analog Integrated Circuit Design

    Authors: Asal Mehradfar, Xuzhe Zhao, Yue Niu, Sara Babakniya, Mahdi Alesheikh, Hamidreza Aghasi, Salman Avestimehr

    Abstract: Analog and radio-frequency circuit design requires extensive exploration of both circuit topology and parameters to meet specific design criteria like power consumption and bandwidth. Designers must review state-of-the-art topology configurations in the literature and sweep various circuit parameters within each configuration. This design process is highly specialized and time-intensive, particula…

    Submitted 22 July, 2024; originally announced July 2024.

  39. arXiv:2407.17126  [pdf]

    cs.CL cs.AI

    SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)

    Authors: Bernardo Consoli, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen, Ying Ding

    Abstract: Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method leveraging contrastive examples and concise instructions to extract SDoH without relying…

    Submitted 24 July, 2024; originally announced July 2024.

  40. arXiv:2407.17097  [pdf, other]

    cs.LG cs.AI

    Towards Robust Knowledge Tracing Models via k-Sparse Attention

    Authors: Shuyan Huang, Zitao Liu, Xiangyu Zhao, Weiqi Luo, Jian Weng

    Abstract: Knowledge tracing (KT) is the problem of predicting students' future performance based on their historical interaction sequences. With its advanced capability of capturing contextual long-term dependency, the attention mechanism has become one of the essential components in many deep learning based KT (DLKT) models. In spite of the impressive performance achieved by these attentional DLKT models, many of…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted at SIGIR'2023 (revised version with additional results)

  41. arXiv:2407.17086  [pdf, other]

    cs.HC

    AI-Gadget Kit: Integrating Swarm User Interfaces with LLM-driven Agents for Rich Tabletop Game Applications

    Authors: Yijie Guo, Zhenhan Huang, Ruhan Wang, Zhihao Yao, Tianyu Yu, Zhiling Xu, Xinyu Zhao, Xueqing Li, Haipeng Mi

    Abstract: While Swarm User Interfaces (SUIs) have succeeded in enriching tangible interaction experiences, their limitations in autonomous action planning have hindered the potential for personalized and dynamic interaction generation in tabletop games. Based on the AI-Gadget Kit we developed, this paper explores how to integrate LLM-driven agents within tabletop games to enable SUIs to execute complex inte…

    Submitted 24 July, 2024; originally announced July 2024.

  42. arXiv:2407.15620  [pdf, other]

    cs.IR cs.LG

    Dual Test-time Training for Out-of-distribution Recommender System

    Authors: Xihong Yang, Yiqi Wang, Jin Chen, Wenqi Fan, Xiangyu Zhao, En Zhu, Xinwang Liu, Defu Lian

    Abstract: Deep learning has been widely applied in recommender systems and has achieved revolutionary progress recently. However, most existing learning-based methods assume that the user and item distributions remain unchanged between the training phase and the test phase. In real-world scenarios, though, the distribution of user and item features can naturally shift, potentially resulting in a substant…

    Submitted 22 July, 2024; originally announced July 2024.

  43. arXiv:2407.15569  [pdf, other]

    cs.CL

    An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought

    Authors: Yuetong Zhao, Hongyu Cao, Xianyu Zhao, Zhijian Ou

    Abstract: Since the launch of ChatGPT at the end of 2022, generative dialogue models represented by ChatGPT have quickly become essential tools in daily life. As user expectations increase, enhancing the capability of generative dialogue models to solve complex problems has become a focal point of current research. This paper delves into the effectiveness of the RAFT (Retrieval Augmented Fine-Tuning) method…

    Submitted 30 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted by ISCSLP 2024

  44. arXiv:2407.15411  [pdf, other]

    cs.IR

    Scalable Dynamic Embedding Size Search for Streaming Recommendation

    Authors: Yunke Qu, Liang Qu, Tong Chen, Xiangyu Zhao, Quoc Viet Hung Nguyen, Hongzhi Yin

    Abstract: Recommender systems typically represent users and items by learning their embeddings, which are usually set to uniform dimensions and dominate the model parameters. However, real-world recommender systems often operate in streaming recommendation scenarios, where the number of users and items continues to grow, leading to substantial storage resource consumption for these embeddings. Although a fe…

    Submitted 31 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted to CIKM 2024

  45. arXiv:2407.15380  [pdf, other]

    eess.IV cs.CV

    Iterative approach to reconstructing neural disparity fields from light-field data

    Authors: Ligen Shi, Chang Liu, Xing Zhao, Jun Qiu

    Abstract: This study proposes a neural disparity field (NDF) that establishes an implicit, continuous representation of scene disparity based on a neural field and an iterative approach to address the inverse problem of NDF reconstruction from light-field data. NDF enables seamless and precise characterization of disparity variations in three-dimensional scenes and can discretize disparity at any arbitrary…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 12 pages, 7 figures

    MSC Class: 68U10 ACM Class: I.4.10; I.4.5

  46. arXiv:2407.15249  [pdf]

    cs.CE cs.ET physics.soc-ph

    Hurricane Evacuation Analysis with Large-scale Mobile Device Location Data during Hurricane Ian

    Authors: Luyu Liu, Xiaojian Zhang, Shangkun Jiang, Xilei Zhao

    Abstract: Hurricane Ian is the deadliest and costliest hurricane in Florida's history, with 2.5 million people ordered to evacuate. As we witness increasingly severe hurricanes in the context of climate change, mobile device location data offers an unprecedented opportunity to study hurricane evacuation behaviors. With a terabyte-level GPS dataset, we introduce a holistic hurricane evacuation behavior algor…

    Submitted 21 July, 2024; originally announced July 2024.

  47. arXiv:2407.14568  [pdf, other]

    cs.CL cs.AI cs.DB

    SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy

    Authors: Tingkai Zhang, Chaoyu Chen, Cong Liao, Jun Wang, Xudong Zhao, Hang Yu, Jianchao Wang, Jianguo Li, Wenhui Shi

    Abstract: Text-to-SQL conversion is a critical innovation, simplifying the transition from complex SQL to intuitive natural language queries, especially significant given SQL's prevalence in the job market across various roles. The rise of Large Language Models (LLMs) like GPT-3.5 and GPT-4 has greatly advanced this field, offering improved natural language understanding and the ability to generate nuanced…

    Submitted 19 July, 2024; originally announced July 2024.

  48. arXiv:2407.14567  [pdf, other]

    cs.OS cs.AI

    Operating System And Artificial Intelligence: A Systematic Review

    Authors: Yifan Zhang, Xinkui Zhao, Jianwei Yin, Lufei Zhang, Zuoning Chen

    Abstract: In the dynamic landscape of technology, the convergence of Artificial Intelligence (AI) and Operating Systems (OS) has emerged as a pivotal arena for innovation. Our exploration focuses on the symbiotic relationship between AI and OS, emphasizing how AI-driven tools enhance OS performance, security, and efficiency, while OS advancements facilitate more sophisticated AI applications. We delve into…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 14 pages, 5 figures

  49. Large Kernel Distillation Network for Efficient Single Image Super-Resolution

    Authors: Chengxing Xie, Xiaoming Zhang, Linze Li, Haiteng Meng, Tianlin Zhang, Tianrui Li, Xiaole Zhao

    Abstract: Efficient and lightweight single-image super-resolution (SISR) has achieved remarkable performance in recent years. One effective approach is the use of large kernel designs, which have been shown to improve the performance of SISR models while reducing their computational requirements. However, current state-of-the-art (SOTA) models still face problems such as high computational costs. To address…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted to CVPR workshop 2023

  50. arXiv:2407.13945  [pdf, other]

    cs.CL

    FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking

    Authors: Zhuoer Wang, Leonardo F. R. Ribeiro, Alexandros Papangelis, Rohan Mukherjee, Tzu-Yen Wang, Xinyan Zhao, Arijit Biswas, James Caverlee, Angeliki Metallinou

    Abstract: API call generation is the cornerstone of large language models' tool-using ability that provides access to the larger world. However, existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request. To address these limitations, we propose an output-side opt…

    Submitted 18 July, 2024; originally announced July 2024.